Context engineering is about dynamically managing the information in an LLM’s context window, ensuring it has the most relevant information for the task it’s performing.
By default, LLMs are stateless — the only information they have access to is the information provided in the context. So if we want to make stateful AI tools, we need to use context engineering.
Context engineering ensures the model gets everything it needs, optimized for the current task. This can include (a rough assembly sketch follows the list):
- Tool definitions
- Few-shot examples to guide reasoning
- External data (pulling relevant info from memory or data stores)
- Information accessed via Retrieval Augmented Generation
- Inputs from tools the model has called
- Conversation history
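As a rough illustration, the assembly step might look something like the sketch below. Every name here (`ContextBundle`, `render`, and so on) is a hypothetical placeholder rather than any particular framework's API.

```python
from dataclasses import dataclass, field

@dataclass
class ContextBundle:
    """Everything we plan to put in the model's context window for one turn."""
    system_instructions: str
    tool_definitions: list[dict] = field(default_factory=list)    # JSON-schema-style tool specs
    few_shot_examples: list[str] = field(default_factory=list)    # worked examples to guide reasoning
    retrieved_documents: list[str] = field(default_factory=list)  # external data / RAG results
    tool_outputs: list[str] = field(default_factory=list)         # results from tools already called
    conversation_history: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Flatten the bundle into a single prompt string, skipping empty sections."""
        sections = [
            ("Instructions", self.system_instructions),
            ("Tools", "\n".join(str(t) for t in self.tool_definitions)),
            ("Examples", "\n".join(self.few_shot_examples)),
            ("Reference material", "\n".join(self.retrieved_documents)),
            ("Tool results", "\n".join(self.tool_outputs)),
            ("Conversation", "\n".join(self.conversation_history)),
        ]
        return "\n\n".join(f"## {name}\n{body}" for name, body in sections if body)
```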
Context Management
The diagram below (taken from this whitepaper) shows the flow of context management for agents.

Context Rot
Context rot refers to the degradation in performance that happens when the context window gets too full: the model loses track of what matters and gets worse at the task at hand. One of the goals of context engineering, then, is to strike a balance between providing too little information in the context and too much.
Some straightforward (but crude) methods for preventing context rot include using only the n newest tokens in the conversation or using only the k most recent conversational turns.
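A minimal sketch of both truncation strategies, assuming messages are plain strings and using whitespace splitting as a crude stand-in for a real tokenizer:

```python
def keep_last_n_tokens(messages: list[str], n: int) -> list[str]:
    """Keep the newest messages whose combined (approximate) token count fits within n."""
    kept: list[str] = []
    budget = n
    for message in reversed(messages):   # walk from newest to oldest
        cost = len(message.split())      # crude whitespace proxy for token count
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return list(reversed(kept))          # restore chronological order


def keep_last_k_turns(messages: list[str], k: int) -> list[str]:
    """Keep only the k most recent conversational turns."""
    return messages[-k:]
```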
A more complex (but likely better) approach is recursive summarization, where an LLM summarizes the conversation so far to reduce the context. If we go this route, the summarization has to happen asynchronously in the background, potentially using a smaller/less capable LLM (since summarization tends to be an easier task).
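A hedged sketch of what that background summarization might look like; `summarize_with_small_model` stands in for a call to whatever cheaper model is available, and the threshold numbers are arbitrary:

```python
SUMMARY_TRIGGER_TOKENS = 4_000   # arbitrary threshold for when to compress
KEEP_RECENT_MESSAGES = 10        # always keep the newest turns verbatim

async def summarize_with_small_model(text: str) -> str:
    """Placeholder for a call to a smaller/cheaper LLM that returns a summary."""
    ...

async def maybe_compress_history(history: list[str]) -> list[str]:
    """If the history is getting long, replace the older portion with a summary."""
    approx_tokens = sum(len(m.split()) for m in history)
    if approx_tokens < SUMMARY_TRIGGER_TOKENS:
        return history
    older, recent = history[:-KEEP_RECENT_MESSAGES], history[-KEEP_RECENT_MESSAGES:]
    summary = await summarize_with_small_model("\n".join(older))
    return [f"[Summary of earlier conversation]\n{summary}"] + recent
```

In an agent loop, `maybe_compress_history` could be scheduled in the background (e.g. with `asyncio.create_task`) so the main conversational turn never blocks on it.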
Sessions
A session is essentially a container for a conversation with an LLM. It's the immediate, working context the model has access to during that conversation.
Sessions are specific to a given user and hold the context for a given conversation.
Sessions typically keep track of two different things:
- Events, which are distinct things that have happened (e.g. called Tool A, sent user response X, etc.)
- State, which is essentially a scratchpad the agent can use to store and update details during the conversation.
See the Agent Session Management with ADK notes for a worked example of managing a session with the Google Agent Development Kit (ADK).
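For a framework-agnostic picture, a session might look roughly like this (ADK has its own Session, State, and Event types; the sketch below is just an illustration):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Event:
    """One distinct thing that happened in the session."""
    kind: str        # e.g. "tool_call", "model_response", "user_message"
    payload: dict
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class Session:
    """Container for one user's conversation."""
    user_id: str
    session_id: str
    events: list[Event] = field(default_factory=list)  # append-only log of what happened
    state: dict = field(default_factory=dict)           # scratchpad of key/value details

# Example usage
session = Session(user_id="u123", session_id="s456")
session.events.append(Event(kind="tool_call", payload={"tool": "search", "query": "weather"}))
session.state["preferred_units"] = "celsius"
```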
Memory
Memory is how you capture, process, and consolidate information across multiple sessions. Memory provides capabilities that sessions alone can't:
| Capability | What It Means | Example |
|---|---|---|
| Cross-Conversation Recall | Access information from any past conversation | “What preferences has this user mentioned across all chats?” |
| Intelligent Extraction | LLM-powered consolidation extracts key facts | Stores “allergic to peanuts” instead of 50 raw messages |
| Semantic Search | Meaning-based retrieval, not just keyword matching | Query “preferred hue” matches “favorite color is blue” |
| Persistent Storage | Survives application restarts | Build knowledge that grows over time |
When storing data in session logs, we typically need to remove PII or other sensitive data (for compliance reasons).
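A crude sketch of what that scrubbing might look like before anything is written to a log. Real systems typically use a dedicated PII/DLP service rather than hand-rolled regexes; the patterns below are illustrative only:

```python
import re

# Illustrative patterns only; production systems should use a proper PII/DLP service.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub_pii(text: str) -> str:
    """Replace obvious PII with typed placeholders before logging."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label.upper()}]", text)
    return text

print(scrub_pii("Reach me at jane@example.com or 555-867-5309."))
# -> "Reach me at [REDACTED_EMAIL] or [REDACTED_PHONE]."
```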
Just like people, agents might have both declarative memory (facts and experiences) and procedural memory (how to do things).
Practically, memories are stored in a combination of vector databases and knowledge graphs.
Memory Generation
Memory generation is an ETL (extract, transform, load) pipeline with a few extra considerations.
- Extract. Figure out what information is most meaningful (potentially for a given person) and then extract that.
- Consolidate. Figure out if we need to create a new memory, update an old memory, forget a previous memory based on new information, etc. There’s a lot that goes into this that I’m not getting into here.
- Transform. Convert the extracted, consolidated facts into a storable form (e.g. embeddings or knowledge-graph entries).
- Load. Write the result into the memory store (the vector database and/or knowledge graph).
Memory generation is compute-intensive, so it has to happen in the background as an asynchronous process.
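Put together, one pass of memory generation might look roughly like the sketch below. The `extract_facts`, `embed`, and `find_similar` helpers are stubs standing in for LLM, embedding, and vector-store calls, not a real API:

```python
from dataclasses import dataclass

@dataclass
class Memory:
    text: str                # consolidated fact, e.g. "allergic to peanuts"
    embedding: list[float]   # vector representation for semantic search

def extract_facts(transcript: str) -> list[str]:
    """Stub standing in for an LLM call that pulls out the meaningful facts."""
    return ["allergic to peanuts"]

def embed(text: str) -> list[float]:
    """Stub standing in for an embedding-model call."""
    return [0.0]

def find_similar(fact: str, store: list[Memory]) -> Memory | None:
    """Stub: in practice this would be a semantic lookup, not an exact-text match."""
    return next((m for m in store if m.text == fact), None)

def generate_memories(transcript: str, store: list[Memory]) -> None:
    """Extract -> consolidate -> transform -> load; run as a background job."""
    for fact in extract_facts(transcript):                       # Extract
        existing = find_similar(fact, store)
        if existing is not None:                                 # Consolidate: update in place
            existing.embedding = embed(fact)                     # Transform
            continue
        store.append(Memory(text=fact, embedding=embed(fact)))   # Transform + Load
```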
Memory Extraction
Finding the most semantically similar memory isn't always the best approach. Instead, when determining which memories to retrieve, we might want to blend scores across three factors (a simple weighted blend is sketched after the list):
- Relevance
- Recency
- Importance
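One way to blend the three factors is a simple weighted sum, along the lines of the sketch below. The weights and the exponential-decay recency term are illustrative choices, not a standard formula:

```python
import math
from dataclasses import dataclass

@dataclass
class ScoredMemory:
    text: str
    relevance: float    # e.g. cosine similarity to the query, in [0, 1]
    age_hours: float    # time since the memory was last accessed or created
    importance: float   # e.g. an LLM-assigned score in [0, 1]

def retrieval_score(m: ScoredMemory,
                    w_relevance: float = 0.5,
                    w_recency: float = 0.25,
                    w_importance: float = 0.25,
                    half_life_hours: float = 72.0) -> float:
    """Blend relevance, recency, and importance into a single retrieval score."""
    recency = math.exp(-m.age_hours * math.log(2) / half_life_hours)  # 1.0 when new, decays toward 0
    return w_relevance * m.relevance + w_recency * recency + w_importance * m.importance

def top_k(memories: list[ScoredMemory], k: int = 5) -> list[ScoredMemory]:
    """Return the k memories with the highest blended score."""
    return sorted(memories, key=retrieval_score, reverse=True)[:k]
```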
Memories can be retrieved either proactively or reactively. In the proactive approach, we fetch potentially relevant memories at the start of each conversational turn. The advantage is that memories are always available, but one disadvantage is that it adds latency to every turn, even when the fetched memories end up not being needed.
In the reactive approach, we might think of memory as a tool that an agent can call if it decides it’s necessary. This can reduce latency, but it requires a smarter agent and potentially introduces an additional LLM call.
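As a sketch, the reactive approach just exposes memory search as one more tool in the agent's tool list. The tool definition below uses a generic JSON-schema style and a stubbed handler; it isn't tied to any particular framework:

```python
# Generic JSON-schema-style tool definition the agent can choose to call.
SEARCH_MEMORY_TOOL = {
    "name": "search_memory",
    "description": "Look up facts remembered about this user from past conversations.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "What to look for, in natural language"},
        },
        "required": ["query"],
    },
}

def search_memory(query: str, store: list[str]) -> list[str]:
    """Stub handler: in practice this would run a semantic search over the memory store."""
    return [m for m in store if any(word in m.lower() for word in query.lower().split())]
```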
Additional Links
Google Whitepaper on Context Engineering