Context Engineering

What is Context?
- Context refers to all information a Large Language Model (LLM) processes before generating a response. It shapes the model’s understanding and output quality, acting as the foundation for effective task performance.
Components of Context
- System Prompt:
- Instructions guiding the LLM on its behavior, including how to respond and restrictions to follow.
- User Prompt:
- The specific query or instruction provided by the user.
- Conversation History (Short-Term Memory):
- The ongoing dialogue between the user and the LLM within a session.
- Long-Term Memory:
- Accumulated data from multiple conversations, including user preferences or historical interactions.
- Retrieval-Augmented Generation (RAG):
- Up-to-date, external, relevant information fetched to enhance response accuracy.
- Tool Definitions:
- Descriptions of tools available to the LLM, specifying their functionality and usage.
- Output Schema Definition:
- Specifications for the format of the LLM’s output (e.g., JSON, text).
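The components above can be sketched as a single context-assembly step for a chat-style LLM API. This is an illustrative sketch, not a real API: `build_context` and all the example data are hypothetical, and the message-list shape assumes an OpenAI-style chat format.

```python
# Hypothetical sketch: assembling the context components listed above
# into one request for a chat-style LLM API. All names and data here
# are illustrative.

def build_context(system_prompt, history, retrieved_docs, tool_defs, user_prompt):
    """Combine context components into a chat-style message list."""
    messages = [{"role": "system", "content": system_prompt}]
    # Short-term memory: prior turns from this session.
    messages.extend(history)
    # RAG: retrieved snippets injected as extra grounding.
    if retrieved_docs:
        joined = "\n".join(retrieved_docs)
        messages.append({"role": "system", "content": "Relevant documents:\n" + joined})
    messages.append({"role": "user", "content": user_prompt})
    # Tool definitions are typically passed alongside messages, not inside them.
    return {"messages": messages, "tools": tool_defs}

request = build_context(
    system_prompt="You are a scheduling assistant.",
    history=[{"role": "user", "content": "Hi"},
             {"role": "assistant", "content": "Hello! How can I help?"}],
    retrieved_docs=["Office hours: Mon-Fri, 9:00-17:00."],
    tool_defs=[{"name": "send_email", "description": "Send an email."}],
    user_prompt="When is the office open?",
)
```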
Importance of Context
- The quality of an LLM’s response is directly tied to the quality of its context. A rich context enables more accurate and relevant outputs, while poor or absent context leads to suboptimal responses.
- Example: AI Personal Assistant
- A context-rich assistant can:
- Access calendar data to check availability for scheduling.
- Review past email history to match communication tone with a recipient.
- Use an email-sending tool to execute actions.
- In contrast, an assistant with minimal context may misinterpret queries or provide generic, less useful responses.
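A minimal sketch of the difference: the same query answered with and without the components above in context. The assistant function and data are hypothetical stand-ins for a real LLM call.

```python
# Hypothetical sketch: the same query with rich vs. minimal context.
# `answer` stands in for a real LLM call.

def answer(query: str, context: dict) -> str:
    if "calendar" in context:
        free = context["calendar"][0]
        return f"You are free at {free}; I can schedule it then."
    return "I don't have access to your schedule."  # generic fallback

rich = answer("Schedule a call tomorrow", {"calendar": ["Tue 10:00"]})
poor = answer("Schedule a call tomorrow", {})
```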
Context Engineering vs. Prompt Engineering
- Prompt Engineering:
- Focuses on crafting a single, precise instruction set within a text prompt.
- A static approach that emphasizes clarity and specificity in a single input.
- Context Engineering:
- Involves designing a dynamic system that curates and delivers relevant information to the LLM for task completion.
- Operates as a preprocessing layer before the LLM call, tailoring context dynamically based on the query.
- Example: For one request, the system might fetch calendar data; for another, it might perform a web search.
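The preprocessing layer above can be sketched as a simple router that inspects the query and decides which context sources to fetch before the LLM call. Keyword matching here is an illustrative stand-in for a real intent classifier; all source names are assumptions.

```python
# Illustrative sketch of context engineering as a preprocessing layer:
# route each query to the context sources it needs. Keyword matching
# stands in for a real classifier.

def select_context_sources(query: str) -> list[str]:
    sources = []
    q = query.lower()
    if any(w in q for w in ("schedule", "meeting", "calendar", "available")):
        sources.append("calendar")
    if any(w in q for w in ("latest", "news", "current", "today")):
        sources.append("web_search")
    if any(w in q for w in ("email", "reply", "draft")):
        sources.append("email_history")
    return sources
```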
Challenges with Long Contexts
- While comprehensive context improves responses, excessively long contexts (e.g., millions of tokens, multiple tools, or extensive documents) can degrade performance. Issues include:
- Distraction: Irrelevant information diverts the LLM’s focus.
- Confusion: Overloaded context overwhelms the model, reducing clarity.
- Context Clashes: Contradictory information within the context leads to inconsistent outputs.
- Source: How Long Contexts Fail | Drew Breunig
Context Management Tactics
Effective context management follows the principle of “garbage in, garbage out”: high-quality, relevant context yields accurate responses. Below are key tactics to optimize context, inspired by How to Fix Your Context | Drew Breunig.
Retrieval-Augmented Generation (RAG)
- Description: Selectively fetches relevant, up-to-date external information to enrich the LLM’s context.
- Benefit: Enhances response accuracy without overloading the model.
- Example: For a query about recent news, RAG retrieves current articles instead of relying on static knowledge.
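A minimal RAG-style retrieval sketch: score each document against the query and keep only the top matches. Word-overlap scoring is an assumption used to keep the example self-contained; a production system would use an embedding index or vector store.

```python
# Minimal RAG-style retrieval sketch: word-overlap scoring stands in
# for real embedding similarity.
import re

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    q_words = set(re.findall(r"\w+", query.lower()))

    def score(doc: str) -> int:
        return len(q_words & set(re.findall(r"\w+", doc.lower())))

    # Keep only the k most relevant documents for the context.
    return sorted(documents, key=score, reverse=True)[:k]

docs = [
    "The 2024 election results were announced yesterday.",
    "Photosynthesis converts light into chemical energy.",
    "Turnout in the 2024 election reached record levels.",
]
top = retrieve("2024 election results", docs)
```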
Tool Loadout Selection
- Description: Chooses only relevant tool definitions to include in the context, avoiding unnecessary complexity.
- Approaches:
- RAG for Tools: Use a retrieval system to match tools to the user’s query (e.g., selecting a calendar tool for scheduling queries).
- LLM-Powered Recommender: Prompt the LLM to identify which tools are needed for a task.
- Consideration: The number of tools impacts performance:
- Some models handle 20–30 tools effectively but struggle with hundreds.
- Example: For a query about scheduling, include only the calendar tool definition, not irrelevant tools like web search.
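The "RAG for tools" approach above can be sketched the same way: score each tool description against the query and include only the top matches in the context. The tool names, descriptions, and overlap scoring are all illustrative assumptions.

```python
# Hedged sketch of "RAG for tools": keep only tool definitions whose
# descriptions match the query. A real system might embed descriptions
# and rank by vector similarity instead.
import re

TOOLS = {
    "calendar": "Check availability and schedule meetings on the calendar.",
    "web_search": "Search the web for current news and information.",
    "send_email": "Draft and send an email to a recipient.",
}

def select_tools(query: str, max_tools: int = 1) -> list[str]:
    q = set(re.findall(r"\w+", query.lower()))
    scored = {
        name: len(q & set(re.findall(r"\w+", desc.lower())))
        for name, desc in TOOLS.items()
    }
    ranked = sorted(scored, key=scored.get, reverse=True)
    # Drop tools with no match at all, then cap the loadout size.
    return [t for t in ranked[:max_tools] if scored[t] > 0]
```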
Context Quarantine
- Description: Isolates different contexts in separate processing threads, each with specific tools or data.
- Benefit: Prevents cross-contamination of contexts and reduces the risk of conflicting instructions.
- Example: Break a complex query into parallel tasks (e.g., one thread handles calendar data, another handles email drafting).
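The parallel-thread example above can be sketched as isolated sub-tasks, each seeing only its own quarantined context, with only the results merged at the end. The worker function is a placeholder for a real LLM call; all task names and data are hypothetical.

```python
# Sketch of context quarantine: each sub-task runs with its own isolated
# context, and only the results are merged afterwards.
from concurrent.futures import ThreadPoolExecutor

def run_subtask(task: str, context: dict) -> str:
    # Placeholder for an isolated LLM call that sees only `context`.
    return f"{task}: used {sorted(context)}"

subtasks = {
    "check_calendar": {"calendar": ["Mon 10:00 free", "Tue 14:00 free"]},
    "draft_email": {"email_history": ["Hi Sam, thanks for the update..."]},
}

# Each thread sees only its quarantined context, never the other's.
with ThreadPoolExecutor() as pool:
    futures = {name: pool.submit(run_subtask, name, ctx)
               for name, ctx in subtasks.items()}
    results = {name: f.result() for name, f in futures.items()}
```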
Context Pruning
- Description: Removes irrelevant or redundant information from the context to maintain focus.
- Benefit: Reduces noise, improving model efficiency and response accuracy.
- Example: Exclude outdated conversation history unrelated to the current query.
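Pruning can be sketched as filtering older turns by relevance to the current query while always keeping the most recent ones. The overlap scoring and stopword list are simplifying assumptions; a real system might use an LLM or embedding model to judge relevance.

```python
# Sketch of context pruning: drop older turns unrelated to the current
# query, but always keep the most recent turn(s).
import re

STOPWORDS = {"what", "is", "the", "a", "an", "you", "can", "my"}

def prune_history(history: list[str], query: str, keep_recent: int = 1) -> list[str]:
    q_words = set(re.findall(r"\w+", query.lower())) - STOPWORDS
    recent = history[-keep_recent:]
    older = history[:-keep_recent]
    relevant = [t for t in older
                if q_words & set(re.findall(r"\w+", t.lower()))]
    return relevant + recent

history = [
    "We discussed the Q3 budget report.",
    "My cat is named Whiskers.",          # irrelevant to the query below
    "The budget deadline is Friday.",
    "Can you summarize the budget status?",
]
pruned = prune_history(history, "What is the budget status?")
```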
Context Summarization
- Description: Condenses large contexts (e.g., long conversation histories) into concise summaries.
- Benefit: Maintains essential information while reducing token count and complexity.
- Example: Summarize a multi-turn conversation into key points for the LLM to reference.
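In practice, summarization is usually done by asking the LLM itself to condense the history; the sketch below uses a trivial extractive stand-in (keep the first sentence of each turn) purely to illustrate the token-count reduction.

```python
# Sketch of context summarization: a trivial extractive stand-in for an
# LLM-generated summary, kept simple to show the size reduction.

def summarize_turn(turn: str) -> str:
    # Keep only the first sentence of each turn.
    return turn.split(". ")[0].rstrip(".")

turns = [
    "User wants to book a flight to Paris. They prefer morning departures and aisle seats.",
    "Budget is under $500. Dates are flexible within the first week of June.",
]
summary = "; ".join(summarize_turn(t) for t in turns)
```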
Context Offloading
- Description: Stores information externally via tools, retrieving it only when needed, rather than including it in the LLM’s context.
- Benefit: Reduces context size, especially for data not immediately relevant to the query.
- Example: Use a “think” tool to store intermediate data, as described in The "think" tool: Enabling Claude to stop and think | Anthropic.
- Use Case: When the model requires external data (e.g., database results) to formulate a response, offload processing to a tool that manages the data.
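Offloading can be sketched as an external scratchpad the model writes to and reads from via tool calls, so bulky intermediate data never sits in the prompt. The `Scratchpad` class below is illustrative only, not Anthropic's actual "think" tool.

```python
# Sketch of context offloading: intermediate results live in an external
# store accessed through tool calls instead of the prompt itself.
# Illustrative only; not Anthropic's "think" tool implementation.

class Scratchpad:
    """External store the LLM accesses through tool calls."""

    def __init__(self):
        self._notes: dict[str, str] = {}

    def write(self, key: str, value: str) -> str:
        self._notes[key] = value
        return f"stored {key}"

    def read(self, key: str) -> str:
        return self._notes.get(key, "")

pad = Scratchpad()
# The model offloads a bulky intermediate result (e.g., database output)...
pad.write("db_results", "42 rows matching the query")
# ...and retrieves it only when composing the final answer.
final_context = pad.read("db_results")
```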