Context engineering is all you need (to build agents)
Yeah, I know, cheesy title. It’s a nod to the Transformer paper that kicked off this whole agentic era, and no, context engineering alone won’t rescue a poorly scoped product. But after building a dozen or so agents over the past two years, I’ve learned that when things break, it's almost never the model’s fault…it’s the context (Dun dun duuun!). That view isn’t unique to me; you’ll hear the same from teams at LangChain, Anthropic, Cognition, and basically anyone shipping production agents today.
The agent ecosystem is starting to mature. We’ve moved beyond just slapping a chatbot UI on top of an LLM and calling it a product. In building across a bunch of different domains, I’ve seen the same patterns show up over and over: RAG, prompt engineering, tool calling, control flow logic.
At some point, you realize that behind the hype and fancy terms, this isn’t just "like" systems design. It is systems design. It’s software engineering, just with LLMs as a new kind of runtime component. Agents simply call APIs in non-deterministic ways. That’s the core of it.
So what actually makes or breaks an agent? What determines whether it completes its task or spirals into garbage? It’s how well you engineer the context. Not just clever prompts, but the full information state you construct for the model to reason over.
That’s what this post is about: why context engineering is the foundation of agentic design, and how to start treating it like the software discipline it really is.
What is Context Engineering?
We all love labeling things. And once we do, it gives us a way to start associating ideas with them. Without labels, concepts just kind of float around. And that’s pretty much what’s been happening in this evolving space.
If you’ve been building agents, chances are you’ve reached a point where context engineering became necessary. Think about something as simple as a QA chatbot powered by an LLM. What happens when your conversation exceeds the context window? Suddenly, you're faced with the question: what’s important enough to keep? What needs to stay in the context for the agent to continue being useful?
For a long time, we didn’t even have a proper name for this. That changed a couple of weeks ago, when Shopify CEO Tobi Lütke mentioned that he prefers the term context engineering over prompt engineering. He felt it better captured the actual skill: the art of giving the model everything it needs to plausibly solve the task at hand.
That comment sparked a broader discussion. Andrej Karpathy, a major voice in the agentic space and the one behind terms like LLM OS and vibe coding, offered what might be the best definition yet:
"Context engineering is the delicate art and science of filling the context window with just the right information for the next step."
At the end of the day, context engineering is what makes or breaks an agent. It's not just about clever prompts, it’s about shaping the environment so that intelligence has a chance to emerge.
How is Context Engineering different from prompt engineering?
While the terms may sound similar, prompt engineering can be seen as a subset of the broader practice of context engineering.
Prompt Engineering involves crafting and structuring the input (or prompt) given to a language model to achieve a specific response. It focuses on aspects like:
- The wording and clarity of instructions
- Input/output formatting
- Use of examples (e.g., few-shot prompting)
- System and user role definitions
- Desired tone, style, and constraints
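To make that concrete, here’s a minimal sketch of those levers in the common chat-message format: an explicit system role, strict output formatting, and a single few-shot example. The task and wording here are made up purely for illustration.

```python
messages = [
    {
        "role": "system",
        "content": (
            "You are a support triage assistant. "
            "Reply in JSON with keys 'category' and 'urgency' (low|medium|high). "
            "Be terse; no explanations."
        ),
    },
    # A few-shot example steering tone and output format:
    {"role": "user", "content": "The app crashes every time I upload a photo."},
    {"role": "assistant", "content": '{"category": "bug", "urgency": "high"}'},
    # The actual input for this turn:
    {"role": "user", "content": "How do I change my billing address?"},
]
```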
However, Context Engineering goes beyond prompt construction. It refers to the holistic design of the environment and information surrounding the model’s interaction. This includes:
- Retrieval-Augmented Generation (RAG)
- Long-term memory management
- Tool selection and orchestration
- And, of course, prompt engineering itself
In short, prompt engineering is just one component within the larger framework of context engineering.

A good example of Context Engineering
Some of the most interesting examples of context engineering are the ones that don’t rely on the usual suspects like RAG or prompt engineering. Instead, let’s talk about something more fundamental: compression, or what to do when we hit the context limit in a long, multi-turn conversation.
Imagine an agent and a human have been going back and forth for a while. Eventually, you’ll run out of space. So what can you do?
One option is to truncate: drop the oldest messages and keep only the most recent ones. That works, but you risk losing long-term context that might still be relevant to the task.
Another approach is to offload that history into an embedding store and semantically retrieve important bits later. That gives your agent a kind of working memory. But again, you’re limited to whatever the retrieval surface captures; you might miss useful context.
A third option: summarize. Condense the conversation history and carry on with a smaller, compressed version of what happened so far.
That’s exactly what Claude Code does. Anthropic designed the agent with a context compression mechanism called auto-compact: when the context window hits about 95% of its configured limit, a secondary LLM kicks in to summarize the history on the fly. It’s a smart, simple, production-ready example of context engineering in action.
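To illustrate the pattern (this is a sketch of the general idea, not Anthropic’s actual implementation), a compaction step can be as simple as: watch the token count, and once it crosses a threshold, summarize everything but the most recent turns. The `call_llm` helper, the tokenizer shortcut, and all the numbers below are placeholders.

```python
CONTEXT_LIMIT = 200_000  # assumed window size, in tokens
COMPACT_AT = 0.95        # compact at ~95% of the limit
KEEP_RECENT = 10         # recent messages kept verbatim

def count_tokens(messages: list[dict]) -> int:
    # Placeholder: a real system would use the model's tokenizer.
    return sum(len(m["content"].split()) for m in messages)

def call_llm(prompt: str) -> str:
    # Placeholder for whatever LLM client you use.
    raise NotImplementedError

def maybe_compact(messages: list[dict]) -> list[dict]:
    if count_tokens(messages) < COMPACT_AT * CONTEXT_LIMIT:
        return messages  # still comfortably under the limit
    old, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary = call_llm(
        "Summarize this conversation, preserving decisions, open questions, "
        "and any facts the assistant will need later:\n\n" + transcript
    )
    return [{"role": "system", "content": f"Summary of earlier turns: {summary}"}, *recent]
```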
Core patterns of Context Engineering
Now that we’ve walked through a concrete example, it’s easier to recognize where we’ve been doing context engineering all along, even if we didn’t call it that. And once you start looking through that lens, some clear patterns emerge.
The team at LangChain actually outlined a helpful framework for this, breaking context engineering into four key categories. They all work together, and each plays a specific role:
Write
This is all about memory, scratchpads, and state. Places where your agent can store information outside of the immediate context window. Think of it like long-term memory: not everything has to stay in the prompt, just be available when needed.
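As a toy illustration of the write pattern (the class and storage choice are mine, not a standard API), here’s a file-backed scratchpad an agent could persist notes to between turns:

```python
import json
from pathlib import Path

class Scratchpad:
    """A tiny file-backed store the agent can write to between turns."""

    def __init__(self, path: str = "agent_scratchpad.json"):
        self.path = Path(path)
        self.notes = json.loads(self.path.read_text()) if self.path.exists() else {}

    def write(self, key: str, value: str) -> None:
        # Persist a fact outside the context window...
        self.notes[key] = value
        self.path.write_text(json.dumps(self.notes, indent=2))

    def read(self, key: str) -> str | None:
        # ...and pull it back in only when a later step needs it.
        return self.notes.get(key)
```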
Select
Selection is the art of choosing what actually goes into the context. This often involves some form of RAG. For example, say your agent has access to a huge toolkit. Rather than dumping everything in, you can use semantic filtering to narrow down the relevant tools based on the task at hand. This reduces noise and keeps the model focused.
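Here’s a rough sketch of that idea: embed each tool description, embed the task, and keep only the top-k matches by cosine similarity. The `embed` function is a stand-in for whatever embedding model you use.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: call your embedding model of choice here.
    raise NotImplementedError

TOOLS = {
    "search_flights": "Find flights between two airports on given dates.",
    "get_weather": "Current weather and forecast for a city.",
    "convert_currency": "Convert an amount between two currencies.",
    # ...imagine dozens more
}

def select_tools(task: str, k: int = 3) -> list[str]:
    task_vec = embed(task)
    def score(description: str) -> float:
        v = embed(description)
        return float(np.dot(task_vec, v) / (np.linalg.norm(task_vec) * np.linalg.norm(v)))
    # Keep only the k tools whose descriptions best match the task.
    return sorted(TOOLS, key=lambda name: score(TOOLS[name]), reverse=True)[:k]
```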
Compress
We covered this in the earlier example: compression is all about shrinking the context while preserving meaning. Summarization, trimming, and abstraction all live here. The goal is to maintain continuity without exceeding token limits.
Isolate
Isolation is about scoping and separation of concerns. It’s especially relevant in multi-agent setups, where each agent specializes in a specific domain. By isolating context, you keep things clean, focused, and easier to reason about.
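A small sketch of what isolation can look like in practice (the structure and field names here are illustrative): each sub-agent’s context is built from only the slice of shared state it needs.

```python
from dataclasses import dataclass, field

@dataclass
class SharedState:
    research_notes: list[str] = field(default_factory=list)
    draft: str = ""
    review_comments: list[str] = field(default_factory=list)

def researcher_context(state: SharedState) -> str:
    # The researcher never sees the draft or the review chatter.
    return "Known facts so far:\n" + "\n".join(state.research_notes)

def reviewer_context(state: SharedState) -> str:
    # The reviewer sees the draft, not the raw research trail.
    return f"Please review this draft:\n{state.draft}"
```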
Having these patterns laid out makes it easier to talk about context engineering in a shared language, and more importantly, to apply it to real problems.
Personally, I’m especially interested in how these techniques can help build resilient agents, ones that can recover from errors or gracefully handle missing information (like in corrective RAG), all without polluting the context with failed attempts. There’s a ton of room here for creativity and clever design.
When should I invest in Context Engineering?
Like any architecture decision, it’s a big old “it depends”. For single-turn chatbots, prompt templates suffice. But the moment you need the following, context engineering repays the effort:
- Multi-step plans (≥10 turns)
- Tool selection among dozens of APIs
- Personalised memory across sessions
- Cost/latency control at scale
The tricky part is measurement: token counts and latency are easy, but holistic evals of task success remain fuzzy, often a “vibe-check,” to borrow Karpathy’s other meme.
Keep in mind that context issues aren’t always obvious. Sometimes it’s not about size; it’s about coherence. You might run into context clashes, where conflicting bits of information confuse the model, especially in long-running processes where the state evolves over time. What was true at step one might no longer apply at step 100. Or maybe your context has been subtly poisoned: a small hallucination sneaks in early, gets baked into the prompt, and quietly derails everything downstream. These failures don’t always show up in logs or metrics; they just feel off. And that’s exactly why context engineering matters.
A practical checklist
I’ll wrap up with a practical checklist for context engineering, a little reminder for when you’re building agents that might need it:
- Define the contract – What must be in context each turn?
- Automate selection – Use embeddings or rules to pull only what’s relevant.
- Set compaction thresholds – Summarise aggressively before you hit limits.
- Log everything – Store raw, selected, and compressed context variants for analysis.
- Version prompts & memories – Treat them like code; review and roll back.
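To tie a few of those items together, here’s a minimal sketch (all names and thresholds are illustrative, not a prescribed implementation): an explicit per-turn context contract, a compaction threshold, and append-only logging of whatever was actually assembled.

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class TurnContext:
    system_prompt: str          # always present
    task: str                   # the current user goal
    selected_tools: list[str]   # pruned tool list for this turn
    memory_snippets: list[str]  # retrieved long-term memory
    history: list[dict]         # possibly compacted transcript

MAX_TOKENS = 128_000
COMPACT_AT = 0.9  # e.g., summarize once tokens exceed COMPACT_AT * MAX_TOKENS

def log_context(ctx: TurnContext, path: str = "context_log.jsonl") -> None:
    # Append every assembled context so failures can be replayed and diffed later.
    with open(path, "a") as f:
        f.write(json.dumps({"ts": time.time(), **asdict(ctx)}) + "\n")
```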
Conclusion
Some people might be tired of all the new terminology, but personally, I love it. Naming things helps crystallize ideas. It gives us a shared language and pushes the whole ecosystem forward. It’s how innovation accelerates.
It’s a great time to be a developer, and an even better time to be building agents.
None of these concepts are entirely new, but what’s exciting is that we now have a growing domain to anchor them in. We’re not just hacking prompts anymore; the space has matured, and we’re designing intelligent systems now.
Curious about agentic systems or pushing LLMs further? Let’s connect. At Osedea, we help turn ambitious ideas into production-ready software.

