1 | Understanding the evolution: from static prompts to dynamic contexts
1.1 What is prompt engineering?
Prompt engineering is the craft of shaping a single text input (the "prompt") so that a large language model (LLM) responds as well as possible. Since the arrival of GPT-3.5, developers have spent countless hours refining wording, optimising the order of prompt elements, and polishing system instructions.
Classic prompt engineering techniques:
- Few-shot learning: provide examples in the prompt
- Chain-of-thought: trigger step-by-step reasoning
- Role playing: assign the model a specific role
- Structured output: define format requirements for the response
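The techniques above can be combined in a single static prompt. The sketch below assembles role playing, few-shot examples, and a structured-output requirement into one string; the triage scenario and example tickets are invented for illustration.

```python
# A minimal sketch of classic prompt engineering: role playing,
# few-shot examples, and a structured-output requirement, all
# packed into one static prompt string.

FEW_SHOT_EXAMPLES = [
    {"ticket": "App crashes on login", "category": "bug"},
    {"ticket": "Please add dark mode", "category": "feature-request"},
]

def build_classic_prompt(ticket: str) -> str:
    """Assemble a single static prompt from role, format rules, and examples."""
    lines = [
        "You are an experienced support triage assistant.",                   # role playing
        'Classify the ticket. Answer only with JSON: {"category": ...}.',     # structured output
        "",
        "Examples:",                                                          # few-shot learning
    ]
    for ex in FEW_SHOT_EXAMPLES:
        lines.append(f'Ticket: {ex["ticket"]} -> {{"category": "{ex["category"]}"}}')
    lines += ["", f"Ticket: {ticket} ->"]
    return "\n".join(lines)

prompt = build_classic_prompt("Checkout button does nothing")
print(prompt)
```

Everything the model needs sits in this one string, which is exactly the fragility discussed in the next section: any change to the task means editing the prompt itself.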
1.2 The successes and the limits
Prompt engineering has carried us a long way — it enabled the first productive applications of LLMs. But as task complexity rises — especially when bringing in extensive external data, for example from company databases — we hit fundamental limits.
2 | The limits of pure prompt engineering
2.1 Static and fragile
A carefully optimised prompt often only works under very specific conditions. Even small changes — a different user with a different writing style, new data formats, or a model update — can degrade performance dramatically. What worked yesterday fails today, or no longer delivers optimal results.
2.2 Context window limits
Modern LLMs do offer large context windows (Claude: 200k tokens, GPT-4 Turbo: 128k tokens, Gemini 1.5: 2M tokens), but in practice the requirements explode quickly, especially when using coding tools like Cursor, Windsurf, Claude Code, and friends:
- Longer conversation histories
- Multiple tool calls with responses
- Extensive document analyses
- Complex code repositories
2.3 Missing tools and stale information
Many real-world tasks need:
- External knowledge sources: company databases, APIs
- Current information: web search, news feeds
- Actions: send emails, create tickets, deploy code
A static prompt often can't capture this dynamism.
2.4 Scalability and maintainability
A cleverly worded prompt may work for one specific task, but:
- How does it adapt to different user groups?
- How does it integrate new features without breaking changes?
- How does it stay stable across model updates?
2.5 Observability and debugging
Without systematic traceability, troubleshooting becomes guesswork — as so often in IT:
- Why did the model make that decision?
- Which information was missing?
- At which step in the process did something go wrong?
3 | The paradigm shift: context engineering
3.1 The new definition
According to a concise definition by LangChain (see sources), context engineering means:
"Building dynamic systems that give the LLM exactly the right information and tools, in the right form at the right time, so that it can solve the task reliably and efficiently."
It's no longer just about the input, but about the entire information ecosystem in which the model operates.
3.2 Why context > prompt
Practitioner analyses suggest that when AI agents fail, the large majority of cases trace back to wrong, incomplete, contradictory, or outdated context, not to an inadequate model.
A practical example: a support agent has to answer a technical customer query. With pure prompt engineering, we'd try to cover every possible scenario in the prompt. With context engineering:
- The agent analyses the prompt
- Checks the conversation history ("short-term memory")
- Dynamically loads relevant documentation (e.g. via RAG before the LLM call)
- Checks the customer history (e.g. via an MCP client, "long-term memory")
- Consults the current system status, e.g. via "tool use" or "function calling"
- Picks suitable response templates
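The steps above can be sketched as a context-assembly function. All data sources are stubbed here with invented content; in a real system they would call a vector store (RAG), an MCP client, and monitoring APIs.

```python
# Hypothetical sketch of context engineering for the support agent above.
# Each helper stands in for a real subsystem.

def retrieve_docs(query: str) -> list[str]:
    # stand-in for a RAG retrieval step before the LLM call
    return ["Doc: resetting API keys requires the admin role."]

def load_customer_history(customer_id: str) -> list[str]:
    # stand-in for long-term memory, e.g. via an MCP client
    return ["2024-11-02: customer reported login issues."]

def current_system_status() -> str:
    # stand-in for a tool call / function call
    return "All services operational."

def assemble_context(query: str, customer_id: str, chat_history: list[str]) -> str:
    """Combine short-term memory, retrieved docs, customer history, and live status."""
    parts = (
        ["## Conversation so far"] + chat_history
        + ["## Relevant documentation"] + retrieve_docs(query)
        + ["## Customer history"] + load_customer_history(customer_id)
        + ["## System status", current_system_status()]
        + ["## Current question", query]
    )
    return "\n".join(parts)

context = assemble_context(
    "Why can't I reset my API key?",
    "cust-42",
    chat_history=["User: Hi, I have a problem with API keys."],
)
print(context)
```

The prompt text itself is now just one ingredient; the surrounding system decides, per request, what else the model gets to see.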
Prompt engineering is just one aspect of context engineering.
3.3 The four core strategies of context engineering
LangChain (see references below) identifies four fundamental patterns for effective context management:
| Strategy | Purpose | Practical example | Implementation |
|---|---|---|---|
| Write | Persist context outside the token window | Scratchpad for intermediate results in complex calculations | LangGraph Memory, Redis, PostgreSQL |
| Select | Load only relevant information dynamically | Top-3 relevant code snippets via embedding search | Databases for fast, relevant retrieval (e.g. Qdrant, Pinecone, Weaviate), RAG pipelines |
| Compress | Token efficiency through intelligent summarisation | Automatic conversation summaries at 80% token usage (of the context window) | LLM-based summarisation, extractive compression |
| Isolate | Split complex tasks into specialised sub-agents | Research agent → analysis agent → writing agent | Multi-agent orchestration, LangGraph |
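To make the "Compress" row concrete, here is a minimal sketch of summarising older messages once estimated token usage crosses 80% of the window. The token estimate, the tiny window size, and the summariser are all stand-ins; a real system would use the model's tokenizer and an LLM-based summarisation call.

```python
# "Compress" strategy sketch: when the estimated token count of the
# conversation exceeds 80% of the context window, replace older
# messages with a summary and keep only the most recent ones verbatim.

CONTEXT_WINDOW = 100          # tiny window, for demonstration only
COMPRESS_THRESHOLD = 0.8

def estimate_tokens(text: str) -> int:
    # crude heuristic: roughly one token per word
    return len(text.split())

def summarise(messages: list[str]) -> str:
    # stand-in for an LLM-based summarisation call
    return f"[Summary of {len(messages)} earlier messages]"

def maybe_compress(messages: list[str], keep_last: int = 2) -> list[str]:
    total = sum(estimate_tokens(m) for m in messages)
    if total <= CONTEXT_WINDOW * COMPRESS_THRESHOLD:
        return messages
    head, tail = messages[:-keep_last], messages[-keep_last:]
    return [summarise(head)] + tail

history = [f"Message {i}: " + "word " * 10 for i in range(12)]
compressed = maybe_compress(history)
print(len(history), "->", len(compressed))
```

The same trigger logic applies unchanged with realistic window sizes; only the constants and the summariser change.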
4 | Best practices for modern context engineering
4.1 Recommendations
A few relatively easy ways to fill an LLM's context window well:
- Separate system and role instructions clearly (e.g. via a YAML block).
- Retrieval-augmented generation (RAG) for current or proprietary data — for example documentation.
- Describe tool calls declaratively (e.g. via JSON schemas), so the model uses them reliably.
- Introduce memory layers — short-term (thread) vs. long-term (user profile).
- Keep the context free of contradictions. For example, when including documentation for a software library, don't mix versions so that the LLM sees docs for v4 and v5 at the same time.
- Use telemetry and evals (e.g. LangSmith) to make token costs and error rates visible.
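The point about declarative tool descriptions can be sketched as follows. The tool name and parameters are invented for illustration; the shape follows the common function-calling convention of a name, a description, and JSON-schema parameters, with a minimal hand-rolled check in place of a full schema validator.

```python
# Declarative tool description via a JSON schema, plus a minimal
# validation of a model-generated call against that schema.
import json

create_ticket_tool = {
    "name": "create_ticket",
    "description": "Create a support ticket in the ticket system.",
    "parameters": {
        "type": "object",
        "properties": {
            "title": {"type": "string", "description": "Short ticket title"},
            "priority": {"type": "string", "enum": ["low", "medium", "high"]},
        },
        "required": ["title", "priority"],
    },
}

def validate_call(tool: dict, arguments: dict) -> bool:
    """Minimal check that a model-generated call matches the schema."""
    schema = tool["parameters"]
    if not all(k in arguments for k in schema["required"]):
        return False
    props = schema["properties"]
    for key, value in arguments.items():
        if key not in props:
            return False
        if "enum" in props[key] and value not in props[key]["enum"]:
            return False
    return True

call = {"title": "Login broken", "priority": "high"}
print(json.dumps(create_ticket_tool, indent=2))
print("valid:", validate_call(create_ticket_tool, call))
```

Because the schema is data, the same description can be handed to the model, logged for telemetry, and reused for validating what the model produces.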
A few examples of metrics that can be useful:
- Context utilisation rate: how much of the provided context is actually used?
- Retrieval precision: was the loaded information relevant?
- Token efficiency: ratio of information to token consumption
- Context switching overhead: time taken for context updates
- Error attribution: which part of the context led to errors?
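Two of these metrics can be computed directly from logged run data. The log format below is invented for illustration; real traces (e.g. from LangSmith) would carry the same information in their own shape.

```python
# Computing context utilisation and token efficiency from (invented)
# logged run data: which retrieved chunks were actually used, and how
# many answer tokens each context token bought.

runs = [
    {"retrieved": ["doc1", "doc2", "doc3"], "used": ["doc1"],
     "answer_tokens": 50, "context_tokens": 1000},
    {"retrieved": ["doc4", "doc5"], "used": ["doc4", "doc5"],
     "answer_tokens": 80, "context_tokens": 500},
]

def context_utilisation(run: dict) -> float:
    """Share of the retrieved chunks the model actually used."""
    return len(set(run["used"]) & set(run["retrieved"])) / len(run["retrieved"])

def token_efficiency(run: dict) -> float:
    """Answer tokens produced per context token supplied."""
    return run["answer_tokens"] / run["context_tokens"]

for run in runs:
    print(round(context_utilisation(run), 2), round(token_efficiency(run), 3))
```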
5 | A practical guide: getting started with context engineering
A few starting points for a structured approach:
As preparation, identify the context you actually need:
- Which information does your system really need?
- Which data sources are available?
- How current does the data have to be?
What are the use cases in detail?
- Identify the top five use cases
- Document the context required per use case
- Recognise overlaps and patterns
Steps for implementation:
- Collect relevant data ("collect")
- Prepare data for the LLM ("transform")
- Check for completeness, correctness, and consistency ("validate")
- Decide how to deliver the data, for example via RAG or via an MCP server ("deliver")
- Assemble the final prompt / context ("assemble")
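The implementation steps above can be sketched as a small pipeline of plain functions. The sources, records, and validation rules are invented for illustration; a real pipeline would pull from databases or APIs.

```python
# collect -> transform -> validate -> assemble as a chain of functions.

def collect() -> list[dict]:
    # gather raw records from (stubbed) sources
    return [
        {"source": "wiki", "text": "Deployments run every Friday.", "updated": "2025-06-01"},
        {"source": "faq", "text": "", "updated": "2025-05-20"},  # empty: should be dropped
    ]

def transform(records: list[dict]) -> list[dict]:
    # normalise the text for the LLM
    return [{**r, "text": r["text"].strip()} for r in records]

def validate(records: list[dict]) -> list[dict]:
    # keep only complete, non-empty records
    return [r for r in records if r["text"] and r["updated"]]

def assemble(records: list[dict], question: str) -> str:
    # build the final context block for the model
    facts = "\n".join(f"- ({r['source']}) {r['text']}" for r in records)
    return f"Known facts:\n{facts}\n\nQuestion: {question}"

context = assemble(validate(transform(collect())), "When do deployments run?")
print(context)
```

Keeping the steps separate makes the iterative improvement below easier: each stage can be measured and swapped out on its own.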
Iterative improvement:
- Start with static context
- Add dynamic elements step by step
- Measure the impact of each change where possible
6 | Looking ahead: where is this heading?
Current trends include, for example:
- Adaptive context windows: models that adjust their context window dynamically
- Multi-modal context: integration of text, images, audio, video
- Federated context: distributed context sources — while preserving data protection
- Self-organising context: AI systems that optimise their own context
These come with several challenges:
- Context coherence: consistency across multiple context sources
- Privacy-aware context: GDPR-compliant context processing
- Real-time context updates: millisecond-fast context updates
- Cross-model context: context sharing between different AI models
- and more
7 | Conclusion: the way forward
Prompt engineering was the first step — it taught us how to communicate with AI models. Context engineering is the next evolutionary leap: it's about building intelligent information ecosystems in which AI can reach its full potential.
The future doesn't belong to the cleverest prompt, but to the smartest context system. Companies that understand and apply this shift will have a decisive competitive advantage.
The core message: stop thinking in individual prompts and start thinking in dynamic context systems. Your AI is only as good as the context you give it.
Note: we'll soon publish another article that implements context engineering in a .NET solution with MCP servers.
References and further reading
Primary sources
- LangChain Blog: "The Rise of Context Engineering" (23 June 2025)
- LangChain Blog: "Context Engineering for Agents" (2 July 2025)
Additional reading
- Anthropic Research: "Claude's Constitution" (2024)
- Mohammed Al Salboukh: "Strategies and Techniques for Managing the Size of the Context Window When Using LLM" (2024)
Have you already worked with context engineering? Which challenges do you see in your use case? Let us know in the comments.