Context Window: How Harness Engineering Manages Agent Attention

This is the eighth article in the series "Practices and Reflections on Harness Engineering in Enterprise Applications."

Why One Window Cannot Run an Entire Project

Model context windows are getting larger, and tools are starting to provide compacting capabilities. It can look as if all we need is a long enough window: put the requirements, code, documents, historical discussions, and execution process into it, and let the agent work from beginning to end.

Behind this idea is an implicit assumption: once something enters context, the agent will remember it, understand it, and use it correctly whenever it becomes relevant later.

That assumption is not unreasonable. Many people first encountered generative AI through conversational products such as ChatGPT. These products deliberately hide the complexity of the context window, allowing people to use AI naturally through conversation. For question answering, writing, summarization, and lightweight tasks, that is a good experience.

But there is an important difference: with ChatGPT, the chat is the work; with agentic development tools, the chat is only how the work gets done.

For the former, conversation itself is the core experience. A user asks a question, the model answers, and in many cases that conversation is the complete usage loop. Agentic development tools may also use chat as the interface, but chat is only the entry point for work. The agent reads the project, modifies files, calls tools, runs commands, generates reports, and moves the workflow forward. It is not merely completing an answer; it is participating in an engineering process that keeps evolving.

Therefore, usage habits formed around ChatGPT cannot be directly transferred to agentic development. In ordinary conversation, users do not need to understand the context window. In enterprise application development, however, the context window is the agent's attention space for the current work, and it must be managed deliberately.

This is where the problem begins.

A context window is not a database or a hard drive. It is more like the attention space of the agent's current work. The fact that a piece of information appears in the window only means the model may see it. It does not mean the information will be weighted equally, referenced accurately, or kept at the same priority throughout all later steps.

Humans are similar. Hearing something in a meeting does not mean one can recall it correctly three hours later in a complex discussion. Having ten documents spread out on a desk does not mean every detail in every document receives equal attention during a judgment. Information existing and information working at the right moment are two different things.

Agents are no different. The longer the conversation, and the more document fragments and historical records it accumulates, the more easily the agent loses focus on the current task. Recent content, repeated instructions, clearly structured information, and fragments that are visibly related to the current task tend to influence the agent more. Early, implicit, scattered, or unstructured information may still be in the window, but in practice it can be weakened, missed, or misused.

So the real question is not whether the context window can contain everything. It is whether the agent can focus on the right content at the right moment. Harness Engineering manages context windows not because the window is too small, but because agent attention needs to be managed.

A Context Window Cannot Carry Project Memory and Knowledge

Project knowledge and memory cannot exist only inside the context window of a single conversation. From this perspective, the need for document engineering becomes clearer.

A context window is only the agent's workspace for the current session. It is not project memory. It has capacity limits, disappears when the session ends, and may lose detail and nuance after compacting. More importantly, it is not naturally shareable. The content inside one agent's current window does not automatically become stable project knowledge that another agent, another human, or the next session can rely on.

Enterprise applications require memory and knowledge that can cross sessions, agents, humans, and time. Why a decision was made, what constraints an API has, why a process cannot be changed, what state a story is in, what risks a review found: none of this should exist only inside one conversation.

Project memory therefore has to be carried by document engineering. Specs, reports, status, handbooks, project knowledge, and history turn knowledge into engineering assets that can leave one window and become shareable, transferable, traceable, and reusable.

But that does not mean all documents should be dumped into the context window. That would only make the agent lose focus. What document engineering really does is progressive disclosure: first let the agent know what knowledge exists, then expand the right content at the right stage, for the right task.
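Progressive disclosure can be sketched in a few lines. The following is a minimal illustration, not the harness's actual implementation: the `Doc` fields, the document names, and the stage labels are all hypothetical, and filtering by task is left out to keep the sketch short.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    name: str               # hypothetical document identifier
    summary: str            # one line the agent always sees
    stages: frozenset[str]  # stages where the full body is relevant
    body: str               # full text, kept outside the window until needed


def catalog(docs: list[Doc]) -> list[str]:
    """Step 1: tell the agent what knowledge exists, without any bodies."""
    return [f"{d.name}: {d.summary}" for d in docs]


def expand(docs: list[Doc], stage: str) -> dict[str, str]:
    """Step 2: disclose full bodies only for documents tied to this stage."""
    return {d.name: d.body for d in docs if stage in d.stages}


# Illustrative data only; none of these documents come from a real project.
DOCS = [
    Doc("project-handbook", "Team conventions and workflow rules",
        frozenset({"analyzing", "implementing"}), "Full handbook text"),
    Doc("story-spec", "Confirmed spec for the current story",
        frozenset({"implementing"}), "Full spec text"),
    Doc("decision-history", "Why past decisions were made",
        frozenset({"analyzing"}), "Full decision log"),
]

index_for_agent = catalog(DOCS)                 # always cheap to include
implementing_docs = expand(DOCS, "implementing")  # expanded on demand
```

The catalog stays small no matter how much the project documents grow, while `expand` decides which bodies enter the window for a given stage.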

In this way, the context window contains what the current task actually needs, while long-term project memory remains stable outside the window. The agent can stay focused and efficient, and the project keeps its ability to support collaboration, handoff, and traceability across sessions, agents, and humans.

Context Isolation Is the Basis of Agent Division of Labor

If long-term project memory and knowledge cannot all live inside the context window, the next question is: when work begins, can we simply put all relevant knowledge and memory into one window?

The answer is still no.

Enterprise applications contain too many responsibilities, and each responsibility requires its own context. Business rules, architectural constraints, API documents, design files, testing strategies, review rules, release processes, historical decisions, and current status may all be useful information, but they should not all enter the same agent's context at the same time.

If everything is placed together, the knowledge that belongs to different responsibilities starts to interfere. A Dev Agent needs code, architecture, APIs, and implementation constraints. A QA Agent needs acceptance paths, test data, boundary cases, and verification tools. A Code Reviewer Agent needs code quality rules, architectural boundaries, team conventions, and risk patterns. BA and TL Agents need business rules, technical constraints, and analysis material. A PM Agent needs workflow, plans, stories, status, and agent coordination information.

The issue is not which context is larger or smaller. It is which context is relevant to the current responsibility. If an agent receives a large amount of information unrelated to its current responsibility, it does not become more reliable. It becomes easier to distract. It may be pulled by knowledge from other responsibilities, casually do work it should not do, or miss important details in the task it should be focused on.

Agent division of labor is therefore also context division. Responsibility boundaries determine what knowledge, tools, inputs, and outputs an agent should have. They also determine what should not enter its current context.
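One way to picture context division is as an allow-list per responsibility. The sketch below is an assumption-laden illustration: the role keys, category names, and the `split_context` helper are hypothetical and only mirror the examples given above for the Dev, QA, Code Reviewer, and PM Agents.

```python
from typing import NamedTuple


class Item(NamedTuple):
    category: str  # kind of knowledge, e.g. "code" or "test_data"
    content: str   # the material itself


# Hypothetical allow-lists derived from the role descriptions in this article.
ROLE_CONTEXT = {
    "dev": frozenset({"code", "architecture", "apis",
                      "implementation_constraints"}),
    "qa": frozenset({"acceptance_paths", "test_data", "boundary_cases",
                     "verification_tools"}),
    "code_reviewer": frozenset({"code_quality_rules",
                                "architectural_boundaries",
                                "team_conventions", "risk_patterns"}),
    "pm": frozenset({"workflow", "plans", "stories", "status",
                     "agent_coordination"}),
}


def split_context(role, items):
    """Return (admitted, withheld) lists for one role's context window."""
    allowed = ROLE_CONTEXT[role]
    admitted = [i for i in items if i.category in allowed]
    withheld = [i for i in items if i.category not in allowed]
    return admitted, withheld


# Illustrative pool of available material.
POOL = [
    Item("code", "order_service.py"),
    Item("test_data", "orders_fixture.json"),
    Item("status", "story 12: implementing"),
    Item("risk_patterns", "unchecked external input"),
]

dev_admitted, dev_withheld = split_context("dev", POOL)
qa_admitted, qa_withheld = split_context("qa", POOL)
```

The withheld list matters as much as the admitted one: it makes explicit what a given agent should not see in its current window.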

This also explains, from the perspective of context, why agents must stay within their responsibilities. A QA Agent should not casually modify code, not only because doing so is outside its responsibility, but also because its context is not organized for implementation. A Dev Agent should not make business trade-offs on behalf of humans, not because it lacks raw capability, but because its context does not carry the complete business, risk, and responsibility information required for that judgment.

Context isolation is not about building walls. It is about keeping each agent focused within its responsibility. What should be seen enters the current window. What should not be seen stays outside. When needed, handoff happens through documents, reports, status, and human gates.

Workflow Lets Context Enter by Stage

From the responsibility perspective, different agents need different context. From the workflow perspective, the same plan or story should not use the same full context at every stage.

The complete context of a story may include requirements, discussions, designs, code, historical decisions, analysis results, implementation process, review records, test results, and human gate judgments. If all of that is placed into one window, the agent still loses focus.

From the perspective of context, workflow is not only state transition management. It is also stage-based context segmentation. Each stage has its own inputs, processing mode, and outputs. Those inputs and outputs are essentially context selection, compression, and handoff.

In the analyzing stage, agents need more raw material and background knowledge, and the goal is to form the spec. The spec compresses scattered raw context into context that the implementing stage can rely on.

In the implementing stage, agents should not sink back into all the raw material. They should mainly rely on the confirmed spec, relevant code, and implementation constraints. Implementing is more than the Dev Agent writing code. It also includes review by the Code Reviewer Agent and verification by the QA Agent. Development, review, and testing all revolve around the same spec, but each understands it through a different responsibility and a different context.

The Dev Agent sees implementation requirements in the spec. The Code Reviewer Agent sees code review standards in the spec. The QA Agent sees verification paths and acceptance evidence in the spec. Their contexts are different, but none of them should detach from the confirmed spec. Development reports, review reports, and test reports then compress the execution, review, and verification results of the implementing stage into context that the human gate can use.

So specs and reports are not extra documentation burden. They are context handoff artifacts generated by workflow at different stages. The value of workflow is not only that tasks have status. It is that context enters by stage: what the current stage needs gets expanded, and what the next stage needs gets produced.
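Stage-based context entry can be sketched as a table of stage inputs and outputs. This is a simplified illustration under stated assumptions: the stage keys, artifact names, and the `context_for` helper are hypothetical, and treating the human gate as a stage that consumes only the three reports is a simplification made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    inputs: tuple   # artifacts expanded into the window for this stage
    outputs: tuple  # artifacts this stage must produce for the next one


# Hypothetical workflow table based on the stages described above.
WORKFLOW = {
    "analyzing": Stage(
        inputs=("requirements", "discussions", "background_knowledge"),
        outputs=("spec",),
    ),
    "implementing": Stage(
        inputs=("spec", "relevant_code", "implementation_constraints"),
        outputs=("dev_report", "review_report", "test_report"),
    ),
    "human_gate": Stage(
        inputs=("dev_report", "review_report", "test_report"),
        outputs=("gate_decision",),
    ),
}


def context_for(stage_name, artifacts):
    """Select only this stage's inputs; fail loudly if one is missing."""
    stage = WORKFLOW[stage_name]
    missing = [name for name in stage.inputs if name not in artifacts]
    if missing:
        # Refusing to start is safer than letting the agent guess.
        raise KeyError(f"{stage_name} is missing inputs: {missing}")
    return {name: artifacts[name] for name in stage.inputs}


# Illustrative artifact store after the analyzing stage has finished.
ARTIFACTS = {
    "requirements": "raw requirement notes",
    "discussions": "meeting threads",
    "background_knowledge": "domain notes",
    "spec": "confirmed spec",
    "relevant_code": "order_service.py",
    "implementation_constraints": "no schema changes",
}

implementing_context = context_for("implementing", ARTIFACTS)
```

In this sketch the raw requirements and discussions remain in the artifact store but never reach the implementing stage, because the spec already carries their compressed form.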

Humans Must Actively Manage Agent Attention

Isolating context by responsibility and segmenting context by workflow does not mean the context window should be as small as possible. It needs to land in the sweet spot of attention.

Too little, and the agent guesses. Too much, and the agent loses focus. Too scattered, and responsibilities become confused. Too mixed, and stages interfere with one another. Too long, and earlier content fades. Too dependent on compacting, and detail, nuance, and the decision process are lost.

A good context window should be large enough that the agent does not need to guess, small enough that the agent is not dragged away by irrelevant content, and relevant enough that the agent stays focused on the current responsibility and current stage. It should also avoid depending on compacting to sustain long-running work.
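The two most checkable sides of that sweet spot, "enough that the agent does not guess" and "nothing that drags it away", can be expressed as set differences. The helper below is a rough heuristic sketch, not a measure of attention: `assess_context` and the artifact names are hypothetical, and deciding what counts as required or relevant is still a human judgment.

```python
def assess_context(required, provided, relevant):
    """Flag the two basic failure modes of a candidate context.

    required: artifact names the task cannot be done without
    provided: artifact names actually placed in the window
    relevant: artifact names related to the current responsibility and stage
    """
    problems = []
    missing = sorted(required - provided)
    if missing:
        problems.append(f"too little: agent must guess about {missing}")
    noise = sorted(provided - relevant)
    if noise:
        problems.append(f"too much: unrelated to the current work: {noise}")
    return problems


# Illustrative check for a Dev Agent in the implementing stage.
report = assess_context(
    required={"spec", "relevant_code"},
    provided={"relevant_code", "release_process", "old_chat_log"},
    relevant={"spec", "relevant_code", "implementation_constraints"},
)
```

An empty result does not prove the context is good; it only means these two obvious problems are absent.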

This is exactly the problem Context Engineering needs to solve. It does not happen automatically. Humans must actively manage agent attention: record memory worth preserving, avoid treating temporary conversations as project memory, end sessions at appropriate stages instead of extending conversations forever, avoid relying on compacting, prevent agents from crossing responsibility boundaries, and decide which documents, specs, reports, and tool results the current task truly needs.

When an agent starts repeating itself, losing focus, hallucinating, or drifting away from the task, humans should recognize that this is not necessarily only a model capability problem. The context itself may be broken. Continuing to prompt the agent may simply add more confusion on top of a confused context. A better move may be to stop, record the necessary memory, cut off the current session, and reconstruct a clean, focused context.
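The "stop, record, cut off, rebuild" move can be sketched as a small handoff routine. This is an illustrative assumption about how a session might be modeled: the `Session` fields and the `handoff_note` and `restart` helpers are hypothetical and do not describe any specific tool's API.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    task: str
    transcript: list = field(default_factory=list)  # raw turns, not carried over
    decisions: list = field(default_factory=list)   # memory worth preserving
    open_items: list = field(default_factory=list)  # work still to be done


def handoff_note(session):
    """Record only what the next session needs, not the whole conversation."""
    lines = [f"Task: {session.task}", "Decisions:"]
    lines += [f"- {d}" for d in session.decisions]
    lines.append("Open items:")
    lines += [f"- {o}" for o in session.open_items]
    return "\n".join(lines)


def restart(session):
    """Cut off the old session and seed a clean one from the handoff note."""
    note = handoff_note(session)
    return Session(
        task=session.task,
        transcript=[note],  # the new window starts from the note alone
        decisions=list(session.decisions),
        open_items=list(session.open_items),
    )


# Illustrative long-running session that has started to drift.
old = Session(
    task="Implement order cancellation",
    transcript=[f"turn {n}" for n in range(200)],
    decisions=["Cancellation is allowed only before shipping"],
    open_items=["Add refund handling"],
)
fresh = restart(old)
```

In practice the note would also be written to a project document, so that the recorded memory survives outside any single window.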

This is also part of Human in the Loop. Humans do not only judge results. They also manage the attention space in which agents produce those results.

Back to Harness: Managing the Context Window Is Managing Attention

Back to harness: the context window is not a container that is better simply because it is larger, nor is it a workspace that can be maintained indefinitely through compacting. It is the attention space of the agent's current work.

What the harness should do is not place all project knowledge into that space. It should let correct and appropriate content enter the right agent's context window at the right time. Correct means relevant to the current task, responsibility, and stage. Appropriate means neither too much nor too little: enough to support the agent's work, without making it lose focus.

What humans should do is actively manage that space through the harness: what should be recorded, what should be expanded, what should remain outside, when a session should end, when context should be cut off and rebuilt, and when an agent should be stopped from continuing inside the wrong context.

So managing a context window ultimately means managing neither tokens nor file count, but agent attention. A good harness does not let the agent see everything. It lets the agent see the correct and appropriate content for the current stage, current responsibility, and current task.