Document Engineering: How Memory and Knowledge Enter the Agent's Context
This is the third article in the series "Practices and Reflections on Harness Engineering in Enterprise Applications."
The Context Window Is Not Project Memory
Every time an AI agent works, it starts from the current context window. What it can see shapes how it understands the task. What it cannot see, it has to guess, rediscover, or continue on top of a wrong assumption.
But a real project is not a single session. An enterprise application may have long-term goals, historical decisions, completed and unfinished stories, technical conventions, business boundaries, review conclusions, lessons learned from past failures, and background that was only explained in a previous conversation.
Once a session restarts, those things do not automatically remain in the agent's context. Even if they appeared in a previous conversation, that does not mean the next agent can use them. This creates a very practical problem: each time work starts again, the agent may ask questions that were already answered, explore paths that were already explored, or repeat mistakes that were already corrected.
Of course, this problem appears to be getting easier. Models are becoming stronger, context windows are getting larger, and many tools now provide compact, summarize, and resume capabilities. For a personal project, a small demo, or a short-lived feature experiment, it is increasingly possible to keep most of the relevant context in one window and let the agent work from beginning to end.
But in enterprise applications, that is still not enough.
An enterprise application is not a short-term demo. Its lifecycle is usually measured in years, not in one session or one sprint. It carries years of business accumulation, historical decisions, architectural evolution, permission boundaries, exception rules, team conventions, review conclusions, and production issues. Many decisions have long chains of reasoning behind them: why another option was not chosen, why a strange-looking implementation cannot be changed casually, or why a business rule only applies in a specific scenario.
These things cannot truly be held by a single context window. Even as windows grow larger, that only eases the question of "how much can fit." It does not solve the questions of what should appear, when it should appear, how it should be understood, and how it should be shared by a team.
Even in the extreme case of an infinite context window, putting years of documents, all historical conversations, every review, and every coding convention into it would not keep the agent focused. Putting a library into context does not mean the agent can find the right page on the right shelf at the right moment.
More importantly, an enterprise application is not the private workspace of one agent. It often involves collaboration across people, and sometimes across teams. The window of one session cannot become the shared project memory of a team, nor can it carry long-term governance, handoff, and retrospection.
So what a harness needs to solve is not simply giving the agent a larger window, nor stuffing everything into context. The real question is how to preserve and organize the memory and knowledge accumulated through long-term project evolution, across sessions, tasks, and teams, and let them enter the agent's current context at the right time.
What Is Memory, and What Is Knowledge
Before discussing how to manage them, it helps to distinguish two concepts that are often mixed together: memory and knowledge.
Memory is closer to "what happened." It is usually tied to time, process, and a specific task, and it changes more frequently. When a story started analysis, why it was paused, what a review found, why an implementation was sent back, why an approach was abandoned, and how a production issue was later handled are all forms of memory.
These records may not immediately become rules, but they preserve context and causality. Without memory, the agent can only see the surface of the current task. It cannot see how the task arrived here.
Knowledge is closer to "what should be done next time." It is usually more stable than memory. It is extracted from repeated experience into rules, conventions, methods, and judgments: how a workflow should move, how a certain type of page should be migrated, how a component should be used, what conditions a state transition requires, or how a certain technology stack should be tested.
Memory and knowledge are not completely separate. Memory, once organized, summarized, and validated, can settle into knowledge. Knowledge, once stabilized, also becomes part of the long-term experience of a project.
There is another layer that is easy to miss: knowledge itself does not need to stay in the agent's context all the time, but the agent needs to know that it exists, where it is, and when it should be read. In other words, the content of knowledge can live outside the current context, but the path and timing for reaching that knowledge must also be preserved.
In an AI harness, memory and knowledge are therefore a continuous body of project experience. Memory preserves process; knowledge extracts patterns. The former tells the agent "why things became this way," while the latter tells the agent "what should be done next time."
How Humans Manage Memory and Knowledge
Humans have faced similar problems for a long time.
The capacity of a human brain is limited, and working memory is especially limited. Long-term memory can preserve a great deal of experience, but it is compressed, forgotten, reconstructed, and blurred over time. Even an experienced person cannot remember every detail of a project, an organization, or a field.
More importantly, knowledge does not serve only one person. It needs to move across people, teams, and time. A project needs to pass its experience to later maintainers. A team's conventions need to be understood by newcomers. Knowledge in a field can even be inherited across centuries. None of this can depend only on one person's brain.
So humans have always externalized memory and knowledge.
Content closer to memory becomes diaries, notes, meeting minutes, work logs, issue records, and retrospectives. These preserve what happened, what people thought at the time, why a decision was made, and what happened afterward.
Content closer to knowledge becomes books, manuals, standards, tutorials, archives, and operating guides. These do not merely record one specific experience; they organize experience into a form that others can learn, cite, and reuse.
These externalized materials do not all remain in a person's current attention. Nobody needs to remember every page of every book in a library. The effective approach is to put memory and knowledge into the right carriers (manuals, archives) and the right places (shelves, libraries), and then use tables of contents, indexes, tags, and classification systems to find the right content when needed.
This is the lesson Harness Engineering can borrow: memory and knowledge need to be preserved, but they do not need to enter current attention all at once. They should be stored in the right places and reached on demand.
Using Documents to Carry Memory and Knowledge
In the practice of Harness Engineering, managing memory and knowledge naturally borrows from the human approach: do not keep everything in current attention, but place it into stable carriers, then use location, structure, and reachability rules to let it enter the agent's context when needed.
That carrier is the document.
Here, a document is not only a traditional explanatory manual. It may be a workflow guide, story file, conversation record, review report, spec, rule, handbook, schema, configuration file, status record, or some form of structured data. If it carries project memory or project knowledge and affects future engineering judgment, it belongs to document engineering.
But documents themselves need to be engineered. Without engineering, documents quickly become another kind of noise: something was written, but the agent does not know it exists; the agent knows it exists, but cannot find it; it finds it, but cannot tell whether it is current; it reads it, but does not know when to use it.
A simple diary can illustrate what a well-engineered document looks like.
If someone wants to know the weather on their sixteenth birthday, they may not remember the weather that day, but they know the answer can be found in their diary. That requires several things: the diary is in a known place, such as a bookshelf; the diary records entries in a stable format, such as date and weather at the beginning of each entry; and the person knows the lookup path: find the diary, find the entry for the sixteenth birthday, then read the weather.
This example works because it answers three core questions of document engineering.
The first question is: where should the document live?
Critical project memory and knowledge should not exist only outside the repository. External documents can certainly exist, but if key knowledge lives only in an external system, the agent's access depends on permissions, connectors, search entry points, and tool availability. More importantly, the agent may not know when or where to look.
They also should not exist only inside a tool-provided agent memory. An enterprise application is not the private workspace of one agent. Project memory and knowledge need to be shared, reviewed, versioned, migrated, and handed off by a team. They cannot belong only to one session, one tool, or one private memory system.
More generally, critical project memory and knowledge should not live in places visible only to one person, one device, or one private session. Nor should they live in areas that are not versioned, synchronized, or shared by default. Ignored local draft folders, temporary files on one machine, and a tool's private memory store are not suitable as the only carriers of long-term project memory and knowledge.
Location is not just about "putting documents into the project." Different types of memory and knowledge should live in different places. Task memory should stay close to plans and stories. Long-term knowledge should live in a stable documentation area. Tool rules should live where the corresponding tool can load them. Temporary analysis artifacts should live where the current task can trace them. Location itself is an index: it tells humans and agents where to look when they enter a certain stage.
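The idea that location itself is an index can be sketched as a small lookup table. This is only an illustration: the directory layout, the document kinds, and the `locate` helper are all hypothetical, and a real project would choose its own conventions.

```python
from pathlib import PurePosixPath

# Hypothetical repository layout. Every path template here is an assumption
# made for illustration, not a layout prescribed by the article.
LOCATIONS = {
    "task_memory": "plans/{plan}/stories/{story}.md",
    "long_term_knowledge": "docs/knowledge/{topic}.md",
    "tool_rules": "tools/{tool}/rules.md",
    "temporary_analysis": "plans/{plan}/analysis/{story}.md",
}


def locate(kind: str, **parts: str) -> PurePosixPath:
    """Resolve where a document of this kind lives in the repository."""
    try:
        template = LOCATIONS[kind]
    except KeyError:
        # A kind without an agreed location has no index entry at all.
        raise ValueError(f"no agreed location for document kind: {kind}") from None
    return PurePosixPath(template.format(**parts))
```

With a table like this, "where do I look at this stage?" becomes a lookup rather than a search, for example `locate("task_memory", plan="billing", story="S-12")`.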
The second question is: what structure should the document use?
Different kinds of memory and knowledge need different carriers. Some content works well in Markdown because it needs explanation and narrative. Some content works better in JSON or YAML because it needs to be read and updated by programs. Some content belongs in tables because it needs comparison and tracking. Some relationships are better expressed as diagrams because they need to show flow or dependency.
Beyond format, documents need schemas. Different purposes need different structural rules: what fields a story should carry, how status history should be recorded, what conclusions a review report should include, how a workflow guide should organize steps, and how a handbook should describe its trigger conditions. Without structure, documents are hard to read reliably and hard for agents to interpret correctly.
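As a rough sketch of what a schema for a story might look like, the following defines a few fields and a validation pass. The field names, the allowed statuses, and the `validate` function are assumptions chosen for illustration; the article does not prescribe a specific schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical status vocabulary; a real project would define its own.
ALLOWED_STATUSES = ("draft", "analysis", "implementation", "review", "done", "paused")


@dataclass
class StatusChange:
    status: str
    date: str    # ISO date string, e.g. "2024-03-01"
    reason: str  # why the story moved to this status


@dataclass
class Story:
    id: str
    title: str
    acceptance_criteria: list[str]
    status_history: list[StatusChange] = field(default_factory=list)


def validate(story: Story) -> list[str]:
    """Return a list of structural problems; an empty list means the story is well-formed."""
    problems = []
    if not story.acceptance_criteria:
        problems.append("missing acceptance criteria")
    if not story.status_history:
        problems.append("empty status history")
    for change in story.status_history:
        if change.status not in ALLOWED_STATUSES:
            problems.append(f"unknown status: {change.status}")
        if not change.reason.strip():
            problems.append(f"status '{change.status}' has no recorded reason")
    return problems
```

Requiring a reason on every status change is one way to make sure the record keeps causality and not just state.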
The third question is: how is the document reached?
A document does not automatically enter the agent's context just because it was written. The agent needs to know that the document exists, when to read it, which layer to read, and how to use what it reads in the current task. Otherwise, even if the document exists, it remains a static file, not usable context.
Document engineering is therefore not about "writing more documents." It organizes project memory and knowledge through the right location, format, structure, and reachability rules so that humans and agents can use them at the right time. Location answers "can it be found?" Structure answers "can it be understood?" Reachability rules answer "can it enter context at the right time?"
The Core Principle of Document Engineering: Progressive Disclosure
The three core questions of document engineering all point to the same principle: progressive disclosure.
Return to the diary example. A person who wants to know the weather on their sixteenth birthday does not begin by reading every diary from cover to cover. They first know that the answer may be recorded in a diary. Then they know the diary is on the bookshelf. Then they open the diary, use the date to locate the right entry, and finally read the weather at the beginning.
The process unfolds layer by layer. First the entry point, then the carrier. First the scope, then the detail.
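The diary lookup can be written out as a minimal sketch. The `BOOKSHELF` data, the `"<date> | <weather>"` header format, and the `weather_on` function are invented for illustration; the point is only that each layer narrows the search before any detail is read.

```python
from __future__ import annotations

from datetime import date

# Hypothetical "bookshelf": a known place holding diaries by year. Each entry
# uses a stable format whose first line is "<ISO date> | <weather>".
BOOKSHELF = {
    "diary-2009": [
        "2009-05-03 | sunny\nWent hiking with friends.",
        "2009-05-04 | rain\nStayed home and read.",
    ],
}


def weather_on(day: date, shelf: dict[str, list[str]]) -> str | None:
    """Follow the lookup path: shelf, then diary, then entry, then one field."""
    diary = shelf.get(f"diary-{day.year}")  # first the carrier
    if diary is None:
        return None
    for entry in diary:  # then the scope: the entry for that date
        header = entry.splitlines()[0]
        entry_date, _, weather = header.partition(" | ")
        if entry_date == day.isoformat():
            return weather.strip()  # finally the detail: a single field
    return None
```

The body of each entry is never read; the stable header is enough to answer the question.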
In an AI harness, progressive disclosure means not exposing all memory and knowledge to the agent at once. Instead, at each stage of the workflow, only the layer needed by the current stage should be exposed, while keeping a path open for deeper tracing when needed.
This has two meanings.
The first is that the agent can find the right document at the right time. When entering a plan, it should see the goal, status, and stories related to that plan. When entering a story, it should see the story itself, dependencies, historical conversations, and necessary specs. When entering implementation, it should see coding conventions, testing methods, and related modules. When entering review, it should see acceptance criteria, implementation reports, and quality-check results.
The second is that the current context window is not overwhelmed by excessive content. Long-term knowledge, historical records, tool rules, and project conventions are all important, but they should not fully enter the current context at every moment. Too much content makes the agent lose focus, and it buries the information that actually matters.
Progressive disclosure is not about hiding knowledge. It is about making knowledge appear at the right time. It lets the agent avoid both amnesia and being crushed by the entire memory and knowledge of the project.
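One way to picture stage-based disclosure is a mapping from workflow stage to the kinds of documents that stage should see. The stage names follow the text above; the document kind keys and the `context_for` function are hypothetical names used only for this sketch.

```python
from __future__ import annotations

# Document kinds per stage, following the stages described above.
# The key names are illustrative assumptions.
STAGE_CONTEXT = {
    "plan": ["goal", "status", "related_stories"],
    "story": ["story", "dependencies", "historical_conversations", "specs"],
    "implementation": ["coding_conventions", "testing_methods", "related_modules"],
    "review": ["acceptance_criteria", "implementation_reports", "quality_check_results"],
}


def context_for(stage: str, documents: dict[str, str]) -> dict[str, str]:
    """Select only the documents the current stage needs; everything else stays out of context."""
    wanted = STAGE_CONTEXT[stage]
    return {kind: documents[kind] for kind in wanted if kind in documents}
```

Documents that are not selected are not deleted or hidden. They stay where they live and remain reachable if the agent needs to trace deeper.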
Practice One: Layering and Distilling Memory
In the practice of Harness Engineering, memory cannot be just an ever-growing conversation. It needs to live at the right layer.
Memory can be divided into at least four layers: cross-project memory, project memory, plan memory, and story memory.
Cross-project memory preserves experience that can be reused across projects. Project memory preserves the long-term background of the current project. Plan memory preserves the goal, progress, and key decisions of a line of work. Story memory preserves the analysis, implementation, review, and QA conclusions of a specific task.
The point of layering memory is to let the agent load memory at the level where it is working. When entering a project, it should not need to read the details of every story. When entering a story, it should not rely only on project-level principles and ignore the history of the current task.
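A minimal sketch of loading memory by working level follows. It assumes, as one possible policy, that outer layers contribute only a summary while the layer being worked on contributes its full record. The `summary` and `full` keys and the `memory_for` function are hypothetical.

```python
from __future__ import annotations

# The four layers named above, ordered from outermost to innermost.
LAYERS = ["cross_project", "project", "plan", "story"]


def memory_for(level: str, store: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Load summaries of the outer layers and the full record of the working layer."""
    chain = LAYERS[: LAYERS.index(level) + 1]
    loaded = []
    for layer in chain:
        record = store.get(layer)
        if record is None:
            continue  # nothing recorded at this layer yet
        text = record["full"] if layer == level else record["summary"]
        loaded.append((layer, text))
    return loaded
```

Under this policy, entering a project never pulls in story details, and entering a story always includes that story's own history.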
Memory also needs to be written down promptly. Whenever an important decision, scope change, quality issue, test finding, review conclusion, pause/resume event, or session handoff occurs, anything worth reusing in the future should be recorded in the document at the right layer. Anything that can save a future explanation, prevent a repeated mistake, or avoid re-exploration is worth recording.
But memory cannot grow forever. In a completed plan, many implementation details were important at the time, but over the long term they become context burden. After completion, plan memory needs to be compressed and archived: key milestones, important decisions, risks, and conclusions are preserved, while raw process details can remain in git history, archive records, or historical logs, and only be read when tracing is needed.
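Compression after completion can be sketched as a simple split. This assumes each memory entry carries a `kind` label, which is an illustrative convention and not something the article specifies; `KEEP_KINDS` and `compress_plan_memory` are hypothetical names.

```python
from __future__ import annotations

# Kinds of entries preserved in the compressed plan memory, per the text above.
KEEP_KINDS = {"milestone", "decision", "risk", "conclusion"}


def compress_plan_memory(entries: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split a completed plan's entries into (kept, archived), preserving order."""
    kept = [e for e in entries if e["kind"] in KEEP_KINDS]
    archived = [e for e in entries if e["kind"] not in KEEP_KINDS]
    return kept, archived
```

Nothing is discarded in this sketch: archived entries go to a separate location and are read only when tracing is needed.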
If a kind of memory is mentioned repeatedly, it should not remain memory forever. Repeated operation steps can become a handbook. Repeated review problems can become a checklist. Problems shared across projects can become a guide. Memory, once organized, summarized, and validated, becomes knowledge.
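Spotting memory that is ready to be distilled can start with something as simple as counting. The sketch below assumes memory entries have already been tagged with a topic, and the threshold of three is an arbitrary illustrative choice. It only flags candidates; organizing, summarizing, and validating them is still separate work.

```python
from __future__ import annotations

from collections import Counter


def promotion_candidates(memory_topics: list[str], threshold: int = 3) -> list[str]:
    """Return topics recorded at least `threshold` times, sorted alphabetically."""
    counts = Counter(memory_topics)
    return sorted(topic for topic, n in counts.items() if n >= threshold)
```

A topic that keeps reappearing in review findings, for example, would surface here as a candidate for a checklist item.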
Memory management, then, is not about preserving everything forever. It is about recording memory at the right layer, reading it at the right time, compressing and archiving it after completion, and distilling it into knowledge when it appears repeatedly.
Practice Two: Layering Knowledge
When memory has been validated repeatedly, it can settle into knowledge. But if knowledge exists only as a long document, it is still hard for an agent to use. The agent should not read all knowledge every time it works, and it cannot magically recall a piece of knowledge if it does not even know it exists.
So in the practice of Harness Engineering, knowledge needs to be layered.
The outermost layer is the description. It usually lives in YAML front matter and describes when this knowledge should be triggered. The description is not the full knowledge; it is the entry point. It lets the agent know that in the current situation, a certain piece of knowledge may need to be considered.
The second layer is the handbook. A handbook turns knowledge into step-by-step action. It does not try to explain all background. It tells the agent what sequence to follow when the knowledge is triggered.
The third layer is the signpost. Steps in a handbook may point to another handbook or a detailed document. When a step requires more background, another workflow, or fuller rules, the agent can follow the signpost deeper.
The deepest layer is the detailed document. It preserves full explanations, background, design rationale, boundary conditions, and exceptions. It should not enter context in full at the beginning of every task, but when the agent truly needs to understand the reason or handle a complex case, it must be findable.
This layering lets knowledge unfold as needed. The main agent can automatically receive descriptions of potentially relevant knowledge through agent hooks at the right time. When a subagent is spawned, it can also receive descriptions related to its responsibility. The agent does not need to read all knowledge at the beginning, but it knows what knowledge exists and when to continue downward.
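Collecting descriptions so they can be handed to an agent might look like the sketch below. It uses a deliberately simplified front matter reader that only understands flat `key: value` lines between `---` delimiters; it is not a YAML parser, and the function names and file paths are hypothetical.

```python
from __future__ import annotations


def read_front_matter(text: str) -> dict[str, str]:
    """Read flat 'key: value' fields from a leading '---' block (simplified, not full YAML)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as having no front matter


def descriptions(handbooks: dict[str, str]) -> dict[str, str]:
    """Map each handbook path to its description, skipping files without one."""
    result = {}
    for path, text in handbooks.items():
        description = read_front_matter(text).get("description")
        if description:
            result[path] = description
    return result
```

Only the descriptions would enter context at this point. The handbook bodies stay on disk until one of them is actually triggered.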
This is progressive disclosure in knowledge management: know that something exists, expand it at the right time, read action steps first, and enter deeper explanations only when needed.
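The "steps first, detail only when needed" pattern can be sketched as follows. The `Step` structure, the `unfold` function, and the idea of passing a set of step numbers to expand are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Step:
    text: str
    signpost: str | None = None  # path of a deeper document, if this step has one


def unfold(steps: list[Step], library: dict[str, str],
           expand: frozenset[int] | set[int] = frozenset()) -> list[str]:
    """Render handbook steps, following a signpost only for step numbers in `expand`."""
    lines = []
    for number, step in enumerate(steps, start=1):
        lines.append(f"{number}. {step.text}")
        if number in expand and step.signpost is not None:
            lines.append(library[step.signpost])  # deeper layer, read on demand
    return lines
```

By default the agent sees only the action steps. The detailed document behind a signpost is read only when a specific step is expanded.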
This structure has another benefit: a stable handbook can be converted into an agent skill more easily. The description can become the trigger condition, the step-by-step content can become the execution flow, and signposts can become on-demand knowledge entry points. In other words, good knowledge layering serves current document reading and leaves a path for future automation and tooling.
Back to Harness: Document Engineering Ultimately Serves Humans
On the surface, document engineering serves the agent: it lets the agent know what memory and knowledge exist, where they are, when to read them, and how they enter the current context.
But that is not the ultimate purpose of document engineering.
What matters most is that document engineering reduces human cognitive load. Humans do not need to remember every project detail, repeatedly explain background, manually recall each workflow step, or keep all conventions, historical decisions, and lessons learned in their heads.
These memories and knowledge are placed into the harness, where the agent reads, organizes, executes, and checks them at the right time. Humans can then focus more on direction, judgment, trade-offs, authorization, and verification.
Document engineering deliberately places the heavy burden of memory and knowledge on the agent. The aim is not for the agent to replace humans. On the contrary, it frees humans from repeated explanation, repeated search, and detailed memory work, so they can focus on deciding where the system should go.
The ultimate purpose of document engineering is to reduce the burden on humans. The real meaning of memory and knowledge entering the agent's context is not that the agent owns everything, but that humans can use the harness to steer the agent more easily and more reliably.