Spec: Letting AI Truly Understand the Problem Before Implementation

This is the fourth article in the series "Practices and Reflections on Harness Engineering in Enterprise Applications."

Why the Agent Should Not Start Coding Directly

With a workflow in place, an agent can enter the right plan or story. With document engineering in place, the project's memory and knowledge can enter the agent's Context at the right time. At this point, it may seem that everything is ready and the next step should be to let the agent start coding.

In real enterprise applications, doing that usually breaks down quickly.

Finding the right work unit does not mean the problem has been understood. Having the relevant context does not mean the implementation is clear. The agent may still behave much like it does when building a demo: filling in missing information on its own, guessing business rules, designing unconfirmed interactions, ignoring implicit boundaries, skipping exception paths, and rapidly producing a large amount of plausible-looking code in the wrong direction.

That code may run. It may not look obviously wrong. The real problem is that it may not be what the business needs, or a change the system should accept.

This is not a new problem created by AI. Software engineering has long emphasized the same lesson: the shorter the feedback loop, the earlier a problem is found, and the cheaper it is to fix.

If a misunderstanding is found before implementation, fixing it may only require changing a few sentences, adding a few boundary conditions, or confirming an interaction again. If the same misunderstanding is found after the code has been written, tests have been added, review has been completed, or acceptance has already started, the cost is completely different. At that point, what needs to change is no longer just the understanding, but the implementation, tests, documentation, state, and even the rhythm of the whole story.

Harness Engineering is not inventing an entirely new kind of software engineering. It is more often summarizing, absorbing, and trimming practices that have already been validated, then applying them to AI agents. This case is no different: do not wait until implementation is finished to confirm whether the agent understood the problem correctly. Make the agent's understanding explicit before implementation begins.

This pre-implementation confirmation ultimately serves humans. Before an agent starts generating code at scale, people need to see how it understands the current problem: what it thinks should be done, what should not be done, where the boundaries are, where the risks are, and what remains uncertain. Only when this understanding is made explicit can people steer the agent instead of passively accepting its guesses after code has already been generated.

In enterprise applications, the first confirmation of understanding should not happen after implementation. Before the agent enters implementation, there must be a clear analysis phase.

The Output of Analysis: Spec

The purpose of the analysis phase is not to make the agent wait longer, nor to make the process look more complete. It solves a more concrete problem: how to turn a story from something that can be looked at into something that can be built.

A story gives the work an entry point, but it is usually not yet a complete problem definition. It may contain requirements, enhancements, acceptance criteria, and some context, but those still need to be interpreted, decomposed, and completed. What must be done? What is only background? Which boundaries must not be crossed? Which exceptions must be handled? Where are the ambiguities? Which questions require human confirmation? These need to be organized before implementation.

More importantly, this understanding cannot live only inside the current conversation. An agent saying "I understand" is not enough for an enterprise application. If the understanding exists only inside one session, it cannot support collaboration, cannot be traced later, cannot explain why the implementation took a certain shape, and cannot be managed through state transitions.

The analysis phase must therefore externalize understanding into a clear carrier. That carrier is the Spec.

It is important to be explicit about what Spec means here. In this article, a Spec is not "a Markdown file written before development." If a so-called spec merely restates the requirement, or briefly lists what the agent plans to do, but cannot constrain implementation, support review, guide QA, or be reverse-synced during implementation, then it is only a document, not the Spec discussed here.

A spec that cannot constrain implementation, withstand review, guide verification, or support reverse sync may look like a spec, but it is really just a pre-implementation note.

In Harness Engineering, a Spec is the engineered problem definition produced by the analysis phase. It must organize the current story's requirements, context, boundaries, constraints, exceptions, acceptance criteria, implementation impact, and open questions into a shared basis that can drive development, support review, guide verification, and be updated as facts change.

A Spec is therefore not an extra documentation burden. It is the engineering artifact between story and code. Without a Spec, implementation can only depend on the agent's immediate understanding. With a Spec, implementation, review, and QA can continue around the same problem definition.
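To make the shape of such an artifact concrete, here is a minimal sketch of what a Spec structure might hold, mirroring the elements listed above. The class and field names are illustrative assumptions, not part of the article:

```python
from dataclasses import dataclass, field

@dataclass
class Spec:
    """Hypothetical shape of a Spec: the engineered problem definition
    sitting between story and code."""
    story_id: str
    requirements: list[str]          # what must be done
    context: list[str]               # background that explains the direction
    boundaries: list[str]            # what must NOT be done or touched
    constraints: list[str]           # architectural / technical limits
    exceptions: list[str]            # exception paths that must be handled
    acceptance_criteria: list[str]   # what counts as complete and verifiable
    open_questions: list[str] = field(default_factory=list)  # needs human confirmation

    def is_ready_for_review(self) -> bool:
        # A Spec with unresolved questions is not yet a shared basis for work.
        return not self.open_questions and bool(self.acceptance_criteria)
```

The point of the sketch is only that a Spec is structured data for collaboration, not free-form notes: every field exists because some downstream role (Dev, QA, Code Reviewer) will read it.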

Spec Is Where Consensus Forms

Because a Spec is an engineered problem definition, it naturally becomes the place where different people and different agents form consensus around the current problem.

People and Agents Align on the Problem Through Spec

In enterprise applications, human judgment never comes from only one role.

Behind a story, there may be a PO's judgment about business goals, functional scope, and acceptance criteria; a UX designer's judgment about interaction, visual details, and usability; an SA's judgment about architectural boundaries, system flow, and integration; a DevSecOps team's judgment about deployment, permissions, security, networking, and operational constraints; and input from testing, support, historical maintainers, or real business users about exception paths, past issues, and actual usage.

These judgments are usually scattered across prototypes, designs, architecture documents, meeting notes, chat records, historical issues, review records, and sometimes someone's experience. They matter, but they do not naturally enter the agent's current Context, nor do they automatically become executable task boundaries.

The role of Spec is to collect the human judgments relevant to the current story and turn them into a problem definition the agent can understand, execute, and verify.

In other words, Spec does not replace human judgment. It organizes human judgment and unfolds it for the agent. People should not need to squeeze all background into a single prompt, nor wait until the agent has written code to discover that it misunderstood the business, interaction, architecture, or infrastructure constraints. Human judgment should enter the Spec before implementation, so the agent starts work inside the right problem boundaries.

People Complete Spec with the Help of Analytical Agents

Forming a Spec is not a matter of copying various judgments into one file. The difficult part is that these judgments come from different roles, documents, and historical contexts. There may be gaps between them, and sometimes conflicts.

In this practice, the person completing the Spec is not a spectator, but the Human in the Loop: the person responsible for understanding, judgment, and trade-offs with the help of BA Agent and TL Agent.

BA Agent focuses more on business understanding. It reads from PO input, business users, prototypes, historical stories, acceptance criteria, and real usage scenarios to clarify what the current story is actually solving: what the business goal is, what the user path is, which rules must be followed, which exception paths must not be missed, and which expressions remain ambiguous.

TL Agent focuses more on technical understanding. It reads from architecture documents, existing code, system boundaries, dependencies, technical conventions, and DevSecOps constraints to clarify the engineering constraints of the current story: which modules may be affected, which boundaries must not be broken, which interfaces and state transitions must be respected, and which implementation paths carry higher risk.

This process is not just a few rounds of conversation. It involves repeated clarification, document reading, code search, web search, reasoning, discussion, and trade-offs. Agents need to bring findings back continuously, and people need to keep deciding which information is valid, which assumptions should be rejected, and which boundaries must be confirmed again.

This is also one of the most demanding parts of working under the Harness Engineering paradigm. It deals not with code details, but with direction, scope, semantics, and constraints. If the judgment here is wrong, or a key scenario or boundary is missed, faster implementation only carries the error farther.

The hard part is that no agent can truly take this responsibility away from people. BA Agent and TL Agent can help read, organize, question, and reason, but they cannot decide whether the business goal is correct, whether the trade-off is acceptable, or who bears final responsibility. The burden of this stage does not disappear because agents are assisting. It still depends on human experience, judgment, and attention.

People are responsible for judgment and trade-offs in this process. Agents can find issues, prepare material, propose assumptions, and generate candidate approaches. But when business goals conflict with technical constraints, when scope needs to expand or shrink, or when acceptance criteria need to be redefined, the final decision must still be made by people.

Because the goal of this stage is to understand the problem, not to modify the implementation, no agent should modify implementation code during this phase. Agents may read code, search documents, run necessary read-only checks, ask questions, and update the Spec, but they must not enter implementation early. Otherwise, analysis degenerates into another form of "thinking while coding."

So a Spec is not something written independently by one agent. It is the solidification of understanding completed by people with the help of analytical agents. It gathers judgments scattered across the organization, documents, history, and code into a problem definition the current story can rely on.

Spec Still Needs Review After It Is Written

Once a Spec has been written, the agent still should not immediately start coding. The Spec itself may have problems: it may miss scenarios, preserve vague expressions, contain internal conflicts, include unverifiable acceptance criteria, or fail to truly align technical constraints with business goals.

So the Spec also needs review.

Spec Reviewer Agent checks whether the problem definition itself holds. It is not reviewing code quality. It is checking whether the Spec is clear, complete, consistent, implementable, and verifiable: whether ambiguities remain unresolved, key boundaries are missing, requirements contradict each other, acceptance criteria cannot be verified, or existing system boundaries, technical conventions, or required context are being violated.
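Some of these checks are judgment calls only a reviewer can make, but a few can be pre-checked mechanically before the judgment-based review begins. The following is a hedged sketch of such pre-checks; the field names and vague-phrase list are illustrative assumptions, not from the article:

```python
# Hypothetical mechanical pre-checks a Spec Reviewer Agent might run before
# the judgment-based review. Field names are illustrative only.
VAGUE_PHRASES = ("improve the experience", "support permissions", "adapt the styling")

def precheck_spec(spec: dict) -> list[str]:
    findings = []
    if spec.get("open_questions"):
        findings.append("unresolved ambiguities: " + "; ".join(spec["open_questions"]))
    if not spec.get("acceptance_criteria"):
        findings.append("no acceptance criteria: completion cannot be verified")
    if not spec.get("boundaries"):
        findings.append("no boundaries: nothing tells the agent where to stop")
    text = " ".join(spec.get("requirements", [])).lower()
    for phrase in VAGUE_PHRASES:
        if phrase in text:
            findings.append(f"vague wording: '{phrase}' needs concrete behavior")
    return findings
```

A pre-check like this cannot decide whether the problem definition holds; it only surfaces the obvious gaps so the reviewer's attention goes to consistency, implementability, and alignment with business goals.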

Human review is responsible for final judgment. Spec Reviewer Agent can point out risks, raise questions, and suggest changes, but whether the business goal is correct, whether the scope is reasonable, whether the trade-off is acceptable, and whether the acceptance criteria are sufficient to represent completion still require human confirmation.

Only after the Spec has passed review does the current story truly become ready for implementation. In other words, the analyzing -> implementing human gate is essentially the final human confirmation of the analysis phase. It does not confirm that "the agent has written a document." It confirms that "the current problem has been understood well enough to begin implementation." The carrier of that confirmation is the reviewed Spec.
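The analyzing -> implementing gate described above can be modeled as a guarded state transition: the story may only advance once the Spec has passed review and a human has confirmed it. This is a minimal sketch under assumed state names; none of the identifiers come from the article:

```python
# A minimal sketch of the analyzing -> implementing human gate: the story's
# state may only advance once a human has confirmed the reviewed Spec.
# State names and fields are illustrative assumptions.
class Story:
    def __init__(self, story_id: str):
        self.story_id = story_id
        self.state = "analyzing"
        self.spec_reviewed = False
        self.human_confirmed = False

    def enter_implementation(self) -> None:
        if self.state != "analyzing":
            raise RuntimeError(f"cannot start implementing from '{self.state}'")
        if not self.spec_reviewed:
            raise RuntimeError("Spec has not passed review")
        if not self.human_confirmed:
            raise RuntimeError("human gate: understanding not yet confirmed")
        self.state = "implementing"
```

The guard encodes the point made above: what is confirmed is not that a document exists, but that the reviewed Spec carries a confirmed understanding of the problem.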

Dev, QA, and Code Reviewer Use Spec as Their Basis

After the Spec passes review, the story truly enters implementation. At this point, the role of the Spec changes. It is no longer only the output of analysis, but the shared source of truth used by Dev, QA, and Code Reviewer.

Interestingly, they read the same Spec, but they do not see the same thing.

Dev Agent sees implementation tasks in the Spec: which modules need to change, which behaviors to implement, which boundaries to respect, which exceptions to handle, and which side effects to avoid.

QA Agent sees verification criteria: which user paths must be covered, which boundary conditions must be verified, which exception scenarios must be tested, and what result counts as truly complete.

Code Reviewer sees code requirements: whether the implementation deviates from the problem definition, misses conditions in the Spec, crosses scope boundaries, or introduces unnecessary complexity or risk.

This is the value of Spec as a carrier of consensus. It is not a document written for one role. It lets different agents work around the same problem definition. Each agent may read the Spec through the lens of its own responsibility, but no agent should interpret the task independently from the Spec.

Without a Spec, multi-agent collaboration easily becomes an accumulation of temporary understandings: Dev implements based on one understanding, QA verifies based on another, and Code Reviewer reviews based on a third. Each step may look reasonable, but they may not be answering the same question.

Spec lets multi-agent collaboration depend not on "each agent's own understanding," but on the same confirmed problem definition.

Spec Is the New Collaboration Interface in the AI Era

At this point, we can see that Spec connects analysis and implementation.

For the analysis phase, Spec is the output. People, BA Agent, TL Agent, and Spec Reviewer Agent use it to clarify the problem, organize constraints, confirm boundaries, and finally decide whether the story is ready for implementation.

For the implementation phase, Spec is the input. Dev Agent, QA Agent, and Code Reviewer read different responsibilities from the same Spec and turn the same problem definition into code, tests, and review.

This makes Spec a new collaboration interface in the AI era.

In traditional software engineering, code has been the core interface where humans and machines work together. Humans express intent as code, and compilers, runtimes, and test systems understand and execute that intent through code. Code is both the object of human collaboration and the object executed by machines.

But when AI agents begin participating in development, this interface moves earlier. People no longer need to personally translate every implementation detail into code. More of their work happens at the Spec layer, where they express goals, boundaries, constraints, acceptance criteria, and trade-offs. The agent then uses the Spec to generate code, add tests, perform review, and report implementation findings back.

For people, Spec is increasingly the most important work interface before implementation begins. Many problems that used to be expressed in code, corrected in review, or exposed in testing should now be discussed and confirmed in Spec as early as possible.

This does not mean people no longer need to care about implementation results. People still need to read reports, verify functional behavior, and take responsibility for the final result. But their primary creative input should happen as much as possible at the Spec layer.

Three Development Disciplines of Spec-Driven Development

When Spec becomes the output of analysis and the input to implementation, the development style changes: implementation should no longer be driven by the agent's immediate understanding, but by the reviewed Spec. That is the Spec-Driven Development discussed here.

Spec-Driven Development here is not a documentation ritual. It is a set of development disciplines: code comes from Spec, Spec takes precedence when code conflicts with it, and changes in code direction must first go back to Spec.

These disciplines are not only for agents. They are even more important for people. Making an agent follow rules is relatively easy. The hard part is making people resist the urge to bypass Spec and change code directly. Once that happens, Spec is no longer the source of truth.

No Spec, No Code.

Without a confirmed Spec, there should be no code. Every piece of code must be explainable through the Spec: why it is written, what problem it solves, which acceptance criterion it satisfies, and which boundary it respects. Otherwise, code loses its source of explainability, verifiability, traceability, and manageability.

Spec is Truth.

When code conflicts with Spec, Spec takes precedence. This is not because Spec is magically correct. It is because Spec is the confirmed source of truth. If the implementation deviates from Spec, the code should be changed first so that it returns to the problem definition.

Reverse Sync.

If implementation reveals that the code should not continue following the original Spec, the answer is not to bypass Spec and change code directly. The correct path is to reverse-sync the Spec first: update it with the new facts, constraints, boundaries, and trade-offs, confirm it again, and only then modify the implementation.

The part most easily overlooked is that people also feel the urge to change code directly. When implementation looks wrong, a boundary is missing, or an interaction detail is unreasonable, it is natural to want to "just fix the code," especially when the change looks small. But if that change modifies the problem definition, acceptance criteria, boundary, or implementation constraint without syncing back to Spec, the discipline of Spec-Driven Development has been broken.

The code may become temporarily correct, but it has separated itself from the confirmed problem definition. Later, others cannot explain why that code exists, QA cannot know how to verify it, Code Reviewer cannot judge whether it crossed scope boundaries, and future agents cannot trace where the change came from. The project's explainability, verifiability, traceability, and manageability are all harmed.

These three disciplines form a complete chain: No Spec, No Code ensures that code has a source; Spec is Truth ensures that code has an alignment target; Reverse Sync ensures that Spec does not become a false authority when reality changes.
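One way a harness might mechanically back this chain is to require every change set to carry a confirmed Spec reference and to block direction changes that have not been reverse-synced. This is a hedged sketch of such a check, with entirely hypothetical field names:

```python
# Hypothetical harness check backing the three disciplines: a change must
# reference a Spec (No Spec, No Code), the Spec must be confirmed (Spec is
# Truth), and a change in direction must update the Spec first (Reverse Sync).
def validate_change(change: dict, specs: dict) -> list[str]:
    errors = []
    spec_id = change.get("spec_id")
    if spec_id is None:
        return ["No Spec, No Code: change carries no Spec reference"]
    spec = specs.get(spec_id, {})
    if spec.get("status") != "confirmed":
        errors.append("Spec is Truth: referenced Spec has not been confirmed")
    if change.get("deviates_from_spec") and not change.get("spec_resynced"):
        errors.append("Reverse Sync: update and re-confirm the Spec "
                      "before changing the code's direction")
    return errors
```

Such a check cannot stop a person from silently editing the problem definition in code; it only makes the disciplined path the default one.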

What a Good Spec Must Provide

If Spec is the source of truth in Spec-Driven Development, then the Spec itself must be reliable enough. Otherwise, Spec-Driven Development only stabilizes the propagation of errors into implementation, review, and QA.

Clarity.

A Spec must not rely on vague language. Phrases like "improve the experience," "support permissions," or "adapt the styling" are not enough. It must describe concrete behavior, boundaries, and outcomes: what happens in which scenario, what the user sees, how the system responds, and what state counts as completion.

Enough detail.

A Spec must be detailed enough that Dev and QA do not need to guess. A Spec for a frontend project should not merely say "make the component larger." It should describe concrete size, spacing, color, state, responsive behavior, and, when necessary, even CSS-level details. A Spec for a backend project should not merely say "return success." It should specify whether the successful HTTP status code is 200, 201, or 204, what the response body schema is, and what structure is returned on error. If a detail affects implementation or verification, it belongs in the Spec.
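To illustrate the backend point, here is what a sufficiently detailed criterion looks like once turned into a check. The endpoint behavior, status codes, and error code below are entirely hypothetical, invented for the example: "a successful create returns 201 with the new id in the body; a duplicate name returns 409 with error code `DUPLICATE_NAME`."

```python
# A hypothetical handler implementing the invented criterion above, plus the
# verification that falls directly out of a Spec written at this level of detail.
def create_item(store: dict, name: str) -> tuple[int, dict]:
    if name in store:
        return 409, {"error": "DUPLICATE_NAME"}
    store[name] = {"id": len(store) + 1}
    return 201, {"id": store[name]["id"]}

def verify_create_item() -> None:
    store: dict = {}
    status, body = create_item(store, "alpha")
    assert status == 201 and "id" in body                       # happy path per criterion
    status, body = create_item(store, "alpha")
    assert status == 409 and body["error"] == "DUPLICATE_NAME"  # exception path per criterion
```

With "return success" instead of this level of detail, Dev would have to guess the status code and QA would have nothing concrete to verify against.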

Completeness.

A Spec must not describe only the happy path. Key scenarios, boundary conditions, exception paths, permission branches, and data states all need to be covered. In enterprise applications, missing a critical branch is often more dangerous than getting a field wrong.

Appropriate granularity.

Not every change needs to become a full requirement. A simple rename, copy change, or small refactor may only be an enhancement or task inside a story. Conversely, a change that crosses modules, roles, or state transitions cannot be compressed into one sentence like "support this feature." The granularity of the Spec must match the work's risk, verification complexity, and collaboration cost.

Compressed and filtered.

The analysis phase may produce a large amount of material: conversations, search results, code-reading notes, option comparisons, screenshots, design details, and historical background. Not all of this is suitable for Dev, QA, or Code Reviewer to consume directly. A good Spec is not the complete record of the analysis process. It is the engineered expression of the analysis result. It should compress the analysis material into an executable problem definition and leave noise in the conversation, appendix, or original records.

Raw materials are not the Spec.

Figma, OpenAPI, database DML, architecture diagrams, API documents, and historical issues can all be inputs to a Spec, but they cannot replace the Spec. Dev and QA do not only need to know that these materials exist. They need to know what those materials mean for the current story: which UI details must be implemented, which API fields must be used, which database constraints affect behavior, and which acceptance criteria follow from them.

Raw materials are evidence, not the problem definition.

Implementability.

A Spec must be able to map to concrete engineering actions. It cannot only describe a vision or a business goal. After reading it, Dev Agent should at least understand roughly which modules may be affected, which constraints apply, which implementation risks exist, and which paths should not be taken.

Verifiability.

A Spec must provide a basis for acceptance and testing. After reading it, QA Agent should know how to prove that the current story is complete: which user paths must be covered, which data states must be constructed, which permissions and exceptions must be verified, and which outcomes count as passing.

Boundaries.

A Spec must explain not only what to do, but also what not to do. AI agents are very good at filling in seemingly reasonable gaps, and boundaries are what prevent self-directed design. A good Spec tells the agent where to stop.

Context.

A Spec needs to explain why this is the right direction. It does not need to include all history, but it should preserve enough background for future people and agents to understand where a judgment came from, why it holds, and which trade-offs have already been confirmed.

Back to Harness: Spec Is Where Collaboration Converges

From the perspective of Harness Engineering, Spec occupies a special position.

More precisely, Spec is where several lines converge inside the harness: memory and knowledge are compressed into the problem definition of the current story; people and agents align there; requirements move toward implementation there; analysis hands off to development there.

It is not an isolated document or a pre-development formality. Business goals, user scenarios, design constraints, architectural boundaries, historical decisions, system rules, exception experience, and technical limitations all need to be reorganized in the context of the current story and enter the Spec.

Raw requirements usually contain ambiguity, omissions, and uncertainty. The work of analysis is to use reading, searching, discussion, clarification, review, and final human judgment to compress that uncertainty into a problem definition that can be implemented, verified, and managed.

Spec is also the interface where people and agents work together. People express direction, boundaries, trade-offs, and acceptance criteria there. Agents read tasks, decompose responsibilities, enter implementation, generate tests, and receive review from there. Spec is both a human work interface and an agent work interface.

Under the discipline of Spec-Driven Development, code starts here. Code should not come from the agent's immediate guesses, nor from humans bypassing Spec with temporary code changes. It should come from a problem definition that has gone through analysis, review, and confirmation.

That is why, in enterprise applications, the most important thing before letting AI truly begin implementation is not making it write code faster. It is making sure it truly understands the problem first.