Workflow: The Skeleton of Harness Engineering
This is the second article in the series "Practices and Reflections on Harness Engineering in Enterprise Applications."
Why an Agent Should Not Start Implementing Immediately
In many vibe coding sessions, the path looks familiar: give an agent a vague requirement, let it think for a while, and then watch it generate code continuously. After a long wait, it may return something that looks genuinely usable: the page opens, the flow runs, and the feature more or less works.
That experience can create a powerful illusion: if the model is strong enough, a requirement can go straight into implementation.
But in enterprise applications, that path often creates problems. What the agent builds may contain a large amount of guessing, invention, and self-directed design. It may not be obviously wrong. It may even look complete, reasonable, and usable. But it may still not be what the business actually needs, and it may not respect the boundaries, rules, and collaboration patterns of the existing system.
This is one of the problems software engineering has always tried to solve. Real development rarely jumps directly from a vague idea to code. Instead, it moves through a workflow: large business goals, product intents, or use cases are gradually decomposed into engineering units that can be understood, tracked, collaborated on, developed, managed, and verified.
Those stages are not empty ceremony. They turn a complex requirement from "what is wanted" into "what needs to be done," "who is responsible," "how far it should go," and "how completion will be judged." Once a requirement is decomposed, collaboration has boundaries, development has an entry point, verification has a basis, and management has something concrete to manage.
The same applies to AI harnesses. They are not meant to bypass software engineering, nor to invent a new management vocabulary from scratch. They reapply the decomposition, transition, confirmation, and verification mechanisms of workflow to AI agents. The point of workflow is not to slow the agent down. It is to make sure the agent enters the right problem structure before it accelerates.
Plan: Turning Long-Term Goals Into Governed Scope
In traditional software engineering, once a requirement grows close to the size of an epic, progress tracking becomes necessary. At that scale, the work can no longer be completed by one person, one or two commits, or one simple verification step. It usually contains multiple features, several rounds of implementation, repeated feedback, and a span of time across multiple contexts.
An AI harness needs a similar upper-level unit of work. It needs a container that can pull a long-term goal out of an open-ended idea and give it boundaries, state, and traceable progress. That container cannot be so large that it becomes only a vision, and it cannot be so small that it is merely an ordinary task. It should be just large enough to hold an engineering goal that can be decomposed, advanced, and verified.
In this practice, that unit is called a plan.
The choice not to directly use the word epic is not about inventing a new concept for its own sake. It is about avoiding a mechanical copy of traditional project-management language. A plan emphasizes "a goal scope that needs to be continuously advanced and governed." It may correspond to a set of features, an architectural change, a migration effort, or a phase of engineering work.
A plan is also the main scale at which a spec applies. In other words, a plan is not merely a wrapper around a task list. It usually carries a larger engineering intent that needs to be described, decomposed, implemented, and verified.
But a plan is not the smallest unit an AI agent acts on directly. Its more important role is to create, organize, and manage a set of stories. It defines which goal those stories belong to, how the overall work is progressing, which stories are complete, and which are still being analyzed, implemented, reviewed, or paused.
Put differently, a plan does not manage a linear to-do list. It manages a set of execution units organized around a shared goal. Through the plan, the long-term goal gains a boundary. Through stories, that goal becomes work that can actually enter the workflow.
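The plan-as-container idea can be sketched as a small data model. This is an illustrative assumption, not an existing API: the names Plan, Story, and the status strings are invented for the sketch, and the point is only that a plan rolls up the states of its stories rather than managing a linear to-do list.

```python
from dataclasses import dataclass, field
from collections import Counter

@dataclass
class Story:
    title: str
    status: str = "todo"  # e.g. todo, analyzing, implementing, in-review, completed

@dataclass
class Plan:
    goal: str
    stories: list[Story] = field(default_factory=list)

    def progress(self) -> dict[str, int]:
        # Roll up story statuses so the plan can report overall progress.
        return dict(Counter(s.status for s in self.stories))

plan = Plan(goal="Migrate billing to the new tax engine")
plan.stories = [
    Story("Extract tax calculation behind an interface", "completed"),
    Story("Wire new engine into checkout", "implementing"),
    Story("Backfill historical invoices", "todo"),
]
print(plan.progress())  # {'completed': 1, 'implementing': 1, 'todo': 1}
```

The rollup is the plan's whole job in this sketch: it answers "how is the goal progressing?" without itself being a unit the agent acts on.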
Story: Turning Intent Into an Executable Engineering Unit
If a plan organizes a long-term goal into a governed scope, a story breaks that scope down into engineering units that can enter implementation, review, and verification.
The concept of story did not appear out of nowhere. It comes from the story card in software engineering practice: a unit of work small enough to be understood, discussed, implemented, and accepted.
In an AI harness, a story inherits the core of that concept. It still carries requirements, enhancements, acceptance criteria, scope boundaries, priority, dependencies, and necessary context. Its purpose is also similar: to turn a larger engineering intent into a unit of work that can actually enter development.
But the story concept needs to be reshaped for AI agents.
Traditional stories often include story points or human days. Those measurements are built around human team capacity, scheduling, and cost. They are meaningful in team management, but they are not the most important information for an AI agent's execution. An agent does not understand a task through human-day estimates, and a smaller story point value does not automatically make a task more reliable for the agent.
At the same time, stories in an AI harness add content that is not always explicit in traditional story cards, especially analysis results from the spec or requirement description. Whether the requirement is clear enough, whether ambiguities remain, whether anything is missing, and whether the story is ready for implementation all need to be recorded at the story level.
This makes a story more than a container for "what to do." It becomes a place where analysis results are preserved. Before entering implementation, the AI agent needs to understand the problem at the story level: what the goal is, where the boundary is, which assumptions it must not make on its own, which questions still need to be asked, and which acceptance criteria remain unclear.
Those analysis results can be highly specific. For a frontend UI story, they may go down to button color, size, spacing, border radius, shadow, interaction states, and responsive behavior. For a business workflow story, they may cover field validation, permission branches, error messages, and boundary data. If the information affects implementation or acceptance, it belongs in the story.
For an AI agent, the key purpose of a story is not to estimate effort. It is to define a clear task boundary: what problem the story solves, what scope it must not cross, what context it depends on, how the result will be judged, and where the work should return if it fails.
This is why stories in an AI harness focus more on being understandable, executable, reviewable, and verifiable. They are not there to measure how much time a human will spend. They are there to tell the agent what the current task is, how far it should go, what development needs to know, and how completion will be proven.
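What a story carries, and when it is ready for implementation, can be sketched as a structure with a readiness check. The field names and the readiness rule are assumptions made for illustration; they mirror what the text says a story holds (goal, acceptance criteria, scope boundaries, dependencies, preserved analysis results), not a real schema.

```python
from dataclasses import dataclass, field

@dataclass
class Story:
    goal: str
    acceptance_criteria: list[str] = field(default_factory=list)
    scope_boundaries: list[str] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)  # ambiguities found during analysis
    analysis_notes: str = ""

    def ready_for_implementation(self) -> bool:
        # A story is ready only when analysis has produced an acceptance
        # basis and no unresolved questions remain.
        return bool(self.acceptance_criteria) and not self.open_questions

story = Story(
    goal="Add export-to-CSV on the orders page",
    acceptance_criteria=["Exports visible columns only", "Respects current filters"],
    open_questions=["Should archived orders be included?"],
)
print(story.ready_for_implementation())  # False: an open question blocks implementation
story.open_questions.clear()
print(story.ready_for_implementation())  # True
```

The key design point is that analysis results live on the story itself, so readiness is a property of the record rather than something inferred from chat history.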
In that sense, a story is not merely an item under a plan. It is the engineering unit that actually enters execution inside the workflow. But whether the unit is a plan managing scope or a story carrying execution, anything inside a workflow needs a way to express which stage it is currently in.
Status: Putting Work Into the Right Stage
Plans and stories both need a way to express where they are in the workflow. That mechanism is status.
Status is not a decorative label, nor merely a field for UI display. Its purpose is to give humans and agents a shared understanding of the current stage of work: what should happen now, what should not happen, where the work may go next, and when it must stop.
The most natural statuses are todo, implementing, and completed. A unit of work either has not started, is in progress, or has finished. For a simple one-off task, those three statuses may seem sufficient.
But in enterprise applications, and especially in collaboration with AI agents, they quickly become insufficient.
First, analysis and implementation must be separated. A story should not jump directly from todo to implementing, because vague requirements should not immediately lead to code changes. It needs to enter analyzing first: reading context, understanding the requirement, discovering ambiguity, filling gaps, forming an approach, and deciding whether the story is ready for implementation.
This separation is also consistent with spec-driven work. A spec or requirement description does not automatically become an executable task. There must be an analysis stage that turns "what the requirement says" into "what implementation needs to know." Only after that stage is complete does implementation have a clear boundary.
Second, although plans and stories both move through the workflow, they care about different things. A plan manages a larger line of work, so its completion is closer to confirming and closing a scope. At the plan level, confirmation and archival matter. A story carries a concrete implementation unit, so its completion depends more directly on review and verification. That is why a story needs a status such as in-review, where the implementation result enters human review instead of being declared complete by the agent itself.
Beyond the normal path, real projects also need exceptional states. Work may be paused because priorities change. It may become not-required because the requirement changes. Completed work may return to analyzing when a new problem is found. These states may look like edge cases, but they are exactly what show that the workflow is dealing with real engineering rather than an always-smooth assembly line.
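The statuses discussed so far, including the exceptional ones, can be encoded as a small transition table. The exact set of states and edges here is an assumption for illustration; what matters is that legal moves are explicit, so a jump such as todo directly to completed is rejected rather than silently allowed.

```python
# Allowed status transitions, including the exceptional paths the text
# describes: pausing, dropping work, and reopening completed work.
ALLOWED = {
    "todo":         {"analyzing", "paused", "not-required"},
    "analyzing":    {"implementing", "paused", "not-required"},
    "implementing": {"in-review", "analyzing", "paused"},
    "in-review":    {"completed", "implementing"},  # review findings go back to implementation
    "completed":    {"analyzing"},                  # a new problem can reopen analysis
    "paused":       {"analyzing", "implementing", "not-required"},
    "not-required": set(),
}

def transition(current: str, target: str) -> str:
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target

print(transition("todo", "analyzing"))       # analyzing
print(transition("completed", "analyzing"))  # analyzing: the rework path
```

Note that todo has no edge to implementing: the separation of analysis from implementation is enforced by the table itself, not by convention.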
With status, plans and stories can be paused, resumed, handed off, collaborated on, and managed. When a human or another agent takes over, they do not have to infer progress only from chat history. They can first see the current stage, then read the corresponding context and artifacts.
Status should also have history, not just a current value. The current status answers "where is this now?" Status history answers "how did it get here?" It gives the workflow a time dimension: when analysis started, when implementation began, when work paused, when rework began, and when the work completed. Those records make collaboration, tracking, and retrospection possible.
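One way to keep both the current value and the history is to derive the current status from an append-only log. This is a minimal sketch under that assumption; the names Tracked and StatusEntry are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class StatusEntry:
    status: str
    at: datetime

@dataclass
class Tracked:
    # Shared by plans and stories: a current status plus how it got there.
    history: list[StatusEntry] = field(default_factory=list)

    @property
    def status(self) -> str:
        # The current status is just the last entry in the log.
        return self.history[-1].status if self.history else "todo"

    def move_to(self, status: str) -> None:
        self.history.append(StatusEntry(status, datetime.now(timezone.utc)))

work = Tracked()
for s in ["analyzing", "implementing", "paused", "implementing", "in-review"]:
    work.move_to(s)
print(work.status)                        # in-review
print([e.status for e in work.history])  # the full path, with a timestamp alongside each step
```

Because the log is append-only, a pause-and-resume or a rework loop shows up as repeated entries rather than being overwritten, which is exactly what retrospection needs.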
So the meaning of status is not to label tasks. It is to put work into the right stage. It tells the agent whether it should analyze, implement, wait for review, pause, or return to an earlier step and understand the problem again.
Human in the Loop: Authority and Responsibility Must Match
In an AI harness, humans do not appear only at a few approval points. Humans should participate at every stage: choosing the next unit of work, adjusting priority, adding context, interrupting a wrong direction, requesting re-analysis, confirming implementation results, and deciding whether to pause or abandon work.
The basic principle is simple: authority and responsibility must match.
Humans are ultimately responsible for the system, so humans must hold the highest authority in the workflow. That authority appears mainly in two places.
The first is the human gate at critical status transitions. Agents can advance many routine steps, but some gates cannot be opened by the agent itself. In particular, transitions such as analyzing → implementing and in-review → completed must be confirmed by a human.
analyzing → implementing is a direction gate. At this stage, the human needs to read the requirements, enhancements, acceptance criteria, spec-related content, and analysis results preserved in the story. The human needs to decide whether the requirement is clear enough, whether the boundary is explicit, whether ambiguity has been removed, whether omissions have been filled, and whether the acceptance basis is valid. Only after those questions are confirmed should the agent enter implementation.
in-review → completed is a quality gate. At this stage, the human needs to read the implementation report, review report, and test report, and must directly verify functional behavior. Reading code remains an option when needed, but human acceptance should not become primarily a code-reading exercise. For one story, or a group of related stories, the human needs to make a product-style acceptance judgment: whether the implementation matches the intent, whether the problem is truly solved, whether the risk is acceptable, and whether the result can be closed.
The human gate must be hard because it depends heavily on human experience, judgment, and responsibility. The agent needs to prepare materials, summarize risks, run checks, and fix issues, but it cannot grant final authorization on behalf of the human. Otherwise, the system would allow the executor to approve its own direction and quality.
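Making the gate hard can be as simple as refusing the two named transitions unless a human confirmation is present. This is a hypothetical sketch: the function advance and the human_confirmed flag are illustrative, and in a real harness the confirmation would be an audited action, not a boolean.

```python
# The two transitions the text names as human gates: the agent cannot
# take them on its own, no matter how routine the rest of the flow is.
HUMAN_GATES = {("analyzing", "implementing"), ("in-review", "completed")}

def advance(current: str, target: str, *, human_confirmed: bool = False) -> str:
    if (current, target) in HUMAN_GATES and not human_confirmed:
        raise PermissionError(f"{current} -> {target} requires human confirmation")
    return target

# The agent can prepare materials, but only a human can open the gate.
try:
    advance("analyzing", "implementing")
except PermissionError as e:
    print(e)

print(advance("implementing", "in-review"))                         # routine step, no gate
print(advance("analyzing", "implementing", human_confirmed=True))   # gate opened by a human
```

Encoding the gate as a precondition rather than a guideline is what prevents the executor from approving its own direction and quality.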
The second is exceptional status transitions. Routine progress along the happy path can be assisted by agents, and under clear rules it may even be advanced automatically. But exceptional transitions must be human decisions: pausing a plan, abandoning a story, pulling completed work back into analysis, sending review findings back to implementation, or deciding that something which seemed necessary no longer needs to be done.
Exceptional transitions usually mean that the goal, priority, scope, or quality judgment has changed. These are not merely execution issues; they are responsibility issues. The agent can detect the exception, explain it, and suggest next steps, but the final decision must belong to the human.
Human in the Loop, then, is not about making humans mechanically click confirmation buttons, nor about slowing AI down. It is about keeping authority, responsibility, and judgment aligned inside the workflow: the agent executes and prepares, while the human directs, authorizes, and makes the final judgment.
Back to Harness: Workflow Is the Skeleton for AI Entering the Engineering Process
This brings the discussion back to the original question: why should an agent not start implementing immediately after receiving a vague requirement?
Not because the agent is weak, and not because workflow is meant to restrain AI capability. The opposite is true. The stronger the capability, the more it needs to be placed inside an engineering process that can confirm goals, calibrate direction, manage scope, preserve context, and verify results.
Plans give long-term goals boundaries. Stories turn those goals into engineering units that can be executed, reviewed, and verified. Status tells each unit where it is. Human in the Loop keeps direction, authorization, and final judgment in human hands. Together, they form the workflow that makes an AI harness useful.
Workflow is not a shackle. It is the steering wheel, dashboard, and braking system. It tells the agent where to go, and it lets humans judge whether the direction has drifted. It turns AI speed into leverage instead of letting it accelerate down the wrong path.
The workflow in Harness Engineering is therefore not a simple replay of traditional process management in the AI era. It reorganizes the decomposition, transition, confirmation, and verification mechanisms that software engineering has accumulated over time into an execution structure suitable for AI agents participating in real development.
If the goal of a harness is to let AI do the right thing under real engineering constraints, then workflow is its most fundamental skeleton.