Agents: Boundaries of Responsibility, Not Personas

This is the sixth article in the series "Practices and Reflections on Harness Engineering in Enterprise Applications."

What Is an Agent, Really

In Harness Engineering practice, agents appear across many stages: analyzing requirements, shaping specs, implementing code, reviewing code, verifying behavior, coordinating state, and preparing reports. They are not decorative role names. They are an important part of making work decomposable, executable, transferable, and traceable.

But because the word "agent" is now used everywhere, it is worth clarifying what an agent is not in this context.

First, it is not simply the default agent provided by an AI coding tool or a general agent framework. Claude Code, Cursor, OpenClaw, and similar tools can provide an interaction surface, an execution environment, and a set of baseline capabilities. But that does not make them engineering agents inside a harness. They do not naturally understand a project's workflow, how the project manages context through document engineering, how specs are designed, how status flows, how reports are recorded, or how different agents hand work off to one another. Without that project and practice context, they are not yet harness-level engineering agents.

An agent is also not just a prompt-based persona. A common pattern in Prompt Engineering is to begin with something like "You are a senior software engineer." That can influence tone, attention, and response style, but it is not enough to define an agent. An agent needs stable context, available tools, a clear responsibility, defined inputs and outputs, and handoff artifacts. Without those, the so-called agent is only a posture inside a conversation, not an executor inside an engineering process.

An agent is also not a one-to-one mapping of a real-world job title. A BA Agent is not a business analyst. A TL Agent is not a technical lead. A QA Agent is not a full replacement for a QA role. Real-world roles include organizational responsibility, judgment, authority, and accountability. An agent can only take on the parts of that work that can be engineered, documented, and handed off.

In Harness Engineering, an agent is a reusable executor defined around a real responsibility. It is defined by responsibility, context, tools, capabilities, inputs, outputs, and handoff rules. The question is not whether it resembles a person, but whether it can reliably take on a class of work inside a workflow and produce something the next step can use.
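The definition above is structural, not stylistic, so it can be written down as data. Here is a minimal sketch of what "defined by responsibility, context, tools, capabilities, inputs, outputs, and handoff rules" might look like; the field names and the QA example values are illustrative assumptions, not part of any real harness framework.

```python
# Illustrative sketch: an agent as a structured definition, not a persona.
# All field names and example values are assumptions for this article.
from dataclasses import dataclass, field

@dataclass
class AgentDefinition:
    name: str
    responsibility: str            # the class of work this agent takes on
    context: list[str]             # documents and knowledge it must load
    tools: list[str]               # what it is allowed to invoke
    inputs: list[str]              # artifacts it consumes
    outputs: list[str]             # artifacts it must produce
    handoff_rules: list[str] = field(default_factory=list)

qa_agent = AgentDefinition(
    name="QA Agent",
    responsibility="Verify behavior against the spec and acceptance criteria",
    context=["spec", "test-strategy handbook"],
    tools=["test framework", "Docker", "Playwright", "curl"],
    inputs=["spec", "Dev Agent report", "commit IDs"],
    outputs=["test results", "bug records", "QA report"],
    handoff_rules=["report back to the Dev Agent; never modify code directly"],
)
```

Note that "You are a senior QA engineer" appears nowhere in this definition: everything in it describes work, boundaries, and artifacts.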

Agents Grow Out of Real Problems

Agents are not created by drawing an org chart and filling in roles. In practice, the path usually runs in the opposite direction: a certain kind of problem keeps appearing during development, a responsibility around that problem becomes stable, and only then does an agent emerge.

The Dev Agent grows out of writing code.

When AI enters software development, the most natural entry point is writing code. Whether the requirement begins as a vague prompt or as a stricter spec later in the process, someone still has to turn that requirement into code. This responsibility stabilizes first, so the Dev Agent appears first.

The Dev Agent is responsible for implementing functionality from requirements, modifying code, completing necessary test implementation, recording development progress, and binding important code changes to commits. Its context mainly comes from the project itself: language, tech stack, architectural boundaries, database, network APIs, page layout, team conventions, and existing code structure. Its tools are also close to those used by traditional developers: language and framework tools for development, compilation, build, and runtime, plus git.

This also shows how agents differ from human roles. In a real team, every developer has different experience, habits, and style. In agentic development, multiple Dev Agents are multiple execution instances generated from the same responsibility definition. They may work on different tasks, or work in parallel or in sequence when a task is too large, but their responsibility boundary, context requirements, tool set, and output format are consistent.
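The "multiple instances from one definition" point can be sketched directly. The dict-based definition and the `AgentInstance` class below are hypothetical; the point is that only the task assignment varies between instances, while the definition is shared.

```python
# Sketch: several Dev Agent instances generated from one responsibility
# definition. Structure and names are illustrative assumptions.
from dataclasses import dataclass

DEV_AGENT_DEFINITION = {
    "responsibility": "Implement functionality from the spec",
    "tools": ["compiler", "test runner", "git"],
    "output_format": "dev report + commit IDs",
}

@dataclass
class AgentInstance:
    definition: dict
    task: str  # the only thing that varies between instances

instances = [AgentInstance(DEV_AGENT_DEFINITION, t)
             for t in ["story-101", "story-102", "story-103"]]

# Same boundary, same context requirements, same tool set, same output format:
assert all(i.definition is DEV_AGENT_DEFINITION for i in instances)
```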

The QA Agent and Code Reviewer Agent grow out of feedback.

It quickly becomes clear that development cannot prove its own correctness. A Dev Agent can implement functionality and complete tests, but the responsibility for testing should not belong entirely to the Dev Agent. Nor should it be pushed back into the main agent, because testing requires independent knowledge, perspective, and tools. This is where the QA Agent appears.

The QA Agent focuses on whether behavior matches requirements and acceptance expectations. In stricter practice, that source of truth is narrowed down to the spec. It needs to understand the testing pyramid, the testing quadrants, test doubles, test design principles, test data design, acceptance paths, and real user behavior. Its tools are more verification-oriented: test frameworks, Docker, Playwright, curl, database tools, and anything else needed to start services, construct data, and verify results.

But testing alone is not enough. Tests are better at answering whether behavior is correct. They do not necessarily answer whether code is good. Does the code contain code smells? Does it follow team conventions? Does it break architectural boundaries? Does it introduce unnecessary complexity or hidden risk? These questions require a Code Reviewer Agent.

The Code Reviewer Agent is not there to reimplement the feature. Its job is to make the code withstand engineering review. It needs to understand design principles, project architecture rules, team coding conventions, common code smells, and risk patterns. It can use lint, typecheck, static analysis, and custom inspection scripts as supporting tools. Tools can expose problems, but review itself comes from responsibility and standards, not from tool output alone.
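As one concrete example of the "custom inspection scripts" mentioned above, a Code Reviewer Agent might run a small check that flags imports crossing an architectural boundary. The layer rule here ("domain must not import from infrastructure") is an assumed convention for illustration, not something from this series.

```python
# Hypothetical inspection script: detect imports that cross an assumed
# architectural boundary in a Python codebase.
import ast
from pathlib import Path

# Assumed convention: maps a layer to the layers it must not import from.
FORBIDDEN = {"domain": ["infrastructure"]}

def boundary_violations(root: Path) -> list[str]:
    """Return a finding for each `from <banned layer> import ...` statement."""
    findings = []
    for layer, banned in FORBIDDEN.items():
        for py in (root / layer).rglob("*.py"):
            tree = ast.parse(py.read_text())
            for node in ast.walk(tree):
                if isinstance(node, ast.ImportFrom) and node.module:
                    if node.module.split(".")[0] in banned:
                        findings.append(f"{py}: imports {node.module}")
    return findings
```

A script like this only exposes candidates; as the article says, the review judgment itself still comes from responsibility and standards.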

The QA Agent and Code Reviewer Agent do not exist to negate the Dev Agent. They exist to shorten the feedback loop and make implementation results face different kinds of scrutiny as early as possible.

The BA Agent and TL Agent grow out of analysis.

When problems keep surfacing only after implementation, the feedback loop is still too long. Many deviations should not wait until code is written. They should be visible before implementation begins: whether the requirement is clear, whether business rules are complete, whether interaction boundaries are explicit, whether the existing system can support the change, whether the architecture is feasible, and whether risks have already been identified.

At this point, analysis becomes a stable responsibility, and the BA Agent and TL Agent emerge.

The BA Agent focuses on business understanding. It helps humans organize business goals, user paths, rules, exceptions, permissions, acceptance criteria, and potential ambiguity. It turns scattered information from prototypes, documents, discussions, and system behavior into an analyzable problem definition.

The TL Agent focuses on technical understanding. It helps humans read code, map architectural boundaries, identify dependencies, understand APIs and data flow, and determine how a requirement can land in the current system and which risks need to be exposed early.

Their tools and context are different from those of the Dev Agent. Analysis may require various MCPs or tools: using a browser to read web pages, Google Drive to read product documents, Figma to inspect designs, Slack to read historical conversations, and also reading OpenAPI definitions, database structure, and existing code. Their goal is not to make decisions for humans, but to help humans read widely, understand clearly, and organize scattered information.

The Spec Reviewer Agent grows out of repeated checks.

Even with help from the BA Agent and TL Agent, the human work in analysis remains heavy. Humans still need to make direction judgments, scope trade-offs, and final confirmations. But some checks can be structurally delegated: whether the spec is clear, complete, consistent, unambiguous, implementable, and verifiable, and whether any critical boundaries are missing.

These recurring checks fit the Spec Reviewer Agent. It reviews the problem definition, not the code. It cannot pass the human gate for humans, but it can expose missing pieces, conflicts, and ambiguity before the final human confirmation.

The PM Agent grows out of workflow complexity.

Once workflow, plans, stories, statuses, specs, reports, and multiple agents are all moving at the same time, a new problem appears: who chooses the next piece of work, tracks status, summarizes reports, decides which agent should be invoked, and ensures handoff artifacts are not lost?

This is where the PM Agent appears. Its responsibility is not to manage project direction for humans, but to help humans manage workflow and agent collaboration: read plan/story status, understand current progress, summarize context and reports, suggest next steps, and coordinate inputs and outputs across agents. It needs to understand the process, project knowledge, progress state, plan/story structure, and the responsibilities and capabilities of other agents. One of its most important capabilities is dispatching the right agent at the right moment; in different tools, that may appear as spawning agents, agent teams, or another collaboration mechanism.
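The dispatch part of that responsibility can be sketched as a simple decision table over story status. The status names and the status-to-agent mapping below are assumptions about one possible workflow, not a prescribed one.

```python
# Sketch of a PM Agent's dispatch decision: read story statuses and
# propose which agent should act next. Status names are assumptions.
NEXT_AGENT = {
    "analyzed":      "Spec Reviewer Agent",   # spec ready for review
    "spec-approved": "Dev Agent",             # ready to implement
    "implemented":   "Code Reviewer Agent",   # code ready for review
    "reviewed":      "QA Agent",              # behavior ready to verify
    "verified":      None,                    # a human confirms completion
}

def dispatch(stories: list[dict]) -> list[tuple[str, str]]:
    """Return (story_id, agent) pairs for every story an agent can advance."""
    plan = []
    for story in stories:
        agent = NEXT_AGENT.get(story["status"])
        if agent is not None:
            plan.append((story["id"], agent))
    return plan

# e.g. dispatch([{"id": "S-1", "status": "implemented"}])
#      -> [("S-1", "Code Reviewer Agent")]
```

Note the `None` entry: the PM Agent suggests next steps, but a "verified" story is handed to the human in the loop, not to another agent.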

Slash roles let unstable responsibilities grow through practice.

Not every responsibility is clear enough at the beginning to become an independent agent. Some responsibilities only appear faintly at first. The workflow is not stable, tools are incomplete, and context has not yet been distilled. In that stage, an existing agent can temporarily take on the responsibility through a slash role.

For example, PM-like responsibilities may initially be slashed out from the main agent. DevOps-like responsibilities may initially be slashed out from the Dev Agent. Once the responsibility boundary, workflow, context, tools, and handoff artifacts become stable, it can become an independent agent, handbook, or workflow rule.

So more agents is not automatically better, and earlier separation is not automatically better. Whether an agent should stand on its own depends on whether that responsibility truly exists, recurs often, has its own context and tools, and needs stable outputs and handoffs.

Agent Collaboration Is Not Solo Work

The appearance of agents does not mean they run independently from humans, nor does it mean each agent only works in its own little box. In real harness practice, agents need to collaborate with other agents, and agents need to collaborate with humans.

Collaboration starts with clear responsibilities and no overstepping.

An agent is an executor inside a responsibility boundary, so it must first stay within that boundary. A QA Agent can verify behavior, design tests, and record bugs, but it should not casually modify code. A Code Reviewer Agent can identify code smells, architectural risks, and convention violations, but it should not reimplement the solution. A TL Agent can analyze technical boundaries and implementation risk, but it should not take over business understanding from the BA Agent. A BA Agent can organize business rules and acceptance criteria, but it should not make scope trade-offs on behalf of humans.

Agents also cannot make final judgments for humans. Whether the direction is right, whether the scope is acceptable, how abnormal status transitions should happen, and whether a story is complete must be confirmed by the human in the loop. Conversely, the fact that humans have the highest authority does not mean they should bypass the workflow. Manually patching code, directly changing tests, or bypassing the spec and reports to fix implementation breaks the evidence chain in the harness and makes the work less explainable, less verifiable, and less traceable.

Collaboration also means helping one another, not working in isolation.

Clear responsibility does not mean isolation. Business rules organized by the BA Agent become input for the TL Agent's technical boundary analysis. Technical constraints exposed by the TL Agent may affect how the BA Agent refines scope and acceptance criteria. Issues discovered by the Dev Agent during implementation need to be reported back to the Code Reviewer Agent, the QA Agent, or the analysis stage. Behavior problems found by the QA Agent should help the Dev Agent determine whether the issue lies in implementation, test data, or a missing requirement.

The same is true between humans and agents. Agents help humans read materials, organize context, prepare reports, start services, construct test data, and answer questions, reducing the cost of verification and judgment. Humans help agents correct direction, confirm decisions, supply experience, and provide final authorization. Agents are not independently replacing human work, and humans are not just waiting on the sidelines. Real collaboration means humans and agents complement one another within their own boundaries.

Collaboration must eventually land in documents.

Natural-language communication can certainly happen. Agents can talk to agents. Humans can talk to agents. But if the communication only stays inside a conversation, it is disposable: the next session may not know it, another agent may not know it, and a future human taking over may not know it either.

So in a harness, conversation can happen, but handoff cannot rely on conversation alone. Any communication that affects implementation, review, verification, status flow, or final judgment must be recorded in documents. Specs, reports, status, commit IDs, handbooks, review records, and test results are all handoff artifacts for collaboration.
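"Landing in documents" can be as simple as an append-only record the next step can read. The filename, record format, and example content below are assumptions made for illustration.

```python
# Sketch: a conversation outcome recorded as a handoff artifact.
# Filename, record layout, and example content are assumptions.
from datetime import date

def record_handoff(path, author, decision, artifacts):
    """Append a handoff record so the next session, agent, or human finds it."""
    with open(path, "a") as f:
        f.write(f"## {date.today()} ({author})\n")
        f.write(f"{decision}\n")
        for artifact in artifacts:
            f.write(f"- {artifact}\n")
        f.write("\n")

record_handoff(
    "report.md",
    "QA Agent",
    "Login timeout reproduced; root cause is implementation, not test data.",
    ["commit: abc1234", "test: tests/login_timeout_spec"],
)
```

The conversation that produced this conclusion is gone after the session ends; the record is what the Dev Agent, the next session, or a future human actually depends on.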

This is the biggest difference between agent collaboration and ordinary chat. Agents and humans can help each other through natural language, but collaboration only becomes sustainable when that help is recorded and turned into an engineering artifact the next step can depend on.

Back to Harness: Agents Are Executors Inside the Workflow

Back to the harness: the value of an agent lies not in whether it resembles a person, but in whether it can reliably take on a class of responsibility inside a workflow.

The clearer the responsibility boundary, the easier the agent is to dispatch, hand off, and review. The clearer the context, the easier it is for the agent to focus on the current task. The better the tools fit, the easier it is for the agent to turn responsibility into concrete action. The clearer the output, the easier it is for other agents and humans to continue the work. An agent is not a model given a persona. It is a dependable executor inside a workflow.

This is also why agents ultimately return to humans. A harness is not designed to let agents replace humans. It is designed to help humans steer agents more easily and more reliably. It gives repetitive, heavy, engineerable work to agents; records the process in documents; returns the result to the workflow; and frees humans from details, state tracking, and mechanical checks.

What truly needs human attention is direction judgment, scope trade-off, risk evaluation, final confirmation, and responsibility for the long-term evolution of the system. The clearer the boundaries of agents are, the more humans can focus on the work that is genuinely high-value.