Agent-First Engineering
The thesis behind tx-agent-kit: agents implement, humans steer.
tx-agent-kit is built on a specific thesis about how software gets built in the age of capable AI agents. This is not a philosophical abstraction. It is a concrete operating model that shapes every decision in the repository.
The thesis
Software engineering is splitting into two distinct activities:
- Steering: deciding what to build, why it matters, and what "done" looks like.
- Implementing: writing the code, running the tests, iterating until the checks pass.
Humans are better at steering. Agents are better at implementing. The best results come from a clear separation of these roles, with well-defined interfaces between them.
Origins
This approach is inspired by OpenAI's Harness Engineering post (February 2026), which argued that the role of the engineer is shifting from writing code to building the harness: the scaffolding of docs, linters, tests, and scripts that enable agents to operate effectively.
tx-agent-kit takes this idea and makes it concrete. The repository is designed from the ground up to be the harness: every architectural constraint is mechanically enforced, every convention is encoded in a check, and every workflow is documented in the repo itself.
What this means in practice
When you work with tx-agent-kit, the workflow looks like this:
- You define intent. Write acceptance criteria, describe the domain, specify the behavior.
- The agent implements. It reads CLAUDE.md, follows the DDD construction pattern, runs the scaffold CLI, writes the code.
- Mechanical checks validate. ESLint rules, structural invariants, type checks, and tests catch violations.
- The agent iterates. If checks fail, the agent reads the error, fixes the code, and re-runs.
- You review and accept. The final code has passed all mechanical checks. You review for intent alignment.
The key insight is that the third step, mechanical enforcement, is what makes this work. Without it, the agent is guessing at conventions. With it, the agent has a concrete feedback loop.
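The implement, validate, iterate loop described above can be sketched in a few lines of TypeScript. This is a minimal illustration, not tx-agent-kit's actual code: `Agent`, `Check`, `runUntilGreen`, and the toy `no-any` check are all hypothetical names, and a real agent would be a model call rather than a pure function.

```typescript
// Hypothetical types for illustration; none of these names come from tx-agent-kit.
type CheckResult = { name: string; ok: boolean; message?: string };
type Check = (code: string) => CheckResult;
// An "agent" here is any function that turns intent plus feedback into code.
type Agent = (intent: string, feedback: string[]) => string;

interface LoopOutcome {
  code: string;
  attempts: number;
  passed: boolean;
}

// Run the agent, validate its output, and feed failures back until all checks pass
// or the attempt budget is exhausted.
function runUntilGreen(
  intent: string,
  agent: Agent,
  checks: Check[],
  maxAttempts = 5,
): LoopOutcome {
  let feedback: string[] = [];
  let code = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    code = agent(intent, feedback);
    const failures = checks.map((check) => check(code)).filter((r) => !r.ok);
    if (failures.length === 0) {
      return { code, attempts: attempt, passed: true };
    }
    // The error messages are the only thing the agent learns from.
    feedback = failures.map((f) => `${f.name}: ${f.message ?? "failed"}`);
  }
  return { code, attempts: maxAttempts, passed: false };
}

// Toy stand-ins: a check that rejects `any`, and an agent that fixes it once told.
const noAnyCheck: Check = (code) => ({
  name: "no-any",
  ok: !code.includes(": any"),
  message: "replace `any` with a concrete type",
});

const toyAgent: Agent = (_intent, feedback) =>
  feedback.length === 0 ? "const id: any = 1;" : "const id: number = 1;";

const outcome = runUntilGreen("add an id constant", toyAgent, [noAnyCheck]);
console.log(outcome); // passes on the second attempt
```

The point of the sketch is the shape of the loop: the human supplies `intent` and reviews `outcome`, and everything in between is driven by check output rather than by guesswork.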
The improvement cycle
When an agent fails repeatedly at a task, the correct response is not to write the code yourself. It is to ask: why did the agent fail, and what scaffolding would prevent that failure?
This creates a self-reinforcing loop where each failure strengthens the harness:
| Agent failure | Harness improvement |
|---|---|
| Repeats the same coding mistake | Add a linter rule that catches it |
| Uses the wrong import path | Add a structural invariant check |
| Skips a setup step | Add it to the scaffold CLI |
| Misunderstands a convention | Document it in CLAUDE.md |
Over time, the repository becomes a better and better harness. Each failure makes future agents more effective.
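As an illustration of the "wrong import path" row, a structural invariant check might look something like the sketch below. The layer names, the `allowedImports` table, and `findViolations` are assumptions made for this example; they are not the repository's real layout or API. The check works on in-memory file contents so that it stays self-contained.

```typescript
// Hypothetical layering rules; the layer names are illustrative only.
type Layer = "domain" | "application" | "infrastructure";

// For each layer, the other layers it is allowed to import from.
const allowedImports: Record<Layer, Layer[]> = {
  domain: [],
  application: ["domain"],
  infrastructure: ["domain", "application"],
};

interface SourceFile {
  path: string;
  content: string;
}

interface Violation {
  file: string;
  importPath: string;
  reason: string;
}

const layers = Object.keys(allowedImports) as Layer[];

// A path belongs to a layer if one of its segments is the layer name.
function layerOf(path: string): Layer | undefined {
  const segments = path.split("/");
  return layers.find((layer) => segments.includes(layer));
}

function findViolations(files: SourceFile[]): Violation[] {
  const violations: Violation[] = [];
  for (const file of files) {
    const from = layerOf(file.path);
    if (from === undefined) continue;
    // Naive matcher for `from "..."` specifiers; a real check would use a parser.
    const importPattern = /from\s+["']([^"']+)["']/g;
    let match: RegExpExecArray | null;
    while ((match = importPattern.exec(file.content)) !== null) {
      const importPath = match[1];
      const to = layerOf(importPath);
      if (to !== undefined && to !== from && !allowedImports[from].includes(to)) {
        violations.push({
          file: file.path,
          importPath,
          // The message states the rule so an agent can act on it directly.
          reason: `${from} must not import from ${to}`,
        });
      }
    }
  }
  return violations;
}

const violations = findViolations([
  { path: "src/domain/user.ts", content: 'import { db } from "../infrastructure/db";' },
  { path: "src/application/register.ts", content: 'import { User } from "../domain/user";' },
]);
console.log(violations);
```

In this sample only the first file is flagged, because the hypothetical rules forbid `domain` from importing `infrastructure`. Once a check like this runs in CI, the same mistake cannot recur silently, which is the reinforcement effect the table describes.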
Comparison with traditional approaches
| Traditional | Agent-First |
|---|---|
| Conventions in wiki pages | Conventions enforced by lint rules |
| Architecture in slide decks | Architecture encoded in invariant checks |
| Onboarding in pair programming | Onboarding in CLAUDE.md and scaffold CLIs |
| Code review catches style issues | Linters catch style issues before review |
| Tribal knowledge about "how we do things" | Mechanical knowledge in the repo |
The agent-first approach is strictly better even if you never use an AI agent. Mechanical enforcement benefits human developers equally. It just happens to also make agents effective.