Vibe Coding, Goodbye: why specifications have become the backbone of AI-assisted development

In February 2025, Andrej Karpathy posts on X what will become a defining phrase: “vibe coding”, the art of giving a vague prompt to an AI agent and watching the code appear. He’s upfront about enjoying it. And for throwaway prototypes and side projects without specific requirements (security, longevity, evolvability, performance), it’s a surprisingly productive experience.

The problem surfaces when you try the same thing on something you want to maintain for the long haul. I’ve seen teams let an agent run for an hour on a feature defined in two lines of prompt. The code compiled. Tests passed. And the architecture had silently drifted from what had been decided two weeks earlier.

AI writes fast. It doesn’t know which direction to head if nobody tells it. But it goes anyway.

TL;DR: Spec-Driven Development makes formal specifications the single source of truth, written before any code exists. In the era of autonomous coding agents, it’s the difference between delegating with a clear mandate and delegating blind. Relevant for any team working with coding agents on projects meant to last. The main limitation: the upfront cost of a rigorous spec, which takes expertise to anticipate edge cases before any implementation begins.

Photo: Brooke Cagle on Unsplash

Vibe coding: fast, enjoyable, but hard to steer

The term Karpathy popularized describes an approach where intent replaces specification. You describe what you want in natural language, the agent produces code, you correct by feel. It’s enjoyable. It’s sometimes surprisingly effective.

But vibe coding carries a structural flaw: there’s no contract. The agent interprets, infers, fills in the gaps with what seems consistent based on its training. On an isolated function, that’s rarely a problem. On a feature that touches five files and must respect architectural invariants defined three months ago, it’s a source of silent drift.

Agents like Claude Code or GitHub Copilot CLI plan, execute, and iterate with a level of autonomy that didn’t exist eighteen months ago. That autonomy is precisely why specifications become indispensable: the further the agent acts from human supervision, the more precise its starting point needs to be.

The spec as an executable contract

Spec-Driven Development (SDD) isn’t a new concept. Its roots run through Eiffel’s Design by Contract, the user stories of Extreme Programming, and the formal specification methods of past decades. What’s new is the context in which it applies.

With AI agents, a well-written specification becomes an executable contract: a document that defines the “what” and “why” before the agent generates the “how.” It contains functional requirements, acceptance criteria, architectural invariants, and technical constraints. In short, everything a human developer absorbs in a planning meeting but that the agent simply doesn’t have by default.
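To make the “contract” idea concrete, here is a minimal Python sketch of what such a spec might hold once loaded into a program. The `Spec` class, its field names, and the `to_prompt` rendering are hypothetical illustrations, not the format of any particular tool; the point is only that each category becomes an explicit, reviewable field instead of something the agent has to guess.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Spec:
    """Hypothetical in-memory shape of a feature spec (not any tool's real format)."""

    feature: str
    why: str  # the intent behind the feature
    requirements: list[str] = field(default_factory=list)  # the "what"
    acceptance_criteria: list[str] = field(default_factory=list)  # how "done" is judged
    invariants: list[str] = field(default_factory=list)  # architectural rules to respect
    constraints: list[str] = field(default_factory=list)  # technical limits

    def to_prompt(self) -> str:
        """Render the spec as plain-text context for an agent, one section per field."""
        sections = [
            ("Requirements", self.requirements),
            ("Acceptance criteria", self.acceptance_criteria),
            ("Architectural invariants", self.invariants),
            ("Technical constraints", self.constraints),
        ]
        lines = [f"# Feature: {self.feature}", f"Why: {self.why}"]
        for title, items in sections:
            lines.append(f"## {title}")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


spec = Spec(
    feature="Password reset",
    why="Locked-out users must regain access without a support ticket",
    requirements=["Email a single-use reset link"],
    acceptance_criteria=["A reset link expires after 30 minutes"],
    invariants=["The API layer never queries the database directly"],
    constraints=["No new third-party dependencies"],
)
print(spec.to_prompt())
```

Rendering the spec as text is only one option; the same structure could just as well feed checks and reviews, which is where the “contract” starts to pay off.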

You can draw a parallel with Object Calisthenics formalized as persistent instructions, which I covered in a previous article: in both cases, you’re encoding rules that guide generation upfront rather than correcting drift after the fact. The specification is the macro version of that principle — not style rules, but architectural decisions and intent.

What changes concretely: architectural drift becomes detectable. If the agent proposes something that contradicts the spec, it surfaces as an exception; it doesn’t quietly slip into the codebase, creating invisible technical and design debt.

Why are specifications indispensable with AI agents?

The short answer: because intent errors cost more than syntax errors.

An LLM fixes a syntax bug in thirty seconds. But it can’t respect an architectural decision that’s absent from its context, let alone notice that it has violated one. And it has no way of knowing that a feature correctly generated in June 2026 conflicts with a constraint decided in March if that constraint isn’t formalized anywhere.

Three concrete problems that SDD addresses:

Architectural drift: without a spec, every agentic session starts from scratch on decisions already made. The spec carries the memory the agent doesn’t have. At best, the agent deduces from existing code. At worst, it invents a new architecture each session.
Silent regression: LLMs optimize locally. A spec with acceptance criteria lets you check that the delivered feature matches the original intent, not just that it compiles.
Missing traceability: versioning the spec with the code explains why something was built a certain way, not just how. It’s the same logic as Architecture Decision Records, applied across the entire development process.
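Acceptance criteria only catch silent regression if something actually checks them. A minimal sketch, assuming a hypothetical `apply_discount` feature generated by an agent: each criterion from the spec becomes a named predicate, so a failing run reports which piece of intent was broken rather than just a red build.

```python
from typing import Callable


# Hypothetical feature under test: an agent-generated discount rule (amounts in cents).
def apply_discount(total_cents: int, code: str) -> int:
    if code == "WELCOME10":
        return total_cents - total_cents // 10
    return total_cents


# Each acceptance criterion from the spec becomes a named, executable check.
ACCEPTANCE_CRITERIA: dict[str, Callable[[], bool]] = {
    "WELCOME10 takes 10% off": lambda: apply_discount(10_000, "WELCOME10") == 9_000,
    "Unknown codes change nothing": lambda: apply_discount(10_000, "BOGUS") == 10_000,
    "A total never goes negative": lambda: apply_discount(5, "WELCOME10") >= 0,
}


def failed_criteria(criteria: dict[str, Callable[[], bool]]) -> list[str]:
    """Return the names of the criteria the current implementation does not meet."""
    return [name for name, check in criteria.items() if not check()]


print(failed_criteria(ACCEPTANCE_CRITERIA))
```

If a later agent session “simplifies” `apply_discount` and breaks one of these, the failing name points straight back to the line of the spec it came from.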

How does an SDD workflow work in practice?

GitHub Spec Kit describes a workflow in four distinct phases:

Specify: Define functional objectives and acceptance criteria — what the system must do, what it must not do, the expected edge cases.
Plan: Translate intent into technical architecture: stack, responsibility boundaries, integration constraints.
Break down: Divide the plan into atomic tasks, each with its own validation criterion. One task per modified file is a useful heuristic.
Implement: The agent generates code step by step, each implementation validated against the corresponding specification.
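The Break down and Implement phases can be sketched as a loop over atomic tasks, each carrying its own validation criterion. This is a hypothetical illustration, not Spec Kit’s code: `Task`, `implement`, and the simulated `fake_agent` are invented names, and the in-memory `files` dict stands in for the real working tree.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    """One atomic task from the Break down phase (hypothetical structure)."""

    description: str
    target_file: str  # following the one-task-per-modified-file heuristic
    validate: Callable[[], bool]  # the task's own validation criterion


def implement(tasks: list[Task], run_agent: Callable[[Task], None]) -> list[str]:
    """Run tasks in order, validating each one before moving to the next.

    Raises RuntimeError on the first task that fails validation, so later tasks
    never build on unvalidated work. Returns the descriptions of completed tasks.
    """
    done: list[str] = []
    for task in tasks:
        run_agent(task)  # stand-in for the coding agent generating the change
        if not task.validate():
            raise RuntimeError(f"Task failed validation: {task.description}")
        done.append(task.description)
    return done


# Simulated working tree and agent, for illustration only.
files: dict[str, str] = {}


def fake_agent(task: Task) -> None:
    files[task.target_file] = f"# {task.description}\n"


tasks = [
    Task("Add User model", "models/user.py", lambda: "models/user.py" in files),
    Task("Add signup endpoint", "api/signup.py", lambda: "api/signup.py" in files),
]
print(implement(tasks, fake_agent))
```

Stopping at the first failure is a deliberate choice in this sketch: each validated step becomes a trustworthy base for the next, rather than letting the agent stack new work on a broken foundation.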

What sets this workflow apart from ordinary planning: the spec isn’t a document you write before coding and then forget. It’s the reference point at every stage, including validation. Birgitta Böckeler, in her analysis of SDD tools published on Martin Fowler’s site, notes that the key value lies not in the spec’s format, but in the discipline of consulting it before each implementation decision.

Which tools implement this approach today?

The SDD ecosystem for AI agents is still forming, but three approaches stand out in 2026:

GitHub Spec Kit is an open-source toolkit that structures the SDD workflow for coding agents such as GitHub Copilot, Claude Code, and Gemini CLI. It provides spec templates, initialization scripts, and system prompts adapted to each agent. It’s the most accessible entry point for a team starting from scratch.

Kiro is AWS’s agentic IDE, launched in preview in July 2025. It organizes development around an explicit Requirements → Design → Tasks flow, with a visual interface for managing specifications. The bet: making the spec the first-class layer of the IDE, not a Markdown file you consult in a separate window.

Tessl pushes the principle even further with the “spec-as-source” concept: the specification is the source of truth, and code is a generated artifact derived from it, regenerated whenever intent changes. It’s a radical proposition, and it raises open questions about the place of direct human contributions in the codebase.

These three tools share the same conviction: formalized intent before implementation. The differences lie in the degree of imposed structure and how they integrate with existing tooling.

What SDD is not

Spec-Driven Development is not a promise of perfection — nor an opportunity to reduce headcount. The quality of the spec determines the quality of the generated code; a vague spec produces vague code. Human expertise to anticipate edge cases, formalize constraints, and structure intent is more critical than ever. SDD is not a substitute for human thinking; it’s an amplifier of it.

It’s not a fix for organizational problems either. If architectural decisions aren’t discussed and agreed upon between people, no tool will invent them.


FAQ

Is Spec-Driven Development suited to small projects?

Yes, as long as you calibrate the formalism level. A solo side project doesn’t need a thirty-page spec: a twenty-line SPEC.md covering objectives, technical constraints, and main edge cases is enough to guide an agent effectively. The practical rule: the spec should contain what a new developer would need to know to avoid remaking decisions already made. On an exploratory or throwaway project, vibe coding stays relevant; it’s when you think about maintenance that SDD pays off.
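Even a light spec can be checked mechanically. The sketch below assumes a SPEC.md organized under three `## ` headings (Objectives, Constraints, Edge cases, names chosen for illustration) for a hypothetical habit-tracker side project; `missing_sections` is an invented helper that reports which required sections are absent.

```python
REQUIRED_SECTIONS = ("Objectives", "Constraints", "Edge cases")  # hypothetical headings


def missing_sections(spec_md: str) -> list[str]:
    """Return the required sections that have no matching '## ' heading."""
    headings = {
        line[3:].strip().lower()
        for line in spec_md.splitlines()
        if line.startswith("## ")
    }
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]


SPEC_MD = """\
# SPEC: habit tracker (side project)

## Objectives
- Log one habit check-in per day from the command line
- Show the current streak over the last 30 days

## Constraints
- Python standard library only
- All data in a single local SQLite file

## Edge cases
- A check-in just after midnight counts for the new day
- A missed day resets the streak; future dates are rejected
"""

print(missing_sections(SPEC_MD), len(SPEC_MD.splitlines()), "lines")
```

Thirteen lines are enough here to pin down decisions (storage, day boundaries, streak rules) that an agent would otherwise reinvent each session.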

What’s the difference between an SDD spec and a plain README?

A README explains what the project does; an SDD spec defines what the code must do, the constraints it must respect, and the criteria that let you validate it does so. The spec includes architectural invariants (“this service must not access the database directly”), expected edge cases, and testable acceptance criteria. A README is static documentation; a spec is an active contract guiding every implementation decision, including those made by an AI agent.

Is AI-generated code lower quality without a specification?

Not necessarily in terms of syntax or local unit tests — current LLMs generate functional code on isolated tasks without a spec. The problem is systemic. Over time, code generated without a spec accumulates implicit decisions that progressively diverge from the team’s intent. The drift is invisible line by line; it only becomes visible when you step back. The spec doesn’t make code “better” at the line level; it maintains coherence with intent at the project scale.

Should you version the spec alongside the code?

Yes, always. A spec versioned with the code lets you answer “why was this architectural decision made?” by tracing back through git history. It’s the same logic as Architecture Decision Records: decisions have a date, a context, and sometimes an explicitly rejected alternative. Without versioning, the spec is a living document that lies about its own history — and a spec you can’t trace over time is just wishful thinking.