IID.systems

What If There Were No Specification: TDD and Its Hidden Premise

What happens when the same sort is built with test-driven development — the cognitive merit of examples, the premise of shared culture, and the multi-agent era

Building the same sort without a specification

The previous two pages descended from an implicit specification down to design, implementation, and tests. Now consider the opposite world: no specification is written, and the same sort is built with test-driven development (TDD). TDD runs the cycle "write a failing test → write the minimal implementation that passes → clean up" (Red-Green-Refactor). In place of a specification document, tests — that is, examples — come first.

```javascript
// TDD: start from examples instead of a specification
// (Jest-style test/expect; assumes sort is exported from a hypothetical './sort' module)
const { sort } = require('./sort');

test('empty stays empty', () => {
  expect(sort([])).toEqual([]);
});
test('single element stays put', () => {
  expect(sort([5])).toEqual([5]);
});
test('sorted input is unchanged', () => {
  expect(sort([1, 2, 3])).toEqual([1, 2, 3]);
});
test('reversed input gets sorted', () => {
  expect(sort([3, 2, 1])).toEqual([1, 2, 3]);
});
test('duplicates are kept', () => {
  expect(sort([3, 1, 3])).toEqual([1, 3, 3]);
});
test('negative numbers are handled', () => {
  expect(sort([-1, 5, -10])).toEqual([-10, -1, 5]);
});
```

As tests accumulate, the implementation appears to be steered toward "correct sorting". But what this suite pins down is the output at six specific inputs: a finite set of points. That is in sharp contrast to the implicit specification's postcondition, which spoke about all inputs.
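To make the contrast concrete, the postcondition can be written as an executable check that accepts any input/output pair. This is a minimal sketch, assuming numeric elements and reading the postcondition as post IsPermutation(r, l) and IsOrdered(r); the helper names isOrdered, isPermutation, and satisfiesPost are hypothetical. Evaluating it on one particular input is still a single point; the difference is that the predicate itself is stated for every l.

```javascript
// IsOrdered(r): every adjacent pair is in non-decreasing order
function isOrdered(r) {
  for (let i = 0; i + 1 < r.length; i++) {
    if (r[i] > r[i + 1]) return false;
  }
  return true;
}

// IsPermutation(r, l): r and l hold the same elements with the same multiplicities
function isPermutation(r, l) {
  if (r.length !== l.length) return false;
  const counts = new Map();
  for (const x of l) counts.set(x, (counts.get(x) ?? 0) + 1);
  for (const x of r) {
    const c = counts.get(x) ?? 0;
    if (c === 0) return false; // r has an element (or a copy) that l lacks
    counts.set(x, c - 1);
  }
  return true;
}

// The postcondition relates any input l to its output r, not just six chosen inputs
function satisfiesPost(l, r) {
  return isPermutation(r, l) && isOrdered(r);
}

console.log(satisfiesPost([3, 1, 3], [1, 3, 3])); // true
console.log(satisfiesPost([3, 1, 3], [1, 3]));    // false: an element was lost
```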


The implementer is inferring the true specification

Look at the following implementation. It passes all six tests above.

```javascript
// This implementation passes all six tests above
function sort(l) {
  const r = [...l].sort((a, b) => a - b);
  return r.length <= 3 ? r : r.slice(0, 3); // truncates anything longer than 3 (!)
}
```

It looks absurd. Yet by the criterion "all tests pass", nothing rules it out. Growing the test suite does not fundamentally change the situation: any finite suite is satisfied by infinitely many implementations, for instance ones that are correct on every tested input and arbitrary elsewhere. A real implementer, human or AI, does not write this because they treat the test suite as a projection of some intended behavior, infer that the author must mean sorting, reconstruct the true specification in their head, and write toward it.
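Running the six examples against this implementation confirms the point. This is a minimal sketch: the cases are transcribed from the suite into (input, expected) pairs, and sameArray is a hypothetical stand-in for Jest's toEqual that only handles flat arrays of numbers.

```javascript
// The truncating implementation (deliberately wrong)
function sort(l) {
  const r = [...l].sort((a, b) => a - b);
  return r.length <= 3 ? r : r.slice(0, 3);
}

// The six examples from the TDD suite, as [input, expected] pairs
const examples = [
  [[], []],
  [[5], [5]],
  [[1, 2, 3], [1, 2, 3]],
  [[3, 2, 1], [1, 2, 3]],
  [[3, 1, 3], [1, 3, 3]],
  [[-1, 5, -10], [-10, -1, 5]],
];

// Element-wise equality for flat number arrays
const sameArray = (a, b) => a.length === b.length && a.every((x, i) => x === b[i]);

const allPass = examples.every(([input, expected]) => sameArray(sort(input), expected));
console.log(allPass);                             // true: no example has more than three elements
console.log(JSON.stringify(sort([4, 3, 2, 1]))); // [1,2,3]: the fourth element is silently dropped
```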

This reconstruction works because the test author and the implementer share the same culture and common sense. The intent is gleaned from the name sort; the handling of boundaries is guessed from the choice of typical examples; the intent to preserve elements is read from the duplicates test. TDD — and communication of intent by examples in general — rests on this implicit sharing.
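How much intent rides on a single example can be checked directly. Consider a hypothetical misreading of sort as "return the distinct values in ascending order". In this sketch (sortUnique and sameArray are illustrative names, not from the source), that implementation fails exactly one of the six tests: the duplicates test is the only one that pins down element preservation.

```javascript
// Hypothetical misreading: "sort" as "sorted set of distinct values"
function sortUnique(l) {
  return [...new Set(l)].sort((a, b) => a - b);
}

// Element-wise equality for flat number arrays
const sameArray = (a, b) => a.length === b.length && a.every((x, i) => x === b[i]);

// The six examples, labeled by test name
const examples = [
  ['empty stays empty', [], []],
  ['single element stays put', [5], [5]],
  ['sorted input is unchanged', [1, 2, 3], [1, 2, 3]],
  ['reversed input gets sorted', [3, 2, 1], [1, 2, 3]],
  ['duplicates are kept', [3, 1, 3], [1, 3, 3]],
  ['negative numbers are handled', [-1, 5, -10], [-10, -1, 5]],
];

// Names of the examples this implementation fails
const failing = examples
  .filter(([, input, expected]) => !sameArray(sortUnique(input), expected))
  .map(([name]) => name);

console.log(JSON.stringify(failing)); // ["duplicates are kept"]
```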


Examples still have rational merit

TDD did not spread by accident: examples are overwhelmingly easy for humans to understand. Reading post IsPermutation(r, l) and IsOrdered(r) takes familiarity with universal quantifiers, permutations, and sets; that sort([3, 1, 3]) yields [1, 3, 3] can be read by anyone.
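For comparison, the two predicates can be spelled out with quantifiers. This is one conventional formulation, assumed here for illustration, with 0-based indices, |r| the length of r, and the outer bars around a set denoting its number of elements:

```latex
\begin{align*}
\mathrm{IsOrdered}(r) &\iff \forall i.\; 0 \le i < |r| - 1 \Rightarrow r_i \le r_{i+1} \\
\mathrm{IsPermutation}(r, l) &\iff \forall x.\; \bigl|\{\, i \mid r_i = x \,\}\bigr| = \bigl|\{\, j \mid l_j = x \,\}\bigr|
\end{align*}
```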

Within a group that shares culture and common sense, examples impose far less cognitive load on most people than unintuitive abstractions such as set theory and universal quantifiers. In an environment where humans write the code and humans review it, that low cognitive load is exactly what TDD optimized for.

| Aspect | Examples (tests) | Predicates (formal spec) |
|---|---|---|
| Background the reader needs | Almost none; readable through culture and common sense | Basics of logic and set theory |
| What can be pinned down | A finite set of points | All inputs |
| Conveying intent | Depends on shared culture and common sense | Depends only on the definitions of the symbols |
| Cognitive load for most humans | Low | High |
| Connection to machine verification | Point-by-point equality checks | Foundation for universal verification and proof |

In multi-agent development, the premise collapses

To an AI agent, a formal predicate is not a "hard-to-read abstraction". The cognitive-load argument is specific to human cognition and does not carry over to machine-to-machine cooperation. In an environment where both writer and reader are AIs, the greatest merit of examples no longer applies.

Meanwhile, the premise TDD stood on — shared culture and common sense — is not guaranteed in an environment where agents of different models, different vendors, and different versions cooperate. Cooperation that relies on inference breaks silently the moment the inferences diverge. And because tests only look at finitely many points, the divergence slips through the gaps between them.
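A small illustration of such a gap, under hypothetical assumptions: two agents both infer "sort" from the suite, but one returns a fresh array while the other sorts in place, as JavaScript's built-in Array.prototype.sort does. No example inspects the input after the call, so both pass all six tests. A caller written by a third agent, which assumes its data is left untouched, only sees the divergence at run time. All function names here are illustrative.

```javascript
// Agent A infers "return a new sorted array"
function sortCopy(l) {
  return [...l].sort((a, b) => a - b);
}

// Agent B infers "sort the array"; Array.prototype.sort reorders l itself
function sortInPlace(l) {
  return l.sort((a, b) => a - b);
}

// Agent C's caller: reports the median and the first reading as originally recorded,
// assuming sortFn leaves `readings` unchanged
function summarizeReadings(sortFn, readings) {
  const sorted = sortFn(readings);
  return { median: sorted[Math.floor(sorted.length / 2)], firstReading: readings[0] };
}

console.log(JSON.stringify(summarizeReadings(sortCopy, [3, 1, 2])));
// {"median":2,"firstReading":3}
console.log(JSON.stringify(summarizeReadings(sortInPlace, [3, 1, 2])));
// {"median":2,"firstReading":1}: the caller's data was reordered underneath it
```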

With a formal contract, what the contract means is fixed uniquely by the definitions of the symbols alone. One question remains: whether the specification captures the intent correctly, the problem part 1 illustrated with "forgetting the permutation condition". But the contract itself, as the previous two pages showed, connects mechanically to runtime checking, proof obligations, and specification-derived tests.
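A minimal sketch of two of those connections, assuming numeric elements and the postcondition post IsPermutation(r, l) and IsOrdered(r): withContract turns the postcondition into a runtime check, and checkAgainstSpec derives tests from the same predicate by enumerating every array up to length 4 over the values -1, 0, 1. All helper names are hypothetical, and the bounded exhaustive enumeration stands in for the random generation a property-based testing library would typically do. Proof obligations need a verifier and are not shown.

```javascript
// Postcondition of sort: post IsPermutation(r, l) and IsOrdered(r)
const isOrdered = (r) => r.every((x, i) => i === 0 || r[i - 1] <= x);
const isPermutation = (r, l) => {
  if (r.length !== l.length) return false;
  // Multiset of elements as a Map from value to count
  const count = (xs) => xs.reduce((m, x) => m.set(x, (m.get(x) ?? 0) + 1), new Map());
  const cr = count(r), cl = count(l);
  return [...cl].every(([x, n]) => cr.get(x) === n);
};
const post = (l, r) => isPermutation(r, l) && isOrdered(r);

// Runtime checking: every call to the wrapped implementation is checked against the contract
function withContract(impl) {
  return (l) => {
    const input = [...l]; // snapshot, in case impl mutates its argument
    const r = impl(l);
    if (!post(input, r)) {
      throw new Error(`postcondition violated for input ${JSON.stringify(input)}`);
    }
    return r;
  };
}

// Specification-derived tests: yield every array of length 0..maxLen over `values`, once each
function* allArrays(maxLen, values) {
  yield [];
  if (maxLen === 0) return;
  for (const rest of allArrays(maxLen - 1, values)) {
    for (const v of values) yield [v, ...rest];
  }
}

function checkAgainstSpec(impl, maxLen = 4, values = [-1, 0, 1]) {
  for (const l of allArrays(maxLen, values)) {
    const r = impl([...l]); // pass a copy so the recorded input stays intact
    if (!post(l, r)) return { ok: false, counterexample: l };
  }
  return { ok: true };
}

// Usage: a correct sort and the truncating one from earlier
const goodSort = (l) => [...l].sort((a, b) => a - b);
const truncatingSort = (l) => {
  const r = [...l].sort((a, b) => a - b);
  return r.length <= 3 ? r : r.slice(0, 3);
};

console.log(checkAgainstSpec(goodSort).ok);       // true
console.log(checkAgainstSpec(truncatingSort).ok); // false
try {
  withContract(truncatingSort)([4, 3, 2, 1]);
} catch (e) {
  console.log(e.message); // postcondition violated for input [4,3,2,1]
}
```

A bounded check is still finite, but its cases come from the contract rather than from hand-picked examples, so the truncation is caught without anyone having thought to write a four-element test.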


Methods stand on premises

Nevertheless, the industry remains strongly pulled toward development methods from the era when humans wrote the code. Tools, organizations, and best practices have all been polished toward one goal: minimizing human cognitive load. When the writer shifts to AI, that objective function itself changes.

TDD was not wrong. The premise TDD optimized for is what changes. When premises change, the optimal method changes with them — from inferring intent through examples to fixing intent through contracts. This is why multi-agent development needs formal contracts.


Summary: from inference to contracts

The specification fixed the what (part 1); it was traced into design, implementation, and tests (part 2); and development without a specification turned out to rest on the implicit premise of shared culture and common sense (part 3). Between humans, that premise mostly holds. But in development where heterogeneous agents cooperate, the foundation of cooperation is not inference — it is contracts.