The Problem of Absent Specification in TDD
Why TDD Thrived, and Why It Reaches Its Limits in the AI Era
Why TDD Became Popular — Compatibility with Human Cognition
TDD's widespread adoption owes much to its compatibility with human cognitive patterns. "Given this input, this output should be returned" — this style of concrete example is far more intuitive than writing abstract specifications. For most developers, enumerating specific input/output pairs aligns with natural thought processes more than formally describing mathematical invariants and preconditions.
The characteristics of the domain where TDD primarily took hold, web development, are also significant. Web applications have a relatively high tolerance for defects compared to mission-critical systems. A fix for a critical bug can be deployed immediately, and a page refresh is often all a user needs to pick it up. Outside of payment processing and a few other areas, rapid iteration is prioritized over strict correctness.
In this environment, the "write tests quickly and fix things as you go" approach offered better economic returns than "define a perfect specification upfront." TDD's popularity stems less from inherent methodological superiority than from its high practicality within the specific context of web development.
Where Is the Specification? — TDD's Structural Flaw
TDD's principle of "write the test first" sidesteps a fundamental question: what guarantees the correctness of that test?
Test cases are fragments of a specification. But in TDD, that specification is never materialized. It exists only implicitly, fragmentarily, and likely incompletely in the test author's mind. The developer writes tests based on their belief about how "this API should behave" — but that "should" rests on subjective understanding.
The critical point is that tests function as a substitute for the specification itself. In TDD it is often said that "tests are executable specifications," but this conflates two distinct concepts. Tests are verification mechanisms that reflect a portion of the specification — they are not the specification itself. A finite set of input/output pairs merely samples from the infinite input space that the specification defines.
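The gap between a finite sample and the full input space can be illustrated with a small sketch. The function `is_leap_year`, the example list, and the helper names below are hypothetical and chosen only for illustration. The implementation agrees with every listed example, yet it disagrees with the Gregorian leap-year rule on inputs that nobody wrote down.

```python
def is_leap_year(year: int) -> bool:
    """Hypothetical implementation: right for the examples, wrong in general."""
    return year % 4 == 0


# Example-based tests: a finite sample of input/output pairs.
EXAMPLES = [(2020, True), (2023, False), (2024, True), (2000, True)]


def examples_pass() -> bool:
    """True when the implementation agrees with every listed example."""
    return all(is_leap_year(year) == expected for year, expected in EXAMPLES)


def spec_is_leap_year(year: int) -> bool:
    """The rule itself, stated for every year rather than for chosen samples."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def counterexamples(start: int, end: int) -> list[int]:
    """Years in [start, end] where the implementation departs from the rule."""
    return [y for y in range(start, end + 1)
            if is_leap_year(y) != spec_is_leap_year(y)]
```

Calling `examples_pass()` reports success, while `counterexamples(1890, 2110)` still finds years the examples never touched. The examples were not wrong; they were simply a sample.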
Test quality depends entirely on the test author's knowledge and experience. Experienced developers think of edge cases, but even they cannot test "the cases they didn't think of." When multiple people write tests, their implicit understanding of the specification may differ. Person A understands "empty strings are errors" while Person B understands "empty strings are allowed" — such inconsistencies can quietly coexist within a test suite, undetected.
If a formal specification exists, such contradictions are detectable at the specification level. If a VDM-SL specification states "name must not be empty," any test contradicting this is clearly identified as a specification violation. In TDD, by contrast, no mechanism exists to ensure consistency among the tests themselves.
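As a rough sketch of what detection at the specification level could look like, the Python below writes the rule "name must not be empty" as a predicate and checks each test case's expectation against it. The `Case` record, the `pre_register` predicate, and the sample cases are assumptions made for this illustration. It is written in Python rather than VDM-SL to stay in the same language as the other examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    author: str            # who wrote the test
    name: str              # input under test
    expect_accepted: bool  # what the test author believes should happen


def pre_register(name: str) -> bool:
    """Precondition taken from the specification: name must not be empty."""
    return len(name) > 0


# Person A and Person B hold different implicit beliefs about empty names.
CASES = [
    Case(author="A", name="", expect_accepted=False),
    Case(author="B", name="", expect_accepted=True),
    Case(author="B", name="alice", expect_accepted=True),
]


def spec_violations(cases: list[Case]) -> list[Case]:
    """Test cases whose expectation contradicts the written precondition."""
    return [c for c in cases if c.expect_accepted != pre_register(c.name)]
```

With the rule written down, Person B's empty-name case is flagged before any implementation exists. Without it, both cases could sit in the same suite and the disagreement would surface only by accident.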
Generating Tests from Existing Code — Institutionalizing Bugs
TDD's ideal is "write the test first," but in real-world projects, adding tests to existing code after the fact is extremely common. The code may be legacy code awaiting refactoring, an inherited project, or something that simply started without tests. When writing tests in these situations, what does the developer reference?
The answer is "the current implementation." They read the code, understand its behavior, and write tests that reproduce that behavior. This process has a structural defect.
When code contains bugs, that buggy behavior gets embedded in the tests as the "correct specification." For instance, if tax calculation code rounds up when it should round down, the test author writes the rounded-up result as the "expected value." The test passes, but the business rule is wrong.

    import math

    # Existing code (bug: rounds up instead of down)
    def calc_tax(price):
        return math.ceil(price * 0.1)  # should be math.floor

    # The test author infers the spec from the code
    def test_calc_tax():
        assert calc_tax(105) == 11  # passes, but the correct answer is 10

This test only guarantees that "the code behaves as it currently does," not that "the code behaves correctly." A green test is not evidence of correctness; it is evidence of status quo preservation.
"We can refactor safely because we have tests" is considered a key TDD benefit. But when tests are based on a buggy specification, "tests pass after refactoring" is synonymous with "bugs have been preserved." Tests function as a safety net only when the specification underlying them is correct.
Why TDD Does Not Fit the AI Era
The structural problems described above were accepted as "tolerable limitations" in human-centered development. The humans writing tests possess domain knowledge, maintain implicit specifications in their heads, and catch inconsistencies through test reviews — human judgment mitigated the problems.
But when AI becomes the center of development, these assumptions collapse at their foundation.
In human teams, domain knowledge is implicitly shared. "This is the industry convention for such calculations" or "This client insists on this specification": such knowledge is transmitted through conversation and review even without documentation. Between agents of different models, vendors, and versions, there is no guarantee that this sharing holds. The corrective mechanism that human teams rely on, implicit alignment through conversation and review, does not operate.
In multi-agent development, multiple AI agents handle different modules. When Agent A handles authentication, Agent B handles payments, and Agent C handles the frontend, the interface specifications between agents must be precisely agreed upon. TDD cannot achieve this agreement. Each agent writes tests based on "its own understanding," and each agent's tests pass individually — but the system fails when integrated. The absence of materialized specification is fatal for inter-agent collaboration.
Tests verify behavior of individual modules; they do not define contracts between modules. "Agent A's output is of this type and satisfies these preconditions." "Agent B accepts this input and guarantees these postconditions." Such contracts should be explicitly defined as formal specifications, not as collections of test cases.
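A minimal sketch of such a contract, written as runtime-checked preconditions and postconditions, might look like the following. Everything here is an assumption made for illustration: the `contract` decorator, the `Session` type, the `valid_session` predicate, and the stand-in bodies of `authenticate` and `charge`. A formal specification language would state the same conditions declaratively and allow them to be checked without running the code.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Any, Callable


class ContractViolation(Exception):
    """Raised when a module breaks its declared contract."""


def contract(pre: Callable[[Any], bool], post: Callable[[Any, Any], bool]):
    """Wrap a one-argument function with precondition and postcondition checks."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(arg):
            if not pre(arg):
                raise ContractViolation(f"{fn.__name__}: precondition failed for {arg!r}")
            result = fn(arg)
            if not post(arg, result):
                raise ContractViolation(f"{fn.__name__}: postcondition failed for {result!r}")
            return result
        return wrapper
    return decorate


@dataclass(frozen=True)
class Session:
    user_id: int
    authenticated: bool


def valid_session(s: Session) -> bool:
    """Shared predicate: what Agent A promises and what Agent B requires."""
    return s.authenticated and s.user_id > 0


# Agent A's module (authentication); the body is a stand-in.
@contract(pre=lambda token: isinstance(token, str) and token != "",
          post=lambda token, session: valid_session(session))
def authenticate(token: str) -> Session:
    return Session(user_id=len(token), authenticated=True)


# Agent B's module (payments); the body is a stand-in.
@contract(pre=valid_session,
          post=lambda session, receipt: receipt.startswith("paid:"))
def charge(session: Session) -> str:
    return f"paid:{session.user_id}"
```

The design choice worth noting is that `valid_session` is written once and used on both sides of the interface. Agent A's postcondition and Agent B's precondition cannot drift apart, because they are the same named predicate rather than two private understandings.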
Structural Resolution Through Formal Methods
Formal methods resolve TDD's problems at their root. By writing specifications in a formal language such as VDM-SL, the specification becomes a concrete artifact. Test cases are derived from the specification rather than depending on a developer's tacit knowledge.
Specification Materialization
Specifications are made explicit in mathematically rigorous notation rather than remaining tacit knowledge. The specification itself becomes a verifiable artifact.
Test Derivation
Test cases are systematically derived from the specification. Strategies can be formed to cover the entire space defined by the specification.
Inter-Agent Contracts
Each agent's interface is formally defined as preconditions, postconditions, and invariants. Contract consistency can be verified before integration.
Prevention of Bug Codification
Because the basis of tests is the formal specification, code bugs do not propagate into tests. Divergence between specification and code is detectable.
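Test derivation and divergence detection can be sketched on the earlier tax example. The sketch assumes the business rule is "10% of the price, rounded down," as the earlier snippet's comment indicates. The predicate names, the boundary-value strategy in `derive_inputs`, and the two implementations are hypothetical. The postcondition is stated as a relation between input and output, so it does not copy any implementation.

```python
import math


def pre_calc_tax(price: int) -> bool:
    """Precondition (assumed): prices are non-negative integers."""
    return price >= 0


def post_calc_tax(price: int, tax: int) -> bool:
    """Postcondition: tax is the largest integer with 10 * tax <= price."""
    return 10 * tax <= price < 10 * (tax + 1)


def derive_inputs(limit: int) -> list[int]:
    """Boundary values: each multiple of 10 up to limit, plus its neighbours."""
    candidates: set[int] = set()
    for multiple in range(0, limit + 1, 10):
        candidates.update({multiple - 1, multiple, multiple + 1})
    return sorted(p for p in candidates if pre_calc_tax(p) and p <= limit)


def divergences(impl, limit: int) -> list[int]:
    """Derived inputs for which the implementation violates the postcondition."""
    return [p for p in derive_inputs(limit) if not post_calc_tax(p, impl(p))]


def calc_tax_buggy(price: int) -> int:
    return math.ceil(price * 0.1)  # rounds up, as in the earlier example


def calc_tax_fixed(price: int) -> int:
    return price // 10  # rounds down
```

Running `divergences(calc_tax_buggy, 20)` returns a non-empty list of prices, while `divergences(calc_tax_fixed, 20)` returns an empty one. The expected values never came from reading the code, so the rounding bug has no path into the tests.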
TDD was an excellent practical methodology within the specific context of human-driven web development. However, in an era where AI becomes the center of development, system complexity increases, and precise specification agreement between agents is required, an approach premised on specification materialization becomes essential. Formal methods do not negate TDD — they explicitly provide the "existence of correct specification" that TDD implicitly assumed.