From Specification to Design
Add size and time requirements to the sort specification, then walk the whole path: design investigation, design document, implementation, and specification-derived tests
Refining the specification: a precondition and a non-functional requirement
The Sort specification on the previous page fixed what output is acceptable, but said nothing about input size or allowed execution time. In real development, such non-functional requirements are exactly what drive design. So we first pin down the expected input size in a precondition (pre): here, sequences of length up to 10^7.
The other requirement — "execution time within 5 seconds" — cannot be written in VDM-SL directly, because VDM-SL functions have no notion of time. Rather than dropping a requirement we cannot formalize, we attach it to the specification as a comment and carry it into the design phase. (Treating time itself formally is the territory of the real-time extension VDM-RT, or of timed automata as in UPPAAL.)
Sort (l : seq of int) r : seq of int pre len l <= 10**7 post IsPermutation(r, l) and IsOrdered(r); -- Non-functional requirement (execution time cannot be expressed in VDM-SL): -- for inputs with len l = 10**7, execution time must be within 5 seconds.
A precondition is a promise the caller must keep. Putting len l <= 10^7 in pre makes the contract's domain explicit: the implementation owes nothing for longer inputs. The time requirement in the comment is outside the reach of formal verification, but it remains part of the specification document and is traced into design and testing.
Design investigation: can the specification be met?
Design means deciding the how that the specification left open. Is bubble sort good enough, or do we need quicksort? Is C++ required, or is JavaScript sufficient? Against the requirements (n = 10^7, within 5 seconds), we first estimate comparison counts.
| Algorithm | Order of comparisons | Estimate at n = 10^7 |
|---|---|---|
| Bubble sort | O(n²) | n²/2 ≈ 5×10^13 |
| Quicksort / merge sort | O(n log n) | n log₂ n ≈ 2.3×10^8 |
The gap is roughly a factor of 200,000. A CPU handles on the order of 10^8–10^9 simple operations per second, so already at this point we can tell: bubble sort is hopeless in any language, while O(n log n) is within reach of every mainstream language. Let us back the estimate with measurements.
Reference measurements
Reference values on a Node.js v26 / Apple Silicon development machine. Measured times depend on the environment, so the design document records the measurement conditions alongside the numbers.
| Approach | Measured time |
|---|---|
| Int32Array(10^7).sort() (typed array, numeric sort) | ≈ 0.15 s |
| Array(10^7).sort((a, b) => a - b) (plain array + comparator) | ≈ 1.6 s |
| Bubble sort (0.35 s measured at n = 3×10^4, extrapolated by n²) | ≈ 3.9×10^4 s ≈ 11 hours |
Two conclusions. First, the algorithm is the decisive factor — bubble sort falls short by four orders of magnitude, and no choice of language can buy that back. Second, with an O(n log n) algorithm, JavaScript is sufficient (0.15–1.6 s, far below 5 s). C++ would add headroom, but this requirement does not demand it: the language can be chosen on context — the team's stack and the deployment environment — rather than performance.
Writing the design document
The investigation is then recorded as a design document. One of the most widely used formats today is the Design Doc popularized at Google and elsewhere — written in natural language, gathering background, goals and non-goals, the design, alternatives considered, and a test plan into a single document. ADRs (Architecture Decision Records) are also popular for recording individual decisions in a lightweight way; here we show a sample in Design Doc form, which gives the full picture at once.
Design Doc: Large-scale integer sequence sort module
Summary
Design of an implementation satisfying the VDM specification Sort (pre: len l <= 10^7; post: permutation and ascending order; non-functional: 10^7 elements within 5 seconds). Uses the standard library's O(n log n) sort on a typed array.
Background
The contract of the sort operation is fixed as a VDM-SL implicit specification. The algorithm and the implementation language are not constrained by the specification; they are selected in design so as to meet the non-functional requirement (execution time).
Goals / Non-goals
Goals: satisfy the postcondition (permutation, ascending). Process inputs of n = 10^7 within 5 seconds. Explicitly reject inputs violating the precondition.
Non-goals: stable sorting (not required by the specification). Inputs larger than 10^7. Parallelization and external-memory sorting.
Design
Algorithm — the theoretical lower bound for comparison sorting is Ω(n log n). Bubble sort, O(n²), needs about 5×10^13 comparisons at n = 10^7 — about 11 hours by measured extrapolation — and is rejected. Since the specification fixes no algorithm, we adopt the proven standard-library sort (in V8, Array#sort is TimSort and the default typed-array sort is a C++ introsort — both O(n log n)).
Language — reference measurements show JavaScript is sufficient: Int32Array.sort() takes about 0.15 s and Array.sort with a comparator about 1.6 s, both far below 5 s. C++ would add headroom but is not required here; we adopt JavaScript to match the operational stack (Node.js).
Data representation — map VDM's seq of int to an Int32Array (assuming a 32-bit integer range; recorded explicitly as part of the correspondence with the specification). This avoids boxed elements and comparator overhead.
Alternatives considered
Bubble sort — rejected. Four orders of magnitude short on speed; no language choice can recover that.
Hand-written quicksort — rejected. The standard implementation already meets the requirement, so the O(n²) worst case and the risk of implementation bugs buy nothing.
C++ (std::sort) — deferred. Ample performance headroom, but a mismatch with the operational stack. Revisit if the requirement grows toward n = 10^8.
Test plan
Property-based tests using the postcondition as an oracle. Boundary tests for the precondition (lengths 0, 1, 10^7) and rejection tests for violating inputs. A performance test corresponding to the time requirement (n = 10^7 within 5 seconds).
Risks
Inputs containing values outside the 32-bit range (feed the value range back into pre and add input validation). Performance variation across environments (record measurement conditions; run the performance test in a dedicated CI job).
The point is that every decision in the design document traces back to a line of the specification. The algorithm choice is driven by the time comment (not by the postcondition), the data representation by seq of int, and the input guard by the precondition.
Implementation
The design document's decisions now become code. Because the specification fixes no algorithm, the implementation is a thin wrapper around the standard library's sort. The precondition surfaces as an input guard.
// Design decision: the standard library's O(n log n) sort on a typed array function sortLargeSequence(input) { // pre: len l <= 10**7 — the precondition becomes an input guard if (input.length > 10 ** 7) { throw new RangeError('precondition violated: len l <= 10^7'); } // map seq of int to Int32Array (32-bit ints assumed — recorded in the design doc) const r = Int32Array.from(input); r.sort(); // typed-array sort: numeric ascending, O(n log n) return r; }
The specification's pre becomes the guard at the top, and the design document's data-representation decision becomes Int32Array.from. Every line of the implementation points back to either the specification or the design document. Note that the 32-bit range assumption is a narrower contract than the specification's seq of int; strictly it should be fed back into the specification as an addition to pre — traceability running in the reverse direction, with a design decision refining the spec.
Verifying the code with specification-derived tests
Finally, the finished code is verified by returning to the specification. Tests can be derived from it mechanically. First, the two predicates of the postcondition are transcribed directly as a test oracle.
// Transcribe the two postcondition predicates directly as a test oracle const isOrdered = (r) => r.every((v, i) => i === 0 || r[i - 1] <= v); const isPermutation = (r, l) => { if (r.length !== l.length) return false; const count = new Map(); for (const x of l) count.set(x, (count.get(x) ?? 0) + 1); for (const x of r) { const c = count.get(x); if (!c) return false; count.set(x, c - 1); } return true; };
Using this oracle, property-based tests (e.g. with fast-check) check the postcondition against randomized inputs. The precondition becomes a guard test, and the time requirement in the comment becomes a performance test as-is. Note that fc.integer()'s default range is the 32-bit integers, in line with the data-representation assumption.
// post → property-based test: check the postcondition on random inputs test('post: IsPermutation(r, l) and IsOrdered(r)', () => { fc.assert( fc.property(fc.array(fc.integer()), (l) => { const r = [...sortLargeSequence(l)]; return isOrdered(r) && isPermutation(r, l); }) ); }); // pre → boundary: exactly 10^7 accepted, beyond rejected test('pre: len l <= 10^7', () => { expect(() => sortLargeSequence(new Int32Array(10 ** 7))).not.toThrow(); // an array-like with only length avoids allocating a huge array expect(() => sortLargeSequence({ length: 10 ** 7 + 1 })) .toThrow(RangeError); }); // time comment → performance test test('perf: 10^7 elements within 5 s', () => { const l = Int32Array.from({ length: 10 ** 7 }, () => (Math.random() * 2 ** 31) | 0); const t0 = performance.now(); sortLargeSequence(l); expect(performance.now() - t0).toBeLessThan(5000); });
Tracing specification to tests
| Element of the specification | Derived test |
|---|---|
| pre len l <= 10^7 | Input-guard boundary tests (exactly 10^7 accepted, beyond rejected; short lengths such as 0 and 1 are generated by the property test) |
| post IsPermutation(r, l) and IsOrdered(r) | Property-based tests with the postcondition as oracle |
| Comment: 10^7 elements within 5 seconds | Performance test (measure n = 10^7 and assert under 5 s) |
| A guarantee over all inputs | Beyond the reach of testing — the territory of proof obligations and proof, as on the previous page |
One honest caveat. What tests show is that the postcondition held for the inputs that were tried — that is not a proof. Property-based testing narrows the gap considerably, but if a guarantee over all inputs is needed, the road leads to discharging proof obligations.
Summary: end-to-end traceability
The specification fixes the what; the design investigation chooses the how with numbers; the design document records the decisions and their grounds; the implementation mirrors the design; and the tests return to the specification to verify the result. Every artifact traces back to the specification — that is the backbone of development that starts from a formal specification. IID's formal-agent-contracts plugin applies this same flow to inter-agent contracts in multi-agent development.