Why a Methodology

The strongest argument for this methodology is the methodology itself. Its specification was brainstormed, specified, planned, implemented, verified, and reviewed using the exact lifecycle it describes. The artifact is its own proof: a document that could not have been produced coherently without the process it documents. If you want to know whether the process works, you are reading the output of a run.

The core tension

AI code generation is fast. That speed is its strength and its primary risk at the same time. A model that produces syntactically correct code with proper type annotations, reasonable names, and coherent docstrings produces output that looks correct to a human reviewer — even when it is subtly wrong. The natural failure mode of unstructured AI-assisted development is therefore not obviously broken code (that gets caught immediately). It is plausible code, produced quickly, reviewed superficially, and committed before anyone verified it actually works.

A methodology earns its overhead only if it prevents real, recurring failures. This one is built around three.

Three failure modes

1. Locally correct, globally incoherent

When AI generates code without architectural supervision, each component works in isolation but the system does not compose. The model has no business context, no long-term architectural vision, and no awareness of cross-cutting concerns. It optimizes for the immediate request. The result is a collection of individually reasonable decisions that do not add up to a coherent system — the classic outcome of treating the AI as an autocomplete tool rather than a supervised engineer.

2. Plausible but wrong at runtime

This is the "it should work now" problem. Code passes visual review and fails at runtime: an async function called without await, a query that uses the wrong column but happens to match another column's type, an exception silently swallowed where it should propagate. Human review is a probabilistic defense — reviewers catch what they think to look for, and the categories of AI error do not always match the categories experienced developers habitually check. The phrase "it should work now" is treated here as semantically equivalent to "I have not verified it."

3. Corner-cutting under deadline pressure

The AI collaborator has no intrinsic motivation to maintain quality. It follows instructions literally. Told "write tests," it writes tests. Told "skip the tests for now, we'll add them later," it complies without objection. Discipline-based quality is fragile precisely because it depends on every participant remembering every rule, every time, under pressure — and one of those participants will agree to cut any corner it is asked to cut.

The response

Each failure mode has a structural answer, not a behavioral one:

Design before code. No code is written until a design exists. The design phase forces articulation of the problem and the approach before the fast-but-blind generator runs — and the design doubles as high-quality context for the AI, so the first implementation is aligned rather than the third.
Evidence over claims. No task is complete until fresh verification output confirms it. Tests must run and pass, coverage must meet thresholds, linters must be clean — enforced by tooling, not by good intentions.
Quality as architecture. Quality enforcement lives in infrastructure (hooks, skills, reviewer agents, memory), so that producing high-quality work is the path of least resistance and cutting corners requires more effort than following the process.

Why neither extreme works

Both unsupervised AI and unaided humans fail, in opposite directions. The methodology is the convergence of the two onto a supervised-collaboration model: the human supplies judgment, context, and architectural vision; the AI supplies implementation velocity, comprehensive tests, and consistent pattern adherence.

Where to next

Verifiable Benefits — the measured evidence from the reference implementation.
Philosophy — the principles in full, each tied to the failure mode that produced it.

The core tension​

Three failure modes​

1. Locally correct, globally incoherent​

2. Plausible but wrong at runtime​

3. Corner-cutting under deadline pressure​

The response​

Why neither extreme works​

Where to next​