Skip to main content

Testing Standard (Summary)

In one line: No mocking by default, deterministic tests only, actual output as evidence, fast by default — the enforcement behind "evidence over claims" (Section 2.3).

The testing standard governs how tests are written, what counts as valid, and what thresholds apply.

Key Principles:

  1. No mocking by default. Mocking and in-memory databases are forbidden unless explicitly approved with a documented MOCK APPROVED comment that states the reason, the approver, the date, and the alternative for running against real services. Use testcontainers for PostgreSQL, MinIO, Redis, and Temporal.

  2. Deterministic tests only. No time.sleep() in tests. For time-dependent behavior, use the project's standard time-control library (canonical default per Section 4.5: freezegun; Temporal-workflow tests use WorkflowEnvironment.start_time_skipping() instead). Tests must produce the same result regardless of execution timing.

  3. Evidence requirements. Every implementation response must include actual test output — pytest results and coverage report with timestamps. The statement "tests should pass" is not evidence.

  4. Fast by default. Tests that require more than 30 seconds are marked with @pytest.mark.slow and excluded from the default test run. Long-running integration tests are opt-in, not mandatory for every commit.

PoC vs Production Mode:

DimensionPoC ModeProduction Mode
Test timingCode first, tests afterTDD encouraged
Edge case testsSkip initiallyRequired
E2E testsSkipRequired for user-facing apps
Security testsDeferredRequired for auth/data endpoints
Failure branch coverageNot measured85% minimum

Coverage Targets:

LayerPoCProduction
Workflow state machine + activities90%90%
FastAPI endpoints70%90%
React components70%90%
Integration layers (document parsing, object storage)70%90%
Line coverage (overall)70%90%
Failure branch coverage--85%
Real integration ratio--80%

Evidence: Enforced by the testcontainers requirement and the MOCK APPROVED annotation pattern — a mock without the structured approval comment is flagged at review, and the pre-push gate requires fresh passing output. Reproducible: add a bare mock and watch review flag it. See appendix-a-testing.md.

Full specification: appendix-a-testing.md