Testing
Testing conventions and strategies
Test execution strategy + Playwright e2e tuning.
MUST
- Run only failing tests first; verify 2-3×; stop. Why: full suites are slow + noisy.
- Pre-test: `bun fix` passes, kill stale procs (`pkill -9 -f "next"`), clear results (`rm -rf test-results`). Why: clean baseline.
- Cheapest faithful harness first: unit → focused integration → headless script → service smoke → full UI/e2e. Why: reproduce a bug at the lowest tier that shows it; don't loop a 40s e2e on a logic bug.
- Paid-API tests (Anthropic, etc.) in dedicated files (`*.cost.test.ts`, `smoke-*`), cheapest model + shortest prompt, run LAST after free checks (lint → unit → build). Why: pass the zero-cost gate before anything billable.
- After a failed paid-API cycle, record actual cost (tokens × model price) before the next attempt. Why: silent retries compound the bill without tracking lesson-rate.
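
The "record actual cost (tokens × model price)" rule can be sketched as a small ledger. This is a minimal illustration, not an existing helper in this repo: the `Usage`, `ModelPrice`, and `CostLedger` names are hypothetical, and any prices passed in are placeholders to be replaced with the provider's current price sheet.

```typescript
// Token counts reported for one API call.
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// USD per million tokens. Callers supply these; nothing here is a real price.
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

// tokens × model price, summed over input and output tokens.
function attemptCostUsd(usage: Usage, price: ModelPrice): number {
  const input = usage.inputTokens * price.inputPerMTok;
  const output = usage.outputTokens * price.outputPerMTok;
  return (input + output) / 1e6;
}

// Append-only log of attempts, so the running total is visible before a retry.
class CostLedger {
  private readonly entries: { label: string; usd: number }[] = [];

  // Records one attempt and returns its cost in USD.
  record(label: string, usage: Usage, price: ModelPrice): number {
    const usd = attemptCostUsd(usage, price);
    this.entries.push({ label, usd });
    return usd;
  }

  // Sum of all recorded attempts in USD.
  totalUsd(): number {
    return this.entries.reduce((sum, e) => sum + e.usd, 0);
  }

  // Number of attempts recorded so far.
  count(): number {
    return this.entries.length;
  }
}
```

Calling `record` after each failed cycle and checking `totalUsd()` before the next attempt keeps retries from accumulating unnoticed.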
NEVER
- Run the full suite unless explicitly asked. Cost: slow, noisy, masks the real failure.
- Scale up to full suites before the targeted fix is verified. Cost: wasted wall-clock.
- Run paid-API tests in a debug loop. Cost: real money per iteration.
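
One way to keep paid-API files out of a debug loop is to filter them by filename unless a run explicitly opts in. The sketch below assumes the `*.cost.test.ts` and `smoke-*` naming from the rules above; the `RUN_PAID_TESTS` variable and both function names are hypothetical, chosen only for illustration.

```typescript
// Environment passed in explicitly so the function stays pure and testable.
type Env = Record<string, string | undefined>;

// True for files following the paid-test naming convention:
// a basename ending in .cost.test.ts or starting with smoke-.
function isPaidTestFile(path: string): boolean {
  const base = path.split("/").pop() ?? path;
  return base.endsWith(".cost.test.ts") || base.startsWith("smoke-");
}

// Returns free test files only, unless the (hypothetical) RUN_PAID_TESTS=1
// opt-in is set, in which case paid files are appended so they run last.
function selectTestFiles(files: string[], env: Env): string[] {
  const free = files.filter((f) => !isPaidTestFile(f));
  const paid = files.filter((f) => isPaidTestFile(f));
  const optedIn = env["RUN_PAID_TESTS"] === "1";
  return optedIn ? [...free, ...paid] : free;
}
```

With the default environment a watch or retry loop never sees a billable file; setting the opt-in once, after lint, unit, and build pass, runs the paid files at the end of the list.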
Playwright E2E
| Scope | Timeout | Kill |
|---|---|---|
| Single test | 5s | 10s |
| Single file | 8s/test | 30s |
| Full suite | 10s/test | 180s |

| Symptom | Fix |
|---|---|
| Hangs on fill()/click() | Check element visible/enabled |
| networkidle hangs | Use waitForSelector() instead |
| Element not found | Check testid on element vs parent |
| Flaky counts | --workers=1 |
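
The timeout table and the `--workers=1` fix can be turned into command-line arguments mechanically. This is a sketch under assumptions: it reads "Timeout" as Playwright's per-test `--timeout` (milliseconds) and "Kill" as a hard deadline for the whole run enforced by the caller. The `Scope`, `BUDGETS`, and `playwrightArgs` names are hypothetical.

```typescript
type Scope = "test" | "file" | "suite";

interface Budget {
  perTestMs: number; // passed to Playwright as --timeout
  killMs: number;    // hard deadline for the whole process, enforced by the caller
}

// Values copied from the Scope / Timeout / Kill table.
const BUDGETS: Record<Scope, Budget> = {
  test: { perTestMs: 5_000, killMs: 10_000 },
  file: { perTestMs: 8_000, killMs: 30_000 },
  suite: { perTestMs: 10_000, killMs: 180_000 },
};

// Builds the argument list for `playwright test`.
// target: optional spec file path; serial: add --workers=1 for flaky counts.
function playwrightArgs(scope: Scope, target?: string, serial = false): string[] {
  const args = ["playwright", "test"];
  if (target !== undefined) args.push(target);
  args.push(`--timeout=${BUDGETS[scope].perTestMs}`);
  if (serial) args.push("--workers=1");
  return args;
}
```

The result can be handed to a process runner together with `BUDGETS[scope].killMs`; for example, recent Node versions accept `timeout` and `killSignal` options on `child_process.spawn`, which would enforce the kill deadline.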