Testing

Test execution strategy + Playwright e2e tuning.

MUST

Run only failing tests first; verify 2-3×; stop. Why: full suites are slow + noisy.
Pre-test: bun fix passes, kill stale procs (pkill -9 -f "next"), clear results (rm -rf test-results). Why: clean baseline.
Cheapest faithful harness first — unit → focused integration → headless script → service smoke → full UI/e2e. Why: reproduce a bug at the lowest tier that shows it; don’t loop a 40s e2e on a logic bug.
Paid-API tests (Anthropic, etc.) in dedicated files (*.cost.test.ts, smoke-*), cheapest model + shortest prompt, run LAST after free checks (lint → unit → build). Why: pass the zero-cost gate before anything billable.
After a failed paid-API cycle, record actual cost (tokens × model price) before the next attempt. Why: silent retries compound the bill without tracking lesson-rate.

Run the full suite unless explicitly asked. Cost: slow, noisy, masks the real failure.
Scale to full suites blindly before verifying. Cost: wasted wall-clock.
Run paid-API tests in a debug loop. Cost: real money per iteration.

Symptom	Fix
Hangs on `fill()`/`click()`	Check element visible/enabled
`networkidle` hangs	Use `waitForSelector()` instead
Element not found	Check testid on element vs parent
Flaky counts	`--workers=1`