---
name: pilot
description: "Build orchestration. Brain dump to battle-tested shipped project. Scaffold, build, screenshot-test, harden, deploy in one command. Use when user says /pilot, 'build this', or pastes a project brief."
---

# /pilot, Build Orchestration

Brain dump in, deployed project out. One conversation, zero handoffs.

> **Forked from:** Jack's BLAST framework, with additions for screenshot-driven taste enforcement, mechanical QA gates, and post-deploy reflection. Customize this template for your own stack and taste before running; see the customize prompt that came with this file.

## Usage

`/pilot {brain dump}` runs the full flow: parse, scaffold, build, ship.
`/pilot --parse-only {brain dump}` stops after the spec, doesn't build.

The brain dump can be anything: stream of consciousness, URL, client brief, existing codebase, or "help me figure it out."

---

## Step 0: Parse

Extract these 6 fields from the brain dump. Score confidence (high/med/low) on each. Infer aggressively; only ask about business-logic ambiguity (NOT design, scope, or architecture choices).

| Field | Extract | Confidence Rule |
|-------|---------|----------------|
| **North Star** | The one thing it does | Ask if ambiguous |
| **Tier** | Static / API-light / Full-stack | Infer from requirements |
| **Scope** | `HAS: [capabilities]` `NOT: [exclusions]` | Infer aggressively. Never infer business logic. |
| **Services** | Beyond stack defaults, or "none" | Infer from scope |
| **Design Seed** | Color, mood, URL, image, or nothing | Infer from project type |
| **Sections** | Ordered build units + acceptance criteria | Derive from scope |

### Architecture Decision

**Default to multi-page for ALL archetypes.** Minimum: index + 3 additional pages (e.g. /features + /about + /pricing or /contact). Single-page ONLY when the brain dump explicitly says "single-page", "landing page", or "one-page scrolling site."

Multi-page is the default because:
- Every nav link points to a real populated page (no dead `#` links)
- Each page gets distinct SEO metadata
- Signals a real product/business, not a demo
- Demonstrates routing, layout reuse, and component architecture to clients

**Section-to-page mapping:** group the extracted sections into pages of 2-4 sections each. Example for a SaaS app:
- `/` (home): hero, problem, features overview, social proof, CTA
- `/features`: detailed feature breakdown with photos, comparison table
- `/pricing`: pricing tiers, FAQ
- `/about`: origin story, mission, values, press/stats

**Deep linking:** every `<section>` gets a kebab-case `id` matching its purpose. Nav links use fragment anchors for cross-page section links (`/pricing#enterprise`). Interactive components (tabs, accordions, carousels) sync visible state to URL query params via `history.replaceState`. All sections get `scroll-margin-top` matching fixed nav height.
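The query-param sync above can be sketched as a small helper. Pure string work so it is testable outside the browser; the parameter name `tab` and the handler wiring are illustrative, not prescribed.

```javascript
// Sketch: compute the URL a tab/accordion/carousel component should sync to.
// The actual history.replaceState call happens in the component's handler.
function withQueryParam(href, key, value) {
  const url = new URL(href);
  url.searchParams.set(key, value);
  // Preserve path and fragment so cross-page section links keep working.
  return url.pathname + url.search + url.hash;
}

// Browser-side wiring (illustrative):
// history.replaceState(null, "", withQueryParam(location.href, "tab", activeTab));
```

Because `replaceState` (not `pushState`) is used, switching tabs never pollutes the back button.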

### Scope = Capabilities, Not Files

```
HAS: [landing-page, contact-form, blog, dark-mode]
NOT: [auth, database, cms, payments, admin]
```

New files during build: fine. New capabilities in NOT: stop and ask.

### Tech Stack Default

> FILL IN: your default stack. Mine is `{{TECH_STACK_DEFAULT}}` (e.g., "Astro + Tailwind v4 + Cloudflare Pages"). If the brain dump explicitly names a different stack, use that. Otherwise default to your stack and mention it in the spec.

### Voice / Brand Constraints

> FILL IN: your voice and content rules. Examples that go here:
> - Banned punctuation (e.g., no em dashes in user-visible text)
> - Hard-stop phrases (e.g., never use "innovative" or "leverage" as verbs)
> - Tone guidelines (e.g., "terse declarative, no marketing fluff")
> - Names or terms that must never appear in customer-facing copy
>
> If you don't have any yet, leave this section blank and the skill operates with sensible defaults: no marketing fluff, specific numbers over vague claims, position-specific section headings.

---

## Step 1: Scaffold

Present a compact spec with the 6 fields + resolved design tokens. Then immediately scaffold, no confirmation needed.

### Project Directory

> FILL IN: where new builds go on disk. Mine is `{{BUILD_ROOT}}/demos/{project-slug}/`. Pick a directory pattern that works for you. If you don't have one, use `~/code/{project-slug}/`.

1. Create project directory at the chosen path
2. Copy a template scaffold into it (your boilerplate, or framework-default)
3. Generate a real index page: navigation, design tokens in CSS, stub section headings
4. Start dev server

Non-visual projects (API, bot, CLI): scaffold file structure + working hello-world instead.

### Save the source brief at project root

If the brain dump is a client brief (Upwork posting, Slack message, email), save the full original text verbatim to `{project-root}/brief.md` during scaffold. This is the source of truth you refer back to for proposals, follow-ups, and future sessions.

Format:
```markdown
# {Inferred title}: Brief

**Captured:** YYYY-MM-DD
**Source:** {Upwork URL / email subject / "internal idea"}
**Budget:** {if mentioned}

---

{full original text, verbatim}
```

Skip this step only when the brain dump is a single short directive (under ~30 words).

Proceed immediately after scaffold. Do not wait for confirmation.

---

## Step 2: Build-Test-Enrich Loop

Build each page from Step 0 in order. Multi-page builds: complete one page fully before starting the next. Home page first.

Each page goes through a build-test-enrich cycle that runs until the page scores production-ready or hits the enrichment cap.

### Per-Section Build

After each section:
1. Structure + style together
2. Verify the acceptance criterion for that section
3. Pass: commit, continue. Fail: fix, re-verify
4. Circuit breaker: 3 failures same approach = stop, re-read North Star, re-plan

### Per-Page Eval

After completing all sections for a page, score the page on these binary checks:

**Functional (F1-F10):**
- F1: All interactive elements actually do something (no dead `#` links)
- F2: Touch targets ≥ 44px on mobile
- F3: Forms post to a real endpoint (or render a clear "demo, not wired" message)
- F4: Images load with explicit width/height to prevent CLS
- F5: No horizontal overflow at 375px viewport
- F6: Body text ≥ 16px at all breakpoints
- F7: All headings render in DOM order (no skipping h2 → h4)
- F8: External links have `target="_blank" rel="noopener"`
- F9: Page builds successfully (no console errors at first paint)
- F10: 404 page exists and is branded
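Checks like F1 can be run mechanically. A minimal sketch of a dead-link scan over raw HTML; a real pass would query the rendered DOM rather than regex the markup:

```javascript
// Sketch for F1: flag anchors whose href is a bare "#".
// String-level stand-in for a DOM query; good enough for a first gate.
function findDeadLinks(html) {
  return [...html.matchAll(/<a\b[^>]*href=["']#["'][^>]*>/gi)].map(m => m[0]);
}
```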

**Taste (T1-T10):**
- T1: Heading letter-spacing tightened on large display sizes
- T2: Line-height varies between body, headings, display
- T3: Body text constrained to ~65ch measure (no full-bleed paragraphs)
- T4: Spacing above headings exceeds spacing below (forward momentum)
- T5: 3+ distinct font weights in use
- T6: Accent color appears on < 15% of visible surface area
- T7: 2+ background lightness levels (surface depth, not flat)
- T8: Custom easing curves used, not CSS defaults (`ease`, `linear`)
- T9: Animation restraint, only `transform` and `opacity` for transitions
- T10: At least one section breaks the uniform grid (intentional asymmetry)

**Polish (P1-P5):**
- P1: Custom favicon (not framework default)
- P2: OG meta tags + page-specific `<title>`
- P3: Banned punctuation absent from user-visible text.
  > FILL IN: your banned list. Examples: em dash (U+2014), en dash (U+2013), Unicode minus (U+2212), smart quotes (U+201C/D, U+2018/9). If you don't have a list, this rule defaults to "no em dashes" since that's the most reliable AI-generation tell.
- P4: Hover states differ by element type (links, cards, CTAs, nav each get distinct hover)
- P5: Mobile is not desktop-stacked (real reflow, not a single column)
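The P3 scan is mechanical. A sketch; the code points mirror the examples in the FILL-IN note and are meant to be replaced with your own list:

```javascript
// Sketch for P3: report which banned characters appear in user-visible text.
const BANNED = [
  ["em dash", "\u2014"],
  ["en dash", "\u2013"],
  ["unicode minus", "\u2212"],
  ["smart double quotes", "\u201C\u201D"],
  ["smart single quotes", "\u2018\u2019"],
];

function bannedHits(text) {
  const hits = [];
  for (const [name, chars] of BANNED) {
    for (const ch of chars) {
      if (text.includes(ch)) hits.push(name);
    }
  }
  return [...new Set(hits)]; // each category reported once
}
```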

**Score and decide:**

| Score | Action |
|---|---|
| ≥22/25 | Page production-ready. Commit, move to next page. |
| 18-21/25 | Enrichment pass. Target failing evals (lowest first). Max 3 passes. |
| <18/25 | Structural issue. Re-read spec for this page before fixing. |
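The decision table reduces to a small function. Thresholds are the policy above, not tunables:

```javascript
// 25 binary checks in (F1-F10, T1-T10, P1-P5), next action out.
function pageAction(score) {
  if (score >= 22) return "commit";  // production-ready, move to next page
  if (score >= 18) return "enrich";  // target failing evals, max 3 passes
  return "replan";                   // structural issue, re-read the spec
}
```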

### Build Rules

- **Scope guard:** Code introducing a NOT capability = stop and ask
- **No side quests:** "While I was in here" improvements forbidden unless fixing current section's bug
- **Commit after each verified section.** Every commit = rollback point
- **Mobile-aware build:** Don't build desktop-first then fix mobile at the end. Use responsive classes from the start.

---

## Step 2.5: Full-Site Convergence Gate

Per-page evals catch page-level issues. This gate catches site-wide coherence: cross-page consistency, full taste compliance, interaction integrity across all pages.

### Coherence Checks

Read across all per-page reports:
- Nav behavior consistent across all pages
- Footer identical on all pages
- Typography scale consistent (same H1 size everywhere, same body size)
- Color usage consistent (accent doesn't shift between pages)
- Animation vocabulary consistent (same easing curves, same reveal patterns)

### Mobile Click-Test

After per-page mobile evals pass, click-test the hamburger manually:
1. Click hamburger button → verify menu visible
2. Click a nav link in the menu → verify menu closes and page scrolls to target

Mechanical evals check hamburger presence but not click behavior; this covers the gap.

### Copy Clarity Audit

- Every CTA explains what happens when you click it
- Pricing displays include context (what's included, constraints)
- Proper nouns and names spell-checked
- No two adjacent sections share the same composition type (centered, split, full-bleed, editorial, statement)
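The adjacent-composition rule is mechanically checkable. Given the ordered composition types per section, a sketch:

```javascript
// Flag indices where a section repeats the previous section's composition type.
function adjacentRepeats(types) {
  const repeats = [];
  for (let i = 1; i < types.length; i++) {
    if (types[i] === types[i - 1]) repeats.push(i);
  }
  return repeats; // empty array = audit passes
}
```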

### Decide

| Per-page score | Action |
|---|---|
| ≥22/25 all pages + coherence PASS | Production-ready. Proceed to Step 3. |
| 18-21/25 or coherence FAIL | Site-wide enrichment pass. Max 5 passes. |
| <18/25 after 5 passes | Structural issue. Stop, diagnose, re-plan. |

---

## Step 2.6: VALIDATE

Build looks right. VALIDATE confirms it works right. Spawn parallel agents that test features against binary evals.

### Eval Generation

Auto-generate evals from three sources:
1. **Step 0 acceptance criteria** - each section's criterion becomes a FUNCTIONAL eval
2. **Standard categories** - build compiles, types check, security basics
3. **Codebase scan** - detect auth, database, API routes, external services and generate targeted evals

Every eval: binary YES/NO, tagged with agent name, verifiable by command or code inspection. Minimum 15 evals.

### Agent Roles

Spawn based on what was built. Minimum 3, maximum 6.

| Agent | When | Threshold |
|-------|------|-----------|
| BUILD | Always | 100% |
| FUNCTIONAL | Always | 90% |
| SECURITY | Auth, database, or API routes | 100% critical |
| MOBILE | Visual UI present | 90% |
| INTEGRATION | 3+ interacting features | 85% |
| PERFORMANCE | Production deploy | 85% |

### Execution

Each agent fixes lowest-scoring eval first, verifies (build + eval), commits, moves on. Stop after 10 attempts or all evals pass.

After all agents complete a round:
1. Verify only owned files modified
2. Merge changes
3. Re-run ALL evals on merged code

| Condition | Action |
|-----------|--------|
| Score >= threshold | Agent converged |
| Score unchanged 2 rounds | Converged (stalled), log why |
| All converged | Move to HARDEN |
| Round >= 3 | Force stop, move to HARDEN |
| BUILD fails | Block all others until BUILD converges |
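The per-agent rows of this table can be sketched as a status function. The score history and 0-1 scale are assumptions for illustration (90% → 0.9):

```javascript
// scores: this agent's eval pass-rate per round, oldest first.
function agentStatus(scores, threshold) {
  const latest = scores[scores.length - 1];
  if (latest >= threshold) return "converged";
  if (scores.length >= 2 && latest === scores[scores.length - 2]) {
    return "stalled"; // unchanged 2 rounds: converged (stalled), log why
  }
  return "continue";
}
```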

---

## Step 2.7: HARDEN

VALIDATE proved features work. HARDEN finds what nobody thought to test.

### Research

Spawn 3-5 research agents in parallel. Select angles by project type:

| Angle | When | Mandate |
|-------|------|---------|
| Security | Auth/data/API | Injection, token handling, CSRF, abuse paths |
| Edge Cases | Complex business logic | Race conditions, duplicates, boundary values |
| Resilience | External services | Timeouts, delivery failures, rate limits |
| Data Integrity | DB writes with relationships | Constraints, cascades, concurrent writes |
| UX Flow | User-facing features | Error messages, loading states, expired sessions |

Each agent returns 3-5 improvements with: current state (file + line), target state, binary eval, priority (1-5), risk.

### Synthesize

1. Deduplicate across agents
2. Red team each improvement: what could it break?
3. Priority stack: impact / (effort + risk). Filter to **must-ship only** (priority 1-2).
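The priority stack can be sketched as follows; the field names (`impact`, `effort`, `risk`, `priority`) mirror the agent report fields above:

```javascript
// Keep must-ship items (priority 1-2), score by impact / (effort + risk),
// work highest score first.
function prioritize(improvements) {
  return improvements
    .filter(i => i.priority <= 2)
    .map(i => ({ ...i, score: i.impact / (i.effort + i.risk) }))
    .sort((a, b) => b.score - a.score);
}
```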

### Execute

For each must-ship improvement:
1. Make the change
2. Run build
3. Run ALL VALIDATE evals (regression gate)
4. Pass + no regressions: commit
5. Any regression: revert, alternate approach (3 attempts max)
6. All attempts regress: skip, log "needs manual review"

---

## Step 3: Ship

### Pre-Ship Verification

- Build passes, zero errors
- Step 2.5 Convergence Gate passed
- Step 2.6 VALIDATE converged
- Step 2.7 HARDEN complete
- Verify against deployed state, not local working directory

### Deploy

> FILL IN: your deploy command. Mine is `{{DEPLOY_COMMAND}}` (e.g., `wrangler pages deploy . --project-name {project-slug}`). If you don't have one, the skill prompts you to pick a target the first time it deploys.

### Output

After deploying, present the full pipeline results:

```
Build deployed at {URL}.

Pipeline:
- Build: {sections} sections, {commits} commits, {enrichment_passes} enrichment passes
- Score: Functional {functional}/10 | Taste {taste}/10 | Polish {polish}/5 | Combined {combined}/25
- Coherence: {PASS/FAIL} | Enrichment rounds: {page-level} + {site-level}
- Validate: {passed}/{total} evals | {converged}/{agents} agents converged
- Harden: {shipped}/{attempted} improvements | 0 regressions
```

---

## Optional companions

Two files I keep in my own pilot directory that significantly raise the ceiling on output quality. Not bundled here because they're heavily calibrated to my taste, but they're worth building your own version of:

1. **`taste-constitution.md`**, 30+ principles your build is graded against per section. Mine has principles like "accent color is medicine not seasoning" and "no two adjacent sections share the same composition type." Build your own from the corrections you keep giving Claude.
2. **`taste-library.md`**, the technical spec, exact CSS values, easing curves, font scale. Mine has my preferred type scale, my custom cubic-bezier curves, my elevation system. Yours should reflect your aesthetic, not mine.

If you want a starting point, the `/taste` skill in this same pack ships with a 34-principle constitution genericized to be a reasonable default. Fork that.

---

## Hard Rules

1. Never guess at business logic. Ask.
2. The tech stack and deploy target set during customize are the defaults. Justify deviations.
3. No deploy without verification.
4. Commit after each verified section.
5. 3 failures same approach = stop and re-plan.
6. Build the thing. Don't ask for permission at every step. The pipeline is autonomous by design.
