# /taste, How It Got Here

A taste audit doesn't work out of the box. It works after you've fed it failure cases and corrected its read of each one. Here's what went into mine, in roughly the order it happened.

> **A note on reconstruction.** I don't keep a formal AR experiment log on `/taste` (no `~/.claude/ar/experiments/taste/` directory exists). Some entries below are literal, traceable to a memory file or a constitution principle that records the date and incident. Others are reconstructed: I read the 34 principles in the current constitution and worked backward to the kind of failure that would have driven each rule. I've labeled which is which. The reconstructed ones are still useful as a model for the kind of correction that moves a skill forward.

---

## Iteration 1, ~early build, "the first audit said everything was fine"

**Test prompt:** "Run /taste on this landing page I just built." (a project demo with three centered hero sections, default Tailwind colors, default CSS easing throughout)

**Output:** Score: 9/10. The skill reported the site looked clean and ready to ship.

**Feedback:** "This site is template-AI-slop. The skill is grading the surface, not the soul. Three centered sections in a row is the #1 difference between a real site and a Claude default."

**Change made:** Added the principle "no two adjacent sections share the same composition type" to the constitution as VD-5 (now Domain 1, principle 5). The skill now reads composition type per section and explicitly flags adjacent duplicates.
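The VD-5 check is mechanical once each section has been labeled. A minimal sketch, assuming the audit has already reduced each section to a composition-type string (the function name and labels are mine, not the skill's actual code):

```javascript
// Flag any two adjacent sections that share a composition type (VD-5).
// `compositionTypes` is an ordered array of labels, one per section.
function adjacentDuplicates(compositionTypes) {
  const violations = [];
  for (let i = 1; i < compositionTypes.length; i++) {
    if (compositionTypes[i] === compositionTypes[i - 1]) {
      violations.push({ index: i, type: compositionTypes[i] });
    }
  }
  return violations;
}
```

On the landing page above, `adjacentDuplicates(["centered-hero", "centered-hero", "centered-hero"])` flags two violations; reordering so no label repeats back-to-back clears them.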

**Status:** Reconstructed. The principle exists. The exact framing of the test that drove it is inferred from the principle's framing ("This is the #1 difference between template and custom") which sounds like a correction received, not a rule invented.

---

## Iteration 2, animation discipline, "everything is moving"

**Test prompt:** "/taste this build" (a build using `transition-all duration-300` on basically every element, with `ease-in-out` as the default curve)

**Output:** Skill said motion looked fine. Smooth. Acceptable.

**Feedback:** "Smooth is the AI default. Real taste uses custom easing curves and animates only `transform` and `opacity`. CSS defaults like `ease`, `ease-in-out`, `linear` are immediate AI tells. Look for cubic-bezier or fail."

**Change made:** Added T8 (Custom easing) and T9 (Animation restraint) as binary evals. T8 fails on any use of `ease`, `ease-in-out`, `linear` keywords. T9 caps animated properties to under 5 and fails if anything other than transform/opacity is in a transition. Added principle CA-17 (compositor-only transitions) and CA-18 (custom easing curves never CSS defaults) to the Code Architecture domain.
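T8 reduces to a keyword scan over the CSS. A sketch of the idea, not the skill's literal eval; the word-boundary guard is mine, added so `linear-gradient` or a custom class name doesn't false-positive:

```javascript
// T8 sketch: find default easing keywords in CSS text. Any hit fails.
function findDefaultEasing(cssText) {
  // Longest alternative first; the (?!-) lookahead rejects tokens that
  // continue with a hyphen, e.g. "linear" inside "linear-gradient".
  const banned = /\b(?:ease-in-out|ease-out|ease-in|ease|linear)\b(?!-)/g;
  return cssText.match(banned) || [];
}
```

`findDefaultEasing("transition: all .3s ease-in-out")` returns the offending keyword; a stylesheet that only uses `cubic-bezier(...)` curves returns an empty list and passes.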

**Status:** Reconstructed from principles CA-17, CA-18, and binary evals T8, T9. The framing in the constitution ("`ease`, `ease-in-out`, `linear` are tells") is the language of someone who just lost a build to default easing.

---

## Iteration 3, accent restraint, "the cyan is everywhere"

**Test prompt:** "/taste this site" (a build using the brand cyan on every CTA, every checkmark, every tag pill, every icon, every overline, every section accent)

**Output:** Score: 8/10. Skill said the brand color usage was strong.

**Feedback:** "Accent color is medicine, not seasoning. Below 15% surface area. When the accent is on every element, it's not an accent anymore, it's the body color. Count the elements using accent. If it's more than 15% of the visible UI, it's broken."

**Change made:** Added T6 (Accent restraint) as a binary eval. Added VD-3 (accent is medicine, not seasoning) to the Visual Design domain as a constitution principle. The skill now counts accent-color elements explicitly and fails if the count exceeds 15% of total visible elements.
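The T6 count is a ratio check. A minimal sketch, assuming the audit has already collected the visible elements and can tell which ones carry the accent color (the predicate and element shape are illustrative):

```javascript
// T6 sketch: accent elements must stay under 15% of visible elements (VD-3).
function accentRestraint(elements, isAccent) {
  const accented = elements.filter(isAccent).length;
  const ratio = accented / elements.length;
  return { accented, total: elements.length, ratio, pass: ratio <= 0.15 };
}
```

With 40 visible elements, three cyan touches pass (7.5%); cyan on every CTA, checkmark, pill, icon, and overline easily blows past the threshold.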

**Status:** Reconstructed from VD-3 and T6. The "medicine, not seasoning" framing is mine, written down after I saw it explained somewhere as the right metaphor. Likely first surfaced during a real correction.

---

## Iteration 4, mobile audit, "the desktop screenshot is a lie"

**Test prompt:** "/taste path/to/index.html" (the skill at this point only screenshotted at 1440x900)

**Output:** Score: 9/10. Skill confirmed the desktop layout was clean.

**Feedback:** "Mobile is broken. The skill never looked at it. Stack-of-divs columns at 375px isn't 'responsive', it's 'desktop, squeezed.' Re-architect this so the skill always screenshots at both 1440x900 AND 375x812, and grades against both."

**Change made:** Step 1 of the skill now takes two screenshots (desktop + mobile) and reads both. Added UX-21 (mobile is not desktop-stacked) as a constitution principle. The mobile screenshot is graded for: touch target sizes, body text size at small viewport, hero CTA stacking, hamburger menu presence, horizontal overflow.
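The capture plan is small enough to write down. A sketch of the spec only, with the browser wiring omitted; the function and field names are mine:

```javascript
// Step 1 sketch: both viewports, with the mobile-only checks attached
// to the 375px shot. Check strings mirror the list above.
const VIEWPORTS = [
  { name: "desktop", width: 1440, height: 900 },
  { name: "mobile", width: 375, height: 812 },
];

const MOBILE_CHECKS = [
  "touch target sizes",
  "body text size at small viewport",
  "hero CTA stacking",
  "hamburger menu presence",
  "horizontal overflow",
];

function capturePlan(url) {
  return VIEWPORTS.map(v => ({
    url,
    ...v,
    checks: v.name === "mobile" ? MOBILE_CHECKS : [],
  }));
}
```

The point of encoding it as data: the skill can't "forget" mobile, because the plan always yields two screenshots.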

**Status:** Reconstructed from the screenshot capture flow and UX-21. The constitution's "check the mix in mono" analogy ("check the layout at 375px mobile") is the residue of this lesson, framed as a music-production translation.

---

## Iteration 5, headline copy, "the text is fine but it's not selling"

**Test prompt:** "/taste this proposal landing page" (build with H1 "Build the Future" and section headings like "Our Approach", "What We Do", "Get Started")

**Output:** Score: 8/10. Skill said the typography hierarchy was working, the spacing was right, the contrast was clean.

**Feedback:** "Vague headlines are a build failure. If you can swap two section headings without breaking the page, both are too generic. The skill is grading the layout, not the substance. Add content quality evals."

**Change made:** Added Domain 4 (Content Quality) to the constitution, principles 24-34. Headlines are now graded for specificity (number > process > claim > adjective). Section headings are graded for position-specificity (the "swap two without breaking the page" test is now an explicit rule, CQ-32). Hero taglines that are comma-chain adjective lists (CQ-29) fail.
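The CQ-29 comma-chain check is the most mechanical of the content evals. A sketch under my own assumptions: three or more short comma-separated fragments reads as an adjective chain, and the fragment-length threshold is illustrative:

```javascript
// CQ-29 sketch: fail taglines like "Fast, Simple, Powerful".
function isCommaChainTagline(tagline) {
  const parts = tagline.split(",").map(p => p.trim()).filter(Boolean);
  // Three-plus fragments, each only a word or two, is the tell.
  return parts.length >= 3 && parts.every(p => p.split(/\s+/).length <= 2);
}
```

A specific headline survives: `isCommaChainTagline("Cut onboarding from 14 days to 3")` is false, while the adjective chain fails.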

**Status:** Reconstructed from CQ-24 through CQ-34. The "swap without breaking" framing is in the principle text verbatim. Likely real correction.

---

## Iteration 6, em dashes specifically, "the rule was documented but the dashes still landed"

**Test prompt:** Site built with the voice rule "no em dashes" present in CLAUDE.md and global rules. Result still had em dashes throughout.

**Output:** Skill noted "voice rules followed" and gave a 9/10. The em dashes were technically correct English. The audit didn't flag them.

**Feedback:** "147 em dashes in one site copy. 187 across all my skill files. The rule existed. The agents ignored it. Voice rules without mechanical enforcement leak. Add a binary eval that walks the DOM and counts banned punctuation. Fail the audit if any are present."

**Change made:** Added CQ-33 (No banned punctuation in user-visible text) to the constitution. Mirrored as F11 in the corresponding pilot-qa.mjs functional eval suite so the rule is enforced both at audit time and at build time. The eval walks `textContent` of the rendered DOM and counts em dash (U+2014), en dash (U+2013), Unicode minus (U+2212), and smart quotes (U+201C/D, U+2018/9). Any count > 0 fails the eval.
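The counting half of that eval fits in a few lines. A minimal sketch, assuming the rendered DOM's `textContent` has already been collected into one string (the real eval walks the DOM; this only shows the character check):

```javascript
// F11/CQ-33 sketch: count banned punctuation in user-visible text.
// Em dash, en dash, Unicode minus, and the four smart quotes.
const BANNED = /[\u2014\u2013\u2212\u201C\u201D\u2018\u2019]/g;

function countBannedPunctuation(textContent) {
  return (textContent.match(BANNED) || []).length; // any count > 0 fails
}
```

The codepoint class is the whole point: the rule is enforced on what actually renders, not on what the style guide says.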

**Status:** Literal. Source: `~/.claude/projects/-Users-studio/memory/feedback_pilot_qa_f11_banned_punctuation.md`, from around 2026-04-27. The date is confirmed by the memory file's `originSessionId`.

---

## Iteration 7, signature moments, "every page has the same rhythm"

**Test prompt:** "/taste my portfolio" (built well, technically passing all binary evals, scored 9/10, but with no visual surprise mid-page, every section roughly the same density and weight)

**Output:** Score: 9/10. The skill called the site clean and well-paced.

**Feedback:** "Every site has one signature moment. A mid-page visual beat that breaks the established pattern. The hero doesn't count. This is what people screenshot. Your skill is rewarding consistent rhythm and missing the moment."

**Change made:** Added VD-6 (Every site has one signature moment) to the constitution. Examples in the principle text: oversized pull quote, horizontal scroll gallery, bento grid exploration, parallax close-up. Added T10 (Intentional asymmetry) as a binary eval. T10 fails if every section uses identical grid columns, identical padding, identical density.
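T10's failure condition is "every section is the same." A sketch assuming each section has been reduced to a small descriptor; the fingerprint fields are illustrative, not the skill's actual schema:

```javascript
// T10 sketch: fail when every section shares identical columns,
// padding, and density. One distinct section is enough to pass.
function hasIntentionalAsymmetry(sections) {
  const fingerprint = s => [s.columns, s.paddingY, s.density].join("|");
  return new Set(sections.map(fingerprint)).size > 1;
}
```

Note what this can't do: it proves a pattern break exists, not that the break is a *signature moment*. VD-6 still needs the screenshot read.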

**Status:** Reconstructed from VD-6 and T10. The "this is what people screenshot" framing in the principle is the residue.

---

## Iteration 8, the auto-fix loop, "telling me what's wrong is half the value"

**Test prompt:** Run /taste, get a list of 8 violations.

**Output:** A scored report with violations.

**Feedback:** "Don't make me fix these manually. Half the time the fix is mechanical: change `ease` to `cubic-bezier(0.16, 1, 0.3, 1)`, add `max-width: 65ch` to body containers, swap `transition-all` for `transition-property: transform, opacity`. Add an auto-fix step. Make it optional. Show me the diff before applying."

**Change made:** Added Step 6 (Auto-Fix Optional). After presenting the report, the skill asks "Fix the failures? (y/n)" and on yes runs through a priority-ordered fix loop: T3 measure → T8 easing → T6 accent → T4 spacing → T10 asymmetry → composition restructure → signature moment → choreography. Re-screenshots and re-grades after the fix pass.
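The string-level fixes in that loop are plain find-and-replace. A sketch of just the T8 easing swap, with the curve taken from the feedback above; the other mechanical fixes (`transition-all`, `max-width: 65ch`) follow the same pattern, while composition restructuring and signature moments need a real edit pass:

```javascript
// Step 6 sketch: mechanically swap default easing keywords for a
// custom curve. Longest keyword first so "ease-in-out" isn't
// half-replaced as "ease"; (?!-) skips tokens like "linear-gradient".
function fixDefaultEasing(cssText, curve = "cubic-bezier(0.16, 1, 0.3, 1)") {
  return cssText.replace(/\b(?:ease-in-out|ease-out|ease-in|ease)\b(?!-)/g, curve);
}
```

Running the fix, re-screenshotting, and re-grading is what closes the loop; the fix alone proves nothing until T8 passes on the re-audit.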

**Status:** Reconstructed from Step 6 of the skill. The fix priority order matches the order of violations I see most often.

---

## Iteration 9, calibration, "how do I know if the skill is right?"

**Test prompt:** "/taste these 5 sites that I shipped and consider 9/10 work"

**Output:** Skill scored some 8/10, some 7/10, one 6/10. Did not match my judgment.

**Feedback:** "If the skill grades my known-good 9/10 ships at 7/10, the skill is wrong, not the ships. Calibrate against my validated wins. The skill should score them 9-10. If it doesn't, the constitution is missing principles or weighting them wrong."

**Change made:** Re-tuned the constitution against 30+ validated wins (sites I'd shipped and rated 9-10) and 40+ corrections (specific feedback I'd given Claude during builds). Added the calibration data note to the constitution's footer. The current skill now scores my validated 9-10 ships at 9-10.
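The calibration test itself can be made mechanical too. A sketch under my own assumptions: pair your score with the skill's score per ship, and treat anything more than a point apart as a miss (the 1-point tolerance is illustrative):

```javascript
// Calibration sketch: compare the skill's scores to yours on
// known-good ships. `pairs` is [{ mine, skill }, ...].
function calibrationReport(pairs) {
  const errors = pairs.map(p => Math.abs(p.mine - p.skill));
  const mae = errors.reduce((a, b) => a + b, 0) / pairs.length;
  const misses = pairs.filter((_, i) => errors[i] > 1);
  return { mae, misses, calibrated: misses.length === 0 };
}
```

Each miss points at a gap: either the constitution is missing a principle your 9/10 ship embodies, or it's over-weighting one that ship deliberately breaks.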

**Status:** Reconstructed from the constitution's "Calibration Data" footer ("5/5 /pilot calibration builds scored 9-10/10") and the explicit "30+ validated wins and 40+ corrections" claim in the constitution preamble.

---

## What this teaches

1. **The constitution is a residue of failures.** Every principle started as a build that didn't catch what it should have. The principle is the rule that prevents the same miss next time. If you fork this skill, your version of the constitution should grow the same way: from real corrections, not from theory.

2. **Mechanical enforcement beats documented voice.** Em dashes were banned in writing for months before the F11 / CQ-33 mechanical eval landed, and the dashes still leaked. Don't trust agents to follow rules they read once. Enforce mechanically.

3. **Score against your known-good work.** If the skill grades work you consider 10/10 at 7/10, the skill is wrong. Calibrate by running it on your best output and seeing whether it agrees. If it doesn't, the constitution is missing a principle or weighting one wrong.

4. **The auto-fix is half the value.** A taste audit that says "your spacing is wrong" without telling you the exact CSS to change is a half-built tool. The fix priority list matters: order it by leverage, so the fixes that close the most violations per minute go first.

5. **Mobile is a separate skill.** A skill that only audits desktop is grading half the work. Always screenshot at both viewports. Always grade against both.

---

## What you should do with this

Read your own corrections back. The last 20-50 times you told Claude "no, do it this way", what was the rule? Write it down. That's the next principle in your version of the constitution. Run `/taste` against your last 5 shipped builds. If the score is suspiciously high, the constitution is missing something. If the score is suspiciously low, the constitution has rules that don't match your actual taste.

The skill's job isn't to grade against universal rules. There is no universal taste. Its job is to grade against YOUR rules, written down so the next build inherits them for free.
