docs: add Addy Osmani spec-writing evaluation (4/5) + spec-first.md sections

Integration of "How to write a good spec for AI agents" by Addy Osmani: Evaluation (docs/resource-evaluations/addy-osmani-good-spec.md): - Score: 4/5 (High Value - Integrate within 1 week) - Fills gaps: modular design, operational boundaries, command specs - Fact-checked: credentials verified via Perplexity, all claims sourced - Challenge phase: technical-writer agent corrected initial 3/5 → 4/5 Spec-First Workflow Updates (guide/workflows/spec-first.md): - NEW: "Modular Spec Design" section (~50 lines, line 322) Pattern: Split large specs into focused files (CLAUDE-[domain].md) - NEW: "Operational Boundaries" section (~60 lines, line 372) Three-tier system: Always/Ask First/Never → maps to Claude Code modes - NEW: "Command Spec Template" section (~40 lines, line 432) Executable command specs with expected outputs & error handling - NEW: "Anti-Pattern: Monolithic CLAUDE.md" section (~30 lines, line 472) Explains cognitive load problem (>200 lines = context pollution) Reference Index (machine-readable/reference.yaml): - 8 new entries: spec_first_workflow → spec_osmani_score - Links to new spec-first.md sections with line numbers - Source attribution: https://addyosmani.com/blog/good-spec/ Public Facing (README.md): - Incremented resource evaluations count: 35 → 36 File growth: spec-first.md 327 → 507 lines (+180) Source: Addy Osmani (former Chrome team, 14y), published Jan 13, 2026 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 21:30:34 +01:00 · 2026-02-01 21:30:34 +01:00 · a5942f1c53
commit a5942f1c53
parent bc86c8ed7f
4 changed files with 729 additions and 1 deletions
--- a/guide/workflows/spec-first.md
+++ b/guide/workflows/spec-first.md
@ -318,6 +318,504 @@ Review the spec in CLAUDE.md and create an implementation plan.

 ---

+## Modular Spec Design
+
+**Pattern**: Break large specifications into multiple focused files instead of cramming everything into a single CLAUDE.md.
+
+### The Problem: Monolithic CLAUDE.md
+
+When specs exceed ~200 lines, several issues emerge:
+
+- **Context pollution**: Claude struggles to extract relevant information from bloated context
+- **Cognitive overload**: Developers can't quickly scan for what they need
+- **Maintenance burden**: Updating one area requires navigating unrelated sections
+- **Performance degradation**: Large CLAUDE.md files slow down context loading and processing
+
+### When to Split
+
+| Threshold | Action |
+|-----------|--------|
+| **<100 lines** | Single CLAUDE.md is fine |
+| **100-200 lines** | Consider splitting if distinct domains exist |
+| **>200 lines** | **Split immediately** — you're past the cognitive load threshold |
+| **Multi-team projects** | Split by domain/ownership regardless of size |
+
+### Split Strategies
+
+**1. Feature-Based Split**
+
+```
+CLAUDE.md              # Core project context
+CLAUDE-auth.md         # Authentication spec
+CLAUDE-api.md          # API endpoints spec
+CLAUDE-billing.md      # Payment processing spec
+```
+
+**2. Role-Based Split**
+
+```
+CLAUDE.md              # Shared conventions
+CLAUDE-frontend.md     # UI/UX specifications
+CLAUDE-backend.md      # API/database specs
+CLAUDE-infra.md        # DevOps/deployment specs
+```
+
+**3. Workflow-Based Split**
+
+```
+CLAUDE.md              # Daily development rules
+CLAUDE-testing.md      # Test specifications
+CLAUDE-release.md      # Release process spec
+CLAUDE-security.md     # Security requirements
+```
+
+### Implementation Pattern
+
+**Main CLAUDE.md** (stays concise):
+```markdown
+# Project: [NAME]
+
+## Tech Stack
+[Core technologies]
+
+## Commands
+[Daily commands]
+
+## Rules
+[Universal rules]
+
+## Detailed Specs
+- Authentication: See @CLAUDE-auth.md
+- API Design: See @CLAUDE-api.md
+- Testing: See @CLAUDE-testing.md
+```
+
+**CLAUDE-auth.md** (focused spec):
+```markdown
+# Authentication Specification
+
+## Capabilities
+- MUST: JWT-based authentication
+- MUST: Refresh token rotation
+- MUST NOT: Store tokens in localStorage
+
+## API Contract
+[Detailed auth endpoints...]
+
+## Security Requirements
+[Specific auth security rules...]
+```
+
+**Benefits**:
+- Claude can reference specific files with `@CLAUDE-auth.md`
+- Faster context loading (only relevant specs)
+- Easier maintenance (edit one domain without affecting others)
+- Better team collaboration (ownership per spec file)
+
+**Source**: Addy Osmani, ["How to write a good spec for AI agents"](https://addyosmani.com/blog/good-spec/) (Jan 2026)
+
+---
+
+## Operational Boundaries
+
+**Pattern**: Define explicit boundaries for what AI agents should do automatically, ask about, or never touch.
+
+### The Three-Tier System
+
+Traditional specs use binary constraints (MUST/MUST NOT), but operational work requires three levels:
+
+| Tier | Meaning | Claude Code Mapping |
+|------|---------|---------------------|
+| **Always** | Execute automatically without asking | Auto-accept mode |
+| **Ask First** | Get user confirmation before proceeding | Default mode |
+| **Never** | Block or require Plan Mode | Plan mode / Hook blocking |
+
+### Operational Boundaries Template
+
+```markdown
+## Boundaries
+
+### Always (Auto-accept)
+- Run tests after code changes
+- Format code with Prettier
+- Update imports when moving files
+- Fix linting errors
+- Add type annotations for untyped code
+
+### Ask First (Confirm)
+- Modify database schemas
+- Add new dependencies
+- Change API contracts
+- Refactor >50 lines of code
+- Update configuration files
+
+### Never (Block)
+- Push to production branch
+- Commit secrets or API keys
+- Delete data without backup
+- Modify CI/CD workflows without review
+- Bypass security checks
+```
+
+### Mapping to Claude Code Permissions
+
+**Always → Auto-accept Mode** (Shift+Tab):
+```json
+{
+  "allowedPrompts": {
+    "Bash": [
+      "run tests",
+      "format code",
+      "check types"
+    ]
+  }
+}
+```
+
+**Ask First → Default Mode**:
+- Standard behavior, prompts for every action
+- Use for actions with moderate risk/impact
+
+**Never → Plan Mode + Hooks**:
+```bash
+# .claude/hooks/pre-tool-use.sh
+#!/bin/bash
+if [[ "$TOOL_USE_TOOL_NAME" == "Bash" ]] && [[ "$TOOL_USE_INPUT" =~ "git push origin main" ]]; then
+  echo "ERROR: Direct push to main blocked. Use feature branches."
+  exit 2  # Block the action
+fi
+```
+
+### Decision Framework
+
+Ask yourself for each action:
+1. **Can it cause data loss?** → Ask First or Never
+2. **Is it reversible with git?** → Maybe Always
+3. **Does it affect other developers?** → Ask First
+4. **Is it a security risk?** → Never
+5. **Is it part of the standard workflow?** → Always
+
+### Example: API Development
+
+```markdown
+### Always
+- Run unit tests (npm test)
+- Validate request schemas
+- Generate API documentation
+- Check response formats
+
+### Ask First
+- Add new API endpoints
+- Change existing endpoint signatures
+- Modify authentication requirements
+- Update rate limiting rules
+
+### Never
+- Expose internal endpoints publicly
+- Log sensitive user data
+- Disable authentication checks
+- Remove rate limiting
+```
+
+### Maintenance
+
+Review boundaries quarterly:
+- **Promote**: Actions that never caused issues (Ask First → Always)
+- **Demote**: Actions that caused problems (Always → Ask First)
+- **Block**: Repeated mistakes (Ask First → Never)
+
+**Source**: Addy Osmani, ["How to write a good spec for AI agents"](https://addyosmani.com/blog/good-spec/) (Jan 2026)
+
+---
+
+## Command Spec Template
+
+**Pattern**: Document executable commands with expected outputs and error handling.
+
+### Why Command Specs Matter
+
+Most specs focus on **features** ("build authentication"), but **commands** ("how to test authentication") are equally critical for AI agents.
+
+### Template Structure
+
+```markdown
+## Commands
+
+### [Command Category]
+
+**Purpose**: [What this command accomplishes]
+
+#### Command: `[actual command]`
+**When**: [Trigger condition]
+**Expected Output**: [What success looks like]
+**Error Handling**: [What to do on failure]
+**Flags**: [Important options]
+
+---
+```
+
+### Example: Testing Commands
+
+```markdown
+## Commands
+
+### Testing
+
+#### Command: `pnpm test`
+**When**: Before every commit, after code changes
+**Expected Output**:
+- All tests pass (exit code 0)
+- Coverage ≥80% (lines, branches, functions)
+- No console warnings
+**Error Handling**:
+- If tests fail → Fix tests, don't skip
+- If coverage drops → Add tests for uncovered code
+- If warnings appear → Investigate before committing
+**Flags**:
+- `--coverage`: Generate coverage report
+- `--watch`: Run in watch mode for development
+- `--silent`: Suppress console output
+
+#### Command: `pnpm test:e2e`
+**When**: Before merging to main, in CI pipeline
+**Expected Output**:
+- All E2E scenarios pass
+- Screenshots captured for failures
+- Test duration <5 minutes
+**Error Handling**:
+- If flaky → Investigate race conditions, don't retry blindly
+- If timeout → Check network mocks, async handling
+- If screenshots differ → Review UI changes deliberately
+**Flags**:
+- `--headed`: Run with visible browser (debugging)
+- `--project chromium`: Test specific browser
+```
+
+### Example: Build & Deployment
+
+```markdown
+## Commands
+
+### Build
+
+#### Command: `pnpm build`
+**When**: Before deployment, in CI pipeline
+**Expected Output**:
+- Build succeeds (exit code 0)
+- Output in `dist/` directory
+- No TypeScript errors
+- Bundle size <500KB (main chunk)
+**Error Handling**:
+- If TypeScript errors → Fix types, don't use `@ts-ignore`
+- If bundle too large → Analyze with `pnpm analyze`, code-split
+- If missing assets → Check public/ directory, update paths
+**Flags**:
+- `--mode production`: Production optimizations
+- `--analyze`: Generate bundle size report
+
+### Deployment
+
+#### Command: `pnpm deploy:staging`
+**When**: After PR approval, before production
+**Expected Output**:
+- Deployment succeeds
+- Health check returns 200 OK
+- Staging URL: https://staging.example.com
+**Error Handling**:
+- If health check fails → Rollback automatically
+- If database migration fails → Don't proceed, investigate
+- If environment vars missing → Check .env.staging, update secrets
+**Never**: Run `pnpm deploy:production` manually — use CI/CD only
+```
+
+### Example: Database Commands
+
+```markdown
+## Commands
+
+### Database
+
+#### Command: `pnpm db:migrate`
+**When**: After pulling schema changes, before development
+**Expected Output**:
+- Migrations applied successfully
+- Database schema matches models
+- Seed data loaded (development only)
+**Error Handling**:
+- If migration fails → Check database connection, review SQL
+- If conflicts detected → Resolve migrations, don't force
+**Never**: Run migrations in production manually — CI/CD only
+
+#### Command: `pnpm db:reset`
+**When**: Development only, never in staging/production
+**Expected Output**:
+- Database dropped and recreated
+- All migrations applied
+- Seed data loaded
+**Error Handling**:
+- If production check fails → Abort immediately, verify environment
+**Safety**: Requires `NODE_ENV=development` check
+```
+
+### Integration with CLAUDE.md
+
+Reference command specs in your main CLAUDE.md:
+
+```markdown
+## Commands
+- Build: `pnpm build` (see spec for error handling)
+- Test: `pnpm test` (must pass before commit)
+- Deploy: See CLAUDE-deployment.md for full procedures
+```
+
+**Source**: Addy Osmani, ["How to write a good spec for AI agents"](https://addyosmani.com/blog/good-spec/) (Jan 2026)
+
+---
+
+## Anti-Pattern: Monolithic CLAUDE.md
+
+### The Problem
+
+**Symptom**: Your CLAUDE.md has grown to 300+ lines, mixing feature specs, API contracts, testing requirements, deployment procedures, and team conventions.
+
+**Impact**:
+- **Context inefficiency**: Claude loads entire 300 lines for every request, even for simple tasks
+- **Slow response time**: Large context = slower processing
+- **Reduced accuracy**: Important details get lost in noise
+- **Maintenance overhead**: Updating one section requires navigating unrelated content
+- **Team friction**: Multiple developers editing same file = merge conflicts
+
+### Real-World Example
+
+**Before** (monolithic):
+```markdown
+# CLAUDE.md (387 lines)
+
+## Tech Stack
+[20 lines]
+
+## Authentication
+[45 lines of auth spec]
+
+## API Endpoints
+[67 lines of API contracts]
+
+## Database Schema
+[52 lines of schema rules]
+
+## Testing
+[38 lines of test requirements]
+
+## Deployment
+[41 lines of deployment procedures]
+
+## Security Rules
+[55 lines of security requirements]
+
+## Team Conventions
+[33 lines of coding standards]
+
+## Git Workflow
+[28 lines of branching rules]
+
+## Troubleshooting
+[8 lines of common issues]
+```
+
+**Problem**: Claude loads all 387 lines even when user asks: "Add a new API endpoint for user profile"
+
+**After** (modular):
+```
+CLAUDE.md (82 lines)          # Core context: tech stack, commands, rules
+CLAUDE-auth.md (45 lines)     # Authentication spec only
+CLAUDE-api.md (67 lines)      # API contracts only
+CLAUDE-database.md (52 lines) # Database schema only
+CLAUDE-testing.md (38 lines)  # Test requirements only
+CLAUDE-deploy.md (41 lines)   # Deployment procedures only
+CLAUDE-security.md (55 lines) # Security requirements only
+```
+
+**Benefit**: Claude loads CLAUDE.md (82 lines) + CLAUDE-api.md (67 lines) = 149 lines (61% reduction)
+
+### Split Strategy
+
+**Step 1: Identify Domains**
+
+Look for natural boundaries in your spec:
+- Do these sections serve different purposes?
+- Would different team members own different sections?
+- Are some sections referenced more frequently than others?
+
+**Step 2: Extract to Focused Files**
+
+Move domain-specific content to dedicated files:
+
+```bash
+# Keep in CLAUDE.md (always loaded)
+- Tech stack (unchanging baseline)
+- Daily commands (frequent reference)
+- Universal rules (apply to all work)
+
+# Extract to domain files (load on demand)
+- Feature specs → CLAUDE-[feature].md
+- API contracts → CLAUDE-api.md
+- Testing → CLAUDE-testing.md
+- Deployment → CLAUDE-deploy.md
+```
+
+**Step 3: Create Index in Main CLAUDE.md**
+
+```markdown
+# Project: [NAME]
+
+## Tech Stack
+[Core technologies]
+
+## Commands
+[Daily commands]
+
+## Rules
+[Universal rules]
+
+## Detailed Specifications
+Reference these files for domain-specific requirements:
+- @CLAUDE-auth.md — Authentication & authorization
+- @CLAUDE-api.md — API endpoint contracts
+- @CLAUDE-database.md — Schema and migrations
+- @CLAUDE-testing.md — Test requirements
+- @CLAUDE-deploy.md — Deployment procedures
+- @CLAUDE-security.md — Security requirements
+```
+
+**Step 4: Reference When Needed**
+
+Claude can reference specific files:
+```
+User: "Add a new API endpoint for user settings"
+Claude: Reads CLAUDE.md + @CLAUDE-api.md (relevant context only)
+```
+
+### Maintenance Rules
+
+1. **Keep CLAUDE.md <100 lines** (core context only)
+2. **Domain files <150 lines each** (if bigger, split further)
+3. **Review quarterly**: Merge rarely-used files, split frequently-updated sections
+4. **Use @file references**: Explicitly load what you need
+
+### Migration Checklist
+
+- [ ] Identify domains in current CLAUDE.md (>200 lines?)
+- [ ] Create domain-specific files (CLAUDE-[domain].md)
+- [ ] Move content to focused files
+- [ ] Update main CLAUDE.md with index/references
+- [ ] Test: Ask Claude to perform domain-specific task
+- [ ] Verify: Check context usage with `/status`
+- [ ] Document: Update team on new structure
+
+**Source**: Addy Osmani, ["How to write a good spec for AI agents"](https://addyosmani.com/blog/good-spec/) (Jan 2026)
+
+---
+
 ## See Also

 - [../methodologies.md](../methodologies.md) — SDD and other methodologies