docs: add Addy Osmani spec-writing evaluation (4/5) + spec-first.md sections
Integration of "How to write a good spec for AI agents" by Addy Osmani: Evaluation (docs/resource-evaluations/addy-osmani-good-spec.md): - Score: 4/5 (High Value - Integrate within 1 week) - Fills gaps: modular design, operational boundaries, command specs - Fact-checked: credentials verified via Perplexity, all claims sourced - Challenge phase: technical-writer agent corrected initial 3/5 → 4/5 Spec-First Workflow Updates (guide/workflows/spec-first.md): - NEW: "Modular Spec Design" section (~50 lines, line 322) Pattern: Split large specs into focused files (CLAUDE-[domain].md) - NEW: "Operational Boundaries" section (~60 lines, line 372) Three-tier system: Always/Ask First/Never → maps to Claude Code modes - NEW: "Command Spec Template" section (~40 lines, line 432) Executable command specs with expected outputs & error handling - NEW: "Anti-Pattern: Monolithic CLAUDE.md" section (~30 lines, line 472) Explains cognitive load problem (>200 lines = context pollution) Reference Index (machine-readable/reference.yaml): - 8 new entries: spec_first_workflow → spec_osmani_score - Links to new spec-first.md sections with line numbers - Source attribution: https://addyosmani.com/blog/good-spec/ Public Facing (README.md): - Incremented resource evaluations count: 35 → 36 File growth: spec-first.md 327 → 507 lines (+180) Source: Addy Osmani (former Chrome team, 14y), published Jan 13, 2026 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit is contained in:
parent
bc86c8ed7f
commit
a5942f1c53
4 changed files with 729 additions and 1 deletions
|
|
@ -318,6 +318,504 @@ Review the spec in CLAUDE.md and create an implementation plan.
|
|||
|
||||
---
|
||||
|
||||
## Modular Spec Design
|
||||
|
||||
**Pattern**: Break large specifications into multiple focused files instead of cramming everything into a single CLAUDE.md.
|
||||
|
||||
### The Problem: Monolithic CLAUDE.md
|
||||
|
||||
When specs exceed ~200 lines, several issues emerge:
|
||||
|
||||
- **Context pollution**: Claude struggles to extract relevant information from bloated context
|
||||
- **Cognitive overload**: Developers can't quickly scan for what they need
|
||||
- **Maintenance burden**: Updating one area requires navigating unrelated sections
|
||||
- **Performance degradation**: Large CLAUDE.md files slow down context loading and processing
|
||||
|
||||
### When to Split
|
||||
|
||||
| Threshold | Action |
|
||||
|-----------|--------|
|
||||
| **<100 lines** | Single CLAUDE.md is fine |
|
||||
| **100-200 lines** | Consider splitting if distinct domains exist |
|
||||
| **>200 lines** | **Split immediately** — you're past the cognitive load threshold |
|
||||
| **Multi-team projects** | Split by domain/ownership regardless of size |
|
||||
|
||||
### Split Strategies
|
||||
|
||||
**1. Feature-Based Split**
|
||||
|
||||
```
|
||||
CLAUDE.md # Core project context
|
||||
CLAUDE-auth.md # Authentication spec
|
||||
CLAUDE-api.md # API endpoints spec
|
||||
CLAUDE-billing.md # Payment processing spec
|
||||
```
|
||||
|
||||
**2. Role-Based Split**
|
||||
|
||||
```
|
||||
CLAUDE.md # Shared conventions
|
||||
CLAUDE-frontend.md # UI/UX specifications
|
||||
CLAUDE-backend.md # API/database specs
|
||||
CLAUDE-infra.md # DevOps/deployment specs
|
||||
```
|
||||
|
||||
**3. Workflow-Based Split**
|
||||
|
||||
```
|
||||
CLAUDE.md # Daily development rules
|
||||
CLAUDE-testing.md # Test specifications
|
||||
CLAUDE-release.md # Release process spec
|
||||
CLAUDE-security.md # Security requirements
|
||||
```
|
||||
|
||||
### Implementation Pattern
|
||||
|
||||
**Main CLAUDE.md** (stays concise):
|
||||
```markdown
|
||||
# Project: [NAME]
|
||||
|
||||
## Tech Stack
|
||||
[Core technologies]
|
||||
|
||||
## Commands
|
||||
[Daily commands]
|
||||
|
||||
## Rules
|
||||
[Universal rules]
|
||||
|
||||
## Detailed Specs
|
||||
- Authentication: See @CLAUDE-auth.md
|
||||
- API Design: See @CLAUDE-api.md
|
||||
- Testing: See @CLAUDE-testing.md
|
||||
```
|
||||
|
||||
**CLAUDE-auth.md** (focused spec):
|
||||
```markdown
|
||||
# Authentication Specification
|
||||
|
||||
## Capabilities
|
||||
- MUST: JWT-based authentication
|
||||
- MUST: Refresh token rotation
|
||||
- MUST NOT: Store tokens in localStorage
|
||||
|
||||
## API Contract
|
||||
[Detailed auth endpoints...]
|
||||
|
||||
## Security Requirements
|
||||
[Specific auth security rules...]
|
||||
```
|
||||
|
||||
**Benefits**:
|
||||
- Claude can reference specific files with `@CLAUDE-auth.md`
|
||||
- Faster context loading (only relevant specs)
|
||||
- Easier maintenance (edit one domain without affecting others)
|
||||
- Better team collaboration (ownership per spec file)
|
||||
|
||||
**Source**: Addy Osmani, ["How to write a good spec for AI agents"](https://addyosmani.com/blog/good-spec/) (Jan 2026)
|
||||
|
||||
---
|
||||
|
||||
## Operational Boundaries
|
||||
|
||||
**Pattern**: Define explicit boundaries for what AI agents should do automatically, ask about, or never touch.
|
||||
|
||||
### The Three-Tier System
|
||||
|
||||
Traditional specs use binary constraints (MUST/MUST NOT), but operational work requires three levels:
|
||||
|
||||
| Tier | Meaning | Claude Code Mapping |
|
||||
|------|---------|---------------------|
|
||||
| **Always** | Execute automatically without asking | Auto-accept mode |
|
||||
| **Ask First** | Get user confirmation before proceeding | Default mode |
|
||||
| **Never** | Block or require Plan Mode | Plan mode / Hook blocking |
|
||||
|
||||
### Operational Boundaries Template
|
||||
|
||||
```markdown
|
||||
## Boundaries
|
||||
|
||||
### Always (Auto-accept)
|
||||
- Run tests after code changes
|
||||
- Format code with Prettier
|
||||
- Update imports when moving files
|
||||
- Fix linting errors
|
||||
- Add type annotations for untyped code
|
||||
|
||||
### Ask First (Confirm)
|
||||
- Modify database schemas
|
||||
- Add new dependencies
|
||||
- Change API contracts
|
||||
- Refactor >50 lines of code
|
||||
- Update configuration files
|
||||
|
||||
### Never (Block)
|
||||
- Push to production branch
|
||||
- Commit secrets or API keys
|
||||
- Delete data without backup
|
||||
- Modify CI/CD workflows without review
|
||||
- Bypass security checks
|
||||
```
|
||||
|
||||
### Mapping to Claude Code Permissions
|
||||
|
||||
**Always → Auto-accept Mode** (Shift+Tab):
|
||||
```json
|
||||
{
|
||||
"allowedPrompts": {
|
||||
"Bash": [
|
||||
"run tests",
|
||||
"format code",
|
||||
"check types"
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Ask First → Default Mode**:
|
||||
- Standard behavior, prompts for every action
|
||||
- Use for actions with moderate risk/impact
|
||||
|
||||
**Never → Plan Mode + Hooks**:
|
||||
```bash
|
||||
# .claude/hooks/pre-tool-use.sh
|
||||
#!/bin/bash
|
||||
if [[ "$TOOL_USE_TOOL_NAME" == "Bash" ]] && [[ "$TOOL_USE_INPUT" =~ "git push origin main" ]]; then
|
||||
echo "ERROR: Direct push to main blocked. Use feature branches."
|
||||
exit 2 # Block the action
|
||||
fi
|
||||
```
|
||||
|
||||
### Decision Framework
|
||||
|
||||
Ask yourself for each action:
|
||||
1. **Can it cause data loss?** → Ask First or Never
|
||||
2. **Is it reversible with git?** → Maybe Always
|
||||
3. **Does it affect other developers?** → Ask First
|
||||
4. **Is it a security risk?** → Never
|
||||
5. **Is it part of the standard workflow?** → Always
|
||||
|
||||
### Example: API Development
|
||||
|
||||
```markdown
|
||||
### Always
|
||||
- Run unit tests (npm test)
|
||||
- Validate request schemas
|
||||
- Generate API documentation
|
||||
- Check response formats
|
||||
|
||||
### Ask First
|
||||
- Add new API endpoints
|
||||
- Change existing endpoint signatures
|
||||
- Modify authentication requirements
|
||||
- Update rate limiting rules
|
||||
|
||||
### Never
|
||||
- Expose internal endpoints publicly
|
||||
- Log sensitive user data
|
||||
- Disable authentication checks
|
||||
- Remove rate limiting
|
||||
```
|
||||
|
||||
### Maintenance
|
||||
|
||||
Review boundaries quarterly:
|
||||
- **Promote**: Actions that never caused issues (Ask First → Always)
|
||||
- **Demote**: Actions that caused problems (Always → Ask First)
|
||||
- **Block**: Repeated mistakes (Ask First → Never)
|
||||
|
||||
**Source**: Addy Osmani, ["How to write a good spec for AI agents"](https://addyosmani.com/blog/good-spec/) (Jan 2026)
|
||||
|
||||
---
|
||||
|
||||
## Command Spec Template
|
||||
|
||||
**Pattern**: Document executable commands with expected outputs and error handling.
|
||||
|
||||
### Why Command Specs Matter
|
||||
|
||||
Most specs focus on **features** ("build authentication"), but **commands** ("how to test authentication") are equally critical for AI agents.
|
||||
|
||||
### Template Structure
|
||||
|
||||
```markdown
|
||||
## Commands
|
||||
|
||||
### [Command Category]
|
||||
|
||||
**Purpose**: [What this command accomplishes]
|
||||
|
||||
#### Command: `[actual command]`
|
||||
**When**: [Trigger condition]
|
||||
**Expected Output**: [What success looks like]
|
||||
**Error Handling**: [What to do on failure]
|
||||
**Flags**: [Important options]
|
||||
|
||||
---
|
||||
```
|
||||
|
||||
### Example: Testing Commands
|
||||
|
||||
```markdown
|
||||
## Commands
|
||||
|
||||
### Testing
|
||||
|
||||
#### Command: `pnpm test`
|
||||
**When**: Before every commit, after code changes
|
||||
**Expected Output**:
|
||||
- All tests pass (exit code 0)
|
||||
- Coverage ≥80% (lines, branches, functions)
|
||||
- No console warnings
|
||||
**Error Handling**:
|
||||
- If tests fail → Fix tests, don't skip
|
||||
- If coverage drops → Add tests for uncovered code
|
||||
- If warnings appear → Investigate before committing
|
||||
**Flags**:
|
||||
- `--coverage`: Generate coverage report
|
||||
- `--watch`: Run in watch mode for development
|
||||
- `--silent`: Suppress console output
|
||||
|
||||
#### Command: `pnpm test:e2e`
|
||||
**When**: Before merging to main, in CI pipeline
|
||||
**Expected Output**:
|
||||
- All E2E scenarios pass
|
||||
- Screenshots captured for failures
|
||||
- Test duration <5 minutes
|
||||
**Error Handling**:
|
||||
- If flaky → Investigate race conditions, don't retry blindly
|
||||
- If timeout → Check network mocks, async handling
|
||||
- If screenshots differ → Review UI changes deliberately
|
||||
**Flags**:
|
||||
- `--headed`: Run with visible browser (debugging)
|
||||
- `--project chromium`: Test specific browser
|
||||
```
|
||||
|
||||
### Example: Build & Deployment
|
||||
|
||||
```markdown
|
||||
## Commands
|
||||
|
||||
### Build
|
||||
|
||||
#### Command: `pnpm build`
|
||||
**When**: Before deployment, in CI pipeline
|
||||
**Expected Output**:
|
||||
- Build succeeds (exit code 0)
|
||||
- Output in `dist/` directory
|
||||
- No TypeScript errors
|
||||
- Bundle size <500KB (main chunk)
|
||||
**Error Handling**:
|
||||
- If TypeScript errors → Fix types, don't use `@ts-ignore`
|
||||
- If bundle too large → Analyze with `pnpm analyze`, code-split
|
||||
- If missing assets → Check public/ directory, update paths
|
||||
**Flags**:
|
||||
- `--mode production`: Production optimizations
|
||||
- `--analyze`: Generate bundle size report
|
||||
|
||||
### Deployment
|
||||
|
||||
#### Command: `pnpm deploy:staging`
|
||||
**When**: After PR approval, before production
|
||||
**Expected Output**:
|
||||
- Deployment succeeds
|
||||
- Health check returns 200 OK
|
||||
- Staging URL: https://staging.example.com
|
||||
**Error Handling**:
|
||||
- If health check fails → Rollback automatically
|
||||
- If database migration fails → Don't proceed, investigate
|
||||
- If environment vars missing → Check .env.staging, update secrets
|
||||
**Never**: Run `pnpm deploy:production` manually — use CI/CD only
|
||||
```
|
||||
|
||||
### Example: Database Commands
|
||||
|
||||
```markdown
|
||||
## Commands
|
||||
|
||||
### Database
|
||||
|
||||
#### Command: `pnpm db:migrate`
|
||||
**When**: After pulling schema changes, before development
|
||||
**Expected Output**:
|
||||
- Migrations applied successfully
|
||||
- Database schema matches models
|
||||
- Seed data loaded (development only)
|
||||
**Error Handling**:
|
||||
- If migration fails → Check database connection, review SQL
|
||||
- If conflicts detected → Resolve migrations, don't force
|
||||
**Never**: Run migrations in production manually — CI/CD only
|
||||
|
||||
#### Command: `pnpm db:reset`
|
||||
**When**: Development only, never in staging/production
|
||||
**Expected Output**:
|
||||
- Database dropped and recreated
|
||||
- All migrations applied
|
||||
- Seed data loaded
|
||||
**Error Handling**:
|
||||
- If production check fails → Abort immediately, verify environment
|
||||
**Safety**: Requires `NODE_ENV=development` check
|
||||
```
|
||||
|
||||
### Integration with CLAUDE.md
|
||||
|
||||
Reference command specs in your main CLAUDE.md:
|
||||
|
||||
```markdown
|
||||
## Commands
|
||||
- Build: `pnpm build` (see spec for error handling)
|
||||
- Test: `pnpm test` (must pass before commit)
|
||||
- Deploy: See CLAUDE-deployment.md for full procedures
|
||||
```
|
||||
|
||||
**Source**: Addy Osmani, ["How to write a good spec for AI agents"](https://addyosmani.com/blog/good-spec/) (Jan 2026)
|
||||
|
||||
---
|
||||
|
||||
## Anti-Pattern: Monolithic CLAUDE.md
|
||||
|
||||
### The Problem
|
||||
|
||||
**Symptom**: Your CLAUDE.md has grown to 300+ lines, mixing feature specs, API contracts, testing requirements, deployment procedures, and team conventions.
|
||||
|
||||
**Impact**:
|
||||
- **Context inefficiency**: Claude loads entire 300 lines for every request, even for simple tasks
|
||||
- **Slow response time**: Large context = slower processing
|
||||
- **Reduced accuracy**: Important details get lost in noise
|
||||
- **Maintenance overhead**: Updating one section requires navigating unrelated content
|
||||
- **Team friction**: Multiple developers editing same file = merge conflicts
|
||||
|
||||
### Real-World Example
|
||||
|
||||
**Before** (monolithic):
|
||||
```markdown
|
||||
# CLAUDE.md (387 lines)
|
||||
|
||||
## Tech Stack
|
||||
[20 lines]
|
||||
|
||||
## Authentication
|
||||
[45 lines of auth spec]
|
||||
|
||||
## API Endpoints
|
||||
[67 lines of API contracts]
|
||||
|
||||
## Database Schema
|
||||
[52 lines of schema rules]
|
||||
|
||||
## Testing
|
||||
[38 lines of test requirements]
|
||||
|
||||
## Deployment
|
||||
[41 lines of deployment procedures]
|
||||
|
||||
## Security Rules
|
||||
[55 lines of security requirements]
|
||||
|
||||
## Team Conventions
|
||||
[33 lines of coding standards]
|
||||
|
||||
## Git Workflow
|
||||
[28 lines of branching rules]
|
||||
|
||||
## Troubleshooting
|
||||
[8 lines of common issues]
|
||||
```
|
||||
|
||||
**Problem**: Claude loads all 387 lines even when user asks: "Add a new API endpoint for user profile"
|
||||
|
||||
**After** (modular):
|
||||
```
|
||||
CLAUDE.md (82 lines) # Core context: tech stack, commands, rules
|
||||
CLAUDE-auth.md (45 lines) # Authentication spec only
|
||||
CLAUDE-api.md (67 lines) # API contracts only
|
||||
CLAUDE-database.md (52 lines) # Database schema only
|
||||
CLAUDE-testing.md (38 lines) # Test requirements only
|
||||
CLAUDE-deploy.md (41 lines) # Deployment procedures only
|
||||
CLAUDE-security.md (55 lines) # Security requirements only
|
||||
```
|
||||
|
||||
**Benefit**: Claude loads CLAUDE.md (82 lines) + CLAUDE-api.md (67 lines) = 149 lines (61% reduction)
|
||||
|
||||
### Split Strategy
|
||||
|
||||
**Step 1: Identify Domains**
|
||||
|
||||
Look for natural boundaries in your spec:
|
||||
- Do these sections serve different purposes?
|
||||
- Would different team members own different sections?
|
||||
- Are some sections referenced more frequently than others?
|
||||
|
||||
**Step 2: Extract to Focused Files**
|
||||
|
||||
Move domain-specific content to dedicated files:
|
||||
|
||||
```bash
|
||||
# Keep in CLAUDE.md (always loaded)
|
||||
- Tech stack (unchanging baseline)
|
||||
- Daily commands (frequent reference)
|
||||
- Universal rules (apply to all work)
|
||||
|
||||
# Extract to domain files (load on demand)
|
||||
- Feature specs → CLAUDE-[feature].md
|
||||
- API contracts → CLAUDE-api.md
|
||||
- Testing → CLAUDE-testing.md
|
||||
- Deployment → CLAUDE-deploy.md
|
||||
```
|
||||
|
||||
**Step 3: Create Index in Main CLAUDE.md**
|
||||
|
||||
```markdown
|
||||
# Project: [NAME]
|
||||
|
||||
## Tech Stack
|
||||
[Core technologies]
|
||||
|
||||
## Commands
|
||||
[Daily commands]
|
||||
|
||||
## Rules
|
||||
[Universal rules]
|
||||
|
||||
## Detailed Specifications
|
||||
Reference these files for domain-specific requirements:
|
||||
- @CLAUDE-auth.md — Authentication & authorization
|
||||
- @CLAUDE-api.md — API endpoint contracts
|
||||
- @CLAUDE-database.md — Schema and migrations
|
||||
- @CLAUDE-testing.md — Test requirements
|
||||
- @CLAUDE-deploy.md — Deployment procedures
|
||||
- @CLAUDE-security.md — Security requirements
|
||||
```
|
||||
|
||||
**Step 4: Reference When Needed**
|
||||
|
||||
Claude can reference specific files:
|
||||
```
|
||||
User: "Add a new API endpoint for user settings"
|
||||
Claude: Reads CLAUDE.md + @CLAUDE-api.md (relevant context only)
|
||||
```
|
||||
|
||||
### Maintenance Rules
|
||||
|
||||
1. **Keep CLAUDE.md <100 lines** (core context only)
|
||||
2. **Domain files <150 lines each** (if bigger, split further)
|
||||
3. **Review quarterly**: Merge rarely-used files, split frequently-updated sections
|
||||
4. **Use @file references**: Explicitly load what you need
|
||||
|
||||
### Migration Checklist
|
||||
|
||||
- [ ] Identify domains in current CLAUDE.md (>200 lines?)
|
||||
- [ ] Create domain-specific files (CLAUDE-[domain].md)
|
||||
- [ ] Move content to focused files
|
||||
- [ ] Update main CLAUDE.md with index/references
|
||||
- [ ] Test: Ask Claude to perform domain-specific task
|
||||
- [ ] Verify: Check context usage with `/status`
|
||||
- [ ] Document: Update team on new structure
|
||||
|
||||
**Source**: Addy Osmani, ["How to write a good spec for AI agents"](https://addyosmani.com/blog/good-spec/) (Jan 2026)
|
||||
|
||||
---
|
||||
|
||||
## See Also
|
||||
|
||||
- [../methodologies.md](../methodologies.md) — SDD and other methodologies
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue