New: interactive configurator at cc.bruniaux.com/context/ that generates a personalized CLAUDE.md starter kit based on team size, stack, and current setup. Multi-step flow (profile, current state, stack, results) with maturity scoring (Level 1-5), copy-to-clipboard artifacts, localStorage persistence. Guide content: - guide/core/context-engineering.md (1,188 lines, 8 sections): context budget, 150-instruction ceiling, modular architecture, team assembly, ACE pipeline, quality measurement, context reduction techniques - examples/context-engineering/ (10 templates): assembler.ts, profile-template.yaml, skeleton-template.md, canary-check.sh, ci-drift-check.yml, eval-questions.yaml, context-budget-calculator.sh, rules/knowledge-feeding.md, rules/update-loop-retro.md - tools/context-audit-prompt.md (543 lines): 8-dimension scoring /100 Navigation: guide/README.md, machine-readable/reference.yaml (24 new entries) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
173 lines
8.3 KiB
YAML
173 lines
8.3 KiB
YAML
# Context Engineering Self-Evaluation
|
|
#
|
|
# Use these questions to audit your CLAUDE.md periodically.
|
|
# Score each question: pass (1) | partial (0.5) | fail (0)
|
|
# Target score: 16+ / 20
|
|
#
|
|
# Run this quarterly, or after any significant project change.
|
|
# Low-scoring areas tell you exactly where to invest time.
|
|
|
|
meta:
|
|
version: "1.0"
|
|
total_questions: 20
|
|
passing_threshold: 16
|
|
scoring:
|
|
pass: 1.0
|
|
partial: 0.5
|
|
fail: 0.0
|
|
|
|
questions:
|
|
# ── Coverage: Does CLAUDE.md contain the right information? ──────────────────
|
|
coverage:
|
|
description: "Checks that essential information is present"
|
|
questions:
|
|
- id: C1
|
|
weight: 1.0
|
|
question: "Does CLAUDE.md explain what this project does in 2-3 sentences?"
|
|
failing_indicator: "No Project Overview section, or it's vague ('a web app')"
|
|
passing_indicator: "Clear statement of purpose, audience, and deployment target"
|
|
|
|
- id: C2
|
|
weight: 1.0
|
|
question: "Are the primary tech stack and versions documented?"
|
|
failing_indicator: "No version numbers, or stack listed without context"
|
|
passing_indicator: "Table or list with framework + version for each layer"
|
|
|
|
- id: C3
|
|
weight: 1.0
|
|
question: "Are coding standards specific enough to be actionable?"
|
|
failing_indicator: "Rules like 'write clean code' or 'follow best practices'"
|
|
passing_indicator: "Rules like 'use zod for external input', 'no default exports'"
|
|
|
|
- id: C4
|
|
weight: 1.0
|
|
question: "Are deployment environments and commands documented?"
|
|
failing_indicator: "No deployment section, or just 'deploy to production'"
|
|
passing_indicator: "Environments listed, deploy commands specified"
|
|
|
|
- id: C5
|
|
weight: 1.0
|
|
question: "Are there explicit anti-patterns (what NOT to do)?"
|
|
failing_indicator: "No negative rules at all"
|
|
passing_indicator: "At least 3 explicit prohibitions with reasons"
|
|
|
|
# ── Quality: Are the rules well-written? ────────────────────────────────────
|
|
quality:
|
|
description: "Checks that rules are clear, non-redundant, and manageable"
|
|
questions:
|
|
- id: Q1
|
|
weight: 1.0
|
|
question: "Can each rule be followed without reading the rest of the file?"
|
|
failing_indicator: "Rules like 'follow the patterns established earlier'"
|
|
passing_indicator: "Each rule is self-contained and unambiguous"
|
|
|
|
- id: Q2
|
|
weight: 1.0
|
|
question: "Are there fewer than 150 total rules or instructions?"
|
|
failing_indicator: "More than 150 bullet points / numbered items"
|
|
passing_indicator: "Compact set that fits in working memory"
|
|
note: "150 is the adherence ceiling — beyond this, Claude starts dropping rules"
|
|
|
|
- id: Q3
|
|
weight: 1.0
|
|
question: "Does the file have fewer than 500 lines?"
|
|
failing_indicator: "File exceeds 500 lines"
|
|
passing_indicator: "Under 500 lines, or split into @imported modules"
|
|
note: "Long files signal either verbosity or lack of modularization"
|
|
|
|
- id: Q4
|
|
weight: 1.0
|
|
question: "Are rules specific and measurable rather than vague?"
|
|
failing_indicator: "Rules that require judgment ('write readable code')"
|
|
passing_indicator: "Rules that produce consistent output across different prompts"
|
|
|
|
- id: Q5
|
|
weight: 1.0
|
|
question: "Does each rule appear only once (no semantic duplicates)?"
|
|
failing_indicator: "Same constraint expressed in multiple ways across sections"
|
|
passing_indicator: "No redundancy — each constraint stated exactly once"
|
|
|
|
# ── Adherence: Is Claude actually following the rules? ───────────────────────
|
|
adherence:
|
|
description: "Behavioral checks — requires reviewing actual Claude outputs"
|
|
questions:
|
|
- id: A1
|
|
weight: 1.0
|
|
question: "Do Claude's outputs consistently follow the stated coding standards?"
|
|
how_to_check: "Ask Claude to write a new function and check it against your standards"
|
|
failing_indicator: "Claude uses patterns explicitly prohibited in CLAUDE.md"
|
|
passing_indicator: "Outputs match stated standards without prompting"
|
|
|
|
- id: A2
|
|
weight: 1.0
|
|
question: "Does Claude correctly use the specified libraries and avoid alternatives?"
|
|
how_to_check: "Ask Claude to add input validation — does it use your specified library?"
|
|
failing_indicator: "Claude suggests alternatives you've explicitly excluded"
|
|
passing_indicator: "Claude reaches for the specified tools automatically"
|
|
|
|
- id: A3
|
|
weight: 1.0
|
|
question: "Does Claude follow the git conventions when generating commits or PRs?"
|
|
how_to_check: "Ask Claude to write a commit message and check the format"
|
|
failing_indicator: "Wrong format, missing type prefix, wrong capitalization"
|
|
passing_indicator: "Commits match your conventional commits format exactly"
|
|
|
|
- id: A4
|
|
weight: 1.0
|
|
question: "Does Claude avoid the listed anti-patterns?"
|
|
how_to_check: "Introduce a scenario where the anti-pattern would be tempting"
|
|
failing_indicator: "Claude suggests a prohibited pattern without flagging it"
|
|
passing_indicator: "Claude avoids the pattern or explicitly explains why it's excluded"
|
|
|
|
- id: A5
|
|
weight: 1.0
|
|
question: "When given ambiguous instructions, does Claude make the right default choice?"
|
|
how_to_check: "Give a vague task ('add a cache') and see what Claude picks"
|
|
failing_indicator: "Claude invents a default that contradicts your architecture"
|
|
passing_indicator: "Claude's default aligns with your documented decisions"
|
|
|
|
# ── Maintenance: Is the file healthy over time? ──────────────────────────────
|
|
maintenance:
|
|
description: "Checks that CLAUDE.md stays accurate as the project evolves"
|
|
questions:
|
|
- id: M1
|
|
weight: 1.0
|
|
question: "Was CLAUDE.md updated in the last 90 days?"
|
|
how_to_check: "git log --format='%ar' -- CLAUDE.md | head -1"
|
|
failing_indicator: "Last update more than 90 days ago"
|
|
passing_indicator: "Updated within 90 days, or project genuinely unchanged"
|
|
|
|
- id: M2
|
|
weight: 1.0
|
|
question: "Are there no rules about deprecated technologies still in the file?"
|
|
how_to_check: "Scan for library names — are any of them removed from package.json?"
|
|
failing_indicator: "Rules referencing libraries or patterns no longer in use"
|
|
passing_indicator: "All rules apply to the current actual state of the codebase"
|
|
|
|
- id: M3
|
|
weight: 1.0
|
|
question: "Do all linked files or @imports actually exist?"
|
|
how_to_check: "Run canary-check.sh or check each @import manually"
|
|
failing_indicator: "Broken @import paths or links to non-existent docs"
|
|
passing_indicator: "All references resolve to real files"
|
|
|
|
- id: M4
|
|
weight: 1.0
|
|
question: "Have team members other than the original author contributed to or reviewed the file?"
|
|
failing_indicator: "Single author, no review, rules reflect one person's preferences"
|
|
passing_indicator: "At least one other team member has reviewed or edited"
|
|
|
|
- id: M5
|
|
weight: 1.0
|
|
question: "Is the file tracked in git with meaningful commit messages?"
|
|
how_to_check: "git log --oneline -- CLAUDE.md"
|
|
failing_indicator: "Not tracked, or all commits are 'update CLAUDE.md'"
|
|
passing_indicator: "Tracked with descriptive history (e.g., 'context: add zod validation rule')"
|
|
|
|
# ── Scoring Guide ─────────────────────────────────────────────────────────────
|
|
scoring_guide:
|
|
"18-20": "Excellent — your context is production-grade, maintain it quarterly"
|
|
"16-17": "Good — minor gaps, address the failing questions this sprint"
|
|
"13-15": "Needs work — coverage or quality issues affecting Claude's output"
|
|
"10-12": "Significant issues — likely causing frequent corrections or rework"
|
|
"0-9": "Critical — CLAUDE.md is not serving its purpose, rebuild from skeleton"
|