release: v3.36.0 — 3 AI roles + ContextOps + comprehension debt

Added:
- guide/roles/ai-roles.md: §14 MLOps Engineer, §15 AI Developer Advocate,
  §16 AI Orchestration Engineer with full profiles (responsibilities, skills,
  entry paths, salary benchmarks, career matrix rows)
- 4 resource evaluations (Packmind ContextOps, comprehension debt,
  Addy Osmani agents.md anti-pattern, Claude Swarm Monitor)

Changed:
- guide/roles/ai-roles.md: ToC renumbered, Career Decision Matrix +3 rows,
  Salary Benchmarks +3 rows, removed "Orchestration Engineer" from What's Not a Role
- docs/for-cto.md, for-cio-ceo.md, for-tech-leads.md: updated docs positioning
- guide/ecosystem: mcp-servers-ecosystem.md + third-party-tools.md updates
- guide/roles/learning-with-ai.md: content updates

Bump: 3.35.0 → 3.36.0 (VERSION, README, cheatsheet, ultimate-guide, reference.yaml,
llms.txt, llms-full.txt, machine-readable/llms.txt)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parent c2a642dabe
commit 19bdc910cc
19 changed files with 669 additions and 48 deletions
@@ -48,7 +48,7 @@ This is not a chatbot. It's a production tool.
 **License**: $100/month per developer (Claude Max). For a team of 10, that's $1,000/month — less than 2 days of external consulting.

-**Training**: one structured day is enough for a team of 10 to 15 people. A free Brown Bag Lunch (1h) lets you test team interest before committing to anything.
+**Training**: one structured day is enough for a team of 10 to 15 people. A free Brown Bag Lunch (1h) lets you test team interest before committing to anything — and costs you nothing, I do those for the networking and the challenge.

 **Doing nothing**: your developers use unvetted free tools, with no data policy, no audit trail. That scenario carries more risk than structured adoption.
@@ -70,11 +70,11 @@ This is not a chatbot. It's a production tool.
 **Option 1 — You want to understand before deciding**: ask your CTO for a 30-minute demo on a real use case from your codebase.

-**Option 2 — You want to move fast**: a free Brown Bag Lunch (1h, in-person or remote) covers the fundamentals for your executive and technical teams simultaneously.
+**Option 2 — You want to move fast**: a free Brown Bag Lunch (1h, in-person or remote) covers the fundamentals for your executive and technical teams simultaneously. Free — I do these for the networking and the challenge.

 **Option 3 — You already have teams using it**: a configuration audit (half-day) identifies active risks and optimization opportunities.

-→ [Contact Florian Bruniaux](https://florianbruniaux.github.io/claude-code-ultimate-guide-landing/) — availability and pricing
+→ [Contact Florian Bruniaux](https://florian.bruniaux.com/) — availability and, depending on the mission, potentially pricing

 ---
@@ -91,11 +91,12 @@ The real cost isn't the subscription — it's unstructured adoption creating sec
 If you want to accelerate adoption or get an independent assessment of your current setup:

-**Brown Bag Lunch (1h, free)** — executive + team intro, live demo, Q&A
-**Config audit (half-day)** — review your current setup against security and productivity standards
-**Team formation (1-3 days)** — hands-on training, your codebase, your workflows, measurable outcomes
+**Brown Bag Lunch, talk, or panel (1-3h, free)** — executive + team intro, live demo, Q&A, or speaker slot. I do these for the pleasure of it — getting challenged, sharing what I know, building network. No strings attached.

-→ [Contact Florian Bruniaux](https://florianbruniaux.github.io/claude-code-ultimate-guide-landing/) for availability and pricing
+**Config audit (half-day)** — review your current setup against security and productivity standards.
+**Team formation (1-3 days)** — hands-on training, your codebase, your workflows, measurable outcomes. Not something I'm actively seeking right now, but I'm open to the right conversation.
+
+→ [Contact Florian Bruniaux](https://florian.bruniaux.com/) for availability and, depending on the mission, pricing

 ---
@@ -81,11 +81,11 @@ Full coverage in WP03 — Security and WP06 — Privacy *(whitepapers, coming so
 If you want structured onboarding rather than self-learning:

-- **Brown Bag Lunch (1h, free)** — intro session covering core concepts + team config live
-- **Team formation (1-2 days)** — hands-on, your codebase, your workflows
-- **Config audit** — review your current setup against security and productivity best practices
+- **Brown Bag Lunch, talk, or panel (1-3h, free)** — intro session, live demo, or speaker slot. Done for the pleasure of it: sharing, getting challenged, building network.
+- **Config audit** — review your current setup against security and productivity best practices.
+- **Team formation (1-2 days)** — hands-on, your codebase, your workflows. Not something I'm actively looking for, but open to the right conversation.

-→ [Contact Florian Bruniaux](https://florianbruniaux.github.io/claude-code-ultimate-guide-landing/) for availability
+→ [Contact Florian Bruniaux](https://florian.bruniaux.com/) for availability and, depending on the mission, potentially pricing

 ---
@@ -0,0 +1,195 @@
# Evaluation: Addy Osmani — Stop Using /init for AGENTS.md

**Resource Type**: Blog Article (Research Synthesis + Practitioner Guidance)
**Author**: Addy Osmani (Director, Google Cloud AI)
**Date**: February 23, 2026
**Source**: LinkedIn post + full article (https://lnkd.in/gkmZ3HJs)
**Evaluation Date**: 2026-03-17
**Evaluator**: Claude Sonnet 4.6

---

## 1. Content Summary

Research-backed critique of the `/init` auto-generation workflow for AGENTS.md / CLAUDE.md context files, synthesizing two 2026 academic papers with practitioner architecture recommendations.

**Key claims**:
- **ETH Zurich study**: LLM-generated context files reduce task success by 2-3% and inflate costs by 20%+, because agents can already discover what those files contain
- **Lulla et al. (ICSE JAWs 2026)**: Human-authored context files reduced wall-clock runtime by 28.64% and token consumption by 16.58% — but only because they contained genuinely non-discoverable information
- **The discoverability filter**: the only criterion for adding a line is whether the agent can find it by reading the code; if yes, delete it
- **"Pink elephant" anchoring effect**: mentioning a technology in CLAUDE.md biases the agent toward it every session, even if it's deprecated or rarely used
- **Static monolithic files are architecturally flawed**: they load the same context regardless of task type, wasting tokens on irrelevant instructions
- **ACE framework (ICLR 2026)**: dynamic routing layer outperformed static CLAUDE.md approaches by 12.3% on agent benchmarks
- **Arize AI automated optimization**: iterative prompt learning yielded +5.19% accuracy cross-repo, +10.87% in-repo
- **Mental model shift**: CLAUDE.md as diagnostic tool for codebase friction, not permanent configuration

**Depth**: ~3,500 words, research synthesis + practical architecture recommendations. Source credibility is high — Osmani is Google Cloud AI Director, article cites peer-reviewed 2026 papers.

---

## 2. Initial Scoring: 4/5 (High Value)

| Score | Meaning | Action |
|-------|---------|--------|
| 5 | Critical — fills major gap | < 24h |
| **4** | **High value — significant improvement** | **< 1 week** |
| 3 | Moderate — useful complement | When time available |
| 2 | Marginal — skip or minimal mention | — |
| 1 | Out of scope — reject | — |

### Justification

**What the guide already covers (§3.1, line 4532)**:
- CLAUDE.md hierarchy (global / project / local)
- Minimum Viable CLAUDE.md concept
- "Auto-generated CLAUDE.md files tend to be generic, bloated" — line 4589
- Anti-pattern: preemptively documenting everything
- What Claude auto-detects (tech stack, directory structure, conventions)

**What's missing from the guide — filled by this article**:
- **Research backing**: The guide has correct intuitions but zero empirical evidence. Osmani provides two 2026 papers with concrete numbers.
- **The `/init` anti-pattern explicitly named**: line 21308 lists `/init` as a command without any caveat. The article makes the cost explicit: +20% cost, -2-3% success.
- **Discoverability filter as decision rule**: "Can the agent find this by reading the code?" is nowhere in the guide as an explicit framework.
- **Anchoring/pink elephant concept**: context contamination from stale or irrelevant tech mentions — not covered anywhere in the guide.
- **Dynamic context routing architecture**: 3-layer model (protocol file + skill files + maintenance subagent) — aligns with the guide's skills system but the connection is never made.
- **Scale implications**: 15-20% cost overhead compounds across CI/CD runs — no coverage in the guide's cost sections.
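
To make the scale point concrete, here is a minimal sketch of how a fixed per-run overhead adds up over CI/CD volume. The 20% default is the lower bound of the cost inflation attributed to the ETH Zurich study above; the per-run cost and the run count in the usage note are hypothetical placeholders, not measurements from any cited source.

```python
def monthly_overhead(
    cost_per_run: float,
    runs_per_day: int,
    overhead: float = 0.20,
    days: int = 30,
) -> float:
    """Extra monthly spend attributable to a redundant context file.

    cost_per_run and runs_per_day are assumptions supplied by the caller;
    overhead is the fractional cost increase per agent run.
    """
    baseline = cost_per_run * runs_per_day * days  # spend without the overhead
    return baseline * overhead                     # the part added by the overhead


if __name__ == "__main__":
    # Hypothetical pipeline: $0.50 per agent run, 200 runs per day.
    print(f"Extra spend per month: ${monthly_overhead(0.50, 200):.2f}")
```

With those placeholder inputs the baseline is $3,000 per month, so a 20% overhead corresponds to roughly $600 of additional spend. The relationship is linear: the overhead scales with run volume rather than growing exponentially.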

**Why 4/5 and not 5/5**:
- The guide already gives the right advice; this strengthens it with data and adds missing concepts
- A 5/5 would require the guide to be actively wrong or have a complete gap — here it's partially covered but under-evidenced
- Some claims (ACE framework +12.3%, Arize +5.19%) are from research with limited real-world validation

---

## 3. Comparative Analysis

| Aspect | This Resource | Guide §3.1 |
|--------|--------------|------------|
| Minimum CLAUDE.md principle | ✅ Research-backed | ✅ Present (intuition only) |
| `/init` anti-pattern | ✅ Named, quantified | ❌ Listed as command, no caveat |
| Discoverability filter | ✅ Explicit decision rule | ❌ Implied but not stated |
| Anchoring / pink elephant effect | ✅ Named concept with mechanism | ❌ Not covered |
| Dynamic context routing | ✅ 3-layer architecture | ❌ Not covered |
| Research citations (2026 papers) | ✅ ETH Zurich, Lulla et al. | ❌ None |
| Scale/cost at CI/CD volume | ✅ Quantified overhead | ❌ Not covered |
| CLAUDE.md as diagnostic tool | ✅ Central mental model | Partially (anti-pattern section) |
| Hierarchy of CLAUDE.md files | ✅ Mentioned | ✅ Well covered |
| Automated optimization loop | ✅ Arize AI approach | ❌ Not covered |

---

## 4. Integration Recommendations

### Where to integrate

**Primary: §3.1 Memory Files (CLAUDE.md) — line ~4589**

Add a dedicated subsection "The Discoverability Filter" immediately after the existing "Minimum Viable CLAUDE.md" section:

```markdown
### The Discoverability Filter

Before adding any line to CLAUDE.md, apply this test: **can the agent discover
this by reading the codebase?** If yes, don't add it.

Two 2026 research studies quantify the cost of ignoring this: LLM-generated
context files (the output of `/init`) reduce task success by 2-3% and inflate
costs by 20%+ because they duplicate what agents find by exploring the repo
anyway (ETH Zurich, 2026). Human-authored files that contain genuinely
non-discoverable information perform better: -28.64% wall-clock time,
-16.58% token consumption (Lulla et al., ICSE JAWs 2026).

What earns a line:
- Tooling preference that can't be inferred: `Use uv, not pip`
- Operational landmine: `legacy/ is deprecated but imported by prod — do not delete`
- Non-obvious convention: `auth module uses custom middleware — do not refactor to Express standard`

What does not earn a line:
- Directory structure (agent reads it in the first tool call)
- Tech stack (agent reads package.json / go.mod / Cargo.toml)
- Testing conventions (agent reads existing tests)
```
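
The filter in the proposed subsection can be approximated mechanically as a first pass before human review. The sketch below is a crude heuristic, not something the article or the guide prescribes: it flags CLAUDE.md lines that only mention a manifest file or an existing directory, and it skips any line that carries an explicit constraint word, on the assumption that constraints such as "do not delete" are the non-discoverable part. The function name, the manifest list, and the constraint-word list are all hypothetical choices.

```python
import re
from pathlib import Path

# Assumption: a line containing one of these words states a constraint,
# so it is left for a human to judge rather than flagged.
CONSTRAINT = re.compile(r"\b(not|never|don't|avoid|must|instead)\b", re.IGNORECASE)

# Manifest names taken from the "does not earn a line" examples above.
MANIFESTS = ("package.json", "go.mod", "Cargo.toml")


def flag_discoverable(lines, repo_root):
    """Return CLAUDE.md lines that merely restate what the repo already shows."""
    root = Path(repo_root)
    flagged = []
    for line in lines:
        text = line.strip()
        if not text or CONSTRAINT.search(text):
            continue  # empty, or carries a constraint the code cannot express
        # Line names a manifest file that is present in the repository.
        mentions_manifest = any(
            name in text and (root / name).exists() for name in MANIFESTS
        )
        # Line names a path segment (token followed by "/") that is a real directory.
        mentions_dir = any(
            (root / token).is_dir() for token in re.findall(r"([\w.-]+)/", text)
        )
        if mentions_manifest or mentions_dir:
            flagged.append(text)
    return flagged
```

Called with the lines of a CLAUDE.md file and the repository root, it returns candidates for deletion. A line such as "legacy/ is deprecated but imported by prod, do not delete" is left alone because of the constraint word, while a line that only describes what `src/` contains is flagged. The output is a review list, not a verdict.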

**Secondary: `/init` command documentation — line ~21308**

Add a warning note alongside the command listing that auto-generated output tends to be redundant and can hurt performance.

**Tertiary: New "Context Anchoring" warning in §3.1**

Add a callout about the pink elephant / anchoring effect: mentioning a technology in CLAUDE.md biases the agent toward it every session. Stale entries are worse than no entries.

**Optional: Advanced patterns section**

The 3-layer dynamic routing architecture (protocol file + persona/skill files + maintenance subagent) could slot into §4 (agents) or §9 (advanced workflows) as an architecture pattern for teams running agents at scale.

### Priority

**High** — the `/init` usage is common, the anti-pattern is quantified, and the guide already gives the right advice without the evidence to back it. Adding the data strengthens the guide's credibility.

---

## 5. Challenge Results (technical-writer agent)

The challenger **downgraded the score to 3/5** with substantive reasoning.

**Core finding**: This is a secondary synthesis article, not a primary source. The guide already evaluated the ETH Zurich paper directly (`agents-md-empirical-study-2602-11988.md`, scored 4/5). Osmani's article derives most of its authority from that same paper — scoring the derivative higher than the source is backwards.

**Unverified claims the evaluation initially missed**:
- **Lulla et al. (ICSE JAWs 2026)**: -28.64% / -16.58% numbers are suspiciously precise for a 2026 paper with no arXiv link or DOI provided
- **ACE framework (ICLR 2026)**: +12.3% claim — ICLR 2026 results not fully public as of March 2026
- **Arize AI +5.19% / +10.87%**: commercial observability company, incentive to publish favorable benchmarks, source unclear
- **"Pink elephant" anchoring**: Osmani's interpretive layer, not a finding from any cited study

**Conflict of interest flag**: Osmani is Director at Google Cloud AI, which competes with Anthropic in the AI coding tools space. His framing of Claude Code's `/init` as an anti-pattern is not neutral commentary. The underlying research remains valid, but the framing context should be noted.

**What Osmani genuinely adds** (not covered by ETH Zurich eval):
- The dynamic routing layer argument (static monolithic AGENTS.md as architecturally flawed) — if ACE framework checks out, this is a forward-looking direction worth tracking
- Practitioner authority signal: widely-read voice from Google reaching this conclusion documents where community consensus is moving

**Integration recommendation revised**:
- Do not create a new section
- Fold Osmani's practitioner framing into the already-planned ETH Zurich callout as a convergence note (2 sentences)
- Verify Lulla et al. and ACE framework before any integration of those claims
- Flag Arize AI numbers as unverified commercial claims

---

## 6. Fact-Check (Perplexity + existing evaluations)

| Claim | Status | Source |
|-------|--------|--------|
| ETH Zurich: LLM-generated files -2-3% task success, +20% cost | ✅ | Verified — matches `agents-md-empirical-study-2602-11988.md` (arXiv 2602.11988, peer-reviewed) |
| ETH Zurich: developer-written files +4% success | ✅ | Same source — confirmed |
| 100% of auto-gen files contained codebase overviews | ✅ | Consistent with ETH Zurich paper findings |
| `uv`: 1.6 uses/task when mentioned vs <0.01 without | ⚠️ | Plausible ETH Zurich finding, specific numbers not independently verified |
| **Lulla et al. (ICSE JAWs 2026)**: -28.64% wall-clock, -16.58% tokens, 124 PRs | ❌ | **NOT FOUND** — no arXiv, no DOI, no academic search hit via Perplexity. Specific precision of these numbers is a red flag. Paper may not exist or may not be publicly available yet. |
| **ACE framework (ICLR 2026)**: +12.3% vs static approach | ❌ | **NOT FOUND** — no paper matching "Agentic Context Engineering" from ICLR 2026 found in academic search. |
| **Arize AI**: +5.19% cross-repo, +10.87% in-repo accuracy | ⚠️ | **Partially verified** — Arize blog post exists (arize.com/blog/optimizing-coding-agent-rules..., Oct 2025, updated Mar 2026) and confirms automated optimization yields "10-15% improvement." The specific split numbers (+5.19% / +10.87%) do not appear in Perplexity results — may be Osmani's own restatement of the blog data. |
| Addy Osmani role as Director, Google Cloud AI | ✅ | LinkedIn profile |
| Article date: February 23, 2026 | ✅ | Article header |

**Summary of verification**:
- **Verified**: ETH Zurich claims (backed by peer-reviewed arXiv paper already in our eval database)
- **Unverifiable**: Lulla et al. and ACE framework — no findable published source; treat as unverified
- **Partially verified**: Arize AI — concept confirmed, specific numbers uncorroborated

**Corrections to previous evaluation**: The initial evaluation incorrectly marked Lulla et al. and ACE framework as verified. These are unverified claims from a secondary synthesis article. Osmani may be citing pre-publication papers, conference proceedings not yet indexed, or may have imprecise numbers in the synthesis.

---

## 7. Final Decision

**Score**: 3/5 (Moderate — derivative synthesis with unverified secondary claims)

**Action**: Partial integration — ETH Zurich-backed points only, fold into existing planned callout

**Confidence**: High on ETH Zurich claims, Low on Lulla/ACE/Arize specific numbers

**What to integrate** (ETH Zurich-verified only):
1. Add `/init` anti-pattern warning to the command listing (~line 21308): "Auto-generated output from `/init` falls into the LLM-generated category — ETH Zurich research shows these reduce task success by 2-3% and add 20%+ cost. Review and prune before committing."
2. Osmani's "discoverability filter" framing ("can the agent find this by reading the code?") is a useful pedagogical tool — cite as practitioner convergence with the ETH Zurich finding
3. The anchoring/pink elephant concept is editorial but valid — add as a callout in §3.1 without claiming it's a study finding

**Do not integrate**: Lulla et al. numbers (-28.64% / -16.58%), ACE framework +12.3%, Arize specific numbers (+5.19% / +10.87%) — all unverifiable.

**Pre-condition**: Ship the ETH Zurich integration (`agents-md-empirical-study-2602-11988.md`, 4/5) first. This article rides on that work.

110
docs/resource-evaluations/076-packmind-contextops-platform.md
Normal file
@@ -0,0 +1,110 @@
# Resource Evaluation #076 — Packmind: ContextOps Platform for AI Coding Agents

**Source:** [GitHub — PackmindHub/packmind](https://github.com/PackmindHub/packmind) / [Demo use cases](https://github.com/PackmindHub/demo-use-case-skills)
**Type:** Open-source platform + SaaS layer — engineering standards distribution for AI coding agents
**Evaluated:** 2026-03-17

---

## 📄 Content Summary

Packmind is a "ContextOps" platform (Packmind's own term) that captures engineering standards once and distributes them as AI-readable context across all AI coding agents a team uses.

1. **Standards Distribution** — Single source of truth for coding rules, architecture patterns, naming conventions. Generates `CLAUDE.md` + slash commands + skills for Claude Code, `.cursor/rules/*.mdc` for Cursor, `.github/copilot-instructions.md` for Copilot, `AGENTS.md` for generic agents.
2. **MCP Server** — Lets Claude Code (or any MCP-capable agent) create and manage playbook standards interactively during a session.
3. **Continuous Learning Loop** — Claimed workflow: bug fixed → root cause + resolution captured via Skill+MCP → playbook update proposed → human validates → distributed across repos. (Claimed behavior, no reproducible benchmark found.)
4. **Knowledge Ingestion from Team Tools** — Demo repo shows 6 ready-made use cases pulling context from GitHub PR comments, Slack, Jira, GitLab MRs, Confluence, Notion via their MCP servers.
5. **Self-hostable** — Docker/Kubernetes, Apache-2.0 CLI. SaaS layer at packmind.com with unspecified pricing.
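
The single-source idea in item 1 can be illustrated with a short sketch. This is not Packmind's implementation or API: it is a hypothetical `distribute` helper that renders one list of rules and writes the same body to each agent's context file. The target paths come from the list above, except the `.mdc` file name, which is an arbitrary choice; a real tool would likely format each target differently.

```python
from pathlib import Path

# Paths from the summary above; "standards.mdc" is a hypothetical file name.
TARGETS = {
    "claude-code": "CLAUDE.md",
    "cursor": ".cursor/rules/standards.mdc",
    "copilot": ".github/copilot-instructions.md",
    "generic": "AGENTS.md",
}


def render(standards: list[str]) -> str:
    """Render the shared rules as a Markdown bullet list."""
    bullets = "\n".join(f"- {rule}" for rule in standards)
    return f"# Engineering standards\n\n{bullets}\n"


def distribute(standards: list[str], repo_root) -> list[Path]:
    """Write the same rendered standards to every agent-specific file."""
    root = Path(repo_root)
    body = render(standards)
    written = []
    for relative in TARGETS.values():
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)  # e.g. .cursor/rules/
        path.write_text(body, encoding="utf-8")
        written.append(path)
    return written
```

Calling `distribute(["Use uv, not pip"], ".")` would create or overwrite the four files in the current repository. Because one write reaches every agent, a change to the source list is also the single point where a bad or malicious rule would propagate, which is why reviewing changes to that source matters.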

**Traction:** 245 GitHub stars, 22 CLI releases in 6 months (v0.19.0→v0.22.0), active commits as of March 16 2026, 29 open issues.

---

## 🎯 Relevance Score

| Score | Meaning |
|-------|---------|
| 5 | Essential — Major gap in the guide |
| 4 | Very relevant — Significant improvement |
| 3 | Relevant — Useful complement |
| 2 | Marginal — Secondary info |
| 1 | Out of scope — Not relevant |

**Score: 4/5**

The guide covers CLAUDE.md authorship per-project but has zero coverage of organizational-scale standards distribution across repos and teams. Packmind addresses exactly that gap. It also introduces the only tool with measurable traction specifically targeting multi-agent context sync (Claude Code + Copilot + Cursor + Windsurf from a single source).

---

## ⚖️ Comparison

| Aspect | Packmind | Our Guide |
|--------|----------|-----------|
| CLAUDE.md per-project authorship | ✅ Automated via CLI | ✅ Well documented |
| Org-scale standards distribution | ✅ Core feature | ❌ Missing — real gap |
| Multi-agent sync (Copilot, Cursor, Windsurf) | ✅ Native support | ⚠️ Partial (third-party-tools) |
| MCP server for context management | ✅ Ships one | ✅ Documented (mcp-servers-ecosystem) |
| `.claude/rules/` modular pattern at org scale | ✅ Packmind = org-level version | ✅ Project-level documented |
| Continuous learning loop from failures | ✅ Claimed (unverified) | ❌ Missing |
| Security implications of centralized context | ⚠️ Not documented by them | ✅ Security section exists |

---

## 📍 Integration Recommendations

**Priority High — `guide/ecosystem/third-party-tools.md`**

New subsection "Engineering Standards Distribution." Cover: what it generates (CLAUDE.md + slash commands + skills), MCP server, multi-agent sync, self-hostable CLI Apache-2.0. Add security caveat: centralized standards distribution creates a shared attack surface — if the Packmind repository is compromised, prompt injection vectors can reach every developer's AI session simultaneously. Cross-reference the guide's security section.

**Priority Medium — `guide/ultimate-guide.md` Team Configuration section**

3-4 lines after the CLAUDE.md compounding memory pattern. Hook: "At organizational scale, maintaining consistent standards across dozens of repositories requires tooling beyond manual CLAUDE.md authorship." Frame Packmind as the organizational-scale evolution of what `.claude/rules/` does at the project level — immediately actionable for readers already using that pattern. Cross-reference third-party-tools.

**Priority Low — `guide/ecosystem/mcp-servers-ecosystem.md`**

One-liner in the Orchestration or Documentation section: Packmind ships an MCP server for creating and managing engineering standards directly from Claude Code.

---

## 🔥 Challenge (technical-writer agent)

Score **adjusted to 4/5** — initial estimate of 3/5 was too conservative.

**Points not in initial assessment:**
- **Security surface**: Centralized CLAUDE.md distribution = shared prompt injection attack vector. Must be flagged when documenting.
- **Pricing opacity**: CLI is Apache-2.0 and self-hostable, but SaaS layer pricing is unspecified. Different from Rippletide (#072) situation, but still needs to be explicit.
- **"ContextOps" is a Packmind-coined term**, not industry standard. Introduce it as "Packmind's term for..." — not as established vocabulary.
- **Link to `.claude/rules/` pattern**: The guide already documents modular rules at project level. Packmind scales this to org level. That framing makes the concept immediately actionable.

**Risk of not integrating**: The organizational context distribution problem is underserved. A competing guide or Anthropic's own docs may pick up this pattern first. Packmind is the only tool with measurable traction (245 stars, 6 months, 22 releases) targeting it specifically.

---

## ✅ Fact-Check

| Claim | Verified | Source |
|-------|----------|--------|
| Apache-2.0 license | ✅ | GitHub LICENSE file |
| Supports Claude Code, Copilot, Cursor, Windsurf | ✅ | README packmind |
| Generates CLAUDE.md + slash commands + skills | ✅ | README + CLI docs |
| MCP server available | ✅ | README packmind |
| 22 CLI releases in 6 months | ✅ | GitHub releases tab |
| Self-hostable Docker/Kubernetes | ✅ | README |
| Continuous learning loop (bug → playbook) | ⚠️ Claimed | README + demo repo — no reproducible benchmark |
| 245 GitHub stars | ✅ | GitHub (verified 2026-03-17) |

**Corrections**: None. No hallucinated figures. The learning loop claim must be presented as claimed behavior, not established fact.

---

## 🎯 Final Decision

- **Score**: 4/5
- **Action**: Integrate
- **Confidence**: High (sources verified directly from GitHub)
- **Priority**: Medium — not urgent, but real gap in org-scale context distribution
- **Constraints**:
  - Do not reproduce the learning loop claim without qualifying it as claimed behavior
  - Introduce "ContextOps" with attribution ("Packmind's term for..."), not as established vocabulary
  - Add security caveat on centralized context distribution
  - Frame relative to `.claude/rules/` modular pattern (org-scale evolution)

117
docs/resource-evaluations/077-comprehension-debt-ai-coding.md
Normal file
@@ -0,0 +1,117 @@
# Resource Evaluation #077: "Comprehension Debt — The Hidden Cost of AI Generated Code"

**Date**: 2026-03-17
**Evaluator**: Claude Sonnet 4.6
**Source**: LinkedIn post + full article by unknown author
**Published**: March 14, 2026
**Original URL**: https://lnkd.in/g-vEeZry (LinkedIn shortlink, article at external blog)
**Input type**: Copied text

---

## Summary

Long-form LinkedIn article arguing that AI coding tools create "comprehension debt" — the growing gap between code volume and human understanding. The piece is structured as a think piece for software engineers, with sections on speed asymmetry, the limits of tests and specs, invisible measurement gaps, and an emerging regulatory risk. Primary empirical anchor is the Shen & Tamkin (2026) Anthropic Fellows study (arXiv 2601.20245).

---

## 📄 Key Points

- **Comprehension debt** = the gap between how much code exists and how much any human genuinely understands. Breeds false confidence because metrics look fine while system knowledge erodes.
- **Speed asymmetry**: Junior devs can now generate code faster than senior devs can critically audit it. The rate-limiting factor that historically made code review meaningful has been removed.
- **Tests are necessary but not sufficient**: You can't test behavior you haven't specified. When an AI updates hundreds of tests to match new behavior, correctness is no longer the right question.
- **Specs don't close the gap**: Every spec-to-code translation involves implicit decisions (edge cases, error handling, tradeoffs) that no spec captures. A complete spec is the program, written in a non-executable language.
- **Measurement gap**: Velocity, DORA, and coverage metrics don't capture comprehension loss. The incentive structure optimizes correctly for what it measures — but the wrong thing is being measured.
- **Regulation horizon**: AI-generated code in healthcare, finance, and government makes "the AI wrote it" an untenable defense. Teams building comprehension discipline now will be better positioned when liability arrives.

---

## 🎯 Score: 3/5

**Relevant — useful addition at the margin.**

The resource is well-written and addresses real dynamics. But the primary empirical anchor — the Shen & Tamkin (2026) study, arXiv 2601.20245 — is already integrated into `guide/roles/learning-with-ai.md` with full statistics, sample size, p-value, and interpretation. The article adds a terminology layer ("comprehension debt") that functions as a communications device rather than a conceptual breakthrough. Skill atrophy, verification debt, and the limits of passive AI delegation are all present in the guide. The regulation angle is the only content not covered.

---

## ⚖️ Comparison

| Aspect | This resource | Guide coverage |
|--------|---------------|----------------|
| Anthropic skill formation study (n=52, 17% lower, Cohen's d=0.738) | ✅ Cited and explained | ✅ Already in learning-with-ai.md:1045 |
| Skill atrophy / comprehension loss framing | ✅ Central theme | ✅ Extensively covered in learning-with-ai.md |
| Speed asymmetry in code review | ✅ Clear framing | ⚠️ Partially covered, less explicitly framed |
| Tests are necessary but not sufficient | ✅ Good examples | ⚠️ Present but not as a dedicated argument |
| Measurement gap (velocity vs. comprehension) | ✅ Concrete | ❌ Not explicitly addressed |
| "Comprehension debt" as named concept | ✅ Yes (new terminology) | ❌ Concept present, term absent |
| Regulatory risk (healthcare/finance/gov) | ✅ One section | ❌ Not covered anywhere |
| "Passive delegation" vs. "conceptual inquiry" distinction | ✅ Emphasized | ✅ Covered in learning-with-ai.md |

---

## 📍 Recommendations

**Score ≥ 3 → integrate at the margin.**

Three targeted additions, not a new section:

1. **Add "comprehension debt" terminology** in `guide/roles/learning-with-ai.md` §2 (The Reality of AI Productivity, around line 83-99). One sentence: "This skill atrophy dynamic is increasingly referred to as *comprehension debt* — the growing gap between code volume and genuine human understanding of the system."
   - Why: The term is gaining traction. Having it in the guide aids searchability and connects readers who encountered it elsewhere.

2. **Add speed asymmetry framing** to the code review section (learning-with-ai.md or ai-roles.md). The specific inversion — "junior devs generate faster than seniors can audit" — is a cleaner framing than what the guide currently has.

3. **Add regulatory paragraph** in `guide/roles/ai-roles.md` or a tech-leads-specific section. Healthcare, finance, government regulation of AI-generated code is absent from the guide and is a genuine forward-looking concern for the tech leads and CTO/CIO audience.

**Avoid**: Creating a new dedicated "comprehension debt" section. The existing skill atrophy coverage is more rigorous. Adding a parallel section risks diluting it.

**Priority**: Low-Medium. Terminology + regulation angle are useful. Nothing here is urgent.

---
## 🔥 Challenge (technical-writer agent)

**Score adjusted: 3/5 (down from initial 4/5).**

> "The resource references arXiv 2601.20245. That study is already integrated into your guide at learning-with-ai.md:1045, with the correct statistics, sample size, p-value, and interpretation. The core empirical anchor is not new to this guide."
>
> "'Comprehension debt' adds branding, not insight. The framing succeeds as a communications device, not as a conceptual breakthrough."
>
> "The regulation angle (healthcare, finance, government) is genuine new territory for your guide. That is the only part of the resource that adds something your guide does not already address."
>
> "The better play: cite the 'comprehension debt' terminology as an alternate framing of the existing problem, and add one paragraph to the Tech Leads section on the regulatory dimension. That is a 15-minute edit, not a new section."

The challenge stands. Adjusted score is correct.

---
## ✅ Fact-Check

| Claim | Verified | Source |
|-------|----------|--------|
| 52 software engineers in the study | ✅ | arXiv 2601.20245 HTML: "52 completed main study (26 control, 26 treatment)" |
| 17% lower comprehension scores | ✅ | arXiv 2601.20245: "4.15 point difference on 27-point quiz", confirmed as 17% |
| Largest decline in debugging | ✅ | arXiv 2601.20245: "largest performance gap appeared in debugging questions" |
| AI delegation patterns score below 40% | ✅ | arXiv 2601.20245: AI Delegation ~24%, Progressive Reliance ~39%, Iterative Debugging ~36% |
| Conceptual inquiry patterns score above 65% | ✅ | arXiv 2601.20245: Conceptual Inquiry ~65%, Hybrid Code-Explanation ~75%, Generation-Then-Comprehension ~86% |
| "50% vs 67%" exact figures | ⚠️ | Not explicitly stated in paper; approximate interpretation of the 17% gap and quiz scale (27 pts). Directionally correct. |
| Authors: Judy Hanwen Shen, Alex Tamkin | ✅ | arXiv 2601.20245 confirmed |
| Submitted January 2026 | ✅ | Submitted January 28, 2026; revised February 1, 2026 |
| "Anthropic study" attribution | ✅ (with nuance) | Anthropic Fellows Program research: affiliated with Anthropic, but not an official Anthropic study |

**Corrections**: The article presents "50% vs. 67%" as exact scores. These figures are directionally correct but are approximate interpretations of "4.15 points on a 27-point scale"; the paper does not state percentage scores explicitly for the primary result. No correction needed: the claim is a fair representation.

---
## 🎯 Final Decision

- **Score**: 3/5
- **Action**: Integrate at the margin (terminology, speed asymmetry framing, and regulation angle only)
- **Confidence**: High (fact-check solid, guide coverage confirmed)
- **Effort**: ~30 minutes (two sentence inserts and one paragraph)

**What to add**:
1. `learning-with-ai.md` ~line 93: mention "comprehension debt" as alternate framing
2. `learning-with-ai.md` or `ai-roles.md`: speed asymmetry framing (juniors generate faster than seniors can audit)
3. `ai-roles.md` Tech Leads section: one paragraph on regulatory exposure for AI-generated code in regulated industries

**What NOT to do**: Create a new section, rewrite existing skill atrophy coverage, or position this article as a primary reference (it's secondary commentary on a study already in the guide).
|
||||
107
docs/resource-evaluations/078-claude-swarm-monitor.md
Normal file
@@ -0,0 +1,107 @@
# Resource Evaluation: claude-swarm-monitor

**Date**: 2026-03-17
**Evaluator**: Claude (automated via /eval-resource)
**Status**: Watch-list. Re-evaluate at 50+ stars or on a macOS validation report.

---
## 📄 Content Summary

- **TUI dashboard** (Rust + Ratatui) for monitoring multiple Claude Code agents running across git worktrees in parallel
- **One swim lane per agent**: lead repo first, then each worktree, sorted and visually separated
- **Live status streamed from JSONL session files** (`~/.claude/projects/`): Working / Waiting For You / Idle / Done / Error
- **Sub-agent tracking**: agents spawned via the Task tool appear as nested cards within the parent lane (claim unverified; see the Fact-Check section)
- **Docker stack visibility per worktree**: matches Compose stacks via `COMPOSE_PROJECT_NAME` in `docker/.env`, shows live CPU % and memory
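To make the JSONL-native approach concrete, here is a minimal Python sketch of reading the latest record from a session file. It is not the tool's actual implementation (which is written in Rust). It assumes session files are `*.jsonl` files sitting directly inside a project directory and that each line is one JSON object; the function names `last_record` and `newest_session` are hypothetical, and no particular record schema is assumed.

```python
import json
from pathlib import Path
from typing import Optional


def last_record(session_file: Path) -> Optional[dict]:
    """Return the last well-formed JSON object in a JSONL file, or None."""
    latest = None
    with session_file.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                parsed = json.loads(line)
            except json.JSONDecodeError:
                # A writer may be mid-append; skip the partial line.
                continue
            if isinstance(parsed, dict):
                latest = parsed
    return latest


def newest_session(project_dir: Path) -> Optional[Path]:
    """Pick the most recently modified *.jsonl file in one project directory.

    Assumption: session files live directly under the project directory.
    """
    candidates = list(project_dir.glob("*.jsonl"))
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

A monitor built this way would map fields of the returned record to a status such as Working or Idle; that mapping depends on the session file schema, which this sketch deliberately leaves out.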
|
||||
|
||||
**Source**: [github.com/oinant/claude-swarm-monitor](https://github.com/oinant/claude-swarm-monitor)
**Language**: Rust (≥ 1.80) | **License**: MIT | **Stars**: 10 (March 2026) | **Platform**: Linux tested; Windows/Docker Desktop not yet supported

---
## 🎯 Score: 3/5

| Score | Meaning |
|-------|---------|
| 5 | Essential: major gap |
| 4 | High value: significant improvement |
| **3** | **Pertinent: useful complement** |
| 2 | Marginal: secondary info |
| 1 | Out of scope |

**Justification**: claude-swarm-monitor fills two genuine gaps not covered by any tool currently in the guide: (1) monitoring via native JSONL session files rather than SSE/polling, and (2) Docker stack visibility per worktree. The JSONL approach is architecturally distinct from agent-chat (which targets Gas Town/multiclaude). However, at 10 stars, about 3 weeks old (created 2026-02-22), and Linux-only, the adoption signal is too weak for a high-confidence recommendation. Score capped at 3 pending community validation.
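The per-worktree Docker matching can be illustrated with a short Python sketch. It only shows the lookup described in the README (reading `COMPOSE_PROJECT_NAME` from `docker/.env` in a worktree); the helper names `compose_project_name` and `project_for_worktree` are hypothetical, the `.env` parsing is deliberately simplified, and the tool's real Rust code may differ.

```python
from pathlib import Path
from typing import Optional


def compose_project_name(env_text: str) -> Optional[str]:
    """Extract COMPOSE_PROJECT_NAME from the text of a .env file."""
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "COMPOSE_PROJECT_NAME":
            # Drop optional surrounding quotes; an empty value counts as unset.
            return value.strip().strip("'\"") or None
    return None


def project_for_worktree(worktree: Path) -> Optional[str]:
    """Read <worktree>/docker/.env and return its Compose project name, if any."""
    env_file = worktree / "docker" / ".env"
    if not env_file.is_file():
        return None
    return compose_project_name(env_file.read_text(encoding="utf-8"))
```

Docker Compose labels the containers it starts with `com.docker.compose.project`, so one plausible use of the returned name is as a filter on that label when listing containers and their CPU and memory figures.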
|
||||
|
||||
---
## ⚖️ Comparison

| Aspect | claude-swarm-monitor | Current guide |
|--------|---------------------|-------------|
| Monitor agent status across worktrees | ✅ Swim lanes, live status | ⚠️ agent-chat exists but targets Gas Town/multiclaude |
| Status from Claude Code session files (JSONL) | ✅ Unique approach | ❌ No tool reads ~/.claude/projects/ natively |
| Sub-agent (Task tool) tracking | ✅ Claimed | ❌ Not covered by any listed tool |
| Docker stack visibility per worktree | ✅ CPU/mem live | ❌ Gap in current guide |
| Cross-platform | ❌ Linux tested only | ⚠️ multiclaude is Linux/macOS, Conductor is macOS only |
| Adoption signal | ⚠️ 10 stars, 1 maintainer | ✅ multiclaude 383+, Ruflo 18.9k |
| Open source | ✅ MIT | ✅ All listed tools |

---
## 📍 Recommendations

**Current action**: Add to watch-list. Do not integrate into the main guide yet.

**Conditions for promotion to 4/5 and integration**:
1. Re-evaluate at 50+ stars or after a credible community report of production use
2. Verify the sub-agent Task tool tracking claim (how are internal spawns tracked if they don't write separate JSONL files?)
3. Add a security note on `~/.claude/projects/` read scope (session files contain full conversation history, including accidentally typed secrets)
4. Confirm macOS compatibility, or document its absence as a hard limitation
5. Measure resource overhead with 10+ worktrees and live Docker polling

**When integrated, placement**: `guide/ecosystem/third-party-tools.md` in the Multi-Agent Orchestration section, with an explicit comparison row against agent-chat noting the key differentiators (JSONL-native vs SSE, Docker visibility, Rust vs JS, Linux vs cross-platform).

---
## 🔥 Challenge (technical-writer agent)

The challenge agent lowered the initial proposed score from 4 to 3, citing:

- **Adoption signal is the weakest in the ecosystem section**: 10 stars vs multiclaude (383+) and Ruflo (18.9k); even Athena Flow (explicitly marked "not recommended yet") has more external validation
- **Linux-only limitation is a real constraint** for the target audience (macOS-heavy multi-agent users)
- **Security scope not addressed**: the tool reads `~/.claude/projects/`, which contains full session history including sensitive context; flagging this would be consistent with the guide's pattern of flagging data access scope (see Straude, Packmind entries)
- **Sub-agent tracking claim unverified**: Task tool spawns are internal to Claude Code's process; it's unclear whether they write separate JSONL files

- **Score adjusted**: 3/5 (from proposed 4/5), agreed
- **Points missed**: Resource consumption (polling overhead), security implications, macOS support gap, sub-agent tracking verification
- **Risk of non-integration**: Low. agent-chat already covers monitoring; the JSONL/Docker angles are compelling but serve a narrow advanced subset

---
## ✅ Fact-Check

| Claim | Status | Source |
|-------|--------|--------|
| Rust + Ratatui TUI | ✅ | GitHub repo, Cargo.toml |
| Status from JSONL session files | ✅ | README: "streamed directly from Claude Code's JSONL session files" |
| Docker stack matching via COMPOSE_PROJECT_NAME | ✅ | README: `docker/.env` → `COMPOSE_PROJECT_NAME` |
| Sub-agent tracking via Task tool | ⚠️ Unverified | Claimed in README, mechanism unclear |
| MIT license | ✅ | GitHub metadata |
| 10 stars | ✅ | GitHub API (2026-03-17) |
| Linux tested | ✅ | README: "currently tested on Linux only" |
| ~500 lines per file | ✅ | README: "The codebase is small (~500 lines per file, clearly separated modules)" |
| Created Feb 2026 | ✅ | GitHub API: created_at 2026-02-22 |

**Corrections**: None required. All verifiable claims check out. The sub-agent tracking claim needs mechanical verification before being cited as a feature.

---
## 🎯 Final Decision

- **Score**: 3/5
- **Action**: Watch-list (not integrated yet)
- **Re-evaluate trigger**: 50+ stars OR macOS production report OR sub-agent tracking verified
- **Confidence**: Medium (tool is real and functional; uncertainty is on adoption and edge-case claims)

---
*Evaluation file: `docs/resource-evaluations/078-claude-swarm-monitor.md`*