release: v3.20.1 - Vercel AGENTS.md vs Skills evaluation

- New resource evaluation (025): Vercel blog on eager context vs lazy
  skill invocation (Gao, Jan 2026). Score 3/5, 13/13 fact-checked.
- Guide: added 8KB compression benchmark to CLAUDE.md sizing (line 3527)
- Guide: added 56% skill invocation warning to Memory Loading (line 4082)
- Guide: added invocation reliability caveat to skills.sh trade-offs
- Version sync 3.20.0 → 3.20.1

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Florian BRUNIAUX 2026-01-30 21:45:14 +01:00
parent fd4550cbd3
commit 26ee4ef894
8 changed files with 188 additions and 11 deletions


@@ -6,7 +6,7 @@
**Written with**: Claude (Anthropic)
**Version**: 3.20.0 | **Last Updated**: January 2026
**Version**: 3.20.1 | **Last Updated**: January 2026
---
@@ -484,4 +484,4 @@ where.exe claude; claude doctor; claude mcp list
**Author**: Florian BRUNIAUX | [@Méthode Aristote](https://methode-aristote.fr) | Written with Claude
*Last updated: January 2026 | Version 3.20.0*
*Last updated: January 2026 | Version 3.20.1*


@@ -111,7 +111,7 @@ Most developers experience three distinct phases:
| **Targeted Gains** | 2-8 weeks | +20-50% | AI accelerates specific tasks you've learned to delegate effectively |
| **Sustainable Plateau** | 3-6 months | +20-30% | Stable gains, but only for developers who already have strong fundamentals |
**Critical nuance**: These gains are conditional. Studies show experienced developers (5+ years) see larger, sustained gains. Junior developers often see initial spikes followed by regression — because speed without understanding creates technical debt.
**Critical nuance**: These gains are conditional. Studies show experienced developers (5+ years) see larger, sustained gains. Junior developers often see initial spikes followed by regression — because speed without understanding creates technical debt. A 2026 RCT ([Shen & Tamkin, Anthropic Fellows](https://arxiv.org/abs/2601.20245)) measured a **17% reduction in skills acquisition** when developers learned a new library with AI assistance (n=52, p=0.01) — with no significant time savings. Only ~20% of AI users (pure delegation pattern) finished faster, at the cost of learning almost nothing.
### Where AI Helps (And Where It Hurts)
@@ -865,6 +865,7 @@ Warning signs you're becoming dependent, and what to do:
| Rejected in interviews | Fundamentals atrophied | Practice whiteboard problems without AI |
| Always ask "how" never "why" | Surface-level usage | Force yourself to ask "why this approach?" |
| Every solution looks the same | AI has patterns, you need variety | Study multiple implementations manually |
| Task feels easy but you can't explain it | **Perception gap** — AI users rate tasks easier while scoring 17% lower ([Shen & Tamkin 2026](https://arxiv.org/abs/2601.20245)) | After each task, explain the solution without looking at code |
### Weekly Self-Audit
@@ -886,6 +887,7 @@ If you're faster but not smarter, you're building dependency.
- **GitHub Copilot Impact Study (2024)** — [dl.acm.org](https://dl.acm.org/doi/10.1145/3613904.3642394) — Found productivity gains but identified skill atrophy risks in junior developers
- **Student Dependency Patterns in AI-Assisted Learning** — IACIS 2024 — Documented "learned helplessness" in students over-reliant on AI
- **Junior Developer Career Trajectories with AI Tools** — Software Engineering Institute — 3-year longitudinal study on skill development
- **AI Impacts on Skill Formation (Shen & Tamkin, 2026)** — [arXiv:2601.20245](https://arxiv.org/abs/2601.20245) — Anthropic Fellows RCT (52 devs learning Python Trio with/without GPT-4o): AI group scored 17% lower on skills quiz (Cohen's d=0.738, p=0.01) with no significant speed gain. Identified 6 interaction patterns, 3 of which preserved learning through active cognitive engagement (conceptual inquiry, hybrid explanation, generation-then-comprehension).
### Industry Reports
@@ -901,6 +903,7 @@ Sources for [§3 The Reality of AI Productivity](#the-reality-of-ai-productivity
- **McKinsey Developer Productivity Report (2024)** — [mckinsey.com](https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/unleashing-developer-productivity-with-generative-ai) — Comprehensive analysis of AI impact across dev workflows
- **Stack Overflow 2024: AI Sentiment** — [stackoverflow.co](https://stackoverflow.co/labs/developer-sentiment-ai-ml/) — Developer attitudes toward AI tools, productivity perceptions
- **Uplevel Engineering Intelligence (2024)** — Burnout and productivity metrics with AI coding tools
- **METR Experienced Developer RCT (2025)** — [arXiv:2507.09089](https://arxiv.org/abs/2507.09089) — Randomized controlled trial (16 experienced devs, 246 issues, repos 1M+ lines): AI tools made developers 19% slower on familiar codebases, despite perceiving themselves 20% faster (39-point perception gap). Strong evidence of a perception gap among experienced developers; the study measured completion time, not skill atrophy directly.
- **DORA/Google DevOps Research (2024)** — AI tool adoption impact on team performance
### Practitioner Perspectives


@@ -10,7 +10,7 @@
**Last updated**: January 2026
**Version**: 3.20.0
**Version**: 3.20.1
---
@@ -3524,7 +3524,7 @@ Month 3: 50 rules → 50 mistakes prevented + faster onboarding
**Anti-pattern**: Preemptively documenting everything. Instead, treat CLAUDE.md as a **living document** that grows through actual mistakes caught during development.
**Size guideline**: Keep CLAUDE.md files between **4-8KB total** (all levels combined). Practitioner studies show that context files exceeding 16K tokens degrade model coherence. Include architecture overviews, key conventions, and critical constraints—exclude full API references or extensive code examples (link to them instead).
**Size guideline**: Keep CLAUDE.md files between **4-8KB total** (all levels combined). Practitioner studies show that context files exceeding 16K tokens degrade model coherence. Include architecture overviews, key conventions, and critical constraints; exclude full API references or extensive code examples (link to them instead). Vercel's Next.js team compressed ~40KB of framework docs to an 8KB index with zero performance loss in agent evals ([Gao, 2026](https://vercel.com/blog/agents-md-outperforms-skills-in-our-agent-evals)), which is consistent with the 4-8KB target.
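One way to keep an eye on the size guideline is a small check script. The sketch below is illustrative only: the helper name `claude_md_report` and the project-level `CLAUDE.md` path are assumptions, and the 8KB ceiling is simply the upper end of the range quoted above. It sums the on-disk size of whichever CLAUDE.md files exist and reports whether the total fits the budget.

```python
from pathlib import Path

MAX_BYTES = 8 * 1024  # upper end of the 4-8KB guideline (assumed as a hard ceiling here)


def claude_md_report(paths, max_bytes=MAX_BYTES):
    """Return (total_bytes, within_budget, per_file_sizes) for the files that exist."""
    sizes = {str(p): Path(p).stat().st_size for p in paths if Path(p).is_file()}
    total = sum(sizes.values())
    return total, total <= max_bytes, sizes


if __name__ == "__main__":
    # Hypothetical locations: the global file plus a project-level file in the
    # current directory. Adjust to wherever your CLAUDE.md levels actually live.
    candidates = [Path.home() / ".claude" / "CLAUDE.md", Path("CLAUDE.md")]
    total, ok, sizes = claude_md_report(candidates)
    for name, size in sizes.items():
        print(f"{size:>7} B  {name}")
    status = "within" if ok else "over"
    print(f"total: {total} B ({status} the {MAX_BYTES} B budget)")
```

Byte size is only a proxy for token count, so treat the result as a rough signal rather than a precise measurement.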
### Level 1: Global (~/.claude/CLAUDE.md)
@@ -4079,7 +4079,7 @@ Understanding when each memory method loads is critical for token optimization:
| `.claude/commands/*.md` | Invocation only | Only when invoked | Workflow templates |
| `.claude/skills/*.md` | Invocation only | Only when invoked | Domain knowledge modules |
**Key insight**: `.claude/rules/` is NOT on-demand. Every `.md` file in that directory loads at session start, consuming tokens. Reserve it for always-relevant conventions, not rarely-used guidelines.
**Key insight**: `.claude/rules/` is NOT on-demand. Every `.md` file in that directory loads at session start, consuming tokens. Reserve it for always-relevant conventions, not rarely-used guidelines. Skills are invocation-only and may not trigger reliably: in one eval, the available skill was never invoked in 56% of cases ([Gao, 2026](https://vercel.com/blog/agents-md-outperforms-skills-in-our-agent-evals)). Never rely on skills for critical instructions; use CLAUDE.md or rules instead.
> **See also**: [Token Cost Estimation](#token-saving-techniques) for approximate token costs per file size.
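To see how much of a `.claude/` directory is paid for at session start versus only on invocation, a rough tally can help. This is a minimal sketch under stated assumptions: the function name `split_by_load_timing` is hypothetical, the directory-to-timing mapping follows the table and key insight above, and file size in bytes stands in for token cost.

```python
from pathlib import Path

# Load timing per top-level directory, following the table and key insight above.
EAGER_DIRS = {"rules"}               # every .md here loads at session start
LAZY_DIRS = {"commands", "skills"}   # loaded only when invoked


def split_by_load_timing(claude_dir):
    """Sum .md file sizes (bytes) under claude_dir into eager vs lazy buckets.

    Files outside the known directories (for example, a .md file placed
    directly in claude_dir) are ignored by this sketch.
    """
    totals = {"eager": 0, "lazy": 0}
    root = Path(claude_dir)
    for md in root.rglob("*.md"):
        top = md.relative_to(root).parts[0]
        if top in EAGER_DIRS:
            totals["eager"] += md.stat().st_size
        elif top in LAZY_DIRS:
            totals["lazy"] += md.stat().st_size
    return totals
```

Calling `split_by_load_timing(".claude")` from a project root returns a dict with `eager` and `lazy` byte totals. A large `eager` figure suggests moving rarely-used guidelines out of `rules/`, while anything critical that sits only in the `lazy` bucket is a candidate for CLAUDE.md or rules.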
@@ -5650,6 +5650,7 @@ Full catalog: [skills.sh leaderboard](https://skills.sh/)
- ✅ Format 100% compatible with this guide
- ⚠️ Multi-agent focus (not Claude Code specific)
- ⚠️ Early stage (maturity to prove over time)
- ⚠️ Skills load only when invoked, and invocation is unreliable: in one eval, agents never invoked the available skill in 56% of cases ([Gao, 2026](https://vercel.com/blog/agents-md-outperforms-skills-in-our-agent-evals)). For critical instructions, prefer always-loaded CLAUDE.md
#### When to Use
@@ -15743,4 +15744,4 @@ We'll evaluate and add it to this section if it meets quality criteria.
**Contributions**: Issues and PRs welcome.
**Last updated**: January 2026 | **Version**: 3.20.0
**Last updated**: January 2026 | **Version**: 3.20.1