marketing-shibata50/claude-code-ultimate-guide

Florian BRUNIAUX da8bc09f2d feat: smart-suggest ROI script + hook tuning + guide updates (Mar 16)

- Add examples/scripts/smart-suggest-roi.py: stdlib-only analyzer correlating
  suggestion log with session JSONL files to measure command acceptance rate.
  4 acceptance signals, tier breakdown, daily trend, --json/--since/--no-sessions CLI.
- Tune Aristote smart-suggest hook: tighten 5 over-firing triggers (/tech:commit,
  /tech:sonarqube, /tech:dupes, /check-conventions a11y, /tech:worktree)
- Guide: identity re-injection hook, context engineering maturity grid, code review
  workflow, 1M context window GA update, Spring Break promo, security audit patterns
- Resource evaluations: Nick Tune hooks (3/5), VicKayro security audit (2/5),
  Karl Mazier CLAUDE.md templates, Paul Rayner ContextFlow, Siddhant agent trace,
  Andrew Yng context hub, JP Caparas 1M context window

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-03-16 12:20:40 +01:00

5.4 KiB

Raw Blame History

Resource Evaluation: "What Claude's 1M Token Context Window Means for Your Work"

Source: https://reading.sh/what-claudes-1m-token-context-window-means-for-your-work-3c9f900f04c6 Author: JP Caparas (Medium) Published: ~March 15, 2026 (13 min read, 62 claps) Type: Plain-language explainer / pricing analysis Evaluated: 2026-03-15 Score: 2/5 (Marginal — do not integrate)

Summary

Plain-language explainer covering: what tokens are and what 1M tokens can hold (codebases, legal docs, academic papers), claim that 1M context went GA on March 13 for Opus 4.6, pricing comparison against OpenAI GPT-5.4 / Google Gemini 3.1 Pro / Meta Llama 4, Claude Code workflow impact (compaction reduction, full-codebase sessions), enterprise use cases (compliance review, codebase migration, incident response), "lost in the middle" effect as ongoing limitation, MRCR v2 score (78.3% Opus 4.6), and quotes from Jon Bell (CPO Codeium) and Anton Biryukov (Ramp).

Fact-Check

Claim	Status	Notes
1M context GA March 13, 2026	Unconfirmed	Perplexity shows "beta" still active; March 13 maps to a usage promotion, not a GA announcement
Flat pricing $5/$25 MTok, no surcharge at any length	FALSE	2x input / 1.5x output surcharge confirmed above 200K tokens (awesomeagents.ai, puter.com); this is the article's central thesis
Opus 4.6 base price $5/$25 MTok	Confirmed	Consistent across multiple pricing sources
Sonnet 4.6 $3/$15 MTok	Confirmed	Consistent across multiple pricing sources
Cached reads $0.50/MTok (90% discount)	Plausible	Standard Anthropic caching discount
Cache writes $6.25/MTok	Plausible	Standard Anthropic caching pricing
OpenAI GPT-5.4 2x rate limit above 272K	Unverified	No primary source found
Google Gemini 3.1 Pro surcharge above 200K	Partially confirmed	Google does surcharge; exact numbers differ from article
MRCR v2: 78.3% Opus 4.6	Discrepancy	Guide cites 76% from Anthropic blog; article may reference a different variant or a revised benchmark run
Jon Bell (Codeium) — 15% compaction decrease	Unverifiable	No independent corroboration found
Anton Biryukov (Ramp) quote	Unverifiable	No independent corroboration found
600 images/PDF per request (up from 100)	Unverified	Not confirmed in official API docs
Haiku caps at 200K	Confirmed	Consistent with known specs

Critical finding: The article's differentiating claim is "Anthropic's bet is flat pricing — no surcharge regardless of length." This is factually wrong. Anthropic charges 2x input and 1.5x output above 200K tokens. The entire competitive pricing analysis built on this premise is unreliable.

Comparative Analysis

Aspect	This resource	Our guide
Token basics (what is a token)	Covered, beginner-friendly	Not covered — assumes dev audience
1M context capabilities	Generic scenarios	Covered with MRCR benchmarks + cost tables
Pricing vs competitors	Covered — but factually wrong on flat pricing	Partially covered (Gemini comparison, line 2053)
Compaction events	Mentions 15% reduction (unverifiable quote)	Covered in depth (architecture.md lines 391-438)
"Lost in the middle" effect	Mentioned with arxiv:2307.03172 reference	Not explicitly covered
Enterprise use cases	Hypothetical, no measured data	Covered across multiple sections
MRCR v2 benchmark	78.3% (discrepancy with our 76%)	76% from Anthropic blog
Claude Code workflow impact	Good framing (search phase = 100K+ tokens)	Covered via compaction + context engineering

Challenge Notes

The technical-writer review agreed score 2/5 is justified. An initial proposal of 3/5 was revised downward because the fact-check demolishes the article's core value proposition — wrong pricing data cannot complement a guide that aims for accuracy. The Povilas Korop anecdote (83% context utilization on a Laravel project) is a nice real-world datapoint, but anecdotal and insufficient to shift the score.

Key insight from review: "Fix stale info in the guide" is the real action item, independent of this resource. If 1M context is actually GA, our guide still says "beta" at lines 2028-2070 — that needs a fix regardless of this article.

Decision

Do not integrate.

Specific exclusions

Pricing comparison table (central claim on flat pricing is wrong)
Enterprise use cases (hypothetical, no measured data)
Unverifiable quotes (Jon Bell, Anton Biryukov)

Independent action items triggered by this review

Verify 1M GA status via official Anthropic docs / API changelog
If confirmed GA: Update guide/ultimate-guide.md lines 2028-2070 (currently says "beta")
If confirmed GA: Verify whether the 200K surcharge structure changed at all
Consider adding: One line on "lost in the middle" effect in guide/core/context-engineering.md (well-documented limitation, arxiv:2307.03172 — independent of this article)

Worth independent verification

The Ramp workflow pattern (search Datadog + Braintrust + DB + source = 100K tokens before writing a single fix) is a useful illustration of why large context windows matter for real engineering workflows — worth adding if an independent source confirms it.

Confidence: High. The fact-check identified a critical error in the central thesis; no ambiguity on the decision.

5.4 KiB Raw Blame History