claude-code-ultimate-guide/docs/resource-evaluations/eval-claude-1m-context-window-jp-caparas.md
Florian BRUNIAUX da8bc09f2d feat: smart-suggest ROI script + hook tuning + guide updates (Mar 16)
- Add examples/scripts/smart-suggest-roi.py: stdlib-only analyzer correlating
  suggestion log with session JSONL files to measure command acceptance rate.
  4 acceptance signals, tier breakdown, daily trend, --json/--since/--no-sessions CLI.
- Tune Aristote smart-suggest hook: tighten 5 over-firing triggers (/tech:commit,
  /tech:sonarqube, /tech:dupes, /check-conventions a11y, /tech:worktree)
- Guide: identity re-injection hook, context engineering maturity grid, code review
  workflow, 1M context window GA update, Spring Break promo, security audit patterns
- Resource evaluations: Nick Tune hooks (3/5), VicKayro security audit (2/5),
  Karl Mazier CLAUDE.md templates, Paul Rayner ContextFlow, Siddhant agent trace,
  Andrew Yng context hub, JP Caparas 1M context window

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 12:20:40 +01:00

5.4 KiB

Resource Evaluation: "What Claude's 1M Token Context Window Means for Your Work"

Source: https://reading.sh/what-claudes-1m-token-context-window-means-for-your-work-3c9f900f04c6 Author: JP Caparas (Medium) Published: ~March 15, 2026 (13 min read, 62 claps) Type: Plain-language explainer / pricing analysis Evaluated: 2026-03-15 Score: 2/5 (Marginal — do not integrate)


Summary

Plain-language explainer covering: what tokens are and what 1M tokens can hold (codebases, legal docs, academic papers), claim that 1M context went GA on March 13 for Opus 4.6, pricing comparison against OpenAI GPT-5.4 / Google Gemini 3.1 Pro / Meta Llama 4, Claude Code workflow impact (compaction reduction, full-codebase sessions), enterprise use cases (compliance review, codebase migration, incident response), "lost in the middle" effect as ongoing limitation, MRCR v2 score (78.3% Opus 4.6), and quotes from Jon Bell (CPO Codeium) and Anton Biryukov (Ramp).


Fact-Check

Claim Status Notes
1M context GA March 13, 2026 Unconfirmed Perplexity shows "beta" still active; March 13 maps to a usage promotion, not a GA announcement
Flat pricing $5/$25 MTok, no surcharge at any length FALSE 2x input / 1.5x output surcharge confirmed above 200K tokens (awesomeagents.ai, puter.com); this is the article's central thesis
Opus 4.6 base price $5/$25 MTok Confirmed Consistent across multiple pricing sources
Sonnet 4.6 $3/$15 MTok Confirmed Consistent across multiple pricing sources
Cached reads $0.50/MTok (90% discount) Plausible Standard Anthropic caching discount
Cache writes $6.25/MTok Plausible Standard Anthropic caching pricing
OpenAI GPT-5.4 2x rate limit above 272K Unverified No primary source found
Google Gemini 3.1 Pro surcharge above 200K Partially confirmed Google does surcharge; exact numbers differ from article
MRCR v2: 78.3% Opus 4.6 Discrepancy Guide cites 76% from Anthropic blog; article may reference a different variant or a revised benchmark run
Jon Bell (Codeium) — 15% compaction decrease Unverifiable No independent corroboration found
Anton Biryukov (Ramp) quote Unverifiable No independent corroboration found
600 images/PDF per request (up from 100) Unverified Not confirmed in official API docs
Haiku caps at 200K Confirmed Consistent with known specs

Critical finding: The article's differentiating claim is "Anthropic's bet is flat pricing — no surcharge regardless of length." This is factually wrong. Anthropic charges 2x input and 1.5x output above 200K tokens. The entire competitive pricing analysis built on this premise is unreliable.


Comparative Analysis

Aspect This resource Our guide
Token basics (what is a token) Covered, beginner-friendly Not covered — assumes dev audience
1M context capabilities Generic scenarios Covered with MRCR benchmarks + cost tables
Pricing vs competitors Covered — but factually wrong on flat pricing Partially covered (Gemini comparison, line 2053)
Compaction events Mentions 15% reduction (unverifiable quote) Covered in depth (architecture.md lines 391-438)
"Lost in the middle" effect Mentioned with arxiv:2307.03172 reference Not explicitly covered
Enterprise use cases Hypothetical, no measured data Covered across multiple sections
MRCR v2 benchmark 78.3% (discrepancy with our 76%) 76% from Anthropic blog
Claude Code workflow impact Good framing (search phase = 100K+ tokens) Covered via compaction + context engineering

Challenge Notes

The technical-writer review agreed score 2/5 is justified. An initial proposal of 3/5 was revised downward because the fact-check demolishes the article's core value proposition — wrong pricing data cannot complement a guide that aims for accuracy. The Povilas Korop anecdote (83% context utilization on a Laravel project) is a nice real-world datapoint, but anecdotal and insufficient to shift the score.

Key insight from review: "Fix stale info in the guide" is the real action item, independent of this resource. If 1M context is actually GA, our guide still says "beta" at lines 2028-2070 — that needs a fix regardless of this article.


Decision

Do not integrate.

Specific exclusions

  • Pricing comparison table (central claim on flat pricing is wrong)
  • Enterprise use cases (hypothetical, no measured data)
  • Unverifiable quotes (Jon Bell, Anton Biryukov)

Independent action items triggered by this review

  1. Verify 1M GA status via official Anthropic docs / API changelog
  2. If confirmed GA: Update guide/ultimate-guide.md lines 2028-2070 (currently says "beta")
  3. If confirmed GA: Verify whether the 200K surcharge structure changed at all
  4. Consider adding: One line on "lost in the middle" effect in guide/core/context-engineering.md (well-documented limitation, arxiv:2307.03172 — independent of this article)

Worth independent verification

The Ramp workflow pattern (search Datadog + Braintrust + DB + source = 100K tokens before writing a single fix) is a useful illustration of why large context windows matter for real engineering workflows — worth adding if an independent source confirms it.


Confidence: High. The fact-check identified a critical error in the central thesis; no ambiguity on the decision.