docs: restore 93% @ 256K with source + add HN community validation

- Restore Opus 4.6 MRCR 93% @ 256K (confirmed: independent analysis of Anthropic data)
- Add Harry Potter needle test reference (HN 46905735: 49/50 spells at 733K tokens)
- Source: Perplexity deep search cross-validation, Feb 18 2026

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-18 10:41:03 +01:00
commit 8d6c50403d
parent 78f4dc4b42
1 changed file with 6 additions and 6 deletions
--- a/guide/ultimate-guide.md
+++ b/guide/ultimate-guide.md
@@ -1757,13 +1757,13 @@ The 1M context window (beta, API + usage tier 4 required) is a significant capab

 **Retrieval accuracy at scale (MRCR v2 8-needle 1M variant)**

-| Model | 1M accuracy | Source |
-|-------|-------------|--------|
-| Opus 4.6 | 76% | Anthropic blog (Feb 2026) |
-| Sonnet 4.5 | 18.5% | Anthropic blog (Feb 2026) |
-| Sonnet 4.6 | Not yet published | — |
+| Model | 256K accuracy | 1M accuracy | Source |
+|-------|--------------|-------------|--------|
+| Opus 4.6 | 93% | 76% | Anthropic blog + independent analysis (Feb 2026) |
+| Sonnet 4.5 | — | 18.5% | Anthropic blog (Feb 2026) |
+| Sonnet 4.6 | Not yet published | Not yet published | — |

-Note: Opus 4.6 retains strong accuracy at 1M (76%), Sonnet 4.5 degrades sharply. The benchmark is specifically the "8-needle 1M variant" measuring retrieval in a 1M-token document. Sonnet 4.6 MRCR scores have not yet been published by Anthropic.
+Note: Opus 4.6 retains strong accuracy at 1M (76%), while Sonnet 4.5 degrades sharply (18.5%). The benchmark is the "8-needle 1M variant": finding 8 specific facts in a 1M-token document. The 93% figure at 256K comes from independent analysis of Anthropic's published data. Community validation: a developer loaded ~733K tokens (4 Harry Potter books), and Opus 4.6 retrieved 49/50 documented spells in a single prompt ([HN, Feb 2026](https://news.ycombinator.com/item?id=46905735)). Sonnet 4.6 MRCR scores have not yet been published.

 **Cost per session (approximate)**