docs: add verified critical bugs tracker (known-issues.md)

NEW: guide/known-issues.md (285 lines)
- GitHub issue auto-creation bug (Issue #13797, v2.0.65+, ACTIVE)
  * 17+ confirmed accidental public disclosures
  * Security/privacy risk documented
  * Workarounds: explicit repo, manual approval, pre-execution verification
- Excessive token consumption (Issue #16856, v2.1.1+, Jan 2026)
  * 20+ reports of 4x+ faster consumption
  * Anthropic: "Not officially confirmed as bug" (investigating)
  * Workarounds: /context monitoring, shorter sessions, disable auto-compact
- Model quality degradation (Aug-Sep 2025, RESOLVED)
  * Anthropic official postmortem: 3 infrastructure bugs
  * Community theories (quantization) debunked

FACT-CHECKED: Perplexity Pro + GitHub API direct queries
- Verified: 5,702 open issues (not 4,697), 527 invalid labels
- Corrected: v2.1.1 token bug (not non-existent v2.0.61)
- Sources: GitHub Issues, Anthropic postmortem, The Register

UPDATED:
- guide/README.md: Added known-issues.md to docs table
- machine-readable/reference.yaml: 4 new entries for issue tracking
- CHANGELOG.md: Documented integration process

NEW: docs/resource-evaluations/023-community-discussions-report-jan2026.md
- Full evaluation process documented
- Fact-check methodology: Perplexity + GitHub API
- Score: 2/5 (Marginal - partial integration only)
- Lesson: Always verify community reports with primary sources

Impact: Critical security awareness for users, actionable workarounds

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Florian BRUNIAUX 2026-01-28 17:59:16 +01:00
parent a75c66a890
commit 940caf3f1e
5 changed files with 592 additions and 0 deletions


@ -0,0 +1,281 @@
# Resource Evaluation: Community Discussions Analysis Report (January 2026)
**Evaluated**: January 28, 2026
**Resource Type**: Analytical report (copied text, not URL)
**Target**: Claude Code Ultimate Guide
**Evaluator**: Claude Sonnet 4.5 via /eval-resource skill
---
## 📄 Resource Summary
Comprehensive analytical report titled "Analyse Mensuelle des Discussions Communautaires Claude Code - Janvier 2026" ("Monthly Analysis of Claude Code Community Discussions - January 2026") covering:
- 7 months of community sentiment tracking (July 2025 - January 2026)
- Top 5 technical problems (token consumption, context window, model quality, performance, GitHub issue bug)
- Top 5 feature requests
- Longitudinal data analysis across GitHub, Reddit, Discord, Twitter
- Recommendations for Claude Code documentation
**Claimed Coverage**: GitHub (4,697 open issues), Reddit sentiment (28-35/100), Discord discussions, Twitter mentions
---
## 🎯 Evaluation Score
**Initial Score**: 5/5 (Critical - Major gap in guide)
**Post-Challenge Score**: 3/5 (Relevant - Useful complement)
**Post-Fact-Check Score**: **2/5** (Marginal - Minimal mention or skip)
### Score Justification
**Downgrade reasons**:
1. **Major factual errors**: Version 2.0.61 doesn't exist (confused with v2.1.1)
2. **Timing errors**: Token bug was January 2026, not December 2025
3. **Inaccurate or unverifiable stats**: 4,697 open issues claimed (reality: 5,702); sentiment scores lack methodology
4. **Ephemeral data**: Monthly community reports become obsolete quickly
5. **Maintenance burden**: Would require monthly updates (unsustainable)
**Upgrade reasons**:
1. ✅ **Confirmed critical bugs**: GitHub issue auto-creation (Issue #13797), token consumption (Issue #16856)
2. ✅ **Verified with sources**: Anthropic postmortem on Aug 2025 model degradation
3. ✅ **Actionable workarounds**: Practical solutions for users
4. ✅ **Security impact**: Privacy risks from accidental public disclosures
---
## ✅ Fact-Check Results
### Verification Methods
1. **Perplexity Pro searches** (4 queries):
   - Token consumption bug v2.0.61
   - GitHub issues count verification
   - Accidental issue creation bug
   - Model quality degradation August 2025
2. **GitHub API direct queries**:
   - `gh api repos/anthropics/claude-code` → Stats verification
   - `gh search issues` → Bug confirmation, wrong repo issues count
   - `gh issue view` → Specific issue details
   - `gh api releases` → Version existence check
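
The GitHub queries above can also be reproduced against the public REST API. The following is a minimal sketch, assuming the standard `api.github.com` endpoints. The helper names (`repo_url`, `release_by_tag_url`, `label_search_url`, `fetch_json`) are illustrative and were not part of the original evaluation, and unauthenticated requests are rate-limited.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

API = "https://api.github.com"
REPO = "anthropics/claude-code"


def repo_url(repo: str = REPO) -> str:
    """Repository metadata; the response includes `open_issues_count`."""
    return f"{API}/repos/{repo}"


def release_by_tag_url(tag: str, repo: str = REPO) -> str:
    """Release lookup by tag; GitHub answers 404 when no such release exists."""
    return f"{API}/repos/{repo}/releases/tags/{urllib.parse.quote(tag)}"


def label_search_url(label: str, repo: str = REPO) -> str:
    """Issue search by label; the response includes `total_count`."""
    query = f"repo:{repo} is:issue label:{label}"
    return f"{API}/search/issues?" + urllib.parse.urlencode({"q": query})


def fetch_json(url: str):
    """Return the parsed JSON body, or None on HTTP 404. Needs network access."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # e.g. no release published under that tag
        raise
```

Under these assumptions, `fetch_json(release_by_tag_url("v2.0.61"))` returning `None` would indicate that no release is published under that tag. Note that `open_issues_count` in the repository response counts open pull requests as well as issues, so it may differ from an issues-only search count.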
### Key Findings
| Claim | Status | Reality |
|-------|--------|---------|
| v2.0.61 token bug (Dec 2025) | ❌ **FALSE** | v2.0.61 doesn't exist; real bug: v2.1.1 (Jan 2026) |
| 4,697 open issues | ❌ **FALSE** | 5,702 issues (as of Jan 28, 2026) |
| 263 issues labeled "invalid" | ❌ **FALSE** | 527 issues with "invalid" label |
| GitHub auto-creation bug | ✅ **TRUE** | Issue #13797 confirmed, 17+ examples found |
| Token consumption issues | ⚠️ **PARTIAL** | 20+ reports found, but Anthropic has not officially confirmed a bug |
| Model degradation Aug 2025 | ✅ **TRUE** | Anthropic official postmortem confirms 3 infrastructure bugs |
### Sources Verified
**✅ Confirmed**:
- [Anthropic Postmortem](https://www.anthropic.com/engineering/a-postmortem-of-three-recent-issues) (Sept 17, 2025)
- [Issue #13797](https://github.com/anthropics/claude-code/issues/13797) - GitHub auto-creation bug
- [Issue #16856](https://github.com/anthropics/claude-code/issues/16856) - Token consumption v2.1.1
- [The Register](https://www.theregister.com/2026/01/05/claude_devs_usage_limits/) - Holiday bonus context
**❌ Not Found**:
- No mention of v2.0.61 in any source
- No public documentation of "263 invalid issues" stat
- No verifiable methodology for "sentiment 28-35/100" score
---
## 🚨 Critical Errors in Report
### Error #1: Version Confusion
**Report claim** (translated from French):
> "Since December 2025 (version 2.0.61), users have been reporting token consumption 5-20x normal"
**Reality**:
- **v2.0.61 does not exist** in GitHub releases (only v2.0.73, v2.0.74, v2.0.76 found)
- **Real bug**: v2.1.1 (published Jan 7, 2026)
- **First report**: Issue #16856 on January 8, 2026
- **Timing**: January 2026, not December 2025
**Impact**: Critical factual error invalidating major section of report
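
The two checks behind this finding (does the version exist, and does the claimed timing fit the release date) can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and `observed_tags` holds just the v2.0.x tags this evaluation names, not a complete release list.

```python
from datetime import date

# v2.0.x release tags the evaluation reports finding (not a complete list).
observed_tags = {"v2.0.73", "v2.0.74", "v2.0.76"}


def normalize(tag: str) -> str:
    """Treat 'v2.1.1' and '2.1.1' as the same version string."""
    return tag.strip().lstrip("vV")


def version_observed(version: str, tags) -> bool:
    """True if the version appears among the known release tags."""
    return normalize(version) in {normalize(t) for t in tags}


def not_before_release(release_day: date, event_day: date) -> bool:
    """A bug cannot be observed before the release it is attributed to."""
    return event_day >= release_day


# Dates taken from the findings above.
v2_1_1_published = date(2026, 1, 7)
first_report = date(2026, 1, 8)     # Issue #16856
claimed_onset = date(2025, 12, 31)  # latest possible day of "December 2025"

print(version_observed("v2.0.61", observed_tags))           # False
print(not_before_release(v2_1_1_published, first_report))   # True
print(not_before_release(v2_1_1_published, claimed_onset))  # False
```

Even the latest day in December 2025 predates the v2.1.1 release, which is why the report's timing cannot refer to this bug.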
---
### Error #2: Stats Inflation/Deflation
| Metric | Report | Reality (Jan 28) | Variance (report vs. reality) |
|--------|--------|------------------|----------|
| Open issues | 4,697 | 5,702 | -1,005 (-17.6%) |
| Issues "invalid" | 263 | 527 | -264 (-50.1%) |
| Wrong repo issues | 116 (44% of 263) | 17+ confirmed | Overestimated |
**Impact**: Undermines credibility of statistical analysis
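
For reference, the Variance column is the reported value minus the verified value, expressed relative to the verified value. A short sketch of that arithmetic, rounded to one decimal place (the `variance` helper is illustrative):

```python
def variance(reported: int, actual: int):
    """Return (absolute, percent) difference of reported relative to actual."""
    diff = reported - actual
    return diff, round(100 * diff / actual, 1)


rows = {
    "Open issues": (4697, 5702),
    'Issues "invalid"': (263, 527),
}

for name, (reported, actual) in rows.items():
    diff, pct = variance(reported, actual)
    print(f"{name}: {diff:+,} ({pct:+.1f}%)")
# Open issues: -1,005 (-17.6%)
# Issues "invalid": -264 (-50.1%)
```

A negative value means the report undercounted relative to the GitHub API figures of January 28, 2026.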
---
### Error #3: Unverifiable Sentiment Scores
**Report claim**: "Sentiment: 28-35/100 (January 2026)"
**Problem**:
- No methodology disclosed
- No tool/source specified
- Cannot be independently verified
- Likely manual interpretation without systematic measurement
**Impact**: Non-scientific claim presented as quantitative data
---
## ✅ What Was Integrated
### Created: `guide/known-issues.md` (285 lines)
**Section 1: Active Critical Issues**
1. **GitHub Issue Auto-Creation Bug** (Issue #13797)
   - Verified with 17+ examples
   - Security/privacy risk documented
   - Workarounds provided
   - Examples of accidental disclosures
2. **Excessive Token Consumption** (Issue #16856, v2.1.1)
   - 20+ reports documented
   - Anthropic response quoted
   - Holiday bonus context clarified
   - Workarounds for users
**Section 2: Resolved Historical Issues**
3. **Model Quality Degradation (Aug-Sep 2025)**
   - Official Anthropic postmortem linked
   - 3 infrastructure bugs detailed
   - Community theories (quantization) debunked
   - Resolution timeline confirmed
**Section 3: Resources**
- Issue statistics (verified via GitHub API)
- Tracking commands for users
- Official channels list
- Contributing guidelines
---
## ❌ What Was Rejected
1. **Version 2.0.61 references** (non-existent)
2. **December 2025 timing** for token bug (incorrect)
3. **Sentiment scores** without methodology
4. **Inaccurate statistics** (4,697 issues, 263 invalid; both contradicted by the GitHub API)
5. **Recommendations for Anthropic** (out of scope for user guide)
6. **Monthly update commitment** (unsustainable maintenance)
---
## 📊 Integration Impact
### Files Modified
1. **guide/known-issues.md** (NEW, 285 lines)
   - Comprehensive critical bugs tracker
   - Verified sources only
   - Actionable workarounds
   - Security awareness focus
2. **guide/README.md** (1 line added)
   - Added known-issues.md to table of contents
   - Description: "Critical bugs tracker: security issues, token consumption, verified community reports"
3. **machine-readable/reference.yaml** (4 entries added)
   - `known_issues`: Main file reference
   - `known_issues_github_bug`: Line 16 (GitHub auto-creation)
   - `known_issues_token_consumption`: Line 136 (Token usage)
   - `known_issues_model_quality_aug2025`: Line 231 (Aug 2025 resolved)
4. **CHANGELOG.md** (16 lines added)
   - Documented integration in [Unreleased] > Added
   - Listed all 3 critical issues
   - Noted fact-checking process
   - Verified stats (5,702 issues, 527 invalid labels)
### User Benefits
1. **Security awareness**: Users warned about GitHub auto-creation bug (privacy risk)
2. **Cost management**: Token consumption workarounds documented
3. **Trust building**: Verified facts only, no speculation
4. **Historical context**: Aug 2025 model degradation explained (resolved)
5. **Actionable guidance**: Practical workarounds, not just problem descriptions
---
## 🔍 Methodology Evaluation
### Strengths
- Comprehensive multi-platform analysis (GitHub, Reddit, Discord, Twitter)
- Longitudinal tracking (7 months)
- Identified real patterns (GitHub bug, token issues, model degradation)
- Detailed recommendations structure
### Weaknesses
- **Version confusion**: Mixed up v2.0.61, v2.0.65, v2.1.1
- **Inaccurate or unsourced stats**: 4,697 issues (actual: 5,702); sentiment scores lack a source
- **Timing errors**: December vs January for token bug
- **No primary sources cited**: "Mentions 1,250+" without platform breakdown
- **Selection bias**: Community discussions over-represent problems
- **No control group**: No comparison with other tools' issue patterns
### Lesson Learned
**For future resource evaluations**:
1. ✅ **Always fact-check claims** via Perplexity + direct API queries
2. ✅ **Verify versions exist** before documenting bugs
3. ✅ **Request methodology** for statistical claims
4. ✅ **Cross-reference dates** with release timelines
5. ✅ **Challenge auto-agents** to find flaws before integration
6. ❌ **Don't trust community reports blindly** - verify with official sources
---
## 🎯 Final Decision
**Action Taken**: **PARTIAL INTEGRATION** (verified facts only)
**Rationale**:
- Report contained valuable findings (3 real bugs verified)
- But also contained critical errors (version confusion, stat errors)
- Integration limited to fact-checked content only
- Rejected speculative/unverifiable claims
**Confidence Level**: **Medium** (verified sources exist, but report had errors)
**Would Recommend This Resource**: ❌ NO (too many factual errors; use primary sources instead)
**Better Alternative**: Direct GitHub Issues search + Anthropic official communications
---
## 📝 Evaluator Notes
This evaluation demonstrates the importance of **systematic fact-checking** before integrating community-sourced content. Even comprehensive analytical reports can contain:
- Version confusion
- Timing errors
- Unverifiable statistics
- Methodology gaps
**Best practice**: Treat analytical reports as **leads to investigate**, not facts to copy. Always verify with:
1. Primary sources (GitHub Issues, official docs)
2. API queries (GitHub API, not web search)
3. Official communications (Anthropic blog, status page)
4. Multiple independent sources for controversial claims
**Result**: Successfully extracted 3 verified critical bugs while filtering out errors, maintaining guide credibility.
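
The "leads, not facts" practice can be sketched as a small triage step: every claim starts unverified and is integrated only once it carries a verdict and a primary source. The `Claim` and `Status` types below are hypothetical illustrations, not part of the guide's tooling. The sample data mirrors the Key Findings table.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    UNVERIFIED = "unverified"  # default: a lead to investigate
    CONFIRMED = "confirmed"    # backed by a primary source
    PARTIAL = "partial"        # reports exist, cause not confirmed
    REFUTED = "refuted"        # contradicted by a primary source


@dataclass(frozen=True)
class Claim:
    text: str
    status: Status = Status.UNVERIFIED
    source: str = ""           # primary source backing the verdict


def integrable(claims):
    """Keep claims that are confirmed or partial and cite a source."""
    keep = {Status.CONFIRMED, Status.PARTIAL}
    return [c for c in claims if c.status in keep and c.source]


claims = [
    Claim("v2.0.61 token bug (Dec 2025)", Status.REFUTED, "GitHub releases"),
    Claim("GitHub auto-creation bug", Status.CONFIRMED, "Issue #13797"),
    Claim("Token consumption issues", Status.PARTIAL, "Issue #16856"),
    Claim("Model degradation Aug 2025", Status.CONFIRMED, "Anthropic postmortem"),
    Claim("Sentiment 28-35/100"),  # no methodology, so it stays a lead
]

for claim in integrable(claims):
    print(f"{claim.text} -> {claim.source}")
```

Defaulting to `UNVERIFIED` means a claim nobody has checked can never slip into the guide by omission.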
---
**Evaluation completed**: January 28, 2026
**Time invested**: ~2 hours (research, fact-checking, integration, documentation)
**Token cost**: ~130K tokens (Perplexity searches, GitHub queries, document creation)