docs: add verified critical bugs tracker (known-issues.md)
NEW: guide/known-issues.md (285 lines)
- GitHub issue auto-creation bug (Issue #13797, v2.0.65+, ACTIVE)
  * 17+ confirmed accidental public disclosures
  * Security/privacy risk documented
  * Workarounds: explicit repo, manual approval, pre-execution verification
- Excessive token consumption (Issue #16856, v2.1.1+, Jan 2026)
  * 20+ reports of 4x+ faster consumption
  * Anthropic: "Not officially confirmed as bug" (investigating)
  * Workarounds: /context monitoring, shorter sessions, disable auto-compact
- Model quality degradation (Aug-Sep 2025, RESOLVED)
  * Anthropic official postmortem: 3 infrastructure bugs
  * Community theories (quantization) debunked

FACT-CHECKED: Perplexity Pro + GitHub API direct queries
- Verified: 5,702 open issues (not 4,697), 527 invalid labels
- Corrected: v2.1.1 token bug (not non-existent v2.0.61)
- Sources: GitHub Issues, Anthropic postmortem, The Register

UPDATED:
- guide/README.md: Added known-issues.md to docs table
- machine-readable/reference.yaml: 4 new entries for issue tracking
- CHANGELOG.md: Documented integration process

NEW: docs/resource-evaluations/023-community-discussions-report-jan2026.md
- Full evaluation process documented
- Fact-check methodology: Perplexity + GitHub API
- Score: 2/5 (Marginal - partial integration only)
- Lesson: Always verify community reports with primary sources

Impact: Critical security awareness for users, actionable workarounds

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Parent: a75c66a890
Commit: 940caf3f1e
5 changed files with 592 additions and 0 deletions

@@ -0,0 +1,281 @@

# Resource Evaluation: Community Discussions Analysis Report (January 2026)

**Evaluated**: January 28, 2026
**Resource Type**: Analytical report (copied text, not URL)
**Target**: Claude Code Ultimate Guide
**Evaluator**: Claude Sonnet 4.5 via /eval-resource skill

---

## 📄 Resource Summary

Comprehensive analytical report titled "Analyse Mensuelle des Discussions Communautaires Claude Code - Janvier 2026" covering:
- 7 months of community sentiment tracking (July 2025 - January 2026)
- Top 5 technical problems (token consumption, context window, model quality, performance, GitHub issue bug)
- Top 5 feature requests
- Longitudinal data analysis across GitHub, Reddit, Discord, Twitter
- Recommendations for Claude Code documentation

**Claimed Coverage**: GitHub (4,697 open issues), Reddit sentiment (28-35/100), Discord discussions, Twitter mentions

---

## 🎯 Evaluation Score

**Initial Score**: 5/5 (Critical - Major gap in guide)
**Post-Challenge Score**: 3/5 (Relevant - Useful complement)
**Post-Fact-Check Score**: **2/5** (Marginal - Minimal mention or skip)

### Score Justification

**Downgrade reasons**:
1. **Major factual errors**: Version 2.0.61 doesn't exist (confused with v2.1.1)
2. **Timing errors**: Token bug was January 2026, not December 2025
3. **Inaccurate or unverifiable stats**: 4,697 open issues claimed (actual: 5,702); sentiment scores lack methodology
4. **Ephemeral data**: Monthly community reports become obsolete quickly
5. **Maintenance burden**: Would require monthly updates (unsustainable)

**Upgrade reasons**:
1. ✅ **Confirmed critical bugs**: GitHub issue auto-creation (Issue #13797), token consumption (Issue #16856)
2. ✅ **Verified with sources**: Anthropic postmortem on Aug 2025 model degradation
3. ✅ **Actionable workarounds**: Practical solutions for users
4. ✅ **Security impact**: Privacy risks from accidental public disclosures

---

## ✅ Fact-Check Results

### Verification Methods

1. **Perplexity Pro searches** (4 queries):
   - Token consumption bug v2.0.61
   - GitHub issues count verification
   - Accidental issue creation bug
   - Model quality degradation August 2025

2. **GitHub API direct queries**:
   - `gh api repos/anthropics/claude-code` → Stats verification
   - `gh search issues` → Bug confirmation, wrong repo issues count
   - `gh issue view` → Specific issue details
   - `gh api releases` → Version existence check
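
The version existence check can be sketched in a few lines of Python. This is a minimal illustration, not the script used for this evaluation: it assumes the release list has already been fetched as JSON from the GitHub REST "list releases" endpoint, where each release object carries a `tag_name` field. The `sample` payload is hand-built from the tags named in this evaluation and is not real API output; `release_tags` and `version_exists` are hypothetical helper names.

```python
import json


def release_tags(releases_json: str) -> set[str]:
    """Collect tag names from a GitHub 'list releases' JSON payload."""
    return {release["tag_name"] for release in json.loads(releases_json)}


def version_exists(releases_json: str, version: str) -> bool:
    """Check whether a version appears among the release tags.

    A leading 'v' is ignored on both sides, so '2.1.1' matches 'v2.1.1'.
    """
    wanted = version.removeprefix("v")
    return any(tag.removeprefix("v") == wanted for tag in release_tags(releases_json))


# Hypothetical sample payload (assumption): only the tags named in this evaluation.
sample = json.dumps(
    [
        {"tag_name": "v2.0.73"},
        {"tag_name": "v2.0.74"},
        {"tag_name": "v2.0.76"},
        {"tag_name": "v2.1.1"},
    ]
)

print(version_exists(sample, "2.0.61"))  # False
print(version_exists(sample, "v2.1.1"))  # True
```

The releases endpoint is paginated, so a real check would need every page before concluding that a version is absent.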

### Key Findings

| Claim | Status | Reality |
|-------|--------|---------|
| v2.0.61 token bug (Dec 2025) | ❌ **FALSE** | v2.0.61 doesn't exist; real bug: v2.1.1 (Jan 2026) |
| 4,697 open issues | ❌ **FALSE** | 5,702 issues (as of Jan 28, 2026) |
| 263 issues labeled "invalid" | ❌ **FALSE** | 527 issues with "invalid" label |
| GitHub auto-creation bug | ✅ **TRUE** | Issue #13797 confirmed, 17+ examples found |
| Token consumption issues | ⚠️ **PARTIAL** | 20+ reports found, but Anthropic has not officially confirmed a bug (investigating) |
| Model degradation Aug 2025 | ✅ **TRUE** | Anthropic official postmortem confirms 3 infrastructure bugs |

### Sources Verified

**✅ Confirmed**:
- [Anthropic Postmortem](https://www.anthropic.com/engineering/a-postmortem-of-three-recent-issues) (Sept 17, 2025)
- [Issue #13797](https://github.com/anthropics/claude-code/issues/13797) - GitHub auto-creation bug
- [Issue #16856](https://github.com/anthropics/claude-code/issues/16856) - Token consumption v2.1.1
- [The Register](https://www.theregister.com/2026/01/05/claude_devs_usage_limits/) - Holiday bonus context

**❌ Not Found**:
- No mention of v2.0.61 in any source
- No public documentation of "263 invalid issues" stat
- No verifiable methodology for "sentiment 28-35/100" score

---

## 🚨 Critical Errors in Report

### Error #1: Version Confusion

**Report claim** (translated from French):
> "Since December 2025 (version 2.0.61), users have been reporting token consumption at 5-20x the normal rate"

**Reality**:
- **v2.0.61 does not exist** in GitHub releases (only v2.0.73, v2.0.74, v2.0.76 found)
- **Real bug**: v2.1.1 (published Jan 7, 2026)
- **First report**: Issue #16856 on January 8, 2026
- **Timing**: January 2026, not December 2025

**Impact**: Critical factual error invalidating major section of report

---

### Error #2: Stats Inflation/Deflation

| Metric | Report | Reality (Jan 28) | Variance |
|--------|--------|------------------|----------|
| Open issues | 4,697 | 5,702 | -1,005 (-17.6%) |
| Issues "invalid" | 263 | 527 | -264 (-50%) |
| Wrong repo issues | 116 (44% of 263) | 17+ confirmed | Overestimated |

**Impact**: Undermines credibility of statistical analysis
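
The variance column for the first two rows can be reproduced with simple arithmetic. The sketch below assumes the percentage is taken relative to the verified figure (the "Reality" column), which is what matches the numbers in the table; `variance` is a hypothetical helper name.

```python
def variance(reported: int, actual: int) -> tuple[int, float]:
    """Return (absolute difference, percentage difference relative to actual)."""
    diff = reported - actual
    return diff, round(100 * diff / actual, 1)


print(variance(4697, 5702))  # (-1005, -17.6)
print(variance(263, 527))    # (-264, -50.1)
```

A negative variance means the report understated the verified figure. The second row comes out at -50.1%, which the table rounds to -50%.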

---

### Error #3: Unverifiable Sentiment Scores

**Report claim**: "Sentiment: 28-35/100 (January 2026)"

**Problem**:
- No methodology disclosed
- No tool/source specified
- Cannot be independently verified
- Likely manual interpretation without systematic measurement

**Impact**: Non-scientific claim presented as quantitative data

---

## ✅ What Was Integrated

### Created: `guide/known-issues.md` (285 lines)

**Section 1: Active Critical Issues**

1. **GitHub Issue Auto-Creation Bug** (Issue #13797)
   - Verified with 17+ examples
   - Security/privacy risk documented
   - Workarounds provided
   - Examples of accidental disclosures

2. **Excessive Token Consumption** (Issue #16856, v2.1.1)
   - 20+ reports documented
   - Anthropic response quoted
   - Holiday bonus context clarified
   - Workarounds for users

**Section 2: Resolved Historical Issues**

3. **Model Quality Degradation (Aug-Sep 2025)**
   - Official Anthropic postmortem linked
   - 3 infrastructure bugs detailed
   - Community theories (quantization) debunked
   - Resolution timeline confirmed

**Section 3: Resources**

- Issue statistics (verified via GitHub API)
- Tracking commands for users
- Official channels list
- Contributing guidelines

---

## ❌ What Was Rejected

1. **Version 2.0.61 references** (non-existent)
2. **December 2025 timing** for token bug (incorrect)
3. **Sentiment scores** without methodology
4. **Incorrect statistics** (4,697 issues, 263 invalid)
5. **Recommendations for Anthropic** (out of scope for user guide)
6. **Monthly update commitment** (unsustainable maintenance)

---

## 📊 Integration Impact

### Files Modified

1. **guide/known-issues.md** (NEW, 285 lines)
   - Comprehensive critical bugs tracker
   - Verified sources only
   - Actionable workarounds
   - Security awareness focus

2. **guide/README.md** (1 line added)
   - Added known-issues.md to table of contents
   - Description: "Critical bugs tracker: security issues, token consumption, verified community reports"

3. **machine-readable/reference.yaml** (4 entries added)
   - `known_issues`: Main file reference
   - `known_issues_github_bug`: Line 16 (GitHub auto-creation)
   - `known_issues_token_consumption`: Line 136 (Token usage)
   - `known_issues_model_quality_aug2025`: Line 231 (Aug 2025 resolved)

4. **CHANGELOG.md** (16 lines added)
   - Documented integration in [Unreleased] > Added
   - Listed all 3 critical issues
   - Noted fact-checking process
   - Verified stats (5,702 issues, 527 invalid labels)

### User Benefits

1. **Security awareness**: Users warned about GitHub auto-creation bug (privacy risk)
2. **Cost management**: Token consumption workarounds documented
3. **Trust building**: Verified facts only, no speculation
4. **Historical context**: Aug 2025 model degradation explained (resolved)
5. **Actionable guidance**: Practical workarounds, not just problem descriptions

---

## 🔍 Methodology Evaluation

### Strengths

- Comprehensive multi-platform analysis (GitHub, Reddit, Discord, Twitter)
- Longitudinal tracking (7 months)
- Identified real patterns (GitHub bug, token issues, model degradation)
- Detailed recommendations structure

### Weaknesses

- **Version confusion**: Mixed up v2.0.61, v2.0.65, v2.1.1
- **Unverified stats**: 4,697 issues, sentiment scores lack source
- **Timing errors**: December vs January for token bug
- **No primary sources cited**: "Mentions 1,250+" without platform breakdown
- **Selection bias**: Community discussions over-represent problems
- **No control group**: No comparison with other tools' issue patterns

### Lesson Learned

**For future resource evaluations**:
1. ✅ **Always fact-check claims** via Perplexity + direct API queries
2. ✅ **Verify versions exist** before documenting bugs
3. ✅ **Request methodology** for statistical claims
4. ✅ **Cross-reference dates** with release timelines
5. ✅ **Challenge auto-agents** to find flaws before integration
6. ❌ **Don't trust community reports blindly** - verify with official sources

---

## 🎯 Final Decision

**Action Taken**: **PARTIAL INTEGRATION** (verified facts only)

**Rationale**:
- Report contained valuable findings (3 real bugs verified)
- But also contained critical errors (version confusion, stat errors)
- Integration limited to fact-checked content only
- Rejected speculative/unverifiable claims

**Confidence Level**: **Medium** (verified sources exist, but report had errors)

**Would Recommend This Resource**: ❌ NO (too many factual errors, use primary sources instead)

**Better Alternative**: Direct GitHub Issues search + Anthropic official communications

---

## 📝 Evaluator Notes

This evaluation demonstrates the importance of **systematic fact-checking** before integrating community-sourced content. Even comprehensive analytical reports can contain:
- Version confusion
- Timing errors
- Unverifiable statistics
- Methodology gaps

**Best practice**: Treat analytical reports as **leads to investigate**, not facts to copy. Always verify with:
1. Primary sources (GitHub Issues, official docs)
2. API queries (GitHub API, not web search)
3. Official communications (Anthropic blog, status page)
4. Multiple independent sources for controversial claims

**Result**: Successfully extracted 3 verified critical bugs while filtering out errors, maintaining guide credibility.

---

**Evaluation completed**: January 28, 2026
**Time invested**: ~2 hours (research, fact-checking, integration, documentation)
**Token cost**: ~130K tokens (Perplexity searches, GitHub queries, document creation)