diff --git a/README.md b/README.md
index 3f3da51..8dcc5c0 100644
--- a/README.md
+++ b/README.md
@@ -335,6 +335,12 @@ This guide is the result of several months of daily practice with Claude Code. I
 - [zebbern/claude-code-guide](https://github.com/zebbern/claude-code-guide) — Comprehensive reference with security focus
 - [ykdojo/claude-code-tips](https://github.com/ykdojo/claude-code-tips) — Practical productivity techniques
 
+**External Research Tools**:
+- [Petri 2.0](https://github.com/safety-research/petri) — Open-source AI behavior audit tool (Anthropic Alignment)
+  - 70 scenarios for collusion, ethics conflicts, info sensitivity
+  - Eval-awareness mitigations + benchmarks (Claude Opus 4.5, GPT-5.2, Gemini 3 Pro, Grok 4)
+  - [Blog](https://alignment.anthropic.com/2026/petri-v2/)
+
diff --git a/guide/data-privacy.md b/guide/data-privacy.md
index b219a58..54dfa6b 100644
--- a/guide/data-privacy.md
+++ b/guide/data-privacy.md
@@ -293,7 +293,45 @@ This guide focuses on Claude Code usage—not legal strategy. For IP guidance, c
 
 ---
 
+## 9. Claude's Governance & Values
+
+### Constitutional AI Framework
+
+Anthropic published Claude's constitution in January 2026 (CC0 license - public domain). This document defines the value hierarchy that guides Claude's behavior:
+
+**Priority Order** (used to resolve conflicts):
+
+1. **Broadly safe** - Never compromise human supervision and control
+2. **Broadly ethical** - Honesty, harm avoidance, good conduct
+3. **Anthropic compliance** - Internal guidelines and policies
+4. **Genuinely helpful** - Real utility for users and society
+
+### What This Means for Claude Code Users
+
+| Scenario | Expected Behavior |
+|----------|-------------------|
+| Security-sensitive requests | Claude prioritizes safety over helpfulness (may be more conservative) |
+| Borderline biology/chemistry | May decline or ask for context to assess safety implications |
+| Ethical conflicts | Will follow hierarchy: safety > ethics > compliance > utility |
+
+### Why This Matters
+
+- **Training data source**: Constitution is used to generate synthetic training examples
+- **Behavior specification**: Reference document explaining intended vs. accidental outputs
+- **Audit & governance**: Provides legal/ethical foundation for compliance reviews
+- **Your own agents**: CC0 license allows reuse/adaptation for custom models
+
+### Resources
+
+- Constitution full text: https://www.anthropic.com/constitution
+- PDF version: https://www-cdn.anthropic.com/.../claudes-constitution.pdf
+- Announcement: https://www.anthropic.com/news/claude-new-constitution
+- Alignment research: https://alignment.anthropic.com/
+
+---
+
 ## Changelog
 
+- 2026-01: Added Claude's governance & constitutional AI framework section
 - 2026-01: Added intellectual property considerations section
 - 2026-01: Initial version - documenting retention policies and protective measures
diff --git a/machine-readable/reference.yaml b/machine-readable/reference.yaml
index 3e4d9b9..a8629d9 100644
--- a/machine-readable/reference.yaml
+++ b/machine-readable/reference.yaml
@@ -277,6 +277,26 @@ deep_dive:
     description: "Real-time monitoring UI for Gas Town and multiclaude (SSE + SQLite)"
     status: "Early preview (Jan 2026, v0.2.0)"
     guide_section: "guide/ai-ecosystem.md:850"
+  # External research & alignment tools
+  external_research:
+    claude_constitution:
+      url: "https://www.anthropic.com/constitution"
+      pdf: "https://www-cdn.anthropic.com/9214f02e82c4489fb6cf45441d448a1ecd1a3aca/claudes-constitution.pdf"
+      announcement: "https://www.anthropic.com/news/claude-new-constitution"
+      description: "Claude's Constitutional AI framework - value hierarchy (safety > ethics > compliance > utility)"
+      license: "CC0 1.0 (public domain)"
+      published: "2026-01-21"
+      guide_section: "guide/data-privacy.md:296"
+    petri_v2:
+      repo: "https://github.com/safety-research/petri"
+      blog: "https://alignment.anthropic.com/2026/petri-v2/"
+      description: "Open-source AI behavior audit tool (Anthropic Alignment Science)"
+      features:
+        - "70 scenarios: collusion, ethics conflicts, info sensitivity"
+        - "Eval-awareness mitigations"
+        - "Benchmarks: Claude Opus 4.5, GPT-5.2, Gemini 3 Pro, Grok 4"
+      published: "2026-01-21"
+      guide_section: "README.md:338"
   # Section 9.18 - Codebase Design for Agent Productivity
   codebase_design_agents: 9976
   codebase_design_source: "https://marmelab.com/blog/2026/01/21/agent-experience.html"