marketing-shibata50/claude-code-ultimate-guide

Florian BRUNIAUX 19110eba22 feat(docs): add comprehensive data privacy documentation v3.2.0

- Create guide/data-privacy.md with retention policies (5y/30d/0)
- Add privacy notice to README.md
- Add section 2.6 "Data Flow & Privacy" to ultimate-guide.md
- Add Golden Rule #7 to cheatsheet.md (know what's sent)
- Add Phase 0.5 Privacy Awareness to onboarding-prompt.md
- Add privacy checks to audit-prompt.md
- Add PRIVACY CHECK section to audit-scan.sh (human + JSON)
- Add privacy reminder to check-claude.sh
- Create privacy-warning.sh SessionStart hook

Addresses user awareness of Anthropic data retention and opt-out options.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-14 20:42:19 +01:00

9 KiB


Data Privacy & Retention Guide

Critical: Everything you share with Claude Code is sent to Anthropic servers. This guide explains what data leaves your machine and how to protect sensitive information.

TL;DR - Retention Summary

Configuration	Retention Period	Training	How to Enable
Default	5 years	Yes	(default state)
Opt-out	30 days	No	claude.ai/settings
Enterprise (ZDR)	0 days	No	Enterprise contract

Immediate action: Disable training data usage to reduce retention from 5 years to 30 days.

1. Understanding the Data Flow

What Leaves Your Machine

When you use Claude Code, the following data is sent to Anthropic:

┌─────────────────────────────────────────────────────────────┐
│                    YOUR LOCAL MACHINE                       │
├─────────────────────────────────────────────────────────────┤
│  • Prompts you type                                         │
│  • Files Claude reads (including .env if not excluded!)     │
│  • MCP server results (SQL queries, API responses)          │
│  • Bash command outputs                                     │
│  • Error messages and stack traces                          │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        ▼ HTTPS
┌─────────────────────────────────────────────────────────────┐
│                    ANTHROPIC API                            │
├─────────────────────────────────────────────────────────────┤
│  • Processes your request                                   │
│  • Stores conversation based on retention policy            │
│  • May use data for model training (if not opted out)       │
└─────────────────────────────────────────────────────────────┘

What This Means in Practice

Scenario	Data Sent to Anthropic
You ask Claude to read `src/app.ts`	Full file contents
You run `git status` via Claude	Command output
MCP executes `SELECT * FROM users`	Query results with user data
Claude reads `.env` file	API keys, passwords, secrets
Error occurs in your code	Full stack trace with paths

2. Anthropic Retention Policies

Tier 1: Default (Training Enabled)

Retention: 5 years
Usage: Model improvement, training data
Applies to: Free, Pro, Max plans without opt-out

Tier 2: Training Disabled (Opt-Out)

Retention: 30 days
Usage: Safety monitoring, abuse prevention only
How to enable:
1. Go to https://claude.ai/settings/data-privacy-controls
2. Disable "Allow model training on your conversations"
3. Changes apply immediately

Tier 3: Enterprise API (Zero Data Retention)

Retention: 0 days (real-time processing only)
Usage: None - data not stored
Requires: Enterprise contract with Anthropic
Use cases: HIPAA, GDPR, PCI-DSS compliance, government contracts

3. Known Risks

Risk 1: Automatic File Reading

Claude Code reads files to understand context. By default, this includes:

.env and .env.local files (API keys, passwords)
credentials.json, secrets.yaml (service accounts)
SSH keys if in workspace scope
Database connection strings

Mitigation: Configure excludePatterns (see Section 4.2).
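
Before configuring exclusions, it can help to see which files in the workspace would be at risk. The following is a minimal sketch, not part of Claude Code: `list_sensitive_files` is a hypothetical helper, and its name patterns are illustrative assumptions based on the file types listed above.

```bash
# Sketch: list files whose names suggest they hold secrets.
# The patterns are illustrative assumptions, not an exhaustive list.
list_sensitive_files() {
    local root="${1:-.}"
    # Skip .git and node_modules, then print regular files matching any pattern.
    find "$root" \
        \( -name .git -o -name node_modules \) -prune -o \
        -type f \( \
            -name '.env' -o -name '.env.*' \
            -o -name 'credentials*' -o -name 'secrets*' \
            -o -name '*.pem' -o -name '*.key' \
            -o -name 'id_rsa' -o -name 'id_ed25519' \
        \) -print | sort
}
```

Running `list_sensitive_files .` from the project root prints one path per line; an empty result means none of these name patterns matched, not that the workspace is free of secrets.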

Risk 2: MCP Database Access

When you configure database MCP servers (Neon, Supabase, PlanetScale):

Your Query: "Show me recent orders"
            ↓
MCP Executes: SELECT * FROM orders LIMIT 100
            ↓
Results Sent: 100 rows with customer names, emails, addresses
            ↓
Stored at Anthropic: According to your retention tier

Mitigation: Never connect production databases. Use dev/staging with anonymized data.
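
As one illustration of anonymizing data before it reaches a dev database, the sketch below rewrites the `email` column of a CSV export with placeholder addresses. `anonymize_emails` is a hypothetical helper; it assumes a header row, comma-separated fields, and no quoted commas, and it leaves other PII columns (names, addresses) untouched.

```bash
# Sketch: replace values in the "email" column of a simple CSV with placeholders.
# Assumes a header row and no quoted commas inside fields.
anonymize_emails() {
    awk -F',' -v OFS=',' '
        NR == 1 {
            # Locate the email column from the header row.
            for (i = 1; i <= NF; i++) if ($i == "email") col = i
            print; next
        }
        col { $col = "user" (NR - 1) "@example.invalid" }
        { print }
    '
}
```

The function reads from stdin and writes to stdout, so an export could be piped through it (`anonymize_emails < orders.csv > orders.anon.csv`, with hypothetical file names). Input without an `email` column passes through unchanged.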

Risk 3: Shell Command Output

Bash commands and their output are included in context:

# This output goes to Anthropic:
$ env | grep API
OPENAI_API_KEY=sk-abc123...
STRIPE_SECRET_KEY=sk_live_...

Mitigation: Use hooks to filter sensitive command outputs.
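
The filtering step itself can be small. The sketch below masks the value of any `NAME=value` line whose name contains a secret-looking keyword. `redact_secrets` is a hypothetical helper and the keyword list is an assumption; how the filter is wired into a hook or wrapper depends on your setup and is not shown.

```bash
# Sketch: mask values of NAME=value lines whose name looks secret-bearing.
# The keyword list (KEY, SECRET, TOKEN, PASSWORD) is an illustrative assumption.
redact_secrets() {
    sed -E 's/^([A-Za-z0-9_]*(KEY|SECRET|TOKEN|PASSWORD)[A-Za-z0-9_]*)=.*/\1=[REDACTED]/'
}
```

For example, `env | grep API | redact_secrets` keeps the variable names but replaces each value with `[REDACTED]`. The match is case-sensitive, so lowercase names such as `password=` are not caught by this sketch.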

Risk 4: Documented Community Incidents

Incident	Source
Claude reads `.env` by default	r/ClaudeAI, GitHub issues
DROP TABLE attempts on poorly configured MCP	r/ClaudeAI
Credentials exposed via environment variables	GitHub issues
Prompt injection via malicious MCP servers	r/programming

4. Protective Measures

Immediate Actions

4.1 Opt-Out of Training

Visit https://claude.ai/settings/data-privacy-controls
Toggle OFF "Allow model training"
Retention drops from 5 years to 30 days

4.2 Configure File Exclusions

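In .claude/settings.json (verify the key against your Claude Code version: the official settings documentation describes file exclusion through permissions.deny rules such as Read(./.env), so excludePatterns may not be honored):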

{
  "excludePatterns": [
    ".env",
    ".env.*",
    "**/.env",
    "**/.env.*",
    "**/credentials*",
    "**/secrets*",
    "**/*.pem",
    "**/*.key",
    "**/service-account*.json"
  ]
}

Or create .claudeignore in project root:

# Secrets
.env
.env.*
*.pem
*.key
credentials.json
secrets/

# Sensitive configs
**/config/production.*

4.3 Use Security Hooks

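Create .claude/hooks/PreToolUse.sh and register it as a PreToolUse hook command in your Claude Code settings. The hook receives the pending tool call as JSON on stdin, with the tool name in tool_name and its arguments in tool_input:

#!/bin/bash
# PreToolUse hook: read the pending tool call (JSON) from stdin.
INPUT=$(cat)
TOOL_NAME=$(printf '%s' "$INPUT" | jq -r '.tool_name // empty')

if [[ "$TOOL_NAME" == "Read" ]]; then
    FILE_PATH=$(printf '%s' "$INPUT" | jq -r '.tool_input.file_path // empty')

    # Block reading sensitive files (unanchored match anywhere in the path)
    SENSITIVE='\.env|credentials|secrets|\.pem|\.key'
    if [[ "$FILE_PATH" =~ $SENSITIVE ]]; then
        echo "BLOCKED: Attempted to read sensitive file: $FILE_PATH" >&2
        exit 2  # Exit code 2 blocks the tool call
    fi
fi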
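
The regular expression in the hook above matches anywhere in the path, so it also blocks harmless files such as src/app.environment.ts or src/keyboard.keymap.ts. A stricter check could match on the file name instead. The sketch below uses a hypothetical helper, `is_sensitive_path`, whose patterns mirror the exclusions from Section 4.2; treat the list as an assumption to adapt.

```bash
# Sketch: decide whether a path names a sensitive file by matching the
# basename (plus any "secrets" directory) rather than any substring.
is_sensitive_path() {
    local path="$1" base
    base="$(basename -- "$path")"
    case "$base" in
        .env|.env.*|credentials*|secrets*|*.pem|*.key) return 0 ;;
    esac
    case "$path" in
        secrets/*|*/secrets/*) return 0 ;;
    esac
    return 1
}
```

Inside the hook, the regex test could then be replaced with `if is_sensitive_path "$FILE_PATH"; then ... exit 2; fi`. The trade-off is fewer false positives at the cost of missing secrets stored under unconventional file names.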

MCP Best Practices

Rule	Rationale
Never connect production databases	All query results sent to Anthropic
Use read-only database users	Prevents DROP/DELETE/UPDATE accidents
Anonymize development data	Reduces PII exposure risk
Create minimal test datasets	Less data = less risk
Audit MCP server sources	Third-party MCPs may have vulnerabilities

For Teams

Environment	Recommendation
Development	Opt-out + exclusions + anonymized data
Staging	Consider Enterprise API if handling real data
Production	NEVER connect Claude Code directly

5. Comparison with Other Tools

Feature	Claude Code + MCP	Cursor	GitHub Copilot
Data scope sent	Full SQL results, files	Code snippets	Code snippets
Production DB access	Yes (via MCP)	Limited	Not designed for
Default retention	5 years	Variable	30 days
Training by default	Yes	Opt-in	Opt-in

Key difference: MCP creates a unique attack surface because MCP servers are separate processes with independent network/filesystem access.

6. Enterprise Considerations

When to Use Enterprise API (ZDR)

Handling PII (names, emails, addresses)
Regulated data or industries (HIPAA, GDPR, PCI-DSS)
Client data processing
Government contracts
Financial services

Evaluation Checklist

Data classification policy exists for your organization
API tier matches data sensitivity requirements
Team trained on privacy controls
Incident response plan for potential data exposure
Legal/compliance review completed

7. Quick Reference

Links

Resource	URL
Privacy settings	https://claude.ai/settings/data-privacy-controls
Anthropic usage policy	https://www.anthropic.com/policies
Enterprise information	https://www.anthropic.com/enterprise
Terms of service	https://www.anthropic.com/legal/consumer-terms

Commands

# Check current Claude config
claude /config

# Verify exclusions are loaded
claude /status

# Run privacy audit
./examples/scripts/audit-scan.sh

Quick Checklist

Training opt-out enabled at claude.ai/settings
.env* files in excludePatterns or .claudeignore
No production database connections via MCP
Security hooks installed for sensitive file access
Team aware of data flow to Anthropic

Changelog

2026-01: Initial version - documenting retention policies and protective measures
