feat(docs): add LLM Handbook + Google Whitepaper integration v3.3.0

Advanced Guardrails:
- prompt-injection-detector.sh (PreToolUse)
- output-validator.sh (PostToolUse heuristics)
- claudemd-scanner.sh (SessionStart injection detection)
- output-secrets-scanner.sh (PostToolUse secrets leak prevention)

Observability & Monitoring:
- session-logger.sh (JSONL activity logging)
- session-stats.sh (cost tracking & analysis)
- guide/observability.md (full documentation)

LLM-as-a-Judge Evaluation:
- output-evaluator.md agent (Haiku)
- /validate-changes command
- pre-commit-evaluator.sh (opt-in git hook)

Google Agent Whitepaper Integration:
- Context Triage Guide (Section 2.2.4)
- CLAUDE.md Injection Warning (Section 3.1.3)
- Agent Validation Checklist (Section 4.2.4)
- MCP Security: Tool Shadowing & Confused Deputy (Section 8.6)
- Session vs Memory patterns (Section 3.3.3)

Stats: 10 new files, 8 modified, 5 new guide sections

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
Florian BRUNIAUX 2026-01-14 21:00:49 +01:00
parent 19110eba22
commit 8a4d116e2e
17 changed files with 2188 additions and 3 deletions

View file

@ -8,6 +8,8 @@ Hooks are scripts that execute automatically on Claude Code events. They enable
|------|-------|---------|----------|
| [dangerous-actions-blocker.sh](./bash/dangerous-actions-blocker.sh) | PreToolUse | Block dangerous commands/edits | Bash |
| [security-check.sh](./bash/security-check.sh) | PreToolUse | Block secrets in commands | Bash |
| [claudemd-scanner.sh](./bash/claudemd-scanner.sh) | SessionStart | Detect CLAUDE.md injection attacks | Bash |
| [output-secrets-scanner.sh](./bash/output-secrets-scanner.sh) | PostToolUse | Detect secrets in tool outputs | Bash |
| [auto-format.sh](./bash/auto-format.sh) | PostToolUse | Auto-format after edits | Bash |
| [notification.sh](./bash/notification.sh) | Notification | Contextual macOS sound alerts | Bash (macOS) |
| [security-check.ps1](./powershell/security-check.ps1) | PreToolUse | Block secrets in commands | PowerShell |
@ -25,6 +27,99 @@ Hooks are scripts that execute automatically on Claude Code events. They enable
| `SessionEnd` | At session end | Cleanup, session summary |
| `Stop` | User interrupts operation | State saving, graceful shutdown |
## Advanced Guardrails (NEW in v3.3.0)
Advanced protection patterns inspired by production LLM systems.
### prompt-injection-detector.sh
**Event**: `PreToolUse`
Detects and blocks prompt injection attempts before they reach Claude:
**Detected Patterns**:
- Role override: "ignore previous instructions", "you are now", "pretend to be"
- Jailbreak attempts: "DAN mode", "developer mode", "no restrictions"
- Delimiter injection: `</system>`, `[INST]`, `<<SYS>>`
- Authority impersonation: "anthropic employee", "authorized to bypass"
- Base64-encoded payloads (decoded and scanned)
- Context manipulation: false claims about previous messages
**Configuration**:
```json
{
"hooks": {
"PreToolUse": [{
"hooks": [{
"type": "command",
"command": "$CLAUDE_PROJECT_DIR/.claude/hooks/prompt-injection-detector.sh",
"timeout": 5000
}]
}]
}
}
```
### output-validator.sh
**Event**: `PostToolUse`
Heuristic validation of Claude's outputs (no LLM call, pure bash):
**Validation Checks**:
- Placeholder paths: `/path/to/`, `/your/project/`
- Placeholder content: `TODO:`, `your-api-key`, `example.com`
- Potential secrets in output (regex patterns)
- Uncertainty indicators (multiple "I'm not sure", "probably")
- Incomplete implementations: `NotImplementedError`, `throw new Error`
- Unverified reference claims
**Behavior**: Warns via `systemMessage`, does not block. For deeper validation, use the `output-evaluator` agent.
### session-logger.sh
**Event**: `PostToolUse`
Logs all Claude operations to JSONL files for monitoring and cost tracking:
**Log Location**: `~/.claude/logs/activity-YYYY-MM-DD.jsonl`
**Logged Data**:
- Timestamp, session ID, tool name
- File paths and commands (truncated)
- Project name
- Token estimates (input/output)
**Analysis**: Use `session-stats.sh` script to analyze logs.
**Environment Variables**:
| Variable | Default | Description |
|----------|---------|-------------|
| `CLAUDE_LOG_DIR` | `~/.claude/logs` | Log directory |
| `CLAUDE_LOG_TOKENS` | `true` | Enable token estimation |
| `CLAUDE_SESSION_ID` | auto | Custom session ID |
See [Observability Guide](../../guide/observability.md) for full documentation.
### pre-commit-evaluator.sh
**Type**: Git pre-commit hook (not Claude hook)
LLM-as-a-Judge evaluation before every commit. **Opt-in only** due to API costs.
**Installation**:
```bash
cp pre-commit-evaluator.sh .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit
export CLAUDE_PRECOMMIT_EVAL=1 # Enable evaluation
```
**Cost**: ~$0.01-0.05 per commit (Haiku model)
**Bypass**: `git commit --no-verify` or `CLAUDE_SKIP_EVAL=1 git commit`
---
## Security Hooks
### dangerous-actions-blocker.sh
@ -77,6 +172,72 @@ Focused on detecting secrets in commands:
- Private keys
- Hardcoded tokens
### claudemd-scanner.sh
**Event**: `SessionStart`
Scans CLAUDE.md files at session start for potential prompt injection attacks:
**Detected Patterns**:
- "ignore previous instructions" variants
- Shell injection: `curl | bash`, `wget | sh`, `eval(`
- Base64 encoded content (potential obfuscation)
- Hidden instructions in HTML comments
- Suspicious long lines (>500 chars)
- Non-ASCII characters near sensitive keywords (homoglyph attacks)
**Files Scanned**:
- `CLAUDE.md` (project root)
- `.claude/CLAUDE.md` (local override)
- Any `.md` files in `.claude/` directory
**Why This Matters**: When you clone an unfamiliar repository, a malicious CLAUDE.md could inject instructions that compromise your system. This hook warns you before Claude processes potentially dangerous instructions.
**Configuration**:
```json
{
"hooks": {
"SessionStart": [{
"hooks": [{
"type": "command",
"command": ".claude/hooks/claudemd-scanner.sh",
"timeout": 5000
}]
}]
}
}
```
### output-secrets-scanner.sh
**Event**: `PostToolUse`
Complements `security-check.sh` by scanning tool **outputs** (not inputs) for leaked secrets.
**Detected Patterns**:
- API Keys: OpenAI, Anthropic, AWS, GCP, Azure, Stripe, Twilio, SendGrid
- Tokens: GitHub, GitLab, NPM, PyPI, JWT
- Private Keys: RSA, EC, DSA, OpenSSH, PGP
- Database URLs with embedded passwords
- Generic `api_key=`, `secret=`, `password=` patterns
**Why This Matters**: Claude might read a `.env` file and include credentials in its response or a commit. This hook catches secrets before they leak.
**Configuration**:
```json
{
"hooks": {
"PostToolUse": [{
"hooks": [{
"type": "command",
"command": ".claude/hooks/output-secrets-scanner.sh",
"timeout": 5000
}]
}]
}
}
```
## Productivity Hooks
### auto-format.sh / auto-format.ps1