Add security sanitizer, opt-in telemetry, and contributor guidelines

Infrastructure:
- security/: PII sanitizer with scan/sanitize modes, pre-commit hook, configurable blocklists
- telemetry/: GStack-style opt-in usage analytics, local stats viewer, version checker
- CONTRIBUTING.md: Privacy-first contributor guidelines with anonymization rules
- VERSION: 1.0.0

README updated with Privacy & Security and Telemetry sections.
2026-03-31 08:41:35 -07:00
commit d4c8c21cb3
parent 36d6ed83e7
12 changed files with 1402 additions and 4 deletions
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -0,0 +1,107 @@
+# Contributing to AI Marketing Skills
+
+AI Marketing Skills is an open-source collection of production marketing automation skills. Thanks for contributing.
+
+- **Repo:** [github.com/singlegrain/ai-marketing-skills](https://github.com/singlegrain/ai-marketing-skills)
+- **README:** [README.md](./README.md)
+
+---
+
+## 🔒 Data Privacy & Anonymization
+
+**This is the #1 rule. No exceptions.**
+
+ALL example outputs, training data, sample data, and test fixtures MUST be fully anonymized before commit. Real client data, revenue figures, or internal metrics are **never** acceptable in any commit.
+
+| Data Type | Rule | Example |
+|-----------|------|---------|
+| Company names | Use fictional names | "Acme Corp", "TechStart Inc" |
+| Person names | Use fictional names | "Jane Smith", "John Doe" |
+| Email addresses | Use example.com domain | jane@example.com |
+| Phone numbers | Use 555-xxxx format | 555-0142 |
+| Dollar amounts | Use round fictional numbers | $50,000 |
+| API keys/tokens | Use obvious placeholders | `sk-your-key-here` |
+
+**Before every commit**, run the sanitizer:
+
+```bash
+python3 security/sanitizer.py --scan --dir . --recursive
+```
+
+The pre-commit hook will block commits with detected PII. See [security/README.md](./security/README.md) for setup.
+
+---
+
+## Skill Structure
+
+Every skill category requires these files:
+
+```
+skill-category/
+├── SKILL.md            # Claude Code skill definition (name, description, steps)
+├── README.md           # Overview, quick start, architecture, examples
+├── requirements.txt    # Python dependencies
+└── *.py                # Implementation scripts
+```
+
+- **SKILL.md** follows Claude Code skill conventions: name, description, numbered steps.
+- **README.md** includes: overview, quick start, architecture, examples, and the standard footer.
+- **Python scripts** use `argparse` for CLI, include clear API stubs with comments, and handle missing dependencies gracefully.
+
+---
+
+## Code Standards
+
+- **Python 3.10+**
+- Use `argparse` for all CLI interfaces with `--help` documentation
+- **Graceful failures:** Never crash on missing API keys. Show a helpful error message instead.
+- Type hints encouraged but not required
+- Prefer stdlib. No external dependencies without justification in your PR description.
+
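+```python
+import os
+import sys
+
+# Good: check first and exit with a helpful message
+if not os.environ.get("API_KEY"):
+    print("Error: Set API_KEY environment variable. Get one at https://...", file=sys.stderr)
+    sys.exit(1)
+
+# Bad: raises KeyError if the variable is missing
+api_key = os.environ["API_KEY"]
+```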
+
+---
+
+## Telemetry Integration
+
+New skills **must** integrate telemetry logging. See [telemetry/README.md](./telemetry/README.md) for the integration guide.
+
+- Add a version check to your SKILL.md preamble
+- **Never log sensitive data through telemetry** (API keys, PII, client data)
+
+---
+
+## Pull Request Process
+
+1. **Fork** the repo
+2. **Branch** from `main` (use descriptive branch names: `feat/email-drip-skill`, `fix/sanitizer-regex`)
+3. **Build** your changes
+4. **Verify** before submitting:
+   - All Python files compile cleanly: `python3 -m py_compile your_file.py`
+   - Sanitizer scan passes: `python3 security/sanitizer.py --scan --dir . --recursive`
+5. **Open a PR** with a description that includes:
+   - What it does
+   - Which skill category it affects
+   - ✅ Confirmation that all data is anonymized
+
+---
+
+## Reporting Security Issues
+
+Found a vulnerability? **Do not open a public issue.**
+
+Email [security@singlegrain.com](mailto:security@singlegrain.com) with details. We'll respond within 48 hours.
+
+---
+
+<p align="center">
+  Built by <a href="https://www.singlegrain.com/?utm_source=github&utm_medium=repo&utm_campaign=ai-marketing-skills">Single Grain</a>. Powered by <a href="https://www.singlebrain.com/?utm_source=github&utm_medium=repo&utm_campaign=ai-marketing-skills">Single Brain</a>.
+</p>
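A minimal sketch of a skill script that follows the Code Standards in CONTRIBUTING.md above: an `argparse` CLI and a graceful exit when the API key is missing. The `--topic` flag and the `run_skill` stub are hypothetical, chosen only for illustration, and `API_KEY` is the variable name used in the guidelines' own example.

```python
import argparse
import os
import sys
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """CLI definition; argparse generates --help automatically."""
    parser = argparse.ArgumentParser(description="Example skill (illustrative only).")
    parser.add_argument("--topic", required=True, help="Topic to process")
    return parser


def run_skill(topic: str, api_key: str) -> str:
    """Hypothetical API stub: a real skill would call its service here."""
    return f"processed {topic}"


def main(argv: Optional[list[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    api_key = os.environ.get("API_KEY")
    if not api_key:
        # Graceful failure: explain what to do instead of raising KeyError.
        print("Error: Set the API_KEY environment variable.", file=sys.stderr)
        return 1
    print(run_skill(args.topic, api_key))
    return 0
```

In a real script, `main()` would be called as `sys.exit(main())` under the usual `if __name__ == "__main__":` guard. Returning an exit code instead of calling `sys.exit` inside `main` keeps the function easy to test.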
--- a/README.md
+++ b/README.md
@@ -137,15 +137,56 @@ ai-marketing-skills/

 ---

+## 🔒 Privacy & Security
+
+Every skill is built with data privacy in mind:
+
+- **PII Sanitizer** scans code and data for sensitive information before commits (`security/sanitizer.py`)
+- **Pre-commit hook** blocks commits containing detected PII patterns
+- **Configurable blocklists** for company names, person names, and custom patterns
+- See [`security/README.md`](./security/README.md) for setup
+
+```bash
+# Scan for sensitive data
+python3 security/sanitizer.py --scan --dir . --recursive
+
+# Install the pre-commit hook
+cp security/pre-commit-hook.sh .git/hooks/pre-commit && chmod +x .git/hooks/pre-commit
+```
+
+---
+
+## 📡 Telemetry (Opt-In)
+
+Anonymous usage telemetry helps us understand which skills people actually use. Remote reporting is opt-in and privacy-first:
+
+- **Local logging always:** see your own usage stats in `~/.ai-marketing-skills/analytics/`
+- **Remote reporting optional:** only if you explicitly opt in on first run
+- **Data collected:** skill name, duration, success/fail, version, OS, architecture, Python version, timestamp, and a random device ID. No code, no file paths, no repo content.
+- **Version checks:** get notified when a newer release is available
+
+```bash
+# View your local usage stats
+python3 telemetry/telemetry_report.py
+
+# Check for updates
+python3 telemetry/version_check.py
+```
+
+See [`telemetry/README.md`](./telemetry/README.md) for details.
+
+---
+
 ## 🤝 Contributing

-Found a bug? Have an improvement? PRs welcome.
+Found a bug? Have an improvement? PRs welcome. Read [`CONTRIBUTING.md`](./CONTRIBUTING.md) for guidelines.

 1. Fork the repo
 2. Create your feature branch (`git checkout -b feature/better-scoring`)
-3. Commit your changes
-4. Push to the branch
-5. Open a Pull Request
+3. Run `python3 security/sanitizer.py --scan --dir . --recursive` before committing
+4. Commit your changes
+5. Push to the branch
+6. Open a Pull Request

 ---

--- a/VERSION
+++ b/VERSION
@@ -0,0 +1 @@
+1.0.0
--- a/security/README.md
+++ b/security/README.md
@@ -0,0 +1,80 @@
+# Security Sanitizer
+
+Scans and redacts PII / sensitive data from files in this repo.
+
+## What It Catches
+
+| Type | Examples |
+|------|----------|
+| **EMAIL** | user@example.com |
+| **PHONE** | (555) 123-4567, +1-555-123-4567 |
+| **SSN** | 123-45-6789 |
+| **API_KEY** | sk-xxx, ghp_xxx, op_xxx, Bearer tokens, KEY=value patterns |
+| **IP_ADDRESS** | Public IPv4 addresses (skips localhost/private ranges) |
+| **URL_CREDENTIALS** | https://user:pass@host.com |
+| **AMOUNT** | $1,234.56, $1.2M, $500K |
+| **COMPANY** | Names from blocklist (configurable) |
+| **PERSON** | Names from blocklist + title-prefixed names (Mr./Dr./etc.) |
+| **CUSTOM** | Any regex you add to the config |
+
+## Quick Start
+
+```bash
+# Scan the whole repo (dry run — changes nothing)
+python3 security/sanitizer.py --scan --dir . --recursive
+
+# Scan a single file
+python3 security/sanitizer.py --scan --file path/to/file.py
+
+# Redact PII in place
+python3 security/sanitizer.py --sanitize --file path/to/file.py
+
+# Redact everything recursively
+python3 security/sanitizer.py --sanitize --dir . --recursive
+```
+
+Exit codes: `0` = clean, `1` = PII found, `2` = target file or directory not found (useful for CI).
+
+## Configuration
+
+Edit `security/sanitizer-config.json`:
+
+```json
+{
+  "company_blocklist": ["Single Grain", "ClickFlow", "Nextiva"],
+  "person_blocklist": ["Jane Doe"],
+  "custom_patterns": [
+    "ACME-\\d{6}",
+    {"label": "PROJECT_ID", "pattern": "proj_[A-Za-z0-9]{12}"}
+  ],
+  "skip_paths": ["node_modules", ".git", "__pycache__", ".env.example"],
+  "placeholder_format": "bracket"
+}
+```
+
+- **company_blocklist:** company names to always redact
+- **person_blocklist:** person names to always redact
+- **custom_patterns:** additional regexes (a string, or a `{label, pattern}` object)
+- **skip_paths:** path components (directory or file names) to skip when scanning a directory
+- **placeholder_format:** `"bracket"` for type-specific placeholders such as `[EMAIL]`, or `"redacted"` for `[REDACTED]`
+
+## Pre-Commit Hook
+
+Install to block commits containing PII:
+
+```bash
+cp security/pre-commit-hook.sh .git/hooks/pre-commit
+chmod +x .git/hooks/pre-commit
+```
+
+The hook scans staged files and blocks the commit if anything is detected.
+
+To bypass in emergencies: `git commit --no-verify`
+
+## Supported File Types
+
+`.py`, `.md`, `.txt`, `.json`, `.yaml`, `.yml`, `.env`
+
+## No External Dependencies
+
+Uses only the Python standard library (`re`, `json`, `os`, `sys`, `argparse`, `pathlib`, `collections`).
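The two accepted shapes of a `custom_patterns` entry can be sketched as follows, mirroring how `compile_patterns` in `security/sanitizer.py` treats them. The helper names `normalize_custom_patterns` and `placeholder` are hypothetical, and the sample text is invented for the demo.

```python
import re


def normalize_custom_patterns(entries):
    """Turn config entries into (label, compiled_regex) pairs.

    A bare string gets the label "CUSTOM"; a dict supplies its own label.
    """
    normalized = []
    for entry in entries:
        if isinstance(entry, str):
            normalized.append(("CUSTOM", re.compile(entry)))
        elif isinstance(entry, dict):
            normalized.append((entry.get("label", "CUSTOM"), re.compile(entry["pattern"])))
    return normalized


def placeholder(label, placeholder_format="bracket"):
    """Return "[LABEL]" for "bracket" or "[REDACTED]" for "redacted"."""
    return "[REDACTED]" if placeholder_format == "redacted" else f"[{label}]"


patterns = normalize_custom_patterns([
    r"ACME-\d{6}",
    {"label": "PROJECT_ID", "pattern": r"proj_[A-Za-z0-9]{12}"},
])
text = "ticket ACME-123456 belongs to proj_abcDEF123456"
for label, regex in patterns:
    text = regex.sub(placeholder(label), text)
print(text)  # ticket [CUSTOM] belongs to [PROJECT_ID]
```

Note that in the JSON config the backslash must be doubled (`"ACME-\\d{6}"`), whereas the Python raw string above needs only one.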
--- a/security/pre-commit-hook.sh
+++ b/security/pre-commit-hook.sh
@@ -0,0 +1,72 @@
+#!/usr/bin/env bash
+# Pre-commit hook: scan staged files for PII / sensitive data.
+#
+# Install:
+#   cp security/pre-commit-hook.sh .git/hooks/pre-commit
+#   chmod +x .git/hooks/pre-commit
+#
+# Bypass (emergency only):
+#   git commit --no-verify
+
+set -euo pipefail
+
+REPO_ROOT="$(git rev-parse --show-toplevel)"
+SANITIZER="$REPO_ROOT/security/sanitizer.py"
+
+if [ ! -f "$SANITIZER" ]; then
+    echo "⚠️  Sanitizer not found at $SANITIZER — skipping PII check."
+    exit 0
+fi
+
+# Get list of staged files
+STAGED_FILES=$(git diff --cached --name-only --diff-filter=ACM)
+
+if [ -z "$STAGED_FILES" ]; then
+    exit 0
+fi
+
+FOUND_PII=0
+TEMP_REPORT=$(mktemp)
+
+while IFS= read -r FILE; do  # one path per line, so names with spaces are not split
+    FULL_PATH="$REPO_ROOT/$FILE"
+
+    # Only check supported extensions
+    case "$FILE" in
+        *.py|*.md|*.txt|*.json|*.yaml|*.yml|*.env)
+            ;;
+        *)
+            continue
+            ;;
+    esac
+
+    if [ ! -f "$FULL_PATH" ]; then
+        continue
+    fi
+
+    OUTPUT=$(python3 "$SANITIZER" --scan --file "$FULL_PATH" --quiet 2>&1) || true
+
+    if [ -n "$OUTPUT" ] && echo "$OUTPUT" | grep -q "issue"; then
+        echo "$FILE: $OUTPUT" >> "$TEMP_REPORT"
+        FOUND_PII=1
+    fi
+done <<< "$STAGED_FILES"
+
+if [ "$FOUND_PII" -eq 1 ]; then
+    echo ""
+    echo "🚫 COMMIT BLOCKED — PII / sensitive data detected in staged files:"
+    echo ""
+    cat "$TEMP_REPORT"
+    echo ""
+    echo "To fix:"
+    echo "  1. Run: python3 security/sanitizer.py --scan --dir . --recursive"
+    echo "  2. Review findings and redact manually, or run with --sanitize"
+    echo "  3. Stage the fixed files and commit again"
+    echo ""
+    echo "To bypass (emergency): git commit --no-verify"
+    rm -f "$TEMP_REPORT"
+    exit 1
+fi
+
+rm -f "$TEMP_REPORT"
+exit 0
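Paths containing spaces are a common pitfall when iterating over file lists in shell. As a hedged alternative, the staged-file filtering in the hook above could be done in Python on NUL-delimited output from `git diff --cached --name-only --diff-filter=ACM -z`. The function names below are hypothetical, and the byte string stands in for real `git` output so the sketch runs without a repository.

```python
SUPPORTED_SUFFIXES = (".py", ".md", ".txt", ".json", ".yaml", ".yml", ".env")


def parse_staged(raw: bytes) -> list[str]:
    """Split NUL-delimited `git diff --name-only -z` output into paths."""
    return [p.decode("utf-8", errors="replace") for p in raw.split(b"\0") if p]


def filter_supported(paths: list[str]) -> list[str]:
    """Keep paths whose names end in a supported suffix, like the hook's case globs."""
    return [p for p in paths if p.endswith(SUPPORTED_SUFFIXES)]


# Stand-in for subprocess output; a real hook would capture stdout from git.
raw = b"skills/seo.py\0docs/my notes.md\0assets/logo.png\0.env\0"
print(filter_supported(parse_staged(raw)))
# ['skills/seo.py', 'docs/my notes.md', '.env']
```

Matching on the end of the name (rather than `Path.suffix`) keeps a bare `.env` file in scope, the same way the shell glob `*.env` does.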
--- a/security/sanitizer-config.json
+++ b/security/sanitizer-config.json
@@ -0,0 +1,22 @@
+{
+  "company_blocklist": [
+    "ClickFlow",
+    "Nextiva"
+  ],
+  "person_blocklist": [],
+  "custom_patterns": [],
+  "skip_paths": [
+    "node_modules",
+    ".git",
+    "__pycache__",
+    ".env.example"
+  ],
+  "allow_patterns": [
+    "api_key=ANTHROPIC_API_KEY",
+    "api_key = get_",
+    "API_KEY=\"your-",
+    "0123456789"
+  ],
+  "placeholder_format": "bracket",
+  "_comment": "Single Grain removed from blocklist since this is our repo. Fork users should add their own company names."
+}
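The `sanitizer.py` in this commit reads `placeholder_format`, `skip_paths`, `custom_patterns`, and the two blocklists from the config, but not `allow_patterns`. One way the allowlist might be applied, assuming each entry is a literal substring that exempts the whole line, is sketched below. The `is_allowed` helper and the sample lines are hypothetical.

```python
def is_allowed(line: str, allow_patterns: list[str]) -> bool:
    """True if the line contains any allowlisted literal substring."""
    return any(allowed in line for allowed in allow_patterns)


allow_patterns = [
    "api_key=ANTHROPIC_API_KEY",
    "api_key = get_",
    'API_KEY="your-',
    "0123456789",
]
lines = [
    'API_KEY="your-key-goes-here-0000"',  # obvious placeholder, allowlisted
    'API_KEY="a8Fq2LmZ7xWc9RtYb3Kd"',     # looks like a real secret, not allowlisted
]
for line in lines:
    print(is_allowed(line, allow_patterns), line)
```

A scanner using this check would skip allowlisted lines before running the detection patterns. Whether the exemption should cover the whole line or only the matching span is a design choice this sketch does not settle.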
--- a/security/sanitizer.py
+++ b/security/sanitizer.py
@@ -0,0 +1,457 @@
+#!/usr/bin/env python3
+"""
+PII / Sensitive Data Sanitizer
+Scans files for personally identifiable information and sensitive data.
+Can report findings (--scan) or redact them in place (--sanitize).
+
+Usage:
+    python3 security/sanitizer.py --scan --file path/to/file.py
+    python3 security/sanitizer.py --scan --dir . --recursive
+    python3 security/sanitizer.py --sanitize --file path/to/file.py
+"""
+
+import argparse
+import json
+import os
+import re
+import sys
+from pathlib import Path
+from collections import defaultdict
+
+# ---------------------------------------------------------------------------
+# Default configuration
+# ---------------------------------------------------------------------------
+
+SCRIPT_DIR = Path(__file__).resolve().parent
+DEFAULT_CONFIG_PATH = SCRIPT_DIR / "sanitizer-config.json"
+
+SUPPORTED_EXTENSIONS = {".py", ".md", ".txt", ".json", ".yaml", ".yml", ".env"}
+
+DEFAULT_SKIP_PATHS = {"node_modules", ".git", "__pycache__", ".env.example"}
+
+# ---------------------------------------------------------------------------
+# Detection patterns
+# ---------------------------------------------------------------------------
+
+# Each entry: (label, placeholder_bracket, placeholder_redacted, regex, flags)
+# Order matters – more specific patterns should come first.
+
+PATTERNS: list[tuple[str, str, str, str, int]] = [
+    # SSN (xxx-xx-xxxx)
+    (
+        "SSN",
+        "[SSN]",
+        "[REDACTED]",
+        r"\b\d{3}-\d{2}-\d{4}\b",
+        0,
+    ),
+    # API keys / tokens  (sk-..., ghp_..., op_..., Bearer ...)
+    (
+        "API_KEY",
+        "[API_KEY]",
+        "[REDACTED]",
+        r"(?i)\b(?:sk-[A-Za-z0-9_\-]{20,}|ghp_[A-Za-z0-9]{36,}|op_[A-Za-z0-9_\-]{20,}|gho_[A-Za-z0-9]{36,}|xox[bposarc]-[A-Za-z0-9\-]{10,})\b",
+        0,
+    ),
+    # Bearer tokens
+    (
+        "API_KEY",
+        "[API_KEY]",
+        "[REDACTED]",
+        r"(?i)Bearer\s+[A-Za-z0-9_\-\.]{20,}",
+        0,
+    ),
+    # Generic secret assignment patterns (API_KEY=..., SECRET=..., TOKEN=...)
+    (
+        "API_KEY",
+        "[API_KEY]",
+        "[REDACTED]",
+        r"""(?i)(?:api[_-]?key|secret[_-]?key|access[_-]?token|auth[_-]?token|private[_-]?key)\s*[=:]\s*["']?[A-Za-z0-9_\-\.\/\+]{16,}["']?""",
+        0,
+    ),
+    # URLs with embedded credentials  (https://user:pass@host)
+    (
+        "URL_CREDENTIALS",
+        "[URL_CREDENTIALS]",
+        "[REDACTED]",
+        r"https?://[^\s:]+:[^\s@]+@[^\s]+",
+        0,
+    ),
+    # Email addresses
+    (
+        "EMAIL",
+        "[EMAIL]",
+        "[REDACTED]",
+        r"\b[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}\b",
+        0,
+    ),
+    # IP addresses (IPv4); skips 0.0.0.0, 127.0.0.1, 255.255.255.255, and the private 10.x, 172.16-31.x, and 192.168.x ranges
+    (
+        "IP_ADDRESS",
+        "[IP_ADDRESS]",
+        "[REDACTED]",
+        r"\b(?!0\.0\.0\.0|127\.0\.0\.1|255\.255\.255\.255|192\.168\.\d{1,3}\.\d{1,3}|10\.\d{1,3}\.\d{1,3}\.\d{1,3}|172\.(?:1[6-9]|2\d|3[01])\.\d{1,3}\.\d{1,3})\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b",
+        0,
+    ),
+    # Phone numbers (US-style: +1, (xxx), xxx-xxx-xxxx, etc.)
+    (
+        "PHONE",
+        "[PHONE]",
+        "[REDACTED]",
+        r"(?<!\d)(?:\+?1[\s\-]?)?(?:\(\d{3}\)|\d{3})[\s\-]?\d{3}[\s\-]?\d{4}(?!\d)",
+        0,
+    ),
+    # Dollar amounts / revenue figures  ($1,234  $1,234.56  $1M  $1.2B)
+    (
+        "AMOUNT",
+        "[AMOUNT]",
+        "[REDACTED]",
+        r"\$\s?\d[\d,]*(?:\.\d{1,2})?(?:\s?[MBKmkb](?:illion)?)?",
+        0,
+    ),
+]
+
+# Person-name heuristic: an honorific (Mr., Mrs., Ms., Dr., Prof.) followed by
+# two or more capitalized words. Kept intentionally conservative to reduce false positives.
+PERSON_NAME_PATTERN = re.compile(
+    r"\b(?:Mr\.|Mrs\.|Ms\.|Dr\.|Prof\.)\s+[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b"
+)
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+
+def load_config(config_path: Path | None = None) -> dict:
+    path = config_path or DEFAULT_CONFIG_PATH
+    if path.exists():
+        with open(path, "r") as f:
+            return json.load(f)
+    return {}
+
+
+def should_skip_path(path: Path, skip_paths: set[str]) -> bool:
+    parts = path.parts
+    for skip in skip_paths:
+        if skip in parts:
+            return True
+    return False
+
+
+def is_import_line(line: str) -> bool:
+    stripped = line.strip()
+    return stripped.startswith(("import ", "from ", "#!", "# ", "//", "/*"))
+
+
+class Finding:
+    __slots__ = ("label", "match", "line_num", "line")
+
+    def __init__(self, label: str, match: str, line_num: int, line: str):
+        self.label = label
+        self.match = match
+        self.line_num = line_num
+        self.line = line
+
+    def __repr__(self) -> str:
+        return f"[{self.label}] line {self.line_num}: {self.match!r}"
+
+
+def scan_line(
+    line: str,
+    line_num: int,
+    compiled_patterns: list[tuple[str, str, str, re.Pattern]],
+    company_patterns: list[tuple[re.Pattern, str, str]],
+    person_patterns: list[tuple[re.Pattern, str, str]],
+    placeholder_format: str,
+) -> list[Finding]:
+    """Return findings for a single line."""
+    if is_import_line(line):
+        return []
+
+    findings: list[Finding] = []
+
+    for label, ph_bracket, ph_redacted, pat in compiled_patterns:
+        for m in pat.finditer(line):
+            findings.append(Finding(label, m.group(), line_num, line.rstrip()))
+
+    # Company blocklist
+    for cpat, ph_bracket, ph_redacted in company_patterns:
+        for m in cpat.finditer(line):
+            findings.append(Finding("COMPANY", m.group(), line_num, line.rstrip()))
+
+    # Person blocklist
+    for ppat, ph_bracket, ph_redacted in person_patterns:
+        for m in ppat.finditer(line):
+            findings.append(Finding("PERSON", m.group(), line_num, line.rstrip()))
+
+    # Person-name heuristic
+    for m in PERSON_NAME_PATTERN.finditer(line):
+        findings.append(Finding("PERSON", m.group(), line_num, line.rstrip()))
+
+    return findings
+
+
+def compile_patterns(
+    config: dict,
+) -> tuple[
+    list[tuple[str, str, str, re.Pattern]],
+    list[tuple[re.Pattern, str, str]],
+    list[tuple[re.Pattern, str, str]],
+]:
+    """Compile all regex patterns from defaults + config."""
+    placeholder_fmt = config.get("placeholder_format", "bracket")
+
+    compiled = []
+    for label, ph_b, ph_r, raw, flags in PATTERNS:
+        compiled.append((label, ph_b, ph_r, re.compile(raw, flags)))
+
+    # Custom patterns from config
+    for entry in config.get("custom_patterns", []):
+        if isinstance(entry, str):
+            compiled.append(
+                ("CUSTOM", "[CUSTOM]", "[REDACTED]", re.compile(entry))
+            )
+        elif isinstance(entry, dict):
+            compiled.append((
+                entry.get("label", "CUSTOM"),
+                f"[{entry.get('label', 'CUSTOM')}]",
+                "[REDACTED]",
+                re.compile(entry["pattern"]),
+            ))
+
+    # Company blocklist
+    company_patterns = []
+    for name in config.get("company_blocklist", []):
+        company_patterns.append((
+            re.compile(re.escape(name), re.IGNORECASE),
+            "[COMPANY]",
+            "[REDACTED]",
+        ))
+
+    # Person blocklist
+    person_patterns = []
+    for name in config.get("person_blocklist", []):
+        person_patterns.append((
+            re.compile(re.escape(name), re.IGNORECASE),
+            "[PERSON]",
+            "[REDACTED]",
+        ))
+
+    return compiled, company_patterns, person_patterns
+
+
+def get_placeholder(label: str, placeholder_format: str) -> str:
+    if placeholder_format == "redacted":
+        return "[REDACTED]"
+    return f"[{label}]"
+
+
+# ---------------------------------------------------------------------------
+# File processing
+# ---------------------------------------------------------------------------
+
+
+def scan_file(
+    filepath: Path,
+    compiled: list,
+    company_pats: list,
+    person_pats: list,
+    placeholder_format: str,
+) -> list[Finding]:
+    try:
+        text = filepath.read_text(errors="replace")
+    except (PermissionError, OSError) as e:
+        print(f"  ⚠️  Cannot read {filepath}: {e}", file=sys.stderr)
+        return []
+
+    findings = []
+    for i, line in enumerate(text.splitlines(), 1):
+        findings.extend(
+            scan_line(line, i, compiled, company_pats, person_pats, placeholder_format)
+        )
+    return findings
+
+
+def sanitize_file(
+    filepath: Path,
+    compiled: list,
+    company_pats: list,
+    person_pats: list,
+    placeholder_format: str,
+) -> list[Finding]:
+    """Scan and replace PII in-place. Returns findings for reporting."""
+    try:
+        text = filepath.read_text(errors="replace")
+    except (PermissionError, OSError) as e:
+        print(f"  ⚠️  Cannot read {filepath}: {e}", file=sys.stderr)
+        return []
+
+    findings = []
+    new_lines = []
+
+    for i, line in enumerate(text.splitlines(), 1):
+        line_findings = scan_line(
+            line, i, compiled, company_pats, person_pats, placeholder_format
+        )
+        findings.extend(line_findings)
+
+        if line_findings and not is_import_line(line):
+            # Replace matches (longest first to avoid partial replacements)
+            matches = sorted(
+                [(f.match, f.label) for f in line_findings],
+                key=lambda x: len(x[0]),
+                reverse=True,
+            )
+            for match_text, label in matches:
+                placeholder = get_placeholder(label, placeholder_format)
+                line = line.replace(match_text, placeholder)
+
+        new_lines.append(line)
+
+    if findings:
+        filepath.write_text("\n".join(new_lines) + ("\n" if text.endswith("\n") else ""))
+
+    return findings
+
+
+def collect_files(target: Path, recursive: bool, skip_paths: set[str]) -> list[Path]:
+    """Collect files with supported extensions."""
+    if target.is_file():
+        if target.suffix in SUPPORTED_EXTENSIONS:
+            return [target]
+        return []
+
+    files = []
+    if recursive:
+        for fp in target.rglob("*"):
+            if fp.is_file() and fp.suffix in SUPPORTED_EXTENSIONS and not should_skip_path(fp, skip_paths):
+                files.append(fp)
+    else:
+        for fp in target.iterdir():
+            if fp.is_file() and fp.suffix in SUPPORTED_EXTENSIONS and not should_skip_path(fp, skip_paths):
+                files.append(fp)
+
+    return sorted(files)
+
+
+# ---------------------------------------------------------------------------
+# Reporting
+# ---------------------------------------------------------------------------
+
+
+def print_report(
+    all_findings: dict[str, list[Finding]], mode: str
+) -> None:
+    total = sum(len(f) for f in all_findings.values())
+    files_affected = len(all_findings)
+
+    if total == 0:
+        print("\n✅ No PII or sensitive data found.")
+        return
+
+    action = "Found" if mode == "scan" else "Sanitized"
+    print(f"\n{'=' * 60}")
+    print(f"🔍 {action} {total} issue(s) across {files_affected} file(s)")
+    print(f"{'=' * 60}")
+
+    # Aggregate by type
+    by_type: dict[str, int] = defaultdict(int)
+    for filepath, findings in all_findings.items():
+        for f in findings:
+            by_type[f.label] += 1
+
+    print("\nBy type:")
+    for label, count in sorted(by_type.items(), key=lambda x: -x[1]):
+        print(f"  {label:20s} {count:>4d}")
+
+    print(f"\nBy file:")
+    for filepath, findings in sorted(all_findings.items()):
+        print(f"\n  📄 {filepath} ({len(findings)} finding(s))")
+        for f in findings:
+            print(f"     Line {f.line_num:>4d} [{f.label}]: {f.match}")
+
+    if mode == "scan":
+        print(f"\n💡 Run with --sanitize to redact these findings.")
+
+
+# ---------------------------------------------------------------------------
+# Main
+# ---------------------------------------------------------------------------
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser(
+        description="Scan or sanitize files for PII and sensitive data.",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  %(prog)s --scan --file config.py
+  %(prog)s --scan --dir . --recursive
+  %(prog)s --sanitize --dir src/ --recursive
+  %(prog)s --scan --dir . --recursive --config security/sanitizer-config.json
+        """,
+    )
+
+    mode_group = parser.add_mutually_exclusive_group(required=True)
+    mode_group.add_argument("--scan", action="store_true", help="Report findings without modifying files")
+    mode_group.add_argument("--sanitize", action="store_true", help="Replace PII with safe placeholders")
+
+    target_group = parser.add_mutually_exclusive_group(required=True)
+    target_group.add_argument("--file", type=str, help="Scan a single file")
+    target_group.add_argument("--dir", type=str, help="Scan a directory")
+
+    parser.add_argument("--recursive", "-r", action="store_true", help="Recurse into subdirectories (with --dir)")
+    parser.add_argument("--config", type=str, help="Path to config JSON (default: security/sanitizer-config.json)")
+    parser.add_argument("--quiet", "-q", action="store_true", help="Only print summary, not individual findings")
+
+    args = parser.parse_args()
+
+    # Load config
+    config_path = Path(args.config) if args.config else None
+    config = load_config(config_path)
+
+    placeholder_format = config.get("placeholder_format", "bracket")
+    skip_paths = set(config.get("skip_paths", [])) | DEFAULT_SKIP_PATHS
+
+    compiled, company_pats, person_pats = compile_patterns(config)
+
+    # Determine target
+    if args.file:
+        target = Path(args.file)
+        if not target.exists():
+            print(f"❌ File not found: {target}", file=sys.stderr)
+            return 2
+    else:
+        target = Path(args.dir)
+        if not target.exists():
+            print(f"❌ Directory not found: {target}", file=sys.stderr)
+            return 2
+
+    files = collect_files(target, args.recursive, skip_paths)
+    if not files:
+        print("No supported files found.")
+        return 0
+
+    mode = "scan" if args.scan else "sanitize"
+    process_fn = scan_file if args.scan else sanitize_file
+
+    all_findings: dict[str, list[Finding]] = {}
+    for fp in files:
+        findings = process_fn(fp, compiled, company_pats, person_pats, placeholder_format)
+        if findings:
+            all_findings[str(fp)] = findings
+
+    if not args.quiet:
+        print_report(all_findings, mode)
+    else:
+        total = sum(len(f) for f in all_findings.values())
+        if total > 0:
+            action = "Found" if mode == "scan" else "Sanitized"
+            print(f"{action} {total} issue(s) in {len(all_findings)} file(s)")
+
+    # Exit code: 0 clean, 1 PII found
+    return 1 if all_findings else 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
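`sanitize_file` above replaces matches longest first. A small standalone sketch of why the order matters when one match sits inside another, such as an e-mail-like substring inside a credentialed URL: the `redact` helper and its `longest_first` switch are hypothetical, and the matches are supplied by hand rather than produced by the real patterns.

```python
def redact(line, matches, placeholder_format="bracket", longest_first=True):
    """Replace each (text, label) match in `line` with its placeholder."""
    ordered = sorted(matches, key=lambda m: len(m[0]), reverse=longest_first)
    for text, label in ordered:
        placeholder = "[REDACTED]" if placeholder_format == "redacted" else f"[{label}]"
        line = line.replace(text, placeholder)
    return line


line = "db = https://user:pass@host.com/x"
matches = [
    ("pass@host.com", "EMAIL"),
    ("https://user:pass@host.com/x", "URL_CREDENTIALS"),
]
print(redact(line, matches))                       # db = [URL_CREDENTIALS]
print(redact(line, matches, longest_first=False))  # db = https://user:[EMAIL]/x
```

With shortest-first ordering the inner replacement rewrites the line so the longer match no longer occurs in it, and the user name stays visible.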
--- a/telemetry/README.md
+++ b/telemetry/README.md
@@ -0,0 +1,94 @@
+# Telemetry
+
+Opt-in, local-first, privacy-respecting usage telemetry for AI Marketing Skills.
+
+## What's Collected
+
+When you opt in, the following **anonymous** data is sent:
+
+| Field | Example | Purpose |
+|-------|---------|---------|
+| Skill name | `growth-engine` | Know which skills are used |
+| Duration (ms) | `4500` | Track performance |
+| Success/fail | `true` | Track reliability |
+| Version | `1.0.0` | Know which versions are in use |
+| OS | `Darwin` | Platform compatibility |
+| Architecture | `arm64` | Platform compatibility |
+| Python version | `3.12` | Runtime compatibility |
+| Timestamp | `2026-03-31T12:00:00Z` | Usage patterns |
+| Device ID | `<random-uuid>` | Deduplicate (not tied to identity) |
+
+## What's NOT Collected — Ever
+
+- ❌ Code content
+- ❌ File paths
+- ❌ Repository names
+- ❌ Usernames or emails
+- ❌ Environment variables
+- ❌ API keys or secrets
+- ❌ Any content you're working on
+
+## How to Opt In or Out
+
+### First run (interactive)
+```bash
+python3 telemetry/telemetry_init.py
+```
+You'll be asked to choose. Your choice is saved.
+
+### Non-interactive
+```bash
+python3 telemetry/telemetry_init.py --yes   # Opt in
+python3 telemetry/telemetry_init.py --no    # Opt out
+```
+
+### Change your mind later
+Delete the config and re-run:
+```bash
+rm ~/.ai-marketing-skills/telemetry-config.json
+python3 telemetry/telemetry_init.py
+```
+
+## Local Data — Always Available
+
+**Regardless of opt-in**, all skill runs are logged locally so you can see your own usage:
+
+```
+~/.ai-marketing-skills/analytics/skill-usage.jsonl
+```
+
+This data never leaves your machine unless you opt in.
+
+## View Your Stats
+
+```bash
+python3 telemetry/telemetry_report.py
+```
+
+Shows: total runs, runs per skill, success rates, average durations, most used skill, and more.
+
+### Options
+```bash
+python3 telemetry/telemetry_report.py --json          # Machine-readable JSON
+python3 telemetry/telemetry_report.py --skill seo-bot  # Filter to one skill
+```
+
+## Check for Updates
+
+```bash
+python3 telemetry/version_check.py
+```
+
+- Compares your local version against the latest GitHub release
+- Silent when up to date
+- Caches the result for 24 hours to avoid excess API calls
+- Never blocks execution if offline
+
+## Privacy Commitment
+
+1. **Opt-in only** — nothing is sent without your explicit consent
+2. **Local-first** — your data is always stored locally for your own use
+3. **Minimal data** — only what's needed to improve the skills
+4. **No PII** — no names, emails, paths, or content
+5. **Transparent** — all telemetry code is right here, read it yourself
+6. **Revocable** — opt out any time, delete your config file
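The per-skill statistics described in "View Your Stats" could be computed from `skill-usage.jsonl` roughly as follows. This is a sketch, not the contents of `telemetry_report.py`, which is not shown here. The field names (`skill`, `duration_ms`, `success`) follow `build_entry` in `telemetry_log.py`; the `summarize` function and the sample lines are hypothetical.

```python
import json
from collections import defaultdict


def summarize(jsonl_lines):
    """Aggregate run count, success rate, and mean duration per skill."""
    stats = defaultdict(lambda: {"runs": 0, "successes": 0, "total_ms": 0})
    for raw in jsonl_lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip corrupt lines rather than abort the report
        if not isinstance(entry, dict) or "skill" not in entry:
            continue
        s = stats[entry["skill"]]
        s["runs"] += 1
        s["successes"] += 1 if entry.get("success") else 0
        s["total_ms"] += entry.get("duration_ms", 0)
    return {
        skill: {
            "runs": s["runs"],
            "success_rate": s["successes"] / s["runs"],
            "avg_duration_ms": s["total_ms"] / s["runs"],
        }
        for skill, s in stats.items()
    }


sample = [
    '{"skill": "growth-engine", "duration_ms": 4000, "success": true, "version": "1.0.0"}',
    '{"skill": "growth-engine", "duration_ms": 5000, "success": false, "version": "1.0.0"}',
    '{"skill": "seo-bot", "duration_ms": 1200, "success": true, "version": "1.0.0"}',
]
print(summarize(sample))
```

To run it against real data, pass an open file handle for `~/.ai-marketing-skills/analytics/skill-usage.jsonl` instead of `sample`.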
--- a/telemetry/telemetry_init.py
+++ b/telemetry/telemetry_init.py
@@ -0,0 +1,94 @@
+#!/usr/bin/env python3
+"""First-run opt-in prompt for anonymous usage telemetry."""
+
+import argparse
+import json
+import os
+import sys
+import uuid
+from datetime import datetime, timezone
+from pathlib import Path
+
+CONFIG_DIR = Path.home() / ".ai-marketing-skills"
+CONFIG_FILE = CONFIG_DIR / "telemetry-config.json"
+
+
+def load_config():
+    """Load existing telemetry config, or return None if not found."""
+    if CONFIG_FILE.exists():
+        try:
+            with open(CONFIG_FILE, "r") as f:
+                return json.load(f)
+        except (json.JSONDecodeError, OSError):
+            return None
+    return None
+
+
+def save_config(opted_in: bool) -> dict:
+    """Save telemetry config and return it."""
+    CONFIG_DIR.mkdir(parents=True, exist_ok=True)
+    config = {
+        "opted_in": opted_in,
+        "device_id": str(uuid.uuid4()),
+        "created": datetime.now(timezone.utc).isoformat(),
+    }
+    with open(CONFIG_FILE, "w") as f:
+        json.dump(config, f, indent=2)
+    return config
+
+
+def prompt_user() -> bool:
+    """Interactive opt-in prompt. Returns True if user opts in."""
+    print(
+        "Would you like to opt into anonymous usage telemetry?\n"
+        "This helps us improve skills.\n"
+        "\n"
+        "Data collected: skill name, duration, success/fail, version, OS, architecture,\n"
+        "Python version, timestamp, and a random device ID.\n"
+        "No code, file paths, or repo content is ever sent.\n"
+    )
+    while True:
+        answer = input("(y/n): ").strip().lower()
+        if answer in ("y", "yes"):
+            return True
+        if answer in ("n", "no"):
+            return False
+        print("Please enter y or n.")
+
+
+def init_telemetry(yes: bool = False, no: bool = False) -> dict:
+    """Initialize telemetry. Returns config dict.
+
+    Args:
+        yes: Non-interactive opt-in.
+        no: Non-interactive opt-out.
+    """
+    existing = load_config()
+    if existing is not None:
+        return existing
+
+    if yes:
+        opted_in = True
+    elif no:
+        opted_in = False
+    else:
+        opted_in = prompt_user()
+
+    config = save_config(opted_in)
+    status = "enabled" if opted_in else "disabled"
+    print(f"Telemetry {status}. Config saved to {CONFIG_FILE}")
+    return config
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Initialize telemetry opt-in.")
+    group = parser.add_mutually_exclusive_group()
+    group.add_argument("--yes", action="store_true", help="Opt in non-interactively.")
+    group.add_argument("--no", action="store_true", help="Opt out non-interactively.")
+    args = parser.parse_args()
+
+    config = init_telemetry(yes=args.yes, no=args.no)
+    print(json.dumps(config, indent=2))
+
+
+if __name__ == "__main__":
+    main()
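
The first-run behaviour of `init_telemetry` above (an existing config always wins, so later `--yes`/`--no` flags are ignored) can be checked without touching the real home directory. This is a minimal sketch, not the committed code: the `config_file` parameter is a hypothetical addition for illustration, whereas the script itself uses the module-level `CONFIG_FILE` constant.

```python
import json
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path


def save_config(config_file: Path, opted_in: bool) -> dict:
    """Write a telemetry config shaped like the one above to config_file."""
    config_file.parent.mkdir(parents=True, exist_ok=True)
    config = {
        "opted_in": opted_in,
        "device_id": str(uuid.uuid4()),
        "created": datetime.now(timezone.utc).isoformat(),
    }
    config_file.write_text(json.dumps(config, indent=2))
    return config


def load_config(config_file: Path):
    """Return the saved config, or None if it is missing or unreadable."""
    try:
        return json.loads(config_file.read_text())
    except (OSError, json.JSONDecodeError):
        return None


def init_telemetry(config_file: Path, opted_in: bool) -> dict:
    """The first call decides; later calls return the stored choice unchanged."""
    existing = load_config(config_file)
    if existing is not None:
        return existing
    return save_config(config_file, opted_in)


# Hypothetical usage: the second call asks to opt in, but the stored opt-out wins.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "telemetry-config.json"
    first = init_telemetry(path, opted_in=False)
    second = init_telemetry(path, opted_in=True)
```

Under this behaviour, changing a previous choice appears to require deleting the config file before re-running the script.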
--- a/telemetry/telemetry_log.py
+++ b/telemetry/telemetry_log.py
@ -0,0 +1,118 @@
+#!/usr/bin/env python3
+"""Log a skill run event. Called by each skill's preamble.
+
+Usage:
+    python3 telemetry/telemetry_log.py --skill <name> --duration <ms> --success <true/false> --version <ver>
+
+Always logs locally. If opted in, also sends to analytics endpoint.
+Never logs: code content, file paths, repo names, usernames, environment variables.
+"""
+
+import argparse
+import json
+import os
+import platform
+import sys
+import urllib.request
+import urllib.error
+from datetime import datetime, timezone
+from pathlib import Path
+
+CONFIG_DIR = Path.home() / ".ai-marketing-skills"
+CONFIG_FILE = CONFIG_DIR / "telemetry-config.json"
+ANALYTICS_DIR = CONFIG_DIR / "analytics"
+USAGE_LOG = ANALYTICS_DIR / "skill-usage.jsonl"
+
+# No-op stub: replace with your analytics endpoint.
+ANALYTICS_ENDPOINT = "https://example.com/api/telemetry"
+
+
+def load_config() -> dict:
+    """Load telemetry config. Returns empty dict if not found."""
+    if CONFIG_FILE.exists():
+        try:
+            with open(CONFIG_FILE, "r") as f:
+                return json.load(f)
+        except (json.JSONDecodeError, OSError):
+            pass
+    return {}
+
+
+def python_version() -> str:
+    """Return major.minor Python version string."""
+    return f"{sys.version_info.major}.{sys.version_info.minor}"
+
+
+def build_entry(skill: str, duration_ms: int, success: bool, version: str, device_id: str) -> dict:
+    """Build a log entry. Only safe, anonymous fields."""
+    return {
+        "skill": skill,
+        "duration_ms": duration_ms,
+        "success": success,
+        "version": version,
+        "os": platform.system(),
+        "arch": platform.machine(),
+        "python": python_version(),
+        "timestamp": datetime.now(timezone.utc).isoformat(),
+        "device_id": device_id,
+    }
+
+
+def log_locally(entry: dict):
+    """Append entry to local JSONL log."""
+    ANALYTICS_DIR.mkdir(parents=True, exist_ok=True)
+    with open(USAGE_LOG, "a") as f:
+        f.write(json.dumps(entry) + "\n")
+
+
+def send_remote(entry: dict):
+    """Send entry to remote analytics endpoint. Fails silently."""
+    try:
+        data = json.dumps(entry).encode("utf-8")
+        req = urllib.request.Request(
+            ANALYTICS_ENDPOINT,
+            data=data,
+            headers={"Content-Type": "application/json"},
+            method="POST",
+        )
+        urllib.request.urlopen(req, timeout=5)
+    except Exception:
+        # Never block skill execution
+        pass
+
+
+def parse_bool(value: str) -> bool:
+    """Parse a boolean string."""
+    return value.lower() in ("true", "1", "yes")
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Log a skill run event.")
+    parser.add_argument("--skill", required=True, help="Skill name.")
+    parser.add_argument("--duration", required=True, type=int, help="Duration in milliseconds.")
+    parser.add_argument("--success", required=True, help="true/false")
+    parser.add_argument("--version", required=True, help="Skill version.")
+    args = parser.parse_args()
+
+    config = load_config()
+    device_id = config.get("device_id", "unknown")
+    opted_in = config.get("opted_in", False)
+
+    entry = build_entry(
+        skill=args.skill,
+        duration_ms=args.duration,
+        success=parse_bool(args.success),
+        version=args.version,
+        device_id=device_id,
+    )
+
+    # Always log locally
+    log_locally(entry)
+
+    # Send remotely only if opted in
+    if opted_in:
+        send_remote(entry)
+
+
+if __name__ == "__main__":
+    main()
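
The docstring says the logger is called from each skill's preamble. One way a skill might assemble that call is sketched below. The helper name `build_log_command` and the skill name `demo-skill` are hypothetical; only the flags come from the script's own usage text.

```python
import sys
import time


def build_log_command(skill: str, started: float, success: bool, version: str) -> list:
    """Build the argv for telemetry_log.py from a time.monotonic() start value."""
    duration_ms = int((time.monotonic() - started) * 1000)
    return [
        sys.executable, "telemetry/telemetry_log.py",
        "--skill", skill,
        "--duration", str(duration_ms),
        "--success", "true" if success else "false",
        "--version", version,
    ]


# Hypothetical usage inside a skill:
started = time.monotonic()
# ... the skill's real work would run here ...
cmd = build_log_command("demo-skill", started, success=True, version="1.0.0")
```

The resulting list can be passed to `subprocess.run(cmd, check=False)`, so a logging failure does not raise inside the skill.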
--- a/telemetry/telemetry_report.py
+++ b/telemetry/telemetry_report.py
@ -0,0 +1,177 @@
+#!/usr/bin/env python3
+"""Local stats viewer for skill usage data.
+
+Usage:
+    python3 telemetry/telemetry_report.py           # Full report
+    python3 telemetry/telemetry_report.py --json     # Machine-readable output
+    python3 telemetry/telemetry_report.py --skill X  # Filter to one skill
+"""
+
+import argparse
+import json
+import sys
+from collections import defaultdict
+from datetime import datetime, timezone, timedelta
+from pathlib import Path
+
+CONFIG_DIR = Path.home() / ".ai-marketing-skills"
+CONFIG_FILE = CONFIG_DIR / "telemetry-config.json"
+USAGE_LOG = CONFIG_DIR / "analytics" / "skill-usage.jsonl"
+
+
+def load_entries(skill_filter: str = None) -> list:
+    """Load all log entries, optionally filtered by skill."""
+    if not USAGE_LOG.exists():
+        return []
+    entries = []
+    with open(USAGE_LOG, "r") as f:
+        for line in f:
+            line = line.strip()
+            if not line:
+                continue
+            try:
+                entry = json.loads(line)
+                if skill_filter and entry.get("skill") != skill_filter:
+                    continue
+                entries.append(entry)
+            except json.JSONDecodeError:
+                continue
+    return entries
+
+
+def load_config() -> dict:
+    if CONFIG_FILE.exists():
+        try:
+            with open(CONFIG_FILE, "r") as f:
+                return json.load(f)
+        except (json.JSONDecodeError, OSError):
+            pass
+    return {}
+
+
+def parse_timestamp(ts: str) -> datetime:
+    """Parse ISO timestamp string."""
+    # Python < 3.11 rejects a trailing "Z", so retry with an explicit UTC offset
+    try:
+        return datetime.fromisoformat(ts)
+    except ValueError:
+        return datetime.fromisoformat(ts.replace("Z", "+00:00"))
+
+
+def generate_report(entries: list, config: dict) -> dict:
+    """Generate stats from entries."""
+    now = datetime.now(timezone.utc)
+    seven_days_ago = now - timedelta(days=7)
+    thirty_days_ago = now - timedelta(days=30)
+
+    total = len(entries)
+    last_7 = 0
+    last_30 = 0
+    skill_runs = defaultdict(int)
+    skill_successes = defaultdict(int)
+    skill_durations = defaultdict(list)
+    last_timestamp = None
+
+    for e in entries:
+        skill = e.get("skill", "unknown")
+        skill_runs[skill] += 1
+
+        if e.get("success"):
+            skill_successes[skill] += 1
+
+        duration = e.get("duration_ms")
+        if duration is not None:
+            skill_durations[skill].append(duration)
+
+        ts_str = e.get("timestamp")
+        if ts_str:
+            try:
+                ts = parse_timestamp(ts_str)
+                if ts.tzinfo is None:
+                    ts = ts.replace(tzinfo=timezone.utc)
+                if ts >= seven_days_ago:
+                    last_7 += 1
+                if ts >= thirty_days_ago:
+                    last_30 += 1
+                if last_timestamp is None or ts > last_timestamp:
+                    last_timestamp = ts
+            except (ValueError, TypeError):
+                pass
+
+    # Per-skill stats
+    per_skill = {}
+    for skill, count in sorted(skill_runs.items(), key=lambda x: -x[1]):
+        avg_dur = None
+        if skill_durations[skill]:
+            avg_dur = round(sum(skill_durations[skill]) / len(skill_durations[skill]), 1)
+        success_rate = round(skill_successes[skill] / count * 100, 1) if count > 0 else 0
+        per_skill[skill] = {
+            "runs": count,
+            "success_rate_pct": success_rate,
+            "avg_duration_ms": avg_dur,
+        }
+
+    most_used = max(skill_runs, key=skill_runs.get) if skill_runs else None
+
+    return {
+        "total_runs": total,
+        "last_7_days": last_7,
+        "last_30_days": last_30,
+        "most_used_skill": most_used,
+        "last_run": last_timestamp.isoformat() if last_timestamp else None,
+        "opted_in": config.get("opted_in", False),
+        "per_skill": per_skill,
+    }
+
+
+def print_report(report: dict):
+    """Pretty-print the report."""
+    print("=" * 50)
+    print("  AI Marketing Skills — Usage Report")
+    print("=" * 50)
+    print()
+    print(f"  Total runs (all time):  {report['total_runs']}")
+    print(f"  Last 7 days:            {report['last_7_days']}")
+    print(f"  Last 30 days:           {report['last_30_days']}")
+    print(f"  Most used skill:        {report['most_used_skill'] or 'N/A'}")
+    print(f"  Last run:               {report['last_run'] or 'N/A'}")
+    print(f"  Telemetry opt-in:       {'Yes' if report['opted_in'] else 'No'}")
+    print()
+
+    if report["per_skill"]:
+        print("  Per-Skill Breakdown:")
+        print("  " + "-" * 46)
+        print(f"  {'Skill':<25} {'Runs':>5} {'Success':>8} {'Avg ms':>8}")
+        print("  " + "-" * 46)
+        for skill, stats in report["per_skill"].items():
+            avg = f"{stats['avg_duration_ms']:.0f}" if stats["avg_duration_ms"] is not None else "N/A"
+            print(f"  {skill:<25} {stats['runs']:>5} {stats['success_rate_pct']:>7.1f}% {avg:>8}")
+    else:
+        print("  No usage data found.")
+    print()
+
+
+def main():
+    parser = argparse.ArgumentParser(description="View local skill usage stats.")
+    parser.add_argument("--json", action="store_true", help="Output as JSON.")
+    parser.add_argument("--skill", help="Filter to a specific skill.")
+    args = parser.parse_args()
+
+    config = load_config()
+    entries = load_entries(skill_filter=args.skill)
+
+    if not entries and not args.json:
+        print("No usage data found. Run some skills first!")
+        print(f"Data location: {USAGE_LOG}")
+        return
+
+    report = generate_report(entries, config)
+
+    if args.json:
+        print(json.dumps(report, indent=2))
+    else:
+        print_report(report)
+
+
+if __name__ == "__main__":
+    main()
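
The per-skill aggregation in `generate_report` can be tried on in-memory entries. The sketch below keeps only the run count, success rate, and average duration; the function name `summarize` and the sample skill names are made up for illustration.

```python
from collections import defaultdict


def summarize(entries: list) -> dict:
    """Compute runs, success rate, and mean duration per skill."""
    runs = defaultdict(int)
    successes = defaultdict(int)
    durations = defaultdict(list)
    for e in entries:
        skill = e.get("skill", "unknown")
        runs[skill] += 1
        if e.get("success"):
            successes[skill] += 1
        if e.get("duration_ms") is not None:
            durations[skill].append(e["duration_ms"])
    return {
        skill: {
            "runs": count,
            "success_rate_pct": round(successes[skill] / count * 100, 1),
            # Entries without a duration are skipped rather than counted as 0.
            "avg_duration_ms": (
                round(sum(durations[skill]) / len(durations[skill]), 1)
                if durations[skill] else None
            ),
        }
        for skill, count in runs.items()
    }


# Fictional sample data.
sample = [
    {"skill": "seo-audit", "duration_ms": 1200, "success": True},
    {"skill": "seo-audit", "duration_ms": 800, "success": False},
    {"skill": "email-draft", "success": True},
]
stats = summarize(sample)
```

With this sample, `seo-audit` has a 50.0% success rate and a 1000.0 ms average, while `email-draft` has no average because its only entry lacks a duration.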
--- a/telemetry/version_check.py
+++ b/telemetry/version_check.py
@ -0,0 +1,135 @@
+#!/usr/bin/env python3
+"""Check for updates against GitHub releases.
+
+Usage:
+    python3 telemetry/version_check.py
+
+Silent when up to date. Prints update notice if newer version available.
+Caches result for 24 hours. Gracefully handles offline/errors.
+"""
+
+import json
+import os
+import sys
+import urllib.request
+import urllib.error
+from datetime import datetime, timezone, timedelta
+from pathlib import Path
+
+REPO_ROOT = Path(__file__).resolve().parent.parent
+VERSION_FILE = REPO_ROOT / "VERSION"
+CACHE_DIR = Path.home() / ".ai-marketing-skills"
+CACHE_FILE = CACHE_DIR / "version-cache.json"
+GITHUB_API_URL = "https://api.github.com/repos/ericosiu/ai-marketing-skills/releases/latest"
+CACHE_TTL_HOURS = 24
+
+
+def read_local_version() -> str:
+    """Read version from local VERSION file."""
+    try:
+        return VERSION_FILE.read_text().strip()
+    except OSError:
+        return "0.0.0"
+
+
+def parse_semver(version: str) -> tuple:
+    """Parse semver string into comparable tuple. Strips leading 'v'."""
+    v = version.lstrip("v")
+    parts = v.split(".")
+    result = []
+    for p in parts:
+        try:
+            result.append(int(p))
+        except ValueError:
+            result.append(0)
+    while len(result) < 3:
+        result.append(0)
+    return tuple(result[:3])
+
+
+def load_cache() -> dict:
+    """Load cached version check result."""
+    if not CACHE_FILE.exists():
+        return {}
+    try:
+        with open(CACHE_FILE, "r") as f:
+            return json.load(f)
+    except (json.JSONDecodeError, OSError):
+        return {}
+
+
+def save_cache(latest_version: str):
+    """Save version check result to cache."""
+    CACHE_DIR.mkdir(parents=True, exist_ok=True)
+    cache = {
+        "latest_version": latest_version,
+        "checked_at": datetime.now(timezone.utc).isoformat(),
+    }
+    try:
+        with open(CACHE_FILE, "w") as f:
+            json.dump(cache, f, indent=2)
+    except OSError:
+        pass
+
+
+def cache_is_fresh() -> bool:
+    """Check if cache is less than CACHE_TTL_HOURS old."""
+    cache = load_cache()
+    checked_at = cache.get("checked_at")
+    if not checked_at:
+        return False
+    try:
+        ts = datetime.fromisoformat(checked_at)
+        if ts.tzinfo is None:
+            ts = ts.replace(tzinfo=timezone.utc)
+        return datetime.now(timezone.utc) - ts < timedelta(hours=CACHE_TTL_HOURS)
+    except (ValueError, TypeError):
+        return False
+
+
+def fetch_latest_version() -> "str | None":
+    """Fetch latest version from GitHub API. Returns version string or None."""
+    try:
+        req = urllib.request.Request(
+            GITHUB_API_URL,
+            headers={"Accept": "application/vnd.github.v3+json", "User-Agent": "ai-marketing-skills"},
+        )
+        with urllib.request.urlopen(req, timeout=10) as resp:
+            data = json.loads(resp.read().decode("utf-8"))
+            return (data.get("tag_name") or "").lstrip("v") or None
+    except Exception:
+        return None
+
+
+def check_version():
+    """Main version check logic."""
+    local = read_local_version()
+
+    # Check cache first
+    cache = load_cache()
+    if cache_is_fresh() and cache.get("latest_version"):
+        latest = cache["latest_version"]
+    else:
+        latest = fetch_latest_version()
+        if latest is None:
+            # Offline or API error — silently exit
+            return
+        save_cache(latest)
+
+    local_parsed = parse_semver(local)
+    latest_parsed = parse_semver(latest)
+
+    if latest_parsed > local_parsed:
+        print(f"🆕 AI Marketing Skills v{latest} available (you have v{local}). Run `git pull` to update.")
+
+
+def main():
+    try:
+        check_version()
+    except Exception:
+        # Never block skill execution
+        pass
+
+
+if __name__ == "__main__":
+    main()
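
`check_version` compares parsed tuples rather than raw strings, which matters once a version component reaches two digits. The sketch below is a compact restatement of `parse_semver`, rewritten for illustration rather than copied from the file, and it shows the difference.

```python
def parse_semver(version: str) -> tuple:
    """Parse 'v1.2.3' into (1, 2, 3); non-numeric parts become 0."""
    parts = version.lstrip("v").split(".")
    numbers = []
    for part in parts[:3]:
        try:
            numbers.append(int(part))
        except ValueError:
            numbers.append(0)
    # Pad short versions such as "2" or "1.4" to three components.
    numbers += [0] * (3 - len(numbers))
    return tuple(numbers)


newer_by_tuple = parse_semver("v1.10.0") > parse_semver("1.9.0")
newer_by_string = "1.10.0" > "1.9.0"
padded = parse_semver("2")
prerelease = parse_semver("1.0.0-rc1")
```

Tuple comparison ranks 1.10.0 above 1.9.0, while plain string comparison does not. A pre-release suffix such as `-rc1` makes that component parse as 0, so pre-release ordering is not handled by this approach.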