Add security sanitizer, opt-in telemetry, and contributor guidelines
Infrastructure:
- security/: PII sanitizer with scan/sanitize modes, pre-commit hook, configurable blocklists
- telemetry/: GStack-style opt-in usage analytics, local stats viewer, version checker
- CONTRIBUTING.md: Privacy-first contributor guidelines with anonymization rules
- VERSION: 1.0.0

README updated with Privacy & Security and Telemetry sections.
parent 36d6ed83e7
commit d4c8c21cb3
12 changed files with 1402 additions and 4 deletions
107
CONTRIBUTING.md
Normal file
@ -0,0 +1,107 @@
# Contributing to AI Marketing Skills

AI Marketing Skills is an open-source collection of production marketing automation skills. Thanks for contributing.

- **Repo:** [github.com/singlegrain/ai-marketing-skills](https://github.com/singlegrain/ai-marketing-skills)
- **README:** [README.md](./README.md)

---

## 🔒 Data Privacy & Anonymization

**This is the #1 rule. No exceptions.**

ALL example outputs, training data, sample data, and test fixtures MUST be fully anonymized before commit. Real client data, revenue figures, or internal metrics are **never** acceptable in any commit.

| Data Type | Rule | Example |
|-----------|------|---------|
| Company names | Use fictional names | "Acme Corp", "TechStart Inc" |
| Person names | Use fictional names | "Jane Smith", "John Doe" |
| Email addresses | Use example.com domain | jane@example.com |
| Phone numbers | Use 555-xxxx format | 555-0142 |
| Dollar amounts | Use round fictional numbers | $50,000 |
| API keys/tokens | Use obvious placeholders | `sk-your-key-here` |

**Before every commit**, run the sanitizer:

```bash
python3 security/sanitizer.py --scan --dir . --recursive
```

The pre-commit hook will block commits with detected PII. See [security/README.md](./security/README.md) for setup.
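
As a rough illustration of the anonymization table, a fixture helper could build sample records that follow the rules by construction. This is only a sketch: `make_sample_contact`, the `SampleContact` fields, and the name pools are hypothetical and not part of this repo.

```python
from dataclasses import dataclass, asdict

# Hypothetical pools of fictional names taken from the table's examples.
FICTIONAL_COMPANIES = ["Acme Corp", "TechStart Inc"]
FICTIONAL_PEOPLE = ["Jane Smith", "John Doe"]


@dataclass
class SampleContact:
    name: str
    company: str
    email: str
    phone: str
    deal_value: str


def make_sample_contact(index: int) -> dict:
    """Build one anonymized record that follows the rules in the table."""
    name = FICTIONAL_PEOPLE[index % len(FICTIONAL_PEOPLE)]
    company = FICTIONAL_COMPANIES[index % len(FICTIONAL_COMPANIES)]
    first_name = name.split()[0].lower()
    contact = SampleContact(
        name=name,
        company=company,
        email=f"{first_name}@example.com",        # example.com domain only
        phone=f"555-01{index % 100:02d}",         # fictional 555-01xx range
        deal_value=f"${(index + 1) * 10_000:,}",  # round fictional number
    )
    return asdict(contact)
```

Calling `make_sample_contact(0)` yields a record for "Jane Smith" at "Acme Corp" with `jane@example.com`, `555-0100`, and `$10,000`. Generating fixtures this way keeps real values out of the data path entirely, rather than relying on redaction after the fact.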

---

## Skill Structure

Every skill category requires these files:

```
skill-category/
├── SKILL.md            # Claude Code skill definition (name, description, steps)
├── README.md           # Overview, quick start, architecture, examples
├── requirements.txt    # Python dependencies
└── *.py                # Implementation scripts
```

- **SKILL.md** follows Claude Code skill conventions: name, description, numbered steps.
- **README.md** includes: overview, quick start, architecture, examples, and the standard footer.
- **Python scripts** use `argparse` for CLI, include clear API stubs with comments, and handle missing dependencies gracefully.

---

## Code Standards

- **Python 3.10+**
- Use `argparse` for all CLI interfaces with `--help` documentation
- **Graceful failures:** Never crash on missing API keys. Show a helpful error message instead.
- Type hints encouraged but not required
- Prefer stdlib. No external dependencies without justification in your PR description.

```python
import os
import sys

# Good
if not os.environ.get("API_KEY"):
    print("Error: Set API_KEY environment variable. Get one at https://...")
    sys.exit(1)

# Bad
api_key = os.environ["API_KEY"]  # KeyError if missing
```

---

## Telemetry Integration

New skills **must** integrate telemetry logging. See [telemetry/README.md](./telemetry/README.md) for the integration guide.

- Add a version check to your SKILL.md preamble
- **Never log sensitive data through telemetry** (API keys, PII, client data)

---

## Pull Request Process

1. **Fork** the repo
2. **Branch** from `main` (use descriptive branch names: `feat/email-drip-skill`, `fix/sanitizer-regex`)
3. **Build** your changes
4. **Verify** before submitting:
   - All Python files compile clean: `python3 -m py_compile your_file.py`
   - Sanitizer scan passes: `python3 security/sanitizer.py --scan --dir . --recursive`
5. **Open a PR** with a description that includes:
   - What it does
   - Which skill category it affects
   - ✅ Confirmation that all data is anonymized
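
The compile check in step 4 can be run over a whole directory instead of one file at a time. The sketch below uses the standard library's `py_compile` module; the helper name `find_compile_errors` is hypothetical and not something this repo provides.

```python
import py_compile
from pathlib import Path


def find_compile_errors(root: str) -> list[str]:
    """Compile every .py file under root and return one message per failure."""
    errors = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            # doraise=True raises PyCompileError instead of printing to stderr.
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            errors.append(f"{path}: {exc.msg}")
    return errors
```

An empty list means every file compiled. Note that `py_compile` writes bytecode into `__pycache__` directories as a side effect, just as `python3 -m py_compile` does.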

---

## Reporting Security Issues

Found a vulnerability? **Do not open a public issue.**

Email [security@singlegrain.com](mailto:security@singlegrain.com) with details. We'll respond within 48 hours.

---

<p align="center">
Built by <a href="https://www.singlegrain.com/?utm_source=github&utm_medium=repo&utm_campaign=ai-marketing-skills">Single Grain</a>. Powered by <a href="https://www.singlebrain.com/?utm_source=github&utm_medium=repo&utm_campaign=ai-marketing-skills">Single Brain</a>.
</p>
49
README.md
@ -137,15 +137,56 @@ ai-marketing-skills/

---

## 🔒 Privacy & Security

Every skill is built with data privacy in mind:

- **PII Sanitizer** scans code and data for sensitive information before commits (`security/sanitizer.py`)
- **Pre-commit hook** blocks commits containing detected PII patterns
- **Configurable blocklists** for company names, person names, and custom patterns
- See [`security/README.md`](./security/README.md) for setup

```bash
# Scan for sensitive data
python3 security/sanitizer.py --scan --dir . --recursive

# Install the pre-commit hook
cp security/pre-commit-hook.sh .git/hooks/pre-commit && chmod +x .git/hooks/pre-commit
```

---

## 📡 Telemetry (Opt-In)

Anonymous usage telemetry helps us understand which skills people actually use. Fully opt-in, privacy-first:

- **Local logging always**: see your own usage stats in `~/.ai-marketing-skills/analytics/`
- **Remote reporting optional**: only if you explicitly opt in on first run
- **Data collected:** skill name, duration, success/fail, version, OS, architecture, Python version, timestamp, and a random device ID. Nothing else. No code, no file paths, no repo content.
- **Version checks**: get notified when new skills are available

```bash
# View your local usage stats
python3 telemetry/telemetry_report.py

# Check for updates
python3 telemetry/version_check.py
```

See [`telemetry/README.md`](./telemetry/README.md) for details.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
## 🤝 Contributing
|
## 🤝 Contributing
|
||||||
|
|
||||||
Found a bug? Have an improvement? PRs welcome.
|
Found a bug? Have an improvement? PRs welcome. Read [`CONTRIBUTING.md`](./CONTRIBUTING.md) for guidelines.
|
||||||
|
|
||||||
1. Fork the repo
|
1. Fork the repo
|
||||||
2. Create your feature branch (`git checkout -b feature/better-scoring`)
|
2. Create your feature branch (`git checkout -b feature/better-scoring`)
|
||||||
3. Commit your changes
|
3. Run `python3 security/sanitizer.py --scan --dir . --recursive` before committing
|
||||||
4. Push to the branch
|
4. Commit your changes
|
||||||
5. Open a Pull Request
|
5. Push to the branch
|
||||||
|
6. Open a Pull Request
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|
|
||||||
1
VERSION
Normal file
@ -0,0 +1 @@
1.0.0
80
security/README.md
Normal file
@ -0,0 +1,80 @@
# Security Sanitizer

Scans and redacts PII / sensitive data from files in this repo.

## What It Catches

| Type | Examples |
|------|----------|
| **EMAIL** | user@example.com |
| **PHONE** | (555) 123-4567, +1-555-123-4567 |
| **SSN** | 123-45-6789 |
| **API_KEY** | sk-xxx, ghp_xxx, op_xxx, Bearer tokens, KEY=value patterns |
| **IP_ADDRESS** | Public IPv4 addresses (skips localhost/private ranges) |
| **URL_CREDENTIALS** | https://user:pass@host.com |
| **AMOUNT** | $1,234.56, $1.2M, $500K |
| **COMPANY** | Names from blocklist (configurable) |
| **PERSON** | Names from blocklist + title-prefixed names (Mr./Dr./etc.) |
| **CUSTOM** | Any regex you add to the config |
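
To make the table concrete, here is a small sketch of bracket-style redaction using two of the regular expressions from `security/sanitizer.py`. The `redact` helper is hypothetical and simplified: the real script collects findings per line and replaces the matched strings, rather than calling `re.sub` directly.

```python
import re

# Two patterns copied from security/sanitizer.py for illustration.
EMAIL = re.compile(r"\b[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}\b")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str) -> str:
    """Replace each match with its bracket-style placeholder."""
    text = SSN.sub("[SSN]", text)
    text = EMAIL.sub("[EMAIL]", text)
    return text


print(redact("Contact user@example.com, SSN 123-45-6789"))
# prints: Contact [EMAIL], SSN [SSN]
```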

## Quick Start

```bash
# Scan the whole repo (dry run: changes nothing)
python3 security/sanitizer.py --scan --dir . --recursive

# Scan a single file
python3 security/sanitizer.py --scan --file path/to/file.py

# Redact PII in place
python3 security/sanitizer.py --sanitize --file path/to/file.py

# Redact everything recursively
python3 security/sanitizer.py --sanitize --dir . --recursive
```

Exit codes: `0` = clean, `1` = PII found (useful for CI), `2` = target file or directory not found.
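
A CI step could wrap the scanner and translate its exit code into a readable status. The sketch below is one possible wrapper, not part of this repo: `describe_exit` and `run_scan` are hypothetical names. Exit code `2` comes from `main()` in `sanitizer.py`, which returns it when the target path does not exist.

```python
import subprocess
import sys

# Meaning of the sanitizer's exit codes, as returned by main() in sanitizer.py.
EXIT_MEANINGS = {
    0: "clean",
    1: "PII found",
    2: "target not found",
}


def describe_exit(code: int) -> str:
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_scan(target_dir: str = ".") -> int:
    """Run the scanner as a subprocess and return its exit code."""
    result = subprocess.run(
        [sys.executable, "security/sanitizer.py",
         "--scan", "--dir", target_dir, "--recursive", "--quiet"],
        check=False,  # a non-zero exit is an expected outcome, not an error
    )
    return result.returncode
```

From the repo root, `print(describe_exit(run_scan()))` would report the outcome, and a CI job can fail the build on any non-zero code.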

## Configuration

Edit `security/sanitizer-config.json`:

```json
{
  "company_blocklist": ["Single Grain", "ClickFlow", "Nextiva"],
  "person_blocklist": ["Jane Doe"],
  "custom_patterns": [
    "ACME-\\d{6}",
    {"label": "PROJECT_ID", "pattern": "proj_[A-Za-z0-9]{12}"}
  ],
  "skip_paths": ["node_modules", ".git", "__pycache__", ".env.example"],
  "placeholder_format": "bracket"
}
```

- **company_blocklist**: company names to always redact (matched case-insensitively)
- **person_blocklist**: person names to always redact (matched case-insensitively)
- **custom_patterns**: additional regex (string or `{label, pattern}` object)
- **skip_paths**: directory or file names to skip when collecting files; merged with the built-in defaults
- **placeholder_format**: `"bracket"` for `[EMAIL]` or `"redacted"` for `[REDACTED]`
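
The two accepted shapes for `custom_patterns` and the effect of `placeholder_format` can be sketched as follows. This mirrors the logic of `compile_patterns` and `get_placeholder` in `sanitizer.py` in simplified form; the helper names `compile_custom` and `placeholder` are illustrative only.

```python
import re


def compile_custom(entries: list) -> list[tuple[str, re.Pattern]]:
    """Turn custom_patterns entries into (label, compiled regex) pairs."""
    compiled = []
    for entry in entries:
        if isinstance(entry, str):
            # A bare string gets the generic CUSTOM label.
            compiled.append(("CUSTOM", re.compile(entry)))
        elif isinstance(entry, dict):
            compiled.append((entry.get("label", "CUSTOM"), re.compile(entry["pattern"])))
    return compiled


def placeholder(label: str, placeholder_format: str) -> str:
    """'redacted' collapses every label to [REDACTED]; anything else keeps the label."""
    return "[REDACTED]" if placeholder_format == "redacted" else f"[{label}]"


patterns = compile_custom([
    r"ACME-\d{6}",
    {"label": "PROJECT_ID", "pattern": "proj_[A-Za-z0-9]{12}"},
])
line = "ticket ACME-123456 for proj_abc123DEF456"
for label, pattern in patterns:
    line = pattern.sub(placeholder(label, "bracket"), line)
print(line)  # ticket [CUSTOM] for [PROJECT_ID]
```

In the JSON file the backslash must be doubled (`"ACME-\\d{6}"`), because JSON string escaping is applied before the regex is compiled.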

## Pre-Commit Hook

Install to block commits containing PII:

```bash
cp security/pre-commit-hook.sh .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit
```

The hook scans staged files and blocks the commit if anything is detected.

To bypass in emergencies: `git commit --no-verify`

## Supported File Types

`.py`, `.md`, `.txt`, `.json`, `.yaml`, `.yml`, `.env`

## No External Dependencies

Uses only Python standard library (`re`, `json`, `os`, `sys`, `argparse`, `pathlib`, `collections`).
72
security/pre-commit-hook.sh
Executable file
@ -0,0 +1,72 @@
#!/usr/bin/env bash
# Pre-commit hook: scan staged files for PII / sensitive data.
#
# Install:
#   cp security/pre-commit-hook.sh .git/hooks/pre-commit
#   chmod +x .git/hooks/pre-commit
#
# Bypass (emergency only):
#   git commit --no-verify

set -euo pipefail

REPO_ROOT="$(git rev-parse --show-toplevel)"
SANITIZER="$REPO_ROOT/security/sanitizer.py"

if [ ! -f "$SANITIZER" ]; then
    echo "⚠️ Sanitizer not found at $SANITIZER; skipping PII check."
    exit 0
fi

# Get list of staged files (added, copied, or modified)
STAGED_FILES=$(git diff --cached --name-only --diff-filter=ACM)

if [ -z "$STAGED_FILES" ]; then
    exit 0
fi

FOUND_PII=0
TEMP_REPORT=$(mktemp)

# Read one path per line so file names containing spaces are not split.
while IFS= read -r FILE; do
    FULL_PATH="$REPO_ROOT/$FILE"

    # Only check supported extensions
    case "$FILE" in
        *.py|*.md|*.txt|*.json|*.yaml|*.yml|*.env)
            ;;
        *)
            continue
            ;;
    esac

    if [ ! -f "$FULL_PATH" ]; then
        continue
    fi

    # The sanitizer exits 1 when it finds PII; "|| true" keeps "set -e" from aborting here.
    OUTPUT=$(python3 "$SANITIZER" --scan --file "$FULL_PATH" --quiet 2>&1) || true

    if [ -n "$OUTPUT" ] && echo "$OUTPUT" | grep -q "issue"; then
        echo "$FILE: $OUTPUT" >> "$TEMP_REPORT"
        FOUND_PII=1
    fi
done <<< "$STAGED_FILES"

if [ "$FOUND_PII" -eq 1 ]; then
    echo ""
    echo "🚫 COMMIT BLOCKED: PII / sensitive data detected in staged files:"
    echo ""
    cat "$TEMP_REPORT"
    echo ""
    echo "To fix:"
    echo " 1. Run: python3 security/sanitizer.py --scan --dir . --recursive"
    echo " 2. Review findings and redact manually, or run with --sanitize"
    echo " 3. Stage the fixed files and commit again"
    echo ""
    echo "To bypass (emergency): git commit --no-verify"
    rm -f "$TEMP_REPORT"
    exit 1
fi

rm -f "$TEMP_REPORT"
exit 0
22
security/sanitizer-config.json
Normal file
@ -0,0 +1,22 @@
{
  "company_blocklist": [
    "ClickFlow",
    "Nextiva"
  ],
  "person_blocklist": [],
  "custom_patterns": [],
  "skip_paths": [
    "node_modules",
    ".git",
    "__pycache__",
    ".env.example"
  ],
  "allow_patterns": [
    "api_key=ANTHROPIC_API_KEY",
    "api_key = get_",
    "API_KEY=\"your-",
    "0123456789"
  ],
  "placeholder_format": "bracket",
  "_comment": "Single Grain removed from blocklist since this is our repo. Fork users should add their own company names."
}
457
security/sanitizer.py
Normal file
@ -0,0 +1,457 @@
#!/usr/bin/env python3
"""
PII / Sensitive Data Sanitizer
Scans files for personally identifiable information and sensitive data.
Can report findings (--scan) or redact them in place (--sanitize).

Usage:
    python3 security/sanitizer.py --scan --file path/to/file.py
    python3 security/sanitizer.py --scan --dir . --recursive
    python3 security/sanitizer.py --sanitize --file path/to/file.py
"""

import argparse
import json
import os
import re
import sys
from pathlib import Path
from collections import defaultdict

# ---------------------------------------------------------------------------
# Default configuration
# ---------------------------------------------------------------------------

SCRIPT_DIR = Path(__file__).resolve().parent
DEFAULT_CONFIG_PATH = SCRIPT_DIR / "sanitizer-config.json"

SUPPORTED_EXTENSIONS = {".py", ".md", ".txt", ".json", ".yaml", ".yml", ".env"}

DEFAULT_SKIP_PATHS = {"node_modules", ".git", "__pycache__", ".env.example"}

# ---------------------------------------------------------------------------
# Detection patterns
# ---------------------------------------------------------------------------

# Each entry: (label, placeholder_bracket, placeholder_redacted, regex, flags)
# Order matters: more specific patterns should come first.

PATTERNS: list[tuple[str, str, str, str, int]] = [
    # SSN (xxx-xx-xxxx)
    (
        "SSN",
        "[SSN]",
        "[REDACTED]",
        r"\b\d{3}-\d{2}-\d{4}\b",
        0,
    ),
    # API keys / tokens (sk-..., ghp_..., gho_..., op_..., Slack xox?-...)
    (
        "API_KEY",
        "[API_KEY]",
        "[REDACTED]",
        r"(?i)\b(?:sk-[A-Za-z0-9_\-]{20,}|ghp_[A-Za-z0-9]{36,}|op_[A-Za-z0-9_\-]{20,}|gho_[A-Za-z0-9]{36,}|xox[bposarc]-[A-Za-z0-9\-]{10,})\b",
        0,
    ),
    # Bearer tokens
    (
        "API_KEY",
        "[API_KEY]",
        "[REDACTED]",
        r"(?i)Bearer\s+[A-Za-z0-9_\-\.]{20,}",
        0,
    ),
    # Generic secret assignments (api_key=..., secret_key=..., access_token=...,
    # auth_token=..., private_key=...)
    (
        "API_KEY",
        "[API_KEY]",
        "[REDACTED]",
        r"""(?i)(?:api[_-]?key|secret[_-]?key|access[_-]?token|auth[_-]?token|private[_-]?key)\s*[=:]\s*["']?[A-Za-z0-9_\-\.\/\+]{16,}["']?""",
        0,
    ),
    # URLs with embedded credentials (https://user:pass@host)
    (
        "URL_CREDENTIALS",
        "[URL_CREDENTIALS]",
        "[REDACTED]",
        r"https?://[^\s:]+:[^\s@]+@[^\s]+",
        0,
    ),
    # Email addresses
    (
        "EMAIL",
        "[EMAIL]",
        "[REDACTED]",
        r"\b[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}\b",
        0,
    ),
    # IP addresses (IPv4): skips 0.0.0.0, 127.0.0.1, 255.255.255.255 and the
    # private ranges 10.x.x.x, 172.16-31.x.x, 192.168.x.x
    (
        "IP_ADDRESS",
        "[IP_ADDRESS]",
        "[REDACTED]",
        r"\b(?!0\.0\.0\.0|127\.0\.0\.1|255\.255\.255\.255|192\.168\.\d{1,3}\.\d{1,3}|10\.\d{1,3}\.\d{1,3}\.\d{1,3}|172\.(?:1[6-9]|2\d|3[01])\.\d{1,3}\.\d{1,3})\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b",
        0,
    ),
    # Phone numbers (US-style: +1, (xxx), xxx-xxx-xxxx, etc.)
    (
        "PHONE",
        "[PHONE]",
        "[REDACTED]",
        r"(?<!\d)(?:\+?1[\s\-]?)?(?:\(\d{3}\)|\d{3})[\s\-]?\d{3}[\s\-]?\d{4}(?!\d)",
        0,
    ),
    # Dollar amounts / revenue figures ($1,234 $1,234.56 $1M $1.2B)
    (
        "AMOUNT",
        "[AMOUNT]",
        "[REDACTED]",
        r"\$\s?\d[\d,]*(?:\.\d{1,2})?(?:\s?[MBKmkb](?:illion)?)?",
        0,
    ),
]

# Person-name heuristic: a title (Mr., Mrs., Ms., Dr., Prof.) followed by two or
# more capitalized words. Kept intentionally conservative to reduce false positives.
PERSON_NAME_PATTERN = re.compile(
    r"\b(?:Mr\.|Mrs\.|Ms\.|Dr\.|Prof\.)\s+[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b"
)

# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------


def load_config(config_path: Path | None = None) -> dict:
    """Load the JSON config, or return an empty dict if the file does not exist."""
    path = config_path or DEFAULT_CONFIG_PATH
    if path.exists():
        with open(path, "r") as f:
            return json.load(f)
    return {}


def should_skip_path(path: Path, skip_paths: set[str]) -> bool:
    """Return True if any component of the path (directory or file name) is in skip_paths."""
    parts = path.parts
    for skip in skip_paths:
        if skip in parts:
            return True
    return False


def is_import_line(line: str) -> bool:
    """Return True for imports, shebangs, and comment lines, which are never scanned."""
    stripped = line.strip()
    return stripped.startswith(("import ", "from ", "#!", "# ", "//", "/*"))


class Finding:
    __slots__ = ("label", "match", "line_num", "line")

    def __init__(self, label: str, match: str, line_num: int, line: str):
        self.label = label
        self.match = match
        self.line_num = line_num
        self.line = line

    def __repr__(self) -> str:
        return f"[{self.label}] line {self.line_num}: {self.match!r}"


def scan_line(
    line: str,
    line_num: int,
    compiled_patterns: list[tuple[str, str, str, re.Pattern]],
    company_patterns: list[tuple[re.Pattern, str, str]],
    person_patterns: list[tuple[re.Pattern, str, str]],
    placeholder_format: str,
) -> list[Finding]:
    """Return findings for a single line."""
    if is_import_line(line):
        return []

    findings: list[Finding] = []

    for label, ph_bracket, ph_redacted, pat in compiled_patterns:
        for m in pat.finditer(line):
            findings.append(Finding(label, m.group(), line_num, line.rstrip()))

    # Company blocklist
    for cpat, ph_bracket, ph_redacted in company_patterns:
        for m in cpat.finditer(line):
            findings.append(Finding("COMPANY", m.group(), line_num, line.rstrip()))

    # Person blocklist
    for ppat, ph_bracket, ph_redacted in person_patterns:
        for m in ppat.finditer(line):
            findings.append(Finding("PERSON", m.group(), line_num, line.rstrip()))

    # Person-name heuristic
    for m in PERSON_NAME_PATTERN.finditer(line):
        findings.append(Finding("PERSON", m.group(), line_num, line.rstrip()))

    return findings


def compile_patterns(
    config: dict,
) -> tuple[
    list[tuple[str, str, str, re.Pattern]],
    list[tuple[re.Pattern, str, str]],
    list[tuple[re.Pattern, str, str]],
]:
    """Compile all regex patterns from defaults + config."""
    placeholder_fmt = config.get("placeholder_format", "bracket")

    compiled = []
    for label, ph_b, ph_r, raw, flags in PATTERNS:
        compiled.append((label, ph_b, ph_r, re.compile(raw, flags)))

    # Custom patterns from config
    for entry in config.get("custom_patterns", []):
        if isinstance(entry, str):
            compiled.append(
                ("CUSTOM", "[CUSTOM]", "[REDACTED]", re.compile(entry))
            )
        elif isinstance(entry, dict):
            compiled.append((
                entry.get("label", "CUSTOM"),
                f"[{entry.get('label', 'CUSTOM')}]",
                "[REDACTED]",
                re.compile(entry["pattern"]),
            ))

    # Company blocklist
    company_patterns = []
    for name in config.get("company_blocklist", []):
        company_patterns.append((
            re.compile(re.escape(name), re.IGNORECASE),
            "[COMPANY]",
            "[REDACTED]",
        ))

    # Person blocklist
    person_patterns = []
    for name in config.get("person_blocklist", []):
        person_patterns.append((
            re.compile(re.escape(name), re.IGNORECASE),
            "[PERSON]",
            "[REDACTED]",
        ))

    return compiled, company_patterns, person_patterns


def get_placeholder(label: str, placeholder_format: str) -> str:
    if placeholder_format == "redacted":
        return "[REDACTED]"
    return f"[{label}]"


# ---------------------------------------------------------------------------
# File processing
# ---------------------------------------------------------------------------


def scan_file(
    filepath: Path,
    compiled: list,
    company_pats: list,
    person_pats: list,
    placeholder_format: str,
) -> list[Finding]:
    try:
        text = filepath.read_text(errors="replace")
    except (PermissionError, OSError) as e:
        print(f" ⚠️ Cannot read {filepath}: {e}", file=sys.stderr)
        return []

    findings = []
    for i, line in enumerate(text.splitlines(), 1):
        findings.extend(
            scan_line(line, i, compiled, company_pats, person_pats, placeholder_format)
        )
    return findings


def sanitize_file(
    filepath: Path,
    compiled: list,
    company_pats: list,
    person_pats: list,
    placeholder_format: str,
) -> list[Finding]:
    """Scan and replace PII in-place. Returns findings for reporting."""
    try:
        text = filepath.read_text(errors="replace")
    except (PermissionError, OSError) as e:
        print(f" ⚠️ Cannot read {filepath}: {e}", file=sys.stderr)
        return []

    findings = []
    new_lines = []

    for i, line in enumerate(text.splitlines(), 1):
        line_findings = scan_line(
            line, i, compiled, company_pats, person_pats, placeholder_format
        )
        findings.extend(line_findings)

        if line_findings and not is_import_line(line):
            # Replace matches (longest first to avoid partial replacements)
            matches = sorted(
                [(f.match, f.label) for f in line_findings],
                key=lambda x: len(x[0]),
                reverse=True,
            )
            for match_text, label in matches:
                placeholder = get_placeholder(label, placeholder_format)
                line = line.replace(match_text, placeholder)

        new_lines.append(line)

    if findings:
        filepath.write_text("\n".join(new_lines) + ("\n" if text.endswith("\n") else ""))

    return findings


def is_supported_file(path: Path) -> bool:
    """Check the extension; also match dotfiles such as ".env", whose suffix is empty."""
    return path.suffix in SUPPORTED_EXTENSIONS or path.name in SUPPORTED_EXTENSIONS


def collect_files(target: Path, recursive: bool, skip_paths: set[str]) -> list[Path]:
    """Collect files with supported extensions."""
    if target.is_file():
        if is_supported_file(target):
            return [target]
        return []

    files = []
    # rglob walks subdirectories; iterdir lists only the top level.
    candidates = target.rglob("*") if recursive else target.iterdir()
    for fp in candidates:
        if fp.is_file() and is_supported_file(fp) and not should_skip_path(fp, skip_paths):
            files.append(fp)

    return sorted(files)


# ---------------------------------------------------------------------------
# Reporting
# ---------------------------------------------------------------------------


def print_report(
    all_findings: dict[str, list[Finding]], mode: str
) -> None:
    total = sum(len(f) for f in all_findings.values())
    files_affected = len(all_findings)

    if total == 0:
        print("\n✅ No PII or sensitive data found.")
        return

    action = "Found" if mode == "scan" else "Sanitized"
    print(f"\n{'=' * 60}")
    print(f"🔍 {action} {total} issue(s) across {files_affected} file(s)")
    print(f"{'=' * 60}")

    # Aggregate by type
    by_type: dict[str, int] = defaultdict(int)
    for filepath, findings in all_findings.items():
        for f in findings:
            by_type[f.label] += 1

    print("\nBy type:")
    for label, count in sorted(by_type.items(), key=lambda x: -x[1]):
        print(f" {label:20s} {count:>4d}")

    print("\nBy file:")
    for filepath, findings in sorted(all_findings.items()):
        print(f"\n 📄 {filepath} ({len(findings)} finding(s))")
        for f in findings:
            print(f" Line {f.line_num:>4d} [{f.label}]: {f.match}")

    if mode == "scan":
        print("\n💡 Run with --sanitize to redact these findings.")


# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------


def main() -> int:
    parser = argparse.ArgumentParser(
        description="Scan or sanitize files for PII and sensitive data.",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s --scan --file config.py
  %(prog)s --scan --dir . --recursive
  %(prog)s --sanitize --dir src/ --recursive
  %(prog)s --scan --dir . --recursive --config security/sanitizer-config.json
        """,
    )

    mode_group = parser.add_mutually_exclusive_group(required=True)
    mode_group.add_argument("--scan", action="store_true", help="Report findings without modifying files")
    mode_group.add_argument("--sanitize", action="store_true", help="Replace PII with safe placeholders")

    target_group = parser.add_mutually_exclusive_group(required=True)
    target_group.add_argument("--file", type=str, help="Scan a single file")
    target_group.add_argument("--dir", type=str, help="Scan a directory")

    parser.add_argument("--recursive", "-r", action="store_true", help="Recurse into subdirectories (with --dir)")
    parser.add_argument("--config", type=str, help="Path to config JSON (default: security/sanitizer-config.json)")
    parser.add_argument("--quiet", "-q", action="store_true", help="Only print summary, not individual findings")

    args = parser.parse_args()

    # Load config
    config_path = Path(args.config) if args.config else None
    config = load_config(config_path)

    placeholder_format = config.get("placeholder_format", "bracket")
    skip_paths = set(config.get("skip_paths", [])) | DEFAULT_SKIP_PATHS

    compiled, company_pats, person_pats = compile_patterns(config)

    # Determine target
    if args.file:
        target = Path(args.file)
        if not target.exists():
            print(f"❌ File not found: {target}", file=sys.stderr)
            return 2
    else:
        target = Path(args.dir)
        if not target.exists():
            print(f"❌ Directory not found: {target}", file=sys.stderr)
            return 2

    files = collect_files(target, args.recursive, skip_paths)
    if not files:
        print("No supported files found.")
        return 0

    mode = "scan" if args.scan else "sanitize"
    process_fn = scan_file if args.scan else sanitize_file

    all_findings: dict[str, list[Finding]] = {}
    for fp in files:
        findings = process_fn(fp, compiled, company_pats, person_pats, placeholder_format)
        if findings:
            all_findings[str(fp)] = findings

    if not args.quiet:
        print_report(all_findings, mode)
    else:
        total = sum(len(f) for f in all_findings.values())
        if total > 0:
            action = "Found" if mode == "scan" else "Sanitized"
            print(f"{action} {total} issue(s) in {len(all_findings)} file(s)")

    # Exit code: 0 clean, 1 PII found
    return 1 if all_findings else 0


if __name__ == "__main__":
    sys.exit(main())
94
telemetry/README.md
Normal file
@@ -0,0 +1,94 @@
# Telemetry

Opt-in, local-first, privacy-respecting usage telemetry for AI Marketing Skills.

## What's Collected

When you opt in, the following **anonymous** data is sent:

| Field | Example | Purpose |
|-------|---------|---------|
| Skill name | `growth-engine` | Know which skills are used |
| Duration (ms) | `4500` | Track performance |
| Success/fail | `true` | Track reliability |
| Version | `1.0.0` | Know which versions are in use |
| OS | `Darwin` | Platform compatibility |
| Architecture | `arm64` | Platform compatibility |
| Python version | `3.12` | Runtime compatibility |
| Timestamp | `2026-03-31T12:00:00Z` | Usage patterns |
| Device ID | `<random-uuid>` | Deduplicate (not tied to identity) |
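
To make the table above concrete, here is a minimal sketch of how one such record could be assembled. It is an illustration under assumptions, not the project's implementation: `example_entry` is a hypothetical helper, and the key names are assumed to correspond one-to-one to the table rows.

```python
import json
import platform
import sys
import uuid
from datetime import datetime, timezone


def example_entry(skill: str, duration_ms: int, success: bool, version: str) -> dict:
    """Hypothetical helper: build one record holding only the fields in the table."""
    return {
        "skill": skill,
        "duration_ms": duration_ms,
        "success": success,
        "version": version,
        "os": platform.system(),          # e.g. "Darwin"
        "arch": platform.machine(),       # e.g. "arm64"
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "device_id": str(uuid.uuid4()),   # random value, generated here for illustration only
    }


if __name__ == "__main__":
    print(json.dumps(example_entry("growth-engine", 4500, True, "1.0.0"), indent=2))
```

Nothing in the record is derived from file paths, environment variables, or user names, which is the property the next section relies on.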

## What's NOT Collected — Ever

- ❌ Code content
- ❌ File paths
- ❌ Repository names
- ❌ Usernames or emails
- ❌ Environment variables
- ❌ API keys or secrets
- ❌ Any content you're working on

## How to Opt In or Out

### First run (interactive)
```bash
python3 telemetry/telemetry_init.py
```
You'll be asked to choose. Your choice is saved.

### Non-interactive
```bash
python3 telemetry/telemetry_init.py --yes # Opt in
python3 telemetry/telemetry_init.py --no # Opt out
```

### Change your mind later
Delete the config and re-run:
```bash
rm ~/.ai-marketing-skills/telemetry-config.json
python3 telemetry/telemetry_init.py
```
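
If you want to check your current choice from a script, something like the following could work. This is a sketch that assumes the saved file is JSON with a boolean `opted_in` key; `is_opted_in` is a hypothetical helper, and the path is the one used in the commands above.

```python
import json
from pathlib import Path

# Path taken from the commands above.
CONFIG_FILE = Path.home() / ".ai-marketing-skills" / "telemetry-config.json"


def is_opted_in(config_file: Path = CONFIG_FILE) -> bool:
    """Return True only if the config exists, parses, and has opted_in set to true."""
    try:
        with open(config_file, "r") as f:
            config = json.load(f)
    except (OSError, json.JSONDecodeError):
        # A missing or unreadable config is treated as "not opted in".
        return False
    return isinstance(config, dict) and config.get("opted_in") is True


if __name__ == "__main__":
    print("opted in" if is_opted_in() else "not opted in")
```

Defaulting to "not opted in" on any error keeps the check consistent with the opt-in-only promise.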

## Local Data — Always Available

**Regardless of opt-in**, all skill runs are logged locally so you can see your own usage:

```
~/.ai-marketing-skills/analytics/skill-usage.jsonl
```

This data never leaves your machine unless you opt in.
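
Because the local log is plain JSON Lines, you can inspect it with a few lines of Python. The sketch below counts runs per skill; `runs_per_skill` is a hypothetical helper, and it assumes each line is a JSON object with a `skill` key.

```python
import json
from collections import Counter
from pathlib import Path

# Location quoted above.
USAGE_LOG = Path.home() / ".ai-marketing-skills" / "analytics" / "skill-usage.jsonl"


def runs_per_skill(log_path: Path = USAGE_LOG) -> Counter:
    """Count log entries per skill, skipping blank or malformed lines."""
    counts: Counter = Counter()
    if not log_path.exists():
        return counts
    with open(log_path, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # ignore lines that are not valid JSON
            if isinstance(entry, dict):
                counts[entry.get("skill", "unknown")] += 1
    return counts


if __name__ == "__main__":
    for skill, count in runs_per_skill().most_common():
        print(f"{skill}: {count}")
```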

## View Your Stats

```bash
python3 telemetry/telemetry_report.py
```

Shows: total runs, runs per skill, success rates, average durations, most used skill, and more.

### Options
```bash
python3 telemetry/telemetry_report.py --json # Machine-readable JSON
python3 telemetry/telemetry_report.py --skill seo-bot # Filter to one skill
```
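
The `--json` output can be fed to other tooling. As a sketch, the function below flags skills whose success rate falls under a threshold. It assumes the JSON has a `per_skill` object whose values carry `runs` and `success_rate_pct`; `flag_unreliable` and the 90% default are hypothetical.

```python
import json


def flag_unreliable(report_json: str, min_success_pct: float = 90.0) -> list[str]:
    """Return the names of skills whose success rate is below min_success_pct."""
    report = json.loads(report_json)
    per_skill = report.get("per_skill", {})
    flagged = [
        name
        for name, stats in per_skill.items()
        # Only consider skills that actually ran at least once.
        if stats.get("runs", 0) > 0 and stats.get("success_rate_pct", 0.0) < min_success_pct
    ]
    return sorted(flagged)
```

Pass it the text printed by the `--json` option, for example after capturing that output in a file or a variable.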

## Check for Updates

```bash
python3 telemetry/version_check.py
```

- Compares your local version against the latest GitHub release
- Silent when up to date
- Caches the result for 24 hours to avoid excess API calls
- Fails silently when offline, so it never interrupts a skill run
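
The comparison and caching rules in the list above can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, versions are assumed to be `major.minor.patch` with an optional leading `v`, and no network access is involved.

```python
from datetime import datetime, timedelta, timezone

CACHE_TTL = timedelta(hours=24)  # matches the 24-hour cache described above


def parse_version(version: str) -> tuple:
    """Turn '1.2.3' or 'v1.2.3' into (1, 2, 3); non-numeric parts count as 0."""
    parts = []
    for piece in version.strip().lstrip("v").split(".")[:3]:
        try:
            parts.append(int(piece))
        except ValueError:
            parts.append(0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def update_available(local: str, latest: str) -> bool:
    """True when the latest release is strictly newer than the local version."""
    return parse_version(latest) > parse_version(local)


def cache_is_fresh(checked_at: datetime, now: datetime) -> bool:
    """True when the last check happened less than CACHE_TTL ago."""
    return now - checked_at < CACHE_TTL


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(update_available("1.0.0", "v1.1.0"))
    print(cache_is_fresh(now - timedelta(hours=2), now))
```

Comparing integer tuples rather than strings avoids ordering mistakes such as treating `1.10.0` as older than `1.9.0`.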

## Privacy Commitment

1. **Opt-in only** — nothing is sent without your explicit consent
2. **Local-first** — your data is always stored locally for your own use
3. **Minimal data** — only what's needed to improve the skills
4. **No PII** — no names, emails, paths, or content
5. **Transparent** — all telemetry code is in this directory; read it yourself
6. **Revocable** — opt out at any time by deleting your config file
94
telemetry/telemetry_init.py
Normal file
@@ -0,0 +1,94 @@
#!/usr/bin/env python3
"""First-run opt-in prompt for anonymous usage telemetry."""

import argparse
import json
import os
import sys
import uuid
from datetime import datetime, timezone
from pathlib import Path

CONFIG_DIR = Path.home() / ".ai-marketing-skills"
CONFIG_FILE = CONFIG_DIR / "telemetry-config.json"


def load_config():
    """Load existing telemetry config, or return None if not found."""
    if CONFIG_FILE.exists():
        try:
            with open(CONFIG_FILE, "r") as f:
                return json.load(f)
        except (json.JSONDecodeError, OSError):
            return None
    return None


def save_config(opted_in: bool) -> dict:
    """Save telemetry config and return it."""
    CONFIG_DIR.mkdir(parents=True, exist_ok=True)
    config = {
        "opted_in": opted_in,
        "device_id": str(uuid.uuid4()),
        "created": datetime.now(timezone.utc).isoformat(),
    }
    with open(CONFIG_FILE, "w") as f:
        json.dump(config, f, indent=2)
    return config


def prompt_user() -> bool:
    """Interactive opt-in prompt. Returns True if user opts in."""
    print(
        "Would you like to opt in to anonymous usage telemetry?\n"
        "This helps us improve skills.\n"
        "\n"
        "Data collected: skill name, duration, success/fail, version, OS,\n"
        "architecture, Python version, timestamp, and a random device ID.\n"
        "No code, file paths, or repo content is ever sent.\n"
    )
    while True:
        answer = input("(y/n): ").strip().lower()
        if answer in ("y", "yes"):
            return True
        if answer in ("n", "no"):
            return False
        print("Please enter y or n.")


def init_telemetry(yes: bool = False, no: bool = False) -> dict:
    """Initialize telemetry. Returns config dict.

    Args:
        yes: Non-interactive opt-in.
        no: Non-interactive opt-out.
    """
    existing = load_config()
    if existing is not None:
        return existing

    if yes:
        opted_in = True
    elif no:
        opted_in = False
    else:
        opted_in = prompt_user()

    config = save_config(opted_in)
    status = "enabled" if opted_in else "disabled"
    print(f"Telemetry {status}. Config saved to {CONFIG_FILE}")
    return config


def main():
    parser = argparse.ArgumentParser(description="Initialize telemetry opt-in.")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--yes", action="store_true", help="Opt in non-interactively.")
    group.add_argument("--no", action="store_true", help="Opt out non-interactively.")
    args = parser.parse_args()

    config = init_telemetry(yes=args.yes, no=args.no)
    print(json.dumps(config, indent=2))


if __name__ == "__main__":
    main()
118
telemetry/telemetry_log.py
Normal file
@@ -0,0 +1,118 @@
#!/usr/bin/env python3
"""Log a skill run event. Called by each skill's preamble.

Usage:
    python3 telemetry/telemetry_log.py --skill <name> --duration <ms> --success <true/false> --version <ver>

Always logs locally. If opted in, also sends to analytics endpoint.
Never logs: code content, file paths, repo names, usernames, environment variables.
"""

import argparse
import json
import os
import platform
import sys
import urllib.request
import urllib.error
from datetime import datetime, timezone
from pathlib import Path

CONFIG_DIR = Path.home() / ".ai-marketing-skills"
CONFIG_FILE = CONFIG_DIR / "telemetry-config.json"
ANALYTICS_DIR = CONFIG_DIR / "analytics"
USAGE_LOG = ANALYTICS_DIR / "skill-usage.jsonl"

# Placeholder stub; replace with your analytics endpoint.
ANALYTICS_ENDPOINT = "https://example.com/api/telemetry"


def load_config() -> dict:
    """Load telemetry config. Returns empty dict if not found."""
    if CONFIG_FILE.exists():
        try:
            with open(CONFIG_FILE, "r") as f:
                return json.load(f)
        except (json.JSONDecodeError, OSError):
            pass
    return {}


def python_version() -> str:
    """Return major.minor Python version string."""
    return f"{sys.version_info.major}.{sys.version_info.minor}"


def build_entry(skill: str, duration_ms: int, success: bool, version: str, device_id: str) -> dict:
    """Build a log entry. Only safe, anonymous fields."""
    return {
        "skill": skill,
        "duration_ms": duration_ms,
        "success": success,
        "version": version,
        "os": platform.system(),
        "arch": platform.machine(),
        "python": python_version(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "device_id": device_id,
    }


def log_locally(entry: dict):
    """Append entry to local JSONL log."""
    ANALYTICS_DIR.mkdir(parents=True, exist_ok=True)
    with open(USAGE_LOG, "a") as f:
        f.write(json.dumps(entry) + "\n")


def send_remote(entry: dict):
    """Send entry to remote analytics endpoint. Fails silently."""
    try:
        data = json.dumps(entry).encode("utf-8")
        req = urllib.request.Request(
            ANALYTICS_ENDPOINT,
            data=data,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        urllib.request.urlopen(req, timeout=5)
    except Exception:
        # Never block skill execution
        pass


def parse_bool(value: str) -> bool:
    """Parse a boolean string."""
    return value.lower() in ("true", "1", "yes")


def main():
    parser = argparse.ArgumentParser(description="Log a skill run event.")
    parser.add_argument("--skill", required=True, help="Skill name.")
    parser.add_argument("--duration", required=True, type=int, help="Duration in milliseconds.")
    parser.add_argument("--success", required=True, help="true/false")
    parser.add_argument("--version", required=True, help="Skill version.")
    args = parser.parse_args()

    config = load_config()
    device_id = config.get("device_id", "unknown")
    opted_in = config.get("opted_in", False)

    entry = build_entry(
        skill=args.skill,
        duration_ms=args.duration,
        success=parse_bool(args.success),
        version=args.version,
        device_id=device_id,
    )

    # Always log locally
    log_locally(entry)

    # Send remotely only if opted in
    if opted_in:
        send_remote(entry)


if __name__ == "__main__":
    main()
177
telemetry/telemetry_report.py
Normal file
@@ -0,0 +1,177 @@
#!/usr/bin/env python3
"""Local stats viewer for skill usage data.

Usage:
    python3 telemetry/telemetry_report.py            # Full report
    python3 telemetry/telemetry_report.py --json     # Machine-readable output
    python3 telemetry/telemetry_report.py --skill X  # Filter to one skill
"""

import argparse
import json
import sys
from collections import defaultdict
from datetime import datetime, timezone, timedelta
from pathlib import Path

CONFIG_DIR = Path.home() / ".ai-marketing-skills"
CONFIG_FILE = CONFIG_DIR / "telemetry-config.json"
USAGE_LOG = CONFIG_DIR / "analytics" / "skill-usage.jsonl"


def load_entries(skill_filter: str = None) -> list:
    """Load all log entries, optionally filtered by skill."""
    if not USAGE_LOG.exists():
        return []
    entries = []
    with open(USAGE_LOG, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                entry = json.loads(line)
                if skill_filter and entry.get("skill") != skill_filter:
                    continue
                entries.append(entry)
            except json.JSONDecodeError:
                continue
    return entries


def load_config() -> dict:
    if CONFIG_FILE.exists():
        try:
            with open(CONFIG_FILE, "r") as f:
                return json.load(f)
        except (json.JSONDecodeError, OSError):
            pass
    return {}


def parse_timestamp(ts: str) -> datetime:
    """Parse ISO timestamp string."""
    # Handle both formats with and without timezone
    try:
        return datetime.fromisoformat(ts)
    except ValueError:
        return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def generate_report(entries: list, config: dict) -> dict:
    """Generate stats from entries."""
    now = datetime.now(timezone.utc)
    seven_days_ago = now - timedelta(days=7)
    thirty_days_ago = now - timedelta(days=30)

    total = len(entries)
    last_7 = 0
    last_30 = 0
    skill_runs = defaultdict(int)
    skill_successes = defaultdict(int)
    skill_durations = defaultdict(list)
    last_timestamp = None

    for e in entries:
        skill = e.get("skill", "unknown")
        skill_runs[skill] += 1

        if e.get("success"):
            skill_successes[skill] += 1

        duration = e.get("duration_ms")
        if duration is not None:
            skill_durations[skill].append(duration)

        ts_str = e.get("timestamp")
        if ts_str:
            try:
                ts = parse_timestamp(ts_str)
                if ts.tzinfo is None:
                    ts = ts.replace(tzinfo=timezone.utc)
                if ts >= seven_days_ago:
                    last_7 += 1
                if ts >= thirty_days_ago:
                    last_30 += 1
                if last_timestamp is None or ts > last_timestamp:
                    last_timestamp = ts
            except (ValueError, TypeError):
                pass

    # Per-skill stats
    per_skill = {}
    for skill, count in sorted(skill_runs.items(), key=lambda x: -x[1]):
        avg_dur = None
        if skill_durations[skill]:
            avg_dur = round(sum(skill_durations[skill]) / len(skill_durations[skill]), 1)
        success_rate = round(skill_successes[skill] / count * 100, 1) if count > 0 else 0
        per_skill[skill] = {
            "runs": count,
            "success_rate_pct": success_rate,
            "avg_duration_ms": avg_dur,
        }

    most_used = max(skill_runs, key=skill_runs.get) if skill_runs else None

    return {
        "total_runs": total,
        "last_7_days": last_7,
        "last_30_days": last_30,
        "most_used_skill": most_used,
        "last_run": last_timestamp.isoformat() if last_timestamp else None,
        "opted_in": config.get("opted_in", False),
        "per_skill": per_skill,
    }


def print_report(report: dict):
    """Pretty-print the report."""
    print("=" * 50)
    print(" AI Marketing Skills — Usage Report")
    print("=" * 50)
    print()
    print(f" Total runs (all time): {report['total_runs']}")
    print(f" Last 7 days: {report['last_7_days']}")
    print(f" Last 30 days: {report['last_30_days']}")
    print(f" Most used skill: {report['most_used_skill'] or 'N/A'}")
    print(f" Last run: {report['last_run'] or 'N/A'}")
    print(f" Telemetry opt-in: {'Yes' if report['opted_in'] else 'No'}")
    print()

    if report["per_skill"]:
        print(" Per-Skill Breakdown:")
        print(" " + "-" * 46)
        print(f" {'Skill':<25} {'Runs':>5} {'Success':>8} {'Avg ms':>8}")
        print(" " + "-" * 46)
        for skill, stats in report["per_skill"].items():
            avg = f"{stats['avg_duration_ms']:.0f}" if stats["avg_duration_ms"] is not None else "N/A"
            print(f" {skill:<25} {stats['runs']:>5} {stats['success_rate_pct']:>7.1f}% {avg:>8}")
    else:
        print(" No usage data found.")
    print()


def main():
    parser = argparse.ArgumentParser(description="View local skill usage stats.")
    parser.add_argument("--json", action="store_true", help="Output as JSON.")
    parser.add_argument("--skill", help="Filter to a specific skill.")
    args = parser.parse_args()

    config = load_config()
    entries = load_entries(skill_filter=args.skill)

    if not entries and not args.json:
        print("No usage data found. Run some skills first!")
        print(f"Data location: {USAGE_LOG}")
        return

    report = generate_report(entries, config)

    if args.json:
        print(json.dumps(report, indent=2))
    else:
        print_report(report)


if __name__ == "__main__":
    main()
135
telemetry/version_check.py
Normal file
@@ -0,0 +1,135 @@
#!/usr/bin/env python3
"""Check for updates against GitHub releases.

Usage:
    python3 telemetry/version_check.py

Silent when up to date. Prints update notice if newer version available.
Caches result for 24 hours. Gracefully handles offline/errors.
"""

import json
import os
import sys
import urllib.request
import urllib.error
from datetime import datetime, timezone, timedelta
from pathlib import Path

REPO_ROOT = Path(__file__).resolve().parent.parent
VERSION_FILE = REPO_ROOT / "VERSION"
CACHE_DIR = Path.home() / ".ai-marketing-skills"
CACHE_FILE = CACHE_DIR / "version-cache.json"
GITHUB_API_URL = "https://api.github.com/repos/ericosiu/ai-marketing-skills/releases/latest"
CACHE_TTL_HOURS = 24


def read_local_version() -> str:
    """Read version from local VERSION file."""
    try:
        return VERSION_FILE.read_text().strip()
    except OSError:
        return "0.0.0"


def parse_semver(version: str) -> tuple:
    """Parse semver string into comparable tuple. Strips leading 'v'."""
    v = version.lstrip("v")
    parts = v.split(".")
    result = []
    for p in parts:
        try:
            result.append(int(p))
        except ValueError:
            result.append(0)
    while len(result) < 3:
        result.append(0)
    return tuple(result[:3])


def load_cache() -> dict:
    """Load cached version check result."""
    if not CACHE_FILE.exists():
        return {}
    try:
        with open(CACHE_FILE, "r") as f:
            return json.load(f)
    except (json.JSONDecodeError, OSError):
        return {}


def save_cache(latest_version: str):
    """Save version check result to cache."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cache = {
        "latest_version": latest_version,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }
    try:
        with open(CACHE_FILE, "w") as f:
            json.dump(cache, f, indent=2)
    except OSError:
        pass


def cache_is_fresh() -> bool:
    """Check if cache is less than CACHE_TTL_HOURS old."""
    cache = load_cache()
    checked_at = cache.get("checked_at")
    if not checked_at:
        return False
    try:
        ts = datetime.fromisoformat(checked_at)
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)
        return datetime.now(timezone.utc) - ts < timedelta(hours=CACHE_TTL_HOURS)
    except (ValueError, TypeError):
        return False


def fetch_latest_version() -> str:
    """Fetch latest version from GitHub API. Returns version string or None."""
    try:
        req = urllib.request.Request(
            GITHUB_API_URL,
            headers={"Accept": "application/vnd.github.v3+json", "User-Agent": "ai-marketing-skills"},
        )
        with urllib.request.urlopen(req, timeout=10) as resp:
            data = json.loads(resp.read().decode("utf-8"))
            # Treat a missing or empty tag as "no result" so it is never cached.
            return data.get("tag_name", "").lstrip("v") or None
    except Exception:
        return None


def check_version():
    """Main version check logic."""
    local = read_local_version()

    # Check cache first
    cache = load_cache()
    if cache_is_fresh() and cache.get("latest_version"):
        latest = cache["latest_version"]
    else:
        latest = fetch_latest_version()
        if latest is None:
            # Offline or API error — silently exit
            return
        save_cache(latest)

    local_parsed = parse_semver(local)
    latest_parsed = parse_semver(latest)

    if latest_parsed > local_parsed:
        print(f"🆕 AI Marketing Skills v{latest} available (you have v{local}). Run `git pull` to update.")


def main():
    try:
        check_version()
    except Exception:
        # Never block skill execution
        pass


if __name__ == "__main__":
    main()