Add 4 new skill categories: revenue-intelligence, conversion-ops, podcast-ops, team-ops

New skills (8 total):
- revenue-intelligence: Gong Insight Pipeline, Revenue Attribution Mapper, Client Report Generator
- conversion-ops: CRO Audit, Survey-to-Lead-Magnet Engine
- podcast-ops: Podcast-to-Everything Pipeline
- team-ops: Elon Algorithm (Team Performance Audit), Meeting-to-Action Extractor

Also adds .gitignore for __pycache__
Alfred Claw 2026-03-31 07:25:46 -07:00
parent b2c11a65aa
commit 36d6ed83e7
23 changed files with 8472 additions and 4 deletions

.gitignore (new file, 1 line added)

@@ -0,0 +1 @@
__pycache__/

@@ -16,6 +16,10 @@ These aren't prompts. They're complete workflows — scripts, scoring algorithms
 | [**Outbound Engine**](./outbound-engine/) | ICP definition to emails in inbox — fully automated | Cold Outbound Optimizer, Lead Pipeline, Competitive Monitor |
 | [**SEO Ops**](./seo-ops/) | Find the keywords your competitors missed | Content Attack Briefs, GSC Optimizer, Trend Scout |
 | [**Finance Ops**](./finance-ops/) | Your AI CFO that finds hidden costs in 30 minutes | CFO Briefing, Cost Estimate, Scenario Modeler |
+| [**Revenue Intelligence**](./revenue-intelligence/) | Prove content ROI and turn sales calls into strategy | Gong Insight Pipeline, Revenue Attribution, Client Report Generator |
+| [**Conversion Ops**](./conversion-ops/) | Score any landing page and turn survey data into lead magnets | CRO Audit, Survey-to-Lead-Magnet Engine |
+| [**Podcast Ops**](./podcast-ops/) | One episode → 20+ content pieces across every platform | Podcast-to-Everything Pipeline, Content Calendar |
+| [**Team Ops**](./team-ops/) | Ruthless performance audits and meeting intelligence | Elon Algorithm, Meeting-to-Action Extractor |
---
@@ -108,11 +112,27 @@ ai-marketing-skills/
 │   ├── gsc_client.py
 │   ├── trend_scout.py
 │   └── ...
-└── finance-ops/           ← Financial analysis
+├── finance-ops/           ← Financial analysis
+│   ├── SKILL.md
+│   ├── scripts/
+│   ├── references/        ← Metrics, rates, ROI models
+│   └── ...
+├── revenue-intelligence/  ← Sales call insights + attribution
+│   ├── SKILL.md
+│   ├── gong_insight_pipeline.py
+│   ├── revenue_attribution.py
+│   └── client_report_generator.py
+├── conversion-ops/        ← CRO + lead magnet generation
+│   ├── SKILL.md
+│   ├── cro_audit.py
+│   └── survey_lead_magnet.py
+├── podcast-ops/           ← Podcast → content factory
+│   ├── SKILL.md
+│   └── podcast_pipeline.py
+└── team-ops/              ← Performance audits + meeting intel
     ├── SKILL.md
-    ├── scripts/
-    ├── references/        ← Metrics, rates, ROI models
-    └── ...
+    ├── team_performance_audit.py
+    └── meeting_action_extractor.py
 ```
---

conversion-ops/README.md (new file, 193 lines added)

@@ -0,0 +1,193 @@
# AI Conversion Ops
**Turn landing pages into conversion machines. Turn survey data into lead magnets.**
An AI-powered conversion optimization suite that replaces manual CRO audits and survey analysis. These tools score your landing pages across 8 proven conversion dimensions and transform raw survey responses into segmented lead magnet strategies — all without API keys or headless browsers.
## What's Inside
### 🎯 CRO Audit Tool
Fetches any landing page URL and runs it through a comprehensive conversion heuristics engine. Scores across 8 dimensions, compares against industry benchmarks, and generates specific fix recommendations with before/after suggestions.
**What it finds:**
- Weak or missing headlines that fail the 5-second test
- CTAs that blend in instead of standing out
- Missing social proof that kills trust
- Forms with too much friction
- Mobile responsiveness gaps
- Page weight and speed red flags
- Missing trust signals and urgency elements
### 📊 Survey-to-Lead-Magnet Engine
Ingests survey response CSVs, clusters respondents by pain point themes, ranks segments by size and commercial potential, and auto-generates complete lead magnet briefs for each segment.
**What it produces:**
- Pain point clusters from free-text survey responses
- Segments ranked by commercial opportunity
- Complete lead magnet briefs (title, format, hook, outline, CTA)
- Viral potential and conversion potential scores
- Prioritized implementation roadmap
## Quick Start
### 1. Install dependencies
```bash
pip install -r requirements.txt
```
### 2. Run a CRO audit
```bash
# Single page
python cro_audit.py --url https://yoursite.com/landing-page
# With industry benchmarks
python cro_audit.py --url https://yoursite.com/landing-page --industry saas
# Batch mode
python cro_audit.py --file urls.txt --industry ecommerce --output results.json
```
### 3. Generate lead magnets from survey data
```bash
# Basic analysis
python survey_lead_magnet.py --csv survey_responses.csv
# Specify pain point columns
python survey_lead_magnet.py --csv survey.csv --pain-columns "biggest_challenge" "what_keeps_you_up"
# Top 3 segments with JSON output
python survey_lead_magnet.py --csv survey.csv --top-segments 3 --json
```
## CRO Scoring Model
Every page is scored across **8 dimensions** (each 0-100):
| Dimension | What It Measures | Weight |
|-----------|-----------------|--------|
| Headline Clarity | Value prop visible in <5 seconds | 15% |
| CTA Visibility | Prominent, contrasting, above fold | 20% |
| Social Proof | Testimonials, logos, numbers, case studies | 15% |
| Urgency | Scarcity, deadlines, limited availability | 5% |
| Trust Signals | Security badges, guarantees, certifications | 10% |
| Form Friction | Field count, form complexity, required fields | 15% |
| Mobile Responsiveness | Viewport meta, responsive patterns | 10% |
| Page Speed Indicators | Image optimization, script count, resource size | 10% |
**Overall CRO Score** = Weighted average → letter grade (A+ through F).
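Here is a worked example of the weighted average. It uses the weights from the table above, which match `DIMENSION_WEIGHTS` in `cro_audit.py`, together with made-up dimension scores. The sketch stops at the numeric score because this README does not list the letter-grade cutoffs, so none are assumed.

```python
# Weights copied from the scoring table (they sum to 1.0).
WEIGHTS = {
    "headline_clarity": 0.15,
    "cta_visibility": 0.20,
    "social_proof": 0.15,
    "urgency": 0.05,
    "trust_signals": 0.10,
    "form_friction": 0.15,
    "mobile_responsiveness": 0.10,
    "page_speed_indicators": 0.10,
}


def overall_score(scores: dict[str, int]) -> float:
    """Weighted average of per-dimension scores (each 0-100)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS) / total_weight


# Illustrative scores, not output from a real audit.
example = {
    "headline_clarity": 80,
    "cta_visibility": 70,
    "social_proof": 60,
    "urgency": 40,
    "trust_signals": 50,
    "form_friction": 75,
    "mobile_responsiveness": 85,
    "page_speed_indicators": 60,
}
print(round(overall_score(example), 2))  # 67.75
```

CTA Visibility carries the largest weight. A 10-point gain there therefore moves the overall score by 2 points, twice as much as the same gain in Mobile Responsiveness.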
### Industry Benchmarks
Benchmark values are set per industry (the `INDUSTRY_BENCHMARKS` table in `cro_audit.py`):
| Industry | Avg CRO Score | Top Quartile |
|----------|--------------|--------------|
| SaaS | 62 | 78+ |
| E-commerce | 58 | 74+ |
| Agency | 55 | 72+ |
| Finance | 60 | 76+ |
| Healthcare | 52 | 68+ |
| Education | 54 | 70+ |
| B2B | 56 | 73+ |
| General | 56 | 72+ |
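The sketch below shows one way to read a score against this table. It uses the numbers above plus the `general` entry that `cro_audit.py` also defines. The tier labels and the fallback to `general` for unknown industries are illustrative assumptions, not the tool's documented output.

```python
# (average, top-quartile threshold) per industry; "general" comes from
# INDUSTRY_BENCHMARKS in cro_audit.py.
BENCHMARKS = {
    "saas": (62, 78),
    "ecommerce": (58, 74),
    "agency": (55, 72),
    "finance": (60, 76),
    "healthcare": (52, 68),
    "education": (54, 70),
    "b2b": (56, 73),
    "general": (56, 72),
}


def benchmark_tier(score: float, industry: str = "general") -> str:
    """Place a score relative to the industry average and top quartile.

    The tier names and the fallback to "general" are hypothetical.
    """
    avg, top = BENCHMARKS.get(industry.lower(), BENCHMARKS["general"])
    if score >= top:
        return "top quartile"
    if score >= avg:
        return "above average"
    return "below average"


print(benchmark_tier(67.75, "saas"))           # above average
print(benchmark_tier(80, "ecommerce"))         # top quartile
print(benchmark_tier(50, "unknown-vertical"))  # below average
```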
## Survey Segmentation
The lead magnet engine uses keyword frequency analysis and TF-IDF clustering to group survey responses:
1. **Text preprocessing** — Normalize, tokenize, remove stopwords
2. **Theme extraction** — TF-IDF vectorization of pain point responses
3. **Clustering** — Group similar responses into pain segments
4. **Ranking** — Score segments by size × commercial signal strength
5. **Brief generation** — Create lead magnet briefs targeting each cluster
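Steps 1-4 can be approximated in pure Python. This is a minimal sketch, not the engine's code. The README names TF-IDF but not the clustering algorithm, so a simple greedy cosine-similarity ("leader") clustering stands in for it. The stopword list, the 0.2 similarity threshold and the commercial-term list used for ranking are all hypothetical. Step 5, brief generation, is left out.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "and", "are", "enough", "for", "from", "i", "in", "is",
             "it", "much", "my", "not", "of", "on", "or", "our", "the", "to",
             "too", "we", "with"}
# Hypothetical "commercial signal" keywords used for ranking.
COMMERCIAL_TERMS = {"leads", "revenue", "sales", "customers", "budget", "roi"}


def tokenize(text: str) -> list[str]:
    """Step 1: lowercase, split into words, drop stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Step 2: term frequency x smoothed inverse document frequency."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1
        vectors.append({
            t: (c / length) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def cluster(vectors: list[dict[str, float]], threshold: float = 0.2) -> list[list[int]]:
    """Step 3: greedy leader clustering. A response joins the first cluster
    whose seed response is similar enough, otherwise it seeds a new cluster."""
    clusters: list[list[int]] = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def rank(clusters: list[list[int]], docs: list[list[str]]) -> list[dict]:
    """Step 4: score = size x (1 + share of members using a commercial term)."""
    ranked = []
    for members in clusters:
        terms = Counter(t for i in members for t in docs[i])
        hits = sum(1 for i in members if COMMERCIAL_TERMS & set(docs[i]))
        ranked.append({
            "label": terms.most_common(1)[0][0] if terms else "",
            "members": members,
            "score": len(members) * (1 + hits / len(members)),
        })
    return sorted(ranked, key=lambda s: s["score"], reverse=True)


responses = [
    "Not enough leads from our website",
    "We struggle to get leads",
    "Our website leads are low",
    "Reporting takes too much time",
    "Manual reporting wastes time",
]
docs = [tokenize(r) for r in responses]
for seg in rank(cluster(tfidf(docs)), docs):
    print(seg["label"], seg["members"], seg["score"])
```

With these five answers, the three lead-related responses form one segment and the two reporting complaints form another. The lead segment ranks first because every member uses a commercial term.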
### Lead Magnet Formats
The engine recommends the best format per segment:
- **Guide** — Deep educational content for complex problems
- **Checklist** — Actionable steps for process-oriented pain points
- **Template** — Fill-in-the-blank tools for recurring tasks
- **Calculator** — Interactive tools for quantifiable decisions
- **Swipe File** — Example collections for creative/copy challenges
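The README does not say how the format is chosen, and a keyword heuristic is one plausible approach. In the sketch below, the keyword sets, the tie-breaking order and the fallback to Guide are all hypothetical.

```python
# Hypothetical keyword rules, checked in order; earlier rules win ties.
FORMAT_RULES = [
    ("Calculator", {"cost", "costs", "price", "pricing", "budget", "roi", "spend"}),
    ("Swipe File", {"copy", "headline", "headlines", "ad", "ads", "creative", "subject"}),
    ("Template", {"weekly", "monthly", "recurring", "report", "reporting", "every"}),
    ("Checklist", {"process", "steps", "setup", "onboarding", "workflow", "launch"}),
]


def pick_format(tokens: list[str]) -> str:
    """Return the format whose keywords overlap the segment's tokens most.

    Falls back to "Guide" (deep educational content) when nothing matches.
    """
    words = set(tokens)
    best, best_hits = "Guide", 0
    for fmt, keywords in FORMAT_RULES:
        hits = len(words & keywords)
        if hits > best_hits:  # strict ">" keeps the earlier rule on ties
            best, best_hits = fmt, hits
    return best


print(pick_format(["ad", "copy", "converting"]))    # Swipe File
print(pick_format(["reporting", "takes", "time"]))  # Template
print(pick_format(["budget", "ads"]))               # Calculator (tie, earlier rule wins)
print(pick_format(["strategy", "unclear"]))         # Guide
```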
## Architecture
```
┌──────────────────────────────────────────────────┐
│ CRO Audit                                        │
│ ┌──────────┐  ┌──────────┐  ┌──────────────────┐ │
│ │ HTML     │  │ 8-Dim    │  │ Industry         │ │
│ │ Fetcher  │  │ Scorer   │  │ Benchmarks       │ │
│ └────┬─────┘  └────┬─────┘  └────────┬─────────┘ │
│      └─────────────┼─────────────────┘           │
│                    ▼                             │
│ ┌──────────────────────────────────────────────┐ │
│ │       Weighted Score + Priority Fixes        │ │
│ │   Before/After · Letter Grade · Benchmarks   │ │
│ └──────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────┐
│ Survey-to-Lead-Magnet Engine                     │
│ ┌──────────┐  ┌──────────┐  ┌──────────────────┐ │
│ │ CSV      │  │ TF-IDF   │  │ Pain Point       │ │
│ │ Ingest   │  │ Cluster  │  │ Ranking          │ │
│ └────┬─────┘  └────┬─────┘  └────────┬─────────┘ │
│      └─────────────┼─────────────────┘           │
│                    ▼                             │
│ ┌──────────────────────────────────────────────┐ │
│ │     Lead Magnet Briefs + Scoring Matrix      │ │
│ │    Title · Format · Hook · Outline · CTA     │ │
│ └──────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────┘
```
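The CRO side of the diagram rests on one idea: conversion signals can be read from the fetched HTML without rendering it. `cro_audit.py` does this with requests and BeautifulSoup. The sketch below uses only the standard library's `html.parser`, so it runs without dependencies, and it checks just two of the signals the tool scores: the H1 count and a `width=device-width` viewport meta tag. The class and key names are hypothetical.

```python
from html.parser import HTMLParser


class SignalCollector(HTMLParser):
    """Collect a few CRO signals from raw HTML without rendering it."""

    def __init__(self) -> None:
        super().__init__()
        self.h1_count = 0
        self.viewport = None  # content of <meta name="viewport">, if present

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None for bare attributes
        if tag == "h1":
            self.h1_count += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() == "viewport":
            self.viewport = attrs.get("content") or ""


def quick_signals(html: str) -> dict:
    parser = SignalCollector()
    parser.feed(html)
    parser.close()
    return {
        "single_h1": parser.h1_count == 1,
        "mobile_viewport": bool(parser.viewport) and "width=device-width" in parser.viewport,
    }


page = """<html><head>
<meta name="viewport" content="width=device-width, initial-scale=1">
</head><body><h1>Get 2x more leads</h1><h2>Without more ad spend</h2></body></html>"""
print(quick_signals(page))  # {'single_h1': True, 'mobile_viewport': True}
```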
## Configuration
No API keys required. Both tools run their analysis locally; the CRO audit only makes network requests to fetch the pages you audit.
| Variable | Required | Description |
|----------|----------|-------------|
| `USER_AGENT` | No | Custom user agent for fetching pages |
| `REQUEST_TIMEOUT` | No | HTTP request timeout in seconds (default: 15) |
## Using as a Claude Code Skill
Copy this folder into your project's `.claude/skills/` directory; Claude Code reads `SKILL.md` as the skill definition. The skill enables Claude to:
1. Audit landing pages for conversion issues on demand
2. Score pages against industry benchmarks
3. Generate lead magnet strategies from survey data
4. Run batch CRO audits across multiple URLs
## File Structure
```
conversion-ops/
├── README.md # This file
├── SKILL.md # Claude Code agent skill definition
├── cro_audit.py # Landing page CRO scoring engine
├── survey_lead_magnet.py # Survey segmentation + lead magnet generator
└── requirements.txt # Python dependencies
```
## License
MIT
---
<div align="center">
**🧠 [Want these built and managed for you? →](https://singlebrain.com/?utm_source=github&utm_medium=skill_repo&utm_campaign=ai_marketing_skills)**
*This is how we build agents at [Single Brain](https://singlebrain.com/?utm_source=github&utm_medium=skill_repo&utm_campaign=ai_marketing_skills) for our clients.*
[Single Grain](https://www.singlegrain.com/?utm_source=github&utm_medium=skill_repo&utm_campaign=ai_marketing_skills) · our marketing agency
📬 **[Level up your marketing with 14,000+ marketers and founders →](https://levelingup.beehiiv.com/subscribe)** *(free)*
</div>

conversion-ops/SKILL.md (new file, 116 lines added)

@@ -0,0 +1,116 @@
# AI Conversion Ops
AI-powered conversion rate optimization: landing page audits, CRO scoring, survey segmentation, and lead magnet generation.
## When to Use
- User asks for a landing page audit or CRO analysis
- User wants to score a page across conversion dimensions
- User needs to identify conversion bottlenecks on a URL
- User has survey data and wants to segment respondents by pain point
- User wants lead magnet ideas generated from survey responses
- User needs batch CRO analysis across multiple URLs
## Tools
### CRO Audit (`cro_audit.py`)
Fetches a landing page and scores it across 8 conversion dimensions. No headless browser needed.
```bash
# Single URL audit
python cro_audit.py --url https://example.com/landing-page
# Batch mode — multiple URLs
python cro_audit.py --urls https://example.com/page1 https://example.com/page2
# URLs from a file (one per line)
python cro_audit.py --file urls.txt
# Specify industry for benchmark comparison
python cro_audit.py --url https://example.com --industry saas
# JSON output
python cro_audit.py --url https://example.com --json
# Save report to file
python cro_audit.py --url https://example.com --output report.json
```
**Scoring dimensions (each 0-100):**
1. **Headline Clarity** — Is the value prop obvious in <5 seconds?
2. **CTA Visibility** — Are CTAs prominent, contrasting, above the fold?
3. **Social Proof** — Testimonials, logos, case studies, numbers?
4. **Urgency** — Scarcity, deadlines, limited offers?
5. **Trust Signals** — Security badges, guarantees, privacy, certifications?
6. **Form Friction** — How many fields? Is the form intimidating?
7. **Mobile Responsiveness** — Viewport meta, responsive patterns, touch targets?
8. **Page Speed Indicators** — Image optimization, script count, resource size?
**Overall CRO Score** = Weighted average across all 8 dimensions.
**Output includes:**
- Per-dimension score with specific findings
- Priority fixes ranked by impact
- Before/after suggestions for each issue
- Industry benchmark comparison
- Overall letter grade (A+ through F)
**Supported industries:** `saas`, `ecommerce`, `agency`, `finance`, `healthcare`, `education`, `b2b`, `general`
### Survey-to-Lead-Magnet Engine (`survey_lead_magnet.py`)
Ingests survey CSV data, clusters respondents by pain point, and generates lead magnet briefs for each segment.
```bash
# Basic usage — analyze survey CSV
python survey_lead_magnet.py --csv survey_responses.csv
# Specify which columns contain pain points / challenges
python survey_lead_magnet.py --csv survey.csv --pain-columns "biggest_challenge" "top_frustration"
# Limit number of segments
python survey_lead_magnet.py --csv survey.csv --top-segments 5
# JSON output
python survey_lead_magnet.py --csv survey.csv --json
# Save output
python survey_lead_magnet.py --csv survey.csv --output lead_magnets.json
```
**What it produces:**
- Pain point clusters with respondent counts
- Segments ranked by size and commercial potential
- For each top segment, a lead magnet brief:
  - Title, format (guide/checklist/template/calculator/swipe file), hook
  - Content outline (5-7 sections)
  - Target CTA and distribution channel
  - Viral potential score + conversion potential score
- Prioritized implementation roadmap
**CSV format:** Questions as column headers, one respondent per row. Works with any survey tool export (Typeform, Google Forms, SurveyMonkey, etc.)
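You can check the expected layout before running the engine. This sketch reads an export with the standard `csv` module and pulls the free-text answers from the columns you would pass to `--pain-columns`. The function name, the sample columns and the choice to skip blank answers are assumptions, not the script's documented behavior.

```python
import csv
import io


def load_pain_responses(csv_text: str, pain_columns: list[str]) -> list[str]:
    """Return non-empty answers from the given columns, one respondent per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in pain_columns if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found in CSV header: {missing}")
    answers = []
    for row in reader:
        for col in pain_columns:
            value = (row.get(col) or "").strip()
            if value:
                answers.append(value)
    return answers


# Hypothetical export: questions as headers, one respondent per row.
sample = """email,biggest_challenge,top_frustration
a@example.com,Not enough leads,Reporting takes forever
b@example.com,Low ad ROI,
"""
print(load_pain_responses(sample, ["biggest_challenge", "top_frustration"]))
# ['Not enough leads', 'Reporting takes forever', 'Low ad ROI']
```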
## Configuration
No API keys required. Both tools work with local analysis only.
Optional environment variables:
| Variable | Required | Description |
|----------|----------|-------------|
| `USER_AGENT` | No | Custom user agent for page fetching (default provided) |
| `REQUEST_TIMEOUT` | No | HTTP timeout in seconds (default: 15) |
## Recommended Workflow
1. **Weekly:** Run `cro_audit.py` on your top landing pages to track CRO scores over time
2. **Post-survey:** Run `survey_lead_magnet.py` to turn survey data into content strategy
3. **Pre-launch:** Audit new landing pages before driving paid traffic
4. **Monthly:** Batch audit competitor landing pages to benchmark against
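For the weekly tracking step, you can compare saved reports run over run. This sketch assumes each saved report is a JSON list of objects with `url`, `overall_score` and optional `fetch_error` keys, which are field names of the `CROReport` dataclass in `cro_audit.py`. The exact layout of `--output` files is not documented here, so check it before relying on this.

```python
import json


def score_deltas(previous: list[dict], current: list[dict]) -> dict[str, float]:
    """Change in overall_score per URL between two audit runs.

    URLs missing from the previous run, and runs that failed to fetch, are skipped.
    """
    before = {r["url"]: r["overall_score"] for r in previous if not r.get("fetch_error")}
    return {
        r["url"]: round(r["overall_score"] - before[r["url"]], 2)
        for r in current
        if not r.get("fetch_error") and r["url"] in before
    }


# In practice these would come from json.load() on last week's and this week's files.
last_week = json.loads('[{"url": "https://example.com/a", "overall_score": 61.5},'
                       ' {"url": "https://example.com/b", "overall_score": 70.0}]')
this_week = json.loads('[{"url": "https://example.com/a", "overall_score": 66.0},'
                       ' {"url": "https://example.com/c", "overall_score": 55.0}]')
print(score_deltas(last_week, this_week))  # {'https://example.com/a': 4.5}
```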
## Dependencies
```bash
pip install -r requirements.txt
```

conversion-ops/cro_audit.py (new file, 946 lines added)

@@ -0,0 +1,946 @@
#!/usr/bin/env python3
"""
AI CRO Audit Tool
==================
Fetches a landing page URL, analyzes its HTML structure, and scores it across
8 conversion dimensions. Outputs a structured report with specific fix
recommendations and industry benchmark comparisons.
No headless browser required: uses requests + BeautifulSoup.
Usage:
    python cro_audit.py --url https://example.com/landing-page
    python cro_audit.py --urls https://example.com/page1 https://example.com/page2
    python cro_audit.py --file urls.txt --industry saas
    python cro_audit.py --url https://example.com --json --output report.json
"""
import argparse
import json
import os
import re
import sys
from dataclasses import dataclass, field, asdict
from typing import Optional
from urllib.parse import urlparse
import requests
from bs4 import BeautifulSoup, Comment
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
DEFAULT_UA = (
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
"AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
REQUEST_TIMEOUT = int(os.getenv("REQUEST_TIMEOUT", "15"))
USER_AGENT = os.getenv("USER_AGENT", DEFAULT_UA)
# Dimension weights for overall score
DIMENSION_WEIGHTS = {
"headline_clarity": 0.15,
"cta_visibility": 0.20,
"social_proof": 0.15,
"urgency": 0.05,
"trust_signals": 0.10,
"form_friction": 0.15,
"mobile_responsiveness": 0.10,
"page_speed_indicators": 0.10,
}
# Industry benchmarks: {industry: {avg, top_quartile}}
INDUSTRY_BENCHMARKS = {
"saas": {"avg": 62, "top_quartile": 78},
"ecommerce": {"avg": 58, "top_quartile": 74},
"agency": {"avg": 55, "top_quartile": 72},
"finance": {"avg": 60, "top_quartile": 76},
"healthcare": {"avg": 52, "top_quartile": 68},
"education": {"avg": 54, "top_quartile": 70},
"b2b": {"avg": 56, "top_quartile": 73},
"general": {"avg": 56, "top_quartile": 72},
}
# CTA keyword patterns
CTA_PATTERNS = re.compile(
r"\b(get started|sign up|start free|try free|book a? ?demo|schedule|"
r"download|buy now|add to cart|subscribe|join|register|request|"
r"claim|grab|unlock|access|learn more|contact us|talk to|"
r"start now|begin|enroll|apply now|shop now|order now)\b",
re.IGNORECASE,
)
# Social proof patterns. Truncated stems ("case stud", "success stor") take \w*
# so the trailing \b can still match "case study" / "success stories"; the inner
# group is non-capturing so findall() returns plain strings, not tuples.
SOCIAL_PROOF_PATTERNS = re.compile(
    r"\b(testimonial|review|rating|stars?|customers?|clients?|"
    r"companies|trusted by|used by|loved by|join \d[\d,]*|"
    r"case stud\w*|success stor\w*|\d+\s*\+?\s*(?:users?|customers?|clients?|companies|businesses)|"
    r"as seen|featured in|featured on|logo|partner)\b",
    re.IGNORECASE,
)
# Urgency patterns (inner group non-capturing; "only \d+" also matches multi-digit counts)
URGENCY_PATTERNS = re.compile(
    r"\b(limited time|act now|hurry|expires?|deadline|only \d+|"
    r"last chance|don'?t miss|ending soon|today only|"
    r"while supplies|few (?:left|remaining|spots)|countdown|"
    r"offer ends|sale ends|hours left|minutes left|spots? left|"
    r"exclusive|one-time|flash sale|clearance)\b",
    re.IGNORECASE,
)
# Trust signal patterns (stems take \w* so "encryption", "compliance",
# "certified" and "accredited" match despite the trailing \b)
TRUST_PATTERNS = re.compile(
    r"\b(ssl|secure|encrypt\w*|privacy|guarantee|money.?back|"
    r"refund|no.?risk|free trial|cancel any ?time|"
    r"gdpr|hipaa|soc.?2|iso|pci|complian\w*|certif\w*|"
    r"bbb|accredit\w*|verified|badge|shield|lock|"
    r"norton|mcafee|trustpilot|stripe|paypal)\b",
    re.IGNORECASE,
)
# ---------------------------------------------------------------------------
# Data classes
# ---------------------------------------------------------------------------
@dataclass
class DimensionScore:
    name: str
    score: int  # 0-100
    findings: list = field(default_factory=list)
    recommendations: list = field(default_factory=list)
@dataclass
class CROReport:
    url: str
    overall_score: float = 0.0
    letter_grade: str = ""
    dimensions: dict = field(default_factory=dict)
    priority_fixes: list = field(default_factory=list)
    benchmark_comparison: dict = field(default_factory=dict)
    fetch_error: Optional[str] = None
# ---------------------------------------------------------------------------
# Fetcher
# ---------------------------------------------------------------------------
def fetch_page(url: str) -> tuple[Optional[str], Optional[str]]:
    """Fetch page HTML. Returns (html, error)."""
    try:
        resp = requests.get(
            url,
            headers={"User-Agent": USER_AGENT},
            timeout=REQUEST_TIMEOUT,
            allow_redirects=True,
        )
        resp.raise_for_status()
        return resp.text, None
    except requests.RequestException as e:
        return None, str(e)
# ---------------------------------------------------------------------------
# Dimension Scorers
# ---------------------------------------------------------------------------
def score_headline_clarity(soup: BeautifulSoup, text: str) -> DimensionScore:
    """Score headline clarity: is the value prop obvious in <5 seconds?"""
    dim = DimensionScore(name="Headline Clarity", score=50, findings=[], recommendations=[])
    h1_tags = soup.find_all("h1")
    h2_tags = soup.find_all("h2")
    # Check H1 exists
    if not h1_tags:
        dim.score -= 30
        dim.findings.append("No H1 tag found on the page")
        dim.recommendations.append("Add a clear H1 headline that states your primary value proposition")
    else:
        # Join child strings with spaces so "<h1>Grow <em>faster</em></h1>" keeps its word breaks
        h1_text = h1_tags[0].get_text(" ", strip=True)
        dim.findings.append(f"H1 found: \"{h1_text[:80]}{'...' if len(h1_text) > 80 else ''}\"")
        # Length check
        word_count = len(h1_text.split())
        if word_count < 3:
            dim.score -= 10
            dim.findings.append(f"H1 is very short ({word_count} words); may lack specificity")
            dim.recommendations.append("Expand headline to include a specific benefit or outcome")
        elif word_count > 15:
            dim.score -= 10
            dim.findings.append(f"H1 is long ({word_count} words); may lose attention")
            dim.recommendations.append("Shorten headline to 6-12 words for maximum clarity")
        else:
            dim.score += 15
        # Check for benefit/outcome language
        benefit_words = re.compile(
            r"\b(grow|increase|boost|save|reduce|eliminate|transform|"
            r"automate|simplify|faster|better|easier|free|revenue|"
            r"profit|leads|sales|customers|results|roi)\b",
            re.IGNORECASE,
        )
        if benefit_words.search(h1_text):
            dim.score += 15
            dim.findings.append("Headline contains benefit-oriented language")
        else:
            dim.recommendations.append("Include a specific benefit or outcome in the headline (e.g., 'Get 2x more leads')")
    # Multiple H1s is bad
    if len(h1_tags) > 1:
        dim.score -= 10
        dim.findings.append(f"Multiple H1 tags found ({len(h1_tags)}); confuses hierarchy")
        dim.recommendations.append("Use only one H1 tag per page for clear message hierarchy")
    # Check for a supporting subheadline (presence only; proximity to the H1 is not checked)
    if h1_tags and h2_tags:
        dim.score += 10
        dim.findings.append("Supporting subheadline (H2) found")
    elif h1_tags:
        dim.recommendations.append("Add a subheadline (H2) that elaborates on the H1 value proposition")
    # Check hero section has text content
    hero_selectors = ["[class*='hero']", "[class*='banner']", "[class*='jumbotron']", "header"]
    has_hero = False
    for sel in hero_selectors:
        hero = soup.select_one(sel)
        if hero and len(hero.get_text(strip=True)) > 20:
            has_hero = True
            dim.score += 10
            dim.findings.append("Hero/banner section detected with content")
            break
    if not has_hero:
        dim.recommendations.append("Consider adding a prominent hero section with headline + subheadline + CTA")
    dim.score = max(0, min(100, dim.score))
    return dim
def score_cta_visibility(soup: BeautifulSoup, text: str) -> DimensionScore:
    """Score CTA visibility: are CTAs prominent, contrasting, above the fold?"""
    dim = DimensionScore(name="CTA Visibility", score=40, findings=[], recommendations=[])
    # Find buttons and links with CTA text
    buttons = soup.find_all(["button", "a"])
    cta_elements = []
    for btn in buttons:
        btn_text = btn.get_text(strip=True)
        if CTA_PATTERNS.search(btn_text):
            cta_elements.append(btn)
    if not cta_elements:
        dim.score -= 25
        dim.findings.append("No recognizable CTA buttons/links found")
        dim.recommendations.append(
            "Add clear call-to-action buttons with action-oriented text "
            "(e.g., 'Get Started Free', 'Book a Demo')"
        )
    else:
        dim.score += 15
        cta_texts = [el.get_text(strip=True)[:50] for el in cta_elements[:5]]
        dim.findings.append(f"Found {len(cta_elements)} CTA element(s): {', '.join(cta_texts)}")
        # Check for styled buttons (class contains btn/button/cta)
        styled_ctas = [
            el for el in cta_elements
            if any(re.search(r"btn|button|cta", c, re.IGNORECASE) for c in el.get("class", []))
        ]
        if styled_ctas:
            dim.score += 10
            dim.findings.append(f"{len(styled_ctas)} CTA(s) have button styling classes")
        else:
            dim.recommendations.append("Style CTAs as prominent buttons with contrasting colors")
        # Check for inline styles with background color (contrasting)
        for el in cta_elements[:3]:
            style = el.get("style", "").lower()
            if "background" in style or "color" in style:
                dim.score += 5
                break
    # Estimate where the first CTA sits in document order (a proxy for above-the-fold).
    # Searching for str(tag) inside the page text is unreliable, so locate the tag
    # by identity among all elements instead.
    if cta_elements:
        all_tags = soup.find_all(True)
        first_idx = next((i for i, tag in enumerate(all_tags) if tag is cta_elements[0]), None)
        if first_idx is not None:
            relative_pos = first_idx / len(all_tags)
            if relative_pos < 0.3:
                dim.score += 15
                dim.findings.append("First CTA appears in the first 30% of page elements (likely above the fold)")
            elif relative_pos > 0.6:
                dim.score -= 10
                dim.findings.append("First CTA appears late in the page; likely below the fold")
                dim.recommendations.append("Move primary CTA above the fold so visitors see it without scrolling")
    # Check for multiple CTAs (reinforcement)
    if len(cta_elements) >= 2:
        dim.score += 10
        dim.findings.append("Multiple CTAs found; good reinforcement throughout page")
    elif len(cta_elements) == 1:
        dim.recommendations.append("Add a second CTA further down the page to catch scrollers")
    # Check for a CTA in the navigation bar
    nav = soup.find("nav")
    if nav:
        nav_ctas = [el for el in nav.find_all(["button", "a"]) if CTA_PATTERNS.search(el.get_text(strip=True))]
        if nav_ctas:
            dim.score += 10
            dim.findings.append("Navigation bar contains a CTA; stays visible during scroll if the nav is sticky")
    dim.score = max(0, min(100, dim.score))
    return dim
def score_social_proof(soup: BeautifulSoup, text: str) -> DimensionScore:
    """Score social proof presence: testimonials, logos, case studies, numbers."""
    dim = DimensionScore(name="Social Proof", score=30, findings=[], recommendations=[])
    # Check for social proof text patterns. finditer + group(0) yields the full
    # matched text even when the pattern has nested groups (findall would then
    # return tuples and .lower() would fail).
    matches = [m.group(0) for m in SOCIAL_PROOF_PATTERNS.finditer(text)]
    if matches:
        unique = sorted({m.lower() for m in matches})
        dim.score += min(25, len(unique) * 5)
        dim.findings.append(f"Social proof signals found: {', '.join(unique[:8])}")
    # Check for testimonial-like structures
    blockquotes = soup.find_all("blockquote")
    testimonial_divs = soup.select(
        "[class*='testimonial'], [class*='review'], [class*='quote'], "
        "[class*='feedback'], [class*='client'], [id*='testimonial']"
    )
    if blockquotes or testimonial_divs:
        count = len(blockquotes) + len(testimonial_divs)
        dim.score += 15
        dim.findings.append(f"Testimonial/quote elements found ({count})")
    else:
        dim.recommendations.append("Add customer testimonials with real names, titles, and photos")
    # Check for logo bars / trust logos
    logo_sections = soup.select(
        "[class*='logo'], [class*='partner'], [class*='client'], "
        "[class*='brand'], [class*='trust'], [class*='company']"
    )
    img_tags = soup.find_all("img")
    logo_imgs = [
        img for img in img_tags
        if re.search(r"logo|client|partner|brand", img.get("alt", ""), re.IGNORECASE)
    ]
    if logo_sections or logo_imgs:
        dim.score += 15
        count = max(len(logo_sections), len(logo_imgs))
        dim.findings.append(f"Client/partner logo elements detected ({count})")
    else:
        dim.recommendations.append("Add a logo bar showing recognizable client/partner brands")
    # Check for specific numbers (e.g., "10,000+ customers"). The group is
    # non-capturing so findall returns the whole phrase, number included.
    number_proof = re.findall(
        r"\d[\d,]*\s*\+?\s*(?:users?|customers?|clients?|companies|businesses|downloads?|reviews?|ratings?)",
        text, re.IGNORECASE,
    )
    if number_proof:
        dim.score += 10
        dim.findings.append(f"Quantified social proof: {', '.join(number_proof[:3])}")
    else:
        dim.recommendations.append("Add specific numbers (e.g., '10,000+ customers') to quantify trust")
    # Star ratings
    star_elements = soup.select("[class*='star'], [class*='rating']")
    if star_elements:
        dim.score += 5
        dim.findings.append("Star/rating elements detected")
    if not matches and not blockquotes and not testimonial_divs and not logo_sections:
        dim.recommendations.append(
            "Social proof is critically missing. Add at minimum: 1 testimonial, "
            "a client logo bar, and a quantified metric (e.g., '500+ companies trust us')"
        )
    dim.score = max(0, min(100, dim.score))
    return dim
def score_urgency(soup: BeautifulSoup, text: str) -> DimensionScore:
    """Score urgency/scarcity elements."""
    dim = DimensionScore(name="Urgency", score=40, findings=[], recommendations=[])
    # group(0) gives the full match regardless of the groups in the pattern
    matches = [m.group(0) for m in URGENCY_PATTERNS.finditer(text)]
    if matches:
        unique = sorted({m.lower() for m in matches})
        dim.score += min(35, len(unique) * 10)
        dim.findings.append(f"Urgency signals found: {', '.join(unique[:5])}")
    else:
        dim.findings.append("No urgency/scarcity elements detected")
        dim.recommendations.append(
            "Consider adding subtle urgency elements: limited-time offers, "
            "countdown timers, or limited availability messaging"
        )
    # Countdown timer elements
    countdown = soup.select("[class*='countdown'], [class*='timer'], [id*='countdown']")
    if countdown:
        dim.score += 15
        dim.findings.append("Countdown timer element detected")
    # Note: urgency isn't always appropriate, so the score is less punitive
    if not matches and not countdown:
        dim.score = max(dim.score, 35)  # Floor at 35: not having urgency is okay for many pages
    dim.score = max(0, min(100, dim.score))
    return dim
def score_trust_signals(soup: BeautifulSoup, text: str) -> DimensionScore:
    """Score trust signals: security, guarantees, compliance badges."""
    dim = DimensionScore(name="Trust Signals", score=35, findings=[], recommendations=[])
    matches = [m.group(0) for m in TRUST_PATTERNS.finditer(text)]
    if matches:
        unique = sorted({m.lower() for m in matches})
        dim.score += min(30, len(unique) * 8)
        dim.findings.append(f"Trust signals found: {', '.join(unique[:6])}")
    # Privacy policy link
    privacy_links = [
        a for a in soup.find_all("a")
        if re.search(r"privacy|terms|policy", a.get_text(strip=True), re.IGNORECASE)
    ]
    if privacy_links:
        dim.score += 10
        dim.findings.append("Privacy policy / terms links found")
    else:
        dim.recommendations.append("Add visible links to privacy policy and terms of service")
    # Guarantee language
    guarantee = re.search(
        r"(money.?back|satisfaction|guarantee|risk.?free|no.?risk|full refund)",
        text, re.IGNORECASE,
    )
    if guarantee:
        dim.score += 15
        dim.findings.append(f"Guarantee messaging found: '{guarantee.group()}'")
    else:
        dim.recommendations.append("Add a guarantee or risk-reversal statement near the CTA")
    # HTTPS is not scored in this function; it only sees page content.
    # Security badge images
    security_imgs = [
        img for img in soup.find_all("img")
        if re.search(
            r"secure|ssl|badge|trust|verified|norton|mcafee",
            img.get("alt", ""), re.IGNORECASE,
        )
    ]
    if security_imgs:
        dim.score += 10
        dim.findings.append(f"Security/trust badge images found ({len(security_imgs)})")
    else:
        dim.recommendations.append("Add trust badges (security seals, payment icons, compliance logos) near forms/CTAs")
    dim.score = max(0, min(100, dim.score))
    return dim
def score_form_friction(soup: BeautifulSoup, text: str) -> DimensionScore:
"""Score form friction — fewer fields = less friction."""
dim = DimensionScore(name="Form Friction", score=60, findings=[], recommendations=[])
forms = soup.find_all("form")
if not forms:
# No form could be good (simple CTA) or bad (no conversion mechanism)
cta_links = [a for a in soup.find_all("a") if CTA_PATTERNS.search(a.get_text(strip=True))]
if cta_links:
dim.score = 75
dim.findings.append("No form found — page uses link-based CTAs (low friction)")
else:
dim.score = 50
dim.findings.append("No form or clear conversion mechanism found")
dim.recommendations.append("Add a form or clear CTA link for lead capture")
return dim
# Analyze the primary form (first one)
form = forms[0]
inputs = form.find_all(["input", "select", "textarea"])
visible_inputs = [
inp for inp in inputs
if inp.get("type", "text") not in ("hidden", "submit", "button")
]
field_count = len(visible_inputs)
dim.findings.append(f"Form found with {field_count} visible field(s)")
if field_count <= 2:
dim.score = 90
dim.findings.append("Minimal form — very low friction")
elif field_count <= 4:
dim.score = 75
dim.findings.append("Moderate form length — acceptable friction")
elif field_count <= 6:
dim.score = 55
dim.findings.append("Form has 5-6 fields — consider reducing")
dim.recommendations.append("Reduce form to essential fields only (name + email minimum). Every extra field drops conversion ~7%")
elif field_count <= 10:
dim.score = 35
dim.findings.append(f"Long form ({field_count} fields) — high friction")
dim.recommendations.append("Split into a multi-step form or reduce to 3-4 essential fields")
else:
dim.score = 15
dim.findings.append(f"Very long form ({field_count} fields) — extreme friction")
dim.recommendations.append("This form is too long. Use progressive profiling: capture email first, ask for details later")
# Check for required field indicators
required_fields = [inp for inp in visible_inputs if inp.get("required") is not None]
if required_fields:
dim.findings.append(f"{len(required_fields)} required fields marked")
# Check for phone number field (high friction)
phone_fields = [
inp for inp in visible_inputs
if re.search(r"phone|tel|mobile", inp.get("name", "") + inp.get("type", ""), re.IGNORECASE)
]
if phone_fields:
dim.score -= 10
dim.findings.append("Phone number field detected — high-friction field")
dim.recommendations.append("Remove phone number field unless absolutely necessary. It's the #1 form abandonment cause")
# Check for clear submit button text
submit_btns = form.find_all(["button", "input"], attrs={"type": ["submit", "button"]})
if submit_btns:
btn_text = submit_btns[0].get_text(strip=True) or submit_btns[0].get("value", "")
if btn_text.lower() in ("submit", "send", "go"):
dim.score -= 5
dim.findings.append(f"Generic submit button text: '{btn_text}'")
dim.recommendations.append(f"Change '{btn_text}' to a benefit-oriented CTA (e.g., 'Get My Free Audit')")
elif btn_text:
dim.findings.append(f"Submit button text: '{btn_text}'")
# Multiple forms
if len(forms) > 2:
dim.score -= 5
dim.findings.append(f"Multiple forms on page ({len(forms)}) — may confuse visitors")
dim.score = max(0, min(100, dim.score))
return dim
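The phone-field and button-label checks above come down to two string tests on attribute values. A minimal stdlib sketch, assuming the form's inputs have already been extracted into plain attribute dicts; `form_penalties` is a hypothetical helper, not a function in this script:

```python
import re

# Same pattern and generic labels the scorer uses
PHONE_RE = re.compile(r"phone|tel|mobile", re.IGNORECASE)
GENERIC_CTA = {"submit", "send", "go"}


def form_penalties(inputs: list, button_text: str) -> int:
    """Return points deducted for a phone-like field (10) and a generic button label (5)."""
    penalty = 0
    # Search the concatenated name + type attributes, as the scorer does
    if any(PHONE_RE.search(inp.get("name", "") + inp.get("type", "")) for inp in inputs):
        penalty += 10
    if button_text.strip().lower() in GENERIC_CTA:
        penalty += 5
    return penalty


inputs = [{"name": "email", "type": "email"}, {"name": "phone", "type": "tel"}]
print(form_penalties(inputs, "Submit"))            # 15
print(form_penalties(inputs[:1], "Get My Audit"))  # 0
```

Because the match is a plain substring search, an input named `hotel_name` also counts as a phone field; anchoring the pattern (for example on `type="tel"` alone) would trade recall for precision.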
def score_mobile_responsiveness(soup: BeautifulSoup, text: str) -> DimensionScore:
"""Score mobile responsiveness signals from HTML/meta tags."""
dim = DimensionScore(name="Mobile Responsiveness", score=40, findings=[], recommendations=[])
# Viewport meta tag
viewport = soup.find("meta", attrs={"name": "viewport"})
if viewport:
content = viewport.get("content", "")
dim.score += 25
dim.findings.append(f"Viewport meta tag found: {content[:60]}")
if "width=device-width" in content:
dim.score += 10
dim.findings.append("Viewport set to device-width — good")
else:
dim.score -= 20
dim.findings.append("No viewport meta tag — page likely not mobile-optimized")
dim.recommendations.append("Add <meta name='viewport' content='width=device-width, initial-scale=1'>")
# Responsive CSS indicators
style_tags = soup.find_all("style")
link_tags = soup.find_all("link", rel="stylesheet")
all_css = " ".join(tag.string or "" for tag in style_tags)
if "@media" in all_css:
dim.score += 10
dim.findings.append("Media queries found in inline CSS — responsive design present")
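# Check for responsive framework classes in class attributes only; scanning the
# raw HTML would also match words such as "arrow" or "browser" in unrelated text
all_classes = " ".join(" ".join(tag.get("class", [])) for tag in soup.find_all(class_=True))
responsive_classes = re.search(
r"(\bcol-(?:xs|sm|md|lg|xl)\b|\bcontainer-fluid\b|\brow\b|\bgrid\b|\bflex\b|"
r"\b(?:sm|md|lg|xl):|responsive|mobile)",
all_classes,
re.IGNORECASE,
)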
if responsive_classes:
dim.score += 10
dim.findings.append("Responsive framework classes detected (grid/flex/breakpoint)")
# Touch-friendly: flag links whose inline font-size is under 12px
# (only px values are compared; em/rem/% sizes are skipped rather than misread)
small_font_re = re.compile(r"font-size:\s*(\d+(?:\.\d+)?)px", re.IGNORECASE)
inline_styled_small = [
a for a in soup.find_all("a", style=True)
if (m := small_font_re.search(a["style"])) and float(m.group(1)) < 12
]
if inline_styled_small:
dim.score -= 5
dim.findings.append(f"{len(inline_styled_small)} links use an inline font-size under 12px")
dim.recommendations.append("Some links have very small font sizes — ensure tap targets are at least 44x44px")
# AMP or mobile-specific meta
amp = soup.find("html", attrs={"amp": True}) or soup.find("html", attrs={"⚡": True})
if amp:
dim.score += 5
dim.findings.append("AMP page detected")
dim.score = max(0, min(100, dim.score))
return dim
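The viewport check tests for the literal substring `width=device-width`, which misses harmless spacing variants such as `width = device-width`. A hedged alternative is to parse the `content` attribute into key/value pairs first; `parse_viewport` is a hypothetical helper, and accepting `;` as a separator is an assumption made to tolerate sloppy markup:

```python
import re


def parse_viewport(content: str) -> dict:
    """Parse a viewport meta `content` string into a lowercase-key dict."""
    pairs = {}
    # Split on commas (the standard separator) and, leniently, semicolons
    for part in re.split(r"[,;]", content):
        key, sep, value = part.partition("=")
        if sep:  # skip fragments that have no "="
            pairs[key.strip().lower()] = value.strip()
    return pairs


vp = parse_viewport("width = device-width; initial-scale=1")
print(vp)                                 # {'width': 'device-width', 'initial-scale': '1'}
print(vp.get("width") == "device-width")  # True
```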
def score_page_speed_indicators(soup: BeautifulSoup, html: str) -> DimensionScore:
"""Score page speed indicators from HTML analysis (not actual load time)."""
dim = DimensionScore(name="Page Speed Indicators", score=60, findings=[], recommendations=[])
# Page size
page_size_kb = len(html.encode("utf-8")) / 1024
dim.findings.append(f"HTML size: {page_size_kb:.0f} KB")
if page_size_kb > 200:
dim.score -= 15
dim.recommendations.append(f"HTML is {page_size_kb:.0f} KB — consider reducing inline content/styles")
elif page_size_kb > 100:
dim.score -= 5
# Count images
images = soup.find_all("img")
dim.findings.append(f"Images found: {len(images)}")
if len(images) > 20:
dim.score -= 10
dim.recommendations.append(f"Page has {len(images)} images — consider lazy loading or reducing image count")
elif len(images) > 10:
dim.score -= 5
# Check for lazy loading
lazy_images = [img for img in images if img.get("loading") == "lazy"]
if images and lazy_images:
pct = len(lazy_images) / len(images) * 100
dim.score += 10
dim.findings.append(f"Lazy loading: {len(lazy_images)}/{len(images)} images ({pct:.0f}%)")
elif len(images) > 5:
dim.recommendations.append("Add loading='lazy' to below-fold images")
# Check for modern image formats
modern_imgs = [
img for img in images
if img.get("src") and re.search(r"\.(webp|avif)", img.get("src", ""), re.IGNORECASE)
]
if modern_imgs:
dim.score += 5
dim.findings.append(f"Modern image formats (WebP/AVIF) detected: {len(modern_imgs)}")
elif images:
dim.recommendations.append("Convert images to WebP format for 25-35% size reduction")
# Count external scripts
scripts = soup.find_all("script", src=True)
dim.findings.append(f"External scripts: {len(scripts)}")
if len(scripts) > 15:
dim.score -= 15
dim.recommendations.append(f"Page loads {len(scripts)} external scripts — audit and remove unnecessary ones")
elif len(scripts) > 8:
dim.score -= 5
dim.recommendations.append("Consider deferring or async-loading non-critical scripts")
# Check for defer/async on scripts
deferred = [s for s in scripts if s.get("defer") is not None or s.get("async") is not None]
if scripts and deferred:
pct = len(deferred) / len(scripts) * 100
dim.findings.append(f"Deferred/async scripts: {len(deferred)}/{len(scripts)} ({pct:.0f}%)")
dim.score += 5
# Count external stylesheets
stylesheets = soup.find_all("link", rel="stylesheet")
if len(stylesheets) > 5:
dim.score -= 5
dim.recommendations.append(f"Page loads {len(stylesheets)} stylesheets — consider consolidating")
# Inline CSS bloat
inline_styles = soup.find_all("style")
inline_css_size = sum(len(s.string or "") for s in inline_styles)
if inline_css_size > 50000:
dim.score -= 10
dim.recommendations.append(f"Inline CSS is {inline_css_size / 1024:.0f} KB — move to external stylesheet and cache")
# Preconnect/preload hints
preconnects = soup.find_all("link", rel=["preconnect", "preload", "dns-prefetch"])
if preconnects:
dim.score += 5
dim.findings.append(f"Resource hints found: {len(preconnects)} preconnect/preload/dns-prefetch")
dim.score = max(0, min(100, dim.score))
return dim
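The page-speed scorer counts `<img>` and `<script src>` tags with BeautifulSoup. To show the same counting without third-party dependencies, here is a sketch that swaps in the standard library's `html.parser`; the `SpeedSignalCounter` class is illustrative and not part of the script, and unlike the scorer it compares the `loading` value case-insensitively:

```python
from html.parser import HTMLParser


class SpeedSignalCounter(HTMLParser):
    """Count images, lazy images, external scripts, and deferred/async scripts."""

    def __init__(self) -> None:
        super().__init__()
        self.images = 0
        self.lazy_images = 0
        self.scripts = 0
        self.deferred_scripts = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value); boolean attributes like `defer` have value None
        attr = dict(attrs)
        if tag == "img":
            self.images += 1
            if (attr.get("loading") or "").lower() == "lazy":
                self.lazy_images += 1
        elif tag == "script" and "src" in attr:
            self.scripts += 1
            if "defer" in attr or "async" in attr:
                self.deferred_scripts += 1


counter = SpeedSignalCounter()
counter.feed(
    '<img src="hero.webp" loading="lazy"><img src="logo.png">'
    '<script src="app.js" defer></script><script src="analytics.js"></script>'
    "<script>inline()</script>"
)
print(counter.images, counter.lazy_images, counter.scripts, counter.deferred_scripts)  # 2 1 2 1
```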
# ---------------------------------------------------------------------------
# Report Builder
# ---------------------------------------------------------------------------
def compute_letter_grade(score: float) -> str:
if score >= 95:
return "A+"
elif score >= 90:
return "A"
elif score >= 85:
return "A-"
elif score >= 80:
return "B+"
elif score >= 75:
return "B"
elif score >= 70:
return "B-"
elif score >= 65:
return "C+"
elif score >= 60:
return "C"
elif score >= 55:
return "C-"
elif score >= 50:
return "D+"
elif score >= 45:
return "D"
elif score >= 40:
return "D-"
else:
return "F"
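`compute_letter_grade` walks a chain of 5-point bands. The same mapping can be written as a lookup table searched with `bisect`; a sketch with hypothetical names `GRADE_FLOORS` and `GRADES`, using the thresholds above:

```python
from bisect import bisect_right

# Lower bound of each band, ascending; GRADES[0] covers everything below 40
GRADE_FLOORS = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
GRADES = ["F", "D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]


def letter_grade(score: float) -> str:
    """Table-driven equivalent of compute_letter_grade's if/elif chain."""
    # bisect_right counts floors <= score, so a score exactly on a floor gets that band
    return GRADES[bisect_right(GRADE_FLOORS, score)]


print([letter_grade(s) for s in (95, 94.9, 72.5, 40, 39.9)])  # ['A+', 'A', 'B-', 'D-', 'F']
```

Keeping the bands in one table makes it harder for the floors and labels to drift apart when a band is added or moved.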
def build_report(url: str, html: str, industry: str = "general") -> CROReport:
"""Run all scorers and build the CRO report."""
soup = BeautifulSoup(html, "lxml")
# Extract visible text (strip scripts, styles, comments)
for element in soup(["script", "style"]):
element.decompose()
for comment in soup.find_all(string=lambda t: isinstance(t, Comment)):
comment.extract()
visible_text = soup.get_text(separator=" ", strip=True)
# Re-parse original for structural analysis
soup = BeautifulSoup(html, "lxml")
scorers = {
"headline_clarity": score_headline_clarity,
"cta_visibility": score_cta_visibility,
"social_proof": score_social_proof,
"urgency": score_urgency,
"trust_signals": score_trust_signals,
"form_friction": score_form_friction,
"mobile_responsiveness": score_mobile_responsiveness,
"page_speed_indicators": lambda s, t: score_page_speed_indicators(s, html),
}
dimensions = {}
for key, scorer in scorers.items():
dimensions[key] = scorer(soup, visible_text)
# Compute weighted overall score
overall = sum(
dimensions[key].score * DIMENSION_WEIGHTS[key]
for key in DIMENSION_WEIGHTS
)
# Build priority fixes (sorted by potential impact)
priority_fixes = []
for key, weight in sorted(DIMENSION_WEIGHTS.items(), key=lambda x: -x[1]):
dim = dimensions[key]
if dim.recommendations:
impact = "HIGH" if weight >= 0.15 else ("MEDIUM" if weight >= 0.10 else "LOW")
for rec in dim.recommendations:
priority_fixes.append({
"dimension": dim.name,
"impact": impact,
"current_score": dim.score,
"fix": rec,
})
# Sort: HIGH first, then by lowest current score
impact_order = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}
priority_fixes.sort(key=lambda x: (impact_order[x["impact"]], x["current_score"]))
# Benchmark comparison
bench = INDUSTRY_BENCHMARKS.get(industry, INDUSTRY_BENCHMARKS["general"])
benchmark_comparison = {
"industry": industry,
"your_score": round(overall, 1),
"industry_avg": bench["avg"],
"top_quartile": bench["top_quartile"],
"vs_avg": round(overall - bench["avg"], 1),
"vs_top": round(overall - bench["top_quartile"], 1),
}
return CROReport(
url=url,
overall_score=round(overall, 1),
letter_grade=compute_letter_grade(overall),
dimensions={k: asdict(v) for k, v in dimensions.items()},
priority_fixes=priority_fixes,
benchmark_comparison=benchmark_comparison,
)
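`build_report` combines the dimension scores as a weighted sum and ranks fixes by impact bucket, then by lowest current score. `DIMENSION_WEIGHTS` is defined elsewhere in the script, so the weights below are made-up values (summing to 1.0) chosen only to exercise all three buckets:

```python
# Illustrative weights only; the script's real DIMENSION_WEIGHTS are not shown here
WEIGHTS = {
    "headline_clarity": 0.35,
    "cta_visibility": 0.30,
    "trust_signals": 0.15,
    "urgency": 0.12,
    "mobile_responsiveness": 0.08,
}
SCORES = {
    "headline_clarity": 80,
    "cta_visibility": 50,
    "trust_signals": 60,
    "urgency": 40,
    "mobile_responsiveness": 90,
}


def impact_for(weight: float) -> str:
    # Same cut-offs as build_report: >= 0.15 HIGH, >= 0.10 MEDIUM, else LOW
    return "HIGH" if weight >= 0.15 else ("MEDIUM" if weight >= 0.10 else "LOW")


overall = sum(SCORES[k] * WEIGHTS[k] for k in WEIGHTS)
print(round(overall, 1))  # 64.0

impact_order = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}
fixes = sorted(WEIGHTS, key=lambda k: (impact_order[impact_for(WEIGHTS[k])], SCORES[k]))
print(fixes)
# ['cta_visibility', 'trust_signals', 'headline_clarity', 'urgency', 'mobile_responsiveness']
```

Within the HIGH bucket the lowest-scoring dimension comes first, so a weak CTA outranks a decent headline even though the headline carries more weight.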
# ---------------------------------------------------------------------------
# Output Formatters
# ---------------------------------------------------------------------------
def format_report_text(report: CROReport) -> str:
"""Format report as human-readable text."""
lines = []
lines.append("=" * 70)
lines.append(f" CRO AUDIT REPORT")
lines.append(f" {report.url}")
lines.append("=" * 70)
lines.append("")
if report.fetch_error:
lines.append(f" ❌ FETCH ERROR: {report.fetch_error}")
lines.append("")
return "\n".join(lines)
# Overall score
lines.append(f" OVERALL CRO SCORE: {report.overall_score}/100 ({report.letter_grade})")
lines.append("")
# Benchmark comparison
bc = report.benchmark_comparison
indicator = "▲" if bc["vs_avg"] >= 0 else "▼"
lines.append(f" Industry: {bc['industry'].upper()}")
lines.append(f" vs. Industry Avg ({bc['industry_avg']}): {indicator} {abs(bc['vs_avg'])} points")
top_ind = "▲" if bc["vs_top"] >= 0 else "▼"
lines.append(f" vs. Top Quartile ({bc['top_quartile']}): {top_ind} {abs(bc['vs_top'])} points")
lines.append("")
# Dimension scores
lines.append("-" * 70)
lines.append(" DIMENSION SCORES")
lines.append("-" * 70)
for key in DIMENSION_WEIGHTS:
dim = report.dimensions[key]
bar_filled = int(dim["score"] / 5)
bar = "█" * bar_filled + "░" * (20 - bar_filled)
lines.append(f" {dim['name']:<25} {bar} {dim['score']:>3}/100")
for finding in dim["findings"]:
lines.append(f"     • {finding}")
if dim["recommendations"]:
for rec in dim["recommendations"]:
lines.append(f" ⚠ FIX: {rec}")
lines.append("")
# Priority fixes
if report.priority_fixes:
lines.append("-" * 70)
lines.append(" PRIORITY FIXES (ranked by impact)")
lines.append("-" * 70)
for i, fix in enumerate(report.priority_fixes[:10], 1):
icon = {"HIGH": "🔴", "MEDIUM": "🟡", "LOW": "🟢"}[fix["impact"]]
lines.append(f" {i}. {icon} [{fix['impact']}] {fix['dimension']} (score: {fix['current_score']})")
lines.append(f"     → {fix['fix']}")
lines.append("")
lines.append("=" * 70)
return "\n".join(lines)
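Each dimension line draws a 20-cell bar with `int(score / 5)` filled cells. A slightly more general sketch that clamps out-of-range scores and takes the width as a parameter; `render_bar` is a hypothetical helper and the block glyphs are an assumption:

```python
def render_bar(score: float, width: int = 20, full: str = "█", empty: str = "░") -> str:
    """Render a 0-100 score as a fixed-width text bar (width=20 gives one cell per 5 points)."""
    clamped = max(0.0, min(100.0, score))
    filled = int(clamped * width / 100)
    return full * filled + empty * (width - filled)


print(render_bar(73))       # 14 filled cells, 6 empty
print(render_bar(150, 10))  # clamped to 100: all 10 cells filled
```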
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
def audit_url(url: str, industry: str = "general") -> CROReport:
"""Audit a single URL and return the report."""
# Normalize URL
if not url.startswith(("http://", "https://")):
url = "https://" + url
html, error = fetch_page(url)
if error:
report = CROReport(url=url, fetch_error=error)
return report
return build_report(url, html, industry)
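`audit_url` prepends `https://` whenever the string does not start with a lowercase `http://` or `https://`, so an input such as `HTTP://example.com` would become `https://HTTP://example.com`, and stray whitespace from a pasted list is kept. A sketch of a more forgiving normalizer built on `urllib.parse.urlsplit`; the `normalize_url` name is hypothetical:

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Trim whitespace, default the scheme to https, and reject inputs with no host."""
    url = url.strip()
    if not url.lower().startswith(("http://", "https://")):
        url = "https://" + url
    if not urlsplit(url).netloc:
        raise ValueError(f"No host in URL: {url!r}")
    return url


print(normalize_url("  example.com/landing "))  # https://example.com/landing
print(normalize_url("HTTP://Example.com"))      # HTTP://Example.com
```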
def main():
parser = argparse.ArgumentParser(
description="AI CRO Audit — Score landing pages across 8 conversion dimensions",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
python cro_audit.py --url https://example.com/landing-page
python cro_audit.py --urls https://a.com https://b.com --industry saas
python cro_audit.py --file urls.txt --json --output results.json
""",
)
group = parser.add_mutually_exclusive_group(required=True)
group.add_argument("--url", help="Single URL to audit")
group.add_argument("--urls", nargs="+", help="Multiple URLs to audit")
group.add_argument("--file", help="File with URLs (one per line)")
parser.add_argument(
"--industry",
choices=list(INDUSTRY_BENCHMARKS.keys()),
default="general",
help="Industry for benchmark comparison (default: general)",
)
parser.add_argument("--json", action="store_true", help="Output as JSON")
parser.add_argument("--output", help="Save report to file")
args = parser.parse_args()
# Collect URLs
urls = []
if args.url:
urls = [args.url]
elif args.urls:
urls = args.urls
elif args.file:
try:
with open(args.file) as f:
urls = [line.strip() for line in f if line.strip() and not line.strip().startswith("#")]
except FileNotFoundError:
print(f"Error: File not found: {args.file}", file=sys.stderr)
sys.exit(1)
if not urls:
print("Error: No URLs provided", file=sys.stderr)
sys.exit(1)
# Run audits
reports = []
for url in urls:
print(f"Auditing: {url}...", file=sys.stderr)
report = audit_url(url, args.industry)
reports.append(report)
# Output
if args.json:
output = json.dumps(
[asdict(r) for r in reports] if len(reports) > 1 else asdict(reports[0]),
indent=2,
)
if args.output:
with open(args.output, "w") as f:
f.write(output)
print(f"Report saved to {args.output}", file=sys.stderr)
else:
print(output)
else:
text_output = "\n\n".join(format_report_text(r) for r in reports)
if args.output:
with open(args.output, "w") as f:
f.write(text_output)
print(f"Report saved to {args.output}", file=sys.stderr)
else:
print(text_output)
# Summary for batch mode
if len(reports) > 1:
print("\n" + "=" * 70, file=sys.stderr)
print(" BATCH SUMMARY", file=sys.stderr)
print("=" * 70, file=sys.stderr)
for r in sorted(reports, key=lambda x: x.overall_score, reverse=True):
status = "✓" if not r.fetch_error else "✗"
score = f"{r.overall_score} ({r.letter_grade})" if not r.fetch_error else "FAILED"
print(f" {status} {score:>12} {r.url}", file=sys.stderr)
if __name__ == "__main__":
main()
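In `--file` mode the script reads one URL per line and skips blank lines and `#` comments. The same filtering as a small testable function (hypothetical name `read_url_list`), which also treats indented `#` lines as comments and works on any iterable of lines, including an open file:

```python
import io


def read_url_list(lines) -> list:
    """Return stripped, non-blank, non-comment lines from any iterable of strings."""
    urls = []
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            urls.append(stripped)
    return urls


sample = io.StringIO("https://a.com\n\n  # staging pages\nhttps://b.com/pricing  \n")
print(read_url_list(sample))  # ['https://a.com', 'https://b.com/pricing']
```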

@@ -0,0 +1,6 @@
requests>=2.31.0
beautifulsoup4>=4.12.0
lxml>=4.9.0
scikit-learn>=1.3.0
pandas>=2.0.0
numpy>=1.24.0

@@ -0,0 +1,794 @@
#!/usr/bin/env python3
"""
Survey-to-Lead-Magnet Engine
==============================
Takes survey response data (CSV), segments respondents by pain point clusters,
ranks segments by size and commercial potential, and auto-generates lead magnet
briefs targeting each segment.
Usage:
python survey_lead_magnet.py --csv survey_responses.csv
python survey_lead_magnet.py --csv survey.csv --pain-columns "biggest_challenge" "top_frustration"
python survey_lead_magnet.py --csv survey.csv --top-segments 5 --json
python survey_lead_magnet.py --csv survey.csv --output lead_magnets.json
"""
import argparse
import csv
import json
import os
import re
import sys
from collections import Counter
from dataclasses import dataclass, field, asdict
from typing import Optional
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
# Columns that likely contain pain point / challenge responses
PAIN_COLUMN_PATTERNS = re.compile(
r"(challenge|pain|frustrat|struggle|problem|difficult|obstacle|"
r"barrier|concern|issue|blocker|worry|fear|hard|tough|"
r"biggest|main|top|primary|key|major|worst)",
re.IGNORECASE,
)
# Words that signal commercial intent / buying readiness
COMMERCIAL_SIGNALS = re.compile(
r"\b(budget|cost|price|invest|spend|pay|afford|roi|revenue|"
r"software|tool|platform|solution|vendor|agency|consultant|"
r"hire|outsource|automate|scale|grow|implement|upgrade|"
r"need|want|looking for|searching|evaluating|considering)\b",
re.IGNORECASE,
)
# Lead magnet format heuristics
FORMAT_KEYWORDS = {
"guide": ["understand", "learn", "how", "why", "strategy", "approach", "framework", "concept", "complex"],
"checklist": ["process", "steps", "workflow", "setup", "launch", "implement", "execute", "routine", "daily"],
"template": ["create", "write", "build", "design", "plan", "proposal", "email", "message", "document"],
"calculator": ["cost", "budget", "roi", "numbers", "forecast", "estimate", "pricing", "revenue", "metrics"],
"swipe_file": ["examples", "inspiration", "copy", "ads", "headlines", "subject lines", "creative", "ideas"],
}
# Stopwords for clustering (extend sklearn's default)
EXTRA_STOPWORDS = [
"really", "just", "like", "thing", "things", "lot", "also",
# apostrophes are stripped in preprocess_text, so "don't" is tokenized as "don"
"get", "getting", "got", "know", "dont", "don", "cant",
"want", "need", "think", "feel", "make", "much", "many",
"very", "would", "could", "should", "way", "able",
"one", "two", "first", "new", "good", "bad", "hard",
"well", "time", "still", "even", "right", "going",
]
# ---------------------------------------------------------------------------
# Data Classes
# ---------------------------------------------------------------------------
@dataclass
class PainSegment:
segment_id: int
theme: str
top_keywords: list
respondent_count: int
respondent_pct: float
commercial_score: float # 0-100
sample_responses: list
representative_quotes: list
@dataclass
class LeadMagnetBrief:
segment_id: int
segment_theme: str
title: str
format: str # guide, checklist, template, calculator, swipe_file
hook: str
outline: list
target_cta: str
distribution_channel: str
viral_potential: int # 0-100
conversion_potential: int # 0-100
combined_score: float
implementation_notes: str
@dataclass
class AnalysisResult:
total_respondents: int
columns_analyzed: list
segments: list
lead_magnets: list
implementation_roadmap: list
# ---------------------------------------------------------------------------
# Data Ingestion
# ---------------------------------------------------------------------------
def load_survey_data(csv_path: str) -> pd.DataFrame:
"""Load survey CSV. Tries multiple encodings."""
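# utf-8-sig also decodes plain UTF-8 and drops a leading BOM; latin-1 accepts
# every byte value, so it must come last or later entries are never tried
for encoding in ["utf-8-sig", "cp1252", "latin-1"]: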
try:
df = pd.read_csv(csv_path, encoding=encoding)
return df
except (UnicodeDecodeError, pd.errors.ParserError):
continue
raise ValueError(f"Could not read CSV file: {csv_path}")
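The loader tries a list of encodings in order, and the order matters: `latin-1` assigns a character to every byte value, so it never raises and anything listed after it is unreachable, while `utf-8-sig` reads plain UTF-8 as well and strips a leading byte-order mark (a plain `bytes.decode("utf-8")` keeps it as `\ufeff`). A stdlib sketch of the fallback on raw bytes; `decode_with_fallback` is a hypothetical helper:

```python
def decode_with_fallback(raw: bytes, encodings=("utf-8-sig", "cp1252", "latin-1")) -> tuple:
    """Return (text, encoding) for the first encoding that decodes `raw` without error."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("No candidate encoding could decode the input")


print(decode_with_fallback(b"\xef\xbb\xbfname,challenge\n"))  # ('name,challenge\n', 'utf-8-sig')
print(decode_with_fallback(b"caf\xe9"))                       # ('café', 'cp1252')
print(decode_with_fallback(b"\x81"))                          # ('\x81', 'latin-1')
```

Byte `0x81` is undefined in Python's `cp1252` codec, which is why the last example falls through to `latin-1`.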
def detect_pain_columns(df: pd.DataFrame) -> list:
"""Auto-detect columns that likely contain pain point / challenge data."""
pain_cols = []
for col in df.columns:
if PAIN_COLUMN_PATTERNS.search(col):
pain_cols.append(col)
# If no pattern matches, look for open-text columns (long average text)
if not pain_cols:
for col in df.columns:
if df[col].dtype == object:
avg_len = df[col].dropna().astype(str).str.len().mean()
if avg_len > 30: # likely free-text responses
pain_cols.append(col)
return pain_cols
def extract_responses(df: pd.DataFrame, pain_columns: list) -> list:
"""Extract and combine text responses from pain columns."""
responses = []
for _, row in df.iterrows():
parts = []
for col in pain_columns:
val = row.get(col)
if pd.notna(val) and str(val).strip():
parts.append(str(val).strip())
combined = " ".join(parts)
if combined:
responses.append(combined)
return responses
# ---------------------------------------------------------------------------
# Clustering
# ---------------------------------------------------------------------------
def preprocess_text(text: str) -> str:
"""Clean and normalize text for clustering."""
text = text.lower()
text = re.sub(r"[^a-z\s]", " ", text)
text = re.sub(r"\s+", " ", text).strip()
return text
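`preprocess_text` replaces every non-letter with a space, which splits contractions and drops digits. A copy of the function with two inputs shows the effect. With scikit-learn's default `token_pattern` (tokens of two or more word characters), the stray `t` from `don t` is then discarded, leaving `don`, which matters when choosing stopwords:

```python
import re


def preprocess_text(text: str) -> str:
    """Lowercase, replace non-letters with spaces, collapse whitespace (as in the script)."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


print(preprocess_text("Don't know our ROI!!"))  # don t know our roi
print(preprocess_text("Q3 budget: $5k"))        # q budget k
```

The `[^a-z\s]` class also removes accented letters, so `café` becomes `caf`; surveys in other languages would need a Unicode-aware pattern.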
def cluster_responses(responses: list, n_clusters: Optional[int] = None) -> tuple:
"""
Cluster responses using TF-IDF + KMeans.
Returns (labels, vectorizer, tfidf_matrix, n_clusters).
"""
if len(responses) < 5:
# Too few responses — treat as single cluster
return [0] * len(responses), None, None, 1
cleaned = [preprocess_text(r) for r in responses]
# Build TF-IDF matrix
stop_words = list(TfidfVectorizer(stop_words="english").get_stop_words()) + EXTRA_STOPWORDS
vectorizer = TfidfVectorizer(
max_features=500,
stop_words=stop_words,
min_df=2 if len(responses) > 20 else 1,
max_df=0.85,
ngram_range=(1, 2),
)
try:
tfidf_matrix = vectorizer.fit_transform(cleaned)
except ValueError:
# All responses too similar or empty after preprocessing
return [0] * len(responses), None, None, 1
# Auto-determine cluster count if not specified
if n_clusters is None:
max_k = min(10, len(responses) // 3, tfidf_matrix.shape[0] - 1)
max_k = max(2, max_k)
best_k = 3
best_score = -1
for k in range(2, max_k + 1):
try:
km = KMeans(n_clusters=k, random_state=42, n_init=10)
labels = km.fit_predict(tfidf_matrix)
score = silhouette_score(tfidf_matrix, labels)
if score > best_score:
best_score = score
best_k = k
except ValueError:
continue
n_clusters = best_k
km = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
labels = km.fit_predict(tfidf_matrix)
return labels, vectorizer, tfidf_matrix, n_clusters
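The script picks the cluster count by maximizing scikit-learn's `silhouette_score` over k = 2 to `max_k`. To show what that criterion measures, here is a dependency-free sketch that computes the silhouette for 1-D points with absolute distance (scikit-learn defaults to Euclidean distance on the TF-IDF vectors) and picks the better of two hard-coded labelings that stand in for KMeans output; `silhouette_1d` is a hypothetical name:

```python
from statistics import mean


def silhouette_1d(points: list, labels: list) -> float:
    """Mean silhouette coefficient for 1-D points under absolute-difference distance.

    s(i) = (b - a) / max(a, b): a is the mean distance from point i to the rest of
    its cluster, b the smallest mean distance to any other cluster. Points in a
    singleton cluster score 0 (scikit-learn's convention). Needs >= 2 clusters.
    """
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [abs(p - q) for j, (q, l2) in enumerate(zip(points, labels)) if l2 == lab and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = mean(same)
        b = min(
            mean([abs(p - q) for q, l2 in zip(points, labels) if l2 == other])
            for other in set(labels) if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


# Candidate labelings stand in for KMeans output at k = 2 and k = 3
points = [0.0, 0.1, 0.2, 5.0, 5.1, 9.9, 10.0]
candidates = {
    2: [0, 0, 0, 1, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 2, 2],
}
best_k = max(candidates, key=lambda k: silhouette_1d(points, candidates[k]))
print(best_k)  # 3
```

Forcing the two right-hand groups into one cluster inflates their within-cluster distance `a`, which is what drags the k = 2 score down.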
def extract_cluster_keywords(
vectorizer: TfidfVectorizer,
tfidf_matrix,
labels: list,
cluster_id: int,
top_n: int = 8,
) -> list:
"""Get top keywords for a specific cluster."""
if vectorizer is None:
return ["general"]
mask = np.array(labels) == cluster_id
cluster_matrix = tfidf_matrix[mask]
if cluster_matrix.shape[0] == 0:
return []
mean_tfidf = cluster_matrix.mean(axis=0).A1
feature_names = vectorizer.get_feature_names_out()
top_indices = mean_tfidf.argsort()[-top_n:][::-1]
return [feature_names[i] for i in top_indices if mean_tfidf[i] > 0]
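`extract_cluster_keywords` averages each term's TF-IDF weight over a cluster's rows (terms absent from a row count as zero) and keeps the highest-scoring terms. A dependency-free sketch of that computation on dict-of-weights rows instead of a SciPy sparse matrix; `top_cluster_terms` is a hypothetical name, and the alphabetical tie-break is an addition the original `argsort` does not promise:

```python
def top_cluster_terms(rows: list, labels: list, cluster_id: int, top_n: int = 3) -> list:
    """Rank terms by mean weight over one cluster's rows; rows are {term: weight} dicts."""
    members = [row for row, lab in zip(rows, labels) if lab == cluster_id]
    if not members:
        return []
    totals = {}
    for row in members:
        for term, weight in row.items():
            totals[term] = totals.get(term, 0.0) + weight
    # Dividing by the member count treats a missing term as weight 0, like a sparse row
    means = {term: total / len(members) for term, total in totals.items()}
    ranked = sorted(means.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, score in ranked[:top_n] if score > 0]


rows = [
    {"pricing": 0.8, "budget": 0.6},
    {"pricing": 0.5, "tool": 0.9},
    {"hiring": 1.0},
]
labels = [0, 0, 1]
print(top_cluster_terms(rows, labels, 0))  # ['pricing', 'tool', 'budget']
print(top_cluster_terms(rows, labels, 1))  # ['hiring']
```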
def generate_theme_label(keywords: list) -> str:
"""Generate a human-readable theme label from top keywords."""
if not keywords:
return "General Challenges"
# Take top 2-3 keywords and create a label
top = keywords[:3]
# Capitalize and join
theme = " & ".join(word.replace("_", " ").title() for word in top)
return theme
# ---------------------------------------------------------------------------
# Scoring
# ---------------------------------------------------------------------------
def score_commercial_potential(responses: list) -> float:
"""Score how commercially valuable a segment is (0-100)."""
if not responses:
return 0
total_signals = 0
for resp in responses:
matches = COMMERCIAL_SIGNALS.findall(resp)
total_signals += len(matches)
# Normalize: avg signals per response, scaled to 0-100
avg_signals = total_signals / len(responses)
score = min(100, avg_signals * 25) # 4+ avg signals = 100
return round(score, 1)
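The commercial score is the average number of signal-word matches per response, multiplied by 25 and capped at 100, so an average of four or more hits saturates it. A worked example with a copy of the pattern; `commercial_score` is a condensed, hypothetical rewrite of `score_commercial_potential`:

```python
import re

COMMERCIAL_SIGNALS = re.compile(
    r"\b(budget|cost|price|invest|spend|pay|afford|roi|revenue|"
    r"software|tool|platform|solution|vendor|agency|consultant|"
    r"hire|outsource|automate|scale|grow|implement|upgrade|"
    r"need|want|looking for|searching|evaluating|considering)\b",
    re.IGNORECASE,
)


def commercial_score(responses: list) -> float:
    """Average signal matches per response, times 25, capped at 100."""
    if not responses:
        return 0
    total = sum(len(COMMERCIAL_SIGNALS.findall(r)) for r in responses)
    return round(min(100, total / len(responses) * 25), 1)


# "need" + "tool" in the first response, "Budget" in the second: 3 hits / 2 responses * 25
print(commercial_score(["We need a tool for this", "Budget is tight"]))  # 37.5
```

Repeated words each count, so one emphatic response ("need need need need need") scores 100 on its own.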
def recommend_format(keywords: list, responses: list) -> str:
"""Recommend the best lead magnet format based on pain cluster."""
combined_text = " ".join(keywords) + " " + " ".join(responses[:10])
combined_lower = combined_text.lower()
scores = {}
for fmt, trigger_words in FORMAT_KEYWORDS.items():
score = sum(1 for word in trigger_words if word in combined_lower)
scores[fmt] = score
best = max(scores, key=scores.get)
if scores[best] == 0:
return "guide" # default
return best
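`recommend_format` counts trigger words with `word in combined_lower`, a substring test, so `how` fires inside `show` and `ads` inside `leads`. A sketch comparing that test with a whole-word regex match, using two trigger lists copied from `FORMAT_KEYWORDS`; `count_triggers` is a hypothetical helper, and whether whole-word matching suits a given survey is a judgment call:

```python
import re

# Two trigger lists copied from FORMAT_KEYWORDS
GUIDE = ["understand", "learn", "how", "why", "strategy", "approach", "framework", "concept", "complex"]
SWIPE = ["examples", "inspiration", "copy", "ads", "headlines", "subject lines", "creative", "ideas"]


def count_triggers(text: str, words: list, whole_word: bool = False) -> int:
    """Count trigger words found in text, by substring (script behavior) or whole word."""
    text = text.lower()
    if not whole_word:
        return sum(1 for w in words if w in text)
    return sum(1 for w in words if re.search(r"\b" + re.escape(w) + r"\b", text))


text = "Hard to show leadership the value of our leads"
print(count_triggers(text, GUIDE), count_triggers(text, GUIDE, whole_word=True))  # 1 0
print(count_triggers(text, SWIPE), count_triggers(text, SWIPE, whole_word=True))  # 1 0
```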
def score_viral_potential(title: str, fmt: str, segment_size_pct: float) -> int:
"""Score how likely a lead magnet is to be shared (0-100)."""
score = 30 # baseline
# Larger segments = more sharing potential
score += min(25, segment_size_pct * 1.5)
# Templates and checklists are more shareable
format_boost = {
"template": 15,
"checklist": 12,
"swipe_file": 18,
"calculator": 10,
"guide": 5,
}
score += format_boost.get(fmt, 0)
# Titles with numbers or specific outcomes
if re.search(r"\d+", title):
score += 10
if re.search(r"(ultimate|complete|definitive|proven|secret)", title, re.IGNORECASE):
score += 5
return min(100, int(score))
def score_conversion_potential(commercial_score: float, segment_size_pct: float, fmt: str) -> int:
"""Score how likely a lead magnet is to convert to leads/customers (0-100)."""
score = 20 # baseline
# Commercial intent is the strongest signal
score += commercial_score * 0.4
# Segment size matters but with diminishing returns
score += min(15, segment_size_pct * 0.8)
# Some formats convert better
conversion_boost = {
"calculator": 15,
"template": 12,
"checklist": 10,
"guide": 5,
"swipe_file": 8,
}
score += conversion_boost.get(fmt, 0)
return min(100, int(score))
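`build_lead_magnet_brief` blends the two scores as `0.4 * viral + 0.6 * conversion`. A worked example using copies of the two scoring functions, with a made-up segment (20% of responses, commercial score 50) and a checklist title:

```python
import re


def score_viral_potential(title: str, fmt: str, segment_size_pct: float) -> int:
    score = 30  # baseline
    score += min(25, segment_size_pct * 1.5)
    format_boost = {"template": 15, "checklist": 12, "swipe_file": 18, "calculator": 10, "guide": 5}
    score += format_boost.get(fmt, 0)
    if re.search(r"\d+", title):
        score += 10
    if re.search(r"(ultimate|complete|definitive|proven|secret)", title, re.IGNORECASE):
        score += 5
    return min(100, int(score))


def score_conversion_potential(commercial_score: float, segment_size_pct: float, fmt: str) -> int:
    score = 20  # baseline
    score += commercial_score * 0.4
    score += min(15, segment_size_pct * 0.8)
    conversion_boost = {"calculator": 15, "template": 12, "checklist": 10, "guide": 5, "swipe_file": 8}
    score += conversion_boost.get(fmt, 0)
    return min(100, int(score))


# 30 + min(25, 30.0) + 12 (checklist) + 10 (title contains a digit) = 77
viral = score_viral_potential("The 7-Step Onboarding Checklist", "checklist", 20.0)
# 20 + 50 * 0.4 + min(15, 16.0) + 10 (checklist) = 65
conversion = score_conversion_potential(50.0, 20.0, "checklist")
combined = round(viral * 0.4 + conversion * 0.6, 1)
print(viral, conversion, combined)  # 77 65 69.8
```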
# ---------------------------------------------------------------------------
# Lead Magnet Brief Generator
# ---------------------------------------------------------------------------
FORMAT_LABELS = {
"guide": "Comprehensive Guide",
"checklist": "Actionable Checklist",
"template": "Ready-to-Use Template",
"calculator": "Interactive Calculator",
"swipe_file": "Swipe File Collection",
}
def generate_title(theme: str, fmt: str, keywords: list) -> str:
"""Generate a lead magnet title."""
templates = {
"guide": [
f"The Complete Guide to {theme}",
f"How to Solve {theme}: A Step-by-Step Guide",
f"{theme} Mastery: Everything You Need to Know",
],
"checklist": [
f"The {theme} Checklist: {min(15, 5 + len(keywords))} Steps to Success",
f"Your {theme} Pre-Launch Checklist",
f"{theme}: The Essential Checklist",
],
"template": [
f"{theme} Template Pack: Copy, Customize, Launch",
f"The {theme} Template That Saves 10+ Hours/Week",
f"Plug-and-Play {theme} Templates",
],
"calculator": [
f"{theme} Calculator: Know Your Numbers in 5 Minutes",
f"The {theme} ROI Calculator",
f"Calculate Your {theme} Score",
],
"swipe_file": [
f"50+ {theme} Examples That Actually Work",
f"The {theme} Swipe File: Steal These Ideas",
f"Best-in-Class {theme} Examples (Curated Collection)",
],
}
options = templates.get(fmt, templates["guide"])
return options[0]
def generate_hook(theme: str, keywords: list, sample_responses: list) -> str:
"""Generate a compelling hook for the lead magnet."""
# Extract a pain point from sample responses for the hook
pain_phrase = ""
if sample_responses:
# Find the most representative short phrase
for resp in sample_responses[:5]:
if 20 < len(resp) < 150:
pain_phrase = resp
break
if pain_phrase:
return (
f"If you've ever thought \"{pain_phrase[:80]}{'...' if len(pain_phrase) > 80 else ''}\" "
f"— this is for you. We analyzed hundreds of responses and found the exact "
f"patterns that separate those who overcome {keywords[0] if keywords else 'this challenge'} "
f"from those who stay stuck."
)
else:
return (
f"Most teams waste months trying to figure out {theme.lower()} on their own. "
f"This resource distills proven strategies into actionable steps you can "
f"implement today."
)
def generate_outline(theme: str, fmt: str, keywords: list) -> list:
"""Generate a content outline for the lead magnet."""
sections = [f"Section 1: Why {theme} Matters Now (The Landscape)"]
if fmt == "guide":
sections.extend([
f"Section 2: The Core Framework for {keywords[0].title() if keywords else 'Success'}",
f"Section 3: Common Mistakes (And How to Avoid Them)",
f"Section 4: Step-by-Step Implementation Plan",
f"Section 5: Tools & Resources You'll Need",
f"Section 6: Case Studies — What Good Looks Like",
f"Section 7: Quick-Start Action Plan",
])
elif fmt == "checklist":
sections.extend([
f"Section 2: Pre-Work — What to Have Ready",
f"Section 3: Phase 1 — Foundation ({keywords[0].title() if keywords else 'Setup'})",
f"Section 4: Phase 2 — Execution ({keywords[1].title() if len(keywords) > 1 else 'Build'})",
f"Section 5: Phase 3 — Optimization & Measurement",
f"Section 6: Common Gotchas to Watch For",
])
elif fmt == "template":
sections.extend([
f"Section 2: How to Use This Template",
f"Section 3: Template A — {keywords[0].title() if keywords else 'Standard'} Version",
f"Section 4: Template B — Advanced Version",
f"Section 5: Customization Guide",
f"Section 6: Real Examples (Filled-In Templates)",
])
elif fmt == "calculator":
sections.extend([
f"Section 2: Key Metrics You Need to Track",
f"Section 3: Input Your Numbers",
f"Section 4: Understanding Your Results",
f"Section 5: Benchmarks — How You Compare",
f"Section 6: Action Steps Based on Your Score",
])
elif fmt == "swipe_file":
sections.extend([
f"Section 2: What Makes These Examples Work",
f"Section 3: Category A — {keywords[0].title() if keywords else 'Top Performers'}",
f"Section 4: Category B — {keywords[1].title() if len(keywords) > 1 else 'Rising Stars'}",
f"Section 5: How to Adapt These for Your Business",
f"Section 6: Blank Templates to Get Started",
])
return sections
def generate_cta(fmt: str, theme: str) -> str:
"""Generate the target CTA for the lead magnet landing page."""
ctas = {
"guide": f"Download the Free {theme} Guide",
"checklist": f"Get Your Free {theme} Checklist",
"template": f"Grab the Free {theme} Templates",
"calculator": f"Try the Free {theme} Calculator",
"swipe_file": f"Download {theme} Swipe File",
}
return ctas.get(fmt, f"Get Free {theme} Resource")
def recommend_distribution(fmt: str, segment_size_pct: float) -> str:
"""Recommend primary distribution channel."""
if segment_size_pct > 25:
return "Homepage popup + dedicated landing page + paid social"
elif segment_size_pct > 15:
return "Blog content upgrade + email nurture sequence"
elif segment_size_pct > 8:
return "Targeted blog posts + LinkedIn organic"
else:
return "Niche community posts + targeted email segment"
def build_lead_magnet_brief(segment: PainSegment) -> LeadMagnetBrief:
"""Generate a complete lead magnet brief for a pain segment."""
fmt = recommend_format(segment.top_keywords, segment.sample_responses)
title = generate_title(segment.theme, fmt, segment.top_keywords)
hook = generate_hook(segment.theme, segment.top_keywords, segment.sample_responses)
outline = generate_outline(segment.theme, fmt, segment.top_keywords)
cta = generate_cta(fmt, segment.theme)
channel = recommend_distribution(fmt, segment.respondent_pct)
viral = score_viral_potential(title, fmt, segment.respondent_pct)
conversion = score_conversion_potential(
segment.commercial_score, segment.respondent_pct, fmt,
)
combined = (viral * 0.4 + conversion * 0.6)
impl_notes = (
f"Target segment: {segment.respondent_count} respondents ({segment.respondent_pct:.1f}% of non-empty responses). "
f"Commercial intent score: {segment.commercial_score}/100. "
f"Recommended format: {FORMAT_LABELS.get(fmt, fmt)}. "
f"Estimated production time: {'1-2 days' if fmt in ('checklist', 'template') else '3-5 days'}."
)
return LeadMagnetBrief(
segment_id=segment.segment_id,
segment_theme=segment.theme,
title=title,
format=FORMAT_LABELS.get(fmt, fmt),
hook=hook,
outline=outline,
target_cta=cta,
distribution_channel=channel,
viral_potential=viral,
conversion_potential=conversion,
combined_score=round(combined, 1),
implementation_notes=impl_notes,
)
# ---------------------------------------------------------------------------
# Analysis Pipeline
# ---------------------------------------------------------------------------
def analyze_survey(
csv_path: str,
pain_columns: Optional[list] = None,
top_segments: int = 5,
) -> AnalysisResult:
"""Full analysis pipeline: load → cluster → score → generate briefs."""
# Load data
df = load_survey_data(csv_path)
total_respondents = len(df)
# Detect or use specified pain columns
if pain_columns:
# Validate columns exist
missing = [c for c in pain_columns if c not in df.columns]
if missing:
# Try fuzzy match
actual_cols = []
for pc in pain_columns:
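# Prefer an exact column name, then fall back to a case-insensitive substring match
matches = [pc] if pc in df.columns else [c for c in df.columns if pc.lower() in c.lower()]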
if matches:
actual_cols.append(matches[0])
else:
raise ValueError(f"Column not found: '{pc}'. Available: {list(df.columns)}")
pain_columns = actual_cols
else:
pain_columns = detect_pain_columns(df)
if not pain_columns:
raise ValueError(
"Could not auto-detect pain point columns. "
"Use --pain-columns to specify which columns contain challenge/pain responses.\n"
f"Available columns: {list(df.columns)}"
)
print(f"Analyzing columns: {pain_columns}", file=sys.stderr)
# Extract responses
responses = extract_responses(df, pain_columns)
if not responses:
raise ValueError("No non-empty responses found in the specified columns")
print(f"Found {len(responses)} responses from {total_respondents} respondents", file=sys.stderr)
# Cluster
labels, vectorizer, tfidf_matrix, n_clusters = cluster_responses(
responses, n_clusters=min(top_segments, len(responses) // 2) if len(responses) < 30 else None,
)
# Build segments
segments = []
for cluster_id in range(n_clusters):
mask = [i for i, l in enumerate(labels) if l == cluster_id]
cluster_responses_list = [responses[i] for i in mask]
keywords = extract_cluster_keywords(vectorizer, tfidf_matrix, labels, cluster_id)
theme = generate_theme_label(keywords)
commercial = score_commercial_potential(cluster_responses_list)
# Pick representative quotes (medium length, most representative)
quotes = sorted(
cluster_responses_list,
key=lambda r: abs(len(r) - 80), # prefer ~80 char responses
)[:3]
segment = PainSegment(
segment_id=cluster_id + 1,
theme=theme,
top_keywords=keywords,
respondent_count=len(mask),
respondent_pct=round(len(mask) / len(responses) * 100, 1),
commercial_score=commercial,
sample_responses=cluster_responses_list[:5],
representative_quotes=quotes,
)
segments.append(segment)
# Sort by size × commercial score
segments.sort(key=lambda s: s.respondent_count * (s.commercial_score + 10), reverse=True)
# Limit to top N
segments = segments[:top_segments]
# Re-number after sorting
for i, seg in enumerate(segments):
seg.segment_id = i + 1
# Generate lead magnet briefs
lead_magnets = []
for seg in segments:
brief = build_lead_magnet_brief(seg)
lead_magnets.append(brief)
# Sort briefs by combined score
lead_magnets.sort(key=lambda b: b.combined_score, reverse=True)
# Implementation roadmap
roadmap = []
for i, lm in enumerate(lead_magnets, 1):
roadmap.append({
"priority": i,
"title": lm.title,
"format": lm.format,
"segment_size": f"{lm.segment_theme} ({segments[lm.segment_id - 1].respondent_pct:.1f}%)",
"combined_score": lm.combined_score,
"estimated_effort": "1-2 days" if "Checklist" in lm.format or "Template" in lm.format else "3-5 days",
})
return AnalysisResult(
total_respondents=total_respondents,
columns_analyzed=pain_columns,
segments=[asdict(s) for s in segments],
lead_magnets=[asdict(lm) for lm in lead_magnets],
implementation_roadmap=roadmap,
)
# ---------------------------------------------------------------------------
# Output Formatters
# ---------------------------------------------------------------------------
def format_analysis_text(result: AnalysisResult) -> str:
    """Format analysis as human-readable text."""
    lines = []
    lines.append("=" * 70)
    lines.append(" SURVEY-TO-LEAD-MAGNET ANALYSIS")
    lines.append("=" * 70)
    lines.append("")
    lines.append(f" Total respondents: {result.total_respondents}")
    lines.append(f" Columns analyzed: {', '.join(result.columns_analyzed)}")
    lines.append(f" Segments identified: {len(result.segments)}")
    lines.append("")
    # Segments
    lines.append("-" * 70)
    lines.append(" PAIN POINT SEGMENTS (ranked by opportunity)")
    lines.append("-" * 70)
    for seg in result.segments:
        lines.append("")
        lines.append(f" Segment #{seg['segment_id']}: {seg['theme']}")
        lines.append(f" Respondents: {seg['respondent_count']} ({seg['respondent_pct']}%)")
        lines.append(f" Commercial Score: {seg['commercial_score']}/100")
        lines.append(f" Top Keywords: {', '.join(seg['top_keywords'][:5])}")
        lines.append("")
        lines.append(" Representative Quotes:")
        for q in seg["representative_quotes"]:
            lines.append(f" \"{q[:100]}{'...' if len(q) > 100 else ''}\"")
    lines.append("")
    # Lead Magnet Briefs
    lines.append("=" * 70)
    lines.append(" LEAD MAGNET BRIEFS (ranked by combined score)")
    lines.append("=" * 70)
    for lm in result.lead_magnets:
        lines.append("")
        lines.append(f" 📦 {lm['title']}")
        lines.append(f" Format: {lm['format']}")
        lines.append(f" Segment: {lm['segment_theme']}")
        lines.append(f" Viral Potential: {lm['viral_potential']}/100 | Conversion Potential: {lm['conversion_potential']}/100")
        lines.append(f" Combined Score: {lm['combined_score']}/100")
        lines.append("")
        lines.append(f" Hook: {lm['hook'][:200]}{'...' if len(lm['hook']) > 200 else ''}")
        lines.append("")
        lines.append(" Outline:")
        for section in lm["outline"]:
            lines.append(f"{section}")
        lines.append("")
        lines.append(f" CTA: {lm['target_cta']}")
        lines.append(f" Distribution: {lm['distribution_channel']}")
        lines.append(f" Notes: {lm['implementation_notes']}")
        lines.append("")
        lines.append(" " + "-" * 50)
    # Roadmap
    lines.append("")
    lines.append("=" * 70)
    lines.append(" IMPLEMENTATION ROADMAP")
    lines.append("=" * 70)
    lines.append("")
    for item in result.implementation_roadmap:
        lines.append(f" #{item['priority']} [{item['estimated_effort']}] {item['title']}")
        lines.append(f" Format: {item['format']} | Segment: {item['segment_size']} | Score: {item['combined_score']}")
        lines.append("")
    lines.append("=" * 70)
    return "\n".join(lines)
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
def main():
    parser = argparse.ArgumentParser(
        description="Survey-to-Lead-Magnet Engine — Turn survey data into targeted lead magnet briefs",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python survey_lead_magnet.py --csv survey_responses.csv
  python survey_lead_magnet.py --csv survey.csv --pain-columns "biggest_challenge" "frustrations"
  python survey_lead_magnet.py --csv survey.csv --top-segments 3 --json --output briefs.json

CSV Format:
  Questions as column headers, one respondent per row.
  Works with exports from Typeform, Google Forms, SurveyMonkey, etc.
""",
    )
    parser.add_argument("--csv", required=True, help="Path to survey responses CSV")
    parser.add_argument(
        "--pain-columns", nargs="+",
        help="Column names containing pain point / challenge responses (auto-detected if not specified)",
    )
    parser.add_argument(
        "--top-segments", type=int, default=5,
        help="Number of top segments to analyze (default: 5)",
    )
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    parser.add_argument("--output", help="Save output to file")
    args = parser.parse_args()
    if not os.path.exists(args.csv):
        print(f"Error: File not found: {args.csv}", file=sys.stderr)
        sys.exit(1)
    try:
        result = analyze_survey(
            csv_path=args.csv,
            pain_columns=args.pain_columns,
            top_segments=args.top_segments,
        )
    except ValueError as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)
    # Render as JSON or text, then write to --output or stdout
    if args.json:
        output = json.dumps(asdict(result), indent=2, default=str)
    else:
        output = format_analysis_text(result)
    if args.output:
        # UTF-8 so the emoji in the text report can be written on any platform
        with open(args.output, "w", encoding="utf-8") as f:
            f.write(output)
        print(f"Output saved to {args.output}", file=sys.stderr)
    else:
        print(output)


if __name__ == "__main__":
    main()

5
podcast-ops/.env.example Normal file

@ -0,0 +1,5 @@
# Required: OpenAI API key (used for Whisper transcription)
OPENAI_API_KEY=sk-...
# Required: Anthropic API key (used for content generation via Claude)
ANTHROPIC_API_KEY=sk-ant-...

162
podcast-ops/README.md Normal file

@ -0,0 +1,162 @@
# AI Podcast Ops
**One podcast episode in, 15-20 content pieces out. Scored, deduplicated, and scheduled.**
Most podcast teams publish an episode and maybe pull one audiogram. This pipeline treats every episode as a content mine — extracting narrative arcs, quotable moments, controversial takes, data points, and stories, then generating platform-native content for every channel with viral scoring and deduplication.
## What's Inside
### 🎙️ Podcast-to-Everything Pipeline (`podcast_pipeline.py`)
End-to-end pipeline that ingests podcast episodes (via RSS feed or raw transcript) and produces a full cross-platform content calendar.
**Ingest modes:**
- RSS feed → auto-download + Whisper transcription
- Raw transcript file (text, SRT, VTT)
- Batch mode: process last N episodes from a feed
**Content generated per episode:**
- 3-5 short-form video clip suggestions (with timestamps + hooks)
- 2-3 Twitter/X thread outlines
- 1 LinkedIn article draft
- 1 newsletter section
- 3-5 quote cards (text overlays for social)
- 1 blog post outline with SEO keywords
- 1 YouTube Shorts/TikTok script
**Intelligence layer:**
- Editorial Brain: LLM-powered extraction of 7 content atom types
- Viral scoring: weighted blend of Novelty (40%), Controversy (30%), and Utility (30%), scored 0-100
- Dedup engine: semantic similarity check against last N days of output
- Calendar generator: auto-schedules by platform best practices
### 📋 SKILL.md
Claude Code skill file. Drop into your project and ask: *"Turn this podcast episode into a content calendar"* — it handles the rest.
## Quick Start
```bash
# 1. Install dependencies
pip install -r requirements.txt
# 2. Set up environment
cp .env.example .env
# Edit .env with your API keys (OPENAI_API_KEY, ANTHROPIC_API_KEY)
# 3. Process latest episode from your podcast RSS
python podcast_pipeline.py --rss "https://feeds.example.com/podcast.xml"
# 4. Or process a local transcript
python podcast_pipeline.py --transcript episode-42.txt
# 5. Batch process last 5 episodes
python podcast_pipeline.py --batch "https://feeds.example.com/podcast.xml" --episodes 5
# 6. Generate weekly content calendar
python podcast_pipeline.py --calendar
# 7. Only keep high-scoring content
python podcast_pipeline.py --rss "https://feeds.example.com/podcast.xml" --min-score 80
```
## Configuration
### Environment Variables
| Variable | Required | Description |
|----------|----------|-------------|
| `OPENAI_API_KEY` | Yes | OpenAI API key (Whisper transcription) |
| `ANTHROPIC_API_KEY` | Yes | Anthropic API key (content generation) |
| `OPENAI_LLM_KEY` | Optional | Separate OpenAI key for GPT-based generation |
### CLI Options
| Flag | Description | Default |
|------|-------------|---------|
| `--rss <url>` | Process latest episode from RSS feed | — |
| `--transcript <file>` | Process a local transcript file | — |
| `--batch <url>` | Batch process from RSS feed | — |
| `--episodes <n>` | Number of episodes for batch mode | 5 |
| `--calendar` | Generate weekly calendar from outputs | — |
| `--dedup-days <n>` | Days of history for dedup check | 30 |
| `--min-score <n>` | Minimum viral score to include | 0 |
| `--output-dir <path>` | Output directory | `./output` |
## Output Structure
```
output/
├── episodes/
│   ├── 2024-01-15-episode-title/
│   │   ├── transcript.txt        # Clean transcript
│   │   ├── atoms.json            # Extracted content atoms
│   │   ├── content_pieces.json   # All generated content
│   │   └── calendar.json         # Scheduled calendar
│   └── ...
├── calendar/
│   └── week-2024-W03.json        # Aggregated weekly calendar
├── content_history.json          # Dedup tracking (hashes + embeddings)
└── pipeline_log.json             # Run history and performance stats
```
## How It Works
```
RSS Feed / Transcript
         │
         ▼
┌─────────────────┐
│  1. INGEST      │  Download audio → Whisper → clean transcript
│                 │  OR read transcript file directly
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  2. EXTRACT     │  Editorial Brain: find narrative arcs, quotes,
│                 │  controversial takes, data points, stories,
│                 │  frameworks, predictions
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  3. GENERATE    │  For each atom → platform-native content:
│                 │  clips, threads, articles, newsletter,
│                 │  quote cards, blog outlines, short scripts
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  4. SCORE       │  Viral potential: weighted novelty, controversy, utility
│                 │  Filter below threshold
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  5. DEDUP       │  Semantic similarity vs last N days
│                 │  Remove overlaps, flag near-dupes
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  6. SCHEDULE    │  Calendar generation with platform-specific
│                 │  timing rules and content mix optimization
└─────────────────┘
```
## Viral Scoring
Every generated piece is scored on three dimensions:
| Dimension | Weight | What It Measures |
|-----------|--------|-----------------|
| Novelty | 40% | Is this new or surprising? |
| Controversy | 30% | Will people argue about this? |
| Utility | 30% | Can someone use this immediately? |
**Thresholds:** 80+ = priority publish, 60-79 = solid fill, 40-59 = gap filler, <40 = cut
## Integration with Other Skills
- **Content Ops / Expert Panel** — Run generated content through the expert panel for quality gating before publish
- **SEO Ops** — Feed blog outlines to the SEO pipeline for keyword validation
- **Outbound Engine** — Use podcast insights as personalization hooks in outbound sequences
- **Growth Engine** — A/B test different content formats from the same episode atoms

302
podcast-ops/SKILL.md Normal file

@ -0,0 +1,302 @@
---
name: podcast-pipeline
description: >-
  Podcast-to-Everything content pipeline. Takes a podcast RSS feed or raw
  transcript and generates a full cross-platform content calendar: short-form
  video clips, Twitter/X threads, LinkedIn articles, newsletter sections, quote
  cards, blog outlines with SEO keywords, and YouTube Shorts/TikTok scripts.
  Scores each piece by viral potential (a weighted blend of novelty,
  controversy, and utility) and deduplicates against recent output. Use when
  asked to: "repurpose this podcast", "turn this episode into content",
  "podcast content calendar", "extract clips from this episode", "podcast to
  social", "content from RSS feed", "batch process episodes", or any request to
  turn podcast/audio content into a multi-platform content plan.
---
# Podcast-to-Everything Pipeline
Turns podcast episodes into a full content calendar across every platform.
One episode in, 15-20 content pieces out — scored, deduplicated, and scheduled.
---
## Step 1: Ingest — Get the Transcript
Determine the input source and obtain a clean transcript.
### Option A: RSS Feed (`--rss <url>`)
1. Fetch the RSS feed XML
2. Extract the latest episode's audio URL (or use `--episodes N` for batch)
3. Download the audio file
4. Transcribe via OpenAI Whisper API (with timestamps)
5. Store transcript with episode metadata (title, date, description, duration)
### Option B: Raw Transcript (`--transcript <file>`)
1. Read the transcript file (plain text, SRT, or VTT)
2. Parse timestamps if present
3. Extract episode metadata from the filename, or prompt the user for it
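Option B's timestamp parsing can be sketched as follows. This is a minimal sketch that assumes standard SRT (`00:00:01,000 --> ...`) or WebVTT (`00:01.000 --> ...`) cue timing lines. `Cue` and `parse_cues` are hypothetical names, not the pipeline's actual functions. A plain-text transcript simply yields no cues.

```python
import re
from dataclasses import dataclass

# Matches SRT ("00:01:02,500") and WebVTT ("00:01:02.500" or "01:02.500") cue timings.
TIMING_RE = re.compile(
    r"(?P<start>(?:\d{2}:)?\d{2}:\d{2}[.,]\d{3})\s*-->\s*(?P<end>(?:\d{2}:)?\d{2}:\d{2}[.,]\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def to_seconds(stamp: str) -> float:
    """Convert HH:MM:SS,mmm / HH:MM:SS.mmm / MM:SS.mmm into seconds."""
    seconds = 0.0
    for part in stamp.replace(",", ".").split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_cues(raw: str) -> list:
    """Parse SRT or VTT text into Cue objects; returns [] for plain-text transcripts."""
    cues = []
    for block in re.split(r"\n\s*\n", raw.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING_RE.search(line)
            if match:
                # Everything after the timing line is the spoken text for this cue.
                text = " ".join(ln.strip() for ln in lines[i + 1:] if ln.strip())
                cues.append(Cue(to_seconds(match["start"]), to_seconds(match["end"]), text))
                break
    return cues
```

Keeping cue start/end times in seconds makes it easy to attach a timestamp range to any extracted atom later.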
### Option C: Batch Mode (`--batch <rss_url> --episodes N`)
1. Fetch RSS feed
2. Extract the last N episodes
3. Process each through the full pipeline
4. Deduplicate across all episodes in the batch
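Options A and C both start by pulling audio enclosures from the feed. The sketch below uses only the standard library. The project's `requirements.txt` lists `feedparser`, which the real script may use instead. `latest_episodes` is a hypothetical helper. It assumes the feed lists the newest items first, which is conventional but not guaranteed by RSS 2.0.

```python
import xml.etree.ElementTree as ET


def latest_episodes(rss_xml: str, n: int = 1) -> list:
    """Return metadata and audio URL for the first n feed items that carry an audio enclosure."""
    channel = ET.fromstring(rss_xml).find("channel")
    episodes = []
    if channel is None:
        return episodes
    for item in channel.findall("item"):
        enclosure = item.find("enclosure")
        # Skip items without audio (e.g. text-only announcements).
        if enclosure is None or not enclosure.get("type", "").startswith("audio/"):
            continue
        episodes.append({
            "title": (item.findtext("title") or "").strip(),
            "published": item.findtext("pubDate") or "",
            "description": item.findtext("description") or "",
            "audio_url": enclosure.get("url"),
        })
        if len(episodes) == n:
            break
    return episodes
```

With `n=1` this covers `--rss`; with `n=N` it covers `--batch ... --episodes N`. Each `audio_url` would then be downloaded and sent to Whisper.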
### Transcript cleanup
- Remove filler words (um, uh, like, you know) for written content
- Preserve original with timestamps for video clip suggestions
- Split into logical segments by topic shift
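The sketch below is a rough version of the filler-word cleanup, applied to written formats only. The regexes are assumptions. "like" is stripped only when it is fenced by commas, because removing every "like" would damage sentences such as "I like this plan".

```python
import re

# Fillers from the cleanup list; an adjacent comma is consumed along with them.
FILLER_RE = re.compile(r",?\s*\b(?:um+|uh+|you know)\b,?", re.IGNORECASE)
# "like" is dropped only when set off by commas ("the, like, real").
COMMA_LIKE_RE = re.compile(r",\s*like,\s*", re.IGNORECASE)


def clean_for_written(text: str) -> str:
    """Strip filler words for written content; keep the raw transcript for clip timestamps."""
    text = COMMA_LIKE_RE.sub(" ", text)
    text = FILLER_RE.sub("", text)
    text = re.sub(r"\s{2,}", " ", text).strip()
    # Re-capitalize in case a leading filler ("Um, so...") was removed.
    return text[:1].upper() + text[1:]
```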
---
## Step 2: Editorial Brain — Deep Analysis
Feed the full transcript to the LLM with this extraction framework:
### Extract these content atoms:
1. **Narrative Arcs** — Complete story segments with setup → tension → resolution.
Tag with start/end timestamps.
2. **Quotable Moments** — Punchy, shareable statements. One-liners that stand alone.
Must pass the "would someone screenshot this?" test.
3. **Controversial Takes** — Opinions that go against conventional wisdom.
The stuff that makes people reply "hard disagree" or "finally someone said it."
4. **Data Points** — Specific numbers, percentages, dollar amounts, timeframes.
Concrete proof points that add credibility.
5. **Stories** — Personal anecdotes, case studies, client examples.
Must have a character, a problem, and an outcome.
6. **Frameworks** — Step-by-step processes, mental models, decision matrices.
Anything structured that people would save or bookmark.
7. **Predictions** — Forward-looking claims about trends, markets, technology.
Hot takes about where things are going.
### Output format per atom:
```
- Type: [narrative_arc | quote | controversial_take | data_point | story | framework | prediction]
- Content: [extracted text]
- Timestamp: [start - end, if available]
- Context: [what was being discussed]
- Viral Score: [0-100, see Step 4]
- Suggested platforms: [where this atom works best]
```
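One way to hold atoms in code is a small dataclass that mirrors the fields above. `ContentAtom` and its validation rules are hypothetical and are shown only to make the output format concrete. Building atoms with `ContentAtom(**item)` from the LLM's JSON would then fail fast on an unknown type or an out-of-range score.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# The seven atom types from the extraction framework above.
ATOM_TYPES = {
    "narrative_arc", "quote", "controversial_take", "data_point",
    "story", "framework", "prediction",
}


@dataclass
class ContentAtom:
    type: str
    content: str
    context: str
    timestamp: Optional[Tuple[float, float]] = None  # (start, end) in seconds, if available
    viral_score: int = 0                              # 0-100, filled in by Step 4
    suggested_platforms: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject atoms labeled with a type outside the framework.
        if self.type not in ATOM_TYPES:
            raise ValueError(f"unknown atom type: {self.type!r}")
        if not 0 <= self.viral_score <= 100:
            raise ValueError("viral_score must be between 0 and 100")
```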
---
## Step 3: Content Generation — One Episode, Many Pieces
For each episode, generate ALL of these from the extracted atoms:
### 3a. Short-Form Video Clips (3-5 per episode)
```
- Hook: [First 3 seconds — pattern interrupt or bold claim]
- Clip segment: [Timestamp range from transcript]
- Caption overlay: [Text for the screen]
- Platform: [YouTube Shorts / TikTok / Instagram Reels]
- Why it works: [What makes this clippable]
```
Prioritize: controversial takes > stories with payoffs > surprising data points
### 3b. Twitter/X Threads (2-3 per episode)
```
- Thread hook (tweet 1): [Curiosity gap or bold opener]
- Thread body (5-10 tweets): [Each tweet is one complete thought]
- Thread closer: [CTA — follow, reply, retweet trigger]
- Source atoms: [Which content atoms feed this thread]
```
Rules: No tweet over 280 chars. Each tweet must stand alone. Use data points as proof.
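The thread rules translate into a simple pre-publish check. This sketch uses `len()`, which only approximates X's counting, because X weights some characters and counts links at a fixed length. `check_thread` is a hypothetical helper. The expected 7-12 tweets are the hook, 5-10 body tweets, and the closer from the template above.

```python
def check_thread(tweets: list, limit: int = 280) -> list:
    """Return a list of problems in a thread draft; an empty list means it passes."""
    problems = []
    for i, tweet in enumerate(tweets, 1):
        if len(tweet) > limit:
            problems.append(f"tweet {i}: {len(tweet)} chars (limit {limit})")
        if not tweet.strip():
            problems.append(f"tweet {i}: empty")
    # Hook (1) + body (5-10) + closer (1)
    if not 7 <= len(tweets) <= 12:
        problems.append(f"thread has {len(tweets)} tweets; expected hook + 5-10 body + closer")
    return problems
```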
### 3c. LinkedIn Article Draft (1 per episode)
```
- Headline: [Specific, benefit-driven]
- Hook paragraph: [Before the "see more" fold — must earn the click]
- Body: [3-5 sections with headers, 800-1200 words]
- CTA: [Engagement driver — question, not link]
- Hashtags: [3-5 relevant, not spammy]
```
Voice: Professional but not corporate. First-person. Story-driven.
### 3d. Newsletter Section (1 per episode)
```
- Section headline: [Scannable, specific]
- TL;DR: [One sentence, the core insight]
- Body: [3-5 bullet points, each with a takeaway]
- Pull quote: [The most shareable line from the episode]
- Link: [Back to full episode]
```
### 3e. Quote Cards (3-5 per episode)
```
- Quote text: [Max 20 words — must work as text overlay]
- Attribution: [Speaker name]
- Background suggestion: [Color/mood that matches the tone]
- Platform sizing: [1080x1080 for IG, 1200x675 for Twitter, 1080x1920 for Stories]
```
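The sketch below expands one quote into per-platform card specs using the sizes listed above, and it enforces the 20-word limit. `quote_card_specs` is a hypothetical name.

```python
# Canvas sizes from the quote-card template above.
CARD_SIZES = {
    "instagram": (1080, 1080),
    "twitter": (1200, 675),
    "stories": (1080, 1920),
}


def quote_card_specs(quote: str, speaker: str, max_words: int = 20) -> list:
    """Expand one quote into per-platform card specs, enforcing the word limit."""
    word_count = len(quote.split())
    if word_count > max_words:
        raise ValueError(f"quote has {word_count} words; cards allow at most {max_words}")
    return [
        {"platform": platform, "width": w, "height": h, "text": quote, "attribution": speaker}
        for platform, (w, h) in CARD_SIZES.items()
    ]
```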
### 3f. Blog Post Outline (1 per episode)
```
- Title: [SEO-optimized, includes primary keyword]
- Primary keyword: [Search volume + difficulty estimate]
- Secondary keywords: [3-5 related terms]
- Meta description: [155 chars max]
- H2 sections: [5-7, each maps to a content atom]
- Internal linking opportunities: [Topics that connect to existing content]
- Estimated word count: [1500-2500]
```
### 3g. YouTube Shorts / TikTok Script (1 per episode)
```
- HOOK (0-3s): [Pattern interrupt — question, bold claim, or visual]
- SETUP (3-15s): [Context — why should they care]
- PAYOFF (15-45s): [The insight, data, or story resolution]
- CTA (45-60s): [Follow, comment prompt, or part 2 tease]
- On-screen text: [Key phrases to overlay]
- B-roll suggestions: [Visual ideas if not talking-head]
```
---
## Step 4: Content Scoring — Viral Potential
Score every generated piece on three dimensions (each 0-100):
| Dimension | What It Measures | Signals |
|-----------|-----------------|---------|
| **Novelty** | Is this new or surprising? | Contrarian takes, unexpected data, first-to-say |
| **Controversy** | Will people argue about this? | Strong opinions, challenges norms, picks a side |
| **Utility** | Can someone use this immediately? | Frameworks, how-tos, templates, specific numbers |
**Viral Score = (Novelty × 0.4) + (Controversy × 0.3) + (Utility × 0.3)**
### Score thresholds:
- **80+** → Priority publish. Schedule for peak engagement windows.
- **60-79** → Solid content. Fill the calendar.
- **40-59** → Filler. Use only if calendar has gaps.
- **Below 40** → Cut it. Not worth the publish slot.
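Here are the formula and thresholds written out as a minimal sketch. The function names are hypothetical.

```python
def viral_score(novelty: float, controversy: float, utility: float) -> float:
    """Weighted sum of three 0-100 sub-scores, using the weights above."""
    for name, value in (("novelty", novelty), ("controversy", controversy), ("utility", utility)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return round(novelty * 0.4 + controversy * 0.3 + utility * 0.3, 1)


def publish_tier(score: float) -> str:
    """Map a viral score onto the thresholds above."""
    if score >= 80:
        return "priority"
    if score >= 60:
        return "solid"
    if score >= 40:
        return "filler"
    return "cut"
```

The score is a weighted sum rather than a product. As a result, a piece with zero controversy can still reach 70 on novelty and utility alone.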
---
## Step 5: Dedup Engine
Before finalizing, check all generated content against:
1. **This batch** — No two pieces should cover the same angle
2. **Recent history** — Compare against last N days of output (default: 30)
3. **Similarity threshold** — Flag any pair with >70% semantic overlap
### Dedup rules:
- If two pieces overlap >70%: keep the higher-scored one, cut the other
- If a piece overlaps with recently published content: flag with ⚠️ and suggest a differentiation angle
- Track all published content hashes in `output/content_history.json`
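The sketch below implements the dedup rules. It substitutes `difflib.SequenceMatcher`, a lexical ratio from the standard library, for the semantic similarity described above. A production version would compare embeddings instead. `dedup` and the `flag` field are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Lexical similarity in [0, 1]; a stand-in for embedding-based semantic similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dedup(pieces: list, history: list, threshold: float = 0.70) -> list:
    """Keep the higher-scored piece of any overlapping pair; flag overlaps with recent history."""
    kept = []
    for piece in sorted(pieces, key=lambda p: p["viral_score"], reverse=True):
        # A higher-scored piece already covers this angle: cut this one.
        if any(similarity(piece["content"], k["content"]) > threshold for k in kept):
            continue
        overlaps_history = any(similarity(piece["content"], h) > threshold for h in history)
        kept.append({**piece, "flag": "⚠️ overlaps recent content" if overlaps_history else None})
    return kept
```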
---
## Step 6: Calendar Generation (`--calendar`)
Assemble scored, deduplicated content into a weekly publish calendar.
### Scheduling rules:
- **Twitter/X:** 1-2 per day, peak hours (8-10am, 12-1pm, 5-7pm ET)
- **LinkedIn:** 1 per day max, Tuesday-Thursday mornings
- **YouTube Shorts/TikTok:** 1 per day, evenings
- **Newsletter:** Weekly, same day each week
- **Blog:** 1-2 per week
- **Quote cards:** Intersperse on low-content days
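A greedy scheduler that applies these rules might look like the sketch below. Several details are assumptions:

- the exact times chosen inside each window;
- a Tuesday newsletter;
- Monday/Thursday blog days;
- the 15:00 quote-card slot.

`schedule_week` is a hypothetical helper and expects `week_start` to be a Monday.

```python
from datetime import date, timedelta

# Hypothetical slot table derived from the rules above: (weekday offset from Monday, time).
SLOTS = {
    "twitter": [(d, t) for d in range(7) for t in ("09:00 ET", "12:30 ET")],  # 2/day, peak windows
    "linkedin": [(d, "08:30 ET") for d in (1, 2, 3)],                          # Tue-Thu mornings
    "youtube_shorts": [(d, "19:00 ET") for d in range(7)],                     # 1/day, evenings
    "newsletter": [(1, "07:00 ET")],                                           # weekly, assumed Tuesday
    "blog": [(0, "10:00 ET"), (3, "10:00 ET")],                                # 2/week, assumed Mon/Thu
}


def schedule_week(pieces: list, week_start: date) -> list:
    """Best pieces take the earliest open slots; quote cards fill the quietest days.

    Pieces that find no slot are skipped here; a real pipeline would carry them over.
    """
    open_slots = {platform: list(slots) for platform, slots in SLOTS.items()}
    per_day = [0] * 7
    calendar = []
    # Regular content first (highest score first), quote cards last.
    ordered = sorted(pieces, key=lambda p: (p["platform"] == "quote_cards", -p["viral_score"]))
    for piece in ordered:
        platform = piece["platform"]
        if platform == "quote_cards":
            day, time = min(range(7), key=lambda d: per_day[d]), "15:00 ET"
        elif open_slots.get(platform):
            day, time = open_slots[platform].pop(0)
        else:
            continue  # no slot left this week
        per_day[day] += 1
        calendar.append({
            **piece,
            "date": (week_start + timedelta(days=day)).isoformat(),
            "time": time,
            "status": "draft",
        })
    return sorted(calendar, key=lambda c: (c["date"], c["time"]))
```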
### Calendar output format:
```json
{
  "week_of": "2024-01-15",
  "episode_source": "Episode Title - Guest Name",
  "content_pieces": [
    {
      "date": "2024-01-15",
      "time": "09:00 ET",
      "platform": "twitter",
      "type": "thread",
      "content": "...",
      "viral_score": 85,
      "status": "draft"
    }
  ],
  "total_pieces": 18,
  "avg_viral_score": 72,
  "coverage": {
    "twitter": 6,
    "linkedin": 3,
    "youtube_shorts": 3,
    "newsletter": 1,
    "blog": 1,
    "quote_cards": 4
  }
}
```
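The summary fields in that JSON can be derived from the scheduled pieces. The sketch below uses a hypothetical `calendar_summary` helper.

```python
from collections import Counter


def calendar_summary(week_of: str, episode_source: str, pieces: list) -> dict:
    """Wrap scheduled pieces in the calendar JSON shape, deriving totals and coverage."""
    scores = [p["viral_score"] for p in pieces]
    return {
        "week_of": week_of,
        "episode_source": episode_source,
        "content_pieces": pieces,
        "total_pieces": len(pieces),
        "avg_viral_score": round(sum(scores) / len(scores)) if scores else 0,
        "coverage": dict(Counter(p["platform"] for p in pieces)),
    }
```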
---
## Step 7: Output
All output goes to `output/` directory:
```
output/
├── episodes/
│   ├── YYYY-MM-DD-episode-slug/
│   │   ├── transcript.txt
│   │   ├── atoms.json            # Extracted content atoms
│   │   ├── content_pieces.json   # All generated content
│   │   └── calendar.json         # Scheduled calendar
│   └── ...
├── calendar/
│   └── week-YYYY-WNN.json        # Aggregated weekly calendar
├── content_history.json          # Dedup tracking
└── pipeline_log.json             # Run history and stats
```
---
## CLI Reference
```bash
# Process latest episode from RSS feed
python podcast_pipeline.py --rss "https://feeds.example.com/podcast.xml"
# Process a local transcript
python podcast_pipeline.py --transcript episode-42.txt
# Batch process last 5 episodes
python podcast_pipeline.py --batch "https://feeds.example.com/podcast.xml" --episodes 5
# Generate weekly calendar from existing outputs
python podcast_pipeline.py --calendar
# Process with custom dedup window
python podcast_pipeline.py --rss "https://feeds.example.com/podcast.xml" --dedup-days 60
# Process and only keep 80+ viral score content
python podcast_pipeline.py --rss "https://feeds.example.com/podcast.xml" --min-score 80
```
---
## Environment Variables
| Variable | Required | Description |
|----------|----------|-------------|
| `OPENAI_API_KEY` | Yes (for Whisper) | OpenAI API key for audio transcription |
| `ANTHROPIC_API_KEY` | Yes (for generation) | Anthropic API key for content generation |
| `OPENAI_LLM_KEY` | Optional | Separate OpenAI key if using GPT for generation instead |
---
## Reference Files
| File | Purpose |
|------|---------|
| `podcast_pipeline.py` | Main pipeline script |
| `requirements.txt` | Python dependencies |
| `README.md` | Setup and usage guide |

File diff suppressed because it is too large


@ -0,0 +1,7 @@
anthropic>=0.40.0
openai>=1.50.0
feedparser>=6.0.0
requests>=2.31.0
python-dateutil>=2.8.0
python-slugify>=8.0.0
tqdm>=4.66.0


@ -0,0 +1,329 @@
# 📊 AI Revenue Intelligence
> **Prove content ROI, extract call intelligence, and generate client reports — automatically.**
An AI-powered revenue intelligence suite that connects the dots between sales calls, content performance, and closed deals. These tools pull from Gong, GA4, HubSpot, and Ahrefs to answer the questions every marketing team hates: "What content actually drove revenue?" and "What are prospects really saying on calls?"
Built in production at [Single Grain](https://www.singlegrain.com/?utm_source=github&utm_medium=skill_repo&utm_campaign=ai_marketing_skills). Now open-sourced for any revenue-focused marketing team.
---
## Architecture
```
┌────────────────────────────────────────────────────────────────────┐
│                            DATA SOURCES                            │
│                                                                    │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐    │
│  │   Gong   │  │   GA4    │  │ HubSpot  │  │      Ahrefs      │    │
│  │ (calls)  │  │(traffic) │  │ (deals)  │  │      (SEO)       │    │
│  └─────┬────┘  └─────┬────┘  └─────┬────┘  └────────┬─────────┘    │
└────────┼─────────────┼─────────────┼────────────────┼──────────────┘
         │             │             │                │
         ▼             │             │                │
┌──────────────────┐   │             │                │
│ Gong-to-Insight  │   │             │                │
│ Pipeline         │   │             │                │
│                  │   │             │                │
│ • Objections     │   │             │                │
│ • Buying signals │   │             │                │
│ • Competitors    │   │             │                │
│ • Content topics │   │             │                │
│ • Follow-ups     │   │             │                │
└────────┬─────────┘   │             │                │
         │             ▼             ▼                │
         │      ┌───────────────────────────┐         │
         │      │ Revenue Attribution       │         │
         │      │ Mapper                    │         │
         │      │                           │         │
         │      │ • First-touch / linear /  │         │
         │      │   time-decay attribution  │         │
         │      │ • Content ROI by type     │         │
         │      │ • CPA calculations        │         │
         │      │ • Content gap analysis    │         │
         │      └───────────┬───────────────┘         │
         │                  │                         │
         ▼                  ▼                         ▼
  ┌──────────────────────────────────────────────────────────────┐
  │  Client Report Generator                                     │
  │                                                              │
  │  Executive Summary + Traffic + Pipeline + SEO + Call Quality │
  │  Anomaly Detection + Period-over-Period Comparison           │
  │  → Markdown or JSON output                                   │
  └──────────────────────────────────────────────────────────────┘
```
---
## Tools
### 1. 🎙️ Gong-to-Insight Pipeline (`gong_insight_pipeline.py`)
Turns sales call transcripts into structured intelligence. Works with the Gong API or plain `.txt` transcript files.
**What it extracts:**
- **Objections** — categorized as pricing, timing, competition, authority, or need
- **Buying signals** — budget confirmed, timeline mentioned, decision maker engaged, champion identified
- **Competitive mentions** — which competitors were named and in what sentiment (positive/negative/neutral)
- **Pricing discussions** — dollar amounts, pricing model questions, ROI concerns
- **Content topics** — recurring objection patterns that should become blog posts, case studies, or battle cards
- **Follow-up drafts** — personalized outbound suggestions based on what happened on the call
```bash
# Analyze a transcript file
python gong_insight_pipeline.py --file transcript.txt
# Analyze a directory of transcripts
python gong_insight_pipeline.py --dir ./transcripts/ --content-topics
# Pull from Gong API (last 7 days)
python gong_insight_pipeline.py --gong --days 7
# Full output with follow-ups
python gong_insight_pipeline.py --file call.txt --follow-ups --output insights.json
# Example output:
# ============================================================
# Call: discovery-call-acme
# Temperature: WARM
# ============================================================
#
# 🚫 Objections (3):
# pricing: 2
# timing: 1
# → [pricing] "That's a bit more than we budgeted for this quarter"
# → [pricing] "Can you do a smaller pilot first?"
# → [timing] "We're in the middle of a platform migration"
#
# ✅ Buying Signals (2):
# budget_confirmed: 1
# champion_identified: 1
#
# ⚔️ Competitors: HubSpot, Drift
#
# 💰 Pricing discussed: Yes (3 mentions)
```
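Objection categorization can be approximated with keyword patterns, in the spirit of `OBJECTION_PATTERNS`. The patterns below are illustrative assumptions, not the script's real list, and `classify_objections` is hypothetical. It assumes speaker-labeled lines like those in the sample transcript under Quick Start.

```python
import re

# Illustrative patterns only; the real OBJECTION_PATTERNS may differ.
OBJECTION_PATTERNS = {
    "pricing": [r"\bbudget(ed)?\b", r"\bexpensive\b", r"\bmore than we\b", r"\bsmaller pilot\b"],
    "timing": [r"\bnot a priority\b", r"\bnext quarter\b", r"\bin the middle of\b"],
    "competition": [r"\balready (use|using|work with)\b", r"\blooking at\b"],
    "authority": [r"\bcheck with (my|our)\b", r"\bsign[- ]off\b"],
    "need": [r"\bdon'?t (really )?need\b", r"\bworks fine\b"],
}
COMPILED = {
    category: [re.compile(p, re.IGNORECASE) for p in patterns]
    for category, patterns in OBJECTION_PATTERNS.items()
}


def classify_objections(transcript: str, speaker: str = "Prospect:") -> list:
    """Return (category, utterance) for each prospect line that matches an objection pattern."""
    hits = []
    for line in transcript.splitlines():
        if not line.startswith(speaker):
            continue  # only the prospect's lines are checked
        utterance = line[len(speaker):].strip()
        for category, patterns in COMPILED.items():
            if any(p.search(utterance) for p in patterns):
                hits.append((category, utterance))
                break  # first matching category wins; dict order sets priority
    return hits
```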
### 2. 💰 Revenue Attribution Mapper (`revenue_attribution.py`)
The "prove content ROI" tool. Maps blog posts, videos, podcasts, and webinars to actual closed deals using first-touch, linear, or time-decay attribution models.
**What it produces:**
- Content-to-revenue mapping showing exactly which pieces drove pipeline
- Attribution across three models (pick the one that fits your sales motion)
- Cost-per-acquisition by content type (blog vs. video vs. webinar vs. podcast)
- Content gap analysis (which funnel stages have no content working?)
- Top performers ranked by attributed revenue
```bash
# Full attribution report (linear model)
python revenue_attribution.py --report
# Time-decay model (more credit to recent touchpoints)
python revenue_attribution.py --report --model time-decay
# Content gaps (which funnel stages are uncovered?)
python revenue_attribution.py --gaps
# CPA by content type
python revenue_attribution.py --cpa --costs content_costs.json
# Example output:
# ======================================================================
# CONTENT REVENUE ATTRIBUTION REPORT
# Model: linear
# ======================================================================
#
# 📊 Summary
# Total Revenue: $984,000
# Total Deals: 5
# Avg Deal Size: $196,800
# Content w/ Attribution: 13
# Avg Touchpoints/Deal: 4.4
#
# 📈 Revenue by Content Type
# Type Revenue Sessions Pieces Avg/Piece
# --------------------------------------------------------
# landing_page $211,200 1,800 1 $211,200
# blog $298,560 17,000 6 $49,760
# case_study $156,000 2,090 2 $78,000
# ...
```
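The three attribution models differ only in how they weight a deal's touchpoints. The sketch below handles a single deal. The function name is an assumption, and so is the 7-day half-life for time-decay, since the script's actual decay parameters aren't shown here.

```python
from datetime import date


def attribute(deal_value: float, touches: list, close: date,
              model: str = "linear", half_life_days: float = 7.0) -> dict:
    """Split one deal's revenue across its content touchpoints.

    touches: (content_id, touch_date) pairs, in any order.
    """
    if not touches:
        return {}
    touches = sorted(touches, key=lambda t: t[1])  # oldest first
    if model == "first-touch":
        weights = [1.0] + [0.0] * (len(touches) - 1)
    elif model == "linear":
        weights = [1.0] * len(touches)
    elif model == "time-decay":
        # Assumed exponential decay: a touch half_life_days before close counts half as much.
        weights = [0.5 ** ((close - d).days / half_life_days) for _, d in touches]
    else:
        raise ValueError(f"unknown model: {model}")
    total = sum(weights)
    credit = {}
    for (content_id, _), weight in zip(touches, weights):
        credit[content_id] = credit.get(content_id, 0.0) + deal_value * weight / total
    return credit
```

Summing these per-deal credits across all deals, then grouping content IDs by type, yields a revenue-by-type table like the one above.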
### 3. 📋 Multi-Source Client Report Generator (`client_report_generator.py`)
Pulls from all four data sources (GA4, HubSpot, Ahrefs, Gong) and generates a unified, client-ready BI report with an auto-generated executive summary and optional anomaly detection.
**What it includes:**
- **Executive summary** — auto-generated highlights, concerns, and recommendations
- **Traffic** — sessions, users, conversions, channel breakdown, top pages (GA4)
- **Pipeline** — deals created/won/lost, revenue, win rate, avg cycle (HubSpot)
- **SEO** — domain rating, rankings, backlinks, organic traffic (Ahrefs)
- **Call quality** — talk ratio, call duration, next-steps rate, top topics (Gong)
- **Anomaly detection** — flags unusual changes with severity levels
- **Period comparison** — month-over-month, quarter-over-quarter, or year-over-year
```bash
# Console summary
python client_report_generator.py --client "Acme Corp"
# Full markdown report
python client_report_generator.py --client "Acme Corp" --format markdown --output report.md
# JSON for dashboards/slides
python client_report_generator.py --client "Acme Corp" --format json --anomalies
# Skip sources you don't use
python client_report_generator.py --client "Acme Corp" --skip gong,ahrefs
# Example output:
# ======================================================================
# Acme Corp - Performance Report
# 2025-03-01 to 2025-03-31
# ======================================================================
#
# 🟢 Overall: Strong
#
# ✅ Highlights:
# • Traffic up 8.1% (45,200 sessions)
# • Conversions up 14.8% (342 total)
# • Win rate at 60.0% (12 won)
# • $1,440,000 revenue closed
#
# ⚠️ Concerns:
# • Reps talking too much (54.2% talk ratio)
```
---
## Quick Start
### 1. Install dependencies
```bash
pip install -r requirements.txt
```
### 2. Configure environment
```bash
cp .env.example .env
# Edit .env with your API keys
```
### 3. Test with sample data
All tools ship with built-in sample data and fall back gracefully when API keys aren't configured. Try them out of the box:
```bash
# Analyze a transcript
echo "Prospect: That's more than we budgeted for this quarter.
Rep: I understand. What range were you expecting?
Prospect: We were looking at HubSpot too, they quoted us around 50k.
Rep: Makes sense. Our ROI calculator shows 3x return in year one." > sample.txt
python gong_insight_pipeline.py --file sample.txt --follow-ups
# Run attribution report (uses sample data without API keys)
python revenue_attribution.py --report --gaps
# Generate client report (uses sample data without API keys)
python client_report_generator.py --client "Demo Corp" --anomalies
```
### 4. Connect real APIs
Set these environment variables to connect live data:
```bash
# Gong
export GONG_API_KEY="your-gong-api-key"
# GA4
export GA4_PROPERTY_ID="123456789"
export GA4_CREDENTIALS_JSON="/path/to/service-account.json"
# HubSpot
export HUBSPOT_API_KEY="your-hubspot-private-app-token"
# Ahrefs
export AHREFS_TOKEN="your-ahrefs-api-token"
```
---
## Configuration
| Variable | Required By | Description |
|----------|-------------|-------------|
| `GONG_API_KEY` | Gong Pipeline, Client Report | Gong API access key |
| `GONG_API_BASE_URL` | Gong Pipeline, Client Report | Gong API URL (default: `https://api.gong.io/v2`) |
| `GA4_PROPERTY_ID` | Attribution, Client Report | GA4 property ID |
| `GA4_CREDENTIALS_JSON` | Attribution, Client Report | Path to GA4 service account JSON |
| `HUBSPOT_API_KEY` | Attribution, Client Report | HubSpot private app token |
| `AHREFS_TOKEN` | Client Report | Ahrefs API token |
| `YOUR_DOMAIN` | Client Report | Your root domain for Ahrefs data |
| `OUTPUT_DIR` | All | Output directory (default: `./output`) |
---
## Customization
### Objection Patterns
Edit `OBJECTION_PATTERNS` in `gong_insight_pipeline.py` to match your industry's objection language.
### Competitor List
Edit `KNOWN_COMPETITORS` in `gong_insight_pipeline.py` with your actual competitive landscape.
### Content Type Classification
Edit `CONTENT_TYPE_PATTERNS` in `revenue_attribution.py` to match your site's URL structure.
### Anomaly Thresholds
Pass custom thresholds to `detect_anomalies()` in `client_report_generator.py`:
```python
thresholds = {"warning": 0.15, "critical": 0.30} # 15% = warning, 30% = critical
```
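The actual signature of `detect_anomalies()` isn't shown here. The following hypothetical stand-in shows how warning/critical thresholds like these could be applied to period-over-period changes. The metric names and values in any call are illustrative.

```python
from typing import Optional


def pct_change(current: float, previous: float) -> Optional[float]:
    """Period-over-period change as a fraction (0.2 = +20%); None when there is no baseline."""
    if previous == 0:
        return None
    return (current - previous) / abs(previous)


def flag_anomalies(current: dict, previous: dict, thresholds: dict) -> list:
    """Flag metrics whose absolute change meets the warning or critical threshold."""
    flags = []
    for metric, value in current.items():
        change = pct_change(value, previous.get(metric, 0))
        if change is None:
            continue  # new metric or zero baseline: nothing to compare against
        if abs(change) >= thresholds["critical"]:
            severity = "critical"
        elif abs(change) >= thresholds["warning"]:
            severity = "warning"
        else:
            continue
        flags.append({"metric": metric, "change": round(change, 3), "severity": severity})
    return flags
```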
---
## How They Work Together
1. **Weekly**: Run `gong_insight_pipeline.py` on recent calls → extract objections and buying signals
2. **Monthly**: Run `revenue_attribution.py` → see which content drove deals
3. **Monthly**: Run `client_report_generator.py` → deliver unified report to clients or leadership
4. **Quarterly**: Use Gong content topics + attribution gaps to plan next quarter's content
The insight loop:
- Gong reveals what prospects ask about → creates content topics
- Content gets published → drives traffic (GA4)
- Traffic converts to pipeline → deals close (HubSpot)
- Attribution mapper proves which content worked → invest more in winners
- Repeat
---
## File Structure
```
revenue-intelligence/
├── README.md                     # This file
├── SKILL.md                      # Claude Code agent skill definition
├── requirements.txt              # Python dependencies
├── gong_insight_pipeline.py      # Call transcript → structured insights
├── revenue_attribution.py        # Content → revenue mapping
└── client_report_generator.py    # Multi-source client BI reports
```
---
<div align="center">

**🧠 [Want these built and managed for you? →](https://singlebrain.com/?utm_source=github&utm_medium=skill_repo&utm_campaign=ai_marketing_skills)**

*This is how we build agents at [Single Brain](https://singlebrain.com/?utm_source=github&utm_medium=skill_repo&utm_campaign=ai_marketing_skills) for our clients.*

[Single Grain](https://www.singlegrain.com/?utm_source=github&utm_medium=skill_repo&utm_campaign=ai_marketing_skills) · our marketing agency

📬 **[Level up your marketing with 14,000+ marketers and founders →](https://levelingup.beehiiv.com/subscribe)** *(free)*

</div>


@ -0,0 +1,172 @@
# AI Revenue Intelligence
AI-powered revenue intelligence: sales call insight extraction, content-to-revenue attribution, and multi-source client reporting.
## When to Use
- User wants to extract insights from Gong sales call transcripts
- User needs to identify objections, buying signals, or competitive mentions in calls
- User wants to prove content ROI by mapping content to closed deals
- User needs revenue attribution across first-touch and multi-touch models
- User wants to generate a unified client report from GA4 + HubSpot + Ahrefs + Gong
- User asks about content gaps in the buyer journey
- User needs anomaly detection across marketing metrics
## Tools
### Gong-to-Insight Pipeline (`gong_insight_pipeline.py`)
Extracts structured intelligence from sales call transcripts. Works with Gong API or plain transcript files.
```bash
# Analyze a single transcript file
python gong_insight_pipeline.py --file transcript.txt
# Analyze multiple transcript files
python gong_insight_pipeline.py --dir ./transcripts/
# Pull recent calls from Gong API (last 7 days)
python gong_insight_pipeline.py --gong --days 7
# Pull specific call by ID
python gong_insight_pipeline.py --gong --call-id abc123
# Output as JSON file
python gong_insight_pipeline.py --file transcript.txt --output insights.json
# Generate content topics from recurring objections
python gong_insight_pipeline.py --dir ./transcripts/ --content-topics
# Generate follow-up suggestions for outbound sequences
python gong_insight_pipeline.py --file transcript.txt --follow-ups
```
**What it extracts:**
- Objections (categorized: pricing, timing, competition, authority, need)
- Buying signals (budget confirmed, timeline mentioned, decision maker engaged, champion identified, next steps agreed)
- Competitive mentions (who was mentioned, context: positive/negative/neutral)
- Pricing discussions (anchors, pushback, willingness indicators)
- Content topic suggestions from recurring objection patterns
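- Follow-up email drafts keyed to the call's objections and buying signals (templates with placeholders to fill in)

**Output:** A human-readable summary by default; structured JSON with `--json` (stdout) or `--output` (file). The JSON holds a `calls` array in which each entry has `objections`, `buying_signals`, `competitive_mentions`, `pricing_discussions`, and a `summary` (including a `deal_temperature` of hot, warm, cool, or cold). Top-level `content_topics` and `follow_ups` arrays appear when their flags are set, alongside `aggregate` stats across all calls.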
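
The JSON written by `--json` or `--output` nests per-call results under `calls`, each with a `source_id` and a `summary` whose `deal_temperature` is `hot`, `warm`, `cool`, or `cold`. As an illustration, a hypothetical `calls_by_temperature` helper could pull the call IDs for one temperature; the inline `output` document stands in for a loaded file, and its call IDs are made up.
```python
import json


def calls_by_temperature(output: dict, temperature: str) -> list[str]:
    """Return source IDs of calls whose summary matches the given deal temperature."""
    return [
        call["source_id"]
        for call in output.get("calls", [])
        if call.get("summary", {}).get("deal_temperature") == temperature
    ]


# Stand-in for json.loads(Path("insights.json").read_text()); fields mirror the script's output.
output = json.loads("""
{
  "total_calls": 2,
  "calls": [
    {"source_id": "acme-discovery", "summary": {"deal_temperature": "hot"}},
    {"source_id": "globex-intro", "summary": {"deal_temperature": "cold"}}
  ]
}
""")
print(calls_by_temperature(output, "hot"))  # ['acme-discovery']
```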
### Revenue Attribution Mapper (`revenue_attribution.py`)
Maps content pieces to pipeline and closed revenue. Proves content ROI with first-touch and multi-touch attribution.
```bash
# Run full attribution report (GA4 + HubSpot)
python revenue_attribution.py --report
# First-touch attribution only
python revenue_attribution.py --report --model first-touch
# Multi-touch (linear) attribution
python revenue_attribution.py --report --model linear
# Time-decay attribution
python revenue_attribution.py --report --model time-decay
# Filter by date range
python revenue_attribution.py --report --start 2025-01-01 --end 2025-03-31
# Calculate cost-per-acquisition by content type
python revenue_attribution.py --cpa --costs content_costs.json
# Identify content gaps in the buyer journey
python revenue_attribution.py --gaps
# Output as JSON
python revenue_attribution.py --report --json --output attribution.json
```
**What it produces:**
- Content-to-revenue mapping (which blog posts, videos, podcasts drove deals)
- First-touch, linear, and time-decay attribution models
- Cost-per-acquisition by content type (blog, video, podcast, webinar)
- Content ROI report with revenue per piece
- Content gap analysis (funnel stages with no attribution)
- Top-performing content ranked by attributed revenue

**Data sources:** GA4 (page paths, sessions, conversions) + HubSpot (deals, touchpoints, close dates)
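
The three models differ only in how one deal's revenue is split across the content touchpoints that preceded it. The script's exact weighting isn't documented here, so this is a generic sketch: first-touch gives everything to the earliest touch, linear splits evenly, and time-decay weights each touch by `2 ** (-days_before_close / half_life)`, where the 7-day half-life is an assumed default. `Touch` and `attribute` are hypothetical names.
```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Touch:
    page: str  # content URL path the contact viewed
    day: date  # when the touch happened


def attribute(revenue: float, touches: list[Touch], close: date,
              model: str = "linear", half_life_days: float = 7.0) -> dict[str, float]:
    """Split one deal's revenue across its touchpoints under the chosen model."""
    ordered = sorted(touches, key=lambda t: t.day)
    if model == "first-touch":
        weights = [1.0] + [0.0] * (len(ordered) - 1)
    elif model == "linear":
        weights = [1.0] * len(ordered)
    elif model == "time-decay":
        # Touches closer to the close date get exponentially more credit.
        weights = [2 ** (-(close - t.day).days / half_life_days) for t in ordered]
    else:
        raise ValueError(f"unknown model: {model}")
    total = sum(weights)
    credit: dict[str, float] = {}
    for touch, weight in zip(ordered, weights):
        credit[touch.page] = credit.get(touch.page, 0.0) + revenue * weight / total
    return credit


touches = [Touch("/blog/guide/", date(2025, 1, 1)), Touch("/case-study/acme/", date(2025, 1, 15))]
for m in ("first-touch", "linear", "time-decay"):
    print(m, {k: round(v) for k, v in attribute(10_000, touches, date(2025, 1, 15), m).items()})
```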
### Multi-Source Client Report Generator (`client_report_generator.py`)
Generates unified client-ready BI reports from GA4, HubSpot, Ahrefs, and Gong.
```bash
# Generate full client report
python client_report_generator.py --client "Acme Corp"
# Specify date range
python client_report_generator.py --client "Acme Corp" --start 2025-03-01 --end 2025-03-31
# Output as markdown
python client_report_generator.py --client "Acme Corp" --format markdown --output report.md
# Output as JSON (for rendering in slides/dashboards)
python client_report_generator.py --client "Acme Corp" --format json --output report.json
# Skip specific data sources
python client_report_generator.py --client "Acme Corp" --skip gong
python client_report_generator.py --client "Acme Corp" --skip ahrefs,gong
# Enable anomaly detection
python client_report_generator.py --client "Acme Corp" --anomalies
# Compare to previous period
python client_report_generator.py --client "Acme Corp" --compare previous-month
```
**What it produces:**
- Executive summary with key metrics and period-over-period changes
- Traffic section: sessions, users, top pages, channel breakdown (GA4)
- Pipeline section: deals created, moved, closed, revenue (HubSpot)
- SEO section: keyword rankings, backlinks, domain rating changes (Ahrefs)
- Call quality section: talk ratios, objection frequency, win rates (Gong)
- Anomaly flags: unusual spikes/drops with severity and context
- Output as structured markdown or JSON
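
`--compare previous-month` compares the report period with an earlier one, and the executive summary reports period-over-period changes. The script's own date logic isn't documented here; assuming `previous-month` means the full calendar month before the report's start date, the comparison window and the percentage change could be computed like this. `previous_month` and `pct_change` are hypothetical helpers.
```python
from datetime import date, timedelta
from typing import Optional


def previous_month(start: date) -> tuple[date, date]:
    """Return (first_day, last_day) of the calendar month before the month containing `start`."""
    last_day = start.replace(day=1) - timedelta(days=1)  # step back one day from the 1st
    return last_day.replace(day=1), last_day


def pct_change(current: float, previous: float) -> Optional[float]:
    """Percentage change from `previous` to `current`; None when there is no baseline."""
    if previous == 0:
        return None
    return (current - previous) * 100 / previous


print(previous_month(date(2025, 3, 1)))  # (datetime.date(2025, 2, 1), datetime.date(2025, 2, 28))
print(pct_change(1150, 1000))            # 15.0
```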
## Configuration
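All scripts read configuration from environment variables. Copy `.env.example` to `.env` and fill in your values, then export them into your shell before running: `requirements.txt` does not include a `.env` loader.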
### Required Environment Variables
| Variable | Used By | Description |
|----------|---------|-------------|
| `GONG_API_KEY` | Gong Pipeline, Client Report | Gong API access key |
| `GONG_API_BASE_URL` | Gong Pipeline, Client Report | Gong API base URL (the Gong pipeline defaults to `https://api.gong.io/v2`) |
| `HUBSPOT_API_KEY` | Attribution, Client Report | HubSpot private app token |
| `GA4_PROPERTY_ID` | Attribution, Client Report | GA4 property ID |
| `GA4_CREDENTIALS_JSON` | Attribution, Client Report | Path to GA4 service account JSON |
### Optional Environment Variables
| Variable | Used By | Description |
|----------|---------|-------------|
| `AHREFS_TOKEN` | Client Report | Ahrefs API token |
| `OUTPUT_DIR` | All | Directory for output files (default: `./output`) |
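
Because each script reads these values from the environment, a missing variable only surfaces when a script reaches the code that needs it; the Gong pipeline, for example, exits with an error when `--gong` is used and `GONG_API_KEY` is empty. A small pre-flight check like the hypothetical `missing_vars` helper below can catch gaps up front. The per-tool grouping follows the required-variables table; whether `--skip gong` removes the Gong requirement for the client report is an assumption.
```python
import os
from collections.abc import Mapping

# Required variables per tool, taken from the "Required Environment Variables" table.
# GONG_API_BASE_URL is left out of "gong" because gong_insight_pipeline.py falls back
# to https://api.gong.io/v2. Presumably the Gong entries for "client_report" can be
# dropped when you pass --skip gong.
REQUIRED = {
    "gong": ["GONG_API_KEY"],  # only needed with --gong
    "attribution": ["HUBSPOT_API_KEY", "GA4_PROPERTY_ID", "GA4_CREDENTIALS_JSON"],
    "client_report": [
        "GONG_API_KEY", "GONG_API_BASE_URL", "HUBSPOT_API_KEY",
        "GA4_PROPERTY_ID", "GA4_CREDENTIALS_JSON",
    ],
}


def missing_vars(tool: str, environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables for `tool` that are unset or empty."""
    return [name for name in REQUIRED[tool] if not environ.get(name)]


if __name__ == "__main__":
    for tool in REQUIRED:
        gaps = missing_vars(tool)
        print(f"{tool}: {'ok' if not gaps else 'missing ' + ', '.join(gaps)}")
```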
## Data Flow
```
Gong Transcripts → Insight Pipeline → Objections, Signals, Competitors → Content Topics + Follow-ups
GA4 + HubSpot → Attribution Mapper → Content ROI, CPA, Gap Analysis → Revenue Proof
GA4 + HubSpot + Ahrefs + Gong → Client Report → Executive Summary + Anomalies → Client Deliverable
```
## Recommended Workflow
1. **Weekly:** Run `gong_insight_pipeline.py --gong --days 7` to extract call intelligence
2. **Monthly:** Run `revenue_attribution.py --report` to prove content ROI
3. **Monthly:** Run `client_report_generator.py` for each client deliverable
4. **Quarterly:** Run `revenue_attribution.py --gaps` to find content gaps
5. **Ongoing:** Feed Gong insight follow-ups into outbound sequences
## Dependencies
```bash
pip install -r requirements.txt
```

File diff suppressed because it is too large


@ -0,0 +1,705 @@
#!/usr/bin/env python3
"""
Gong-to-Insight Pipeline
Extracts structured intelligence from sales call transcripts:
- Objections (pricing, timing, competition, authority, need)
- Buying signals (budget, timeline, decision maker, champion)
- Competitive mentions (who, context)
- Pricing discussions
- Content topic suggestions from recurring patterns
- Personalized follow-up drafts
Works with Gong API or plain transcript files.
Usage:
    python gong_insight_pipeline.py --file transcript.txt
    python gong_insight_pipeline.py --dir ./transcripts/
    python gong_insight_pipeline.py --gong --days 7
    python gong_insight_pipeline.py --file transcript.txt --content-topics --follow-ups
"""
import argparse
import json
import os
import re
import sys
from collections import Counter, defaultdict
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional
# ---------------------------------------------------------------------------
# Gong API client
# ---------------------------------------------------------------------------
# To use the Gong API:
# 1. Set GONG_API_KEY (your Gong access key)
# 2. Set GONG_API_BASE_URL (default: https://api.gong.io/v2)
# 3. Generate API credentials in Gong > Settings > API
GONG_API_KEY = os.environ.get("GONG_API_KEY", "")
GONG_API_BASE_URL = os.environ.get("GONG_API_BASE_URL", "https://api.gong.io/v2")
def _gong_headers() -> dict:
    """Build authorization headers for Gong API."""
    if not GONG_API_KEY:
        print("ERROR: GONG_API_KEY not set. Export it or pass --file/--dir instead.", file=sys.stderr)
        sys.exit(1)
    return {
        "Authorization": f"Bearer {GONG_API_KEY}",
        "Content-Type": "application/json",
    }
def fetch_calls_from_gong(days: int = 7, call_id: Optional[str] = None) -> list[dict]:
    """
    Fetch call transcripts from Gong API.
    Returns list of dicts: [{"id": ..., "title": ..., "transcript": ..., "participants": [...]}]
    NOTE: This uses the Gong v2 API. You need:
    - API credentials with 'api:calls:read:transcript' scope
    - Calls must be processed (transcription complete)
    """
    try:
        import requests
    except ImportError:
        print("ERROR: 'requests' required for Gong API. Run: pip install requests", file=sys.stderr)
        sys.exit(1)
    headers = _gong_headers()
    calls = []
    if call_id:
        # Fetch a specific call
        # Step 1: Get call metadata (accept either a "call" or a "metaData" wrapper)
        resp = requests.get(f"{GONG_API_BASE_URL}/calls/{call_id}", headers=headers, timeout=30)
        resp.raise_for_status()
        call_data = resp.json()
        meta = call_data.get("call") or call_data.get("metaData") or {}
        # Step 2: Get transcript
        transcript_resp = requests.post(
            f"{GONG_API_BASE_URL}/calls/transcript",
            headers=headers,
            json={"filter": {"callIds": [call_id]}},
            timeout=30,
        )
        transcript_resp.raise_for_status()
        transcript_data = transcript_resp.json()
        transcript_text = _assemble_transcript(transcript_data.get("callTranscripts", []))
        calls.append({
            "id": call_id,
            "title": meta.get("title", "Unknown"),
            "transcript": transcript_text,
            "participants": [p.get("name", "") for p in call_data.get("parties", [])],
        })
    else:
        # Fetch recent calls
        from_dt = (datetime.utcnow() - timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%SZ")
        to_dt = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")
        # Step 1: List calls in date range. Listing is a GET with query parameters
        # (a POST to /calls adds a call); follow the pagination cursor across pages.
        call_list = []
        params = {"fromDateTime": from_dt, "toDateTime": to_dt}
        while True:
            list_resp = requests.get(f"{GONG_API_BASE_URL}/calls", headers=headers, params=params, timeout=30)
            list_resp.raise_for_status()
            page = list_resp.json()
            call_list.extend(page.get("calls", []))
            cursor = page.get("records", {}).get("cursor")
            if not cursor:
                break
            params["cursor"] = cursor
        if not call_list:
            print(f"No calls found in the last {days} days.", file=sys.stderr)
            return []
        call_ids = [c["id"] for c in call_list]
        # Step 2: Batch fetch transcripts (Gong supports up to 100 per request)
        for batch_start in range(0, len(call_ids), 100):
            batch = call_ids[batch_start : batch_start + 100]
            transcript_resp = requests.post(
                f"{GONG_API_BASE_URL}/calls/transcript",
                headers=headers,
                json={"filter": {"callIds": batch}},
                timeout=30,
            )
            transcript_resp.raise_for_status()
            # Reuse the single-call assembler so both paths format transcripts identically
            transcripts_by_id = {
                ct.get("callId"): _assemble_transcript([ct])
                for ct in transcript_resp.json().get("callTranscripts", [])
            }
            for c in call_list:
                if c["id"] in transcripts_by_id:
                    calls.append({
                        "id": c["id"],
                        "title": c.get("title", "Unknown"),
                        "transcript": transcripts_by_id[c["id"]],
                        "participants": [p.get("name", "") for p in c.get("parties", [])],
                    })
    return calls
def _assemble_transcript(call_transcripts: list) -> str:
    """Assemble transcript text from Gong API response format."""
    lines = []
    for ct in call_transcripts:
        for segment in ct.get("transcript", []):
            # Fall back to the speaker ID when no display name is present
            speaker = segment.get("speakerName") or segment.get("speakerId") or "Unknown"
            text = " ".join(s.get("text", "") for s in segment.get("sentences", []))
            lines.append(f"{speaker}: {text}")
    return "\n".join(lines)
# ---------------------------------------------------------------------------
# Transcript analysis engine
# ---------------------------------------------------------------------------
# Objection patterns — maps regex patterns to objection categories
OBJECTION_PATTERNS = {
    "pricing": [
        r"(?i)(too expensive|over budget|can't afford|cost(s)? too|cheaper|lower price|discount|pricing is|budget.*tight|price.*high|expensive)",
        r"(?i)(what('s| is) the (price|cost|pricing)|how much (does|will|would)|investment.*significant)",
        r"(?i)(need to.*justify.*cost|hard to.*justify|roi.*unclear|not sure.*worth)",
    ],
    "timing": [
        r"(?i)(not the right time|bad timing|next quarter|next year|revisit.*later|too soon|not ready|circle back|table this)",
        r"(?i)(busy.*right now|other priorities|roadmap.*full|backlog|bandwidth|tied up)",
        r"(?i)(maybe (in|after) (q[1-4]|january|february|march|april|may|june|july|august|september|october|november|december))",
    ],
    "competition": [
        r"(?i)(already (using|working with|have)|current (vendor|provider|partner|agency)|locked in|contract.*with|compared to|vs\.?\s)",
        r"(?i)(what makes you different|why.*switch|competitor|alternative|other option|looking at.*other)",
    ],
    "authority": [
        r"(?i)(need to (talk to|run.*by|check with|get approval|ask) (my|the|our))",
        r"(?i)(not my (decision|call)|someone else|boss|manager|board|committee|stakeholder.*approve)",
        r"(?i)(decision.*committee|buying committee|multiple stakeholders|procurement)",
    ],
    "need": [
        r"(?i)(don't (need|see the need|think we need)|not a priority|we're (fine|good|okay) (with|as)|status quo)",
        r"(?i)(what problem.*solve|why would we|not sure.*fit|doesn't apply|not relevant)",
        r"(?i)(happy with.*current|no pain|working well enough)",
    ],
}
# Buying signal patterns
BUYING_SIGNAL_PATTERNS = {
    "budget_confirmed": [
        r"(?i)(budget.*approved|have.*budget|allocated.*budget|budget (is|of) \$|earmarked|set aside.*for)",
        r"(?i)(can.*invest|willing to (spend|invest|pay)|comfortable with.*price)",
    ],
    "timeline_mentioned": [
        r"(?i)(want.*by (q[1-4]|end of|january|february|march|april|may|june|july|august|september|october|november|december))",
        r"(?i)(need.*live by|launch.*by|deadline|go.?live|start (date|asap|immediately|next week|this month))",
        r"(?i)(sooner.*better|asap|urgent|time.?sensitive|quickly)",
    ],
    "decision_maker_engaged": [
        r"(?i)(ceo|cmo|cfo|cto|vp|vice president|chief|director|head of|svp|evp).*(?:join|call|meeting|asked me)",
        r"(?i)(brought.*my (boss|manager|ceo|cmo)|loop(ed|ing) in|invited.*leadership)",
        r"(?i)(decision maker|final say|sign.*off|authorize)",
    ],
    "champion_identified": [
        r"(?i)(love (this|it|what)|really (like|impressed|excited)|sold on|big fan|advocate)",
        r"(?i)(push.*internally|sell.*internally|convince.*team|champion|sponsor|rally|get.*buy.?in)",
        r"(?i)(exactly what we need|this solves|perfect fit|game.?changer)",
    ],
    "next_steps_agreed": [
        r"(?i)(next step|follow.?up|send.*proposal|schedule.*demo|set up.*call|let's (do|move|proceed))",
        r"(?i)(send.*contract|nda|msa|sow|statement of work|proposal|agreement)",
    ],
}
# Competitive mention patterns: extend with your actual competitors
KNOWN_COMPETITORS = [
    # Add your competitors here. These are common B2B marketing/agency competitors as examples.
    "HubSpot", "Marketo", "Salesforce", "Drift", "6sense", "Demandbase",
    "ZoomInfo", "Apollo", "Outreach", "Salesloft", "Gartner", "Forrester",
    "WebFX", "Wpromote", "Tinuiti", "Power Digital", "Directive",
]
PRICING_DISCUSSION_PATTERNS = [
    r"(?i)\$[\d,]+(\.\d{2})?(\s*(k|K|thousand|million|per month|/mo|/month|annually|per year))?",
    r"(?i)(pricing (model|structure|tier|plan)|pay.*per|subscription|retainer|flat fee|hourly rate)",
    r"(?i)(proposal|quote|estimate|ballpark|range|starting at|minimum.*engagement)",
    r"(?i)(roi|return on investment|payback|break.?even|cost.*benefit)",
]
def analyze_transcript(text: str, source_id: str = "unknown") -> dict:
    """
    Analyze a single transcript and return structured insights.
    Returns dict with: source_id, analyzed_at, objections, buying_signals,
    competitive_mentions, pricing_discussions, summary
    """
    lines = text.strip().split("\n")
    insights = {
        "source_id": source_id,
        "analyzed_at": datetime.utcnow().isoformat() + "Z",
        "objections": [],
        "buying_signals": [],
        "competitive_mentions": [],
        "pricing_discussions": [],
    }
    for i, line in enumerate(lines):
        context_window = " ".join(lines[max(0, i - 1) : min(len(lines), i + 2)])
        # --- Objections ---
        for category, patterns in OBJECTION_PATTERNS.items():
            for pattern in patterns:
                match = re.search(pattern, line)
                if match:
                    insights["objections"].append({
                        "category": category,
                        "quote": line.strip(),
                        "match": match.group(),
                        "line_number": i + 1,
                        "context": context_window.strip(),
                    })
                    break  # One match per category per line
        # --- Buying Signals ---
        for signal_type, patterns in BUYING_SIGNAL_PATTERNS.items():
            for pattern in patterns:
                match = re.search(pattern, line)
                if match:
                    insights["buying_signals"].append({
                        "type": signal_type,
                        "quote": line.strip(),
                        "match": match.group(),
                        "line_number": i + 1,
                    })
                    break  # One match per signal type per line
        # --- Competitive Mentions ---
        for competitor in KNOWN_COMPETITORS:
            if re.search(r"\b" + re.escape(competitor) + r"\b", line, re.IGNORECASE):
                # Determine context sentiment (basic substring heuristic, so e.g. "like"
                # also fires on "likely")
                sentiment = "neutral"
                neg_words = ["problem", "issue", "bad", "worse", "hate", "frustrat", "limit", "lack", "miss", "fail", "leaving", "switch"]
                pos_words = ["good", "great", "love", "like", "happy", "better", "best", "strong"]
                line_lower = line.lower()
                if any(w in line_lower for w in neg_words):
                    sentiment = "negative"
                elif any(w in line_lower for w in pos_words):
                    sentiment = "positive"
                insights["competitive_mentions"].append({
                    "competitor": competitor,
                    "context_sentiment": sentiment,
                    "quote": line.strip(),
                    "line_number": i + 1,
                })
        # --- Pricing Discussions ---
        for pattern in PRICING_DISCUSSION_PATTERNS:
            match = re.search(pattern, line)
            if match:
                insights["pricing_discussions"].append({
                    "quote": line.strip(),
                    "match": match.group(),
                    "line_number": i + 1,
                })
                break
    # Deduplicate per line *and* per label, so a line that raises two objection
    # categories or names two competitors keeps every distinct hit
    insights["objections"] = _dedupe_by_line(insights["objections"], "category")
    insights["buying_signals"] = _dedupe_by_line(insights["buying_signals"], "type")
    insights["competitive_mentions"] = _dedupe_by_line(insights["competitive_mentions"], "competitor")
    insights["pricing_discussions"] = _dedupe_by_line(insights["pricing_discussions"])
    # Summary stats
    insights["summary"] = {
        "total_objections": len(insights["objections"]),
        "objection_categories": dict(Counter(o["category"] for o in insights["objections"])),
        "total_buying_signals": len(insights["buying_signals"]),
        "signal_types": dict(Counter(s["type"] for s in insights["buying_signals"])),
        "competitors_mentioned": sorted(set(c["competitor"] for c in insights["competitive_mentions"])),
        "has_pricing_discussion": len(insights["pricing_discussions"]) > 0,
        "deal_temperature": _score_deal_temperature(insights),
    }
    return insights
def _dedupe_by_line(items: list, field: Optional[str] = None) -> list:
    """Remove duplicate entries that share a line number (and the same `field` value, if given)."""
    seen = set()
    deduped = []
    for item in items:
        key = (item.get("line_number", id(item)), item.get(field) if field else None)
        if key not in seen:
            seen.add(key)
            deduped.append(item)
    return deduped
def _score_deal_temperature(insights: dict) -> str:
    """
    Score deal temperature based on signals vs objections.
    Returns: hot (score >= 6), warm (>= 3), cool (>= 0), cold (< 0).
    Example: budget_confirmed (+3) + champion_identified (+2) + one pricing
    objection (-1) scores 4, which is "warm".
    """
    # Weighted scoring: buying signals add points, objections subtract them
    signal_weights = {
        "budget_confirmed": 3,
        "decision_maker_engaged": 3,
        "timeline_mentioned": 2,
        "champion_identified": 2,
        "next_steps_agreed": 2,
    }
    objection_penalties = {
        "need": -3,  # No need = worst signal
        "authority": -1,
        "timing": -1,
        "pricing": -1,
        "competition": -2,
    }
    score = 0
    for sig in insights["buying_signals"]:
        score += signal_weights.get(sig["type"], 1)
    for obj in insights["objections"]:
        score += objection_penalties.get(obj["category"], -1)
    if score >= 6:
        return "hot"
    elif score >= 3:
        return "warm"
    elif score >= 0:
        return "cool"
    else:
        return "cold"
# ---------------------------------------------------------------------------
# Content topic generator
# ---------------------------------------------------------------------------
def generate_content_topics(all_insights: list[dict]) -> list[dict]:
    """
    Analyze recurring objections across multiple calls to suggest content topics.
    Returns list of content topic suggestions, most frequent category first.
    """
    objection_quotes = defaultdict(list)
    for insight in all_insights:
        for obj in insight.get("objections", []):
            objection_quotes[obj["category"]].append(obj["quote"])
    topics = []
    # Map objection categories to content strategies
    content_strategies = {
        "pricing": {
            "topic_template": "ROI Calculator: How {product} Pays for Itself in {timeframe}",
            "content_types": ["blog post", "interactive calculator", "case study"],
            "angle": "Address pricing objections with concrete ROI proof",
        },
        "timing": {
            "topic_template": "The Cost of Waiting: What Happens When You Delay {solution}",
            "content_types": ["blog post", "email sequence", "one-pager"],
            "angle": "Create urgency with cost-of-inaction framing",
        },
        "competition": {
            "topic_template": "{product} vs {competitor}: Honest Comparison for {use_case}",
            "content_types": ["comparison page", "blog post", "battle card"],
            "angle": "Win competitive deals with transparent comparison content",
        },
        "authority": {
            "topic_template": "How to Build the Business Case for {product} (Template Included)",
            "content_types": ["template", "guide", "executive summary"],
            "angle": "Arm your champion with materials to sell internally",
        },
        "need": {
            "topic_template": "Why Top {role}s Are Prioritizing {category} in {year}",
            "content_types": ["thought leadership", "industry report", "webinar"],
            "angle": "Build awareness and urgency around the problem",
        },
    }
    for category, quotes in objection_quotes.items():
        count = len(quotes)
        if count == 0:
            continue
        strategy = content_strategies.get(category, {})
        topics.append({
            "category": category,
            "frequency": count,
            "sample_quotes": quotes[:3],  # First 3 examples
            "suggested_topic": strategy.get("topic_template", f"Content addressing {category} objections"),
            "recommended_content_types": strategy.get("content_types", ["blog post"]),
            "strategic_angle": strategy.get("angle", ""),
            "priority": "high" if count >= 5 else "medium" if count >= 2 else "low",
        })
    topics.sort(key=lambda t: t["frequency"], reverse=True)
    return topics
# ---------------------------------------------------------------------------
# Follow-up generator
# ---------------------------------------------------------------------------
def generate_follow_ups(insights: dict) -> list[dict]:
    """
    Generate follow-up email suggestions from call insights.
    """
    follow_ups = []
    # Response templates keyed by objection category; {placeholders} are left for the sender to fill in
    templates = {
        "pricing": {
            "subject": "Quick thought on the investment discussion",
            "body": "Following up on our pricing conversation. I put together a quick ROI model based on what you shared about {context}. The numbers suggest a {x}x return in the first year. Want me to walk through it?",
            "asset": "ROI calculator or case study with similar company metrics",
        },
        "timing": {
            "subject": "Timing + what others in your position did",
            "body": "I hear you on timing. Quick data point: companies that started in a similar position to yours saw {metric} within the first 90 days. Happy to share the case study if helpful.",
            "asset": "Quick-win case study showing fast time-to-value",
        },
        "competition": {
            "subject": "Honest take on {competitor} vs us",
            "body": "You mentioned you're also looking at {competitor}. Totally fair. Here's where we genuinely win and where they might be a better fit. I'd rather you make the right call than the easy one.",
            "asset": "Competitive battle card or comparison one-pager",
        },
        "authority": {
            "subject": "Materials for your team's review",
            "body": "I know you need to loop in {stakeholder}. I put together a one-page executive summary that hits the points they'll care about most: ROI, timeline, and risk. Want me to send it over?",
            "asset": "Executive summary one-pager, tailored to stakeholder concerns",
        },
        "need": {
            "subject": "Something that might change the calculus",
            "body": "I appreciated the honest pushback on whether this is a priority right now. One thing I didn't get to share: {relevant_insight}. Might be worth a 10-minute follow-up if you're open to it.",
            "asset": "Industry report or benchmark data showing peer adoption",
        },
    }
    # Address the first three objections raised in the call (transcript order)
    for obj in insights.get("objections", [])[:3]:
        template = templates.get(obj["category"], {})
        follow_ups.append({
            "type": "objection_response",
            "objection_category": obj["category"],
            "trigger_quote": obj["quote"],
            "suggested_subject": template.get("subject", f"Following up on {obj['category']} discussion"),
            "suggested_body": template.get("body", "Following up on our conversation..."),
            "recommended_asset": template.get("asset", ""),
            "timing": "Send within 24 hours of call",
        })
    # Capitalize on the first two buying signals detected
    for sig in insights.get("buying_signals", [])[:2]:
        if sig["type"] == "champion_identified":
            follow_ups.append({
                "type": "champion_enablement",
                "signal": sig["quote"],
                "suggested_subject": "Ammo for your internal pitch",
                "suggested_body": "You clearly get the value here. I want to make sure you have everything you need to bring the team along. Here's a deck you can customize + the key metrics that usually close the deal internally.",
                "recommended_asset": "Internal pitch deck template + metrics cheat sheet",
                "timing": "Send within 12 hours",
            })
        elif sig["type"] == "next_steps_agreed":
            follow_ups.append({
                "type": "momentum_keeper",
                "signal": sig["quote"],
                "suggested_subject": "Recap + next steps locked in",
                "suggested_body": "Great call. Here's what we agreed on: {next_steps}. I'll have {deliverable} ready by {date}. Let me know if anything changes on your end.",
                "recommended_asset": "Meeting summary with action items",
                "timing": "Send within 2 hours of call",
            })
    return follow_ups
# ---------------------------------------------------------------------------
# File I/O
# ---------------------------------------------------------------------------
def load_transcript_file(filepath: str) -> dict:
    """Load a transcript from a text file."""
    path = Path(filepath)
    if not path.exists():
        print(f"ERROR: File not found: {filepath}", file=sys.stderr)
        sys.exit(1)
    text = path.read_text(encoding="utf-8")
    return {"id": path.stem, "title": path.stem, "transcript": text, "participants": []}
def load_transcript_dir(dirpath: str) -> list[dict]:
    """Load all .txt transcript files from a directory."""
    path = Path(dirpath)
    if not path.is_dir():
        print(f"ERROR: Directory not found: {dirpath}", file=sys.stderr)
        sys.exit(1)
    files = sorted(path.glob("*.txt"))
    if not files:
        print(f"WARNING: No .txt files found in {dirpath}", file=sys.stderr)
        return []
    return [load_transcript_file(str(f)) for f in files]
# ---------------------------------------------------------------------------
# Output
# ---------------------------------------------------------------------------
def print_summary(insights: dict) -> None:
    """Print a human-readable summary of insights."""
    s = insights["summary"]
    print(f"\n{'='*60}")
    print(f" Call: {insights['source_id']}")
    print(f" Temperature: {s['deal_temperature'].upper()}")
    print(f"{'='*60}")
    if s["total_objections"]:
        print(f"\n 🚫 Objections ({s['total_objections']}):")
        for cat, count in sorted(s["objection_categories"].items(), key=lambda x: -x[1]):
            print(f" {cat}: {count}")
        for obj in insights["objections"][:3]:
            quote = obj["quote"]
            # Truncate long quotes to 80 characters
            shown = f"{quote[:80]}..." if len(quote) > 80 else quote
            print(f" → [{obj['category']}] \"{shown}\"")
    if s["total_buying_signals"]:
        print(f"\n ✅ Buying Signals ({s['total_buying_signals']}):")
        for sig_type, count in sorted(s["signal_types"].items(), key=lambda x: -x[1]):
            print(f" {sig_type}: {count}")
    if s["competitors_mentioned"]:
        print(f"\n ⚔️ Competitors: {', '.join(s['competitors_mentioned'])}")
    if s["has_pricing_discussion"]:
        print(f"\n 💰 Pricing discussed: Yes ({len(insights['pricing_discussions'])} mentions)")
    print()
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
def main():
    parser = argparse.ArgumentParser(
        description="Extract structured insights from sales call transcripts.",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s --file transcript.txt
  %(prog)s --dir ./transcripts/ --content-topics
  %(prog)s --gong --days 7 --follow-ups
  %(prog)s --file call.txt --output insights.json
""",
    )
    # Input sources (mutually exclusive)
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--file", help="Path to a single transcript file (.txt)")
    source.add_argument("--dir", help="Path to directory of transcript files (.txt)")
    source.add_argument("--gong", action="store_true", help="Pull transcripts from Gong API")
    # Gong options
    parser.add_argument("--days", type=int, default=7, help="Days of history to pull from Gong (default: 7)")
    parser.add_argument("--call-id", help="Specific Gong call ID to analyze (requires --gong)")
    # Output options
    parser.add_argument("--output", "-o", help="Write JSON output to file")
    parser.add_argument("--json", action="store_true", help="Output raw JSON to stdout")
    parser.add_argument("--content-topics", action="store_true", help="Generate content topics from recurring objections")
    parser.add_argument("--follow-ups", action="store_true", help="Generate follow-up suggestions")
    args = parser.parse_args()
    # --call-id is only read on the Gong path; fail loudly instead of ignoring it
    if args.call_id and not args.gong:
        parser.error("--call-id requires --gong")
    # Load transcripts
    calls = []
    if args.file:
        calls = [load_transcript_file(args.file)]
    elif args.dir:
        calls = load_transcript_dir(args.dir)
    elif args.gong:
        calls = fetch_calls_from_gong(days=args.days, call_id=args.call_id)
    if not calls:
        print("No transcripts to analyze.", file=sys.stderr)
        sys.exit(1)
    # Analyze
    all_insights = []
    for call in calls:
        insights = analyze_transcript(call["transcript"], source_id=call.get("id", "unknown"))
        insights["title"] = call.get("title", "")
        all_insights.append(insights)
        if not args.json:
            print_summary(insights)
    # Content topics
    content_topics = []
    if args.content_topics and len(all_insights) > 0:
        content_topics = generate_content_topics(all_insights)
        if not args.json:
            print(f"\n{'='*60}")
            print(" 📝 Content Topics from Recurring Objections")
            print(f"{'='*60}")
            for topic in content_topics:
                print(f"\n [{topic['priority'].upper()}] {topic['category']} (mentioned {topic['frequency']}x)")
                print(f" Topic: {topic['suggested_topic']}")
                print(f" Types: {', '.join(topic['recommended_content_types'])}")
                print(f" Angle: {topic['strategic_angle']}")
    # Follow-ups
    all_follow_ups = []
    if args.follow_ups:
        for insights in all_insights:
            follow_ups = generate_follow_ups(insights)
            all_follow_ups.extend(follow_ups)
            if not args.json:
                print(f"\n{'='*60}")
                print(f" 📧 Follow-up Suggestions for: {insights['source_id']}")
                print(f"{'='*60}")
                for fu in follow_ups:
                    print(f"\n Type: {fu['type']}")
                    print(f" Subject: {fu['suggested_subject']}")
                    print(f" Timing: {fu['timing']}")
                    if fu.get("recommended_asset"):
                        print(f" Asset: {fu['recommended_asset']}")
    # Build output
    output = {
        "analyzed_at": datetime.utcnow().isoformat() + "Z",
        "total_calls": len(all_insights),
        "calls": all_insights,
    }
    if content_topics:
        output["content_topics"] = content_topics
    if all_follow_ups:
        output["follow_ups"] = all_follow_ups
    # Aggregate stats
    output["aggregate"] = {
        "total_objections": sum(i["summary"]["total_objections"] for i in all_insights),
        "total_buying_signals": sum(i["summary"]["total_buying_signals"] for i in all_insights),
        "all_competitors": list(set(c for i in all_insights for c in i["summary"]["competitors_mentioned"])),
        "temperature_distribution": dict(Counter(i["summary"]["deal_temperature"] for i in all_insights)),
    }
    # Output
    if args.json:
        print(json.dumps(output, indent=2))
    if args.output:
        out_path = Path(args.output)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        out_path.write_text(json.dumps(output, indent=2))
        if not args.json:
            print(f"\n✅ Output written to {args.output}")
if __name__ == "__main__":
    main()

@ -0,0 +1,3 @@
requests>=2.28.0
google-analytics-data>=0.18.0
google-auth>=2.22.0

@ -0,0 +1,797 @@
#!/usr/bin/env python3
"""
Revenue Attribution Mapper

Connects content pieces to pipeline and closed deals. Proves content ROI.
Maps blog posts, videos, and podcasts to first-touch and multi-touch attribution
using GA4 + HubSpot deal data.

Usage:
    python revenue_attribution.py --report
    python revenue_attribution.py --report --model linear
    python revenue_attribution.py --cpa --costs content_costs.json
    python revenue_attribution.py --gaps
"""

import argparse
import json
import os
import sys
from collections import defaultdict
from datetime import datetime, timedelta
from pathlib import Path
from urllib.parse import urlparse

# ---------------------------------------------------------------------------
# API Configuration
# ---------------------------------------------------------------------------
# HubSpot: Set HUBSPOT_API_KEY to your private app token
# Required scopes: crm.objects.deals.read, crm.objects.contacts.read
HUBSPOT_API_KEY = os.environ.get("HUBSPOT_API_KEY", "")
HUBSPOT_BASE_URL = "https://api.hubapi.com"

# GA4: Set GA4_PROPERTY_ID and GA4_CREDENTIALS_JSON
# GA4_CREDENTIALS_JSON should point to a service account JSON file
# Required: Google Analytics Data API (v1beta) enabled
GA4_PROPERTY_ID = os.environ.get("GA4_PROPERTY_ID", "")
GA4_CREDENTIALS_JSON = os.environ.get("GA4_CREDENTIALS_JSON", "")

OUTPUT_DIR = os.environ.get("OUTPUT_DIR", "./output")

# ---------------------------------------------------------------------------
# Content type classification
# ---------------------------------------------------------------------------
CONTENT_TYPE_PATTERNS = {
    "blog": ["/blog/", "/posts/", "/article/", "/insights/"],
    "video": ["/video/", "/youtube/", "/watch/", "/webinar-recording/"],
    "podcast": ["/podcast/", "/episode/", "/listen/"],
    "webinar": ["/webinar/", "/live/", "/register/"],
    "case_study": ["/case-study/", "/case-studies/", "/success-story/", "/customer-story/"],
    "landing_page": ["/lp/", "/landing/", "/offer/", "/download/"],
    "tool": ["/tool/", "/calculator/", "/grader/", "/analyzer/"],
    "comparison": ["/vs/", "/compare/", "/alternative/", "/versus/"],
}

# Funnel stage classification
FUNNEL_STAGE_PATTERNS = {
    "awareness": ["/blog/", "/posts/", "/article/", "/podcast/", "/video/"],
    "consideration": ["/case-study/", "/webinar/", "/guide/", "/comparison/", "/vs/"],
    "decision": ["/pricing/", "/demo/", "/contact/", "/trial/", "/start/", "/lp/"],
}


def _normalized_path(url: str) -> str:
    """Return the lowercased URL path with exactly one trailing slash.

    Accepts full URLs (HubSpot) or bare paths (GA4) and drops any query string.
    The trailing slash lets a pattern such as "/pricing/" match "/pricing"
    as well as "/pricing/annual".
    """
    path = urlparse(url).path or url
    return path.lower().rstrip("/") + "/"


def classify_content_type(url: str) -> str:
    """Classify a URL into a content type."""
    path = _normalized_path(url)
    for content_type, patterns in CONTENT_TYPE_PATTERNS.items():
        if any(p in path for p in patterns):
            return content_type
    return "other"


def classify_funnel_stage(url: str) -> str:
    """Classify a URL into a funnel stage."""
    path = _normalized_path(url)
    for stage, patterns in FUNNEL_STAGE_PATTERNS.items():
        if any(p in path for p in patterns):
            return stage
    return "unknown"


# ---------------------------------------------------------------------------
# GA4 Data Client
# ---------------------------------------------------------------------------

def fetch_ga4_page_data(start_date: str, end_date: str) -> list[dict]:
    """
    Fetch page-level session and conversion data from GA4.

    Returns one row per (page path, channel) pair:
        [{"page_path": "/blog/foo", "channel": "Organic Search",
          "sessions": 1234, "users": 900, "conversions": 5}]

    NOTE: Requires the google-analytics-data library.
        pip install google-analytics-data

    Setup:
        1. Create a service account in Google Cloud Console
        2. Enable the Google Analytics Data API
        3. Add the service account email as a viewer on your GA4 property
        4. Download the JSON key file and set the GA4_CREDENTIALS_JSON env var
    """
    if not GA4_PROPERTY_ID or not GA4_CREDENTIALS_JSON:
        print("WARNING: GA4_PROPERTY_ID or GA4_CREDENTIALS_JSON not set. Using sample data.", file=sys.stderr)
        return _sample_ga4_data()

    try:
        from google.analytics.data_v1beta import BetaAnalyticsDataClient
        from google.analytics.data_v1beta.types import (
            DateRange,
            Dimension,
            Metric,
            RunReportRequest,
        )

        client = BetaAnalyticsDataClient.from_service_account_json(GA4_CREDENTIALS_JSON)
        request = RunReportRequest(
            property=f"properties/{GA4_PROPERTY_ID}",
            dimensions=[
                Dimension(name="pagePath"),
                Dimension(name="sessionDefaultChannelGroup"),
            ],
            metrics=[
                Metric(name="sessions"),
                Metric(name="totalUsers"),
                Metric(name="conversions"),
            ],
            date_ranges=[DateRange(start_date=start_date, end_date=end_date)],
        )
        response = client.run_report(request)

        results = []
        for row in response.rows:
            results.append({
                "page_path": row.dimension_values[0].value,
                "channel": row.dimension_values[1].value,
                "sessions": int(row.metric_values[0].value),
                "users": int(row.metric_values[1].value),
                "conversions": int(row.metric_values[2].value),
            })
        return results
    except ImportError:
        print("WARNING: google-analytics-data not installed. Using sample data.", file=sys.stderr)
        return _sample_ga4_data()
    except Exception as e:
        print(f"WARNING: GA4 API error: {e}. Using sample data.", file=sys.stderr)
        return _sample_ga4_data()


def _sample_ga4_data() -> list[dict]:
    """Sample GA4 data for testing/demo purposes."""
    return [
        {"page_path": "/blog/seo-strategy-2025", "channel": "Organic Search", "sessions": 4200, "users": 3800, "conversions": 12},
        {"page_path": "/blog/content-marketing-roi", "channel": "Organic Search", "sessions": 3100, "users": 2900, "conversions": 8},
        {"page_path": "/blog/ai-marketing-tools", "channel": "Organic Search", "sessions": 5600, "users": 5100, "conversions": 15},
        {"page_path": "/case-study/saas-company-3x-pipeline", "channel": "Direct", "sessions": 890, "users": 820, "conversions": 9},
        {"page_path": "/case-study/ecommerce-seo-growth", "channel": "Organic Search", "sessions": 1200, "users": 1100, "conversions": 7},
        {"page_path": "/podcast/episode-42-growth-loops", "channel": "Social", "sessions": 2300, "users": 2100, "conversions": 3},
        {"page_path": "/webinar/ai-ops-for-marketers", "channel": "Email", "sessions": 650, "users": 600, "conversions": 11},
        {"page_path": "/video/youtube-seo-masterclass", "channel": "Social", "sessions": 8900, "users": 8200, "conversions": 6},
        {"page_path": "/blog/paid-media-benchmarks", "channel": "Organic Search", "sessions": 2700, "users": 2500, "conversions": 4},
        {"page_path": "/lp/free-seo-audit", "channel": "Paid Search", "sessions": 1800, "users": 1700, "conversions": 22},
        {"page_path": "/pricing", "channel": "Direct", "sessions": 3200, "users": 2900, "conversions": 18},
        {"page_path": "/blog/b2b-lead-generation", "channel": "Organic Search", "sessions": 3400, "users": 3100, "conversions": 5},
        {"page_path": "/vs/hubspot-alternative", "channel": "Organic Search", "sessions": 1500, "users": 1400, "conversions": 10},
    ]


# ---------------------------------------------------------------------------
# HubSpot Deal Data
# ---------------------------------------------------------------------------

def fetch_hubspot_deals(start_date: str, end_date: str) -> list[dict]:
    """
    Fetch closed-won deals from HubSpot with touchpoint history.

    Returns list of dicts:
        [{
            "deal_id": "123",
            "deal_name": "Acme Corp",
            "amount": 50000,
            "close_date": "2025-03-15",
            "touchpoints": [
                {"url": "/blog/seo-strategy", "timestamp": "2025-01-10", "type": "first_touch"},
                {"url": "/case-study/saas", "timestamp": "2025-02-20", "type": "page_view"},
                {"url": "/pricing", "timestamp": "2025-03-01", "type": "last_touch"},
            ]
        }]

    NOTE: Requires the requests library.
    Touchpoints come from HubSpot's contact timeline / page views.
    You need a private app with crm.objects.deals.read + crm.objects.contacts.read scopes.
    """
    if not HUBSPOT_API_KEY:
        print("WARNING: HUBSPOT_API_KEY not set. Using sample data.", file=sys.stderr)
        return _sample_hubspot_deals()

    try:
        import requests
        from urllib.parse import urlparse

        headers = {"Authorization": f"Bearer {HUBSPOT_API_KEY}"}

        # Fetch closed-won deals in the date range via the search API,
        # following paging.next.after so results beyond the first 100 are included.
        search_body = {
            "filterGroups": [{
                "filters": [
                    {"propertyName": "dealstage", "operator": "EQ", "value": "closedwon"},
                    {"propertyName": "closedate", "operator": "GTE", "value": f"{start_date}T00:00:00Z"},
                    {"propertyName": "closedate", "operator": "LTE", "value": f"{end_date}T23:59:59Z"},
                ]
            }],
            "properties": ["dealname", "amount", "closedate", "dealstage"],
            "limit": 100,
        }
        deals_data = []
        while True:
            resp = requests.post(
                f"{HUBSPOT_BASE_URL}/crm/v3/objects/deals/search",
                headers=headers,
                json=search_body,
                timeout=30,
            )
            resp.raise_for_status()
            page = resp.json()
            deals_data.extend(page.get("results", []))
            after = page.get("paging", {}).get("next", {}).get("after")
            if not after:
                break
            search_body["after"] = after

        deals = []
        for deal in deals_data:
            props = deal.get("properties", {})
            deal_id = deal["id"]

            # Get associated contacts
            assoc_resp = requests.get(
                f"{HUBSPOT_BASE_URL}/crm/v3/objects/deals/{deal_id}/associations/contacts",
                headers=headers,
                timeout=30,
            )
            contact_ids = [r["id"] for r in assoc_resp.json().get("results", [])] if assoc_resp.ok else []

            # Get page views for each contact (from the engagement timeline).
            # Verify this endpoint and the hs_page_url field against HubSpot's
            # API docs for your account before relying on the results.
            touchpoints = []
            for cid in contact_ids[:5]:  # Limit to avoid rate limits
                timeline_resp = requests.get(
                    f"{HUBSPOT_BASE_URL}/crm/v3/objects/contacts/{cid}/engagements",
                    headers=headers,
                    params={"limit": 50},
                    timeout=30,
                )
                if timeline_resp.ok:
                    for eng in timeline_resp.json().get("results", []):
                        # Extract page view URLs from engagement metadata
                        metadata = eng.get("properties", {})
                        page_url = metadata.get("hs_page_url")
                        if page_url:
                            touchpoints.append({
                                # Keep only the path so it joins with GA4's pagePath
                                "url": urlparse(page_url).path or page_url,
                                "timestamp": metadata.get("hs_timestamp") or "",
                                "type": "page_view",
                            })

            # Mark first and last touch (a single touchpoint stays first_touch)
            if touchpoints:
                touchpoints.sort(key=lambda t: t["timestamp"])
                touchpoints[0]["type"] = "first_touch"
                if len(touchpoints) > 1:
                    touchpoints[-1]["type"] = "last_touch"

            deals.append({
                "deal_id": deal_id,
                "deal_name": props.get("dealname") or "Unknown",
                "amount": float(props.get("amount", 0) or 0),
                "close_date": (props.get("closedate") or "")[:10],
                "touchpoints": touchpoints,
            })
        return deals
    except ImportError:
        print("WARNING: requests not installed. Using sample data.", file=sys.stderr)
        return _sample_hubspot_deals()
    except Exception as e:
        print(f"WARNING: HubSpot API error: {e}. Using sample data.", file=sys.stderr)
        return _sample_hubspot_deals()


def _sample_hubspot_deals() -> list[dict]:
    """Sample HubSpot deal data for testing/demo."""
    return [
        {
            "deal_id": "deal_001",
            "deal_name": "Acme Corp - SEO Retainer",
            "amount": 120000,
            "close_date": "2025-03-15",
            "touchpoints": [
                {"url": "/blog/seo-strategy-2025", "timestamp": "2025-01-05", "type": "first_touch"},
                {"url": "/blog/content-marketing-roi", "timestamp": "2025-01-22", "type": "page_view"},
                {"url": "/case-study/saas-company-3x-pipeline", "timestamp": "2025-02-10", "type": "page_view"},
                {"url": "/pricing", "timestamp": "2025-02-28", "type": "page_view"},
                {"url": "/lp/free-seo-audit", "timestamp": "2025-03-05", "type": "last_touch"},
            ],
        },
        {
            "deal_id": "deal_002",
            "deal_name": "TechStart Inc - Full Service",
            "amount": 240000,
            "close_date": "2025-02-20",
            "touchpoints": [
                {"url": "/blog/ai-marketing-tools", "timestamp": "2024-12-01", "type": "first_touch"},
                {"url": "/podcast/episode-42-growth-loops", "timestamp": "2024-12-15", "type": "page_view"},
                {"url": "/webinar/ai-ops-for-marketers", "timestamp": "2025-01-10", "type": "page_view"},
                {"url": "/vs/hubspot-alternative", "timestamp": "2025-01-25", "type": "page_view"},
                {"url": "/pricing", "timestamp": "2025-02-10", "type": "last_touch"},
            ],
        },
        {
            "deal_id": "deal_003",
            "deal_name": "GrowthCo - Content Marketing",
            "amount": 84000,
            "close_date": "2025-03-01",
            "touchpoints": [
                {"url": "/blog/content-marketing-roi", "timestamp": "2025-01-15", "type": "first_touch"},
                {"url": "/case-study/ecommerce-seo-growth", "timestamp": "2025-02-01", "type": "page_view"},
                {"url": "/pricing", "timestamp": "2025-02-20", "type": "last_touch"},
            ],
        },
        {
            "deal_id": "deal_004",
            "deal_name": "SaaS Corp - Paid Media",
            "amount": 180000,
            "close_date": "2025-01-30",
            "touchpoints": [
                {"url": "/video/youtube-seo-masterclass", "timestamp": "2024-11-15", "type": "first_touch"},
                {"url": "/blog/paid-media-benchmarks", "timestamp": "2024-12-10", "type": "page_view"},
                {"url": "/blog/b2b-lead-generation", "timestamp": "2025-01-05", "type": "page_view"},
                {"url": "/lp/free-seo-audit", "timestamp": "2025-01-20", "type": "last_touch"},
            ],
        },
        {
            "deal_id": "deal_005",
            "deal_name": "Enterprise Ltd - SEO + Content",
            "amount": 360000,
            "close_date": "2025-03-20",
            "touchpoints": [
                {"url": "/blog/seo-strategy-2025", "timestamp": "2024-12-20", "type": "first_touch"},
                {"url": "/blog/ai-marketing-tools", "timestamp": "2025-01-08", "type": "page_view"},
                {"url": "/case-study/saas-company-3x-pipeline", "timestamp": "2025-01-25", "type": "page_view"},
                {"url": "/webinar/ai-ops-for-marketers", "timestamp": "2025-02-05", "type": "page_view"},
                {"url": "/pricing", "timestamp": "2025-03-01", "type": "page_view"},
                {"url": "/lp/free-seo-audit", "timestamp": "2025-03-10", "type": "last_touch"},
            ],
        },
    ]


# ---------------------------------------------------------------------------
# Attribution Models
# ---------------------------------------------------------------------------

def first_touch_attribution(deals: list[dict]) -> dict[str, float]:
    """100% credit to the first touchpoint (touchpoints must be in chronological order)."""
    attribution = defaultdict(float)
    for deal in deals:
        tps = deal.get("touchpoints", [])
        if tps:
            first = tps[0]
            attribution[first["url"]] += deal["amount"]
    return dict(attribution)


def last_touch_attribution(deals: list[dict]) -> dict[str, float]:
    """100% credit to the last touchpoint (touchpoints must be in chronological order)."""
    attribution = defaultdict(float)
    for deal in deals:
        tps = deal.get("touchpoints", [])
        if tps:
            last = tps[-1]
            attribution[last["url"]] += deal["amount"]
    return dict(attribution)


def linear_attribution(deals: list[dict]) -> dict[str, float]:
    """Equal credit to all touchpoints."""
    attribution = defaultdict(float)
    for deal in deals:
        tps = deal.get("touchpoints", [])
        if tps:
            credit = deal["amount"] / len(tps)
            for tp in tps:
                attribution[tp["url"]] += credit
    return dict(attribution)


def time_decay_attribution(deals: list[dict], half_life_days: int = 7) -> dict[str, float]:
    """
    More credit to touchpoints closer to close date.

    Uses exponential decay with a configurable half-life: a touchpoint
    half_life_days before close gets half the weight of one on the close date.
    Touchpoints dated after the close date are treated as same-day.
    """
    import math

    attribution = defaultdict(float)
    for deal in deals:
        tps = deal.get("touchpoints", [])
        close_date = deal.get("close_date", "")
        if not tps or not close_date:
            continue
        try:
            close_dt = datetime.strptime(close_date, "%Y-%m-%d")
        except ValueError:
            continue

        # Calculate decay weights; unparseable timestamps get a fixed weight of 0.1
        weights = []
        for tp in tps:
            try:
                tp_dt = datetime.strptime(tp["timestamp"][:10], "%Y-%m-%d")
                days_before = max((close_dt - tp_dt).days, 0)
                weight = math.pow(0.5, days_before / half_life_days)
                weights.append(weight)
            except (ValueError, KeyError, TypeError):
                weights.append(0.1)

        total_weight = sum(weights) or 1
        for tp, weight in zip(tps, weights):
            attribution[tp["url"]] += deal["amount"] * (weight / total_weight)
    return dict(attribution)


ATTRIBUTION_MODELS = {
    "first-touch": first_touch_attribution,
    "last-touch": last_touch_attribution,
    "linear": linear_attribution,
    "time-decay": time_decay_attribution,
}


# ---------------------------------------------------------------------------
# Report Generation
# ---------------------------------------------------------------------------

def generate_attribution_report(
    deals: list[dict],
    ga4_data: list[dict],
    model: str = "linear",
) -> dict:
    """Generate a full attribution report."""
    # Run attribution
    model_func = ATTRIBUTION_MODELS.get(model, linear_attribution)
    attribution = model_func(deals)

    # Enrich with GA4 data
    ga4_by_path = {}
    for row in ga4_data:
        path = row["page_path"]
        if path not in ga4_by_path:
            ga4_by_path[path] = {"sessions": 0, "users": 0, "conversions": 0}
        ga4_by_path[path]["sessions"] += row["sessions"]
        ga4_by_path[path]["users"] += row["users"]
        ga4_by_path[path]["conversions"] += row["conversions"]

    # Build content performance table
    content_performance = []
    for url, revenue in sorted(attribution.items(), key=lambda x: -x[1]):
        ga4 = ga4_by_path.get(url, {"sessions": 0, "users": 0, "conversions": 0})
        content_type = classify_content_type(url)
        funnel_stage = classify_funnel_stage(url)
        content_performance.append({
            "url": url,
            "content_type": content_type,
            "funnel_stage": funnel_stage,
            "attributed_revenue": round(revenue, 2),
            "sessions": ga4["sessions"],
            "users": ga4["users"],
            "conversions": ga4["conversions"],
            "revenue_per_session": round(revenue / ga4["sessions"], 2) if ga4["sessions"] else 0,
            "deals_touched": sum(
                1 for d in deals if any(tp["url"] == url for tp in d.get("touchpoints", []))
            ),
        })

    # Aggregate by content type
    by_type = defaultdict(lambda: {"revenue": 0, "sessions": 0, "conversions": 0, "pieces": 0})
    for cp in content_performance:
        t = cp["content_type"]
        by_type[t]["revenue"] += cp["attributed_revenue"]
        by_type[t]["sessions"] += cp["sessions"]
        by_type[t]["conversions"] += cp["conversions"]
        by_type[t]["pieces"] += 1

    type_summary = []
    for content_type, stats in sorted(by_type.items(), key=lambda x: -x[1]["revenue"]):
        type_summary.append({
            "content_type": content_type,
            "total_revenue": round(stats["revenue"], 2),
            "total_sessions": stats["sessions"],
            "total_conversions": stats["conversions"],
            "piece_count": stats["pieces"],
            "avg_revenue_per_piece": round(stats["revenue"] / stats["pieces"], 2) if stats["pieces"] else 0,
        })

    # Summary
    total_revenue = sum(d["amount"] for d in deals)
    total_deals = len(deals)

    report = {
        "generated_at": datetime.utcnow().isoformat() + "Z",
        "attribution_model": model,
        "summary": {
            "total_revenue": total_revenue,
            "total_deals": total_deals,
            "avg_deal_size": round(total_revenue / total_deals, 2) if total_deals else 0,
            "content_pieces_with_attribution": len(content_performance),
            "avg_touchpoints_per_deal": round(
                sum(len(d.get("touchpoints", [])) for d in deals) / total_deals, 1
            ) if total_deals else 0,
        },
        "top_content": content_performance[:20],
        "by_content_type": type_summary,
    }
    return report


def calculate_cpa(report: dict, costs: dict) -> dict:
    """
    Calculate cost-per-acquisition by content type.

    costs should be: {"blog": 15000, "video": 8000, "podcast": 3000, ...}
    representing total spend on each content type in the period.
    """
    cpa_report = []
    for type_data in report["by_content_type"]:
        ct = type_data["content_type"]
        cost = costs.get(ct, 0)
        revenue = type_data["total_revenue"]
        conversions = type_data["total_conversions"]
        cpa_report.append({
            "content_type": ct,
            "total_cost": cost,
            "total_revenue": revenue,
            "conversions": conversions,
            "cpa": round(cost / conversions, 2) if conversions else None,
            "roi": round((revenue - cost) / cost, 2) if cost else None,
            "roi_multiple": f"{round(revenue / cost, 1)}x" if cost else "N/A",
        })
    cpa_report.sort(key=lambda x: (x["roi"] or 0), reverse=True)
    return {"cpa_by_content_type": cpa_report}


def find_content_gaps(deals: list[dict]) -> dict:
    """
    Identify funnel stages with no or low content attribution.
    """
    stage_coverage = defaultdict(lambda: {"urls": set(), "deals": 0, "revenue": 0})
    for deal in deals:
        stages_hit = set()
        for tp in deal.get("touchpoints", []):
            stage = classify_funnel_stage(tp["url"])
            stage_coverage[stage]["urls"].add(tp["url"])
            stages_hit.add(stage)
        for stage in stages_hit:
            stage_coverage[stage]["deals"] += 1
            stage_coverage[stage]["revenue"] += deal["amount"] / len(stages_hit)

    # Check for gaps
    expected_stages = ["awareness", "consideration", "decision"]
    gaps = []
    for stage in expected_stages:
        data = stage_coverage.get(stage, {"urls": set(), "deals": 0, "revenue": 0})
        total_deals = len(deals)
        coverage_pct = round(data["deals"] / total_deals * 100, 1) if total_deals else 0
        if coverage_pct < 30:
            severity = "critical" if coverage_pct < 10 else "moderate"
            gaps.append({
                "stage": stage,
                "coverage_percent": coverage_pct,
                "deals_with_stage": data["deals"],
                "content_pieces": len(data["urls"]),
                "severity": severity,
                "recommendation": _gap_recommendation(stage, coverage_pct),
            })

    stage_summary = []
    for stage in expected_stages:
        data = stage_coverage.get(stage, {"urls": set(), "deals": 0, "revenue": 0})
        stage_summary.append({
            "stage": stage,
            "content_pieces": len(data["urls"]),
            "deals_touched": data["deals"],
            "attributed_revenue": round(data["revenue"], 2),
            "top_urls": list(data["urls"])[:5],
        })

    return {
        "gaps": gaps,
        "stage_summary": stage_summary,
        "total_deals_analyzed": len(deals),
    }


def _gap_recommendation(stage: str, coverage_pct: float) -> str:
    """Generate a recommendation for a content gap."""
    recs = {
        "awareness": "Create more top-of-funnel content (blog posts, videos, podcasts) targeting high-volume keywords. Focus on educational content that introduces the problem your product solves.",
        "consideration": "Build comparison pages, case studies, and webinars that help prospects evaluate solutions. This is where you prove credibility and differentiation.",
        "decision": "Add pricing pages, ROI calculators, free trials, and demo CTAs. Make it easy for ready-to-buy prospects to take action.",
    }
    return recs.get(stage, f"Create content for the {stage} stage to improve coverage from {coverage_pct}%.")


# ---------------------------------------------------------------------------
# Output Formatting
# ---------------------------------------------------------------------------

def print_report(report: dict) -> None:
    """Print attribution report in human-readable format."""
    s = report["summary"]
    print(f"\n{'='*70}")
    print(f" CONTENT REVENUE ATTRIBUTION REPORT")
    print(f" Model: {report['attribution_model']}")
    print(f" Generated: {report['generated_at']}")
    print(f"{'='*70}")

    print(f"\n 📊 Summary")
    print(f" Total Revenue: ${s['total_revenue']:,.0f}")
    print(f" Total Deals: {s['total_deals']}")
    print(f" Avg Deal Size: ${s['avg_deal_size']:,.0f}")
    print(f" Content w/ Attribution: {s['content_pieces_with_attribution']}")
    print(f" Avg Touchpoints/Deal: {s['avg_touchpoints_per_deal']}")

    print(f"\n 📈 Revenue by Content Type")
    print(f" {'Type':<16} {'Revenue':>12} {'Sessions':>10} {'Pieces':>8} {'Avg/Piece':>12}")
    print(f" {'-'*58}")
    for ct in report["by_content_type"]:
        print(
            f" {ct['content_type']:<16} "
            f"${ct['total_revenue']:>10,.0f} "
            f"{ct['total_sessions']:>10,} "
            f"{ct['piece_count']:>8} "
            f"${ct['avg_revenue_per_piece']:>10,.0f}"
        )

    print(f"\n 🏆 Top Content by Revenue")
    print(f" {'URL':<45} {'Revenue':>12} {'Sessions':>10} {'Type':<12}")
    print(f" {'-'*79}")
    for cp in report["top_content"][:10]:
        url_display = cp["url"][:43] + ".." if len(cp["url"]) > 45 else cp["url"]
        print(
            f" {url_display:<45} "
            f"${cp['attributed_revenue']:>10,.0f} "
            f"{cp['sessions']:>10,} "
            f"{cp['content_type']:<12}"
        )
    print()


def print_gaps(gaps_report: dict) -> None:
    """Print content gap analysis."""
    print(f"\n{'='*70}")
    print(f" CONTENT GAP ANALYSIS")
    print(f"{'='*70}")

    print(f"\n 📊 Funnel Stage Coverage ({gaps_report['total_deals_analyzed']} deals)")
    for stage in gaps_report["stage_summary"]:
        print(f"\n {stage['stage'].upper()}")
        print(f" Content Pieces: {stage['content_pieces']}")
        print(f" Deals Touched: {stage['deals_touched']}")
        print(f" Revenue: ${stage['attributed_revenue']:,.0f}")

    if gaps_report["gaps"]:
        print(f"\n ⚠️ Gaps Identified")
        for gap in gaps_report["gaps"]:
            print(f"\n [{gap['severity'].upper()}] {gap['stage'].upper()}: {gap['coverage_percent']}% coverage")
            print(f"   {gap['recommendation']}")
    else:
        print(f"\n ✅ No significant gaps found")
    print()


# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------

def main():
    parser = argparse.ArgumentParser(
        description="Map content to revenue with multi-touch attribution.",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s --report
  %(prog)s --report --model time-decay
  %(prog)s --cpa --costs content_costs.json
  %(prog)s --gaps
  %(prog)s --report --start 2025-01-01 --end 2025-03-31 --json
""",
    )
    parser.add_argument("--report", action="store_true", help="Generate attribution report")
    parser.add_argument("--gaps", action="store_true", help="Identify content gaps in buyer journey")
    parser.add_argument("--cpa", action="store_true", help="Calculate cost-per-acquisition by content type")
    parser.add_argument("--model", choices=["first-touch", "last-touch", "linear", "time-decay"],
                        default="linear", help="Attribution model (default: linear)")
    parser.add_argument("--start", help="Start date YYYY-MM-DD (default: 90 days ago)")
    parser.add_argument("--end", help="End date YYYY-MM-DD (default: today)")
    parser.add_argument("--costs", help="JSON file with content costs by type (for --cpa)")
    parser.add_argument("--json", action="store_true", help="Output raw JSON")
    parser.add_argument("--output", "-o", help="Write output to file")
    args = parser.parse_args()

    if not (args.report or args.gaps or args.cpa):
        parser.error("At least one of --report, --gaps, or --cpa is required")

    # Date range
    end_date = args.end or datetime.utcnow().strftime("%Y-%m-%d")
    start_date = args.start or (datetime.utcnow() - timedelta(days=90)).strftime("%Y-%m-%d")
    print(f"Fetching data for {start_date} to {end_date}...", file=sys.stderr)

    # Fetch data
    ga4_data = fetch_ga4_page_data(start_date, end_date)
    deals = fetch_hubspot_deals(start_date, end_date)

    output = {
        "date_range": {"start": start_date, "end": end_date},
        "generated_at": datetime.utcnow().isoformat() + "Z",
    }

    if args.report:
        report = generate_attribution_report(deals, ga4_data, model=args.model)
        output["attribution_report"] = report
        if not args.json:
            print_report(report)

    if args.cpa:
        if not args.report:
            report = generate_attribution_report(deals, ga4_data, model=args.model)
            output["attribution_report"] = report
        costs = {}
        if args.costs:
            costs_path = Path(args.costs)
            if costs_path.exists():
                costs = json.loads(costs_path.read_text())
            else:
                print(f"WARNING: Costs file not found: {args.costs}. Using empty costs.", file=sys.stderr)
        cpa_data = calculate_cpa(output["attribution_report"], costs)
        output["cpa"] = cpa_data
        if not args.json:
            print(f"\n{'='*70}")
            print(f" COST PER ACQUISITION BY CONTENT TYPE")
            print(f"{'='*70}")
            print(f" {'Type':<16} {'Cost':>10} {'Revenue':>12} {'CPA':>10} {'ROI':>8}")
            print(f" {'-'*56}")
            for row in cpa_data["cpa_by_content_type"]:
                cpa_str = f"${row['cpa']:,.0f}" if row["cpa"] is not None else "N/A"
                roi_str = row["roi_multiple"]
                print(
                    f" {row['content_type']:<16} "
                    f"${row['total_cost']:>8,} "
                    f"${row['total_revenue']:>10,.0f} "
                    f"{cpa_str:>10} "
                    f"{roi_str:>8}"
                )
            print()

    if args.gaps:
        gaps_data = find_content_gaps(deals)
        output["gaps"] = gaps_data
        if not args.json:
            print_gaps(gaps_data)

    if args.json:
        print(json.dumps(output, indent=2, default=str))

    if args.output:
        out_path = Path(args.output)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        out_path.write_text(json.dumps(output, indent=2, default=str))
        if not args.json:
            print(f"✅ Output written to {args.output}")


if __name__ == "__main__":
    main()

319
team-ops/README.md Normal file

@ -0,0 +1,319 @@
# 👥 AI Team Ops
> **Run your team like an engineer runs a system — measure everything, cut waste, ship faster.**
Two AI-powered tools for ruthless team optimization: a structured performance audit framework (the "Elon Algorithm") and an intelligent meeting transcript processor that never lets action items fall through the cracks.
Built for operators who want data-driven team decisions, not vibes-based management.
---
## Architecture
```
              ┌──────────────────────────────────────┐
              │        TEAM PERFORMANCE AUDIT        │
              │          ("Elon Algorithm")          │
              └──────────────────┬───────────────────┘
        ┌────────────────────────┼────────────────────────┐
        │                        │                        │
Role Descriptions           OKRs / KPIs              Output Data
 (who does what)      (what they should hit)  (what they actually did)
        │                        │                        │
        └────────────────────────┼────────────────────────┘
              ┌──────────────────▼───────────────────┐
              │  5-Step Elon Algorithm               │
              │                                      │
              │  1. Question: is this necessary?     │
              │  2. Delete: flag redundancies        │
              │  3. Simplify: cut complexity         │
              │  4. Accelerate: find bottlenecks     │
              │  5. Automate: what can AI handle?    │
              └──────────────────┬───────────────────┘
              ┌──────────────────▼───────────────────┐
              │  Scoring Engine                      │
              │  • Output Velocity (30%)             │
              │  • Quality (30%)                     │
              │  • Independence (20%)                │
              │  • Initiative (20%)                  │
              │                                      │
              │  → A/B/C Stack Rank                  │
              │  → Promote / Coach / Reassign / Exit │
              └──────────────────┬───────────────────┘
                                 ▼
       Executive Summary + Scorecards + Org Recommendations

              ┌──────────────────────────────────────┐
              │       MEETING ACTION EXTRACTOR       │
              └──────────────────┬───────────────────┘
                                 │
            Meeting Transcripts (text / stdin / batch)
                                 │
              ┌──────────────────▼───────────────────┐
              │  LLM Extraction Engine               │
              │                                      │
              │  • Decisions (who + context)         │
              │  • Action Items (owner + deadline)   │
              │  • Open Questions                    │
              │  • Key Insights / Quotes             │
              │  • Follow-up Meetings                │
              │  • Implicit Commitments              │
              │  + Confidence Scores                 │
              └──────────────────┬───────────────────┘
              ┌──────────────────▼───────────────────┐
              │  Output                              │
              │  • Structured JSON                   │
              │  • Formatted Markdown                │
              │  • HubSpot Tasks (optional)          │
              └──────────────────────────────────────┘
```
---
## Tools
### 1. 🏭 Team Performance Audit (`team_performance_audit.py`)
The "Elon Algorithm" applied to team management. A 5-step framework that questions every role, deletes redundancy, simplifies workflows, accelerates bottlenecks, and flags automation opportunities.
**What it does:**
- Ingests role descriptions, OKRs/KPIs, and output data (CSV or JSON)
- Scores each person on 4 dimensions: output velocity, quality, independence, initiative
- Computes a weighted composite score and assigns A/B/C tier labels
- Runs the 5-step Elon Algorithm via LLM for qualitative organizational analysis
- Generates recommended actions: promote, retain, coach, reassign, exit
- Outputs executive summary + individual scorecards + org-level recommendations
```bash
# Run with JSON input
python3 team_performance_audit.py --input team_data.json --output report.md
# Run with CSV input
python3 team_performance_audit.py --input team_data.csv --output report.md
# JSON output
python3 team_performance_audit.py --input team_data.json --format json --output report.json
# Dry run (quantitative only, no LLM calls)
python3 team_performance_audit.py --input team_data.json --dry-run
# Custom scoring weights
python3 team_performance_audit.py --input team_data.json \
--weights '{"output_velocity":0.4,"quality":0.3,"independence":0.15,"initiative":0.15}'
```
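The `--weights` flag takes a JSON object keyed by dimension. The sketch below shows one way such an override could be validated and merged with the default 30/30/20/20 split. `parse_weights` is a hypothetical helper, and the choice to reject unknown keys and rescale the result to sum to 1.0 is an assumption. The actual script may handle overrides differently.

```python
import json

# Defaults from the Scoring Dimensions table; keys match the --weights example.
DEFAULT_WEIGHTS = {
    "output_velocity": 0.30,
    "quality": 0.30,
    "independence": 0.20,
    "initiative": 0.20,
}


def parse_weights(raw: str = "") -> dict[str, float]:
    """Merge a --weights JSON string over the defaults (hypothetical helper).

    Unknown dimensions and negative weights are rejected, and the merged
    weights are rescaled so they sum to 1.0.
    """
    weights = dict(DEFAULT_WEIGHTS)
    if raw:
        overrides = json.loads(raw)
        unknown = set(overrides) - set(DEFAULT_WEIGHTS)
        if unknown:
            raise ValueError(f"Unknown dimensions: {sorted(unknown)}")
        for key, value in overrides.items():
            if value < 0:
                raise ValueError(f"Weight for {key} must be non-negative")
            weights[key] = float(value)
    total = sum(weights.values())
    if total == 0:
        raise ValueError("At least one weight must be positive")
    return {key: value / total for key, value in weights.items()}
```

Rescaling lets a partial override such as `'{"initiative": 0.4}'` work without the caller re-balancing the other three weights.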
**JSON Input Format:**
```json
{
  "team_members": [
    {
      "name": "Alice Chen",
      "role": "Senior Engineer",
      "role_description": "Owns backend API development",
      "okrs": [
        {"objective": "Reduce API latency", "key_result": "P95 < 200ms", "progress": 0.85}
      ],
      "metrics": {
        "tasks_completed": 47,
        "tasks_assigned": 52,
        "avg_completion_days": 3.2,
        "quality_score": 92,
        "peer_feedback_score": 4.5,
        "initiatives_proposed": 3,
        "initiatives_shipped": 2
      },
      "deliverables": [
        {"name": "API v2 Migration", "status": "completed", "date": "2024-02-15"}
      ]
    }
  ],
  "org_context": {
    "company_goals": ["Ship v3 by Q2", "Reduce infra costs 30%"],
    "team_size": 12,
    "evaluation_period": "Q1 2024"
  }
}
```
**CSV Input Format:**
```csv
name,role,tasks_completed,tasks_assigned,avg_completion_days,quality_score,peer_feedback_score,initiatives_proposed,initiatives_shipped
Alice Chen,Senior Engineer,47,52,3.2,92,4.5,3,2
Bob Park,Junior Dev,28,40,5.1,68,3.2,0,0
```
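The CSV format is flat, so each row has to be mapped onto the per-member shape used by the JSON input. The sketch below shows one plausible mapping: `name` and `role` stay as identity fields, and every other column becomes a numeric entry under `metrics`. `load_team_csv` is a hypothetical name, and the int/float coercion rule is an assumption.

```python
import csv
from typing import Iterable, Union

# Columns that describe the person rather than their metrics.
IDENTITY_COLUMNS = ("name", "role")


def _to_number(value: str) -> Union[int, float]:
    """Parse '47' as an int and '3.2' as a float."""
    value = value.strip()
    return float(value) if "." in value else int(value)


def load_team_csv(lines: Iterable[str]) -> dict:
    """Convert CSV rows into the JSON input shape (hypothetical helper)."""
    members = []
    for row in csv.DictReader(lines):
        metrics = {
            key: _to_number(value)
            for key, value in row.items()
            # Skip identity columns, overflow fields (key None) and empty cells.
            if key is not None and key not in IDENTITY_COLUMNS and value not in (None, "")
        }
        members.append({"name": row["name"], "role": row["role"], "metrics": metrics})
    return {"team_members": members}
```

Usage: `with open("team_data.csv", newline="") as f: data = load_team_csv(f)`. Because CSV rows have no `okrs` or `deliverables` columns, those fields are simply absent for CSV-sourced members.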
**Scoring Dimensions:**
| Dimension | Weight | What It Measures |
|-----------|--------|-----------------|
| Output Velocity | 30% | Task completion rate + speed |
| Quality | 30% | Deliverable quality + peer feedback |
| Independence | 20% | Self-direction, low management overhead |
| Initiative | 20% | Proactive contributions beyond assigned work |
**Tier Labels:**
| Tier | Score | Meaning |
|------|-------|---------|
| 🟢 A-Player | 80+ | Top performer. Promote or retain aggressively. |
| 🟡 B-Player | 55-79 | Solid contributor. Coach to A or maintain. |
| 🔴 C-Player | <55 | Underperforming. Reassign, PIP, or exit. |
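Once each dimension has a score, the composite and tier follow directly from the two tables above. The sketch below assumes every dimension is already on a 0-100 scale; the README does not say how raw metrics are normalized. The function names and the tier-to-action strings are hypothetical paraphrases of the Tier Labels table.

```python
# Default weights from the Scoring Dimensions table.
DEFAULT_WEIGHTS = {
    "output_velocity": 0.30,
    "quality": 0.30,
    "independence": 0.20,
    "initiative": 0.20,
}

# Suggested next step per tier, paraphrasing the Tier Labels table.
TIER_ACTIONS = {"A": "promote / retain", "B": "coach", "C": "reassign / PIP / exit"}


def composite_score(scores: dict[str, float], weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of 0-100 dimension scores; weights are assumed to sum to 1."""
    return sum(scores[dim] * weight for dim, weight in weights.items())


def tier(score: float) -> str:
    """Apply the A (80+), B (55-79), C (<55) thresholds."""
    if score >= 80:
        return "A"
    if score >= 55:
        return "B"
    return "C"


def stack_rank(team: dict[str, dict[str, float]]) -> list[tuple[str, float, str, str]]:
    """Return (name, composite, tier, action) rows, highest composite first."""
    rows = []
    for name, scores in team.items():
        score = round(composite_score(scores), 1)
        label = tier(score)
        rows.append((name, score, label, TIER_ACTIONS[label]))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

For example, dimension scores of 90/92/85/70 give 0.3·90 + 0.3·92 + 0.2·85 + 0.2·70 = 85.6, which is an A.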
---
### 2. 📋 Meeting Action Extractor (`meeting_action_extractor.py`)
Never lose an action item again. Feed it meeting transcripts; get structured decisions, action items, follow-ups, and insights.
**What it does:**
- Extracts decisions with who made them and context
- Identifies action items with owner, deadline, and priority
- Catches implicit commitments ("I'll take care of that" → action item)
- Flags open questions and unresolved items
- Pulls out key insights and quotable moments
- Identifies follow-up meetings needed
- Assigns confidence scores from 0.0 to 1.0 (1.0 = explicitly stated, 0.8 = strongly implied, 0.5-0.7 = inferred, below 0.5 = uncertain)
- Supports batch processing of entire transcript directories
- Optional HubSpot integration to push action items as tasks
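
The confidence bands come from the extractor's system prompt: 1.0 explicit, 0.8 strongly implied, 0.5-0.7 inferred, below 0.5 uncertain. Below is a hypothetical helper (not part of the script) that labels scores with those bands and pulls out items worth a human check. The 0.8 default review threshold is an arbitrary choice. Values that fall between the prompt's bands (such as 0.75) go to the lower band.

```python
def confidence_band(confidence: float) -> str:
    """Label a 0-1 confidence score using the bands in the extractor's system prompt."""
    if confidence >= 1.0:
        return "explicit"
    if confidence >= 0.8:
        return "strongly implied"
    if confidence >= 0.5:
        return "inferred"
    return "uncertain"


def needs_review(action_items: list, threshold: float = 0.8) -> list:
    """Return action items whose confidence falls below a review threshold."""
    return [item for item in action_items if item.get("confidence", 0.0) < threshold]
```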
```bash
# Single transcript → markdown
python3 meeting_action_extractor.py --transcript meeting.txt
# Single transcript → JSON
python3 meeting_action_extractor.py --transcript meeting.txt --format json
# Read from stdin (paste or pipe)
cat meeting.txt | python3 meeting_action_extractor.py --stdin
# Batch process a directory
python3 meeting_action_extractor.py --batch ./transcripts/ --output ./actions/
# Push action items to HubSpot
python3 meeting_action_extractor.py --transcript meeting.txt --push-hubspot
# Dry run (no LLM calls; shows what would be processed)
python3 meeting_action_extractor.py --transcript meeting.txt --dry-run
```
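
In batch mode the script globs `*.txt`, `*.md`, and `*.json` at the top level of the input directory and sorts the paths. It writes each result to `--output` as `<stem>_actions.md`, or as `<stem>_actions.json` with `--format json`. `--dry-run` lists the inputs but not the output names. The hypothetical `planned_outputs` helper below reproduces that mapping so you can preview what a run will write.

```python
import glob
import os
from pathlib import Path


def planned_outputs(input_dir: str, output_dir: str, fmt: str = "markdown") -> list:
    """Pair each transcript with the output path batch mode writes for it."""
    ext = ".md" if fmt == "markdown" else ".json"
    files = []
    for pattern in ("*.txt", "*.md", "*.json"):  # same patterns as --batch
        files.extend(glob.glob(os.path.join(input_dir, pattern)))
    files.sort()
    return [(f, os.path.join(output_dir, Path(f).stem + f"_actions{ext}")) for f in files]
```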
**Example Output (Markdown):**
```markdown
## Action Items
1. 🔴 **Finalize Q2 budget proposal**
   - Owner: **Sarah**
   - Deadline: Friday March 15
   - Confidence: 95%
   - Source: "Sarah, can you get the Q2 budget finalized by Friday?"
2. 🟡 **Look into the API latency issue** *(implicit)*
   - Owner: **Mike**
   - Deadline: No deadline
   - Confidence: 80%
   - Source: "Yeah, I'll look into that"
```
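
Each action item above is rendered by `format_markdown` in `meeting_action_extractor.py`. The condensed sketch below is a hypothetical stand-alone `render_action`, not the script's API. It shows how one extracted JSON item maps to those lines:

- The priority picks the emoji.
- `is_implicit` adds the *(implicit)* tag.
- A missing deadline becomes "No deadline".
- Confidence prints as a percentage.

The sketch indents sub-bullets three spaces so that Markdown renderers nest them under the numbered item.

```python
PRIORITY_EMOJI = {"high": "🔴", "medium": "🟡", "low": "🟢"}


def render_action(index: int, item: dict) -> str:
    """Render one extracted action item in the layout shown above."""
    implicit = " *(implicit)*" if item.get("is_implicit") else ""
    emoji = PRIORITY_EMOJI.get(item.get("priority", "medium"), "")
    lines = [
        f"{index}. {emoji} **{item.get('action', 'Unknown')}**{implicit}",
        f"   - Owner: **{item.get('owner', 'Unassigned')}**",
        f"   - Deadline: {item.get('deadline') or 'No deadline'}",
        f"   - Confidence: {item.get('confidence', 0):.0%}",
    ]
    if item.get("source_quote"):
        lines.append(f'   - Source: "{item["source_quote"]}"')
    return "\n".join(lines)
```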
---
## Quick Start
### 1. Clone and install
```bash
git clone https://github.com/singlegrain/ai-marketing-skills.git
cd ai-marketing-skills/team-ops
pip install -r requirements.txt
```
### 2. Configure environment
```bash
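# Set the API key for at least one LLM provider
export ANTHROPIC_API_KEY="sk-ant-..."
# OR (OpenAI also needs LLM_PROVIDER="openai", because the default provider is anthropic)
export OPENAI_API_KEY="sk-..."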
# Optional: HubSpot for meeting action push
export HUBSPOT_API_KEY="pat-..."
# Optional: Override LLM settings
export LLM_PROVIDER="anthropic" # or "openai"
export LLM_MODEL="claude-sonnet-4-20250514" # or "gpt-4o"
```
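
Only `LLM_PROVIDER` (default `anthropic`) selects the provider. Both scripts read only the selected provider's key, and if that key is missing they fall back to their no-LLM placeholder instead of trying the other provider. The hypothetical `resolve_llm` helper below mirrors those defaults so you can check what a given environment resolves to. One difference: it raises on an unknown provider, whereas the scripts print a warning.

```python
import os
from typing import Mapping, Optional, Tuple

DEFAULT_MODELS = {"anthropic": "claude-sonnet-4-20250514", "openai": "gpt-4o"}
API_KEY_VARS = {"anthropic": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY"}


def resolve_llm(env: Mapping[str, str] = os.environ) -> Tuple[str, str, Optional[str]]:
    """Return (provider, model, api_key) using the same defaults as the scripts."""
    provider = env.get("LLM_PROVIDER", "anthropic").lower()
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")
    model = env.get("LLM_MODEL") or DEFAULT_MODELS[provider]
    # Only the selected provider's key is consulted; there is no cross-provider fallback.
    api_key = env.get(API_KEY_VARS[provider]) or None
    return provider, model, api_key
```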
### 3. Test with dry runs
```bash
# Test performance audit (quantitative scoring only)
python3 team_performance_audit.py --input sample_team.json --dry-run
# Test meeting extractor
python3 meeting_action_extractor.py --transcript sample_meeting.txt --dry-run
```
### 4. Run for real
```bash
# Full team audit
python3 team_performance_audit.py --input team_data.json --output q1_audit.md
# Extract actions from today's meeting
python3 meeting_action_extractor.py --transcript standup.txt --format markdown
# Batch process last week's meetings
python3 meeting_action_extractor.py --batch ./weekly_transcripts/ --output ./weekly_actions/
```
---
## Integrations
| Tool | Required | Used By |
|------|----------|---------|
| [Anthropic](https://anthropic.com) | One LLM required | Both tools |
| [OpenAI](https://openai.com) | One LLM required | Both tools |
| [HubSpot](https://hubspot.com) | Optional | Meeting Extractor (task push) |
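
According to HubSpot's CRM Tasks API documentation, `hs_timestamp` (the task due date) is required. It accepts Unix milliseconds or a UTC timestamp. The extractor's deadlines arrive from the LLM as free text (e.g. "Friday March 15"), so they need converting before a push. This is a hedged sketch that assumes ISO-8601 deadlines (`YYYY-MM-DD`) and treats anything else as "due now". The `hubspot_due_ms` name is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def hubspot_due_ms(deadline: Optional[str], default: Optional[datetime] = None) -> int:
    """Convert an ISO-8601 deadline such as '2024-03-15' into Unix milliseconds."""
    when = None
    if deadline:
        try:
            when = datetime.fromisoformat(deadline.strip())
        except ValueError:
            when = None  # free text like "Friday March 15" is not parsed by this sketch
    if when is None:
        when = default or datetime.now(timezone.utc)
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # assumption: naive dates are UTC
    return int(when.timestamp() * 1000)
```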
---
## File Structure
```
team-ops/
├── README.md                     # This file
├── SKILL.md                      # Claude Code skill definition
├── requirements.txt              # Python dependencies
├── team_performance_audit.py     # Elon Algorithm team audit
└── meeting_action_extractor.py   # Meeting transcript → action items
```
---
## How It Works Together
1. **Team Performance Audit** gives you the big picture: who's performing, who isn't, where the org is inefficient
2. **Meeting Action Extractor** keeps the day-to-day moving: every meeting produces clear, tracked action items
3. Together: audit identifies what needs to change, meetings track the execution of those changes
Run the audit quarterly. Run the extractor after every meeting. Watch accountability compound.
---
<div align="center">
**🧠 [Want these built and managed for you? →](https://singlebrain.com/?utm_source=github&utm_medium=skill_repo&utm_campaign=ai_marketing_skills)**
*This is how we build agents at [Single Brain](https://singlebrain.com/?utm_source=github&utm_medium=skill_repo&utm_campaign=ai_marketing_skills) for our clients.*
[Single Grain](https://www.singlegrain.com/?utm_source=github&utm_medium=skill_repo&utm_campaign=ai_marketing_skills) · our marketing agency
📬 **[Level up your marketing with 14,000+ marketers and founders →](https://levelingup.beehiiv.com/subscribe)** *(free)*
</div>

93
team-ops/SKILL.md Normal file

@ -0,0 +1,93 @@
# AI Team Ops
AI-powered team performance analysis and meeting intelligence: ruthless performance audits using the "Elon Algorithm" + automatic extraction of action items, decisions, and follow-ups from meeting transcripts.
## When to Use
Use this skill when:
- Evaluating team performance against OKRs/KPIs with a structured framework
- Stack ranking team members to identify A/B/C players
- Finding redundant roles, bottlenecks, and automation opportunities in your org
- Extracting action items and decisions from meeting transcripts
- Processing batch meeting notes into structured follow-up lists
- Pushing meeting action items to CRM (HubSpot) as tasks
## Tools
### Team Performance
| Script | Purpose | Key Command |
|--------|---------|-------------|
| `team_performance_audit.py` | Elon Algorithm: 5-step team audit + stack rank + scorecards | `python3 team_performance_audit.py --input team_data.json --output report.md` |
### Meeting Intelligence
| Script | Purpose | Key Command |
|--------|---------|-------------|
| `meeting_action_extractor.py` | Extract decisions, actions, follow-ups from transcripts | `python3 meeting_action_extractor.py --transcript meeting.txt --format markdown` |
## Configuration
All scripts read their configuration from environment variables via `os.getenv`. They do not load a `.env` file themselves, so if you keep your values in `.env`, load it into the environment before running them.
### Required Environment Variables (set at least one)
- `ANTHROPIC_API_KEY`: Anthropic API key (Claude), used when `LLM_PROVIDER` is `anthropic` (the default)
- `OPENAI_API_KEY`: OpenAI API key, used only when `LLM_PROVIDER` is set to `openai`
### Optional Environment Variables
- `HUBSPOT_API_KEY`: HubSpot private app token (for pushing meeting action items as tasks)
- `LLM_PROVIDER`: `anthropic` (default) or `openai`
- `LLM_MODEL`: model name override (defaults to `claude-sonnet-4-20250514` for Anthropic or `gpt-4o` for OpenAI)
## Data Flow
```
Role Descriptions + OKRs + Output Data (CSV/JSON)
                 │
                 ▼
┌──────────────────────────────────┐
│ team_performance_audit.py        │
│ 5-Step Elon Algorithm:           │
│   1. Question requirements       │
│   2. Delete redundancies         │
│   3. Simplify workflows          │
│   4. Accelerate bottlenecks      │
│   5. Automate what's possible    │
│                                  │
│ + Score: velocity, quality,      │
│   independence, initiative       │
│ + Stack rank: A/B/C players      │
│ + Actions: promote/coach/exit    │
└──────────────────────────────────┘
                 │
                 ▼
Executive Summary + Individual Scorecards + Org Recommendations

Meeting Transcripts (text files or stdin)
                 │
                 ▼
┌──────────────────────────────────┐
│ meeting_action_extractor.py      │
│ Extract:                         │
│   • Decisions (who + context)    │
│   • Action items (owner +        │
│     deadline + priority)         │
│   • Open questions               │
│   • Key insights / quotes        │
│   • Follow-up meetings needed    │
│   • Implicit commitments         │
│ + Confidence scores              │
└──────────────────────────────────┘
                 │
                 ▼
Structured JSON / Markdown + Optional CRM Push
```
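
The extractor spells out its JSON schema in the system prompt, but a model can still omit keys, return confidences as strings, or invent priority values. Below is a hypothetical post-processing step (not part of the script) that normalizes a response before formatting or CRM push. It assumes the response parses to a JSON object.

```python
import json

# List-valued top-level keys requested by the extractor's system prompt.
LIST_KEYS = ("attendees", "decisions", "action_items", "open_questions",
             "key_insights", "follow_up_meetings")
PRIORITIES = ("high", "medium", "low")


def normalize_extraction(raw: str) -> dict:
    """Parse an LLM response and coerce it toward the extraction schema."""
    data = json.loads(raw)
    for key in LIST_KEYS:
        if not isinstance(data.get(key), list):
            data[key] = []  # a missing or malformed section becomes empty
    for item in data["action_items"]:
        try:
            confidence = float(item.get("confidence", 0.0))
        except (TypeError, ValueError):
            confidence = 0.0  # null or non-numeric confidence
        item["confidence"] = min(max(confidence, 0.0), 1.0)  # clamp to 0-1
        if item.get("priority") not in PRIORITIES:
            item["priority"] = "medium"  # format_markdown also sorts unknown priorities as medium
    return data
```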
## Dependencies
- Python 3.9+
- `anthropic` or `openai` (for LLM-powered analysis)
- `requests` (for optional HubSpot integration)


@ -0,0 +1,666 @@
#!/usr/bin/env python3
"""
Meeting-to-Action Extractor

Takes meeting transcripts and extracts structured action items, decisions,
follow-ups, and insights using LLM analysis.

Usage:
    # Single transcript
    python3 meeting_action_extractor.py --transcript meeting.txt

    # Output as JSON
    python3 meeting_action_extractor.py --transcript meeting.txt --format json

    # Output as markdown (default)
    python3 meeting_action_extractor.py --transcript meeting.txt --format markdown

    # Batch mode: process a directory of transcripts
    python3 meeting_action_extractor.py --batch ./transcripts/ --output ./actions/

    # Read from stdin (pipe or paste)
    cat meeting.txt | python3 meeting_action_extractor.py --stdin

    # Push action items to HubSpot as tasks
    python3 meeting_action_extractor.py --transcript meeting.txt --push-hubspot

    # Dry run (no LLM calls, shows what would be processed)
    python3 meeting_action_extractor.py --transcript meeting.txt --dry-run
"""

import argparse
import glob
import json
import os
import sys
from datetime import datetime
from pathlib import Path
from typing import Optional


# ---------------------------------------------------------------------------
# LLM Integration
# ---------------------------------------------------------------------------

EXTRACTION_SYSTEM_PROMPT = """You are an expert meeting analyst. Your job is to extract structured information from meeting transcripts with high accuracy.

You must return ONLY valid JSON (no markdown, no explanation) matching this exact schema:
{
  "meeting_title": "string: inferred from context",
  "meeting_date": "string: date if mentioned, else 'Unknown'",
  "attendees": ["string: names mentioned as present"],
  "decisions": [
    {
      "decision": "string: what was decided",
      "made_by": "string: who made or drove the decision",
      "context": "string: brief context/reasoning",
      "confidence": 0.0-1.0
    }
  ],
  "action_items": [
    {
      "action": "string: specific task to be done",
      "owner": "string: person responsible",
      "deadline": "string: deadline if mentioned, else null",
      "priority": "high|medium|low",
      "is_implicit": false,
      "source_quote": "string: the relevant quote from transcript",
      "confidence": 0.0-1.0
    }
  ],
  "open_questions": [
    {
      "question": "string: unresolved question or topic",
      "raised_by": "string: who raised it, if clear",
      "context": "string: brief context",
      "confidence": 0.0-1.0
    }
  ],
  "key_insights": [
    {
      "insight": "string: notable observation, data point, or quotable moment",
      "speaker": "string: who said it",
      "quote": "string: direct quote if available",
      "confidence": 0.0-1.0
    }
  ],
  "follow_up_meetings": [
    {
      "topic": "string: what needs follow-up discussion",
      "suggested_attendees": ["string"],
      "urgency": "high|medium|low",
      "confidence": 0.0-1.0
    }
  ]
}

RULES:
- Detect implicit commitments. Phrases like "I'll handle that", "let me look into it", "I can take care of that", "we should probably..." are action items.
- Assign confidence scores: 1.0 = explicitly stated, 0.8 = strongly implied, 0.5-0.7 = inferred from context, <0.5 = uncertain.
- For priority: high = mentioned as urgent/blocking/deadline-sensitive. medium = important but not blocking. low = nice-to-have or background task.
- If someone says "I'll do X by Friday" that's an action item with owner and deadline.
- If a question is asked and not answered in the transcript, it's an open question.
- Be exhaustive. Missing an action item is worse than including a low-confidence one."""


def call_llm(prompt: str, system_prompt: str = "") -> str:
    """
    Call the configured LLM provider.
    Set LLM_PROVIDER to 'anthropic' or 'openai'.
    """
    provider = os.getenv("LLM_PROVIDER", "anthropic").lower()
    model = os.getenv("LLM_MODEL", "")

    if provider == "anthropic":
        api_key = os.getenv("ANTHROPIC_API_KEY", "")
        if not api_key:
            return _fallback_extraction()
        try:
            import anthropic

            client = anthropic.Anthropic(api_key=api_key)
            message = client.messages.create(
                model=model or "claude-sonnet-4-20250514",
                max_tokens=4096,
                system=system_prompt,
                messages=[{"role": "user", "content": prompt}],
            )
            return message.content[0].text
        except ImportError:
            print("Warning: 'anthropic' package not installed. Using fallback.", file=sys.stderr)
            return _fallback_extraction()
        except Exception as e:
            print(f"Warning: Anthropic API error: {e}. Using fallback.", file=sys.stderr)
            return _fallback_extraction()

    elif provider == "openai":
        api_key = os.getenv("OPENAI_API_KEY", "")
        if not api_key:
            return _fallback_extraction()
        try:
            import openai

            client = openai.OpenAI(api_key=api_key)
            response = client.chat.completions.create(
                model=model or "gpt-4o",
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": prompt},
                ],
                max_tokens=4096,
                response_format={"type": "json_object"},
            )
            return response.choices[0].message.content
        except ImportError:
            print("Warning: 'openai' package not installed. Using fallback.", file=sys.stderr)
            return _fallback_extraction()
        except Exception as e:
            print(f"Warning: OpenAI API error: {e}. Using fallback.", file=sys.stderr)
            return _fallback_extraction()

    else:
        print(f"Warning: Unknown LLM provider '{provider}'.", file=sys.stderr)
        return _fallback_extraction()


def _fallback_extraction() -> str:
    """Return a placeholder when no LLM is available."""
    return json.dumps({
        "meeting_title": "Unknown (LLM unavailable)",
        "meeting_date": "Unknown",
        "attendees": [],
        "decisions": [],
        "action_items": [],
        "open_questions": [],
        "key_insights": [],
        "follow_up_meetings": [],
        "_error": (
            "LLM unavailable (missing API key, missing package, or API error). "
            "Set ANTHROPIC_API_KEY, or OPENAI_API_KEY together with LLM_PROVIDER=openai."
        ),
    })


# ---------------------------------------------------------------------------
# Extraction
# ---------------------------------------------------------------------------

def _parse_json_response(raw_response: str) -> dict:
    """
    Parse the LLM response as JSON, tolerating a markdown code fence around it.
    Returns an empty extraction with an _error field if nothing parses.
    """
    candidates = [raw_response]
    if "```" in raw_response:
        # Take the text between the first pair of fences and drop an optional "json" tag.
        fenced = raw_response.split("```")[1]
        if fenced.startswith("json"):
            fenced = fenced[len("json"):]
        candidates.append(fenced.strip())

    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed

    print("Error: Could not parse LLM response as JSON.", file=sys.stderr)
    return {
        "meeting_title": "Parse Error",
        "decisions": [],
        "action_items": [],
        "open_questions": [],
        "key_insights": [],
        "follow_up_meetings": [],
        "_error": f"Failed to parse LLM response. Raw: {raw_response[:500]}",
    }


def extract_from_transcript(transcript: str) -> dict:
    """
    Send a transcript to the LLM and parse the structured extraction.
    """
    # Truncate extremely long transcripts to avoid token limits
    max_chars = 100_000  # ~25k tokens
    if len(transcript) > max_chars:
        print(
            f"Warning: Transcript truncated from {len(transcript)} to {max_chars} chars.",
            file=sys.stderr,
        )
        transcript = transcript[:max_chars] + "\n\n[TRANSCRIPT TRUNCATED]"

    prompt = f"""Extract all decisions, action items, open questions, key insights, and follow-up meetings from this meeting transcript.
Return ONLY valid JSON matching the schema in your instructions.
---
TRANSCRIPT:
{transcript}
---"""

    # Guard against a None response body before parsing
    raw_response = call_llm(prompt, system_prompt=EXTRACTION_SYSTEM_PROMPT) or ""
    return _parse_json_response(raw_response)


# ---------------------------------------------------------------------------
# Output Formatting
# ---------------------------------------------------------------------------

def format_markdown(extraction: dict, source_file: Optional[str] = None) -> str:
    """Format extraction results as readable markdown."""
    lines = []
    title = extraction.get("meeting_title", "Meeting Notes")
    date = extraction.get("meeting_date", "Unknown")

    lines.extend([
        f"# {title}",
        "",
        f"**Date:** {date}",
        f"**Extracted:** {datetime.now().strftime('%Y-%m-%d %H:%M')}",
    ])
    if source_file:
        lines.append(f"**Source:** {source_file}")
    attendees = extraction.get("attendees", [])
    if attendees:
        lines.append(f"**Attendees:** {', '.join(attendees)}")
    lines.append("")

    # --- Decisions ---
    decisions = extraction.get("decisions", [])
    if decisions:
        lines.extend(["## Decisions Made", ""])
        for i, d in enumerate(decisions, 1):
            conf = d.get("confidence", 0)
            conf_bar = "🟢" if conf >= 0.8 else "🟡" if conf >= 0.5 else "🔴"
            lines.append(f"{i}. **{d.get('decision', 'Unknown')}**")
            lines.append(f"   - Made by: {d.get('made_by', 'Unknown')}")
            lines.append(f"   - Context: {d.get('context', 'N/A')}")
            lines.append(f"   - Confidence: {conf_bar} {conf:.0%}")
            lines.append("")

    # --- Action Items ---
    actions = extraction.get("action_items", [])
    if actions:
        lines.extend(["## Action Items", ""])
        # Sort by priority
        priority_order = {"high": 0, "medium": 1, "low": 2}
        actions_sorted = sorted(actions, key=lambda a: priority_order.get(a.get("priority", "medium"), 1))
        for i, a in enumerate(actions_sorted, 1):
            priority = a.get("priority", "medium")
            priority_emoji = {"high": "🔴", "medium": "🟡", "low": "🟢"}.get(priority, "")
            implicit_tag = " *(implicit)*" if a.get("is_implicit") else ""
            deadline = a.get("deadline") or "No deadline"
            conf = a.get("confidence", 0)
            lines.append(f"{i}. {priority_emoji} **{a.get('action', 'Unknown')}**{implicit_tag}")
            lines.append(f"   - Owner: **{a.get('owner', 'Unassigned')}**")
            lines.append(f"   - Deadline: {deadline}")
            lines.append(f"   - Confidence: {conf:.0%}")
            if a.get("source_quote"):
                lines.append(f'   - Source: "{a["source_quote"]}"')
            lines.append("")

    # --- Open Questions ---
    questions = extraction.get("open_questions", [])
    if questions:
        lines.extend(["## Open Questions", ""])
        for i, q in enumerate(questions, 1):
            lines.append(f"{i}. **{q.get('question', 'Unknown')}**")
            if q.get("raised_by"):
                lines.append(f"   - Raised by: {q['raised_by']}")
            if q.get("context"):
                lines.append(f"   - Context: {q['context']}")
            lines.append("")

    # --- Key Insights ---
    insights = extraction.get("key_insights", [])
    if insights:
        lines.extend(["## Key Insights", ""])
        for i, ins in enumerate(insights, 1):
            lines.append(f"{i}. **{ins.get('insight', 'Unknown')}**")
            if ins.get("speaker"):
                lines.append(f"   - Speaker: {ins['speaker']}")
            if ins.get("quote"):
                lines.append(f'   - Quote: "{ins["quote"]}"')
            lines.append("")

    # --- Follow-up Meetings ---
    followups = extraction.get("follow_up_meetings", [])
    if followups:
        lines.extend(["## Follow-up Meetings Needed", ""])
        for i, fu in enumerate(followups, 1):
            urgency_emoji = {"high": "🔴", "medium": "🟡", "low": "🟢"}.get(fu.get("urgency", "medium"), "")
            lines.append(f"{i}. {urgency_emoji} **{fu.get('topic', 'Unknown')}**")
            attendees_list = fu.get("suggested_attendees", [])
            if attendees_list:
                lines.append(f"   - Attendees: {', '.join(attendees_list)}")
            lines.append("")

    # --- Summary Stats ---
    lines.extend([
        "---",
        "",
        "### Extraction Summary",
        f"- Decisions: {len(decisions)}",
        f"- Action Items: {len(actions)} ({sum(1 for a in actions if a.get('priority') == 'high')} high priority)",
        f"- Open Questions: {len(questions)}",
        f"- Key Insights: {len(insights)}",
        f"- Follow-up Meetings: {len(followups)}",
        "",
        "*Generated by Meeting-to-Action Extractor*",
    ])

    return "\n".join(lines)


# ---------------------------------------------------------------------------
# HubSpot Integration
# ---------------------------------------------------------------------------

def _hubspot_due_timestamp(deadline) -> str:
    """
    Convert a deadline into Unix milliseconds (as a string) for HubSpot's hs_timestamp.
    Only ISO-8601 dates such as "2024-03-15" are parsed. Free-text deadlines such as
    "Friday" fall back to the current time; the original wording stays in the task body.
    """
    from datetime import timezone

    if deadline:
        try:
            due = datetime.fromisoformat(str(deadline).strip())
            if due.tzinfo is None:
                due = due.replace(tzinfo=timezone.utc)  # assume UTC for naive dates
            return str(int(due.timestamp() * 1000))
        except ValueError:
            pass
    return str(int(datetime.now(timezone.utc).timestamp() * 1000))


def push_to_hubspot(extraction: dict) -> dict:
    """
    Push action items to HubSpot as tasks.
    Requires HUBSPOT_API_KEY env var.
    Creates a task for each action item with owner, deadline, and priority.
    Returns a summary of created/failed tasks.
    """
    api_key = os.getenv("HUBSPOT_API_KEY", "")
    if not api_key:
        return {
            "success": False,
            "error": "HUBSPOT_API_KEY not set. Cannot push to HubSpot.",
            "created": 0,
            "failed": 0,
        }

    actions = extraction.get("action_items", [])
    if not actions:
        return {"success": True, "created": 0, "failed": 0, "message": "No action items to push."}

    # --- HubSpot Task Creation ---
    # Uses the HubSpot CRM API v3 to create tasks (engagements)
    # Docs: https://developers.hubspot.com/docs/api/crm/tasks
    try:
        import requests  # only imported when actually pushing
    except ImportError:
        return {
            "success": False,
            "error": "'requests' package not installed. Cannot push to HubSpot.",
            "created": 0,
            "failed": len(actions),
        }

    results = {"created": 0, "failed": 0, "errors": []}
    hubspot_url = "https://api.hubapi.com/crm/v3/objects/tasks"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # Map priority to HubSpot priority values
    priority_map = {"high": "HIGH", "medium": "MEDIUM", "low": "LOW"}

    for action in actions:
        # Build the task payload
        task_body = action.get("action", "Meeting action item")
        owner_name = action.get("owner", "Unassigned")
        deadline = action.get("deadline")
        priority = priority_map.get(action.get("priority", "medium"), "MEDIUM")
        meeting_title = extraction.get("meeting_title", "Meeting")
        task_subject = f"[{meeting_title}] {task_body[:100]}"
        payload = {
            "properties": {
                "hs_task_subject": task_subject,
                "hs_task_body": (
                    f"Action: {task_body}\n"
                    f"Owner: {owner_name}\n"
                    f"Deadline (as stated): {deadline or 'None'}\n"
                    f"Source: Meeting transcript extraction\n"
                    f"Confidence: {action.get('confidence', 'N/A')}"
                ),
                "hs_task_status": "NOT_STARTED",
                "hs_task_priority": priority,
                # HubSpot requires hs_timestamp (the task's due date) when creating a task.
                # It expects Unix milliseconds or a UTC timestamp, so free-text LLM
                # deadlines are converted rather than sent as-is.
                "hs_timestamp": _hubspot_due_timestamp(deadline),
            }
        }
        try:
            resp = requests.post(hubspot_url, headers=headers, json=payload, timeout=10)
            if resp.status_code in (200, 201):
                results["created"] += 1
            else:
                results["failed"] += 1
                results["errors"].append(f"Task '{task_body[:50]}': HTTP {resp.status_code}")
        except requests.RequestException as e:
            results["failed"] += 1
            results["errors"].append(f"Task '{task_body[:50]}': {str(e)}")

    results["success"] = results["failed"] == 0
    return results


# ---------------------------------------------------------------------------
# Batch Processing
# ---------------------------------------------------------------------------

def process_batch(directory: str, output_dir: Optional[str], fmt: str, push_hs: bool) -> list[dict]:
    """
    Process all transcript files in a directory.
    Supports .txt, .md, and .json files.
    """
    transcript_files = []
    for ext in ("*.txt", "*.md", "*.json"):
        transcript_files.extend(glob.glob(os.path.join(directory, ext)))
    transcript_files.sort()

    if not transcript_files:
        print(f"No transcript files found in {directory}", file=sys.stderr)
        return []

    print(f"📂 Found {len(transcript_files)} transcripts to process", file=sys.stderr)
    if output_dir:
        os.makedirs(output_dir, exist_ok=True)

    results = []
    for i, filepath in enumerate(transcript_files, 1):
        filename = os.path.basename(filepath)
        print(f"\n[{i}/{len(transcript_files)}] Processing: {filename}", file=sys.stderr)
        with open(filepath, "r") as f:
            transcript = f.read()

        extraction = extract_from_transcript(transcript)
        extraction["_source_file"] = filepath

        if fmt == "markdown":
            output = format_markdown(extraction, source_file=filename)
            ext = ".md"
        else:
            output = json.dumps(extraction, indent=2, default=str)
            ext = ".json"

        if output_dir:
            out_filename = Path(filename).stem + f"_actions{ext}"
            out_path = os.path.join(output_dir, out_filename)
            with open(out_path, "w") as f:
                f.write(output)
            print(f"  ✅ Saved to {out_path}", file=sys.stderr)
        else:
            print(output)
            print("\n" + "=" * 80 + "\n")

        if push_hs:
            hs_result = push_to_hubspot(extraction)
            print(
                f"  📤 HubSpot: {hs_result.get('created', 0)} created, "
                f"{hs_result.get('failed', 0)} failed",
                file=sys.stderr,
            )

        results.append(extraction)

    return results


# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------

def main():
    parser = argparse.ArgumentParser(
        description="Meeting-to-Action Extractor: extract decisions, action items, and follow-ups from meeting transcripts.",
        epilog="Supports single transcripts, stdin, and batch processing of entire directories.",
    )

    # Input source (mutually exclusive)
    input_group = parser.add_mutually_exclusive_group(required=True)
    input_group.add_argument(
        "--transcript", "-t",
        help="Path to a single transcript file (.txt, .md).",
    )
    input_group.add_argument(
        "--batch", "-b",
        help="Directory of transcript files to process in batch.",
    )
    input_group.add_argument(
        "--stdin",
        action="store_true",
        help="Read transcript from stdin.",
    )

    # Output options
    parser.add_argument(
        "--output", "-o",
        help="Output file (single mode) or directory (batch mode).",
    )
    parser.add_argument(
        "--format", "-f",
        choices=["markdown", "json"],
        default="markdown",
        help="Output format (default: markdown).",
    )

    # Integration options
    parser.add_argument(
        "--push-hubspot",
        action="store_true",
        help="Push action items to HubSpot as tasks (requires HUBSPOT_API_KEY).",
    )

    # Execution options
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Show what would be processed without making LLM calls.",
    )

    args = parser.parse_args()

    # --- Single transcript mode ---
    if args.transcript:
        if not os.path.exists(args.transcript):
            print(f"Error: File not found: {args.transcript}", file=sys.stderr)
            sys.exit(1)

        with open(args.transcript, "r") as f:
            transcript = f.read()

        if args.dry_run:
            word_count = len(transcript.split())
            print(f"📄 Would process: {args.transcript} ({word_count} words, {len(transcript)} chars)")
            print(f"   Format: {args.format}")
            print(f"   HubSpot push: {'yes' if args.push_hubspot else 'no'}")
            return

        print(f"📄 Processing: {args.transcript}", file=sys.stderr)
        extraction = extract_from_transcript(transcript)

        if args.format == "markdown":
            output = format_markdown(extraction, source_file=args.transcript)
        else:
            output = json.dumps(extraction, indent=2, default=str)

        if args.output:
            with open(args.output, "w") as f:
                f.write(output)
            print(f"✅ Written to {args.output}", file=sys.stderr)
        else:
            print(output)

        if args.push_hubspot:
            hs_result = push_to_hubspot(extraction)
            print(
                f"📤 HubSpot: {hs_result.get('created', 0)} tasks created, "
                f"{hs_result.get('failed', 0)} failed",
                file=sys.stderr,
            )

        # Print summary to stderr
        actions = extraction.get("action_items", [])
        decisions = extraction.get("decisions", [])
        print(
            f"\n📊 Extracted: {len(decisions)} decisions, {len(actions)} action items "
            f"({sum(1 for a in actions if a.get('priority') == 'high')} high priority)",
            file=sys.stderr,
        )

    # --- Batch mode ---
    elif args.batch:
        if not os.path.isdir(args.batch):
            print(f"Error: Directory not found: {args.batch}", file=sys.stderr)
            sys.exit(1)

        if args.dry_run:
            files = []
            for ext in ("*.txt", "*.md", "*.json"):
                files.extend(glob.glob(os.path.join(args.batch, ext)))
            print(f"📂 Would process {len(files)} files from {args.batch}:")
            for f in sorted(files):
                print(f"   - {os.path.basename(f)}")
            return

        results = process_batch(args.batch, args.output, args.format, args.push_hubspot)

        # Batch summary
        total_actions = sum(len(r.get("action_items", [])) for r in results)
        total_decisions = sum(len(r.get("decisions", [])) for r in results)
        print(
            f"\n📊 Batch complete: {len(results)} transcripts → "
            f"{total_decisions} decisions, {total_actions} action items",
            file=sys.stderr,
        )

    # --- Stdin mode ---
    elif args.stdin:
        if args.dry_run:
            print("📄 Would process transcript from stdin")
            return

        print("📄 Reading transcript from stdin...", file=sys.stderr)
        transcript = sys.stdin.read()
        if not transcript.strip():
            print("Error: Empty input.", file=sys.stderr)
            sys.exit(1)

        extraction = extract_from_transcript(transcript)

        if args.format == "markdown":
            output = format_markdown(extraction)
        else:
            output = json.dumps(extraction, indent=2, default=str)

        if args.output:
            with open(args.output, "w") as f:
                f.write(output)
            print(f"✅ Written to {args.output}", file=sys.stderr)
        else:
            print(output)

        if args.push_hubspot:
            hs_result = push_to_hubspot(extraction)
            print(
                f"📤 HubSpot: {hs_result.get('created', 0)} tasks created",
                file=sys.stderr,
            )


if __name__ == "__main__":
    main()


@ -0,0 +1,6 @@
# Core LLM providers (install at least one)
anthropic>=0.39.0
openai>=1.50.0
# For HubSpot CRM integration (optional)
requests>=2.31.0


@ -0,0 +1,746 @@
#!/usr/bin/env python3
"""
Team Performance Audit: The "Elon Algorithm"

A ruthless, structured team performance evaluation framework.
5-step analysis → individual scorecards → stack rank → recommended actions.

The 5 Steps:
    1. Question every requirement: is this role/task actually necessary?
    2. Delete redundant processes: flag overlap between team members
    3. Simplify: identify overcomplicated workflows
    4. Accelerate: find bottlenecks slowing the team
    5. Automate: flag tasks that AI/automation could handle

Usage:
    # Analyze from JSON input
    python3 team_performance_audit.py --input team_data.json --output report.md

    # Analyze from CSV
    python3 team_performance_audit.py --input team_data.csv --output report.md

    # Dry run (print to stdout, no LLM calls)
    python3 team_performance_audit.py --input team_data.json --dry-run

    # JSON output instead of markdown
    python3 team_performance_audit.py --input team_data.json --format json --output report.json

Input format (JSON):
    {
      "team_members": [
        {
          "name": "Alice Chen",
          "role": "Senior Engineer",
          "role_description": "Owns backend API development and database optimization",
          "okrs": [
            {"objective": "Reduce API latency", "key_result": "P95 < 200ms", "progress": 0.85}
          ],
          "metrics": {
            "tasks_completed": 47,
            "tasks_assigned": 52,
            "avg_completion_days": 3.2,
            "quality_score": 92,
            "peer_feedback_score": 4.5,
            "initiatives_proposed": 3,
            "initiatives_shipped": 2
          },
          "deliverables": [
            {"name": "API v2 Migration", "status": "completed", "date": "2024-02-15"},
            {"name": "DB Index Optimization", "status": "completed", "date": "2024-03-01"}
          ]
        }
      ],
      "org_context": {
        "company_goals": ["Ship v3 by Q2", "Reduce infrastructure costs 30%"],
        "team_size": 12,
        "evaluation_period": "Q1 2024"
      }
    }

Input format (CSV):
    name,role,tasks_completed,tasks_assigned,avg_completion_days,quality_score,peer_feedback_score,initiatives_proposed,initiatives_shipped
    Alice Chen,Senior Engineer,47,52,3.2,92,4.5,3,2
"""
import argparse
import csv
import json
import os
import sys
from datetime import datetime
from typing import Any


# ---------------------------------------------------------------------------
# LLM Integration
# ---------------------------------------------------------------------------

def call_llm(prompt: str, system_prompt: str = "") -> str:
    """
    Call the configured LLM provider for analysis.

    Supports Anthropic (Claude) and OpenAI (GPT-4).
    Set LLM_PROVIDER env var to 'anthropic' or 'openai'.
    Set the corresponding API key env var.

    Returns the LLM response text, or a placeholder if no API key is set.
    """
    provider = os.getenv("LLM_PROVIDER", "anthropic").lower()
    model = os.getenv("LLM_MODEL", "")

    if provider == "anthropic":
        api_key = os.getenv("ANTHROPIC_API_KEY", "")
        if not api_key:
            return _fallback_analysis(prompt)
        # --- Anthropic API call ---
        try:
            import anthropic

            client = anthropic.Anthropic(api_key=api_key)
            message = client.messages.create(
                model=model or "claude-sonnet-4-20250514",
                max_tokens=4096,
                system=system_prompt or "You are an expert organizational analyst and management consultant.",
                messages=[{"role": "user", "content": prompt}],
            )
            return message.content[0].text
        except ImportError:
            print("Warning: 'anthropic' package not installed. Using fallback analysis.", file=sys.stderr)
            return _fallback_analysis(prompt)
        except Exception as e:
            print(f"Warning: Anthropic API error: {e}. Using fallback analysis.", file=sys.stderr)
            return _fallback_analysis(prompt)

    elif provider == "openai":
        api_key = os.getenv("OPENAI_API_KEY", "")
        if not api_key:
            return _fallback_analysis(prompt)
        # --- OpenAI API call ---
        try:
            import openai

            client = openai.OpenAI(api_key=api_key)
            response = client.chat.completions.create(
                model=model or "gpt-4o",
                messages=[
                    {"role": "system", "content": system_prompt or "You are an expert organizational analyst and management consultant."},
                    {"role": "user", "content": prompt},
                ],
                max_tokens=4096,
            )
            return response.choices[0].message.content
        except ImportError:
            print("Warning: 'openai' package not installed. Using fallback analysis.", file=sys.stderr)
            return _fallback_analysis(prompt)
        except Exception as e:
            print(f"Warning: OpenAI API error: {e}. Using fallback analysis.", file=sys.stderr)
            return _fallback_analysis(prompt)

    else:
        print(f"Warning: Unknown LLM provider '{provider}'. Using fallback.", file=sys.stderr)
        return _fallback_analysis(prompt)


def _fallback_analysis(prompt: str) -> str:
    """Fallback when no LLM API is available. Returns a notice."""
    return (
        "[LLM analysis unavailable: set ANTHROPIC_API_KEY, or OPENAI_API_KEY with LLM_PROVIDER=openai]\n"
        "The quantitative scores below are computed locally. "
        "For qualitative analysis (redundancy detection, simplification recommendations, "
        "automation opportunities), configure an LLM provider."
    )
# ---------------------------------------------------------------------------
# Data Loading
# ---------------------------------------------------------------------------
def load_json_input(filepath: str) -> dict:
"""Load team data from a JSON file."""
with open(filepath, "r") as f:
data = json.load(f)
if "team_members" not in data:
raise ValueError("JSON input must contain a 'team_members' array.")
return data
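# Example JSON input (hypothetical names and values). Only the fields this
# script reads are shown, and only "team_members" is required. OKR
# "progress" is expected as a 0-1 fraction; generate_scorecards multiplies
# the average by 100.
# {
#   "team_members": [
#     {
#       "name": "Alice Example",
#       "role": "Engineer",
#       "role_description": "Owns the billing service",
#       "okrs": [{"progress": 0.5}, {"progress": 1.0}],
#       "metrics": {
#         "tasks_completed": 8, "tasks_assigned": 10,
#         "avg_completion_days": 4, "quality_score": 80,
#         "peer_feedback_score": 4.0,
#         "initiatives_proposed": 4, "initiatives_shipped": 2
#       },
#       "deliverables": [
#         {"name": "Invoice export", "status": "completed", "date": "2026-03-15"}
#       ]
#     }
#   ],
#   "org_context": {
#     "company_goals": ["Grow net revenue retention"],
#     "evaluation_period": "Q1 2026"
#   }
# }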
def load_csv_input(filepath: str) -> dict:
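"""
Load team data from a CSV file.
Expected columns: name, role, tasks_completed, tasks_assigned,
avg_completion_days, quality_score, peer_feedback_score,
initiatives_proposed, initiatives_shipped
Optional column: role_description. OKRs and deliverables cannot be
expressed in CSV and are loaded as empty lists.
"""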
team_members = []
with open(filepath, "r", newline="") as f:  # newline="" as the csv module docs recommend
reader = csv.DictReader(f)
for row in reader:
member = {
"name": row.get("name") or "Unknown",
"role": row.get("role") or "Unknown",
"role_description": row.get("role_description") or "",
"okrs": [],
"metrics": {
# Blank cells fall back to 0; int(float(...)) also accepts values like "3.0".
"tasks_completed": int(float(row.get("tasks_completed") or 0)),
"tasks_assigned": int(float(row.get("tasks_assigned") or 0)),
"avg_completion_days": float(row.get("avg_completion_days") or 0),
"quality_score": float(row.get("quality_score") or 0),
"peer_feedback_score": float(row.get("peer_feedback_score") or 0),
"initiatives_proposed": int(float(row.get("initiatives_proposed") or 0)),
"initiatives_shipped": int(float(row.get("initiatives_shipped") or 0)),
},
"deliverables": [],
}
team_members.append(member)
return {"team_members": team_members, "org_context": {}}
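# Example CSV input (hypothetical values). role_description is optional;
# org_context is not read from CSV, so the Elon Algorithm prompt falls back
# to "Not specified" goals and "Current quarter".
# name,role,role_description,tasks_completed,tasks_assigned,avg_completion_days,quality_score,peer_feedback_score,initiatives_proposed,initiatives_shipped
# Alice Example,Engineer,Owns the billing service,8,10,4,80,4.0,4,2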
def load_input(filepath: str) -> dict:
"""Load team data from JSON or CSV based on file extension."""
if filepath.lower().endswith(".csv"):
return load_csv_input(filepath)
else:
return load_json_input(filepath)
# ---------------------------------------------------------------------------
# Scoring Engine
# ---------------------------------------------------------------------------
# Weight configuration for the composite score
SCORE_WEIGHTS = {
"output_velocity": 0.30, # Speed and throughput
"quality": 0.30, # Quality of deliverables
"independence": 0.20, # Self-direction, low management overhead
"initiative": 0.20, # Proactive contributions beyond assigned work
}
# Tier thresholds
TIER_THRESHOLDS = {
"A": 80, # A-player: top performers, promote/retain
"B": 55, # B-player: solid contributors, coach to A or maintain
"C": 0, # C-player: underperforming, reassign or exit
}
def compute_output_velocity(metrics: dict) -> float:
"""
Score output velocity (0-100).
Factors:
- Task completion rate (completed / assigned)
- Speed (inverse of avg_completion_days, normalized)
"""
completed = metrics.get("tasks_completed", 0)
assigned = metrics.get("tasks_assigned", 1) # avoid division by zero
avg_days = metrics.get("avg_completion_days", 5)
# Completion rate: 0-60 points
completion_rate = min(completed / max(assigned, 1), 1.0)
completion_score = completion_rate * 60
# Speed: 0-40 points, linear. <=1 day earns the full 40, >=10 days earns 0.
if avg_days <= 1:
speed_score = 40
elif avg_days >= 10:
speed_score = 0
else:
speed_score = max(0, 40 * (1 - (avg_days - 1) / 9))
return round(completion_score + speed_score, 1)
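# Worked example (hypothetical metrics): 8 of 10 tasks done, 4-day average.
#   completion: min(8 / 10, 1.0) * 60   = 48.0
#   speed:      40 * (1 - (4 - 1) / 9)  = 26.67
#   compute_output_velocity(...)        -> 74.7
# Caveat: an avg_completion_days of 0 (load_csv_input's default when the
# column is absent) falls in the "<= 1 day" branch and earns all 40 speed points.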
def compute_quality(metrics: dict) -> float:
"""
Score quality (0-100).
Factors:
- Quality score from reviews/metrics (0-100 scale expected)
- Peer feedback score (1-5 scale, normalized to 0-100)
"""
quality_raw = metrics.get("quality_score", 50)
peer_score = metrics.get("peer_feedback_score", 3.0)
# Quality component: 60% weight
quality_component = min(quality_raw, 100) * 0.6
# Peer feedback: 40% weight (1-5 scale → 0-100)
peer_normalized = max(0, min((peer_score - 1) / 4 * 100, 100))
peer_component = peer_normalized * 0.4
return round(quality_component + peer_component, 1)
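# Worked example (hypothetical metrics): quality_score 80, peer score 4.0.
#   quality component: 80 * 0.6                    = 48.0
#   peer component:    (4.0 - 1) / 4 * 100 * 0.4   = 30.0
#   compute_quality(...)                           -> 78.0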
def compute_independence(metrics: dict) -> float:
"""
Score independence (0-100).
Heuristic proxy built from two signals:
- Completion rate (finishes assigned work without hand-holding): 60 points
- Peer feedback score (collaborates without creating dependency): 40 points
avg_completion_days is not used here.
Note: For richer scoring, add fields like 'escalations_to_manager',
'blockers_raised', 'self_unblocked_count' to your input data and extend
this function to read them.
"""
completed = metrics.get("tasks_completed", 0)
assigned = metrics.get("tasks_assigned", 1)
peer_score = metrics.get("peer_feedback_score", 3.0)
# Completion without escalation proxy: 60% weight
completion_rate = min(completed / max(assigned, 1), 1.0)
completion_component = completion_rate * 60
# Peer score as collaboration proxy: 40% weight
peer_normalized = max(0, min((peer_score - 1) / 4 * 100, 100))
peer_component = peer_normalized * 0.4
return round(completion_component + peer_component, 1)
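# Worked example (hypothetical metrics): 8 of 10 tasks done, peer score 4.0.
#   completion: 0.8 * 60   = 48.0
#   peer:       75 * 0.4   = 30.0
#   compute_independence(...) -> 78.0
# Because it reuses completion rate and peer feedback, this dimension moves
# together with output velocity and quality unless richer fields are added.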
def compute_initiative(metrics: dict) -> float:
"""
Score initiative (0-100).
Factors:
- Initiatives proposed (ideas beyond assigned work)
- Initiatives shipped (executed, not just suggested)
- Ship rate (proposed → shipped conversion)
"""
proposed = metrics.get("initiatives_proposed", 0)
shipped = metrics.get("initiatives_shipped", 0)
# Volume: 0-50 points (caps at 5+ proposed)
volume_score = min(proposed / 5, 1.0) * 50
# Ship rate: 0-30 points
if proposed > 0:
ship_rate = min(shipped / proposed, 1.0)
ship_score = ship_rate * 30
else:
ship_score = 0
# Shipped count bonus: 0-20 points (caps at 3+ shipped)
shipped_bonus = min(shipped / 3, 1.0) * 20
return round(volume_score + ship_score + shipped_bonus, 1)
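# Worked example (hypothetical metrics): 4 initiatives proposed, 2 shipped.
#   volume:        min(4 / 5, 1.0) * 50  = 40.0
#   ship rate:     (2 / 4) * 30          = 15.0
#   shipped bonus: min(2 / 3, 1.0) * 20  = 13.33
#   compute_initiative(...)              -> 68.3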
def compute_composite_score(metrics: dict) -> dict:
"""Compute all dimension scores and weighted composite."""
velocity = compute_output_velocity(metrics)
quality = compute_quality(metrics)
independence = compute_independence(metrics)
initiative = compute_initiative(metrics)
composite = (
velocity * SCORE_WEIGHTS["output_velocity"]
+ quality * SCORE_WEIGHTS["quality"]
+ independence * SCORE_WEIGHTS["independence"]
+ initiative * SCORE_WEIGHTS["initiative"]
)
# Determine tier
if composite >= TIER_THRESHOLDS["A"]:
tier = "A"
elif composite >= TIER_THRESHOLDS["B"]:
tier = "B"
else:
tier = "C"
return {
"output_velocity": velocity,
"quality": quality,
"independence": independence,
"initiative": initiative,
"composite": round(composite, 1),
"tier": tier,
}
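# Worked example (hypothetical metrics: 8/10 tasks, 4-day average,
# quality_score 80, peer score 4.0, 4 initiatives proposed / 2 shipped).
# The dimension scores are 74.7, 78.0, 78.0 and 68.3, so with the default
# SCORE_WEIGHTS:
#   74.7*0.30 + 78.0*0.30 + 78.0*0.20 + 68.3*0.20 = 75.07 -> composite 75.1
#   75.1 is >= 55 (B threshold) but < 80 (A threshold), so tier is "B".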
def recommend_action(tier: str, scores: dict) -> str:
"""Generate recommended action based on tier and score profile."""
if tier == "A":
if scores["initiative"] >= 80:
return "PROMOTE — High performer with strong initiative. Leadership candidate."
return "RETAIN & REWARD — Top performer. Ensure compensation and growth path are competitive."
elif tier == "B":
weakest = min(
["output_velocity", "quality", "independence", "initiative"],
key=lambda k: scores[k],
)
weak_labels = {
"output_velocity": "speed/throughput",
"quality": "deliverable quality",
"independence": "self-direction",
"initiative": "proactive contribution",
}
return f"COACH — Solid contributor. Focus development on {weak_labels[weakest]} (score: {scores[weakest]})."
else: # C
if scores["composite"] < 30:
return "EXIT — Significant underperformance across dimensions. Consider transition plan."
return "REASSIGN or PIP — Underperforming in current role. Evaluate fit for different position."
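# Worked example (hypothetical scores): a B-tier member with velocity 74.7,
# quality 78.0, independence 78.0 and initiative 68.3 gets the COACH action
# pointing at "proactive contribution (score: 68.3)", because initiative is
# the lowest dimension. In tier C, a composite below 30 returns EXIT and
# anything from 30 upward returns REASSIGN or PIP.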
# ---------------------------------------------------------------------------
# Elon Algorithm: 5-Step Analysis (LLM-powered)
# ---------------------------------------------------------------------------
def run_elon_algorithm(data: dict) -> str:
"""
Run the 5-step Elon Algorithm analysis using an LLM.
Steps:
1. Question every requirement
2. Delete redundant processes
3. Simplify workflows
4. Accelerate bottlenecks
5. Automate what's possible
"""
team_summary = []
for m in data["team_members"]:
team_summary.append(
f"- {m['name']} ({m['role']}): {m.get('role_description', 'No description')}"
)
org_ctx = data.get("org_context", {})
goals = org_ctx.get("company_goals", ["Not specified"])
prompt = f"""Analyze this team using the Elon Algorithm — a ruthless 5-step organizational optimization framework.
## Team ({len(data['team_members'])} members)
{chr(10).join(team_summary)}
## Company Goals
{chr(10).join(f'- {g}' for g in goals)}
## Evaluation Period
{org_ctx.get('evaluation_period', 'Current quarter')}
## Full Team Data
{json.dumps(data['team_members'], indent=2, default=str)}
---
For each of the 5 steps below, provide SPECIFIC, ACTIONABLE findings (not generic advice):
### Step 1: Question Every Requirement
For each role, ask: Is this role necessary? Is every task they do necessary? Could the team function without this position? Which tasks they perform have no clear connection to company goals?
### Step 2: Delete Redundant Processes
Identify: Overlapping responsibilities between team members. Duplicate efforts. Roles that could be consolidated. Meetings or processes that exist by inertia.
### Step 3: Simplify
Find: Overcomplicated workflows. Multi-step processes that could be 1-2 steps. Unnecessary approval chains. Reports nobody reads.
### Step 4: Accelerate
Identify bottlenecks: Who/what is the slowest link? Where do tasks get stuck? What dependencies create wait times? What would unblock the most throughput?
### Step 5: Automate
Flag tasks ripe for AI/automation: Data entry, reporting, scheduling, template-based work, monitoring, routing, classification. Estimate effort saved.
Be specific. Name names. Reference actual data. This is a performance audit, not a feel-good exercise."""
return call_llm(
prompt,
system_prompt=(
"You are a ruthless organizational efficiency consultant. "
"Your job is to find waste, redundancy, and inefficiency. "
"Be direct, specific, and actionable. Name names when the data supports it. "
"Do not hedge or soften findings."
),
)
# ---------------------------------------------------------------------------
# Report Generation
# ---------------------------------------------------------------------------
def generate_scorecards(data: dict) -> list[dict]:
"""Score every team member and return sorted scorecards."""
scorecards = []
for member in data["team_members"]:
metrics = member.get("metrics", {})
scores = compute_composite_score(metrics)
action = recommend_action(scores["tier"], scores)
# OKR progress summary
okrs = member.get("okrs", [])
okr_avg = 0.0
if okrs:
okr_avg = sum(o.get("progress", 0) for o in okrs) / len(okrs)
scorecards.append({
"name": member["name"],
"role": member["role"],
"scores": scores,
"action": action,
"okr_progress": round(okr_avg * 100, 1),
"deliverables": member.get("deliverables", []),
})
# Sort by composite score descending (stack rank)
scorecards.sort(key=lambda x: x["scores"]["composite"], reverse=True)
# Add rank
for i, sc in enumerate(scorecards, 1):
sc["rank"] = i
return scorecards
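# Worked example (hypothetical OKRs): progress values 0.5 and 1.0 average to
# 0.75, stored as okr_progress 75.0. Members with no OKRs get 0.0, which the
# markdown report skips (it only prints OKR progress when it is > 0).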
def format_markdown_report(scorecards: list[dict], elon_analysis: str, data: dict) -> str:
"""Generate the full markdown report."""
org_ctx = data.get("org_context", {})
now = datetime.now().strftime("%Y-%m-%d %H:%M")
lines = [
f"# Team Performance Audit",
f"",
f"**Generated:** {now}",
f"**Team Size:** {len(scorecards)}",
f"**Period:** {org_ctx.get('evaluation_period', 'Current')}",
f"",
]
# --- Executive Summary ---
a_count = sum(1 for s in scorecards if s["scores"]["tier"] == "A")
b_count = sum(1 for s in scorecards if s["scores"]["tier"] == "B")
c_count = sum(1 for s in scorecards if s["scores"]["tier"] == "C")
avg_composite = sum(s["scores"]["composite"] for s in scorecards) / max(len(scorecards), 1)
lines.extend([
"## Executive Summary",
"",
f"| Metric | Value |",
f"|--------|-------|",
f"| Team Average Score | {avg_composite:.1f}/100 |",
f"| A-Players | {a_count} ({a_count/max(len(scorecards),1)*100:.0f}%) |",
f"| B-Players | {b_count} ({b_count/max(len(scorecards),1)*100:.0f}%) |",
f"| C-Players | {c_count} ({c_count/max(len(scorecards),1)*100:.0f}%) |",
"",
])
# Health assessment
if a_count / max(len(scorecards), 1) >= 0.3:
lines.append("**Assessment:** Strong team core. Focus on coaching B-players up and addressing C-players decisively.")
elif c_count / max(len(scorecards), 1) >= 0.3:
lines.append("**Assessment:** ⚠️ Significant underperformance. Org restructuring recommended.")
else:
lines.append("**Assessment:** Average team composition. Targeted development can move the needle.")
lines.append("")
# --- Stack Rank ---
lines.extend([
"## Stack Rank",
"",
"| Rank | Name | Role | Composite | Tier | Action |",
"|------|------|------|-----------|------|--------|",
])
for sc in scorecards:
tier_emoji = {"A": "🟢", "B": "🟡", "C": "🔴"}[sc["scores"]["tier"]]
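# Show only the action label before the em dash, e.g. "COACH" or "RETAIN & REWARD".
short_action = sc["action"].split("\u2014")[0].strip()
lines.append(
f"| {sc['rank']} | {sc['name']} | {sc['role']} | "
f"{sc['scores']['composite']} | {tier_emoji} {sc['scores']['tier']} | "
f"{short_action} |"
)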
lines.append("")
# --- Elon Algorithm Analysis ---
lines.extend([
"## Elon Algorithm — 5-Step Analysis",
"",
elon_analysis,
"",
])
# --- Individual Scorecards ---
lines.extend([
"## Individual Scorecards",
"",
])
for sc in scorecards:
tier_emoji = {"A": "🟢", "B": "🟡", "C": "🔴"}[sc["scores"]["tier"]]
scores = sc["scores"]
lines.extend([
f"### #{sc['rank']} \u2014 {sc['name']} ({sc['role']})",
"",
f"**Tier:** {tier_emoji} {scores['tier']}-Player | **Composite:** {scores['composite']}/100",
"",
f"| Dimension | Score |",
f"|-----------|-------|",
f"| Output Velocity | {scores['output_velocity']}/100 |",
f"| Quality | {scores['quality']}/100 |",
f"| Independence | {scores['independence']}/100 |",
f"| Initiative | {scores['initiative']}/100 |",
"",
])
if sc["okr_progress"] > 0:
lines.append(f"**OKR Progress:** {sc['okr_progress']}%")
lines.append("")
if sc["deliverables"]:
lines.append("**Recent Deliverables:**")
for d in sc["deliverables"]:
status_emoji = "✅" if d.get("status") == "completed" else "🔄"
lines.append(f"- {status_emoji} {d.get('name', 'Unknown')} ({d.get('status', 'unknown')}, {d.get('date', 'no date')})")
lines.append("")
lines.append(f"**Recommended Action:** {sc['action']}")
lines.append("")
lines.append("---")
lines.append("")
# --- Org-Level Recommendations ---
lines.extend([
"## Org-Level Recommendations",
"",
f"1. **Immediate:** Address {c_count} C-player(s) — each underperformer costs the team velocity.",
f"2. **Short-term:** Invest in coaching for {b_count} B-player(s) — targeted development on their weakest dimension.",
f"3. **Strategic:** Retain and challenge {a_count} A-player(s) — they leave when bored, not when overworked.",
])
if avg_composite < 60:
lines.append("4. **Warning:** Team average below 60. Consider structural changes, not just individual coaching.")
lines.append("")
lines.append("---")
lines.append(f"*Generated by Team Performance Audit (Elon Algorithm)*")
return "\n".join(lines)
def format_json_report(scorecards: list[dict], elon_analysis: str, data: dict) -> str:
"""Generate the full JSON report."""
org_ctx = data.get("org_context", {})
report = {
"generated": datetime.now().isoformat(),
"team_size": len(scorecards),
"evaluation_period": org_ctx.get("evaluation_period", "Current"),
"summary": {
"average_composite": round(
sum(s["scores"]["composite"] for s in scorecards) / max(len(scorecards), 1), 1
),
"tier_distribution": {
"A": sum(1 for s in scorecards if s["scores"]["tier"] == "A"),
"B": sum(1 for s in scorecards if s["scores"]["tier"] == "B"),
"C": sum(1 for s in scorecards if s["scores"]["tier"] == "C"),
},
},
"stack_rank": scorecards,
"elon_algorithm_analysis": elon_analysis,
}
return json.dumps(report, indent=2, default=str)
# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------
def main():
parser = argparse.ArgumentParser(
description="Team Performance Audit — The Elon Algorithm",
epilog="Scores team members on velocity, quality, independence, and initiative. "
"Stack ranks with A/B/C tiers and generates actionable recommendations.",
)
parser.add_argument(
"--input", "-i",
required=True,
help="Path to team data file: .json with a 'team_members' array, or .csv with one row per member.",
)
parser.add_argument(
"--output", "-o",
help="Output file path. If omitted, prints to stdout.",
)
parser.add_argument(
"--format", "-f",
choices=["markdown", "json"],
default="markdown",
help="Output format (default: markdown).",
)
parser.add_argument(
"--dry-run",
action="store_true",
help="Skip LLM calls. Only compute quantitative scores.",
)
parser.add_argument(
"--weights",
type=str,
help='Custom score weights as JSON: \'{"output_velocity":0.4,"quality":0.3,"independence":0.15,"initiative":0.15}\'',
)
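# Example invocations (<this_script>.py is a placeholder for this file's
# path; team.json, team.csv and audit.json are hypothetical file names):
#   python <this_script>.py --input team.json --dry-run
#   python <this_script>.py -i team.csv -f json -o audit.json
#   python <this_script>.py -i team.json --weights '{"output_velocity": 0.4, "quality": 0.3, "independence": 0.15, "initiative": 0.15}'
# --dry-run skips the LLM call, so no API key is needed for it.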
args = parser.parse_args()
# Apply custom weights if provided
if args.weights:
try:
custom_weights = json.loads(args.weights)
if not isinstance(custom_weights, dict):
raise ValueError("--weights must be a JSON object")
for key in SCORE_WEIGHTS:
if key in custom_weights:
SCORE_WEIGHTS[key] = float(custom_weights[key])
# Validate weights sum to ~1.0
total = sum(SCORE_WEIGHTS.values())
if total <= 0:
raise ValueError("weights must sum to a positive number")
if abs(total - 1.0) > 0.01:
print(f"Warning: Weights sum to {total}, not 1.0. Normalizing.", file=sys.stderr)
for key in SCORE_WEIGHTS:
SCORE_WEIGHTS[key] /= total
except (json.JSONDecodeError, TypeError, ValueError) as e:
print(f"Error parsing --weights: {e}", file=sys.stderr)
sys.exit(1)
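# Worked example (hypothetical input): --weights '{"output_velocity": 0.6}'
# overrides only that key, so the weights become 0.6 / 0.3 / 0.2 / 0.2.
# They sum to about 1.3, so a warning is printed and each weight is divided
# by the total, giving roughly 0.462 / 0.231 / 0.154 / 0.154. Keys other
# than the four SCORE_WEIGHTS names are silently ignored.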
# Load data
try:
data = load_input(args.input)
except FileNotFoundError:
print(f"Error: File not found: {args.input}", file=sys.stderr)
sys.exit(1)
except (json.JSONDecodeError, ValueError) as e:
print(f"Error: Invalid input file: {e}", file=sys.stderr)
sys.exit(1)
print(f"📊 Loaded {len(data['team_members'])} team members", file=sys.stderr)
# Compute scorecards
scorecards = generate_scorecards(data)
print(f"✅ Scored and ranked all members", file=sys.stderr)
# Run Elon Algorithm (LLM analysis)
if args.dry_run:
elon_analysis = "[Dry run — LLM analysis skipped. Quantitative scores only.]"
print("⏭️ Dry run — skipping LLM analysis", file=sys.stderr)
else:
print("🤖 Running Elon Algorithm analysis...", file=sys.stderr)
elon_analysis = run_elon_algorithm(data)
print("✅ Analysis complete", file=sys.stderr)
# Generate report
if args.format == "json":
report = format_json_report(scorecards, elon_analysis, data)
else:
report = format_markdown_report(scorecards, elon_analysis, data)
# Output
if args.output:
with open(args.output, "w") as f:
f.write(report)
print(f"📝 Report written to {args.output}", file=sys.stderr)
else:
print(report)
# Summary to stderr
a_count = sum(1 for s in scorecards if s["scores"]["tier"] == "A")
b_count = sum(1 for s in scorecards if s["scores"]["tier"] == "B")
c_count = sum(1 for s in scorecards if s["scores"]["tier"] == "C")
print(f"\n🏆 Results: {a_count}A / {b_count}B / {c_count}C players", file=sys.stderr)
if __name__ == "__main__":
main()