Initial commit: 6 AI marketing skill categories
- growth-engine: Autonomous experiment engine (Karpathy autoresearch for marketing)
- sales-pipeline: RB2B router, deal resurrector, trigger prospector, ICP learner
- content-ops: Expert panel, quality gate, editorial brain, quote miner
- outbound-engine: Cold outbound optimizer, lead pipeline, competitive monitor
- seo-ops: Content attack briefs, GSC optimizer, trend scout
- finance-ops: CFO briefing, cost estimate, scenario modeler

79 files, all sanitized - zero hardcoded credentials or internal references.
commit a96d0d8889
81 changed files with 15050 additions and 0 deletions
21
LICENSE
Normal file
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Single Grain

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
146
README.md
Normal file
@@ -0,0 +1,146 @@
# AI Marketing Skills

**Open-source Claude Code skills for B2B marketing and sales teams.** Built by the team at [Single Grain](https://www.singlegrain.com) — battle-tested on real pipelines generating millions in revenue.

These aren't prompts. They're complete workflows — scripts, scoring algorithms, expert panels, and automation pipelines you can plug into Claude Code (or any AI coding agent) and run today.

---

## 🗂️ Skills

| Category | What It Does | Key Skills |
|----------|-------------|------------|
| [**Growth Engine**](./growth-engine/) | Autonomous marketing experiments that run, measure, and optimize themselves | Experiment Engine, Pacing Alerts, Weekly Scorecard |
| [**Sales Pipeline**](./sales-pipeline/) | Turn anonymous website visitors into qualified pipeline | RB2B Router, Deal Resurrector, Trigger Prospector, ICP Learner |
| [**Content Ops**](./content-ops/) | Ship content that scores 90+ every time | Expert Panel, Quality Gate, Editorial Brain, Quote Miner |
| [**Outbound Engine**](./outbound-engine/) | ICP definition to emails in inbox — fully automated | Cold Outbound Optimizer, Lead Pipeline, Competitive Monitor |
| [**SEO Ops**](./seo-ops/) | Find the keywords your competitors missed | Content Attack Briefs, GSC Optimizer, Trend Scout |
| [**Finance Ops**](./finance-ops/) | Your AI CFO that finds hidden costs in 30 minutes | CFO Briefing, Cost Estimate, Scenario Modeler |

---

## 🚀 Quick Start

Each skill category has its own README with setup instructions. The general pattern:

```bash
# 1. Clone the repo
git clone https://github.com/singlegrain/ai-marketing-skills.git
cd ai-marketing-skills

# 2. Pick a category
cd growth-engine  # or sales-pipeline, content-ops, etc.

# 3. Install dependencies
pip install -r requirements.txt

# 4. Set up environment variables
cp .env.example .env
# Edit .env with your API keys

# 5. Run
python experiment-engine.py create \
  --hypothesis "Thread posts get 2x engagement vs single posts" \
  --variable format \
  --variants '["thread", "single"]' \
  --metric impressions
```

---

## 🧠 How These Work with Claude Code

Every category includes a `SKILL.md` file. Drop it into your Claude Code project and the AI agent knows how to use the tools:

```bash
# In your project directory
# Claude Code discovers project skills as .claude/skills/<skill-name>/SKILL.md
mkdir -p .claude/skills/growth-engine
cp ai-marketing-skills/growth-engine/SKILL.md .claude/skills/growth-engine/SKILL.md
```

Then ask Claude Code: *"Run an experiment testing carousel vs. static posts on LinkedIn"* — it handles the rest.

---

## 📊 What Makes These Different

**These aren't toy demos.** Each skill was built to run real business operations:

- **Growth Engine** uses bootstrap confidence intervals and Mann-Whitney U tests — real statistics, not vibes
- **Deal Resurrector** has three intelligence layers including "follow the champion" — tracking departed contacts to their new companies
- **ICP Learner** rewrites your ideal customer profile based on actual win/loss data — your targeting improves automatically
- **Expert Panel** recursively scores content with domain-specific expert personas until quality hits 90+
- **RB2B Router** does intent scoring, seniority-based company dedup, and agency classification before routing to outbound sequences
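
To make the two statistical techniques named in the Growth Engine bullet concrete, here is a minimal standard-library sketch. It is not the code in `experiment-engine.py`: the function names, the resample count, and the sample data are assumptions for illustration, and the Mann-Whitney p-value uses a plain normal approximation with no tie or continuity correction.

```python
import math
import random
from statistics import mean


def bootstrap_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for mean(b) - mean(a)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        resampled_a = [rng.choice(a) for _ in a]  # resample with replacement
        resampled_b = [rng.choice(b) for _ in b]
        diffs.append(mean(resampled_b) - mean(resampled_a))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def _average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test using a normal approximation."""
    n1, n2 = len(a), len(b)
    ranks = _average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p_value


# Hypothetical impressions per post for two variants
single = [1, 2, 3, 4, 5]
thread = [6, 7, 8, 9, 10]
ci_low, ci_high = bootstrap_ci(single, thread)
u_stat, p_value = mann_whitney_u(single, thread)
```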

---

## 📁 Repository Structure

```
ai-marketing-skills/
├── README.md                          ← You are here
├── growth-engine/                     ← Autonomous experiments
│   ├── SKILL.md
│   ├── experiment-engine.py
│   ├── pacing-alert.py
│   ├── autogrowth-weekly-scorecard.py
│   └── ...
├── sales-pipeline/                    ← Visitor → pipeline automation
│   ├── SKILL.md
│   ├── rb2b_instantly_router.py
│   ├── deal_resurrector.py
│   ├── trigger_prospector.py
│   ├── icp_learning_analyzer.py
│   └── ...
├── content-ops/                       ← Quality scoring & production
│   ├── SKILL.md
│   ├── scripts/
│   ├── experts/                       ← 9 expert panel definitions
│   ├── scoring-rubrics/               ← 5 scoring rubric templates
│   └── ...
├── outbound-engine/                   ← Cold outbound automation
│   ├── SKILL.md
│   ├── scripts/
│   ├── references/                    ← ICP template, copy rules
│   └── ...
├── seo-ops/                           ← SEO intelligence
│   ├── SKILL.md
│   ├── content_attack_brief.py
│   ├── gsc_client.py
│   ├── trend_scout.py
│   └── ...
└── finance-ops/                       ← Financial analysis
    ├── SKILL.md
    ├── scripts/
    ├── references/                    ← Metrics, rates, ROI models
    └── ...
```

---

## 🤝 Contributing

Found a bug? Have an improvement? PRs welcome.

1. Fork the repo
2. Create your feature branch (`git checkout -b feature/better-scoring`)
3. Commit your changes
4. Push to the branch
5. Open a Pull Request

---

## 📄 License

MIT License. Use these however you want.

---

## 🏢 About

Built by the marketing engineering team at [Single Grain](https://www.singlegrain.com). We help B2B companies grow with AI-powered marketing and sales operations.

**Want these skills managed for you?** [Talk to us](https://www.singlegrain.com/contact/) — we run these systems for companies doing $10M-$500M in revenue.

---

*Star this repo if you find it useful. It helps others discover these tools.*
36
content-ops/.env.example
Normal file
@@ -0,0 +1,36 @@
# ── Required ──

# Anthropic API key (for LLM-powered features: editorial brain, content transform, expert panel)
ANTHROPIC_API_KEY=sk-ant-...

# ── Optional: Data directory ──

# Override default data directory (default: ./data/)
# CONTENT_OPS_DATA_DIR=./data

# ── Optional: Editorial Brain ──

# Override default model for editorial brain clip discovery
# EDITORIAL_BRAIN_MODEL=claude-sonnet-4-20250514

# ── Optional: Quote Mining Engine ──

# Path to feeds JSON config: {"Feed Name": "https://feed-url/rss", ...}
# QUOTE_MINING_FEEDS_FILE=./config/feeds.json

# Inline feeds JSON (alternative to file)
# QUOTE_MINING_FEEDS={"My Podcast": "https://feeds.example.com/rss"}

# Directory containing meeting notes (markdown files) to scan for quotes
# QUOTE_MINING_NOTES_DIR=./notes/

# Speaker name to extract from meeting notes (e.g., "John Smith")
# QUOTE_MINING_SPEAKER=

# ── Optional: Content Transform ──

# Voice configuration file (markdown describing your brand voice)
# VOICE_CONFIG_FILE=./config/voice.md

# Style guide file (markdown with writing style rules)
# STYLE_GUIDE_FILE=./config/style-guide.md
168
content-ops/README.md
Normal file
@@ -0,0 +1,168 @@
# AI Content Ops

**Ship content that scores 90+ every time. Automatically.**

Most content teams publish and pray. This pipeline scores, gates, and iterates every piece of content through an AI expert panel before it goes live. Nothing ships below 90/100.

## What's Inside

### 🎯 Expert Panel (`SKILL.md`)
Claude Code skill that auto-assembles a panel of 7-10 domain experts tailored to whatever you're scoring. Works on:
- Blog posts, social content, email sequences
- Landing pages, ads, CTAs
- Strategy docs, pitch decks, charts
- Recruiting outreach, vendor evaluations
- Literally anything that needs a quality gate

The panel scores your content, identifies weaknesses, revises, and loops until every expert scores 90+. Max 3 rounds. Includes a 1.5x-weighted AI Writing Detector that catches all 24 known AI writing patterns.

### 🚦 Content Quality Gate (`scripts/content-quality-gate.py`)
CI/CD-style gate for your content pipeline. Runs the quality scorer on a batch of drafts and filters out anything below threshold. Nothing publishes without passing.

### 📊 Content Quality Scorer (`scripts/content-quality-scorer.py`)
Automated scoring engine with 5 dimensions:
- **Voice similarity** (35%) — matches your brand voice patterns
- **Specificity** (25%) — real numbers, named entities, concrete examples
- **AI slop penalty** (20%) — detects and penalizes 50+ banned AI words and 8 AI writing patterns
- **Length appropriateness** (10%) — platform-specific character limits
- **Engagement potential** (10%) — hooks, CTAs, debate invitations
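
As a rough illustration of how the five weighted dimensions could combine into one number, here is a minimal sketch. It assumes each dimension is already scored 0-100 with higher meaning better (so the slop dimension is 100 when no slop is found); the key names follow the weights config shown under Scoring Weights, while `composite_score` and the sample scores are hypothetical and not taken from `content-quality-scorer.py`.

```python
# Weights as listed above; they must sum to 1.0
WEIGHTS = {
    "voice_similarity": 0.35,
    "specificity": 0.25,
    "slop_penalty": 0.20,
    "length_appropriateness": 0.10,
    "engagement_potential": 0.10,
}


def composite_score(dimension_scores, weights=WEIGHTS):
    """Weighted sum of per-dimension scores, each assumed to be on a 0-100 scale."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    missing = set(weights) - set(dimension_scores)
    if missing:
        raise KeyError(f"missing dimension scores: {sorted(missing)}")
    total = sum(weights[name] * dimension_scores[name] for name in weights)
    return round(total, 2)


# Hypothetical per-dimension scores for one draft
example = {
    "voice_similarity": 90,
    "specificity": 70,
    "slop_penalty": 100,
    "length_appropriateness": 80,
    "engagement_potential": 60,
}
print(composite_score(example))  # 83.0
```

With these weights, a draft with weak voice similarity loses more points than one with weak engagement hooks, which matches the priority order in the list.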

### 🧠 Editorial Brain (`scripts/editorial-brain.py`)
Two-pass LLM analysis for finding clip-worthy moments in video transcripts:
1. **Pass 1**: Scans transcript chunks for candidate moments (hook → build → payoff arcs)
2. **Pass 2**: Deep-scores each candidate on hook/build/payoff/clean-cut (0-100)
3. Only 90+ clips get cut

Fundamentally different from keyword matching. Thinks like a human editor.

### ⛏️ Quote Mining Engine (`scripts/quote-mining-engine.py`)
Scans podcast RSS feeds and meeting notes to extract quotable, contrarian, viral-worthy moments. Scoring heuristics:
- Contrarian signals (wrong, myth, overrated, secret...)
- Specificity signals ($amounts, percentages, multipliers)
- Emotional triggers (fear, love, shocking, AI...)
- Shareability signals (how to, framework, lesson learned...)
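
The four heuristic families above can be sketched with plain regular expressions. This is an assumption-laden illustration, not the logic in `quote-mining-engine.py`: the keyword lists contain only the examples named in the bullets, and the equal 25 points per family is invented for the example.

```python
import re

# One pattern per signal family; keywords are just the examples listed above
SIGNALS = {
    "contrarian": re.compile(r"\b(?:wrong|myth|overrated|secret)\b", re.I),
    "specificity": re.compile(
        r"\$\d[\d,.]*|\b\d+(?:\.\d+)?%|\b\d+(?:\.\d+)?x\b", re.I
    ),
    # "AI" stays case-sensitive so it does not match inside ordinary words
    "emotional": re.compile(r"\b(?i:fear|love|shocking)\b|\bAI\b"),
    "shareability": re.compile(r"\b(?:how to|framework|lesson learned)\b", re.I),
}

POINTS_PER_SIGNAL = 25  # assumption: four families, equal weight, 0-100 total


def score_quote(text):
    """Return (score, matched_families) for a candidate quote."""
    matched = [name for name, pattern in SIGNALS.items() if pattern.search(text)]
    return POINTS_PER_SIGNAL * len(matched), matched


quote = "Most advice about cold email is wrong. We cut spend 40% and got 3x replies."
print(score_quote(quote))  # (50, ['contrarian', 'specificity'])
```

A quote that clears `--min-score 60` under this toy scheme would need to hit at least three of the four families.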

### 🔄 Content Transform (`scripts/content-transform.py`)
Repurposes long-form content into platform-native formats:
- **X threads/posts** — punchy, data-driven, with ASCII diagrams
- **LinkedIn posts** — hook before the fold, story arc, engagement CTA
- **YouTube Short scripts** — HOOK/SETUP/PAYOFF/CTA structure with visual cues
- **Newsletter sections** — scannable, value-dense, "why this matters"

Includes optional expert panel integration for iterative quality improvement.

## Quick Start

```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Set up environment
cp .env.example .env
# Edit .env with your API keys

# 3. Score a batch of content drafts
python scripts/content-quality-scorer.py --input drafts.json --verbose

# 4. Run the quality gate
python scripts/content-quality-gate.py --input drafts.json --threshold 70

# 5. Mine quotes from your podcast RSS
python scripts/quote-mining-engine.py --days 90 --min-score 60

# 6. Find clip-worthy moments in a video
python scripts/editorial-brain.py --url "https://youtube.com/watch?v=..." --min-score 90

# 7. Transform content atoms into platform drafts
python scripts/content-transform.py --atoms atoms.json --top-n 10
```

## Configuration

All scripts use environment variables for configuration. See `.env.example` for the full list.

### Voice Customization
The quality scorer and content transformer use configurable voice patterns. Edit these in your `.env` or pass custom config files:

- `VOICE_MARKERS` — regex patterns that signal your brand voice
- `BANNED_WORDS` — AI slop vocabulary to penalize
- `PLATFORM_LIMITS` — character limits per platform

### Scoring Weights
Adjust scoring weights via a JSON config file:
```json
{
  "weights": {
    "voice_similarity": 0.35,
    "specificity": 0.25,
    "slop_penalty": 0.20,
    "length_appropriateness": 0.10,
    "engagement_potential": 0.10
  },
  "threshold": 70
}
```

## Expert Panel Domains

Pre-built expert panels included:
- `experts/humanizer.md` — AI writing detection (24 patterns, mandatory)
- `experts/x-articles.md` — X/Twitter long-form posts
- `experts/linkedin.md` — LinkedIn posts
- `experts/newsletter.md` — Email newsletters
- `experts/youtube-shorts.md` — YouTube Shorts scripts
- `experts/instagram.md` — Instagram visual content
- `experts/podcast-quotes.md` — Podcast quote cards
- `experts/recruiting.md` — Recruiting outreach
- `experts/seo-strategy.md` — SEO strategy docs

Scoring rubrics:
- `scoring-rubrics/content-quality.md` — Blog, social, email, scripts
- `scoring-rubrics/strategic-quality.md` — Strategy and analysis
- `scoring-rubrics/conversion-quality.md` — Landing pages, ads, CTAs
- `scoring-rubrics/visual-quality.md` — Charts, infographics, slides
- `scoring-rubrics/evaluation-quality.md` — Candidate/vendor evaluations

## Input Formats

### Content Drafts (for scorer/gate)
```json
{
  "drafts": [
    {
      "id": "draft-001",
      "platform": "x",
      "draft": "Your content text here..."
    }
  ]
}
```
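
To show how the drafts format above could feed a threshold gate, here is a minimal sketch. `run_quality_gate` and `toy_scorer` are hypothetical names, and the toy scorer is a deliberately crude placeholder for whatever `content-quality-scorer.py` actually computes; only the `drafts` / `id` / `platform` / `draft` fields and the default threshold of 70 come from this README.

```python
import json


def run_quality_gate(payload, score_fn, threshold=70):
    """Split drafts into (passed, rejected), attaching each draft's score."""
    passed, rejected = [], []
    for draft in payload["drafts"]:
        score = score_fn(draft["draft"], draft["platform"])
        scored = {**draft, "score": score}
        (passed if score >= threshold else rejected).append(scored)
    return passed, rejected


def toy_scorer(text, platform):
    """Placeholder only: rewards drafts that contain at least one digit."""
    return 80 if any(ch.isdigit() for ch in text) else 40


payload = json.loads("""
{"drafts": [
  {"id": "draft-001", "platform": "x", "draft": "We cut CAC 32% in one quarter."},
  {"id": "draft-002", "platform": "linkedin", "draft": "Content is king."}
]}
""")

passed, rejected = run_quality_gate(payload, toy_scorer)
print([d["id"] for d in passed])    # ['draft-001']
print([d["id"] for d in rejected])  # ['draft-002']
```

Passing the scorer in as a function keeps the gate independent of how scores are produced, which fits the "use any script standalone" design described under Architecture.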

### Content Atoms (for transformer)
```json
{
  "atoms": [
    {
      "id": "atom-001",
      "content": "Long-form source content...",
      "tags": ["AI", "marketing"],
      "platforms_missing": ["x", "linkedin"],
      "repurpose_score": 8
    }
  ]
}
```

## Architecture

```
Content Source → Content Transform → Quality Scorer → Quality Gate → Publish
                                          ↑                ↓
                                     Expert Panel ←── Revision Loop (max 3 rounds)
```

The pipeline is modular. Use any script standalone or wire them together.

## License

MIT
231
content-ops/SKILL.md
Normal file
@@ -0,0 +1,231 @@
---
name: expert-panel
description: >-
  Score, evaluate, and iteratively improve any content or strategy using an
  auto-assembled panel of domain experts. Handles copy, sequences, landing pages,
  strategy docs, titles, charts, recruiting evaluations, or anything else that
  needs a quality gate. Recursively iterates until all scores hit 90+ (max 3
  rounds). Use when asked to: "expert panel this", "score this", "rate these
  variants", "quality check this", "panel review", "which version is better",
  "expert score", "evaluate this copy/strategy/page", or when another skill
  needs a quality gate on its output. Also triggers on: "score this landing page",
  "expert panel these email variants", "rate this headline", "panel these charts".
---

# Expert Panel

General-purpose scoring and iterative improvement engine. Auto-assembles the
right experts for whatever is being evaluated, scores it, and loops until 90+.

---

## Step 1: Intake — Understand What's Being Scored

Collect or infer from context:

1. **Content/artifact** — The thing(s) to score (paste, file path, or URL)
2. **Content type** — Copy, sequence, landing page, strategy, title, chart, candidate eval, etc.
3. **Offer context** — What's being sold/promoted? To whom? What domain/industry?
4. **Variants** — Are there multiple versions to compare? (A/B/C)
5. **Source skill** — Is this output from another skill? (e.g., cold-outbound-optimizer)
   If yes, note the source for feedback-to-source routing in Step 6.

If context is obvious from the conversation, don't ask — just proceed.

---

## Step 2: Auto-Assemble the Expert Panel

Build a panel of **7–10 experts** tailored to the content type and domain.

### Assembly rules

1. **Start with content-type experts.** Read `experts/` directory for pre-built panels matching
   the content type. If an exact match exists (e.g., `experts/linkedin.md` for a LinkedIn post),
   use it as the base.

2. **Add domain/offer experts.** Based on the offer context, add 1–3 experts who understand
   the specific industry or domain. Examples:
   - Scoring bakery marketing → add Food & Beverage Marketing Expert
   - Scoring SaaS landing page → add SaaS Conversion Expert
   - Scoring recruiting outreach → add Agency Recruiter + Talent Market Expert
   - Scoring medical device copy → add Healthcare Compliance Expert

3. **Always include these two:**
   - **AI Writing Detector** — See `experts/humanizer.md`. Weight: 1.5x. Non-negotiable.
   - **Brand Voice Match** — Checks alignment with the configured brand voice and
     known rejection patterns from `references/patterns.md` (if present).

4. **Check learned patterns.** If `references/patterns.md` exists, read it. If any patterns
   apply to this content type, brief the panel on them. Dock points for known-bad patterns.

5. **Cap at 10 experts.** If you have more than 10, merge overlapping roles.

### Panel output format
List each expert with: Name, lens/focus, what they check.

---

## Step 3: Select Scoring Rubric

Choose the appropriate rubric from `scoring-rubrics/`:

| Content type | Rubric file |
|---|---|
| Blog, social, email, newsletter, scripts | `scoring-rubrics/content-quality.md` |
| Strategy, recommendations, analysis | `scoring-rubrics/strategic-quality.md` |
| Landing pages, ads, CTAs | `scoring-rubrics/conversion-quality.md` |
| Charts, data viz, infographics | `scoring-rubrics/visual-quality.md` |
| Candidate evaluations | `scoring-rubrics/evaluation-quality.md` |
| Other | Synthesize a rubric from the two closest matches |

Read the selected rubric file for detailed criteria and point allocation.

---

## Step 4: Score — Recursive Loop Until 90+

**Target: 90/100 across all experts. Non-negotiable. Max 3 rounds.**

### Each round produces:

```
## Round [N] — Score: [AVG]/100

| Expert | Score | Key Feedback |
|--------|-------|--------------|
| [Name] | [0-100] | [One-line rationale] |
| ... | ... | ... |

**Aggregate:** [weighted average — humanizer at 1.5x]
**Top 3 weaknesses:** [ranked]
**Changes made:** [specific edits addressing each weakness]
```

Then the revised content/artifact.

### Rules

- Scores must be brutally honest. No padding to 90.
- Humanizer score weighted 1.5x in the aggregate.
- If aggregate < 90: identify top 3 weaknesses → revise → next round.
- If aggregate ≥ 90: finalize and proceed to output.
- After 3 rounds, if still < 90: return best version with honest score + note on what's
  holding it back.
- Show ALL rounds in output — the iteration trail is part of the value.
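
The aggregate and stopping rules above can be written down as a small sketch. Two things here are assumptions rather than part of the skill: the weighted average is normalized by the sum of weights (the rules say only that the humanizer counts 1.5x), and the function names, action labels, and sample scores are hypothetical.

```python
def aggregate_score(expert_scores, humanizer="AI Writing Detector", humanizer_weight=1.5):
    """Weighted average of expert scores with the humanizer counted at 1.5x."""
    total = 0.0
    weight_sum = 0.0
    for name, score in expert_scores.items():
        weight = humanizer_weight if name == humanizer else 1.0
        total += weight * score
        weight_sum += weight
    return round(total / weight_sum, 1)


def next_action(aggregate, round_number, target=90, max_rounds=3):
    """Decide what happens after a scoring round."""
    if aggregate >= target:
        return "finalize"
    if round_number >= max_rounds:
        return "return_best_with_note"
    return "revise"


# Hypothetical round-1 scores for a three-expert panel
round_one = {
    "AI Writing Detector": 80,
    "Brand Voice Match": 90,
    "SaaS Conversion Expert": 95,
}
score = aggregate_score(round_one)
print(score, next_action(score, round_number=1))  # 87.1 revise
```

Because the humanizer carries extra weight, a low AI-writing score pulls the aggregate below 90 even when the other experts are satisfied, which is the behavior the 1.5x rule is meant to produce.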

### Variant comparison mode

When scoring multiple variants (A/B/C):
- Score each variant independently through the full panel.
- After scoring, rank variants by aggregate score.
- If top variant is < 90, iterate on the best one (don't iterate all of them).

---

## Step 5: Output Format

### Winner + Score (always at top)

```
## 🏆 Result: [SCORE]/100 — [PASS ✅ | NEEDS WORK ⚠️]

[Final content/artifact here]

**Iterations:** [N] rounds
**Panel:** [Expert names, comma-separated]
```

If variants: show winner first, then runner-up scores.

```
## 🏆 Winner: Variant [X] — [SCORE]/100

[Winning content]

### Runner-up scores
- Variant A: 87/100
- Variant B: 82/100
- Variant C: 91/100 ← Winner
```

### Feedback History (below the result)

Show full scoring rounds.

```
---
<details>
<summary>📊 Scoring History (N rounds)</summary>

[All round tables from Step 4]

</details>
```

---

## Step 6: Feedback-to-Source (When Scoring Another Skill's Output)

When the scored content came from another skill, generate a **Source Improvement Brief**:

```
## 🔁 Feedback for [Source Skill]

### What scored low
- [Pattern]: [Specific example from this content]

### Suggested skill improvements
- [Concrete change to the source skill's process/rubric/prompt]

### Patterns to add to source skill
- [Any recurring weakness that should become a rule]
```

This brief can be used to update the source skill's SKILL.md or rubrics.

---

## Step 7: Memory — Learn from Approvals and Rejections

After the user approves or rejects panel output:

### On approval (score ≥ 90, user accepts)
Note what worked. No action needed unless a new positive pattern emerges.

### On rejection (user overrides the panel or rejects 90+ content)
1. Ask why (or infer from context).
2. Add a new pattern to `references/patterns.md` using this format:

   ```markdown
   ## [Pattern Name]
   - **Type:** rejection | preference | override
   - **Content types:** [which types this applies to]
   - **Rule:** [What to always/never do]
   - **Example:** [The specific instance that triggered this]
   - **Date:** [YYYY-MM-DD]
   - **Point dock:** [-N points when detected]
   ```

3. Confirm: "Added pattern: [one-line summary]. Panel will dock [N] points for this going forward."
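
The pattern entry format above is regular enough to parse mechanically. The sketch below assumes entries follow that template exactly; `parse_patterns`, `total_dock`, and the sample entry are hypothetical, and deciding whether a pattern is actually present in a piece of content is left to the panel.

```python
import re

HEADING = re.compile(r"^## (.+)$")
FIELD = re.compile(r"^- \*\*(.+?):\*\* (.*)$")


def parse_patterns(markdown):
    """Parse '## Name' sections with '- **Field:** value' lines into dicts."""
    patterns = []
    current = None
    for line in markdown.splitlines():
        line = line.strip()
        heading = HEADING.match(line)
        field = FIELD.match(line)
        if heading:
            current = {"name": heading.group(1).strip()}
            patterns.append(current)
        elif field and current is not None:
            current[field.group(1).strip().lower()] = field.group(2).strip()
    return patterns


def total_dock(patterns, detected_names):
    """Sum the point docks for patterns the panel reported as detected."""
    total = 0
    for pattern in patterns:
        if pattern["name"] in detected_names:
            number = re.search(r"-?\d+", pattern.get("point dock", ""))
            if number:
                total += abs(int(number.group()))
    return total


# Hypothetical entry in references/patterns.md
sample = """\
## No rhetorical questions in hooks
- **Type:** rejection
- **Content types:** LinkedIn posts
- **Rule:** Never open with a rhetorical question
- **Example:** "Ever wonder why your emails get ignored?"
- **Date:** 2026-01-15
- **Point dock:** -5 points when detected
"""

patterns = parse_patterns(sample)
print(total_dock(patterns, {"No rhetorical questions in hooks"}))  # 5
```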

### Pattern enforcement
Every scoring round, check `references/patterns.md` against the content. Apply point docks
before expert scoring begins. This means known-bad patterns are penalized even if individual
experts miss them.

---

## Reference Files

| File | Purpose | When to read |
|---|---|---|
| `experts/humanizer.md` | AI writing detection rubric (24 patterns) | Every scoring run |
| `experts/[domain].md` | Pre-built expert panels for common domains | When domain matches |
| `scoring-rubrics/content-quality.md` | Content scoring rubric | Content scoring |
| `scoring-rubrics/strategic-quality.md` | Strategy scoring rubric | Strategy scoring |
| `scoring-rubrics/conversion-quality.md` | Landing page/ad/CTA rubric | Conversion scoring |
| `scoring-rubrics/visual-quality.md` | Chart/data viz/infographic rubric | Visual scoring |
| `scoring-rubrics/evaluation-quality.md` | Candidate/assessment rubric | Eval scoring |
| `references/patterns.md` | Learned rejection patterns | Every scoring run |
| `references/expert-assembly.md` | Domain-expert examples for auto-assembly | When building unfamiliar panels |
4
content-ops/config/feeds.example.json
Normal file
@@ -0,0 +1,4 @@
{
  "My Marketing Podcast": "https://feeds.example.com/marketing-podcast/rss",
  "Industry Show": "https://feeds.example.com/industry-show/rss"
}
145
content-ops/experts/humanizer.md
Normal file
@@ -0,0 +1,145 @@
# Expert Panel: AI Writing Detector (Humanizer)

## Context
- Based on the 24 AI writing patterns from Wikipedia's "Signs of AI writing" guide
- This expert scores drafts on how AI-generated they sound
- Scoring: 0 = obviously AI-generated, 100 = indistinguishable from human
- This should be the LAST check before any draft is finalized

## Scoring Rubric

### Banned Vocabulary (instant -5 per occurrence)
delve, tapestry, landscape (abstract), leverage, multifaceted, nuanced, pivotal, realm, robust, seamless, testament, transformative, underscore (verb), utilize, whilst, keen, embark, comprehensive, intricate, commendable, meticulous, paramount, groundbreaking, innovative, cutting-edge, synergy, holistic, paradigm, ecosystem, Additionally, align with, crucial, enduring, enhance, fostering, garner, highlight (verb), interplay, intricacies, showcase, vibrant, valuable, profound, renowned, breathtaking, nestled, stunning
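
The "-5 per occurrence" rule can be sketched as a word-boundary count. This is an illustration under stated assumptions: `BANNED` holds only a subset of the list above, matching is exact-word and case-insensitive (so inflections such as "leveraging" are not caught), and sense-dependent entries like "landscape (abstract)" are left out because they need human or LLM judgment.

```python
import re

# Subset of the banned list above, for illustration only
BANNED = ["delve", "tapestry", "leverage", "robust", "seamless", "utilize", "synergy", "holistic"]
PENALTY_PER_HIT = 5


def vocabulary_penalty(text, banned=BANNED, per_hit=PENALTY_PER_HIT):
    """Return (penalty, hits): penalty is negative, hits maps word -> count."""
    hits = {}
    for word in banned:
        count = len(re.findall(rf"\b{re.escape(word)}\b", text, flags=re.IGNORECASE))
        if count:
            hits[word] = count
    return -per_hit * sum(hits.values()), hits


draft = "We leverage a robust, seamless stack to leverage data."
print(vocabulary_penalty(draft))  # (-20, {'leverage': 2, 'robust': 1, 'seamless': 1})
```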
|
||||||
|
|
||||||
|
### The 24 Patterns to Flag
|
||||||
|
|
||||||
|
#### CONTENT PATTERNS
|
||||||
|
|
||||||
|
**1. Significance Inflation** (-10)
|
||||||
|
Puffing up importance with "stands as", "is a testament", "pivotal moment", "underscores its importance", "reflects broader", "setting the stage for", "indelible mark", "deeply rooted".
|
||||||
|
- Before: "This initiative marked a pivotal moment in the evolution of digital marketing."
|
||||||
|
- After: "The company launched its first programmatic ad campaign in 2019."
|
||||||
|
|
||||||
|
**2. Undue Notability Claims** (-5)
|
||||||
|
Listing media mentions without context. "Active social media presence", "leading expert".
|
||||||
|
- Before: "His insights have been featured in Forbes, Inc, and Entrepreneur."
|
||||||
|
- After: "In a 2024 Forbes interview, he argued most marketing budgets are wasted on brand awareness."
|
||||||
|
|
||||||
|
**3. Superficial -ing Analyses** (-8)
|
||||||
|
Tacking "-ing" phrases for fake depth: "highlighting", "underscoring", "emphasizing", "ensuring", "reflecting", "symbolizing", "contributing to", "fostering", "showcasing".
|
||||||
|
- Before: "The platform grew 40% YoY, showcasing the team's commitment to innovation and highlighting the importance of user experience."
|
||||||
|
- After: "The platform grew 40% YoY. Most of that came from a single referral loop they built in Q2."
|
||||||
|
|
||||||
|
**4. Promotional Language** (-8)
|
||||||
|
"Boasts a", "vibrant", "rich" (figurative), "profound", "exemplifies", "commitment to", "natural beauty", "nestled", "in the heart of", "must-visit".
|
||||||
|
- Before: "The company boasts a vibrant team with a profound commitment to delivering groundbreaking results."
|
||||||
|
- After: "The company has 45 employees. Revenue grew 32% last year."
|
||||||
|
|
||||||
|
**5. Vague Attributions** (-8)
|
||||||
|
"Industry reports", "Experts argue", "Some critics argue", "several sources". No specific citations.
|
||||||
|
- Before: "Experts believe AI will transform the marketing landscape."
|
||||||
|
- After: "A 2024 Gartner survey found 67% of CMOs plan to increase AI spend next year."
|
||||||
|
|
||||||
|
**6. Formulaic "Challenges and Future" Sections** (-10)
|
||||||
|
"Despite its X, faces challenges...", "Despite these challenges, continues to Y", "Future Outlook".
|
||||||
|
- Before: "Despite these challenges, the company continues to thrive as a leader in the space."
|
||||||
|
- After: "Customer churn hit 8% in Q3. They hired a retention team in October."
|
||||||
|
|
||||||
|
#### LANGUAGE AND GRAMMAR PATTERNS
|
||||||
|
|
||||||
|
**7. AI Vocabulary Clustering** (-10)
|
||||||
|
Multiple banned words in the same paragraph. See banned list above.
|
||||||
|
- Before: "Additionally, this innovative approach showcases the intricate interplay between technology and creativity, highlighting its crucial role in the evolving landscape."
|
||||||
|
- After: "The tool saves about 3 hours per week on content scheduling. That's it."
|
||||||
|
|
||||||
|
**8. Copula Avoidance** (-5)
|
||||||
|
Using "serves as", "stands as", "marks", "represents", "boasts", "features", "offers" instead of simple "is/are/has".
|
||||||
|
- Before: "The newsletter serves as a valuable resource for marketers."
|
||||||
|
- After: "The newsletter is a resource for marketers. 12K subscribers open it weekly."
|
||||||
|
|
||||||
|
**9. Negative Parallelisms** (-5)
|
||||||
|
"Not only...but...", "It's not just about X, it's Y", "It's not merely X, it's Y".
|
||||||
|
- Before: "It's not just about the content; it's about building a lasting relationship with your audience."
|
||||||
|
- After: "Good content gets replies. That's how you build an audience."
|
||||||
|
|
||||||
|
**10. Rule of Three Overuse** (-8)
|
||||||
|
Forcing ideas into groups of three. Triple adjectives, triple nouns, triple parallel clauses.
|
||||||
|
- Before: "The event features keynote sessions, panel discussions, and networking opportunities."
|
||||||
|
- After: "The event has talks and panels. There's also time for networking between sessions."
|
||||||
|
|
||||||
|
**11. Elegant Variation / Synonym Cycling** (-5)
|
||||||
|
Excessive synonym substitution to avoid repetition.
|
||||||
|
- Before: "The CEO shared his vision. The business leader outlined the strategy. The company head detailed the plan."
|
||||||
|
- After: "The CEO shared his vision and outlined the strategy."
|
||||||
|
|
||||||
|
**12. False Ranges** (-5)
|
||||||
|
"From X to Y" where X and Y aren't on a meaningful scale.
|
||||||
|
- Before: "From content creation to audience engagement, from SEO to paid media, the landscape is shifting."
|
||||||
|
- After: "Content, SEO, and paid media are all changing. Here's what actually matters."
|
||||||
|
|
||||||
|
#### STYLE PATTERNS
|
||||||
|
|
||||||
|
**13. Em Dash Overuse** (-5)
|
||||||
|
More than 1 em dash per 200 words. AI uses them for "punchy" sales writing.
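A minimal sketch of how this density threshold could be checked. It assumes words are whitespace-separated tokens and that the ratio is computed over the whole draft rather than a sliding window; the rubric does not say which. The function name `em_dash_overuse` is hypothetical.

```python
EM_DASH = "\u2014"  # the em dash character, written as an escape


def em_dash_overuse(text: str, max_per_200_words: float = 1.0) -> bool:
    """Return True when the draft has more than 1 em dash per 200 words."""
    words = len(text.split())
    dashes = text.count(EM_DASH)
    if dashes == 0:
        return False
    if words == 0:
        return True
    # Scale the dash count to a per-200-words rate before comparing.
    return dashes / words * 200 > max_per_200_words
```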
|
||||||
|
|
||||||
|
**14. Overuse of Boldface** (-3)
|
||||||
|
Mechanical bold emphasis on every key term.
|
||||||
|
|
||||||
|
**15. Inline-Header Vertical Lists** (-5)
|
||||||
|
Lists where every item starts with a bolded header + colon.
|
||||||
|
|
||||||
|
**16. Title Case in Headings** (-3)
|
||||||
|
Capitalizing All Main Words In Every Heading.
|
||||||
|
|
||||||
|
**17. Emoji Decoration** (-5)
|
||||||
|
Emojis on headings or bullet points (🚀💡✅).
|
||||||
|
|
||||||
|
**18. Curly Quotation Marks** (-2)
|
||||||
|
Using “ ” instead of " ".
|
||||||
|
|
||||||
|
#### COMMUNICATION PATTERNS
|
||||||
|
|
||||||
|
**19. Collaborative Artifacts** (-10)
|
||||||
|
"I hope this helps", "Of course!", "Certainly!", "Would you like...", "let me know", "here is a...".
|
||||||
|
|
||||||
|
**20. Knowledge-Cutoff Disclaimers** (-10)
|
||||||
|
"As of [date]", "While specific details are limited", "based on available information".
|
||||||
|
|
||||||
|
**21. Sycophantic Tone** (-8)
|
||||||
|
"Great question!", "You're absolutely right!", "That's an excellent point!"
|
||||||
|
|
||||||
|
#### FILLER AND HEDGING
|
||||||
|
|
||||||
|
**22. Filler Phrases** (-5 each)
|
||||||
|
"In order to" → "To". "Due to the fact that" → "Because". "At this point in time" → "Now". "It is important to note that" → just state it.
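The replacements above map naturally onto a lookup table. The sketch below is one way to detect them, assuming case-insensitive whole-phrase matching and a flat 5 points per hit; the names `FILLER_REPLACEMENTS`, `find_fillers`, and `filler_penalty` are hypothetical, and the 2x stacking cap from the scoring method is deliberately not applied here.

```python
import re

# Phrase -> suggested replacement, taken from the rubric line above.
# An empty replacement means "drop the phrase and just state the point".
FILLER_REPLACEMENTS = {
    "in order to": "to",
    "due to the fact that": "because",
    "at this point in time": "now",
    "it is important to note that": "",
}

_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in FILLER_REPLACEMENTS) + r")\b",
    re.IGNORECASE,
)


def find_fillers(text: str) -> list[str]:
    """Return every filler phrase found, lowercased, in order of appearance."""
    return [m.group(0).lower() for m in _FILLER_RE.finditer(text)]


def filler_penalty(text: str, per_hit: int = 5) -> int:
    """Points to deduct: per_hit for each filler phrase occurrence."""
    return per_hit * len(find_fillers(text))
```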
|
||||||
|
|
||||||
|
**23. Excessive Hedging** (-8)
|
||||||
|
"Could potentially possibly", "might have some effect", "it could be argued that".
|
||||||
|
- Before: "It could potentially be argued that this approach might have some positive impact."
|
||||||
|
- After: "This approach works. Here's the data."
|
||||||
|
|
||||||
|
**24. Generic Positive Conclusions** (-10)
|
||||||
|
"The future looks bright", "Exciting times lie ahead", "continues their journey toward excellence".
|
||||||
|
- Before: "The future looks bright for AI in marketing. Exciting times lie ahead."
|
||||||
|
- After: "They plan to double their AI budget next quarter. We'll see if it pays off."
|
||||||
|
|
||||||
|
## Scoring Method
|
||||||
|
|
||||||
|
Start at 100. Deduct points for each pattern detected (penalties listed above). Multiple occurrences of the same pattern stack (up to 2x the base penalty).
|
||||||
|
|
||||||
|
- **90-100**: Human-sounding. Clean.
|
||||||
|
- **70-89**: Minor AI tells. Quick fixes needed.
|
||||||
|
- **50-69**: Obvious AI patterns. Significant rewrite needed.
|
||||||
|
- **0-49**: Reads like ChatGPT output. Full rewrite.
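The scoring method above can be sketched in a few lines. This is an illustration under stated assumptions, not the repo's implementation: repeated hits are assumed to add the base penalty once per occurrence until the 2x cap is reached, and the score is assumed to floor at 0. The names `humanizer_score` and `band` are hypothetical.

```python
def humanizer_score(hits: dict[str, tuple[int, int]]) -> int:
    """Compute the 0-100 score.

    hits maps a pattern name to (base_penalty, occurrences), with
    base_penalty given as a positive number of points.
    """
    total = 0
    for base, count in hits.values():
        if count <= 0:
            continue
        # Occurrences stack, but never beyond 2x the base penalty.
        total += min(base * count, 2 * base)
    return max(0, 100 - total)  # flooring at 0 is an assumption


def band(score: int) -> str:
    """Map a score onto the four bands listed above."""
    if score >= 90:
        return "Human-sounding. Clean."
    if score >= 70:
        return "Minor AI tells. Quick fixes needed."
    if score >= 50:
        return "Obvious AI patterns. Significant rewrite needed."
    return "Reads like ChatGPT output. Full rewrite."
```

For example, one hit on Promotional Language (8) plus three em dash violations (5, capped at 10) deducts 18 points, which lands in the 70-89 band.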
|
||||||
|
|
||||||
|
## What Good Looks Like
|
||||||
|
|
||||||
|
Good human writing has:
|
||||||
|
- Opinions, not just reporting
|
||||||
|
- Varied sentence rhythm (short punches + longer ones)
|
||||||
|
- Specific details over vague claims
|
||||||
|
- Simple verbs (is, has, does) over elaborate constructions
|
||||||
|
- Acknowledgment of uncertainty or mixed feelings
|
||||||
|
- First-person perspective when appropriate
|
||||||
|
- Humor, edge, or personality
|
||||||
|
- Concrete examples with names, dates, numbers
|
||||||
28
content-ops/experts/instagram.md
Normal file
|
|
@ -0,0 +1,28 @@
|
||||||
|
# Expert Panel: Instagram Visual Content
|
||||||
|
|
||||||
|
## Context
|
||||||
|
- Focus on Instagram infographic and data-driven post captions
|
||||||
|
- Data-forward, insight-dense, and visually bold content
|
||||||
|
- Captions should be punchy, hashtagged, and scroll-stopping
|
||||||
|
|
||||||
|
## The 6 Experts
|
||||||
|
|
||||||
|
1. **Visual Impact Scorer** — Is the image concept scroll-stopping? Would someone pause their scroll for this graphic? Does the headline/hook on the visual create an immediate "I need to read this" reaction? Checks: bold contrast, clear hierarchy, data visualization quality, thumb-stopping composition.
|
||||||
|
|
||||||
|
2. **Caption Copywriter** — Is the caption punchy and platform-native? First line is the hook before "more" truncation. Body delivers the insight in 2-3 tight sentences. Hashtags are relevant and placed at the end. No fluff, no filler. Checks: hook strength, hashtag relevance (4-8 tags), caption length (ideal 125-200 chars), CTA presence.
|
||||||
|
|
||||||
|
3. **Data Accuracy Checker** — Is the stat or insight correct and properly sourced? Is this a real data point, not a vague "studies show" claim? Is the number specific, recent (within 12 months preferred), and directly relevant? Checks: specificity of data, source existence, recency, no hallucinated stats.
|
||||||
|
|
||||||
|
4. **Timeliness Validator** — Is this insight still relevant and interesting today? Has this exact take already flooded Instagram this week? Does the topic align with current conversations in the target space? Checks: topic freshness, differentiation from overposted angles.
|
||||||
|
|
||||||
|
5. **Brand Consistency** — Does this match the configured brand voice? Direct, data-driven, slightly contrarian, ROI-obsessed. No motivation porn. No vague inspiration. Only specific, actionable, data-backed insights. Checks: brand voice alignment, topic fit, no generic "hustle" content.
|
||||||
|
|
||||||
|
6. **AI Writing Detector (Humanizer)** — 1.5x weight. Checks caption text specifically. Instagram captions fail when they sound AI-generated. See `experts/humanizer.md` for the full 24-pattern rubric. Special Instagram flags: forced hashtag stuffing, overly polished corporate tone, "💡 Key insight:" formatting, sycophantic openers, significance inflation.
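The mechanical parts of the Caption Copywriter checks (4-8 hashtags, 125-200 characters, hashtags at the end) could be pre-screened before the panel scores anything. The sketch below assumes the length is measured on the caption with hashtags removed, which the panel definition does not specify; `caption_checks` is a hypothetical helper.

```python
import re

HASHTAG = re.compile(r"#\w+")


def caption_checks(caption: str) -> dict[str, bool]:
    """Run the countable caption checks; judgment calls stay with the experts."""
    tags = HASHTAG.findall(caption)
    body = HASHTAG.sub("", caption).strip()
    first = HASHTAG.search(caption)
    # "At the end" means nothing but hashtags and whitespace follows the first tag.
    tail = caption[first.start():] if first else ""
    return {
        "hashtag_count_ok": 4 <= len(tags) <= 8,
        "length_ok": 125 <= len(body) <= 200,
        "hashtags_at_end": first is not None and HASHTAG.sub("", tail).strip() == "",
    }
```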
|
||||||
|
|
||||||
|
## Scoring Criteria
|
||||||
|
- **Visual hook** — Would someone stop scrolling for this?
|
||||||
|
- **Caption punch** — First sentence earns the "more" tap
|
||||||
|
- **Data credibility** — Real numbers, real sources
|
||||||
|
- **Timeliness** — Fresh angle, not stale take
|
||||||
|
- **Brand fit** — Matches configured brand voice
|
||||||
|
- **Human voice** — Reads like a real person, not a content bot
|
||||||
25
content-ops/experts/linkedin.md
Normal file
|
|
@ -0,0 +1,25 @@
|
||||||
|
# Expert Panel: LinkedIn Posts
|
||||||
|
|
||||||
|
## The 11 Experts
|
||||||
|
|
||||||
|
1. **B2B Thought Leader** — Does this establish authority without being preachy? Would a CMO reshare this?
|
||||||
|
2. **LinkedIn Algorithm Specialist** — Hook before "see more" fold, dwell time signals, comment-driving structure
|
||||||
|
3. **Storytelling Coach** — Is there a real story? Personal anecdote? Emotional arc?
|
||||||
|
4. **Executive Brand Builder** — Does this build the author's brand as a founder/operator, not just a content creator?
|
||||||
|
5. **Engagement Optimizer** — Will this get comments, not just likes? Is there a debate hook?
|
||||||
|
6. **Hook Writer** — First 2-3 lines before the fold. Would you click "see more"?
|
||||||
|
7. **Professional Copywriter** — Professional but not corporate. Warm but not soft. Every sentence counts.
|
||||||
|
8. **Data Visualization Expert** — Are numbers presented compellingly? Could a stat be formatted as a callout?
|
||||||
|
9. **Community Builder** — Does this invite conversation? Does it make readers feel part of something?
|
||||||
|
10. **Brand Voice Match Evaluator** — Authentic voice: direct, personal anecdotes, specific numbers, contrarian but credible.
|
||||||
|
11. **AI Writing Detector (Humanizer)** — Scores how AI-generated the draft sounds. Checks all 24 humanizer patterns. See `experts/humanizer.md` for full rubric. This expert's score is weighted 1.5x.
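The panel definition says the humanizer's score is weighted 1.5x but not how scores are combined. One plausible reading is a weighted mean, sketched below; the aggregation rule and the name `panel_score` are assumptions.

```python
from typing import Optional


def panel_score(
    scores: dict[str, float],
    weights: Optional[dict[str, float]] = None,
) -> float:
    """Weighted mean of expert scores; experts without a weight count as 1.0."""
    weights = weights or {}
    weighted_sum = 0.0
    total_weight = 0.0
    for expert, score in scores.items():
        w = weights.get(expert, 1.0)
        weighted_sum += w * score
        total_weight += w
    if total_weight == 0:
        raise ValueError("no expert scores to aggregate")
    return weighted_sum / total_weight
```

With a Hook Writer score of 80 and a humanizer score of 60 at weight 1.5, the result is (80 + 90) / 2.5 = 68, lower than the unweighted mean of 70.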
|
||||||
|
|
||||||
|
## Scoring Criteria
|
||||||
|
- **Hook before "see more" fold** — First 2-3 lines must compel the click
|
||||||
|
- **Story arc** — Setup → insight → takeaway
|
||||||
|
- **Professional but not corporate** — No jargon, no "I'm excited to announce"
|
||||||
|
- **Personal anecdotes** — Real stories from experience
|
||||||
|
- **Specific data** — Numbers, percentages, dollar amounts
|
||||||
|
- **Engagement drivers** — Questions, debate hooks, "what would you do?"
|
||||||
|
- **Line spacing/readability** — Short paragraphs, white space, scannable
|
||||||
|
- **CTA that drives comments** — Not "like and share" but genuine engagement prompts
|
||||||
23
content-ops/experts/newsletter.md
Normal file
|
|
@ -0,0 +1,23 @@
|
||||||
|
# Expert Panel: Newsletter
|
||||||
|
|
||||||
|
## The 11 Experts
|
||||||
|
|
||||||
|
1. **Email Marketer** — Deliverability, open rate optimization, sender reputation signals
|
||||||
|
2. **Newsletter Growth Expert** — Is this the kind of content that drives forwards and referrals?
|
||||||
|
3. **Copywriter** — Every sentence earns its place. No filler. Punchy and clear.
|
||||||
|
4. **Data Journalist** — Are claims backed by data? Are sources credible and recent?
|
||||||
|
5. **CTA Specialist** — Is there a clear action? Does the reader know what to do next?
|
||||||
|
6. **Subject Line Expert** — Would you open this email? 40-50 chars, curiosity or value signal
|
||||||
|
7. **Retention Specialist** — Will subscribers stay after reading this? Does it deliver on the promise?
|
||||||
|
8. **Layout/Formatting Coach** — Scannable? Headers, bullets, bold text for skimmers?
|
||||||
|
9. **Value-Per-Word Optimizer** — Information density. Could this be 20% shorter and still deliver?
|
||||||
|
10. **Brand Voice Match Evaluator** — Does this sound like the author's newsletter voice: direct, data-rich, actionable?
|
||||||
|
11. **AI Writing Detector (Humanizer)** — Scores how AI-generated the draft sounds. See `experts/humanizer.md` for full rubric. This expert's score is weighted 1.5x.
|
||||||
|
|
||||||
|
## Scoring Criteria
|
||||||
|
- **Value density** — Every paragraph teaches something specific
|
||||||
|
- **Scannability** — Headers, bullets, bold. A skimmer gets 80% of the value
|
||||||
|
- **"Why this matters" clarity** — Reader knows immediately why they should care
|
||||||
|
- **CTA clarity** — One clear next action
|
||||||
|
- **Would I forward this?** — The ultimate newsletter test
|
||||||
|
- **Subject line** — Opens the email, sets expectations
|
||||||
34
content-ops/experts/podcast-quotes.md
Normal file
|
|
@ -0,0 +1,34 @@
|
||||||
|
# Expert Panel: Podcast Quote Cards
|
||||||
|
|
||||||
|
## Context
|
||||||
|
- Focus on quote cards extracted from podcast episodes or guest appearances
|
||||||
|
- Quote cards live on Instagram, LinkedIn, and X — they must work visually AND as text
|
||||||
|
- Target audience: marketers, agency owners, founders, operators
|
||||||
|
|
||||||
|
## The 6 Experts
|
||||||
|
|
||||||
|
1. **Quote Impact Scorer** — Is this actually quotable? Would someone screenshot this and send it to a friend? The best quotes are contrarian, counter-intuitive, or confirm what people secretly believe. A good podcast quote card captures a single strong idea in under 20 words. Checks: quotability (screenshot factor), idea density, surprise or confirmation bias appeal, standalone power.
|
||||||
|
|
||||||
|
2. **Context Validator** — Does the quote make sense without the full episode? Quote cards get ripped from context constantly. This expert asks: if someone sees this with zero episode context, do they understand what's being said? Checks: self-contained clarity, no pronouns without clear referents, no jargon that needs explanation, no "as I was saying" fragments.
|
||||||
|
|
||||||
|
3. **Attribution Accuracy** — Is the quote attributed correctly? Correct speaker name and title. No misattribution, no paraphrase presented as direct quote. Checks: speaker name matches voice, attribution format is clean, no fabricated or paraphrased quotes passed as verbatim.
|
||||||
|
|
||||||
|
4. **Audience Relevance** — Does the target audience care about this topic? Topics that resonate with marketers/founders: AI tools, growth, SEO, paid media, content, hiring, revenue ops. Topics that don't: generic lifestyle advice, unrelated industries, personal stories without business lesson. Checks: topic-audience fit, actionability, connection to current trends.
|
||||||
|
|
||||||
|
5. **Visual Text Scorer** — Will this text read well on an image card? Quote cards are read at thumbnail size on mobile. Long quotes fail. Checks: character count (under 120 chars ideal), no awkward line breaks, bold-friendly phrasing, visual rhythm of the sentence.
|
||||||
|
|
||||||
|
6. **AI Writing Detector (Humanizer)** — 1.5x weight. Applies to both the quote text and the caption. Quotes from real podcast episodes should sound natural and human. Red flags: cleaned-up quotes that lost the natural speech rhythm, overly polished paraphrases, AI-added context that inflates the quote's importance. See `experts/humanizer.md` for the full rubric.
|
||||||
|
|
||||||
|
## Scoring Criteria
|
||||||
|
- **Screenshot factor** — Would someone save and share this?
|
||||||
|
- **Self-contained** — Makes sense without episode context
|
||||||
|
- **Attribution accuracy** — Correct speaker, correct format
|
||||||
|
- **Audience fit** — Relevant to marketers/founders
|
||||||
|
- **Visual readability** — Works at small size on mobile
|
||||||
|
- **Human voice** — Sounds like a real person said this
|
||||||
|
|
||||||
|
## Quote Standards
|
||||||
|
- Under 120 characters is ideal for visual cards
|
||||||
|
- Direct quotes only — no paraphrasing
|
||||||
|
- Attribution: "— [Name]" or "[Name] on [Show Name]"
|
||||||
|
- Caption: 1-2 sentences + 3-5 hashtags
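The quote standards above are mostly countable, so they could be validated before a card reaches the panel. This sketch assumes a naive sentence splitter and treats "under 120 characters" as a hard limit rather than an ideal; `quote_card_checks` is a hypothetical helper. Whether a quote is verbatim cannot be checked this way and stays with the Attribution Accuracy expert.

```python
import re


def quote_card_checks(quote: str, attribution: str, caption: str) -> dict[str, bool]:
    """Check quote length, attribution format, and caption shape."""
    tags = re.findall(r"#\w+", caption)
    caption_text = re.sub(r"#\w+", "", caption).strip()
    # Naive sentence count: any run of ., !, or ? ends a sentence.
    sentences = [s for s in re.split(r"[.!?]+", caption_text) if s.strip()]
    attribution_ok = bool(
        re.fullmatch(r"\u2014 \S.*", attribution)      # em dash, space, name
        or re.fullmatch(r"\S.* on \S.*", attribution)  # "Name on Show Name"
    )
    return {
        "quote_length_ok": len(quote) < 120,
        "attribution_ok": attribution_ok,
        "caption_sentences_ok": 1 <= len(sentences) <= 2,
        "caption_hashtags_ok": 3 <= len(tags) <= 5,
    }
```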
|
||||||
21
content-ops/experts/recruiting.md
Normal file
|
|
@ -0,0 +1,21 @@
|
||||||
|
# Expert Panel: Recruiting
|
||||||
|
|
||||||
|
## The 10 Experts
|
||||||
|
|
||||||
|
1. **Agency Recruiter** — Understands agency culture, pace, and what makes someone thrive vs burn out
|
||||||
|
2. **Talent Acquisition Leader** — Pipeline strategy, sourcing channels, employer branding
|
||||||
|
3. **Hiring Manager** — Day-to-day fit. Can this person actually do the job on day one?
|
||||||
|
4. **Culture Fit Assessor** — Values alignment, team dynamics, growth mindset indicators
|
||||||
|
5. **Compensation Analyst** — Is the offer competitive? Market rate awareness
|
||||||
|
6. **LinkedIn Sourcer** — Profile signals, career trajectory patterns, red flags in work history
|
||||||
|
7. **Diversity Specialist** — Diverse perspectives, inclusive hiring practices, bias checks
|
||||||
|
8. **Startup Hiring Expert** — Can this person handle ambiguity, wear multiple hats, move fast?
|
||||||
|
9. **AI Fluency Evaluator** — Does this candidate use AI tools? Can they leverage AI in their role?
|
||||||
|
10. **Industry Insider** — Understands the relevant industry landscape, competitor talent pools
|
||||||
|
|
||||||
|
## Scoring Criteria
|
||||||
|
- **Candidate-role fit** — Skills, experience, and trajectory match the role requirements
|
||||||
|
- **Evidence quality** — Claims backed by portfolio, metrics, references (not just resume bullets)
|
||||||
|
- **Risk assessment accuracy** — Honest about gaps, flight risk, culture mismatch potential
|
||||||
|
- **Outreach angle creativity** — What would make this person respond to a cold message?
|
||||||
|
- **AI fluency signal strength** — Evidence of AI tool usage, automation mindset, future-readiness
|
||||||
22
content-ops/experts/seo-strategy.md
Normal file
|
|
@ -0,0 +1,22 @@
|
||||||
|
# Expert Panel: SEO Strategy
|
||||||
|
|
||||||
|
## The 10 Experts
|
||||||
|
|
||||||
|
1. **Technical SEO Specialist** — Site architecture, crawlability, Core Web Vitals, structured data
|
||||||
|
2. **Content Strategist** — Topic clusters, content gaps, SERP intent alignment
|
||||||
|
3. **Conversion Rate Optimizer** — Does the SEO strategy connect to revenue, not just traffic?
|
||||||
|
4. **Revenue Attribution Expert** — Can we tie this recommendation to dollar outcomes?
|
||||||
|
5. **Competitive Analyst** — What are competitors doing? Where are the gaps?
|
||||||
|
6. **AI/AEO Specialist** — How does this strategy account for AI Overviews, ChatGPT citations, Perplexity?
|
||||||
|
7. **Data Scientist** — Is the analysis statistically sound? Are trends real or noise?
|
||||||
|
8. **Growth Hacker** — What's the fastest path to measurable results?
|
||||||
|
9. **Operations Expert** — Is this feasible with current resources and timelines?
|
||||||
|
10. **ROI Calculator** — Does this pass a 4:1 ROI bar? What's the expected return?
|
||||||
|
|
||||||
|
## Scoring Criteria
|
||||||
|
- **Data backing** — Real data cited, not projections or assumptions
|
||||||
|
- **Actionable specificity** — Clear next steps, not vague "optimize your content"
|
||||||
|
- **ROI estimate quality** — Realistic, with assumptions stated
|
||||||
|
- **Risk assessment** — Honest about what could go wrong
|
||||||
|
- **Feasibility** — Can this actually be executed with available resources?
|
||||||
|
- **Alignment with priorities** — Serves current business goals
|
||||||
30
content-ops/experts/x-articles.md
Normal file
|
|
@ -0,0 +1,30 @@
|
||||||
|
# Expert Panel: X Articles (Long-Form X Posts)
|
||||||
|
|
||||||
|
## Context
|
||||||
|
- Focus on X ARTICLES (long-form X posts), not just threads
|
||||||
|
- These are meaty, value-dense posts that stop the scroll and deliver insight
|
||||||
|
|
||||||
|
## The 11 Experts
|
||||||
|
|
||||||
|
1. **Viral X Writer** — Judges structure, pacing, and viral mechanics. Does this follow the patterns that get 100K+ impressions?
|
||||||
|
2. **Engagement Strategist** — Analyzes reply-bait, shareability, and algorithm signals. Will this get engagement or just impressions?
|
||||||
|
3. **Hook Specialist** — First 2 lines only. Would you stop scrolling? Is there a curiosity gap, contrarian claim, or surprising stat?
|
||||||
|
4. **Data Storytelling Expert** — Are the numbers specific, recent, and surprising? Are they woven into narrative or just dropped in?
|
||||||
|
5. **Contrarian Positioning Coach** — Is there a genuine contrarian angle? Or is this just conventional wisdom repackaged?
|
||||||
|
6. **CTA Optimizer** — Does the ending drive action? Comments, follows, saves? Is it natural or forced?
|
||||||
|
7. **Audience Growth Expert** — Will this attract NEW followers or just engage existing ones? Does it signal expertise?
|
||||||
|
8. **Algorithm Specialist** — Post length, formatting, engagement signals. Will X's algorithm boost this?
|
||||||
|
9. **Copywriter** — Sentence-level quality. Short punchy sentences? No filler? Every word earns its place?
|
||||||
|
10. **Brand Voice Match Evaluator** — Does this sound like a real person wrote it? Authentic voice: direct, personal anecdotes, specific numbers, contrarian but credible.
|
||||||
|
11. **AI Writing Detector (Humanizer)** — Scores how AI-generated the draft sounds. Checks all 24 humanizer patterns: banned vocabulary, significance inflation, formulaic structures, vague attributions, promotional language, hedging, em dash overuse, triple structures, generic conclusions. See `experts/humanizer.md` for full rubric. This expert's score is weighted 1.5x — if it flags the draft as AI-sounding, the draft MUST be revised.
|
||||||
|
|
||||||
|
## Scoring Criteria
|
||||||
|
- **Hook in first 2 lines** — Would you stop scrolling for this?
|
||||||
|
- **Data specificity** — Real numbers, not vague claims
|
||||||
|
- **Contrarian angle** — Genuine insight, not clickbait
|
||||||
|
- **Story arc** — Setup → tension → payoff
|
||||||
|
- **Voice authenticity** — Sounds like a real person, not a content mill
|
||||||
|
- **CTA strength** — Natural engagement driver
|
||||||
|
- **Readability** — Short paragraphs, line breaks, scannable
|
||||||
|
- **Shareability** — "I need to repost this"
|
||||||
|
- **Visual elements** — At least one ASCII diagram or visual element
|
||||||
24
content-ops/experts/youtube-shorts.md
Normal file
|
|
@ -0,0 +1,24 @@
|
||||||
|
# Expert Panel: YouTube Shorts
|
||||||
|
|
||||||
|
## The 11 Experts
|
||||||
|
|
||||||
|
1. **Short-Form Creator** — Does this work as a standalone piece? Would you watch it on your For You page?
|
||||||
|
2. **Retention Curve Specialist** — Where will viewers drop off? Is every second justified?
|
||||||
|
3. **Script Doctor** — Is the script tight? No wasted words? Clear structure?
|
||||||
|
4. **Visual Storytelling Expert** — What should be on screen at each moment? B-roll, text overlays, screen shares?
|
||||||
|
5. **TikTok/Reels Crossover Expert** — Would this work cross-platform? Format-native for each?
|
||||||
|
6. **Pacing Coach** — Is the energy right? No dead spots? Builds momentum?
|
||||||
|
7. **Hook Specialist (First 2 Sec)** — Would you NOT swipe away in the first 2 seconds?
|
||||||
|
8. **Payoff Designer** — Does the ending deliver? Is there a satisfying resolution or surprise?
|
||||||
|
9. **Re-watch Optimizer** — Is there a loop? A detail you'd catch on second watch?
|
||||||
|
10. **Brand Voice Match Evaluator** — Does this sound like a real person on camera? Direct, confident, slightly irreverent?
|
||||||
|
11. **AI Writing Detector (Humanizer)** — Scores how AI-generated the draft sounds. See `experts/humanizer.md` for full rubric. This expert's score is weighted 1.5x.
|
||||||
|
|
||||||
|
## Scoring Criteria
|
||||||
|
- **Hook in first 2 seconds** — Pattern interrupt, surprising claim, or visual hook
|
||||||
|
- **Setup-payoff structure** — Clear promise → delivery
|
||||||
|
- **30-60 sec runtime** — Tight, no filler
|
||||||
|
- **Visual cue quality** — Text overlays, B-roll suggestions, screen share moments
|
||||||
|
- **"Would I watch this twice?"** — Re-watch value
|
||||||
|
- **Shareability** — "Send this to someone who needs to hear this"
|
||||||
|
- **CTA** — Natural, not forced ("Comment X and I'll show you")
|
||||||
74
content-ops/references/expert-assembly.md
Normal file
|
|
@ -0,0 +1,74 @@
|
||||||
|
# Expert Assembly Guide
|
||||||
|
|
||||||
|
Examples of domain-specific experts to add based on offer context. Use this when
|
||||||
|
auto-assembling panels for unfamiliar domains.
|
||||||
|
|
||||||
|
## Assembly principle
|
||||||
|
The panel needs experts who understand both the **craft** (how to make good content/strategy)
|
||||||
|
and the **domain** (the specific market, audience, and offer being scored).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Domain Expert Examples
|
||||||
|
|
||||||
|
### SaaS / Software
|
||||||
|
- SaaS Conversion Expert — free trial vs demo, PLG patterns, activation metrics
|
||||||
|
- Developer Audience Specialist — if targeting devs, knows what resonates vs cringe
|
||||||
|
- Pricing Page Analyst — tier structure, anchoring, feature comparison tables
|
||||||
|
|
||||||
|
### E-Commerce / DTC
|
||||||
|
- DTC Brand Strategist — unboxing, retention loops, subscription models
|
||||||
|
- Product Page Optimizer — hero images, reviews, urgency without fakeness
|
||||||
|
- Email/SMS Commerce Expert — abandoned cart, post-purchase, winback flows
|
||||||
|
|
||||||
|
### Healthcare / Medical
|
||||||
|
- Healthcare Compliance Expert — HIPAA, FDA advertising rules, claim substantiation
|
||||||
|
- Patient Communication Specialist — empathy without condescension, plain language
|
||||||
|
- Medical Professional Audience Expert — if targeting HCPs, clinical credibility
|
||||||
|
|
||||||
|
### Financial Services
|
||||||
|
- FinServ Compliance Reviewer — SEC/FINRA advertising rules, disclaimers
|
||||||
|
- Trust & Authority Expert — credential signaling, risk communication
|
||||||
|
- Retail Investor Audience Specialist — jargon translation, fear/greed calibration
|
||||||
|
|
||||||
|
### Food & Beverage / Restaurant
|
||||||
|
- Food Marketing Expert — appetite appeal, sensory language, seasonal hooks
|
||||||
|
- Local Business Marketing Specialist — geo-targeting, community signals
|
||||||
|
- Visual Food Stylist — photography/visual standards for food content
|
||||||
|
|
||||||
|
### Professional Services / Agency
|
||||||
|
- B2B Services Buyer Expert — what CMOs/VPs actually respond to
|
||||||
|
- Case Study Analyst — proof structure, metrics that matter, client story arc
|
||||||
|
- Competitive Positioning Expert — differentiation in crowded service markets
|
||||||
|
|
||||||
|
### Education / Courses
|
||||||
|
- Course Launch Expert — urgency, social proof, transformation promise
|
||||||
|
- Curriculum Designer — learning outcomes, module structure, completion optimization
|
||||||
|
- Student Success Storyteller — before/after, specific outcomes, relatable journeys
|
||||||
|
|
||||||
|
### Real Estate
|
||||||
|
- Real Estate Marketing Expert — listing copy, neighborhood selling, visual standards
|
||||||
|
- Luxury Market Specialist — if high-end, understands aspiration vs information
|
||||||
|
- Lead Nurture Expert — long sales cycles, drip sequence optimization
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Universal Experts (always consider)
|
||||||
|
|
||||||
|
These roles apply to nearly any domain:
|
||||||
|
|
||||||
|
- **Audience Empathy Expert** — Does the scorer actually understand the target audience's
|
||||||
|
daily reality, pain points, and language?
|
||||||
|
- **Competitive Context Expert** — What else is the audience seeing? Is this differentiated
|
||||||
|
or just another version of what everyone says?
|
||||||
|
- **Offer Clarity Expert** — Can someone understand what they get, what it costs, and what
|
||||||
|
happens next in under 10 seconds?
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## When no pre-built panel exists
|
||||||
|
|
||||||
|
1. Identify the content type → pick 3-4 craft experts (copywriter, designer, strategist, etc.)
|
||||||
|
2. Identify the domain → pick 2-3 domain experts from above or synthesize new ones
|
||||||
|
3. Add humanizer (mandatory) and brand voice match (mandatory)
|
||||||
|
4. Cap at 10, merge overlapping roles
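The four steps above can be sketched as a small assembly function. It assumes that "merge overlapping roles" can be approximated by dropping exact duplicate names, which is cruder than a real merge, and it places the two mandatory experts first so the cap never drops them. `MANDATORY` and `assemble_panel` are hypothetical names.

```python
MANDATORY = ["AI Writing Detector (Humanizer)", "Brand Voice Match Evaluator"]


def assemble_panel(craft: list[str], domain: list[str], cap: int = 10) -> list[str]:
    """Build a panel: mandatory experts, then craft, then domain, capped."""
    panel: list[str] = []
    for name in MANDATORY + craft + domain:
        if name not in panel:  # exact-name dedup stands in for merging roles
            panel.append(name)
    return panel[:cap]
```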
|
||||||
15
content-ops/references/patterns.md
Normal file
|
|
@ -0,0 +1,15 @@
|
||||||
|
# Learned Patterns
|
||||||
|
|
||||||
|
Patterns learned from content approvals and rejections. The expert panel checks these
|
||||||
|
before scoring begins and docks points for known-bad patterns.
|
||||||
|
|
||||||
|
<!-- Add patterns as they are learned. Format:
|
||||||
|
|
||||||
|
## [Pattern Name]
|
||||||
|
- **Type:** rejection | preference | override
|
||||||
|
- **Content types:** [which types this applies to]
|
||||||
|
- **Rule:** [What to always/never do]
|
||||||
|
- **Example:** [The specific instance that triggered this]
|
||||||
|
- **Date:** [YYYY-MM-DD]
|
||||||
|
- **Point dock:** [-N points when detected]
|
||||||
|
-->
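If patterns are ever written programmatically, the template above maps onto a small record type. The sketch below renders one entry in the same Markdown shape; the class name, field names, and the choice to store the dock as a positive integer are assumptions.

```python
from dataclasses import dataclass


@dataclass
class LearnedPattern:
    name: str
    type: str           # "rejection", "preference", or "override"
    content_types: str  # which content types the pattern applies to
    rule: str           # what to always or never do
    example: str        # the instance that triggered the pattern
    date: str           # YYYY-MM-DD
    point_dock: int     # positive number of points docked when detected

    def to_markdown(self) -> str:
        """Render the entry using the field labels from the template."""
        return "\n".join([
            f"## {self.name}",
            f"- **Type:** {self.type}",
            f"- **Content types:** {self.content_types}",
            f"- **Rule:** {self.rule}",
            f"- **Example:** {self.example}",
            f"- **Date:** {self.date}",
            f"- **Point dock:** -{self.point_dock} points when detected",
        ])
```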
|
||||||
7
content-ops/requirements.txt
Normal file
|
|
@ -0,0 +1,7 @@
|
||||||
|
# Core dependencies
|
||||||
|
anthropic>=0.39.0 # Claude API client (for LLM-powered features)
|
||||||
|
feedparser>=6.0.0 # RSS feed parsing (quote mining engine)
|
||||||
|
|
||||||
|
# Optional: for video clip cutting (editorial brain)
|
||||||
|
# yt-dlp # YouTube subtitle/video download (install separately)
|
||||||
|
# ffmpeg # Video cutting (install via system package manager)
|
||||||
25
content-ops/scoring-rubrics/content-quality.md
Normal file
|
|
@ -0,0 +1,25 @@
|
||||||
|
## Content Quality Rubric (0-100)
|
||||||
|
|
||||||
|
### Hook Power (0-25)
|
||||||
|
- 0-5: Generic, no reason to keep reading
|
||||||
|
- 6-15: Interesting but not urgent
|
||||||
|
- 16-20: Strong curiosity gap or contrarian claim
|
||||||
|
- 21-25: Impossible to scroll past. Specific, surprising, personal.
|
||||||
|
|
||||||
|
### Voice Authenticity (0-25)
|
||||||
|
- Does this sound like a real person wrote it?
|
||||||
|
- Short punchy sentences? Specific numbers? Personal framing?
|
||||||
|
- No corporate jargon? No filler words?
|
||||||
|
- Contrarian but backed by data?
|
||||||
|
|
||||||
|
### Value Density (0-25)
|
||||||
|
- Every sentence earns its place
|
||||||
|
- Specific data points, not vague claims
|
||||||
|
- Actionable insight, not just observation
|
||||||
|
- "I learned something I can use today"
|
||||||
|
|
||||||
|
### Engagement Potential (0-25)
|
||||||
|
- Would someone share/repost this?
|
||||||
|
- Does the CTA invite genuine response?
|
||||||
|
- Does it spark debate or agreement?
|
||||||
|
- Platform-native formatting?
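The rubric has four dimensions worth 0-25 each, which sum to the 0-100 total. A minimal sketch of that arithmetic, with the Hook Power bands as a lookup; the function names are hypothetical, and rejecting out-of-range sub-scores (rather than clamping them) is an assumption.

```python
# Upper bound of each Hook Power band -> its description from the rubric.
HOOK_BANDS = [
    (5, "Generic, no reason to keep reading"),
    (15, "Interesting but not urgent"),
    (20, "Strong curiosity gap or contrarian claim"),
    (25, "Impossible to scroll past. Specific, surprising, personal."),
]


def hook_band(hook: int) -> str:
    """Return the rubric description for a Hook Power sub-score."""
    if not 0 <= hook <= 25:
        raise ValueError("hook score must be between 0 and 25")
    for upper, label in HOOK_BANDS:
        if hook <= upper:
            return label
    raise AssertionError("unreachable: bands cover 0-25")


def content_quality_total(hook: int, voice: int, value: int, engagement: int) -> int:
    """Sum the four 0-25 sub-scores into the 0-100 rubric total."""
    parts = (hook, voice, value, engagement)
    if any(not 0 <= p <= 25 for p in parts):
        raise ValueError("each sub-score must be between 0 and 25")
    return sum(parts)
```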
|
||||||
27
content-ops/scoring-rubrics/conversion-quality.md
Normal file
|
|
@ -0,0 +1,27 @@
|
||||||
|
## Conversion Quality Rubric (0-100)

For landing pages, ads, CTAs, signup flows, pricing pages.

### Headline / Hero (0-25)
- 0-5: Generic, no clear value prop
- 6-15: Communicates offer but not compelling
- 16-20: Clear value prop with specificity
- 21-25: Impossible to bounce. Specific, urgent, addresses the visitor's exact pain.

### Clarity & Friction (0-25)
- Is the offer immediately obvious? (3-second test)
- Can a visitor complete the desired action without confusion?
- Are there unnecessary form fields, steps, or distractions?
- Does copy match the traffic source expectation?

### Social Proof & Trust (0-25)
- Specific results (numbers, names, companies) vs vague testimonials
- Trust signals (logos, security badges, guarantees) present and credible
- Case studies or data points that prove the claim
- No fake urgency or manufactured scarcity

### CTA Strength (0-25)
- CTA copy specific to the action ("Get my audit" > "Submit")
- CTA visible without scrolling
- Single clear primary action (no competing CTAs)
- Micro-copy reduces anxiety ("No credit card required", "2-minute setup")
27
content-ops/scoring-rubrics/evaluation-quality.md
Normal file
@@ -0,0 +1,27 @@
## Evaluation Quality Rubric (0-100)

For candidate assessments, vendor evaluations, tool comparisons, opportunity scoring.

### Evidence Quality (0-25)
- Claims backed by data, portfolio, references, or verifiable metrics
- No resume-bullet-level assertions without proof
- Specific examples cited (projects, outcomes, timelines)
- Red flags acknowledged, not glossed over

### Criteria Relevance (0-25)
- Evaluation criteria match the actual role/need
- Weighted by what matters most (not equal weight to everything)
- Context-appropriate (startup vs enterprise, junior vs senior)
- Anti-criteria considered (what would make this a bad fit?)

### Risk Assessment (0-25)
- Honest about gaps, unknowns, and flight risk
- Mitigation strategies suggested for identified risks
- Comparison to alternatives or market baseline
- No false confidence — uncertainty stated clearly

### Actionability (0-25)
- Clear recommendation (hire/pass/shortlist, buy/skip, proceed/wait)
- Next steps defined
- Decision criteria transparent
- Dissenting view included if panel is split
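The "weighted by what matters most" criterion under Criteria Relevance can be made concrete with a small weighted average. This is a sketch under assumptions, not a method the rubric prescribes: the 0-10 rating scale, the criterion names, and the weights are all hypothetical.

```python
def weighted_score(ratings: dict, weights: dict) -> float:
    """Combine 0-10 criterion ratings into a 0-100 score using relative weights."""
    if set(ratings) != set(weights):
        raise ValueError("ratings and weights must cover the same criteria")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(ratings[name] * weights[name] for name in ratings)
    # Scale the weighted 0-10 average up to the 0-100 range.
    return 10 * weighted_sum / total_weight


# Hypothetical vendor evaluation: integration fit matters most, price least.
weights = {"integration_fit": 5, "support_quality": 3, "price": 2}
ratings = {"integration_fit": 8, "support_quality": 6, "price": 4}

print(weighted_score(ratings, weights))
# Equal weights for comparison, which the rubric advises against.
print(weighted_score(ratings, {name: 1 for name in ratings}))
```

With these numbers the weighted result is higher than the equal-weight one because the strongest rating sits on the most heavily weighted criterion.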
21
content-ops/scoring-rubrics/strategic-quality.md
Normal file
@@ -0,0 +1,21 @@
## Strategic Quality Rubric (0-100)

### Data Foundation (0-25)
- Real data cited, not projections
- Sources verifiable
- Numbers specific and recent

### Actionability (0-25)
- Clear next step
- Timeline realistic
- Resources identified

### ROI Clarity (0-25)
- 4:1 minimum demonstrated
- Costs estimated
- Comparison to alternatives

### Risk Assessment (0-25)
- Honest about what could go wrong
- Mitigation plan included
- Dependencies identified
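The "4:1 minimum" under ROI Clarity reduces to a simple ratio check. The sketch below assumes the ratio means expected return divided by estimated cost; the rubric does not say whether return is gross or net, and the function names and figures are hypothetical.

```python
# Assumption: "4:1" is read as expected return / estimated cost >= 4.
MIN_ROI_RATIO = 4.0


def roi_ratio(expected_return: float, estimated_cost: float) -> float:
    """Return the return-to-cost ratio for a proposal."""
    if estimated_cost <= 0:
        raise ValueError("estimated_cost must be positive")
    return expected_return / estimated_cost


def meets_roi_bar(expected_return: float, estimated_cost: float,
                  minimum: float = MIN_ROI_RATIO) -> bool:
    """True when the proposal clears the minimum return-to-cost ratio."""
    return roi_ratio(expected_return, estimated_cost) >= minimum


# Hypothetical proposals: 5:1 clears the bar, 3:1 does not.
print(meets_roi_bar(50_000, 10_000))
print(meets_roi_bar(30_000, 10_000))
```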
27
content-ops/scoring-rubrics/visual-quality.md
Normal file
@@ -0,0 +1,27 @@
## Visual Quality Rubric (0-100)

For charts, data visualizations, infographics, diagrams, slide decks.

### Data Accuracy & Integrity (0-25)
- Numbers match the source data
- Axes labeled correctly, scales not misleading
- No cherry-picked timeframes or truncated axes
- Source cited

### Visual Clarity (0-25)
- Can a viewer understand the main point in under 5 seconds?
- Labels readable at expected display size
- Color choices accessible (colorblind-safe)
- No chart junk (unnecessary gridlines, 3D effects, decorative elements)

### Insight Delivery (0-25)
- Does the visualization tell a story or just display data?
- Is the "so what?" obvious without explanation?
- Annotations highlight the key takeaway
- Title states the insight, not just the topic ("Revenue doubled in Q3" > "Revenue by quarter")

### Design & Polish (0-25)
- Consistent typography and color palette
- Proper alignment and spacing
- Brand-appropriate styling
- Mobile/thumbnail readable if applicable
254
content-ops/scripts/content-quality-gate.py
Normal file
@@ -0,0 +1,254 @@
#!/usr/bin/env python3
"""
Content Quality Gate — CI/CD-style gate for content publishing.

Filters drafts through quality scorer before they publish.
Nothing goes live without passing automated quality scoring.

Usage:
    python content-quality-gate.py --input drafts.json
    python content-quality-gate.py --input drafts.json --conservative
    python content-quality-gate.py --input drafts.json --threshold 75
"""

import json
import os
import sys
import argparse
from pathlib import Path
from datetime import datetime, timezone
import subprocess

SCRIPT_DIR = Path(__file__).resolve().parent
PROJECT_DIR = SCRIPT_DIR.parent
DATA_DIR = Path(os.environ.get("CONTENT_OPS_DATA_DIR", PROJECT_DIR / "data"))

DRAFTS_INPUT_FILE = DATA_DIR / "content-drafts-latest.json"
DRAFTS_OUTPUT_FILE = DATA_DIR / "content-drafts-filtered.json"
QUALITY_SCORES_FILE = DATA_DIR / "quality-scores-latest.json"


def run_quality_scorer(input_file, verbose=False):
    """Run the quality scorer on the drafts file."""
    scorer_script = SCRIPT_DIR / "content-quality-scorer.py"
    cmd = [
        sys.executable,
        str(scorer_script),
        "--input", str(input_file)
    ]

    if verbose:
        cmd.append("--verbose")

    print("🔍 Running quality scorer...")
    result = subprocess.run(cmd, capture_output=True, text=True)

    if result.returncode != 0:
        print("❌ Quality scorer failed:")
        print(f"STDOUT: {result.stdout}")
        print(f"STDERR: {result.stderr}")
        return False

    if verbose:
        print(result.stdout)

    return True


def load_quality_scores():
    """Load the latest quality scoring results."""
    if not QUALITY_SCORES_FILE.exists():
        print(f"❌ Quality scores file not found: {QUALITY_SCORES_FILE}")
        return None

    try:
        with open(QUALITY_SCORES_FILE) as f:
            return json.load(f)
    except Exception as e:
        print(f"❌ Error loading quality scores: {e}")
        return None


def filter_drafts_by_quality(drafts, quality_results, conservative_mode=False):
    """Filter drafts based on quality scores."""
    if not quality_results or "results" not in quality_results:
        print("❌ No quality results available for filtering")
        return drafts, []

    passed_ids = set()
    failed_drafts = []
    quality_by_id = {}

    for result in quality_results["results"]:
        draft_id = result.get("draft_id")
        quality_by_id[draft_id] = result

        if result.get("passed", False):
            passed_ids.add(draft_id)
        else:
            failed_drafts.append({
                "draft_id": draft_id,
                "platform": result.get("platform"),
                "score": result.get("total_score"),
                "reasons": result.get("failure_reasons", [])
            })

    filtered_drafts = []

    for draft in drafts:
        draft_id = draft.get("id")

        if draft_id in quality_by_id:
            quality_info = quality_by_id[draft_id]
            draft["quality_score"] = quality_info.get("total_score")
            draft["quality_passed"] = quality_info.get("passed")
            draft["quality_reasons"] = quality_info.get("failure_reasons", [])
            draft["quality_scored_at"] = quality_info.get("scored_at")

        if conservative_mode:
            filtered_drafts.append(draft)
        elif draft_id in passed_ids:
            filtered_drafts.append(draft)

    return filtered_drafts, failed_drafts


def save_filtered_drafts(original_data, filtered_drafts, quality_results):
    """Save filtered drafts with quality metadata."""
    filtered_data = original_data.copy()
    filtered_data["drafts"] = filtered_drafts
    filtered_data["filtered_at"] = datetime.now(timezone.utc).isoformat()
    filtered_data["quality_gate_applied"] = True
    filtered_data["original_draft_count"] = original_data.get("draft_count", len(original_data.get("drafts", [])))
    filtered_data["filtered_draft_count"] = len(filtered_drafts)
    filtered_data["quality_threshold"] = quality_results.get("threshold")
    filtered_data["quality_pass_rate"] = quality_results.get("pass_rate")
    filtered_data["quality_average_score"] = quality_results.get("average_score")
    filtered_data["draft_count"] = len(filtered_drafts)

    DRAFTS_OUTPUT_FILE.parent.mkdir(parents=True, exist_ok=True)
    with open(DRAFTS_OUTPUT_FILE, 'w') as f:
        json.dump(filtered_data, f, indent=2)

    return filtered_data


def run_quality_gate(input_file=None, conservative_mode=False, verbose=False):
    """Run the complete quality gate process."""
    input_path = Path(input_file) if input_file else DRAFTS_INPUT_FILE

    if not input_path.exists():
        print(f"❌ Input file not found: {input_path}")
        return None

    try:
        with open(input_path) as f:
            original_data = json.load(f)
        drafts = original_data.get("drafts", [])
        print(f"📊 Loaded {len(drafts)} drafts from {input_path}")
    except Exception as e:
        print(f"❌ Error loading drafts: {e}")
        return None

    if not drafts:
        print("❌ No drafts found in input file")
        return None

    if not run_quality_scorer(input_path, verbose):
        return None

    quality_results = load_quality_scores()
    if not quality_results:
        return None

    filtered_drafts, failed_drafts = filter_drafts_by_quality(drafts, quality_results, conservative_mode)
    filtered_data = save_filtered_drafts(original_data, filtered_drafts, quality_results)

    original_count = len(drafts)
    filtered_count = len(filtered_drafts)
    filtered_out = original_count - filtered_count

    print(f"\n{'='*60}")
    print("QUALITY GATE RESULTS")
    print(f"{'='*60}")
    print(f"Original drafts: {original_count}")
    print(f"Passed quality gate: {filtered_count}")
    print(f"Filtered out: {filtered_out}")
    print(f"Pass rate: {quality_results.get('pass_rate', 0):.1f}%")
    print(f"Average score: {quality_results.get('average_score', 0):.1f}/100")
    print(f"Threshold: {quality_results.get('threshold', 60)}/100")

    if conservative_mode:
        print("\n⚠️ CONSERVATIVE MODE: All drafts passed through with quality flags")

    platform_stats = {}
    for draft in filtered_drafts:
        platform = draft.get("platform", "unknown")
        platform_stats[platform] = platform_stats.get(platform, 0) + 1

    if platform_stats:
        print("\n📱 Filtered Drafts by Platform:")
        for platform, count in sorted(platform_stats.items()):
            print(f" {platform}: {count}")

    if failed_drafts:
        failure_reasons = {}
        for failed in failed_drafts:
            for reason in failed["reasons"]:
                failure_reasons[reason] = failure_reasons.get(reason, 0) + 1

        if failure_reasons:
            print("\n❌ Top Failure Reasons:")
            for reason, count in sorted(failure_reasons.items(), key=lambda x: x[1], reverse=True)[:3]:
                print(f" {reason}: {count} drafts")

    print(f"\n💾 Filtered drafts saved to: {DRAFTS_OUTPUT_FILE}")

    if filtered_count == 0:
        print("\n⚠️ WARNING: No drafts passed quality gate!")
        print("Consider lowering threshold or improving content quality.")
        # Fall through and return the (empty) result: returning None here
        # made the "No drafts to publish" branch in main() unreachable.

    return filtered_data


def main():
    parser = argparse.ArgumentParser(description="Filter content drafts through quality gate")
    parser.add_argument("--input", type=str, help="Input drafts JSON file")
    parser.add_argument("--conservative", action="store_true", help="Pass all drafts but add quality flags")
    parser.add_argument("--verbose", "-v", action="store_true", help="Verbose output")
    parser.add_argument("--threshold", type=float, help="Override quality threshold")
    args = parser.parse_args()

    # Compare against None so an explicit "--threshold 0" is still honored.
    if args.threshold is not None:
        weights_file = DATA_DIR / "quality-scorer-weights.json"
        if weights_file.exists():
            try:
                with open(weights_file) as f:
                    weights_data = json.load(f)
                weights_data["threshold"] = args.threshold
                with open(weights_file, 'w') as f:
                    json.dump(weights_data, f, indent=2)
                print(f"🎯 Set threshold to {args.threshold}")
            except Exception as e:
                print(f"⚠ Could not update threshold: {e}")
        else:
            # The override is persisted through the weights file, so without
            # one it would otherwise be dropped silently.
            print(f"⚠ Weights file not found, threshold override ignored: {weights_file}")

    filtered_data = run_quality_gate(
        input_file=args.input,
        conservative_mode=args.conservative,
        verbose=args.verbose
    )

    if filtered_data:
        filtered_count = filtered_data.get("filtered_draft_count", 0)
        if filtered_count > 0:
            print("\n📤 Next: Pass filtered drafts to your publishing pipeline")
        else:
            print("\n⚠️ No drafts to publish. Consider:")
            print(" • Lowering threshold: --threshold 50")
            print(" • Conservative mode: --conservative")
            print(" • Improving content quality in transform step")


if __name__ == "__main__":
    main()
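The gate script above matches each draft's `id` against the `draft_id` in the scorer results, then either drops failing drafts or, in conservative mode, keeps them with quality flags attached. Below is a minimal standalone sketch of that matching. The field names follow what the script reads, but the sample drafts, the scores, and the `gate` helper are hypothetical and were not produced by the real scorer.

```python
# Hypothetical sample data; the scores are invented for illustration.
drafts_payload = {
    "drafts": [
        {"id": "d1", "platform": "x",
         "draft": "I built 3 agents in 2 days. What's your take?"},
        {"id": "d2", "platform": "linkedin",
         "draft": "We are pleased to announce a robust ecosystem."},
    ]
}
scorer_output = {
    "threshold": 60,
    "results": [
        {"draft_id": "d1", "passed": True, "total_score": 72.5,
         "failure_reasons": []},
        {"draft_id": "d2", "passed": False, "total_score": 31.0,
         "failure_reasons": ["Contains AI slop"]},
    ],
}


def gate(drafts, results, conservative=False):
    """Keep passing drafts; in conservative mode keep all, with flags attached."""
    by_id = {result["draft_id"]: result for result in results}
    kept = []
    for draft in drafts:
        info = by_id.get(draft["id"])
        flagged = dict(draft)  # copy so the input drafts are not mutated
        if info is not None:
            flagged["quality_score"] = info["total_score"]
            flagged["quality_passed"] = info["passed"]
        if conservative or (info is not None and info["passed"]):
            kept.append(flagged)
    return kept


strict = gate(drafts_payload["drafts"], scorer_output["results"])
lenient = gate(drafts_payload["drafts"], scorer_output["results"], conservative=True)
print([d["id"] for d in strict], [d["id"] for d in lenient])
```

Unlike the script, this sketch copies each draft before annotating it, which keeps the input payload unchanged for callers that reuse it.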
525
content-ops/scripts/content-quality-scorer.py
Normal file
@@ -0,0 +1,525 @@
#!/usr/bin/env python3
"""
Content Quality Scorer — Automated content scoring engine.

Scores drafts against configurable voice patterns BEFORE they publish.
Five scoring dimensions: voice similarity, specificity, AI slop detection,
length appropriateness, and engagement potential.

Input: JSON file with drafts array
Output: scored drafts with pass/fail recommendations

Usage:
    python content-quality-scorer.py --input drafts.json --verbose
    python content-quality-scorer.py --input drafts.json --threshold 75
    python content-quality-scorer.py --init-weights  # Create default weights file
"""

import json
import re
import os
import sys
import argparse
from pathlib import Path
from datetime import datetime, timezone
from collections import Counter
import math

# ── Configuration (all paths relative/configurable) ──

SCRIPT_DIR = Path(__file__).resolve().parent
PROJECT_DIR = SCRIPT_DIR.parent
DATA_DIR = Path(os.environ.get("CONTENT_OPS_DATA_DIR", PROJECT_DIR / "data"))

DRAFTS_FILE = DATA_DIR / "content-drafts-latest.json"
WEIGHTS_FILE = DATA_DIR / "quality-scorer-weights.json"
LOG_FILE = DATA_DIR / "quality-scores-log.json"

# Default scoring threshold (adjustable)
DEFAULT_THRESHOLD = 60

# Platform character limits
PLATFORM_LIMITS = {
    "x": {"min": 50, "max": 280, "optimal_min": 150, "optimal_max": 260},
    "linkedin": {"min": 200, "max": 1500, "optimal_min": 500, "optimal_max": 1200},
    "youtube_short": {"min": 100, "max": 800, "optimal_min": 200, "optimal_max": 600},
    "newsletter": {"min": 300, "max": 2000, "optimal_min": 800, "optimal_max": 1600},
}

# Banned AI words — penalized in scoring
BANNED_WORDS = [
    "leverage", "synergy", "ecosystem", "holistic", "at the end of the day",
    "delve", "tapestry", "landscape", "multifaceted", "nuanced", "pivotal",
    "realm", "robust", "seamless", "testament", "transformative", "underscore",
    "utilize", "whilst", "keen", "embark", "comprehensive", "intricate",
    "commendable", "meticulous", "paramount", "groundbreaking", "innovative",
    "cutting-edge", "paradigm", "Additionally", "crucial", "enduring",
    "enhance", "fostering", "garner", "highlight", "interplay", "intricacies",
    "showcase", "vibrant", "valuable", "profound", "renowned", "breathtaking",
    "nestled", "stunning", "I'm excited to share", "I think maybe",
    "It could potentially", "dive into", "game-changer", "unlock"
]

# AI patterns to detect
AI_PATTERNS = [
    (r"pivotal moment|is a testament|stands as", "significance_inflation"),
    (r"boasts|vibrant|commitment to", "promotional_language"),
    (r"experts believe|industry reports|studies show", "vague_attribution"),
    (r"despite.{1,50}continues to", "formulaic_structure"),
    (r"serves as|acts as|functions as", "copula_avoidance"),
    (r"it's not just .{1,30}, it's", "negative_parallelism"),
    (r"could potentially|might possibly|may perhaps", "excessive_hedging"),
    (r"the future looks bright|exciting times ahead|stay tuned", "generic_conclusion"),
]

# Voice markers — configurable positive signals for your brand voice
# Override these by setting VOICE_MARKERS_FILE env var pointing to a JSON file
VOICE_MARKERS = [
    # Numbers with specificity
    (r'\$[\d,]+[KkMmBb]?(?:\+)?', 2.0, "revenue_markers"),
    (r'\d+%', 1.5, "percentage_stats"),
    (r'\d+x', 1.5, "multiplier_stats"),
    (r'\d+ (?:hours?|minutes?|days?|weeks?|months?|years?)', 1.0, "time_specifics"),
    (r'\d+ (?:pages?|pieces?|tools?|agents?|companies|founders?|members)', 1.0, "count_specifics"),
    # Personal framing
    (r'I (?:built|found|asked|remember|had lunch)', 2.0, "personal_framing"),
    (r'Here\'s what happened|A friend who|I asked \d+', 1.5, "story_framing"),
    # Contrarian hooks
    (r'Most people .{1,50} wrong|Everyone says .{1,30} That\'s', 2.0, "contrarian_hooks"),
    (r'Harsh reality:', 1.5, "harsh_reality"),
    # Engagement patterns
    (r'What\'s your take\?|What did I miss\?|What would you do', 1.0, "engagement_cta"),
    # Short sentences (under 15 words)
    (r'[.!?]\s+[A-Z][^.!?]{1,75}[.!?]', 0.5, "short_sentences"),
]

# Default scoring weights
DEFAULT_WEIGHTS = {
    "voice_similarity": 0.35,
    "specificity": 0.25,
    "slop_penalty": 0.20,
    "length_appropriateness": 0.10,
    "engagement_potential": 0.10,
}


def load_weights():
    """Load scoring weights from file or return defaults."""
    if WEIGHTS_FILE.exists():
        try:
            with open(WEIGHTS_FILE) as f:
                data = json.load(f)
            # Merge over the defaults so a weights file that omits a
            # dimension cannot cause a KeyError in score_draft().
            weights = {**DEFAULT_WEIGHTS, **data.get("weights", {})}
            threshold = data.get("threshold", DEFAULT_THRESHOLD)
            return weights, threshold
        except Exception as e:
            print(f"⚠ Error loading weights: {e}, using defaults")
    return DEFAULT_WEIGHTS, DEFAULT_THRESHOLD


def save_weights(weights, threshold):
    """Save scoring weights and threshold to file."""
    data = {
        "weights": weights,
        "threshold": threshold,
        "updated_at": datetime.now(timezone.utc).isoformat(),
        "version": "1.0"
    }
    WEIGHTS_FILE.parent.mkdir(parents=True, exist_ok=True)
    with open(WEIGHTS_FILE, 'w') as f:
        json.dump(data, f, indent=2)


def log_score(draft_id, platform, scores, passed, reasons):
    """Log scoring results for analysis."""
    log_entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "draft_id": draft_id,
        "platform": platform,
        "scores": scores,
        # NOTE: unweighted sum of the five dimension scores (0-500),
        # not the weighted 0-100 total used for the pass/fail decision.
        "total_score": sum(scores.values()),
        "passed": passed,
        "failure_reasons": reasons,
    }

    log_data = []
    if LOG_FILE.exists():
        try:
            with open(LOG_FILE) as f:
                log_data = json.load(f)
        except Exception:
            log_data = []

    log_data.append(log_entry)

    # Keep only last 1000 entries
    if len(log_data) > 1000:
        log_data = log_data[-1000:]

    LOG_FILE.parent.mkdir(parents=True, exist_ok=True)
    with open(LOG_FILE, 'w') as f:
        json.dump(log_data, f, indent=2)


def score_voice_similarity(draft_text):
    """Score how well draft matches voice patterns (0-100)."""
    score = 0
    matches = {}

    for pattern, weight, category in VOICE_MARKERS:
        pattern_matches = re.findall(pattern, draft_text, re.IGNORECASE)
        if pattern_matches:
            match_count = len(pattern_matches)
            # Log-scaled so repeated matches give diminishing returns,
            # capped at weight * 25 per category.
            category_score = min(weight * math.log(match_count + 1) * 10, weight * 25)
            score += category_score
            matches[category] = matches.get(category, 0) + match_count

    # Bonus for short punchy sentences
    # Drop empty fragments (e.g. after the final period) so they do not
    # dilute the short-sentence ratio.
    sentences = [s for s in re.split(r'[.!?]+', draft_text) if s.strip()]
    short_sentences = [s for s in sentences if 3 <= len(s.split()) <= 15]
    sentence_ratio = len(short_sentences) / max(len(sentences), 1)
    score += sentence_ratio * 15

    return min(score, 100), matches


def score_specificity(draft_text):
    """Score specificity — real numbers, examples, named entities (0-100)."""
    score = 0

    number_patterns = [
        r'\$[\d,]+[KkMmBb]?(?:\+)?',
        r'\d+%',
        r'\d+x',
        r'\d+[\.,]?\d*\s*(?:hours?|minutes?|days?|weeks?|months?|years?)',
        r'\d+\s*(?:pages?|pieces?|tools?|agents?|companies|founders?|members)',
    ]

    total_numbers = 0
    for pattern in number_patterns:
        matches = re.findall(pattern, draft_text, re.IGNORECASE)
        total_numbers += len(matches)

    word_count = len(draft_text.split())
    number_density = total_numbers / max(word_count / 50, 1)
    score += min(number_density * 30, 50)

    # Named entities and specific examples
    entity_patterns = [
        r'[A-Z][a-z]+ [A-Z][a-z]+(?:\s[A-Z][a-z]+)*',
        r'@[A-Za-z0-9_]+',
        r'(?:Apple|Google|Meta|Microsoft|Amazon|Tesla|ChatGPT|Claude|OpenAI)',
    ]

    entity_count = 0
    for pattern in entity_patterns:
        matches = re.findall(pattern, draft_text)
        entity_count += len(matches)

    score += min(entity_count * 10, 30)

    # Before/after comparisons
    comparison_patterns = [
        r'\d+.*→.*\d+',
        r'from \d+.*to \d+',
        r'before.*\d+.*after.*\d+',
        r'used to.*now.*'
    ]

    for pattern in comparison_patterns:
        if re.search(pattern, draft_text, re.IGNORECASE):
            score += 10
            break

    return min(score, 100)


def score_slop_penalty(draft_text):
    """Detect and penalize AI slop and banned phrases (0-100, higher = less slop)."""
    score = 100
    detected_issues = []

    text_lower = draft_text.lower()

    banned_found = []
    for word in BANNED_WORDS:
        if word.lower() in text_lower:
            banned_found.append(word)
            score -= 10

    if banned_found:
        detected_issues.append(f"Banned words: {', '.join(banned_found[:3])}")

    ai_patterns_found = []
    for pattern, pattern_name in AI_PATTERNS:
        matches = re.findall(pattern, draft_text, re.IGNORECASE)
        if matches:
            ai_patterns_found.append(pattern_name)
            score -= 8

    if ai_patterns_found:
        detected_issues.append(f"AI patterns: {', '.join(ai_patterns_found[:3])}")

    # Em dash overuse
    em_dash_count = draft_text.count('—')
    word_count = len(draft_text.split())
    if em_dash_count > word_count / 200:
        score -= 5
        detected_issues.append("Excessive em dash usage")

    # Corporate speak
    corporate_patterns = [
        r'I\'m excited to share',
        r'it is important to note',
        r'in order to',
        r'we are pleased to announce',
        r'stay tuned for',
    ]

    for pattern in corporate_patterns:
        if re.search(pattern, draft_text, re.IGNORECASE):
            score -= 15
            detected_issues.append("Corporate speak detected")
            break

    return max(score, 0), detected_issues


def score_length_appropriateness(draft_text, platform):
    """Score if content length is appropriate for platform (0-100)."""
    char_count = len(draft_text)
    limits = PLATFORM_LIMITS.get(platform, PLATFORM_LIMITS["x"])

    if char_count < limits["min"]:
        shortfall_ratio = char_count / limits["min"]
        return max(shortfall_ratio * 100, 20)
    elif char_count > limits["max"]:
        excess_ratio = limits["max"] / char_count
        return max(excess_ratio * 100, 30)
    elif limits["optimal_min"] <= char_count <= limits["optimal_max"]:
        return 100
    else:
        return 85


def score_engagement_potential(draft_text, platform):
    """Score engagement potential based on CTAs and hooks (0-100)."""
    score = 0

    cta_patterns = {
        "x": [r'What\'s your take\?', r'What did I miss\?', r'Reply with'],
        "linkedin": [r'What would you do', r'What do you think', r'Drop .* below', r'curious.*your'],
        "youtube_short": [r'Comment.*and I\'ll', r'Follow for more'],
        "newsletter": [r'subscribe', r'read more', r'check it out'],
    }

    platform_ctas = cta_patterns.get(platform, cta_patterns["x"])
    for pattern in platform_ctas:
        if re.search(pattern, draft_text, re.IGNORECASE):
            score += 25
            break

    # Strong hooks (first 100 characters)
    hook = draft_text[:100]
    hook_patterns = [
        r'^\d+.*\.',
        r'^Most people.*wrong',
        r'^I (?:built|found|asked)',
        r'^Harsh reality:',
        r'^Here\'s what',
    ]

    for pattern in hook_patterns:
        if re.search(pattern, hook, re.IGNORECASE):
            score += 25
            break

    # Question-based engagement
    question_count = len(re.findall(r'\?', draft_text))
    if question_count >= 1:
        score += min(question_count * 15, 30)

    # Debate invitation
    debate_patterns = [
        r'Agree or disagree',
        r'What\'s your experience',
        r'Change my mind',
    ]

    for pattern in debate_patterns:
        if re.search(pattern, draft_text, re.IGNORECASE):
            score += 20
            break

    return min(score, 100)


def score_draft(draft, weights, threshold):
    """Score a single draft against all criteria."""
    platform = draft.get("platform", "x")
    draft_text = draft.get("draft", "")

    voice_score, voice_matches = score_voice_similarity(draft_text)
    specificity_score = score_specificity(draft_text)
    slop_score, slop_issues = score_slop_penalty(draft_text)
    length_score = score_length_appropriateness(draft_text, platform)
    engagement_score = score_engagement_potential(draft_text, platform)

    scores = {
        "voice_similarity": voice_score,
        "specificity": specificity_score,
        "slop_penalty": slop_score,
        "length_appropriateness": length_score,
        "engagement_potential": engagement_score,
    }

    total_score = sum(scores[key] * weights[key] for key in scores.keys())
    total_score = round(total_score, 1)

    passed = total_score >= threshold

    failure_reasons = []
    if voice_score < 50:
        failure_reasons.append("Low voice match - lacks brand voice patterns")
    if specificity_score < 40:
        failure_reasons.append("Not specific enough - needs real numbers/examples")
    if slop_score < 70:
        failure_reasons.append("Contains AI slop - " + "; ".join(slop_issues))
    if length_score < 60:
        failure_reasons.append(f"Length issue for {platform}")
    if engagement_score < 40:
        failure_reasons.append("Weak engagement - needs better CTA/hook")

    result = {
        "draft_id": draft.get("id"),
        "platform": platform,
        "total_score": total_score,
        "scores": scores,
        "passed": passed,
        "failure_reasons": failure_reasons,
        "voice_matches": voice_matches,
        "slop_issues": slop_issues,
        "char_count": len(draft_text),
        "scored_at": datetime.now(timezone.utc).isoformat(),
    }

    log_score(draft.get("id"), platform, scores, passed, failure_reasons)
    return result


def score_drafts_file(file_path=None, output_path=None, threshold_override=None, verbose=False):
|
||||||
|
"""Score all drafts in a file."""
|
||||||
|
input_file = Path(file_path) if file_path else DRAFTS_FILE
|
||||||
|
|
||||||
|
if not input_file.exists():
|
||||||
|
print(f"❌ Input file not found: {input_file}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
with open(input_file) as f:
|
||||||
|
data = json.load(f)
|
||||||
|
|
||||||
|
drafts = data.get("drafts", [])
|
||||||
|
if not drafts:
|
||||||
|
print("❌ No drafts found in input file")
|
||||||
|
return None
|
||||||
|
|
||||||
|
weights, threshold = load_weights()
|
||||||
|
if threshold_override is not None:
|
||||||
|
threshold = threshold_override
|
||||||
|
print(f"📊 Using threshold override: {threshold}")
|
||||||
|
|
||||||
|
print(f"📊 Scoring {len(drafts)} drafts with threshold {threshold}")
|
||||||
|
if verbose:
|
||||||
|
print(f"📊 Weights: {weights}")
|
||||||
|
|
||||||
|
results = []
|
||||||
|
passed_count = 0
|
||||||
|
|
||||||
|
for i, draft in enumerate(drafts):
|
||||||
|
result = score_draft(draft, weights, threshold)
|
||||||
|
results.append(result)
|
||||||
|
|
||||||
|
if result["passed"]:
|
||||||
|
passed_count += 1
|
||||||
|
|
||||||
|
if verbose:
|
||||||
|
print(f"\n[{i+1}/{len(drafts)}] {result['platform']} | Score: {result['total_score']}/100")
|
||||||
|
if result["passed"]:
|
||||||
|
print(f" ✅ PASS")
|
||||||
|
else:
|
||||||
|
print(f" ❌ FAIL: {'; '.join(result['failure_reasons'])}")
|
||||||
|
|
||||||
|
total_scores = [r["total_score"] for r in results]
|
||||||
|
avg_score = sum(total_scores) / len(total_scores)
|
||||||
|
pass_rate = (passed_count / len(results)) * 100
|
||||||
|
|
||||||
|
summary = {
|
||||||
|
"scored_at": datetime.now(timezone.utc).isoformat(),
|
||||||
|
"total_drafts": len(drafts),
|
||||||
|
"passed_count": passed_count,
|
||||||
|
"pass_rate": round(pass_rate, 1),
|
||||||
|
"average_score": round(avg_score, 1),
|
||||||
|
"threshold": threshold,
|
||||||
|
"weights": weights,
|
||||||
|
"results": results,
|
||||||
|
}
|
||||||
|
|
||||||
|
if output_path:
|
||||||
|
output_file = Path(output_path)
|
||||||
|
else:
|
||||||
|
timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
|
||||||
|
output_file = DATA_DIR / f"quality-scores-{timestamp}.json"
|
||||||
|
|
||||||
|
output_file.parent.mkdir(parents=True, exist_ok=True)
|
||||||
|
with open(output_file, 'w') as f:
|
||||||
|
json.dump(summary, f, indent=2)
|
||||||
|
|
||||||
|
latest_file = DATA_DIR / "quality-scores-latest.json"
|
||||||
|
with open(latest_file, 'w') as f:
|
||||||
|
json.dump(summary, f, indent=2)
|
||||||
|
|
||||||
|
print(f"\n{'='*60}")
|
||||||
|
print(f"QUALITY SCORING RESULTS")
|
||||||
|
print(f"{'='*60}")
|
||||||
|
print(f"Total drafts: {len(drafts)}")
|
||||||
|
print(f"Passed: {passed_count} ({pass_rate:.1f}%)")
|
||||||
|
print(f"Failed: {len(drafts) - passed_count}")
|
||||||
|
print(f"Average score: {avg_score:.1f}/100")
|
||||||
|
print(f"Threshold: {threshold}/100")
|
||||||
|
print(f"\nSaved to: {output_file}")
|
||||||
|
print(f"Saved to: {latest_file}")
|
||||||
|
|
||||||
|
if verbose:
|
||||||
|
print(f"\n🏆 TOP SCORING DRAFTS:")
|
||||||
|
top_drafts = sorted(results, key=lambda x: x["total_score"], reverse=True)[:3]
|
||||||
|
for i, result in enumerate(top_drafts):
|
||||||
|
status = "✅ PASS" if result["passed"] else "❌ FAIL"
|
||||||
|
print(f" {i+1}. {result['platform']} | {result['total_score']}/100 | {status}")
|
||||||
|
|
||||||
|
return summary
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
parser = argparse.ArgumentParser(description="Score content drafts for quality")
|
||||||
|
parser.add_argument("--input", type=str, help="Input drafts JSON file")
|
||||||
|
parser.add_argument("--output", type=str, help="Output scores JSON file")
|
||||||
|
parser.add_argument("--threshold", type=float, help="Scoring threshold override")
|
||||||
|
parser.add_argument("--verbose", "-v", action="store_true", help="Verbose output")
|
||||||
|
parser.add_argument("--init-weights", action="store_true", help="Initialize default weights file")
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
if args.init_weights:
|
||||||
|
save_weights(DEFAULT_WEIGHTS, DEFAULT_THRESHOLD)
|
||||||
|
print(f"✅ Initialized weights file: {WEIGHTS_FILE}")
|
||||||
|
return
|
||||||
|
|
||||||
|
score_drafts_file(
|
||||||
|
file_path=args.input,
|
||||||
|
output_path=args.output,
|
||||||
|
threshold_override=args.threshold,
|
||||||
|
verbose=args.verbose
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
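The pass/fail rule applied by `score_draft` above is a weighted sum of the five component scores compared against a threshold. A minimal sketch of that rule follows. The weight values, the threshold of 70, and the helper names `weighted_total` and `passes` are illustrative assumptions; the real weights and threshold come from `load_weights()`, which is not shown in this part of the diff.

```python
# Hypothetical weights and threshold, for illustration only.
# The actual values are loaded by load_weights() in the scorer script.
EXAMPLE_WEIGHTS = {
    "voice_similarity": 0.30,
    "specificity": 0.25,
    "slop_penalty": 0.20,
    "length_appropriateness": 0.10,
    "engagement_potential": 0.15,
}
EXAMPLE_THRESHOLD = 70.0


def weighted_total(scores, weights):
    """Return the weighted sum of component scores, rounded to one decimal."""
    missing = set(scores) - set(weights)
    if missing:
        # Fail loudly instead of silently dropping a component.
        raise KeyError(f"No weight configured for: {sorted(missing)}")
    return round(sum(scores[key] * weights[key] for key in scores), 1)


def passes(scores, weights, threshold):
    """A draft passes when its weighted total reaches the threshold."""
    return weighted_total(scores, weights) >= threshold


if __name__ == "__main__":
    example_scores = {
        "voice_similarity": 80,
        "specificity": 80,
        "slop_penalty": 80,
        "length_appropriateness": 80,
        "engagement_potential": 80,
    }
    total = weighted_total(example_scores, EXAMPLE_WEIGHTS)
    print(total, passes(example_scores, EXAMPLE_WEIGHTS, EXAMPLE_THRESHOLD))
```

Because the weights in this sketch sum to 1.0, the total stays on the same 0-100 scale as the component scores, which is what makes a single threshold meaningful.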
745
content-ops/scripts/content-transform.py
Normal file
|
|
@ -0,0 +1,745 @@
|
||||||
|
#!/usr/bin/env python3
"""
Content Transform: repurpose long-form content into platform-native drafts.

Reads content atoms and generates platform-native drafts using the Claude API,
with an optional expert panel quality gate. Supports X threads/posts, LinkedIn,
YouTube Shorts, and newsletter formats.

LLM mode is the default. Use --template-only for fast template-based drafts (no API needed).

Usage:
    python content-transform.py --atoms atoms.json --top-n 10
    python content-transform.py --atoms atoms.json --template-only
    python content-transform.py --atoms atoms.json --no-expert-panel
"""
|
||||||
|
|
||||||
|
import argparse
import json
import os
import re
import sys
import textwrap
import uuid
from datetime import datetime, timezone
from pathlib import Path
|
||||||
|
|
||||||
|
# ── Configuration ──

SCRIPT_DIR = Path(__file__).resolve().parent
PROJECT_DIR = SCRIPT_DIR.parent
DATA_DIR = Path(os.environ.get("CONTENT_OPS_DATA_DIR", PROJECT_DIR / "data"))
SKILL_DIR = PROJECT_DIR

ATOMS_FILE = DATA_DIR / "content-atoms-latest.json"

# Voice configuration files (optional, for LLM mode)
VOICE_CONFIG_FILE = os.environ.get("VOICE_CONFIG_FILE", str(PROJECT_DIR / "config" / "voice.md"))
STYLE_GUIDE_FILE = os.environ.get("STYLE_GUIDE_FILE", str(PROJECT_DIR / "config" / "style-guide.md"))
|
||||||
|
|
||||||
|
# Formats available for each platform key.
PLATFORM_MAP = {
    "x": ["x_thread", "x_post"],
    "linkedin": ["linkedin_post"],
    "short_form": ["youtube_short_script"],
    "newsletter": ["newsletter_section"],
    "youtube_short": ["youtube_short_script"],
}

# platforms_missing key -> the format generated for it.
MISSING_TO_FORMAT = {
    "x": "x_thread",
    "linkedin": "linkedin_post",
    "short_form": "youtube_short_script",
    "newsletter": "newsletter_section",
    "youtube_short": "youtube_short_script",
}

# platforms_missing key -> canonical platform name stored on the draft.
MISSING_TO_PLATFORM = {
    "x": "x",
    "linkedin": "linkedin",
    "short_form": "youtube_short",
    "newsletter": "newsletter",
    "youtube_short": "youtube_short",
}

# Platform name -> expert panel file under SKILL_DIR / "experts".
PLATFORM_TO_EXPERT = {
    "x": "x-articles.md",
    "linkedin": "linkedin.md",
    "youtube_short": "youtube-shorts.md",
    "newsletter": "newsletter.md",
}

EXPERT_PANEL_THRESHOLD = 95
EXPERT_PANEL_MAX_ITERATIONS = 3
|
||||||
|
|
||||||
|
|
||||||
|
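def load_atoms(path=None):
    """Load atoms from a JSON file (defaults to ATOMS_FILE).

    Accepts a bare list of atoms or a dict that wraps them under an "atoms" key.
    """
    p = Path(path) if path else ATOMS_FILE
    with open(p, encoding="utf-8") as f:
        data = json.load(f)
    return data.get("atoms", data) if isinstance(data, dict) else data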
|
||||||
|
|
||||||
|
|
||||||
|
def rank_atoms(atoms, top_n=10):
    """Sort by repurpose_score * max(len(platforms_missing), 1), take top N.

    Stores the computed rank on each atom under "_rank", so the input is mutated.
    """
    for a in atoms:
        a["_rank"] = a.get("repurpose_score", 0) * max(len(a.get("platforms_missing", [])), 1)
    ranked = sorted(atoms, key=lambda x: x["_rank"], reverse=True)
    return ranked[:top_n]
|
||||||
|
|
||||||
|
|
||||||
|
def clean_content(content):
    """Strip "name · @handle · ..." attribution lines and collapse runs of blank lines."""
    content = re.sub(r'^[\w]+\s*·\s*@[\w]+\s*·.*$', '', content, flags=re.MULTILINE)
    content = re.sub(r'\n{3,}', '\n\n', content)
    return content.strip()
|
||||||
|
|
||||||
|
|
||||||
|
def extract_hook(content, max_chars=200):
    """Return the first sentence or line if it ends within max_chars, else a hard cut."""
    content = clean_content(content)
    for sep in [". ", ".\n", "\n"]:
        idx = content.find(sep)
        if 0 < idx < max_chars:
            return content[:idx + 1].strip()
    return content[:max_chars].strip()
|
||||||
|
|
||||||
|
|
||||||
|
def extract_key_points(content, max_points=6):
    """Collect bullet or numbered items, plus medium-length plain lines, as key points."""
    lines = content.split("\n")
    points = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(("•", "-", "→", "*")) or re.match(r"^\d+[\.\)]", line):
            # Drop the list marker, then keep only items with some substance.
            cleaned = re.sub(r"^[•\-→\*\d+\.\)]+\s*", "", line).strip()
            if len(cleaned) > 15:
                points.append(cleaned)
        elif 20 < len(line) < 280:
            points.append(line)
    # Fall back to the opening of the content when nothing qualifies.
    return points[:max_points] if points else [content[:200]]
|
||||||
|
|
||||||
|
|
||||||
|
def extract_numbers(content):
    """Return up to five numeric claims: money, percentages, multipliers, durations, counts."""
    patterns = [
        r'\$[\d,]+[KkMmBb]?(?:\+)?',
        r'\d+%',
        r'\d+x',
        r'\d+[\.,]?\d*\s*(?:hours?|minutes?|days?|weeks?|months?|years?)',
        r'\d+\s*(?:pages?|pieces?|tools?|agents?|companies|founders?|members)',
    ]
    numbers = []
    # Matches are grouped by pattern order, not by position in the text.
    for p in patterns:
        numbers.extend(re.findall(p, content, re.IGNORECASE))
    return numbers[:5]
|
||||||
|
|
||||||
|
|
||||||
|
def shorten_sentence(s, max_words=15):
    """Truncate s to max_words words; truncated text gets a trailing period."""
    words = s.split()
    if len(words) <= max_words:
        return s
    return " ".join(words[:max_words]) + "."
|
||||||
|
|
||||||
|
|
||||||
|
def make_punchy(text, max_words=15):
    """Split text into sentences; break long ones at commas, semicolons, and dashes."""
    sentences = re.split(r'(?<=[.!?])\s+', text)
    result = []
    for s in sentences:
        s = s.strip()
        if not s:
            continue
        if len(s.split()) > max_words:
            parts = re.split(r'[,;—]', s)
            for p in parts:
                p = p.strip()
                if p:
                    # Pass max_words through so a non-default limit is honored.
                    result.append(shorten_sentence(p, max_words) if not p.endswith(('.', '!', '?')) else p)
        else:
            result.append(s)
    return result
|
||||||
|
|
||||||
|
|
||||||
|
# ── TEMPLATE GENERATORS (used with --template-only) ──
|
||||||
|
|
||||||
|
def generate_x_thread(atom):
|
||||||
|
content = clean_content(atom["content"])
|
||||||
|
hook_text = extract_hook(content, 200)
|
||||||
|
points = extract_key_points(content)
|
||||||
|
numbers = extract_numbers(content)
|
||||||
|
tags = atom.get("tags", [])
|
||||||
|
|
||||||
|
atom_type = atom.get("atom_type", "")
|
||||||
|
if "data" in atom_type or numbers:
|
||||||
|
tweet1 = f"{hook_text}\n\nThe numbers tell a different story. 🧵"
|
||||||
|
elif "story" in atom_type or "anecdote" in atom_type:
|
||||||
|
tweet1 = f"{hook_text}\n\nHere's what happened next. 🧵"
|
||||||
|
else:
|
||||||
|
tweet1 = f"Most people get this wrong about {tags[0] if tags else 'this'}.\n\n{hook_text}"
|
||||||
|
|
||||||
|
if len(tweet1) > 280:
|
||||||
|
tweet1 = tweet1[:277] + "..."
|
||||||
|
|
||||||
|
tweets = [tweet1]
|
||||||
|
for i, point in enumerate(points[:5]):
|
||||||
|
point_short = shorten_sentence(point, 15)
|
||||||
|
if numbers and i < len(numbers):
|
||||||
|
tweet = f"{point_short}\n\n{numbers[i]} — that's the real number."
|
||||||
|
else:
|
||||||
|
tweet = point_short
|
||||||
|
if len(tweet) > 280:
|
||||||
|
tweet = tweet[:277] + "..."
|
||||||
|
tweets.append(tweet)
|
||||||
|
|
||||||
|
ctas = [
|
||||||
|
"What's your take? Reply with what you'd add.",
|
||||||
|
"What did I miss? Drop your thoughts below.",
|
||||||
|
"Agree or disagree? I want to hear your take.",
|
||||||
|
]
|
||||||
|
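# Note: by default Python salts str hashes per process, so the CTA chosen for an atom can vary between runs.
tweets.append(ctas[hash(atom["id"]) % len(ctas)])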
|
||||||
|
while len(tweets) < 5:
|
||||||
|
tweets.insert(-1, "The gap is only getting wider. Those who move now win.")
|
||||||
|
|
||||||
|
thread = "\n\n---\n\n".join([f"🧵 {i+1}/{len(tweets)}\n{t}" for i, t in enumerate(tweets)])
|
||||||
|
return thread, tweets[0]
|
||||||
|
|
||||||
|
|
||||||
|
def generate_x_post(atom):
|
||||||
|
content = clean_content(atom["content"])
|
||||||
|
hook = extract_hook(content, 180)
|
||||||
|
numbers = extract_numbers(content)
|
||||||
|
num_str = f"\n\n{numbers[0]}." if numbers else ""
|
||||||
|
post = f"{hook}{num_str}\n\nWhat's your take?"
|
||||||
|
if len(post) > 280:
|
||||||
|
post = post[:277] + "..."
|
||||||
|
return post, hook
|
||||||
|
|
||||||
|
|
||||||
|
def generate_linkedin_post(atom):
|
||||||
|
content = clean_content(atom["content"])
|
||||||
|
hook = extract_hook(content, 150)
|
||||||
|
points = extract_key_points(content)
|
||||||
|
numbers = extract_numbers(content)
|
||||||
|
|
||||||
|
hook_section = f"{hook}\n\nHere's what I learned."
|
||||||
|
punchy = make_punchy(content)
|
||||||
|
story = "\n\n".join(punchy[:6])
|
||||||
|
|
||||||
|
point_section = "\n".join([f"→ {p}" for p in points[:4]]) if len(points) > 2 else ""
|
||||||
|
data_section = f"\nThe data: {', '.join(numbers[:3])}." if numbers else ""
|
||||||
|
|
||||||
|
ctas = [
|
||||||
|
"What would you do differently?",
|
||||||
|
"What's your experience with this?",
|
||||||
|
"Curious — what's your take?",
|
||||||
|
]
|
||||||
|
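# Note: by default Python salts str hashes per process, so the CTA chosen for an atom can vary between runs.
cta = ctas[hash(atom["id"]) % len(ctas)]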
|
||||||
|
|
||||||
|
parts = [hook_section, story]
|
||||||
|
if point_section:
|
||||||
|
parts.append(point_section)
|
||||||
|
if data_section:
|
||||||
|
parts.append(data_section)
|
||||||
|
parts.append(cta)
|
||||||
|
|
||||||
|
post = "\n\n".join(parts)
|
||||||
|
if len(post) > 1500:
|
||||||
|
post = post[:1497] + "..."
|
||||||
|
return post, hook
|
||||||
|
|
||||||
|
|
||||||
|
def generate_youtube_short(atom):
|
||||||
|
content = clean_content(atom["content"])
|
||||||
|
hook = extract_hook(content, 100)
|
||||||
|
points = extract_key_points(content)
|
||||||
|
numbers = extract_numbers(content)
|
||||||
|
tags = atom.get("tags", [])
|
||||||
|
topic = tags[0] if tags else "this"
|
||||||
|
|
||||||
|
hook_line = f"[HOOK] (0:00-0:03)\n[Look directly at camera, energy up]\n\"{hook}\""
|
||||||
|
setup_points = points[:2]
|
||||||
|
setup_text = " ".join([shorten_sentence(p, 12) for p in setup_points])
|
||||||
|
setup_line = f"[SETUP] (0:03-0:13)\n[Cut to B-roll or screen share]\n\"{setup_text}\""
|
||||||
|
payoff_points = points[2:5] if len(points) > 2 else points
|
||||||
|
payoff_items = "\n".join([f" → {shorten_sentence(p, 12)}" for p in payoff_points])
|
||||||
|
num_callout = f"\n[TEXT OVERLAY: {numbers[0]}]" if numbers else ""
|
||||||
|
payoff_line = f"[PAYOFF] (0:13-0:40)\n[Quick cuts between points]{num_callout}\n{payoff_items}"
|
||||||
|
cta_line = f"[CTA] (0:40-0:45)\n[Point at camera]\n\"Comment '{topic.upper()}' and I'll show you exactly how.\"\n[TEXT: Follow for more]"
|
||||||
|
|
||||||
|
script = f"{hook_line}\n\n{setup_line}\n\n{payoff_line}\n\n{cta_line}"
|
||||||
|
return script, hook
|
||||||
|
|
||||||
|
|
||||||
|
def generate_newsletter_section(atom):
|
||||||
|
content = clean_content(atom["content"])
|
||||||
|
hook = extract_hook(content, 150)
|
||||||
|
points = extract_key_points(content)
|
||||||
|
numbers = extract_numbers(content)
|
||||||
|
|
||||||
|
headline = f"**{hook}**"
|
||||||
|
punchy = make_punchy(content)
|
||||||
|
para1 = " ".join(punchy[:4])
|
||||||
|
para2 = " ".join(punchy[4:8]) if len(punchy) > 4 else ""
|
||||||
|
data = f"The numbers: {', '.join(numbers[:3])}." if numbers else ""
|
||||||
|
why = f"> **Why this matters:** {shorten_sentence(points[-1] if points else content[:100], 15)}"
|
||||||
|
|
||||||
|
parts = [headline, para1]
|
||||||
|
if para2:
|
||||||
|
parts.append(para2)
|
||||||
|
if data:
|
||||||
|
parts.append(data)
|
||||||
|
parts.append(why)
|
||||||
|
|
||||||
|
return "\n\n".join([p for p in parts if p.strip()]), hook
|
||||||
|
|
||||||
|
|
||||||
|
FORMAT_GENERATORS = {
|
||||||
|
"x_thread": generate_x_thread,
|
||||||
|
"x_post": generate_x_post,
|
||||||
|
"linkedin_post": generate_linkedin_post,
|
||||||
|
"youtube_short_script": generate_youtube_short,
|
||||||
|
"newsletter_section": generate_newsletter_section,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def estimate_engagement(atom, platform):
|
||||||
|
score = atom.get("repurpose_score", 5)
|
||||||
|
if score >= 8:
|
||||||
|
return "high"
|
||||||
|
elif score >= 5:
|
||||||
|
return "medium"
|
||||||
|
return "low"
|
||||||
|
|
||||||
|
|
||||||
|
def generate_drafts_for_atom(atom):
|
||||||
|
drafts = []
|
||||||
|
missing = atom.get("platforms_missing", [])
|
||||||
|
for platform_key in missing:
|
||||||
|
fmt = MISSING_TO_FORMAT.get(platform_key)
|
||||||
|
platform = MISSING_TO_PLATFORM.get(platform_key)
|
||||||
|
if not fmt or fmt not in FORMAT_GENERATORS:
|
||||||
|
continue
|
||||||
|
generator = FORMAT_GENERATORS[fmt]
|
||||||
|
draft_text, hook = generator(atom)
|
||||||
|
draft = {
|
||||||
|
"id": str(uuid.uuid4()),
|
||||||
|
"atom_id": atom["id"],
|
||||||
|
"atom_content": atom["content"][:500],
|
||||||
|
"atom_source": atom.get("source", "unknown"),
|
||||||
|
"platform": platform,
|
||||||
|
"format": fmt,
|
||||||
|
"draft": draft_text,
|
||||||
|
"hook": hook[:200],
|
||||||
|
"char_count": len(draft_text),
|
||||||
|
"estimated_engagement": estimate_engagement(atom, platform),
|
||||||
|
"created_at": datetime.now(timezone.utc).isoformat(),
|
||||||
|
"status": "draft",
|
||||||
|
"expert_score": None,
|
||||||
|
"iterations": 0,
|
||||||
|
"key_improvements": [],
|
||||||
|
}
|
||||||
|
drafts.append(draft)
|
||||||
|
return drafts
|
||||||
|
|
||||||
|
|
||||||
|
# ── ANTHROPIC API ──
|
||||||
|
|
||||||
|
def get_anthropic_key():
|
||||||
|
"""Get Anthropic API key from environment."""
|
||||||
|
key = os.environ.get("ANTHROPIC_API_KEY")
|
||||||
|
if key:
|
||||||
|
return key
|
||||||
|
print("ERROR: Set ANTHROPIC_API_KEY environment variable")
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def load_file_safe(path):
|
||||||
|
"""Load a text file, return empty string if missing."""
|
||||||
|
try:
|
||||||
|
return Path(path).read_text()
|
||||||
|
except Exception:
|
||||||
|
return ""
|
||||||
|
|
||||||
|
|
||||||
|
def load_expert_panel(platform):
|
||||||
|
"""Load expert panel for a platform."""
|
||||||
|
filename = PLATFORM_TO_EXPERT.get(platform, "x-articles.md")
|
||||||
|
return load_file_safe(SKILL_DIR / "experts" / filename)
|
||||||
|
|
||||||
|
|
||||||
|
def load_scoring_rubric():
|
||||||
|
"""Load content quality scoring rubric."""
|
||||||
|
return load_file_safe(SKILL_DIR / "scoring-rubrics" / "content-quality.md")
|
||||||
|
|
||||||
|
|
||||||
|
def load_voice_references():
|
||||||
|
"""Load voice/style references for content generation."""
|
||||||
|
voice_config = load_file_safe(VOICE_CONFIG_FILE)
|
||||||
|
style_guide = load_file_safe(STYLE_GUIDE_FILE)
|
||||||
|
return voice_config, style_guide
|
||||||
|
|
||||||
|
|
||||||
|
def call_anthropic(client, messages, system=None, model="claude-sonnet-4-20250514", max_tokens=2000):
|
||||||
|
"""Call Anthropic API."""
|
||||||
|
kwargs = {"model": model, "max_tokens": max_tokens, "messages": messages}
|
||||||
|
if system:
|
||||||
|
kwargs["system"] = system
|
||||||
|
|
||||||
|
response = client.messages.create(**kwargs)
|
||||||
|
return response.content[0].text.strip()
|
||||||
|
|
||||||
|
|
||||||
|
def llm_generate_draft(client, atom, platform, fmt, voice_config, style_guide):
|
||||||
|
"""Generate a draft using Claude API."""
|
||||||
|
platform_instructions = {
|
||||||
|
"x": "Write an X article (long-form X post). Include at least one ASCII diagram in a code block. Keep paragraphs to 1-3 sentences. End with a natural CTA.",
|
||||||
|
"linkedin": "Write a LinkedIn post. Hook must work before the 'see more' fold (first 2-3 lines). Use line breaks for readability. Professional but personal. 800-1500 chars.",
|
||||||
|
"youtube_short": "Write a YouTube Short script. Format: [HOOK] (0:00-0:03), [SETUP] (0:03-0:13), [PAYOFF] (0:13-0:40), [CTA] (0:40-0:45). Include visual directions. 30-60 seconds total.",
|
||||||
|
"newsletter": "Write a newsletter section. Subject line + scannable body. Headers, bullets, bold for skimmers. End with 'why this matters'.",
|
||||||
|
}
|
||||||
|
|
||||||
|
system_parts = ["You are a content writer creating platform-native content. Follow the configured voice and style EXACTLY."]
|
||||||
|
|
||||||
|
if voice_config:
|
||||||
|
system_parts.append(f"\nVOICE CONFIGURATION:\n{voice_config}")
|
||||||
|
if style_guide:
|
||||||
|
system_parts.append(f"\nSTYLE GUIDE:\n{style_guide[:2000]}")
|
||||||
|
|
||||||
|
system_parts.append("""
|
||||||
|
RULES:
|
||||||
|
- Short punchy sentences. Max 15 words.
|
||||||
|
- Specific numbers always. Never vague.
|
||||||
|
- Contrarian angles backed by data.
|
||||||
|
- No corporate speak. No "I'm excited to share."
|
||||||
|
- Personal stories and specific examples.
|
||||||
|
- Every sentence earns its place.""")
|
||||||
|
|
||||||
|
system = "\n".join(system_parts)
|
||||||
|
|
||||||
|
topic_tags = atom.get('tags', [])
|
||||||
|
prompt = f"""Create a {platform} draft from this content atom.
|
||||||
|
|
||||||
|
PLATFORM INSTRUCTIONS:
|
||||||
|
{platform_instructions.get(platform, platform_instructions['x'])}
|
||||||
|
|
||||||
|
SOURCE CONTENT:
|
||||||
|
{clean_content(atom['content'])}
|
||||||
|
|
||||||
|
SOURCE: {atom.get('source_title', 'unknown')}
|
||||||
|
TAGS: {', '.join(topic_tags)}
|
||||||
|
|
||||||
|
Write ONLY the draft content. No preamble, no explanation."""
|
||||||
|
|
||||||
|
return call_anthropic(client, [{"role": "user", "content": prompt}], system=system)
|
||||||
|
|
||||||
|
|
||||||
|
def expert_panel_score(client, draft_text, platform, expert_panel, rubric, voice_config):
|
||||||
|
"""Run expert panel scoring. Returns (score, feedback_dict)."""
|
||||||
|
system = f"""You are simulating 10 domain experts reviewing content for quality.
|
||||||
|
|
||||||
|
EXPERT PANEL:
|
||||||
|
{expert_panel}
|
||||||
|
|
||||||
|
SCORING RUBRIC:
|
||||||
|
{rubric}
|
||||||
|
|
||||||
|
VOICE REFERENCE:
|
||||||
|
{voice_config[:1000] if voice_config else 'No specific voice config provided.'}"""
|
||||||
|
|
||||||
|
prompt = f"""Score this {platform} draft. Each of 11 experts scores 0-100 on the rubric criteria.
|
||||||
|
Expert #11 is the AI Writing Detector (Humanizer) — scores how AI-generated the draft sounds.
|
||||||
|
|
||||||
|
BANNED AI VOCABULARY (flag any occurrence):
|
||||||
|
delve, tapestry, landscape (abstract), leverage, multifaceted, nuanced, pivotal, realm, robust, seamless, testament, transformative, underscore (verb), utilize, whilst, keen, embark, comprehensive, intricate, commendable, meticulous, paramount, groundbreaking, innovative, cutting-edge, synergy, holistic, paradigm, ecosystem, Additionally, crucial, enduring, enhance, fostering, garner, highlight (verb), interplay, intricacies, showcase, vibrant, valuable, profound, renowned, breathtaking, nestled, stunning
|
||||||
|
|
||||||
|
AI PATTERNS TO CHECK:
|
||||||
|
- Significance inflation ("pivotal moment", "is a testament", "stands as")
|
||||||
|
- Superficial -ing phrases ("highlighting", "showcasing", "underscoring")
|
||||||
|
- Promotional language ("boasts", "vibrant", "commitment to")
|
||||||
|
- Vague attributions ("Experts believe", "Industry reports")
|
||||||
|
- Formulaic "despite challenges... continues to" structures
|
||||||
|
- Copula avoidance ("serves as" instead of "is")
|
||||||
|
- Negative parallelisms ("It's not just X, it's Y")
|
||||||
|
- Rule-of-three forcing (triple adjectives/clauses)
|
||||||
|
- Em dash overuse (max 1 per 200 words)
|
||||||
|
- Filler phrases ("In order to", "It is important to note")
|
||||||
|
- Excessive hedging ("could potentially")
|
||||||
|
- Generic positive conclusions ("The future looks bright")
|
||||||
|
|
||||||
|
If the Humanizer expert scores below 70, the draft MUST be flagged for revision.
|
||||||
|
|
||||||
|
DRAFT:
|
||||||
|
{draft_text}
|
||||||
|
|
||||||
|
Respond in this EXACT JSON format (no other text):
|
||||||
|
{{
|
||||||
|
"average_score": <number>,
|
||||||
|
"expert_scores": [<11 numbers>],
|
||||||
|
"weaknesses": ["<specific weakness 1>", "<specific weakness 2>", ...],
|
||||||
|
"line_feedback": ["<specific line-by-line fix 1>", "<specific line-by-line fix 2>", ...],
|
||||||
|
"strengths": ["<strength 1>", "<strength 2>"],
|
||||||
|
"ai_patterns_detected": ["<pattern 1>", "<pattern 2>", ...],
|
||||||
|
"humanizer_score": <number>
|
||||||
|
}}
|
||||||
|
|
||||||
|
Be harsh. Score honestly."""
|
||||||
|
|
||||||
|
response = call_anthropic(client, [{"role": "user", "content": prompt}], system=system, max_tokens=1500)
|
||||||
|
|
||||||
|
try:
|
||||||
|
json_match = re.search(r'\{[\s\S]*\}', response)
|
||||||
|
if json_match:
|
||||||
|
result = json.loads(json_match.group())
|
||||||
|
return result.get("average_score", 0), result
|
||||||
|
else:
|
||||||
|
return 0, {"error": "No JSON in response"}
|
||||||
|
except json.JSONDecodeError:
|
||||||
|
return 0, {"error": "Invalid JSON", "raw": response[:500]}
|
||||||
|
|
||||||
|
|
||||||
|
def expert_panel_revise(client, draft_text, platform, feedback, voice_config, style_guide):
|
||||||
|
"""Revise draft based on expert feedback."""
|
||||||
|
system_parts = ["You are revising content based on expert feedback."]
|
||||||
|
if voice_config:
|
||||||
|
system_parts.append(f"\nVOICE CONFIGURATION:\n{voice_config}")
|
||||||
|
system_parts.append("""
|
||||||
|
RULES:
|
||||||
|
- Fix every weakness identified
|
||||||
|
- Keep all strengths
|
||||||
|
- Maintain configured voice exactly
|
||||||
|
- Short punchy sentences, specific numbers, contrarian angles""")
|
||||||
|
|
||||||
|
system = "\n".join(system_parts)
|
||||||
|
|
||||||
|
weaknesses = feedback.get("weaknesses", [])
|
||||||
|
line_fixes = feedback.get("line_feedback", [])
|
||||||
|
ai_patterns = feedback.get("ai_patterns_detected", [])
|
||||||
|
|
||||||
|
ai_section = ""
|
||||||
|
if ai_patterns:
|
||||||
|
ai_section = f"""
|
||||||
|
AI PATTERNS DETECTED (MUST FIX ALL):
|
||||||
|
{chr(10).join(f'- {p}' for p in ai_patterns)}
|
||||||
|
|
||||||
|
BANNED VOCABULARY (replace every occurrence):
|
||||||
|
delve, tapestry, landscape (abstract), leverage, multifaceted, nuanced, pivotal, realm, robust, seamless, testament, transformative, underscore (verb), utilize, whilst, keen, embark, comprehensive, intricate, commendable, meticulous, paramount, groundbreaking, innovative, cutting-edge, synergy, holistic, paradigm, ecosystem, Additionally, crucial, enduring, enhance, fostering, garner, highlight (verb), interplay, intricacies, showcase, vibrant, valuable, profound, renowned, breathtaking, nestled, stunning
|
||||||
|
"""
|
||||||
|
|
||||||
|
prompt = f"""Revise this {platform} draft based on expert feedback.
|
||||||
|
|
||||||
|
CURRENT DRAFT:
|
||||||
|
{draft_text}
|
||||||
|
|
||||||
|
WEAKNESSES TO FIX:
|
||||||
|
{chr(10).join(f'- {w}' for w in weaknesses)}
|
||||||
|
|
||||||
|
SPECIFIC LINE FIXES:
|
||||||
|
{chr(10).join(f'- {f}' for f in line_fixes)}
|
||||||
|
{ai_section}
|
||||||
|
CURRENT SCORE: {feedback.get('average_score', 'unknown')}
|
||||||
|
TARGET SCORE: {EXPERT_PANEL_THRESHOLD}+
|
||||||
|
|
||||||
|
Write ONLY the revised draft. No preamble."""
|
||||||
|
|
||||||
|
return call_anthropic(client, [{"role": "user", "content": prompt}], system=system)
|
||||||
|
|
||||||
|
|
||||||
|
def process_draft_with_expert_panel(client, atom, platform, fmt, voice_config, style_guide):
|
||||||
|
"""Full expert panel pipeline: generate → score → revise loop."""
|
||||||
|
expert_panel = load_expert_panel(platform)
|
||||||
|
rubric = load_scoring_rubric()
|
||||||
|
|
||||||
|
print(f" Generating {platform} draft via Claude...")
|
||||||
|
draft_text = llm_generate_draft(client, atom, platform, fmt, voice_config, style_guide)
|
||||||
|
|
||||||
|
iterations = []
|
||||||
|
best_draft = draft_text
|
||||||
|
best_score = 0
|
||||||
|
|
||||||
|
for iteration in range(1, EXPERT_PANEL_MAX_ITERATIONS + 1):
|
||||||
|
print(f" Expert panel scoring (iteration {iteration})...")
|
||||||
|
score, feedback = expert_panel_score(client, draft_text, platform, expert_panel, rubric, voice_config)
|
||||||
|
print(f" Score: {score}/100")
|
||||||
|
|
||||||
|
iteration_log = {
|
||||||
|
"iteration": iteration,
|
||||||
|
"score": score,
|
||||||
|
"weaknesses": feedback.get("weaknesses", []),
|
||||||
|
"line_feedback": feedback.get("line_feedback", []),
|
||||||
|
"strengths": feedback.get("strengths", []),
|
||||||
|
}
|
||||||
|
iterations.append(iteration_log)
|
||||||
|
|
||||||
|
if score > best_score:
|
||||||
|
best_score = score
|
||||||
|
best_draft = draft_text
|
||||||
|
|
||||||
|
if score >= EXPERT_PANEL_THRESHOLD:
|
||||||
|
print(f" ✓ Passed threshold ({score} >= {EXPERT_PANEL_THRESHOLD})")
|
||||||
|
break
|
||||||
|
|
||||||
|
if iteration < EXPERT_PANEL_MAX_ITERATIONS:
|
||||||
|
print(f" Revising based on feedback...")
|
||||||
|
draft_text = expert_panel_revise(client, draft_text, platform, feedback, voice_config, style_guide)
|
||||||
|
|
||||||
|
key_improvements = []
|
||||||
|
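# The final iteration's weaknesses were never revised, so only earlier iterations count as fixed.
for it in iterations[:-1]: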
|
||||||
|
for w in it.get("weaknesses", []):
|
||||||
|
key_improvements.append(f"Iter {it['iteration']}: Fixed — {w}")
|
||||||
|
|
||||||
|
return best_draft, best_score, len(iterations), key_improvements, iterations
|
||||||
|
|
||||||
|
|
||||||
|
def rewrite_with_llm(drafts, use_expert_panel=False, expert_panel_top_n=10):
    """Rewrite drafts using Claude API, optionally with expert panel."""
    try:
        import anthropic
    except ImportError:
        print("ERROR: anthropic package not installed. Run: pip install anthropic")
        return drafts

    api_key = get_anthropic_key()
    if not api_key:
        return drafts

    client = anthropic.Anthropic(api_key=api_key)
    voice_config, style_guide = load_voice_references()

    rewritten = []
    for i, draft in enumerate(drafts):
        atom = {"content": draft["atom_content"], "source_title": draft.get("atom_source", ""),
                "tags": [], "atom_type": ""}

        if use_expert_panel and i < expert_panel_top_n:
            print(f"\n [{i+1}/{len(drafts)}] Expert panel: {draft['format']} (atom {draft['atom_id'][:8]})")
            try:
                import time as _time
                _start = _time.time()
                new_text, score, iters, improvements, iter_log = process_draft_with_expert_panel(
                    client, atom, draft["platform"], draft["format"],
                    voice_config, style_guide
                )
                _elapsed = _time.time() - _start
                draft["draft"] = new_text
                draft["hook"] = extract_hook(new_text, 200)
                draft["char_count"] = len(new_text)
                draft["expert_score"] = score
                draft["iterations"] = iters
                draft["key_improvements"] = improvements
                draft["iteration_log"] = iter_log
                status = "✓" if score >= EXPERT_PANEL_THRESHOLD else f"⚠ ({score})"
                print(f" {status} Final: {score}/100 after {iters} iteration(s) [{_elapsed:.1f}s]")
            except Exception as e:
                print(f" ✗ Expert panel failed ({type(e).__name__}): {e}")
                # Fall back to a single-pass rewrite; keep the template draft if that fails too.
                try:
                    new_text = llm_generate_draft(client, atom, draft["platform"], draft["format"],
                                                  voice_config, style_guide)
                    draft["draft"] = new_text
                    draft["hook"] = extract_hook(new_text, 200)
                    draft["char_count"] = len(new_text)
                    print(f" ↳ Fell back to simple LLM rewrite")
                except Exception as e2:
                    print(f" ✗ LLM rewrite also failed: {e2}")
        else:
            print(f"\n [{i+1}/{len(drafts)}] LLM rewrite: {draft['format']} (atom {draft['atom_id'][:8]})")
            try:
                new_text = llm_generate_draft(client, atom, draft["platform"], draft["format"],
                                              voice_config, style_guide)
                draft["draft"] = new_text
                draft["hook"] = extract_hook(new_text, 200)
                draft["char_count"] = len(new_text)
                print(f" ✓ Rewrote")
            except Exception as e:
                print(f" ✗ LLM rewrite failed: {e}")

        rewritten.append(draft)

    return rewritten


def main():
    parser = argparse.ArgumentParser(description="Transform content atoms into platform-native drafts")
    parser.add_argument("--atoms", type=str, help="Path to atoms JSON file")
    parser.add_argument("--top-n", type=int, default=10, help="Number of top atoms to process")
    parser.add_argument("--template-only", action="store_true", help="Use template-based generation (no LLM)")
    parser.add_argument("--no-expert-panel", action="store_true", help="Disable expert panel quality gate")
    parser.add_argument("--expert-panel-top-n", type=int, default=10, help="Apply expert panel to top N drafts")
    parser.add_argument("--output", type=str, help="Output file path")
    args = parser.parse_args()

    use_llm = not args.template_only
    use_expert_panel = use_llm and not args.no_expert_panel

    atoms = load_atoms(args.atoms)
    print(f"Loaded {len(atoms)} atoms")

    top_atoms = rank_atoms(atoms, args.top_n)
    print(f"Selected top {len(top_atoms)} atoms by repurpose_score × missing platforms")

    all_drafts = []
    for atom in top_atoms:
        drafts = generate_drafts_for_atom(atom)
        all_drafts.extend(drafts)
        missing = atom.get("platforms_missing", [])
        print(f" Atom {atom['id'][:8]}: {len(drafts)} drafts ({', '.join(missing)})")

    print(f"\nGenerated {len(all_drafts)} total drafts")

    if use_llm:
        mode = "LLM + Expert Panel" if use_expert_panel else "LLM only"
        print(f"\n{'='*60}")
        print(f"Rewriting with {mode}...")
        print(f"{'='*60}")
        all_drafts = rewrite_with_llm(all_drafts, use_expert_panel=use_expert_panel,
                                      expert_panel_top_n=args.expert_panel_top_n)

    by_platform = {}
    for d in all_drafts:
        by_platform[d["platform"]] = by_platform.get(d["platform"], 0) + 1
    print(f"\n{'='*60}")
    print("Drafts by platform:")
    for p, c in sorted(by_platform.items()):
        print(f" {p}: {c}")

    # Compare against None so that a legitimate score of 0 still counts as scored.
    scored = [d for d in all_drafts if d.get("expert_score") is not None]
    if scored:
        avg = sum(d["expert_score"] for d in scored) / len(scored)
        passed = sum(1 for d in scored if d["expert_score"] >= EXPERT_PANEL_THRESHOLD)
        print(f"\nExpert panel: {len(scored)} scored, {passed} passed (≥{EXPERT_PANEL_THRESHOLD}), avg {avg:.1f}")

    today = datetime.now().strftime("%Y-%m-%d")
    output_path = Path(args.output) if args.output else DATA_DIR / f"content-drafts-{today}.json"
    latest_path = DATA_DIR / "content-drafts-latest.json"

    output = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "atom_count": len(top_atoms),
        "draft_count": len(all_drafts),
        "used_llm": use_llm,
        "used_expert_panel": use_expert_panel,
        "expert_panel_threshold": EXPERT_PANEL_THRESHOLD if use_expert_panel else None,
        "drafts": all_drafts,
    }

    # --output may point outside DATA_DIR, so make sure both target directories exist.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    latest_path.parent.mkdir(parents=True, exist_ok=True)
    with open(output_path, "w") as f:
        json.dump(output, f, indent=2)
    with open(latest_path, "w") as f:
        json.dump(output, f, indent=2)

    print(f"\nSaved to {output_path}")
    print(f"Saved to {latest_path}")

    if scored:
        print(f"\n{'='*60}")
        print("TOP DRAFTS BY SCORE:")
        print(f"{'='*60}")
        for d in sorted(scored, key=lambda x: x["expert_score"], reverse=True)[:5]:
            print(f"\n[{d['platform'].upper()}] Score: {d['expert_score']}/100 | Iterations: {d['iterations']}")
            print(f"Hook: {d['hook'][:100]}...")
            if d.get("key_improvements"):
                print(f"Key improvements: {d['key_improvements'][0]}")
            print(f"---")
            print(d["draft"][:300])
            print("...\n")


if __name__ == "__main__":
    main()
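The rewrite loop in `rewrite_with_llm` degrades in steps: it tries the expert panel, falls back to a plain LLM rewrite if the panel raises, and keeps the template draft if that fails as well. A minimal sketch of that fallback chain follows. It assumes nothing about the real API calls: `rewrite_with_fallback`, `flaky_panel`, and `simple_rewrite` are hypothetical stand-ins for illustration, not functions from this script.

```python
def rewrite_with_fallback(text, strategies):
    """Try each (name, fn) strategy in order; keep the original text if all fail.

    Hypothetical helper: `strategies` is an ordered list of (label, callable).
    """
    for name, fn in strategies:
        try:
            return fn(text), name
        except Exception as exc:
            # Mirror the script's behaviour: report the failure, then try the next option.
            print(f"{name} failed ({type(exc).__name__}): {exc}")
    return text, "template"


def flaky_panel(text):
    """Stand-in for the expert panel call; always fails in this sketch."""
    raise TimeoutError("panel timed out")


def simple_rewrite(text):
    """Stand-in for the single-pass LLM rewrite."""
    return text.upper()


result, used = rewrite_with_fallback(
    "ship faster",
    [("expert_panel", flaky_panel), ("simple_llm", simple_rewrite)],
)
print(result, used)
```

Keeping the strategies in an ordered list makes the degradation order explicit, and the original text is only returned once every option has been tried.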
499
content-ops/scripts/editorial-brain.py
Normal file
@ -0,0 +1,499 @@
#!/usr/bin/env python3
"""
Editorial Brain — Top-down clip discovery using LLM analysis.

Instead of bottom-up keyword matching, this gives the full transcript to an LLM
and asks it to find the best clip-worthy moments like a human editor would.

Two-pass approach:
  1. Sonnet scans the transcript (one call, or chunk by chunk when it is long)
     and finds candidate moments
  2. Sonnet scores candidates on hook/build/payoff/clean-cut (0-100)
Only clips that score at least --min-score (default 90) get cut.

Usage:
    python editorial-brain.py --url "https://youtube.com/watch?v=..." [--max-clips 5]
    python editorial-brain.py --vtt /path/to/file.vtt --video-id ID [--max-clips 5]
"""

import argparse
import json
import os
import re
import subprocess
import sys
import urllib.request
from pathlib import Path

# ── Configuration ──

ANTHROPIC_API_KEY = os.environ.get('ANTHROPIC_API_KEY', '')

SCRIPT_DIR = Path(__file__).resolve().parent
PROJECT_DIR = SCRIPT_DIR.parent
DATA_DIR = Path(os.environ.get("CONTENT_OPS_DATA_DIR", PROJECT_DIR / "data"))
CLIPS_DIR = DATA_DIR / "clips"

# Model configuration
DEFAULT_MODEL = os.environ.get("EDITORIAL_BRAIN_MODEL", "claude-sonnet-4-20250514")


def call_claude(prompt, model=None, max_tokens=4000):
    """Call Claude API."""
    model = model or DEFAULT_MODEL
    data = json.dumps({
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}]
    }).encode()

    req = urllib.request.Request(
        "https://api.anthropic.com/v1/messages",
        data=data,
        headers={
            "Content-Type": "application/json",
            "x-api-key": ANTHROPIC_API_KEY,
            "anthropic-version": "2023-06-01"
        }
    )

    with urllib.request.urlopen(req, timeout=120) as resp:
        result = json.loads(resp.read())
    return result['content'][0]['text']


def download_vtt(url):
    """Download VTT subtitles from YouTube."""
    match = re.search(r'(?:v=|/)([a-zA-Z0-9_-]{11})', url)
    if not match:
        # Fail with a clear message instead of an AttributeError on None.
        raise ValueError(f"Could not find an 11-character video ID in URL: {url}")
    video_id = match.group(1)
    vtt_path = f"/tmp/editorial_{video_id}.en.vtt"

    # Reuse subtitles cached by an earlier run.
    if os.path.exists(vtt_path):
        return vtt_path, video_id

    subprocess.run([
        'yt-dlp', '--write-auto-subs', '--sub-lang', 'en', '--sub-format', 'vtt',
        '--skip-download', '--output', f'/tmp/editorial_{video_id}.%(ext)s', url
    ], capture_output=True, check=True)

    return vtt_path, video_id


def parse_vtt(vtt_path):
    """Parse YouTube auto-caption VTT into clean, deduplicated transcript.

    YouTube auto-captions use a scrolling format where each block contains
    the previous line + new text. We filter out repeat blocks (< 20ms duration)
    and strip overlapping prefixes to get clean text.
    """
    with open(vtt_path, encoding='utf-8') as f:
        content = f.read()
    blocks = content.split('\n\n')
    segments = []
    prev_clean = ''

    for block in blocks:
        lines = block.strip().split('\n')
        if not lines:
            continue
        ts = re.match(r'(\d{2}:\d{2}:\d{2}\.\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2}\.\d{3})', lines[0])
        if not ts:
            continue

        # Convert HH:MM:SS.mmm start/end timestamps to seconds.
        p1 = ts.group(1).split(':')
        p2 = ts.group(2).split(':')
        s1 = int(p1[0]) * 3600 + int(p1[1]) * 60 + float(p1[2])
        s2 = int(p2[0]) * 3600 + int(p2[1]) * 60 + float(p2[2])

        # Skip the near-zero-duration repeat blocks.
        if s2 - s1 < 0.02:
            continue

        raw_text = '\n'.join(lines[1:])
        clean = re.sub(r'<[^>]+>', '', raw_text).strip()
        clean = re.sub(r'\s+', ' ', clean)

        if not clean or clean == prev_clean:
            continue

        # Drop the longest prefix of this block that repeats the tail of the previous one.
        new_text = clean
        if prev_clean:
            for overlap_len in range(min(len(prev_clean), len(clean)), 0, -1):
                if clean[:overlap_len] == prev_clean[-overlap_len:]:
                    new_text = clean[overlap_len:].strip()
                    break

        if new_text:
            segments.append({'start': s1, 'end': s2, 'text': new_text})
        prev_clean = clean

    return segments


def build_readable_transcript(segments):
    """Build a human-readable transcript with timestamps every ~30s."""
    output = ''
    last_ts = -30
    for seg in segments:
        if seg['start'] - last_ts >= 30:
            m, s = divmod(int(seg['start']), 60)
            output += f'\n\n[{m}:{s:02d}] '
            last_ts = seg['start']
        output += seg['text'] + ' '
    return output


def chunk_transcript(transcript_text, chunk_size=12000):
    """Split transcript into chunks at timestamp boundaries."""
    chunks = []
    remaining = transcript_text

    while remaining:
        if len(remaining) <= chunk_size:
            chunks.append(remaining)
            break
        # Prefer the last timestamp marker inside the window; rfind returns -1 if none.
        break_at = remaining.rfind('\n\n[', 0, chunk_size)
        if break_at < chunk_size * 0.3:
            # No marker, or one too early in the window: fall back to a hard cut.
            break_at = chunk_size
        chunks.append(remaining[:break_at])
        remaining = remaining[break_at:]

    return chunks


def find_moments_full_transcript(full_transcript, video_title=""):
    """Analyze the ENTIRE transcript in one call."""
    prompt = f"""You are a legendary short-form video editor (think: the team behind Hormozi's clips, Chris Williamson's best moments).

Read this FULL transcript of "{video_title}" and find the 3-5 BEST moments that could become viral 30-60 second clips.

CRITICAL RULES:
- ONLY identify moments that ACTUALLY EXIST in the transcript below
- Quote the EXACT words from the transcript — do not paraphrase or invent
- Each moment must have a clear HOOK → BUILD → PAYOFF arc
- A stranger scrolling at 2am should stop, watch the whole clip, and feel smarter

What makes a 90+ clip:
- HOOK (0-3s): Pattern interrupt — shocking stat, bold claim, provocative question
- BUILD (3-30s): Stakes rise — story tension, framework develops, insight escalates
- PAYOFF (last 5-10s): The insight LANDS — counterintuitive truth, surprising number, emotional resolution
- CLEAN END: Cut immediately after the payoff. Silence > trailing off.

FULL TRANSCRIPT:
{full_transcript}

Return a JSON array of the best moments (3-5 max). For each:
{{
  "start_timestamp": "[M:SS] exact timestamp from transcript",
  "end_timestamp": "[M:SS] where to cut",
  "hook_quote": "EXACT opening words from transcript",
  "payoff_quote": "EXACT closing words/punchline from transcript",
  "why_viral": "One sentence on why this stops scrolls",
  "estimated_score": 0-100,
  "narrative_arc": "Hook: ... → Build: ... → Payoff: ..."
}}

Be EXTREMELY selective. If nothing scores above 70, return fewer moments or an empty array. Quality > quantity."""

    try:
        response = call_claude(prompt, max_tokens=3000)
        # Grab the outermost [...] span in case the model wraps the JSON in prose.
        json_match = re.search(r'\[[\s\S]*\]', response)
        if json_match:
            moments = json.loads(json_match.group())
            # Normalize keys so both pass-1 variants feed the same pass-2 scorer.
            for m in moments:
                m['hook'] = m.get('hook_quote', m.get('hook', ''))
                m['payoff'] = m.get('payoff_quote', m.get('payoff', ''))
                m['suggested_clip_text'] = m.get('narrative_arc', '')
            return moments
        return []
    except Exception as e:
        print(f" ⚠️ Full transcript analysis failed: {e}")
        return []


def find_moments_in_chunk(chunk_text, chunk_idx, video_title=""):
    """Ask LLM to find clip-worthy moments in a transcript chunk."""
    prompt = f"""You are a legendary short-form video editor.

Analyze this transcript section from "{video_title}" and find ANY moments that could become a viral 30-60 second clip.

A great clip moment has:
- A clear HOOK (bold claim, shocking stat, provocative question, emotional statement)
- A STORY ARC or BUILD (tension rises, framework develops, stakes increase)
- A PAYOFF (insight lands, number drops, counterintuitive truth revealed, punchline hits)
- Works STANDALONE — a stranger with zero context would stop scrolling and watch

TRANSCRIPT SECTION:
{chunk_text}

Return a JSON array of moments found. If no moments qualify, return an empty array.
For each moment:
{{
  "start_timestamp": "[M:SS] from the transcript",
  "end_timestamp": "[M:SS] approximate end",
  "hook": "The opening line/moment that grabs attention",
  "payoff": "How this moment resolves/lands",
  "why_viral": "One sentence on why this would stop a scroll",
  "estimated_score": 0-100,
  "suggested_clip_text": "The key 2-3 sentences a viewer would remember"
}}

Be SELECTIVE. Most transcript sections have 0-1 clip-worthy moments. Only include moments you'd bet could score 70+."""

    try:
        response = call_claude(prompt, max_tokens=2000)
        json_match = re.search(r'\[[\s\S]*\]', response)
        if json_match:
            return json.loads(json_match.group())
        return []
    except Exception as e:
        print(f" ⚠️ Chunk {chunk_idx} failed: {e}")
        return []


def score_and_refine_moment(moment, full_transcript_context, video_title=""):
    """Deep-score a candidate moment and suggest exact trim points."""
    prompt = f"""You are scoring a potential short-form clip from "{video_title}".

CANDIDATE MOMENT:
Hook: {moment.get('hook', 'N/A')}
Payoff: {moment.get('payoff', 'N/A')}
Why viral: {moment.get('why_viral', 'N/A')}
Key text: {moment.get('suggested_clip_text', 'N/A')}

SURROUNDING TRANSCRIPT (for context):
{full_transcript_context}

Score this clip candidate on a 0-100 scale:
- HOOK (0-25): Does the first sentence stop the scroll?
- BUILD (0-25): Does tension/interest rise through the middle?
- PAYOFF (0-25): Does the insight LAND? Would the viewer feel smarter/moved?
- CLEAN CUT (0-25): Can this end on a strong note without trailing off?

Also provide:
- Exact start quote (the first words of the clip)
- Exact end quote (the last words before cutting)
- Any adjustments to improve the score

Return JSON:
{{
  "total_score": 0-100,
  "hook_score": 0-25,
  "build_score": 0-25,
  "payoff_score": 0-25,
  "clean_cut_score": 0-25,
  "start_quote": "exact first words",
  "end_quote": "exact last words",
  "adjustments": "how to improve",
  "would_you_post_this": true/false,
  "reason": "one line summary"
}}"""

    try:
        response = call_claude(prompt, max_tokens=1500)
        json_match = re.search(r'\{[\s\S]*\}', response)
        if json_match:
            return json.loads(json_match.group())
        return {"total_score": 0, "reason": "Failed to parse"}
    except Exception as e:
        # A failed call scores 0, so the candidate never clears the threshold.
        return {"total_score": 0, "reason": f"API error: {e}"}


def get_context_around_timestamp(segments, timestamp_str, context_seconds=180):
    """Get clean transcript text around a timestamp."""
    parts = timestamp_str.replace('[', '').replace(']', '').split(':')
    if len(parts) == 2:
        target_sec = int(parts[0]) * 60 + int(parts[1])
    elif len(parts) == 3:
        target_sec = int(parts[0]) * 3600 + int(parts[1]) * 60 + int(parts[2])
    else:
        target_sec = 0

    context = ''
    last_ts = -15
    for seg in segments:
        if target_sec - context_seconds <= seg['start'] <= target_sec + context_seconds:
            # Re-insert a timestamp marker roughly every 15 seconds.
            if seg['start'] - last_ts >= 15:
                m, s = divmod(int(seg['start']), 60)
                context += f'\n[{m}:{s:02d}] '
                last_ts = seg['start']
            context += seg['text'] + ' '

    return context[:5000]


def cut_clip(video_url, start_sec, duration_sec, output_path):
    """Download video and cut a clip using ffmpeg."""
    match = re.search(r'(?:v=|/)([a-zA-Z0-9_-]{11})', video_url)
    if not match:
        # Fail with a clear message instead of an AttributeError on None.
        raise ValueError(f"Could not find an 11-character video ID in URL: {video_url}")
    video_id = match.group(1)

    video_cache = f"/tmp/editorial_{video_id}.mp4"
    if not os.path.exists(video_cache):
        print(f" ⬇️ Downloading video...")
        subprocess.run([
            'yt-dlp', '--format', 'best[height<=720]',
            '--output', video_cache, '--no-playlist', video_url
        ], capture_output=True, check=True)

    CLIPS_DIR.mkdir(parents=True, exist_ok=True)
    # Center-crop to 9:16 and scale to 1080x1920 for vertical short-form video.
    cmd = [
        'ffmpeg', '-y',
        '-ss', str(start_sec),
        '-i', video_cache,
        '-t', str(duration_sec),
        '-vf', 'crop=ih*9/16:ih,scale=1080:1920',
        '-c:a', 'aac', '-b:a', '128k',
        output_path
    ]
    subprocess.run(cmd, capture_output=True, check=True)
    return os.path.exists(output_path)


def timestamp_to_seconds(ts_str):
    """Convert timestamp string like '14:31' to seconds."""
    parts = ts_str.replace('[', '').replace(']', '').strip().split(':')
    if len(parts) == 2:
        return int(parts[0]) * 60 + int(parts[1])
    elif len(parts) == 3:
        return int(parts[0]) * 3600 + int(parts[1]) * 60 + int(parts[2])
    return 0


def main():
    parser = argparse.ArgumentParser(description='Editorial Brain — LLM-powered clip discovery')
    parser.add_argument('--url', help='YouTube URL')
    parser.add_argument('--vtt', help='VTT file path')
    parser.add_argument('--video-id', help='Video ID (required with --vtt)')
    parser.add_argument('--title', default='', help='Video title')
    parser.add_argument('--max-clips', type=int, default=5, help='Max clips to produce')
    parser.add_argument('--min-score', type=int, default=90, help='Minimum score threshold')
    parser.add_argument('--skip-cut', action='store_true', help='Skip video cutting (analysis only)')
    parser.add_argument('--output', help='Output JSON path')

    args = parser.parse_args()

    if not ANTHROPIC_API_KEY:
        print("❌ Set ANTHROPIC_API_KEY environment variable")
        sys.exit(1)

    output_path = args.output or str(DATA_DIR / "editorial-clips-latest.json")

    # Step 1: Get transcript
    if args.url:
        print(f"📥 Downloading subtitles...")
        vtt_path, video_id = download_vtt(args.url)
    elif args.vtt:
        vtt_path = args.vtt
        video_id = args.video_id or 'unknown'
    else:
        parser.print_help()
        sys.exit(1)

    print(f"📝 Parsing transcript...")
    segments = parse_vtt(vtt_path)
    print(f" {len(segments)} segments")

    readable = build_readable_transcript(segments)
    chunks = chunk_transcript(readable)
    print(f" {len(chunks)} chunks for analysis")

    # Step 2: Scan for moments
    all_moments = []

    if len(readable) < 80000:
        print(f"\n🔍 Pass 1: Full-transcript analysis (single call, {len(readable)//1000}K chars)...")
        moments = find_moments_full_transcript(readable, args.title)
        all_moments = moments
        print(f" Found {len(moments)} candidate(s)")
    else:
        print(f"\n🔍 Pass 1: Chunked analysis ({len(chunks)} chunks)...")
        for i, chunk in enumerate(chunks):
            moments = find_moments_in_chunk(chunk, i, args.title)
            if moments:
                print(f" Chunk {i+1}/{len(chunks)}: Found {len(moments)} candidate(s)")
                for m in moments:
                    m['chunk_idx'] = i
                    all_moments.append(m)
            else:
                print(f" Chunk {i+1}/{len(chunks)}: No moments")

    print(f"\n📊 Pass 1 complete: {len(all_moments)} total candidates")

    if not all_moments:
        print("❌ No clip-worthy moments found in this episode")
        sys.exit(0)

    all_moments.sort(key=lambda x: x.get('estimated_score', 0), reverse=True)
    top_candidates = all_moments[:min(10, len(all_moments))]

    for m in top_candidates:
        print(f" [{m.get('start_timestamp', '?')}] Score ~{m.get('estimated_score', '?')}: {m.get('hook', '?')[:60]}")

    # Step 3: Deep-score candidates (Pass 2)
    print(f"\n🎯 Pass 2: Deep-scoring top {len(top_candidates)} candidates...")
    scored = []
    for i, moment in enumerate(top_candidates):
        ts = moment.get('start_timestamp', '0:00')
        context = get_context_around_timestamp(segments, ts)
        score = score_and_refine_moment(moment, context, args.title)
        moment['deep_score'] = score
        total = score.get('total_score', 0)
        scored.append(moment)

        status = "✅" if total >= args.min_score else "❌"
        print(f" {status} [{ts}] Score: {total}/100 — {score.get('reason', '?')[:80]}")

    passed = [m for m in scored if m.get('deep_score', {}).get('total_score', 0) >= args.min_score]
    print(f"\n🏆 {len(passed)} clips scored {args.min_score}+")

    # Step 4: Cut clips
    results = {
        'video_id': video_id,
        'title': args.title,
        'url': args.url or '',
        'total_candidates': len(all_moments),
        'scored': len(scored),
        'passed': len(passed),
        'threshold': args.min_score,
        'clips': []
    }

    if passed and not args.skip_cut and args.url:
        # Report the number actually cut, which --max-clips may cap below len(passed).
        to_cut = passed[:args.max_clips]
        print(f"\n✂️ Cutting {len(to_cut)} clips...")
        for i, moment in enumerate(to_cut):
            start_sec = timestamp_to_seconds(moment.get('start_timestamp', '0:00'))
            end_sec = timestamp_to_seconds(moment.get('end_timestamp', '0:00'))
            # Clamp to 30-60 seconds; use 45 when the end timestamp is unusable.
            duration = max(30, min(60, end_sec - start_sec)) if end_sec > start_sec else 45

            clip_id = f"{video_id}_editorial_{i+1}"
            clip_output = str(CLIPS_DIR / f"{clip_id}.mp4")

            try:
                cut_clip(args.url, start_sec, duration, clip_output)
                print(f" ✅ {clip_id}.mp4 ({duration}s)")
                results['clips'].append({
                    'id': clip_id,
                    'path': clip_output,
                    'start': start_sec,
                    'duration': duration,
                    'score': moment['deep_score'],
                    'hook': moment.get('hook', ''),
                    'payoff': moment.get('payoff', ''),
                })
            except Exception as e:
                print(f" ❌ Cut failed: {e}")

    results['all_scored'] = [{
        'timestamp': m.get('start_timestamp', '?'),
        'score': m.get('deep_score', {}).get('total_score', 0),
        'hook': m.get('hook', ''),
        'payoff': m.get('payoff', ''),
        'reason': m.get('deep_score', {}).get('reason', ''),
        'adjustments': m.get('deep_score', {}).get('adjustments', ''),
    } for m in scored]

    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    with open(output_path, 'w') as f:
        json.dump(results, f, indent=2)
    print(f"\n💾 Saved to {output_path}")

    return 0 if passed else 1


if __name__ == '__main__':
    sys.exit(main())
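The overlap stripping in `parse_vtt` is easiest to follow on a tiny input. The sketch below isolates that step under a simplifying assumption: the caption blocks are already plain strings with tags and timestamps removed. `strip_overlap` and the sample `blocks` are hypothetical and exist only for this illustration.

```python
def strip_overlap(prev: str, current: str) -> str:
    """Return the part of `current` that does not repeat the tail of `prev`.

    Tries the longest possible overlap first, as the loop in parse_vtt does.
    """
    for n in range(min(len(prev), len(current)), 0, -1):
        if current[:n] == prev[-n:]:
            return current[n:].strip()
    return current


# Made-up scrolling captions: each block repeats part of the previous one.
blocks = [
    "so the first thing",
    "so the first thing we did was",
    "we did was cut the budget",
]

prev, pieces = "", []
for block in blocks:
    new_text = strip_overlap(prev, block) if prev else block
    if new_text:
        pieces.append(new_text)
    prev = block

transcript = " ".join(pieces)
print(transcript)
```

Checking the longest overlap first matters: a shorter accidental match (for example a single shared letter) would otherwise leave repeated words in the transcript.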
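Step 4 of `main` turns the model's `[M:SS]` timestamps into an ffmpeg start offset and a duration clamped to 30-60 seconds, with 45 seconds as the fallback when the end timestamp is not after the start. A small sketch of that arithmetic follows. `clip_window` and its parameter names are hypothetical, and `timestamp_to_seconds` is re-declared here only so the example runs on its own.

```python
def timestamp_to_seconds(ts: str) -> int:
    """Convert '[M:SS]' or '[H:MM:SS]' (brackets optional) to whole seconds."""
    parts = ts.replace("[", "").replace("]", "").strip().split(":")
    if len(parts) == 2:
        return int(parts[0]) * 60 + int(parts[1])
    if len(parts) == 3:
        return int(parts[0]) * 3600 + int(parts[1]) * 60 + int(parts[2])
    return 0


def clip_window(start_ts, end_ts, min_len=30, max_len=60, default_len=45):
    """Return (start_seconds, duration_seconds) for a candidate clip."""
    start = timestamp_to_seconds(start_ts)
    end = timestamp_to_seconds(end_ts)
    if end <= start:
        # Unusable end timestamp: fall back to a fixed-length clip.
        return start, default_len
    return start, max(min_len, min(max_len, end - start))


for start_ts, end_ts in [("[14:31]", "[15:02]"), ("[1:02:10]", "[1:04:00]"), ("[5:00]", "[4:00]")]:
    print(start_ts, end_ts, clip_window(start_ts, end_ts))
```

The clamp keeps every clip inside the 30-60 second range the prompts ask for, even when the model's end timestamp is only approximate.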
420
content-ops/scripts/quote-mining-engine.py
Normal file
@ -0,0 +1,420 @@
#!/usr/bin/env python3
"""
Quote Mining Engine — Extract viral-worthy quotes from podcasts and notes.

Scans RSS feeds and local markdown/text files to extract the most quotable,
contrarian, and viral-worthy moments. Outputs scored candidates ready to publish.

Usage:
    python quote-mining-engine.py --days 90 --top 50 --min-score 60
    python quote-mining-engine.py --feeds feeds.json --notes-dir ./notes/
"""

import argparse
import json
import os
import re
import sys
import hashlib
from datetime import datetime, timedelta, timezone
from pathlib import Path
from html import unescape

import feedparser

# ── Configuration ──

SCRIPT_DIR = Path(__file__).resolve().parent
PROJECT_DIR = SCRIPT_DIR.parent
DATA_DIR = Path(os.environ.get("CONTENT_OPS_DATA_DIR", PROJECT_DIR / "data"))
OUTPUT_PATH = DATA_DIR / "quote-mining-latest.json"

# Configure feeds via environment variable or JSON file
# Format: {"Feed Name": "https://feed-url.com/rss", ...}
FEEDS_FILE = os.environ.get("QUOTE_MINING_FEEDS_FILE", str(PROJECT_DIR / "config" / "feeds.json"))

# Directory containing meeting notes / transcripts (markdown files)
NOTES_DIR = os.environ.get("QUOTE_MINING_NOTES_DIR", "")

# Speaker name to look for in meeting notes (configurable)
SPEAKER_NAME = os.environ.get("QUOTE_MINING_SPEAKER", "")

# ── Viral scoring heuristics ──

CONTRARIAN_SIGNALS = [
    r"\b(?:wrong|myth|lie|dead|overrated|underrated|nobody|everyone)\b",
    r"\b(?:stop|quit|don\'t|never|avoid|mistake|fail)\b",
    r"\b(?:secret|hidden|overlooked|surprising|counterintuitive)\b",
    r"\b(?:actually|truth|reality|real reason)\b",
    r"\b(?:unpopular opinion|hot take|controversial)\b",
]

SPECIFICITY_SIGNALS = [
    r"\$[\d,.]+[MBKmk]?",
    r"\b\d{1,3}%\b",
    r"\b\d+x\b",
    r"\b(?:doubled|tripled|10x|100x)\b",
    r"\b\d{4,}\b",
    r"\b(?:case study|example|data|study|research)\b",
]

EMOTIONAL_TRIGGERS = [
    r"\b(?:fear|afraid|scared|worried|anxious)\b",
    r"\b(?:love|hate|obsessed|passionate)\b",
    r"\b(?:shocking|insane|crazy|wild|unbelievable|mindblowing)\b",
    r"\b(?:broke|rich|wealthy|millionaire|billionaire)\b",
    r"\b(?:fired|hired|quit|resigned)\b",
    r"\b(?:AI|artificial intelligence|ChatGPT|GPT|automation)\b",
]

SHAREABILITY_SIGNALS = [
    r"\b(?:how to|step.by.step|framework|playbook|strategy)\b",
    r"\b(?:lesson|learned|mistake|regret)\b",
    r"\b(?:why (?:most|nobody|everyone))\b",
    r"\b(?:the (?:one|only|best|worst|biggest))\b",
    r"\bhack\b",
]


def score_text(text: str) -> dict:
    """Score a text blob for viral potential. Returns breakdown + total."""
    t = text.lower()

    def count_matches(patterns):
        # Each pattern counts at most once, however often it appears in the text.
        return sum(1 for p in patterns if re.search(p, t, re.I))

    # Per-category points, each capped so no single category dominates.
    contrarian = min(count_matches(CONTRARIAN_SIGNALS) * 15, 35)
    specificity = min(count_matches(SPECIFICITY_SIGNALS) * 12, 30)
    emotional = min(count_matches(EMOTIONAL_TRIGGERS) * 12, 25)
    shareability = min(count_matches(SHAREABILITY_SIGNALS) * 12, 25)

    # Shorter quotes earn a bonus.
    words = len(text.split())
    if words <= 15:
        length_bonus = 10
    elif words <= 30:
        length_bonus = 5
    else:
        length_bonus = 0

    question_bonus = 8 if re.search(r"\?", text) else 0
    number_bonus = 8 if re.search(r"\b\d+\b", text) else 0
    howto_bonus = 8 if re.search(r"^(?:how|why|what|when|the\s+\d)", text, re.I) else 0

    total = min(contrarian + specificity + emotional + shareability + length_bonus + question_bonus + number_bonus + howto_bonus, 100)
    return {
        "contrarian": contrarian,
        "specificity": specificity,
        "emotional": emotional,
        "shareability": shareability,
        "total": total,
    }


def suggest_platform(score_breakdown: dict, text: str) -> str:
    """Suggest X, LinkedIn, or both based on content characteristics."""
    if score_breakdown["specificity"] >= 15 and score_breakdown["shareability"] >= 10:
        return "both"
    if score_breakdown["emotional"] >= 15 or len(text.split()) <= 20:
        return "X"
    if score_breakdown["specificity"] >= 10 or score_breakdown["shareability"] >= 10:
        return "LinkedIn"
    if score_breakdown["total"] >= 60:
        return "both"
    return "X"


def generate_hook(quote: str) -> str:
    """Generate a punchy X-ready opening line from a quote."""
    q = quote.strip().rstrip(".")
    words = q.split()
    if len(words) <= 20:
        # Avoid doubled punctuation such as "?." on quotes that already end in ? or !.
        return q if q.endswith(("?", "!")) else q + "."
    short = " ".join(words[:15])
    # Prefer to cut at a natural separator, provided it is not too close to the start.
    for sep in [". ", ", ", " — ", " - ", ": "]:
        idx = short.rfind(sep)
        if idx > 20:
            return short[: idx + len(sep)].strip().rstrip(",") + "..."
    return short + "..."


def strip_html(text: str) -> str:
    """Remove HTML tags and decode entities."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = unescape(text)
    text = re.sub(r"\s+", " ", text).strip()
    return text


def make_id(text: str) -> str:
    """Return a short, stable identifier (first 10 hex chars of the MD5 digest)."""
    return hashlib.md5(text.encode()).hexdigest()[:10]


def load_feeds() -> dict:
|
||||||
|
"""Load RSS feed configuration."""
|
||||||
|
feeds_path = Path(FEEDS_FILE)
|
||||||
|
if feeds_path.exists():
|
||||||
|
try:
|
||||||
|
with open(feeds_path) as f:
|
||||||
|
return json.load(f)
|
||||||
|
except Exception as e:
|
||||||
|
print(f" ⚠ Error loading feeds config: {e}")
|
||||||
|
|
||||||
|
# Check environment variable for inline JSON
|
||||||
|
feeds_env = os.environ.get("QUOTE_MINING_FEEDS", "")
|
||||||
|
if feeds_env:
|
||||||
|
try:
|
||||||
|
return json.loads(feeds_env)
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
|
||||||
|
print(" ⚠ No feeds configured. Set QUOTE_MINING_FEEDS_FILE or QUOTE_MINING_FEEDS env var.")
|
||||||
|
print(" Example feeds.json: {\"My Podcast\": \"https://feeds.example.com/rss\"}")
|
||||||
|
return {}
|
||||||
|
|
||||||
|
|
||||||
# ── RSS Feed Processing ──

def fetch_feed_quotes(feed_name: str, feed_url: str, since: datetime) -> list:
    """Parse an RSS feed and extract quotable candidates."""
    print(f" Fetching {feed_name}...")
    feed = feedparser.parse(feed_url)
    candidates = []

    for entry in feed.entries:
        pub = entry.get("published_parsed") or entry.get("updated_parsed")
        if not pub:
            continue
        pub_dt = datetime(*pub[:6], tzinfo=timezone.utc)
        if pub_dt < since:
            continue

        title = entry.get("title", "").strip()
        desc = strip_html(entry.get("description", "") or entry.get("summary", ""))
        date_str = pub_dt.strftime("%Y-%m-%d")

        if title:
            scores = score_text(title + " " + desc[:200])
            context_sentence = desc[:200].split(".")[0].strip() + "." if desc else title

            candidates.append({
                "id": make_id(title + date_str),
                "quote_text": title,
                "source": f"{feed_name} — {title} ({date_str})",
                "viral_score": scores["total"],
                "score_breakdown": scores,
                "suggested_platform": suggest_platform(scores, title),
                "hook_version": generate_hook(title),
                "context": context_sentence,
                "type": "podcast_title",
            })

        if desc and len(desc) > 50:
            sentences = re.split(r"(?<=[.!?])\s+", desc)
            for sent in sentences:
                sent = sent.strip()
                if len(sent) < 30 or len(sent) > 300:
                    continue
                if any(skip in sent.lower() for skip in [
                    "subscribe", "leave a review", "click here", "sign up",
                    "sponsor", "brought to you", "check out", "visit us",
                    "follow us", "download", "episode is", "links mentioned",
                    "get a free", "use code", "http", "www.", ".com/",
                ]):
                    continue
                s = score_text(sent)
                if s["total"] >= 30:
                    candidates.append({
                        "id": make_id(sent + date_str),
                        "quote_text": sent,
                        "source": f"{feed_name} — {title} ({date_str})",
                        "viral_score": s["total"],
                        "score_breakdown": s,
                        "suggested_platform": suggest_platform(s, sent),
                        "hook_version": generate_hook(sent),
                        "context": f"From episode: {title}",
                        "type": "podcast_description",
                    })

    print(f" → {len(candidates)} candidates from {feed_name}")
    return candidates


# ── Notes Processing ──

def scan_notes(notes_dir: str, since: datetime, speaker: str = "") -> list:
    """Scan meeting notes/transcripts for quotable moments."""
    notes_path = Path(notes_dir)
    if not notes_path.exists():
        print(f" ⚠ Notes directory not found: {notes_dir}, skipping.")
        return []

    print(f" Scanning notes in {notes_dir}...")
    candidates = []

    for fpath in sorted(notes_path.glob("**/*.md")):
        m = re.match(r"(\d{4}-\d{2}-\d{2})", fpath.name)
        if m:
            file_date = datetime.strptime(m.group(1), "%Y-%m-%d").replace(tzinfo=timezone.utc)
            if file_date < since:
                continue
        else:
            # If no date in filename, include by default
            file_date = datetime.now(timezone.utc)

        try:
            text = fpath.read_text(errors="replace")
        except Exception:
            continue

        meeting_name = fpath.stem.replace("_", " ").lstrip("0123456789- ")

        notable_lines = []
        for line in text.split("\n"):
            line = line.strip()
            if not line or len(line) < 30:
                continue

            # Match lines attributed to configured speaker
            if speaker and re.match(rf"(?:{re.escape(speaker)})\s*:", line, re.I):
                content = re.sub(rf"^(?:{re.escape(speaker)})\s*:\s*", "", line, flags=re.I)
                notable_lines.append(content.strip())
            # Grab bullet points with viral signals
            elif re.match(r"[\*\-]\s+", line):
                bullet = re.sub(r"^[\*\-]\s+", "", line).strip()
                if len(bullet) > 30 and any(
                    re.search(p, bullet, re.I)
                    for p in CONTRARIAN_SIGNALS + SPECIFICITY_SIGNALS + EMOTIONAL_TRIGGERS
                ):
                    notable_lines.append(bullet)

        for line in notable_lines:
            if len(line) < 20 or len(line) > 500:
                continue
            if any(skip in line.lower() for skip in [
                "let me share my screen", "can you hear me", "hold on",
                "one second", "sorry about that", "let me pull up",
                "next slide", "any questions", "sounds good",
            ]):
                continue

            s = score_text(line)
            if s["total"] >= 25:
                date_str = file_date.strftime("%Y-%m-%d")
                candidates.append({
                    "id": make_id(line + date_str),
                    "quote_text": line,
                    "source": f"Notes — {meeting_name} ({date_str})",
                    "viral_score": s["total"],
                    "score_breakdown": s,
                    "suggested_platform": suggest_platform(s, line),
                    "hook_version": generate_hook(line),
                    "context": f"From: {meeting_name}",
                    "type": "meeting_notes",
                })

    print(f" → {len(candidates)} candidates from notes")
    return candidates


# ── Main ──

def main():
    parser = argparse.ArgumentParser(description="Quote Mining Engine")
    parser.add_argument("--days", type=int, default=90, help="Look back N days (default: 90)")
    parser.add_argument("--top", type=int, default=50, help="Return top N quotes (default: 50)")
    parser.add_argument("--min-score", type=int, default=40, help="Minimum viral score (default: 40)")
    parser.add_argument("--output", type=str, default=str(OUTPUT_PATH), help="Output JSON path")
    parser.add_argument("--feeds", type=str, help="Path to feeds JSON config file")
    parser.add_argument("--notes-dir", type=str, help="Directory of meeting notes to scan")
    parser.add_argument("--speaker", type=str, help="Speaker name to extract from notes")
    args = parser.parse_args()

    since = datetime.now(timezone.utc) - timedelta(days=args.days)
    print(f"🔍 Quote Mining Engine — scanning last {args.days} days (since {since.strftime('%Y-%m-%d')})\n")

    all_candidates = []

    # 1. Podcast RSS feeds
    if args.feeds:
        # An explicit --feeds path takes precedence over the env-based config.
        os.environ["QUOTE_MINING_FEEDS_FILE"] = args.feeds
        with open(args.feeds) as f:
            feeds = json.load(f)
    else:
        feeds = load_feeds()

    if feeds:
        print("📡 Fetching podcast feeds...")
        for name, url in feeds.items():
            try:
                all_candidates.extend(fetch_feed_quotes(name, url, since))
            except Exception as e:
                print(f" ⚠ Error fetching {name}: {e}")

    # 2. Meeting notes
    notes_dir = args.notes_dir or NOTES_DIR
    speaker = args.speaker or SPEAKER_NAME
    if notes_dir:
        print("\n📝 Scanning meeting notes...")
        try:
            all_candidates.extend(scan_notes(notes_dir, since, speaker))
        except Exception as e:
            print(f" ⚠ Error scanning notes: {e}")

    # 3. Deduplicate
    seen = set()
    unique = []
    for c in all_candidates:
        if c["id"] not in seen:
            seen.add(c["id"])
            unique.append(c)
    all_candidates = unique

    # 4. Filter by min score
    filtered = [c for c in all_candidates if c["viral_score"] >= args.min_score]

    # 5. Sort and take top N
    filtered.sort(key=lambda x: x["viral_score"], reverse=True)
    top = filtered[: args.top]

    # 6. Clean output
    output = []
    for c in top:
        output.append({
            "quote_text": c["quote_text"],
            "source": c["source"],
            "viral_score": c["viral_score"],
            "suggested_platform": c["suggested_platform"],
            "hook_version": c["hook_version"],
            "context": c["context"],
        })

    # 7. Save (dirname is "" for a bare filename, so fall back to the current directory)
    os.makedirs(os.path.dirname(args.output) or ".", exist_ok=True)
    with open(args.output, "w") as f:
        json.dump(output, f, indent=2)

    # 8. Summary
    print(f"\n{'='*60}")
    print("📊 QUOTE MINING SUMMARY")
    print(f"{'='*60}")
    print(f" Total candidates found: {len(all_candidates)}")
    print(f" Above min score ({args.min_score}): {len(filtered)}")
    print(f" Top quotes saved: {len(output)}")
    print(f" Output: {args.output}")
    print()

    if output:
        print("🏆 Top 10 Quotes:")
        print(f"{'-'*60}")
        for i, q in enumerate(output[:10], 1):
            print(f" {i:2d}. [{q['viral_score']:3d}] {q['quote_text'][:80]}")
            print(f" → {q['source'][:60]}")
            print(f" Platform: {q['suggested_platform']} | Hook: {q['hook_version'][:50]}...")
            print()
    else:
        print(" ⚠ No quotes met the minimum score threshold.")
        print(f" Try lowering --min-score (currently {args.min_score})")

    return 0


if __name__ == "__main__":
    sys.exit(main())
7
finance-ops/.env.example
Normal file
@ -0,0 +1,7 @@
# AI Finance Ops - Environment Variables
# No API keys required — all analysis runs locally.

# Optional: Override default data directories
# CFO_INPUT_DIR=./data/uploads
# CFO_HISTORY_DIR=./data/history
# SCENARIO_OUTPUT=./data/scenarios.json
128
finance-ops/README.md
Normal file
@ -0,0 +1,128 @@
# AI Finance Ops

> Your AI CFO that finds hidden costs in 30 minutes.

Upload your QuickBooks exports. Get a full executive CFO briefing with anomaly detection, burn rate analysis, vendor concentration risk, and actionable recommendations. Or point it at a codebase and get a development cost estimate with organizational overhead modeling and AI ROI analysis.

## What's Inside

### CFO Briefing Generator
Drop in your QuickBooks exports (P&L, Balance Sheet, General Ledger, Expenses by Vendor, Cash Flow, etc.) and get:

- **Executive financial summary** with traffic-light status indicators (🟢🟡🔴)
- **Profitability analysis** — gross margin, net margin, operating income
- **People cost breakdown** — salaries vs contractors, payroll taxes, benefits
- **Tool & subscription audit** — find the SaaS bloat
- **Customer concentration risk** — flag dangerous client dependencies
- **Month-over-month comparison** — automatic trend detection
- **Anomaly alerts** — expenses that spike, new vendors with big spend, owner draws
- **Scenario modeling** — base/bull/bear case projections with monthly burns

### Codebase Cost Estimator
Point it at any codebase and get:

- **Development hours estimate** by code type and complexity
- **Market rate research** with current-year data
- **Organizational overhead modeling** — solo founder through enterprise
- **Full team cost** — PM, design, QA, DevOps, not just engineering
- **AI ROI analysis** — what did each hour of Claude produce in value?

## Quick Start

```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Copy env template
cp .env.example .env

# 3. Drop QuickBooks exports into a folder
mkdir -p data/uploads
# Copy your CSV/XLSX files there

# 4. Run CFO analysis
python scripts/cfo-analyzer.py --input data/uploads/

# 5. Or estimate a codebase
# Use the SKILL.md workflow with Claude Code
```

## Supported QuickBooks Reports

Any subset works — P&L alone is enough to start:

| Report | What It Adds |
|--------|-------------|
| P&L Summary | Revenue, COGS, expenses, net income (core) |
| P&L by Customer | Client concentration analysis |
| P&L Detail | Transaction-level drill-down |
| Balance Sheet | Assets, liabilities, equity position |
| Cash Flow Statement | Operating/investing/financing flows |
| General Ledger | Full account transaction history |
| Expenses by Vendor | Vendor-level spend breakdown |
| Transaction List by Vendor | Detailed vendor transactions |
| Bill Payments | AP payment history |
| Account List | Chart of accounts |

## How It Works

The CFO analyzer:
1. **Auto-detects** report types by scanning file headers
2. **Parses** QuickBooks CSV/XLSX formats (handles dollar signs, commas, negative formats)
3. **Computes KPIs** against benchmarks for your business size
4. **Compares** to prior periods if history exists
5. **Generates** a formatted executive briefing with status indicators

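The parsing step mentions dollar signs, commas, and negative formats. As a rough illustration only, normalizing one amount cell might look like the sketch below. `parse_amount` is a hypothetical helper, not the function used in `cfo-analyzer.py`, and the accepted negative styles (parentheses, leading or trailing minus) are assumptions about typical accounting exports.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_amount(raw: str) -> Optional[Decimal]:
    """Convert a currency string such as "$1,234.56" or "(500.00)" to a Decimal.

    Hypothetical helper: returns None for blank or non-numeric cells.
    """
    s = raw.strip()
    if not s:
        return None
    # Assumed negative styles: "(500.00)", "-500.00", "500.00-".
    negative = (s.startswith("(") and s.endswith(")")) or s.startswith("-") or s.endswith("-")
    # Drop the wrappers, then the currency symbol and thousands separators.
    cleaned = s.strip("()-").replace("$", "").replace(",", "").strip()
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None  # e.g. a header or label cell such as "Total"
    return -value if negative else value
```
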
The scenario modeler:
1. **Reads** the latest financial analysis
2. **Models** base case (status quo), bull case (growth targets hit), and bear case (lose top clients)
3. **Projects** 12 months forward with monthly P&L
4. **Identifies** the fastest cost levers to pull

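To make the projection step concrete, here is a minimal sketch of a 12-month roll-forward. It is not the logic in `scenario-modeler.py`: the function name, the flat-cost assumption, and the growth rates in `SCENARIOS` are all hypothetical placeholders chosen for illustration.

```python
def project_scenario(monthly_revenue: float, monthly_costs: float,
                     revenue_growth: float, months: int = 12) -> list:
    """Project monthly P&L with compounding month-over-month revenue growth.

    Costs are held flat, which is a simplifying assumption of this sketch.
    """
    rows = []
    revenue = monthly_revenue
    for month in range(1, months + 1):
        revenue *= 1 + revenue_growth
        rows.append({
            "month": month,
            "revenue": round(revenue, 2),
            "costs": round(monthly_costs, 2),
            "net": round(revenue - monthly_costs, 2),
        })
    return rows


# Hypothetical monthly growth rates, for illustration only.
SCENARIOS = {"base": 0.0, "bull": 0.05, "bear": -0.05}

if __name__ == "__main__":
    for name, growth in SCENARIOS.items():
        final = project_scenario(100_000, 90_000, growth)[-1]
        print(f"{name}: month 12 net = {final['net']:,.2f}")
```

A fuller model would also vary costs per scenario and subtract specific client revenue in the bear case, as the list above describes.
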
## Benchmark Thresholds

Built-in benchmarks for B2B services businesses:

| Metric | 🟢 Healthy | 🟡 Watch | 🔴 Action Needed |
|--------|-----------|---------|-----------------|
| Gross Margin | >60% | 45-60% | <45% |
| Net Margin | >10% | 0-10% | Negative |
| People Costs (% rev) | <65% | 65-75% | >75% |
| Tool/Sub Costs (% rev) | <8% | 8-12% | >12% |
| Client Concentration | No client >15% | One at 15-25% | One >25% |
| Cash Runway | >3 months | 1-3 months | <1 month |

All thresholds are configurable in `references/metrics-guide.md`.

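The traffic-light logic behind a table like this can be sketched in a few lines. The `classify` function below is a hypothetical illustration, not the analyzer's own code; it treats values exactly on a boundary as "watch", which is one reasonable reading of the ranges.

```python
def classify(value: float, green: float, red: float, higher_is_better: bool = True) -> str:
    """Return a traffic-light status for one metric.

    `green` is the boundary beyond which the metric is healthy and `red` the
    boundary beyond which action is needed; anything in between is "watch".
    """
    if not higher_is_better:
        # Flip signs so the same comparisons work for cost-style metrics.
        value, green, red = -value, -green, -red
    if value > green:
        return "🟢"
    if value < red:
        return "🔴"
    return "🟡"


# Thresholds copied from the table (percent values).
print(classify(62.0, green=60, red=45))                          # gross margin
print(classify(70.0, green=65, red=75, higher_is_better=False))  # people costs
```

Passing `higher_is_better=False` covers the cost-style rows (people costs, tool costs), where a lower percentage is the healthy direction.
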
## File Structure

```
finance-ops/
├── README.md                  # This file
├── SKILL.md                   # Claude Code skill definition
├── requirements.txt           # Python dependencies
├── .env.example               # Environment template
├── scripts/
│   ├── cfo-analyzer.py        # Main CFO briefing generator
│   └── scenario-modeler.py    # Base/bull/bear projections
└── references/
    ├── metrics-guide.md       # KPI thresholds and benchmarks
    ├── quickbooks-formats.md  # QB export format specs
    ├── rates.md               # Developer productivity rates
    ├── org-overhead.md        # Organizational overhead factors
    ├── team-cost.md           # Full team cost multipliers
    ├── claude-roi.md          # AI ROI calculation method
    └── output-template.md     # Cost estimate output format
```

## Customization

**Adjust for your business size:** Edit `references/metrics-guide.md` to change the revenue range and benchmark thresholds. A $500K startup has different healthy ranges than a $10M agency.

**Add report types:** The file detection in `cfo-analyzer.py` uses header scanning. Add new patterns to `detect_file_type()` for custom QB report layouts.

**Change categories:** Expense categorization keywords are in `compute_kpis()`. Adjust the keyword lists to match your chart of accounts.

## License

MIT
120
finance-ops/SKILL.md
Normal file
@ -0,0 +1,120 @@
---
name: finance-ops
description: "AI-powered financial analysis suite. Generates executive CFO briefings from QuickBooks exports (P&L, Balance Sheet, General Ledger, Cash Flow, etc.) with anomaly detection, burn rate, runway analysis, and scenario modeling. Also estimates codebase development costs with organizational overhead and AI ROI analysis. Triggers on: 'CFO briefing', 'financial analysis', 'cost briefing', 'expense review', 'runway analysis', 'burn rate', 'cost estimate', 'how much would this cost to build', 'development cost', 'Claude ROI'."
---

# AI Finance Ops

Two tools: CFO Briefing Generator and Codebase Cost Estimator.

---

## Tool 1: CFO Briefing Generator

Generate executive financial summaries from QuickBooks exports.

### Workflow

#### 1. Ingest Files

Place QuickBooks export files (CSV, XLSX, XLS) in a working directory. Accepted report types (any subset works — P&L alone is sufficient):

- **P&L Summary** — Revenue, COGS, expenses, net income (MOST IMPORTANT)
- **P&L by Customer** — Revenue breakdown by client
- **P&L Detail** — Transaction-level detail (XLSX)
- **Balance Sheet** — Assets, liabilities, equity
- **General Ledger** — All account transactions
- **Expenses by Vendor** — Vendor-level expense breakdown
- **Transaction List by Vendor** — Detailed vendor transactions
- **Bill Payments** — AP payment history
- **Cash Flow Statement** — Operating/investing/financing flows (XLSX)
- **Account List** — Chart of accounts

#### 2. Run Analysis

```bash
python3 scripts/cfo-analyzer.py --input ./data/uploads/ [--period YYYY-MM]
```

Options:
- `--input DIR` — Directory with QB exports
- `--period YYYY-MM` — Override period label (default: auto-detected from files)
- `--history DIR` — History directory for MoM comparison (default: `./data/history/`)
- `--no-history` — Skip saving to history

The script:
1. Auto-detects file types by scanning headers
2. Parses each file into structured data
3. Computes all KPIs (see `references/metrics-guide.md` for definitions and healthy ranges)
4. Loads prior period from history for MoM comparison
5. Saves current period to history
6. Outputs formatted executive summary to stdout

#### 3. Scenario Modeling (Optional)

After running the CFO analysis, model base/bull/bear scenarios:

```bash
python3 scripts/scenario-modeler.py --input ./data/financial-latest.json
```

This generates 12-month projections for:
- **Base case** — current trajectory continues
- **Bull case** — growth targets met (new product revenue + new clients)
- **Bear case** — lose top clients

#### 4. Deliver Summary

The script outputs a formatted briefing with emoji status indicators (🟢🟡🔴), suitable for Slack, email, or any messaging surface.

### File Format Details

See `references/quickbooks-formats.md` for expected CSV/XLSX column formats and detection heuristics.

### Metric Thresholds

See `references/metrics-guide.md` for healthy ranges, red/yellow/green thresholds, and benchmark context. Adjust thresholds for your business size and type.

---

## Tool 2: Codebase Cost Estimator

Estimate full development cost of a codebase.

### Workflow

#### Step 1: Analyze the Codebase

Read the entire codebase. Catalog total lines of code by language/type, architectural complexity, advanced features, testing coverage, and documentation quality.

#### Step 2: Calculate Development Hours

Apply productivity rates from `references/rates.md`. Calculate base hours per code type, then apply overhead multipliers for architecture, debugging, review, docs, integration, and learning curve.

#### Step 3: Research Market Rates

Use web search to find current hourly rates for the relevant specializations. Build a rate table with low / median / high for the project's tech stack.

#### Step 4: Calculate Organizational Overhead

Convert raw dev hours to calendar time using efficiency factors from `references/org-overhead.md`. Show estimates across company types (Solo through Enterprise).

#### Step 5: Calculate Full Team Cost

Apply supporting role ratios and team multipliers from `references/team-cost.md`. Show role-by-role breakdown, plus summary across all company stages.

#### Step 6: Generate Cost Estimate

Output the full estimate using the template in `references/output-template.md`. Include all sections: codebase metrics, dev hours, calendar time, market rates, engineering cost, full team cost, grand total summary, and assumptions.

#### Step 7: AI ROI Analysis (Optional)

If the codebase was built with AI assistance, calculate value per AI hour using `references/claude-roi.md`. Determine active hours via git history clustering, calculate speed multiplier vs human developer, and compute cost savings and ROI.

### Key Principles

- Present professionally, suitable for stakeholders
- Include confidence level (low/medium/high) and key assumptions
- Highlight highest-complexity areas that drive cost
- Always show ranges (low/avg/high), never a single number
- Search for CURRENT year market rates, don't use stale data
98
finance-ops/references/claude-roi.md
Normal file
@ -0,0 +1,98 @@
# Claude ROI — Value Per Claude Hour

The most important metric for AI-assisted development. Answers: "What did each hour of Claude's actual working time produce?"

## Step 1: Determine Actual Claude Clock Time

### Method 1: Git History (preferred)

Run `git log --format="%ai" | sort` to get all commit timestamps. Then:
1. First commit = project start
2. Last commit = current state
3. Total calendar days = last - first
4. Cluster commits into sessions: group commits within 4-hour windows as one session
5. Estimate session duration using commit density:

| Commits in Window | Estimated Session Duration |
|-------------------|---------------------------|
| 1-2 commits | ~1 hour |
| 3-5 commits | ~2 hours |
| 6-10 commits | ~3 hours |
| 11+ commits | ~4 hours |

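The clustering steps can be sketched as follows. This is one possible reading, not a canonical implementation: it assumes a session is anchored at its first commit and absorbs every commit in the next 4 hours, and it treats more than 10 commits as the top bucket. All function names are hypothetical.

```python
import subprocess
from datetime import datetime, timedelta

WINDOW = timedelta(hours=4)


def session_hours(commit_count: int) -> int:
    """Map commits in one session to estimated hours, per the table."""
    if commit_count <= 2:
        return 1
    if commit_count <= 5:
        return 2
    if commit_count <= 10:
        return 3
    return 4


def cluster_sessions(timestamps: list) -> list:
    """Group commit times into sessions; return the commit count of each.

    Assumption: a new session starts when a commit falls more than 4 hours
    after the first commit of the current session.
    """
    sessions = []
    start = None
    for ts in sorted(timestamps):
        if start is None or ts - start > WINDOW:
            sessions.append(0)
            start = ts
        sessions[-1] += 1
    return sessions


def estimate_active_hours(timestamps: list) -> int:
    """Sum the per-session hour estimates."""
    return sum(session_hours(n) for n in cluster_sessions(timestamps))


def read_commit_times(repo: str = ".") -> list:
    """Read author dates with `git log --format=%aI` (strict ISO 8601)."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%aI"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [datetime.fromisoformat(line) for line in out.splitlines() if line]
```

Calling `estimate_active_hours(read_commit_times("."))` inside a repository gives the Step 1 figure. A gap-based rule (new session after N idle hours) would be an equally defensible interpretation and can give different totals.
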
### Method 2: File Modification Timestamps (no git)

```bash
# BSD/macOS stat shown; on GNU/Linux use: stat -c "%y"
find . \( -name "*.ts" -o -name "*.swift" -o -name "*.py" \) -print0 | xargs -0 stat -f "%Sm" -t "%Y-%m-%d %H:%M:%S" | sort
```
Apply the same session-clustering logic as in Method 1.

### Method 3: Fallback Estimate (no timestamps)

Assume Claude writes 200-500 lines of meaningful code per hour (much faster than humans). Using the midpoint of that range:

`Claude active hours ≈ Total LOC ÷ 350`

## Step 2: Calculate Value per Claude Hour

`Value per Claude Hour = Total Code Value (from team cost) ÷ Estimated Claude Active Hours`

Calculate across scenarios:

| Code Value Scenario | Claude Hours (est.) | Value per Claude Hour |
|--------------------|--------------------|-----------------------|
| Engineering only (avg) | [X] hrs | $[X,XXX]/hr |
| Full team equivalent (Growth Co) | [X] hrs | $[X,XXX]/hr |
| Full team equivalent (Enterprise) | [X] hrs | $[X,XXX]/hr |

## Step 3: Claude Efficiency vs. Human Developer

**Speed Multiplier:**
`Speed Multiplier = Human Dev Hours ÷ Claude Active Hours`

Example: Human needs 500 hours, Claude did it in 20 hours → 25x faster

**Cost Efficiency:**
```
Human Cost = Human Hours × $150/hr
Claude Cost = Subscription ($20-200/month) + API costs
Savings = Human Cost - Claude Cost
ROI = Savings ÷ Claude Cost
```

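As a worked example of the Step 2 and Step 3 formulas, the sketch below plugs in the 500-hour / 20-hour case. The `claude_roi` function and the $200 Claude cost are hypothetical, and code value is taken as engineering-only human cost rather than a full-team figure.

```python
def claude_roi(human_hours: float, claude_hours: float, claude_cost: float,
               human_rate: float = 150.0) -> dict:
    """Apply the speed, savings, ROI, and value-per-hour formulas to one project."""
    human_cost = human_hours * human_rate
    savings = human_cost - claude_cost
    return {
        "speed_multiplier": human_hours / claude_hours,
        "human_cost": human_cost,
        "savings": savings,
        "roi": savings / claude_cost,
        # Engineering-only code value; swap in a team-cost total if available.
        "value_per_claude_hour": human_cost / claude_hours,
    }


# 500 human hours vs 20 Claude hours, with an assumed $200 total Claude cost.
result = claude_roi(human_hours=500, claude_hours=20, claude_cost=200)
print(result)
```

With these inputs the speed multiplier is 25x, matching the example above, and the ROI figure is dominated by how small the assumed Claude cost is, so it is worth reporting the cost assumption alongside the result.
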
## Output Format

```
### Claude ROI Analysis

Project Timeline:
- First commit / project start: [date]
- Latest commit: [date]
- Total calendar time: [X] days ([X] weeks)

Claude Active Hours Estimate:
- Total sessions identified: [X] sessions
- Estimated active hours: [X] hours
- Method: [git clustering / file timestamps / LOC estimate]

Value per Claude Hour:

| Value Basis | Total Value | Claude Hours | $/Claude Hour |
|-------------|-------------|--------------|---------------|
| Engineering only | $[X] | [X] hrs | $[X,XXX]/hr |
| Full team (Growth Co) | $[X] | [X] hrs | $[X,XXX]/hr |

Speed vs. Human Developer:
- Estimated human hours for same work: [X] hours
- Claude active hours: [X] hours
- Speed multiplier: [X]x (Claude was [X]x faster)

Cost Comparison:
- Human developer cost: $[X] (at $150/hr avg)
- Estimated Claude cost: $[X] (subscription + API)
- Net savings: $[X]
- ROI: [X]x (every $1 spent on Claude produced $[X] of value)

The headline: Claude worked ~[X] hours and produced $[X] in professional
development value — roughly $[X,XXX] per Claude hour.
```
62
finance-ops/references/metrics-guide.md
Normal file
@ -0,0 +1,62 @@
# Financial Metrics Guide

Healthy ranges for B2B services/agency businesses. Adjust thresholds for your revenue range.

## Revenue & Profitability

| Metric | 🟢 Green | 🟡 Yellow | 🔴 Red |
|---|---|---|---|
| TTM Revenue Growth | >10% YoY | 0-10% YoY | Declining |
| Net Income Margin | >10% | 0-10% | Negative |
| Gross Margin | >60% | 45-60% | <45% |
| Revenue per Employee | >$150K | $100-150K | <$100K |
| Client Concentration | No client >15% | One client 15-25% | One client >25% |

## Cost Structure (% of Revenue)

| Metric | 🟢 Green | 🟡 Yellow | 🔴 Red |
|---|---|---|---|
| Total People Costs | <65% | 65-75% | >75% |
| COGS (direct delivery) | <50% | 50-60% | >60% |
| Sales & Marketing | 10-20% | 20-30% | >30% |
| G&A | <15% | 15-25% | >25% |
| Tool/Subscription Costs | <8% | 8-12% | >12% |
| Contractor % of People | <30% | 30-50% | >50% |

## Cash & Liquidity

| Metric | 🟢 Green | 🟡 Yellow | 🔴 Red |
|---|---|---|---|
| Cash Runway (months) | >3 months | 1-3 months | <1 month |
| AR Days Outstanding | <45 days | 45-60 days | >60 days |
| AP Days Outstanding | <45 days | 45-60 days | >60 days |
| Current Ratio (CA/CL) | >1.5 | 1.0-1.5 | <1.0 |

## Debt & Obligations

| Metric | 🟢 Green | 🟡 Yellow | 🔴 Red |
|---|---|---|---|
| Debt-to-Revenue | <0.5x | 0.5-1.0x | >1.0x |
| Debt Service Coverage | >2.0x | 1.0-2.0x | <1.0x |
| Interest as % of Revenue | <3% | 3-5% | >5% |

## Anomaly Detection

Flag these automatically:
- Any single expense line item >$5,000 not in payroll/rent/amortization
- Any category with >10% MoM increase
- Any new vendor with >$2,000 spend
- Owner expenses >$10K/month
- Recruiting spend >$8K/month sustained
- Revenue from any single client >20% of total

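Two of these rules (the month-over-month increase and the new-vendor check) are easy to sketch. The functions below are hypothetical illustrations that assume spend is available as plain `{name: amount}` dictionaries; they are not taken from `cfo-analyzer.py`.

```python
def flag_mom_increases(prior: dict, current: dict, threshold: float = 0.10) -> list:
    """Return categories whose spend rose more than `threshold` month over month."""
    flags = []
    for category, amount in current.items():
        before = prior.get(category)
        # Skip categories with no (or zero) prior spend: no ratio can be computed.
        if before and (amount - before) / before > threshold:
            flags.append(category)
    return flags


def flag_new_vendors(prior_vendors: set, current_spend: dict, limit: float = 2000.0) -> list:
    """Return vendors absent from the prior period with spend above `limit`."""
    return [vendor for vendor, amount in current_spend.items()
            if vendor not in prior_vendors and amount > limit]
```

Categories that appear for the first time are deliberately left to the new-vendor style check, since a percentage increase from zero is undefined.
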
## Scaling Thresholds by Revenue

These benchmarks shift as you grow:

| Revenue Range | Target Gross Margin | Target People % | Target G&A % |
|---|---|---|---|
| <$1M | >50% | <70% | <20% |
| $1-3M | >55% | <65% | <18% |
| $3-10M | >60% | <60% | <15% |
| $10M+ | >65% | <55% | <12% |
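
Selecting the right row of this table for a given business can be expressed as a small lookup. The sketch below is hypothetical; the table does not say which band a boundary value such as exactly $1M belongs to, so this version assigns it to the higher band.

```python
# (exclusive upper revenue bound, min gross margin %, max people %, max G&A %)
SCALING_TARGETS = [
    (1_000_000, 50, 70, 20),
    (3_000_000, 55, 65, 18),
    (10_000_000, 60, 60, 15),
    (float("inf"), 65, 55, 12),
]


def targets_for(annual_revenue: float) -> dict:
    """Return the benchmark targets for the band containing `annual_revenue`."""
    for upper, gross_margin, people, ga in SCALING_TARGETS:
        if annual_revenue < upper:
            return {"gross_margin_min": gross_margin, "people_max": people, "ga_max": ga}
    raise ValueError("annual_revenue must be a finite number")


print(targets_for(2_000_000))
```
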
33
finance-ops/references/org-overhead.md
Normal file
@ -0,0 +1,33 @@
|
||||||
|
# Organizational Overhead

## Weekly Time Allocation (Typical Developer)

| Activity | Hours/Week | Notes |
|----------|-----------|-------|
| Pure coding time | 20-25 | Actual focused development |
| Daily standups | 1.25 | 15 min × 5 days |
| Weekly team sync | 1-2 | All-hands, team meetings |
| 1:1s with manager | 0.5-1 | Weekly or biweekly |
| Sprint planning/retro | 1-2 | Per week average |
| Code reviews (giving) | 2-3 | Reviewing teammates' work |
| Slack/email/async | 3-5 | Communication overhead |
| Context switching | 2-4 | Interruptions, task switching |
| Ad-hoc meetings | 1-2 | Unplanned discussions |
| Admin/HR/tooling | 1-2 | Timesheets, tools, access |

## Coding Efficiency by Company Type

| Company Type | Efficiency | Coding Hrs/Week |
|-------------|-----------|-----------------|
| Startup (lean) | 60-70% | 24-28 |
| Growth company | 50-60% | 20-24 |
| Enterprise | 40-50% | 16-20 |
| Large bureaucracy | 30-40% | 12-16 |

## Calendar Time Formula

```
Calendar Weeks = Raw Dev Hours ÷ (40 × Efficiency Factor)
```

Example: 3,288 raw hours at 50% efficiency = 3,288 ÷ 20 = 164 weeks (~3.2 years)
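
The calendar-time formula translates directly into code. This is a small sketch rather than part of the skill's scripts; the function name `calendar_weeks` and the 52-weeks-per-year conversion are assumptions made for illustration.

```python
def calendar_weeks(raw_dev_hours: float, efficiency: float,
                   hours_per_week: float = 40.0) -> float:
    """Calendar Weeks = Raw Dev Hours / (40 x Efficiency Factor)."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be a fraction in (0, 1]")
    return raw_dev_hours / (hours_per_week * efficiency)


weeks = calendar_weeks(3288, 0.50)
print(f"{weeks:.0f} weeks (~{weeks / 52:.1f} years)")  # 164 weeks (~3.2 years)
```

Passing efficiency as a fraction (0.50, not 50) keeps the function aligned with the "Efficiency Factor" term in the formula.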
120
finance-ops/references/output-template.md
Normal file
@ -0,0 +1,120 @@
# Output Template

Use this structure for the final estimate. Replace every bracketed placeholder (for example, [X] or [hours]) with a calculated value.

```markdown
## [Project Name] - Development Cost Estimate

Analysis Date: [Current Date]
Codebase Version: [From project status/README]

### Codebase Metrics

- Total Lines of Code: [number]
  - [Language 1]: [number] lines
  - [Language 2]: [number] lines
  - Tests: [number] lines
  - Documentation: [number] lines

- Complexity Factors:
  - Advanced frameworks: [list key ones]
  - System-level programming: [if applicable]
  - GPU programming: [if applicable]
  - Third-party integrations: [list]

### Development Time Estimate

Base Development Hours: [number] hours
- [Phase/Module 1]: [hours] hours
- [Phase/Module 2]: [hours] hours
- [Phase/Module 3]: [hours] hours

Overhead Multipliers:
- Architecture & Design: +[X]% ([hours] hours)
- Debugging & Troubleshooting: +[X]% ([hours] hours)
- Code Review & Refactoring: +[X]% ([hours] hours)
- Documentation: +[X]% ([hours] hours)
- Integration & Testing: +[X]% ([hours] hours)
- Learning Curve: +[X]% ([hours] hours)

Total Estimated Hours: [number] hours

### Realistic Calendar Time (with Organizational Overhead)

| Company Type | Efficiency | Coding Hrs/Week | Calendar Weeks | Calendar Time |
|--------------|------------|-----------------|----------------|---------------|
| Solo/Startup (lean) | 65% | 26 hrs | [X] weeks | ~[X] months |
| Growth Company | 55% | 22 hrs | [X] weeks | ~[X] years |
| Enterprise | 45% | 18 hrs | [X] weeks | ~[X] years |
| Large Bureaucracy | 35% | 14 hrs | [X] weeks | ~[X] years |

### Market Rate Research

Senior Developer Rates ([current year]):
- Low end: $[X]/hour (remote, mid-level market)
- Average: $[X]/hour (standard US market)
- High end: $[X]/hour (SF Bay Area, NYC, specialized)

Recommended Rate: $[X]/hour
*Rationale*: [Why this rate for this project's tech stack]

### Total Cost Estimate

| Scenario | Hourly Rate | Total Hours | Total Cost |
|----------|-------------|-------------|------------|
| Low-end | $[X] | [hours] | $[X,XXX] |
| Average | $[X] | [hours] | $[X,XXX] |
| High-end | $[X] | [hours] | $[X,XXX] |

Recommended Estimate (Engineering Only): $[X,XXX] - $[X,XXX]

### Full Team Cost (All Roles)

| Company Stage | Team Multiplier | Engineering Cost | Full Team Cost |
|---------------|-----------------|------------------|----------------|
| Solo/Founder | 1.0x | $[X] | $[X] |
| Lean Startup | 1.45x | $[X] | $[X] |
| Growth Company | 2.2x | $[X] | $[X] |
| Enterprise | 2.65x | $[X] | $[X] |

Role Breakdown (Growth Company Example):

| Role | Hours | Rate | Cost |
|------|-------|------|------|
| Engineering | [X] hrs | $[X]/hr | $[X] |
| Product Management | [X] hrs | $[X]/hr | $[X] |
| UX/UI Design | [X] hrs | $[X]/hr | $[X] |
| Engineering Management | [X] hrs | $[X]/hr | $[X] |
| QA/Testing | [X] hrs | $[X]/hr | $[X] |
| Project Management | [X] hrs | $[X]/hr | $[X] |
| Technical Writing | [X] hrs | $[X]/hr | $[X] |
| DevOps/Platform | [X] hrs | $[X]/hr | $[X] |
| TOTAL | [X] hrs | | $[X] |

### Grand Total Summary

| Metric | Solo | Lean Startup | Growth Co | Enterprise |
|--------|------|--------------|-----------|------------|
| Calendar Time | [X] | [X] | [X] | [X] |
| Total Human Hours | [X] | [X] | [X] | [X] |
| Total Cost | $[X] | $[X] | $[X] | $[X] |

### Assumptions

1. Rates based on US market averages ([year])
2. Full-time equivalent allocation for all roles
3. Includes complete implementation of [scope]
4. Does not include:
   - Marketing & sales
   - Legal & compliance
   - Office/equipment
   - Hosting/infrastructure
   - Ongoing maintenance post-launch

### Comparison: AI-Assisted Development

Estimated time savings with Claude Code: [X]%
Effective hourly rate with AI assistance: ~$[X]/hour equivalent productivity
```

Present the estimate professionally, in a form suitable for stakeholders. Include a confidence level (low/medium/high) and the key assumptions. Highlight the highest-complexity areas that drive cost.
81
finance-ops/references/quickbooks-formats.md
Normal file
@ -0,0 +1,81 @@
# QuickBooks Export Formats

## File Detection Heuristics

The analyzer identifies the report type by scanning the first 10 lines of a CSV export (or the sheet names and first 5 rows of an XLSX workbook) for signature patterns:

| Report Type | Detection Pattern |
|---|---|
| P&L Summary | "Profit and Loss" in header, two-column (account, total) |
| P&L by Customer | "Profit and Loss by Customer" in header, multi-column with customer names |
| P&L Detail | "Profit and Loss Detail" in sheet name or header; columns: Date, Transaction Type, Num, Name, Class, Memo, Split, Amount, Balance |
| Balance Sheet | "Balance Sheet" in header |
| Cash Flow Statement | "Statement of Cash Flows" in sheet name or header; monthly columns |
| General Ledger | "General Ledger" in header; columns include Date, Account, Debit, Credit |
| Expenses by Vendor | "Expenses by Vendor" in header |
| Transaction List by Vendor | "Transaction List by Vendor" in header |
| Bill Payments | "Bill Payment" in header or transaction types dominated by "Bill Payment" |
| Account List | "Account List" in header; columns include Account, Type, Balance |

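
Because "Profit and Loss" is a prefix of two other report titles, the order in which signatures are tested matters. The sketch below illustrates that point for text input only; `SIGNATURES` and `detect_report_type` are hypothetical names, and the returned labels mirror those used by `detect_file_type` in `scripts/cfo-analyzer.py`.

```python
# More specific titles come first so they are not shadowed by "profit and loss".
SIGNATURES = [
    ("profit and loss by customer", "pl_by_customer"),
    ("profit and loss detail", "pl_detail"),
    ("profit and loss", "pl_summary"),
    ("balance sheet", "balance_sheet"),
    ("statement of cash flows", "cash_flow"),
    ("general ledger", "general_ledger"),
    ("expenses by vendor", "expenses_by_vendor"),
    ("transaction list by vendor", "transactions_by_vendor"),
    ("bill payment", "bill_payments"),
    ("account list", "account_list"),
]


def detect_report_type(text: str, max_lines: int = 10):
    """Return a report label based on the first `max_lines` lines, or None."""
    head = "\n".join(text.splitlines()[:max_lines]).lower()
    for needle, report_type in SIGNATURES:
        if needle in head:
            return report_type
    return None


print(detect_report_type("Profit and Loss by Customer,\nYour Company Name,"))
print(detect_report_type("Profit and Loss,\nYour Company Name,"))
```

A list of pairs is used instead of a dictionary to make the match order explicit.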
## P&L Summary CSV Format

```
Profit and Loss,
Your Company Name,
"January, 2025-December, 2025",
<blank line>
Distribution account,Total
Income,
40000 Revenue,
40010 Consulting Revenue,"250,000.00"
...
Total for 40000 Revenue,"$2,000,000.00"
Total for Income,"$2,000,000.00"
Cost of Goods Sold,
...
Total for Cost of Goods Sold,"$900,000.00"
Gross Profit,"$1,100,000.00"
Expenses,
...
Total for Expenses,"$850,000.00"
Net Operating Income,"$250,000.00"
Other Income,
...
Net Income,"$200,000.00"
```

Key parsing rules:
- Dollar values may carry a `$` prefix, thousands separators, and surrounding quotes; negatives appear either in parentheses or with a leading `-`
- "Total for X" lines aggregate their parent category
- Indentation (spaces) indicates hierarchy depth
- Account numbers (5 digits) prefix account names

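
These rules can be exercised with the standard `csv` module. The sketch below is illustrative only: `to_amount` and `parse_rows` are hypothetical helpers, and the leading spaces in `SAMPLE` are an assumption about how an export encodes hierarchy, since the listing above shows no indentation.

```python
import csv
import io
import re

SAMPLE = '''Profit and Loss,
Your Company Name,
"January, 2025-December, 2025",

Distribution account,Total
Income,
   40000 Revenue,
      40010 Consulting Revenue,"250,000.00"
   Total for 40000 Revenue,"$2,000,000.00"
Total for Income,"$2,000,000.00"
'''


def to_amount(raw):
    """Parse a QuickBooks amount; return None if the cell is not numeric."""
    s = raw.strip().replace("$", "").replace(",", "")
    negative = s.startswith("(") and s.endswith(")")
    s = s.strip("()")
    try:
        value = float(s)
    except ValueError:
        return None
    return -value if negative else value


def parse_rows(text):
    """Yield one dict per row that carries an amount in its last column."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) < 2 or not fields[0].strip():
            continue
        amount = to_amount(fields[-1])
        if amount is None:
            continue  # header rows and section labels have no amount
        label = fields[0]
        name = re.sub(r"^\d{5}\s+", "", label.strip())  # drop 5-digit account number
        rows.append({
            "name": name,
            "depth": len(label) - len(label.lstrip(" ")),  # leading spaces
            "amount": amount,
            "is_total": name.startswith("Total for"),
        })
    return rows


for row in parse_rows(SAMPLE):
    print(row)
```

Using `csv.reader` rather than `str.split(",")` keeps quoted values such as `"250,000.00"` intact as a single field.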
## P&L by Customer CSV Format

Same structure as P&L Summary but with customer names as column headers. The last column is "Total". Revenue and COGS are broken down per customer.

## P&L Detail XLSX Format

Columns: (blank), Date, Transaction Type, Num, Name, Class, Memo/Description, Split, Amount, Balance

- Hierarchical account names in column A
- Transaction rows have Date populated
- Subtotal rows show "Total for [Account]"

## Cash Flow Statement XLSX Format

- Monthly columns (e.g., "Feb 12-28, 2025", "Mar 2025", ..., "Total")
- Sections: OPERATING ACTIVITIES, INVESTING ACTIVITIES, FINANCING ACTIVITIES
- Values are stored as Excel formulas (e.g., `=-236705.50`): either load with `data_only=False` and parse the formula strings, or load with `data_only=True` to read cached values.

Note: openpyxl with `data_only=True` may return None for formula cells unless the file was last saved by Excel. Parse formula strings as a fallback: strip the `=` prefix, then evaluate simple numeric expressions.

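
One way to "evaluate simple numeric expressions" without calling `eval` is to walk the expression with the standard `ast` module and allow only numbers and basic arithmetic. This is a sketch under that assumption; `eval_formula` is a hypothetical helper, and anything it does not recognize (cell references, functions such as `SUM`) yields None instead of a guess.

```python
import ast
import operator

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_formula(cell):
    """Evaluate a cell such as '=-236705.50' or '=1000+250.5'; None if unsupported."""
    if cell is None:
        return None
    if isinstance(cell, (int, float)):
        return float(cell)
    text = str(cell).strip()
    if text.startswith("="):
        text = text[1:]

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return float(node.value)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.USub, ast.UAdd)):
            value = walk(node.operand)
            return -value if isinstance(node.op, ast.USub) else value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")

    try:
        return walk(ast.parse(text, mode="eval"))
    except (SyntaxError, ValueError, ZeroDivisionError):
        return None


print(eval_formula("=-236705.50"))   # -236705.5
print(eval_formula("=1000+250.5"))   # 1250.5
print(eval_formula("=SUM(A1:A3)"))   # None
```

Returning None rather than 0.0 for unsupported formulas lets the caller distinguish "no value" from a genuine zero.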
## Common Parsing Pitfalls

1. **Comma-separated thousands in quotes**: `"1,250,000.00"` (strip commas before float conversion)
2. **Dollar signs**: `"$2,000,000.00"` (strip `$`)
3. **Negative values**: Both `-50,000.00` and `($50,000.00)` formats
4. **Empty rows**: QB exports include blank lines between sections
5. **Header rows**: First 3-5 rows are company name, report name, date range
6. **"(deleted)" in names**: Customer/vendor names may include "(deleted)"
7. **Formula cells in XLSX**: May need formula string parsing as fallback
29
finance-ops/references/rates.md
Normal file
@ -0,0 +1,29 @@
# Hourly Productivity Rates by Code Type

## Lines Per Hour (Senior Developer, 5+ years)

| Code Type | Lines/Hour | Notes |
|-----------|-----------|-------|
| Simple CRUD/UI | 30-50 | Forms, basic views, standard patterns |
| Complex business logic | 20-30 | State machines, workflows, validations |
| GPU/Metal/shader programming | 10-20 | Metal, CUDA, OpenGL, compute shaders |
| Native C/C++ interop | 10-20 | FFI, bridging headers, memory management |
| Video/audio processing | 10-15 | AVFoundation, CoreMedia, codecs |
| System extensions/plugins | 8-12 | Kernel extensions, DAL plugins, drivers |
| Comprehensive tests | 25-40 | Unit, integration, snapshot tests |
| Infrastructure/DevOps | 15-25 | CI/CD, Docker, Terraform, scripts |
| API integrations | 20-30 | REST, GraphQL, WebSocket clients |
| Data processing/ETL | 15-25 | Parsers, transformers, pipelines |

## Overhead Multipliers

| Activity | % of Coding Time |
|----------|-----------------|
| Architecture & design | +15-20% |
| Debugging & troubleshooting | +25-30% |
| Code review & refactoring | +10-15% |
| Documentation | +10-15% |
| Integration & testing | +20-25% |
| Learning curve (new frameworks) | +10-20% |

Total effort including overhead is typically 1.9x-2.25x raw coding hours (the overhead rows sum to +90% at the low end and +125% at the high end).
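
The two tables combine into an hours range as sketched below. This is an illustration, not the skill's actual estimator: `estimate_hours` is a hypothetical helper, only three code types are included, and pairing the fastest pace with the lowest overhead (and the slowest with the highest) is an assumption used to bound the range.

```python
# Subset of the Lines/Hour table: (low, high) lines per hour.
LINES_PER_HOUR = {
    "crud_ui": (30, 50),
    "business_logic": (20, 30),
    "tests": (25, 40),
}

# Overhead table as (low, high) fractions of coding time.
OVERHEAD = {
    "architecture": (0.15, 0.20),
    "debugging": (0.25, 0.30),
    "review": (0.10, 0.15),
    "documentation": (0.10, 0.15),
    "integration_testing": (0.20, 0.25),
    "learning_curve": (0.10, 0.20),
}


def estimate_hours(lines_by_type):
    """Return (low, high) total hours including overhead."""
    fastest = sum(n / LINES_PER_HOUR[t][1] for t, n in lines_by_type.items())
    slowest = sum(n / LINES_PER_HOUR[t][0] for t, n in lines_by_type.items())
    low_multiplier = 1 + sum(lo for lo, _ in OVERHEAD.values())    # about 1.9
    high_multiplier = 1 + sum(hi for _, hi in OVERHEAD.values())   # about 2.25
    return fastest * low_multiplier, slowest * high_multiplier


low, high = estimate_hours({"crud_ui": 3_000, "business_logic": 1_200})
print(f"{low:.0f}-{high:.0f} hours")  # 190-360 hours
```

Here 3,000 CRUD lines and 1,200 business-logic lines give 100-160 raw coding hours, which the overhead multipliers stretch to roughly 190-360 hours.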
35
finance-ops/references/team-cost.md
Normal file
@ -0,0 +1,35 @@
# Full Team Cost Calculation

## Supporting Role Ratios

| Role | Ratio to Eng Hours | Typical Rate | Notes |
|------|-------------------|--------------|-------|
| Product Management | 0.25-0.40x | $125-200/hr | PRDs, roadmap, stakeholder mgmt |
| UX/UI Design | 0.20-0.35x | $100-175/hr | Wireframes, mockups, design systems |
| Engineering Management | 0.12-0.20x | $150-225/hr | 1:1s, hiring, performance, strategy |
| QA/Testing | 0.15-0.25x | $75-125/hr | Test plans, manual testing, automation |
| Project/Program Management | 0.08-0.15x | $100-150/hr | Schedules, dependencies, status |
| Technical Writing | 0.05-0.10x | $75-125/hr | User docs, API docs, internal docs |
| DevOps/Platform | 0.10-0.20x | $125-200/hr | CI/CD, infra, deployments |

## Team Composition by Company Stage

| Stage | PM | Design | EM | QA | PgM | Docs | DevOps |
|-------|-----|--------|-----|-----|------|------|--------|
| Solo/Founder | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Lean Startup | 15% | 15% | 5% | 5% | 0% | 0% | 5% |
| Growth Company | 30% | 25% | 15% | 20% | 10% | 5% | 15% |
| Enterprise | 40% | 35% | 20% | 25% | 15% | 10% | 20% |

## Full Team Multiplier

| Stage | Multiplier |
|-------|-----------|
| Solo/Founder | 1.0x (just engineering) |
| Lean Startup | ~1.45x engineering cost |
| Growth Company | ~2.2x engineering cost |
| Enterprise | ~2.65x engineering cost |

Formula: `Full Team Cost = Engineering Cost × Team Multiplier`

Example: $500K engineering cost at Growth Company = $500K × 2.2 = $1.1M total team cost
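
The multipliers in the last table match 1 plus the sum of each stage's composition percentages (for example, Growth Company: 30 + 25 + 15 + 20 + 10 + 5 + 15 = 120%, giving 2.2x). The sketch below relies on that reading, which the tables imply but do not state outright; `team_multiplier` and `full_team_cost` are hypothetical names.

```python
# Composition by stage, in percent: PM, Design, EM, QA, PgM, Docs, DevOps.
TEAM_COMPOSITION = {
    "Solo/Founder": [0, 0, 0, 0, 0, 0, 0],
    "Lean Startup": [15, 15, 5, 5, 0, 0, 5],
    "Growth Company": [30, 25, 15, 20, 10, 5, 15],
    "Enterprise": [40, 35, 20, 25, 15, 10, 20],
}


def team_multiplier(stage: str) -> float:
    """Assumed relationship: 1 + sum of supporting-role percentages."""
    return 1 + sum(TEAM_COMPOSITION[stage]) / 100


def full_team_cost(engineering_cost: float, stage: str) -> float:
    """Full Team Cost = Engineering Cost x Team Multiplier."""
    return engineering_cost * team_multiplier(stage)


for stage in TEAM_COMPOSITION:
    print(f"{stage}: {team_multiplier(stage):.2f}x")

print(f"${full_team_cost(500_000, 'Growth Company'):,.0f}")  # $1,100,000
```

Deriving the multiplier from the composition table keeps the two in sync if the percentages are tuned for a particular organization.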
3
finance-ops/requirements.txt
Normal file
@ -0,0 +1,3 @@
# CFO Briefing Analyzer
pandas>=2.0.0
openpyxl>=3.1.0
821
finance-ops/scripts/cfo-analyzer.py
Normal file
@ -0,0 +1,821 @@
#!/usr/bin/env python3
"""
CFO Briefing Analyzer: parse QuickBooks exports and generate executive financial summaries.
Outputs formatted briefing to stdout. Stores history for MoM comparison.

Usage:
    python3 cfo-analyzer.py --input ./data/uploads/
    python3 cfo-analyzer.py --input ./data/uploads/ --period 2025-01
    python3 cfo-analyzer.py --input ./data/uploads/ --no-history
"""

import argparse
import json
import os
import re
import sys
from datetime import datetime
from pathlib import Path
from typing import Any

import pandas as pd

# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------

def parse_dollar(val: Any) -> float:
    """Convert QB dollar string to float. Handles $, commas, parens for negatives, formula strings."""
    if val is None:
        return 0.0
    if isinstance(val, (int, float)):
        return float(val)
    s = str(val).strip()
    if not s or s == '-' or s.lower() == 'none':
        return 0.0
    # Handle Excel formula strings like =-236705.50
    if s.startswith('='):
        s = s[1:]
        try:
            return float(s)
        except ValueError:
            return 0.0
    neg = False
    if s.startswith('(') and s.endswith(')'):
        neg = True
        s = s[1:-1]
    s = s.replace('$', '').replace(',', '').replace('"', '').strip()
    if not s:
        return 0.0
    try:
        v = float(s)
        return -v if neg else v
    except ValueError:
        return 0.0


def status_emoji(value: float, green_range: tuple, yellow_range: tuple) -> str:
    """Return 🟢/🟡/🔴 based on thresholds. green_range/yellow_range are (min, max) inclusive."""
    gmin, gmax = green_range
    ymin, ymax = yellow_range
    if gmin <= value <= gmax:
        return "🟢"
    if ymin <= value <= ymax:
        return "🟡"
    return "🔴"


def pct(part: float, whole: float) -> float:
    """Return part as a percentage of whole, or 0.0 when whole is zero."""
    return (part / whole * 100) if whole else 0.0


def fmt_k(val: float) -> str:
    """Format as $XXK or $X.XXM."""
    if abs(val) >= 1_000_000:
        return f"${val/1_000_000:.2f}M"
    return f"${val/1_000:.0f}K"


def fmt_pct(val: float) -> str:
    """Format a percentage with one decimal place."""
    return f"{val:.1f}%"


# ---------------------------------------------------------------------------
# File detection & parsing
# ---------------------------------------------------------------------------

def detect_file_type(filepath: Path) -> str | None:
    """Detect QB report type from file content."""
    ext = filepath.suffix.lower()

    if ext in ('.xlsx', '.xls'):
        # openpyxl does not read legacy .xls files; the load then raises,
        # the exception is swallowed below, and the file is reported as unknown.
        try:
            import openpyxl
            wb = openpyxl.load_workbook(str(filepath), data_only=False)
            # First pass: sheet names (specific titles before generic ones)
            for name in wb.sheetnames:
                nl = name.lower()
                if 'cash flow' in nl or 'statement of cash' in nl:
                    return 'cash_flow'
                if 'profit and loss detail' in nl or 'p&l detail' in nl:
                    return 'pl_detail'
                if 'profit and loss' in nl or 'p&l' in nl:
                    return 'pl_summary'
                if 'balance sheet' in nl:
                    return 'balance_sheet'
                if 'general ledger' in nl:
                    return 'general_ledger'
            # Second pass: first five rows of the active sheet
            ws = wb.active
            for row in ws.iter_rows(max_row=5, values_only=True):
                text = ' '.join(str(c).lower() for c in row if c)
                if 'statement of cash flow' in text:
                    return 'cash_flow'
                if 'profit and loss detail' in text:
                    return 'pl_detail'
                if 'profit and loss' in text:
                    return 'pl_summary'
                if 'balance sheet' in text:
                    return 'balance_sheet'
        except Exception:
            pass
        return None

    if ext != '.csv':
        return None

    try:
        with open(filepath, 'r', encoding='utf-8', errors='replace') as f:
            head = ''.join(f.readline() for _ in range(10)).lower()
    except Exception:
        return None

    if 'profit and loss by customer' in head or 'profit and loss by job' in head:
        return 'pl_by_customer'
    if 'profit and loss' in head:
        return 'pl_summary'
    if 'balance sheet' in head:
        return 'balance_sheet'
    if 'general ledger' in head:
        return 'general_ledger'
    if 'expenses by vendor' in head:
        return 'expenses_by_vendor'
    if 'transaction list by vendor' in head:
        return 'transactions_by_vendor'
    if 'bill payment' in head:
        return 'bill_payments'
    if 'account list' in head:
        return 'account_list'
    return None


def detect_period(filepath: Path) -> str | None:
    """Try to extract period from file header."""
    ext = filepath.suffix.lower()
    text = ''
    if ext == '.csv':
        try:
            with open(filepath, 'r', encoding='utf-8', errors='replace') as f:
                text = ''.join(f.readline() for _ in range(5))
        except Exception:
            pass
    elif ext in ('.xlsx', '.xls'):
        try:
            import openpyxl
            wb = openpyxl.load_workbook(str(filepath), data_only=False)
            ws = wb.active
            for row in ws.iter_rows(max_row=5, values_only=True):
                text += ' '.join(str(c) for c in row if c) + '\n'
        except Exception:
            pass

    # Look for patterns like "February, 2025-January, 2026" or "March 2025"
    m = re.search(r'(\w+),?\s*(\d{4})\s*[-–]\s*(\w+),?\s*(\d{4})', text)
    if m:
        end_month, end_year = m.group(3), m.group(4)
        try:
            dt = datetime.strptime(f"{end_month} {end_year}", "%B %Y")
            return dt.strftime("%Y-%m")
        except ValueError:
            pass

    m = re.search(r'(\w+ \d{4})', text)
    if m:
        try:
            dt = datetime.strptime(m.group(1), "%B %Y")
            return dt.strftime("%Y-%m")
        except ValueError:
            pass
    return None


# ---------------------------------------------------------------------------
# P&L Summary parser
# ---------------------------------------------------------------------------

def parse_pl_summary(filepath: Path) -> dict:
    """Parse P&L Summary CSV into structured data."""
    results: dict[str, Any] = {
        'revenue': {},
        'cogs': {},
        'expenses': {},
        'other_income': {},
        'other_expenses': {},
        'totals': {}
    }

    with open(filepath, 'r', encoding='utf-8', errors='replace') as f:
        lines = f.readlines()

    section = None

    for line in lines:
        line = line.strip()
        if not line:
            continue

        # Split on every comma that is not inside quotes
        parts = []
        in_quote = False
        current = ''
        for ch in line:
            if ch == '"':
                in_quote = not in_quote
            elif ch == ',' and not in_quote:
                parts.append(current.strip().strip('"'))
                current = ''
                continue
            current += ch
        parts.append(current.strip().strip('"'))

        account = parts[0] if parts else ''
        value = parse_dollar(parts[-1]) if len(parts) > 1 else 0.0

        # Track sections
        if account == 'Income':
            section = 'revenue'
            continue
        elif account == 'Cost of Goods Sold':
            section = 'cogs'
            continue
        elif account == 'Expenses':
            section = 'expenses'
            continue
        elif account == 'Other Income':
            section = 'other_income'
            continue
        elif account == 'Other Expenses':
            section = 'other_expenses'
            continue

        # Capture totals
        if account.startswith('Total for Income'):
            results['totals']['total_income'] = value
            section = None
        elif account.startswith('Total for Cost of Goods Sold'):
            results['totals']['total_cogs'] = value
        elif account.startswith('Gross Profit'):
            results['totals']['gross_profit'] = value
        elif account.startswith('Total for Expenses'):
            results['totals']['total_expenses'] = value
        elif account == 'Net Operating Income':
            results['totals']['net_operating_income'] = value
        elif account.startswith('Total for Other Income'):
            results['totals']['total_other_income'] = value
        elif account.startswith('Total for Other Expenses'):
            results['totals']['total_other_expenses'] = value
        elif account == 'Net Income':
            results['totals']['net_income'] = value
        elif account.startswith('Net Other Income'):
            results['totals']['net_other_income'] = value
        elif section and value != 0.0 and not account.startswith('Total for'):
            # Store individual line items (5-digit account number stripped)
            clean_name = re.sub(r'^\d{5}\s+', '', account).strip()
            if section in results:
                results[section][clean_name] = value

    return results


# ---------------------------------------------------------------------------
# P&L by Customer parser
# ---------------------------------------------------------------------------

def parse_pl_by_customer(filepath: Path) -> dict:
    """Parse P&L by Customer CSV. Returns revenue by customer."""
    import csv  # imported once here instead of inside each loop

    customers: dict[str, float] = {}

    with open(filepath, 'r', encoding='utf-8', errors='replace') as f:
        lines = f.readlines()

    # Find header row with customer names
    header_idx = None
    headers = []
    for i, line in enumerate(lines):
        if 'Distribution account' in line:
            header_idx = i
            headers = next(csv.reader([line]))
            break

    if not headers:
        return {'customers': customers}

    # Find Total for Income row. The first matching row wins, so a
    # "Total for ... Revenue" subtotal that appears earlier is used instead.
    for line in lines[header_idx+1:]:
        if 'Total for Income' in line or ('Total for' in line and 'Revenue' in line):
            values = next(csv.reader([line]))
            for j, val in enumerate(values):
                if j > 0 and j < len(headers) and headers[j] and headers[j] != 'Total':
                    v = parse_dollar(val)
                    if v > 0:
                        name = headers[j].replace(' (deleted)', '').strip()
                        customers[name] = v
            break

    return {'customers': customers}


# ---------------------------------------------------------------------------
# Cash Flow XLSX parser
# ---------------------------------------------------------------------------

def parse_cash_flow(filepath: Path) -> dict:
    """Parse Cash Flow Statement XLSX."""
    import openpyxl
    wb = openpyxl.load_workbook(str(filepath), data_only=False)
    ws = wb.active

    rows = list(ws.iter_rows(values_only=True))
    result: dict[str, Any] = {'monthly_net_income': {}, 'monthly_net_cash': {}}

    # Find header row with month columns
    headers = []
    header_row_idx = None
    for i, row in enumerate(rows):
        vals = [str(c) for c in row if c]
        for v in vals:
            if re.search(r'(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{4}', str(v)):
                headers = list(row)
                header_row_idx = i
                break
        if headers:
            break

    if not headers:
        return result

    for row in rows[header_row_idx+1:]:
        label = str(row[0]).strip() if row[0] else ''
        if 'Net Income' in label and 'reconcile' not in label.lower():
            for j, val in enumerate(row[1:], 1):
                if j < len(headers) and headers[j]:
                    month_str = str(headers[j])
                    v = parse_dollar(val)
                    result['monthly_net_income'][month_str] = v
        if 'net cash' in label.lower() or 'net change in cash' in label.lower():
            for j, val in enumerate(row[1:], 1):
                if j < len(headers) and headers[j]:
                    month_str = str(headers[j])
                    v = parse_dollar(val)
                    result['monthly_net_cash'][month_str] = v

    return result


# ---------------------------------------------------------------------------
# KPI computation
# ---------------------------------------------------------------------------

def compute_kpis(pl_data: dict, customer_data: dict | None, cash_flow_data: dict | None) -> dict:
    """Compute all KPIs from parsed data."""
    kpis: dict[str, Any] = {}
    totals = pl_data.get('totals', {})
    revenue_items = pl_data.get('revenue', {})
    cogs_items = pl_data.get('cogs', {})
    expense_items = pl_data.get('expenses', {})
    other_exp = pl_data.get('other_expenses', {})
    other_inc = pl_data.get('other_income', {})

    # --- Revenue ---
    total_revenue = totals.get('total_income', 0)
    kpis['total_revenue'] = total_revenue

    # Revenue by service line
    service_revenue = {}
    for k, v in revenue_items.items():
        if v != 0:
            service_revenue[k] = v
    kpis['revenue_by_service'] = dict(sorted(service_revenue.items(), key=lambda x: -x[1]))

    # --- COGS & Gross Margin ---
    total_cogs = totals.get('total_cogs', 0)
    gross_profit = totals.get('gross_profit', total_revenue - total_cogs)
    kpis['total_cogs'] = total_cogs
    kpis['gross_profit'] = gross_profit
    kpis['gross_margin_pct'] = pct(gross_profit, total_revenue)

    # --- Net Income ---
    kpis['net_income'] = totals.get('net_income', 0)
    kpis['net_operating_income'] = totals.get('net_operating_income', 0)
    kpis['net_margin_pct'] = pct(kpis['net_income'], total_revenue)

    # --- People Costs ---
    salary_total = 0
    contractor_total = 0
    payroll_tax_total = 0
    benefits_total = 0

    all_items = {}
    all_items.update(cogs_items)
    all_items.update(expense_items)

    for k, v in all_items.items():
        kl = k.lower()
        if 'salari' in kl or 'wages' in kl or 'payroll -' in kl:
            salary_total += v
        elif 'contractor' in kl:
            contractor_total += v
        elif 'payroll tax' in kl:
            payroll_tax_total += v
        elif 'benefit' in kl:
            benefits_total += v
        elif 'commission' in kl:
            # Commissions are counted together with contractor costs
            contractor_total += v

    people_total = salary_total + contractor_total + payroll_tax_total + benefits_total
    kpis['people_costs'] = {
        'total': people_total,
        'salaries': salary_total,
        'contractors': contractor_total,
        'payroll_taxes': payroll_tax_total,
        'benefits': benefits_total,
        'pct_of_revenue': pct(people_total, total_revenue),
        'contractor_pct': pct(contractor_total, people_total) if people_total else 0
    }

    # --- Tool/Subscription Costs ---
    tool_total = 0
    tool_items = {}
    for k, v in all_items.items():
        kl = k.lower()
        if any(w in kl for w in ['subscription', 'tools', 'hosting', 'it hosting', 'software']):
            tool_total += v
            tool_items[k] = v

    kpis['tool_costs'] = {
        'total': tool_total,
        'pct_of_revenue': pct(tool_total, total_revenue),
        'items': dict(sorted(tool_items.items(), key=lambda x: -x[1]))
    }

    # --- Expense Categories ---
    total_expenses = totals.get('total_expenses', 0)
    kpis['total_opex'] = total_expenses

    categories = {
        'Sales & Growth': 0,
        'Marketing & Branding': 0,
        'G&A': 0,
        'Facilities': 0,
        'Other Expenses': 0,
    }

    for k, v in expense_items.items():
        kl = k.lower()
        if any(w in kl for w in ['sales', 'commission', 'growth']):
            categories['Sales & Growth'] += v
        elif any(w in kl for w in ['marketing', 'advertising', 'writing', 'content', 'podcast', 'branding', 'networking']):
            categories['Marketing & Branding'] += v
        elif any(w in kl for w in ['rent', 'facilit']):
            categories['Facilities'] += v
        else:
            categories['G&A'] += v

    kpis['expense_categories'] = {k: {'amount': v, 'pct_of_revenue': pct(v, total_revenue)} for k, v in categories.items() if v}

    # --- Interest & Debt ---
    interest = other_exp.get('Interest Expense', 0)
    amortization = other_exp.get('Amortization Expense', 0)
    kpis['interest_expense'] = interest
    kpis['amortization'] = amortization
    kpis['interest_pct_of_revenue'] = pct(interest, total_revenue)

    # --- Other Income ---
    kpis['other_income'] = {k: v for k, v in other_inc.items() if v}
    kpis['total_other_income'] = totals.get('total_other_income', 0)

    # --- Notable Expense Items (anomaly detection) ---
    notable = {}
    for k, v in all_items.items():
        kl = k.lower()
        if v > 5000 and not any(w in kl for w in ['salari', 'wages', 'payroll', 'rent', 'amortization']):
            notable[k] = v
    kpis['notable_expenses'] = dict(sorted(notable.items(), key=lambda x: -x[1])[:15])

    # --- Recruiting ---
    recruiting = 0
    for k, v in all_items.items():
        if 'recruit' in k.lower():
            recruiting += v
    kpis['recruiting_spend'] = recruiting

    # --- Owner Expenses ---
    owner_total = 0
    for k, v in all_items.items():
        if 'owner' in k.lower() or k.startswith('OE -'):
            owner_total += v
    kpis['owner_expenses'] = owner_total

    # --- Customer Data ---
    if customer_data and customer_data.get('customers'):
        custs = customer_data['customers']
        sorted_custs = sorted(custs.items(), key=lambda x: -x[1])
        kpis['top_customers'] = sorted_custs[:10]
        kpis['customer_count'] = len([c for c in custs.values() if c > 0])
        if sorted_custs and total_revenue:
            top_pct = sorted_custs[0][1] / total_revenue * 100
            kpis['top_customer_concentration'] = (sorted_custs[0][0], top_pct)

    # --- Cash Flow Data ---
    if cash_flow_data:
        monthly_ni = cash_flow_data.get('monthly_net_income', {})
        if monthly_ni:
            vals = list(monthly_ni.values())
            kpis['monthly_net_income'] = monthly_ni
            if len(vals) >= 3:
                kpis['last_3mo_avg_ni'] = sum(vals[-3:]) / 3
            # Note: this is the single worst month's loss, not an average burn rate
            kpis['monthly_burn'] = abs(min(vals)) if any(v < 0 for v in vals) else 0

    return kpis


# ---------------------------------------------------------------------------
# MoM comparison
# ---------------------------------------------------------------------------

def load_prior_period(history_dir: Path, current_period: str) -> dict | None:
    """Load most recent prior period from history."""
    if not history_dir.exists():
        return None

    files = sorted(history_dir.glob('*.json'), reverse=True)
    for f in files:
        if f.stem != current_period:
            try:
                with open(f) as fh:
                    return json.load(fh)
            except Exception:
                continue
    return None


def compute_variance(current: float, prior: float) -> str:
    """Format MoM variance."""
    if prior == 0:
        return "N/A"
    change = ((current - prior) / abs(prior)) * 100
    arrow = "↑" if change > 0 else "↓" if change < 0 else "→"
    return f"{arrow} {abs(change):.1f}%"


# ---------------------------------------------------------------------------
# Output formatting
# ---------------------------------------------------------------------------

def format_briefing(kpis: dict, prior: dict | None, period: str) -> str:
    """Format KPIs into executive briefing with status indicators."""
    lines = []
    lines.append(f"*📊 CFO Briefing — {period}*")
    lines.append("=" * 40)

    rev = kpis['total_revenue']

    # --- Revenue ---
    lines.append("")
    lines.append("*💰 Revenue*")
    lines.append(f"• Total Revenue: *{fmt_k(rev)}*" + (f" ({compute_variance(rev, prior.get('total_revenue', 0))} MoM)" if prior else ""))

    if kpis.get('revenue_by_service'):
        # Show the five largest service lines
        top_services = list(kpis['revenue_by_service'].items())[:5]
        for name, val in top_services:
            lines.append(f" · {name}: {fmt_k(val)} ({fmt_pct(pct(val, rev))})")

    # --- Profitability ---
    gm = kpis['gross_margin_pct']
    nm = kpis['net_margin_pct']
    gm_status = status_emoji(gm, (60, 999), (45, 60))
    nm_status = status_emoji(nm, (10, 999), (0, 10))

    lines.append("")
    lines.append("*📈 Profitability*")
    lines.append(f"• {gm_status} Gross Margin: *{fmt_pct(gm)}* (Gross Profit: {fmt_k(kpis['gross_profit'])})")
    lines.append(f"• {nm_status} Net Income: *{fmt_k(kpis['net_income'])}* ({fmt_pct(nm)} margin)")
    lines.append(f"• Net Operating Income: {fmt_k(kpis['net_operating_income'])}")

    if prior:
        lines.append(f" MoM: Revenue {compute_variance(rev, prior.get('total_revenue', 0))}, "
                     f"Net Income {compute_variance(kpis['net_income'], prior.get('net_income', 0))}")

    # --- People Costs ---
    pc = kpis['people_costs']
    pc_status = status_emoji(pc['pct_of_revenue'], (0, 65), (65, 75))
    contr_status = status_emoji(pc['contractor_pct'], (0, 30), (30, 50))

    lines.append("")
    lines.append("*👥 People Costs*")
    lines.append(f"• {pc_status} Total: *{fmt_k(pc['total'])}* ({fmt_pct(pc['pct_of_revenue'])} of revenue)")
    lines.append(f" · Salaries: {fmt_k(pc['salaries'])}")
    lines.append(f" · {contr_status} Contractors: {fmt_k(pc['contractors'])} ({fmt_pct(pc['contractor_pct'])} of people costs)")
    lines.append(f" · Payroll Taxes: {fmt_k(pc['payroll_taxes'])}")
    lines.append(f" · Benefits: {fmt_k(pc['benefits'])}")

    # --- Tools & Subscriptions ---
    tc = kpis['tool_costs']
    tc_status = status_emoji(tc['pct_of_revenue'], (0, 8), (8, 12))

    lines.append("")
    lines.append("*🔧 Tools & Subscriptions*")
    lines.append(f"• {tc_status} Total: *{fmt_k(tc['total'])}* ({fmt_pct(tc['pct_of_revenue'])} of revenue)")
    if tc['items']:
        for name, val in list(tc['items'].items())[:5]:
            lines.append(f" · {name}: {fmt_k(val)}")

    # --- Expense Categories ---
    if kpis.get('expense_categories'):
        lines.append("")
        lines.append("*📋 Operating Expenses*")
        lines.append(f"• Total OpEx: *{fmt_k(kpis['total_opex'])}* ({fmt_pct(pct(kpis['total_opex'], rev))} of revenue)")
        for cat, data in kpis['expense_categories'].items():
            lines.append(f" · {cat}: {fmt_k(data['amount'])} ({fmt_pct(data['pct_of_revenue'])})")

    # --- Interest & Debt ---
    int_status = status_emoji(kpis['interest_pct_of_revenue'], (0, 3), (3, 5))
    lines.append("")
    lines.append("*🏦 Debt & Interest*")
    lines.append(f"• {int_status} Interest Expense: *{fmt_k(kpis['interest_expense'])}* ({fmt_pct(kpis['interest_pct_of_revenue'])} of revenue)")
    lines.append(f"• Amortization (non-cash): {fmt_k(kpis['amortization'])}")

    # --- Notable Items ---
    if kpis.get('recruiting_spend'):
        lines.append(f"• Recruiting: {fmt_k(kpis['recruiting_spend'])}")

    if kpis.get('owner_expenses'):
        lines.append(f"• Owner Expenses: {fmt_k(kpis['owner_expenses'])}")

    # --- Customer Concentration ---
    if kpis.get('top_customers'):
        lines.append("")
        lines.append("*🏢 Top Customers*")
        for name, val in kpis['top_customers'][:5]:
            conc = pct(val, rev)
            conc_flag = " ⚠️" if conc > 15 else ""
            lines.append(f" · {name}: {fmt_k(val)} ({fmt_pct(conc)}){conc_flag}")
        if kpis.get('top_customer_concentration'):
            cname, cpct = kpis['top_customer_concentration']
            conc_status = status_emoji(100 - cpct, (75, 100), (60, 75))
            lines.append(f"• {conc_status} Top client concentration: {cname} at {fmt_pct(cpct)}")
        if kpis.get('customer_count'):
            lines.append(f"• Active customers: {kpis['customer_count']}")

    # --- Other Income ---
    if kpis.get('other_income'):
        lines.append("")
        lines.append("*📥 Other Income*")
        for k, v in kpis['other_income'].items():
            lines.append(f" · {k}: {fmt_k(v)}")

    # --- Monthly Trends ---
    if kpis.get('monthly_net_income'):
        lines.append("")
        lines.append("*📉 Monthly Net Income Trend*")
        for month, val in kpis['monthly_net_income'].items():
            ml = month.lower()
            if 'total' in ml:
                continue
            if val == 0.0:
                continue
            indicator = "✅" if val > 0 else "❌"
            lines.append(f" · {month}: {fmt_k(val)} {indicator}")

    # --- Alerts ---
    alerts = []
    if kpis['gross_margin_pct'] < 45:
        alerts.append("🔴 Gross margin critically low (<45%). Target: 60%+")
    if pc['pct_of_revenue'] > 75:
        alerts.append("🔴 People costs >75% of revenue. Target: 55-65%")
    if pc['contractor_pct'] > 50:
        alerts.append("🟡 Contractor spend >50% of people costs — dependency risk")
    if kpis['interest_pct_of_revenue'] > 5:
        alerts.append("🔴 Interest >5% of revenue — debt load is heavy")
    if kpis.get('recruiting_spend', 0) > 80000:
        alerts.append("🟡 Recruiting spend elevated — review if active hires justify")
    if kpis.get('owner_expenses', 0) > 100000:
        alerts.append("🟡 Owner expenses >$100K TTM")
    if kpis['net_income'] < 0:
        deficit = abs(kpis['net_income'])
        alerts.append(f"🔴 Operating at a loss: {fmt_k(deficit)} deficit")

    if alerts:
        lines.append("")
        lines.append("*⚠️ Alerts*")
        for a in alerts:
            lines.append(f"• {a}")

    lines.append("")
    lines.append("_Data from QuickBooks exports. Review with your finance team before acting._")

    return '\n'.join(lines)


# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------

def main():
    parser = argparse.ArgumentParser(description='CFO Briefing Analyzer')
    parser.add_argument('--input', '-i', default='./data/uploads/',
                        help='Directory containing QB export files')
    parser.add_argument('--period', '-p', help='Override period label (YYYY-MM)')
    parser.add_argument('--history', default='./data/history/',
                        help='History directory for MoM comparison')
    parser.add_argument('--no-history', action='store_true', help='Skip saving to history')
    args = parser.parse_args()

    input_dir = Path(args.input)
    history_dir = Path(args.history)

    if not input_dir.exists():
        print(f"Error: Input directory not found: {input_dir}", file=sys.stderr)
        sys.exit(1)

    # Find and classify files
    files = list(input_dir.glob('*.csv')) + list(input_dir.glob('*.xlsx')) + list(input_dir.glob('*.xls'))
    if not files:
        print(f"Error: No CSV/XLSX files found in {input_dir}", file=sys.stderr)
        sys.exit(1)

    classified: dict[str, Path] = {}
    period = args.period

    for f in files:
        ftype = detect_file_type(f)
        if ftype:
            classified[ftype] = f
            if not period:
                p = detect_period(f)
                if p:
                    period = p
            print(f" Detected: {f.name} → {ftype}", file=sys.stderr)
        else:
            print(f" Skipped (unknown format): {f.name}", file=sys.stderr)

    if not period:
        period = datetime.now().strftime("%Y-%m")

    print(f" Period: {period}", file=sys.stderr)
    print(f" Files classified: {len(classified)}", file=sys.stderr)

    # Parse files
    pl_data = {}
    customer_data = None
    cash_flow_data = None

    if 'pl_summary' in classified:
        pl_data = parse_pl_summary(classified['pl_summary'])
    elif 'pl_by_customer' not in classified:
        print("Warning: No P&L file found. Output will be limited.", file=sys.stderr)
        pl_data = {'revenue': {}, 'cogs': {}, 'expenses': {}, 'other_income': {}, 'other_expenses': {}, 'totals': {}}

    if 'pl_by_customer' in classified:
        customer_data = parse_pl_by_customer(classified['pl_by_customer'])
        if not pl_data.get('totals'):
            pl_data = parse_pl_summary(classified['pl_by_customer'])

    if 'cash_flow' in classified:
        cash_flow_data = parse_cash_flow(classified['cash_flow'])

    # Compute KPIs
    kpis = compute_kpis(pl_data, customer_data, cash_flow_data)

    # Load prior period
    prior = None
    if history_dir.exists():
        prior = load_prior_period(history_dir, period)

    # Save current period
    if not args.no_history:
        history_dir.mkdir(parents=True, exist_ok=True)
        history_file = history_dir / f"{period}.json"

        save_data = {
            'period': period,
            'total_revenue': kpis['total_revenue'],
            'gross_profit': kpis['gross_profit'],
            'gross_margin_pct': kpis['gross_margin_pct'],
            'net_income': kpis['net_income'],
            'net_margin_pct': kpis['net_margin_pct'],
            'total_cogs': kpis['total_cogs'],
            'total_opex': kpis['total_opex'],
            'people_costs_total': kpis['people_costs']['total'],
            'people_costs_pct': kpis['people_costs']['pct_of_revenue'],
            'tool_costs_total': kpis['tool_costs']['total'],
            'tool_costs_pct': kpis['tool_costs']['pct_of_revenue'],
            'interest_expense': kpis['interest_expense'],
            'recruiting_spend': kpis.get('recruiting_spend', 0),
            'owner_expenses': kpis.get('owner_expenses', 0),
            'customer_count': kpis.get('customer_count', 0),
            'timestamp': datetime.now().isoformat(),
        }
        with open(history_file, 'w') as f:
            json.dump(save_data, f, indent=2)
        print(f" Saved history: {history_file}", file=sys.stderr)

    # Output briefing
    briefing = format_briefing(kpis, prior, period)
    print(briefing)


if __name__ == '__main__':
    main()
238
finance-ops/scripts/scenario-modeler.py
Normal file
@ -0,0 +1,238 @@
#!/usr/bin/env python3
"""
Scenario Modeler — Models base/bull/bear cases from financial analysis.

Takes a JSON file with financial summary data and projects 12-month scenarios.
Outputs to stdout and saves JSON for further analysis.

Usage:
    python3 scenario-modeler.py --input ./data/financial-latest.json
    python3 scenario-modeler.py --input ./data/financial-latest.json --output ./data/scenarios.json
"""

import argparse
import json
import os
import sys
from datetime import datetime


def load_financial_data(input_path: str) -> dict:
    """Load financial summary JSON."""
    with open(input_path) as f:
        return json.load(f)


def model_base_case(data: dict) -> dict:
    """Current trajectory continues. No growth, no cuts."""
    monthly_rev = data["total_revenue"] / 12
    monthly_cogs = data["total_cogs"] / 12
    monthly_opex = data["total_opex"] / 12
    monthly_other = data.get("other_expenses", 0) / 12 - data.get("other_income", 0) / 12
    monthly_net = data["net_income"] / 12

    projections = []
    for m in range(1, 13):
        projections.append({
            "month": m,
            "revenue": round(monthly_rev, 2),
            "total_costs": round(monthly_cogs + monthly_opex + monthly_other, 2),
            "net_income": round(monthly_net, 2),
            "cumulative_pl": round(monthly_net * m, 2),
        })

    monthly_burn = abs(monthly_net) if monthly_net < 0 else 0
    breakeven_monthly_cut = monthly_burn  # How much to cut monthly to break even

    return {
        "name": "Base Case — Status Quo",
        "description": "Current trajectory continues. No new clients, no lost clients, no cost changes.",
        "assumptions": [
            f"Revenue stays flat at ~${monthly_rev:,.0f}/mo",
            f"COGS stays at ~${monthly_cogs:,.0f}/mo",
            f"OpEx stays at ~${monthly_opex:,.0f}/mo",
            "No new hires, no layoffs",
        ],
        "monthly_burn": round(monthly_burn, 2),
        "annual_projected_loss": round(data["net_income"], 2) if data["net_income"] < 0 else 0,
        "annual_projected_profit": round(data["net_income"], 2) if data["net_income"] > 0 else 0,
        "months_to_breakeven": "N/A (already profitable)" if monthly_net > 0 else "Never (at current trajectory)",
        "key_levers": [
            f"Cut ${breakeven_monthly_cut:,.0f}/mo from costs to break even" if monthly_burn > 0 else "Maintain current discipline",
            f"Or grow revenue {round(monthly_burn / monthly_rev * 100, 1)}% while holding costs flat" if monthly_burn > 0 and monthly_rev > 0 else "Focus on margin expansion",
            "Audit subscriptions and contractor spend for quick wins",
        ],
        "projections": projections,
    }


def model_bull_case(data: dict, new_product_arr: float = 500000, new_clients: int = 3, avg_client_mrr: float = 15000) -> dict:
    """Growth targets met: new product revenue + new agency clients."""
    monthly_rev = data["total_revenue"] / 12
    monthly_cogs = data["total_cogs"] / 12
    monthly_opex = data["total_opex"] / 12
    monthly_other = data.get("other_expenses", 0) / 12 - data.get("other_income", 0) / 12

    product_monthly = new_product_arr / 12
    new_clients_monthly = new_clients * avg_client_mrr
    # Product has SaaS margins (~80%), services have ~50% margin
    product_cogs = product_monthly * 0.20
    services_cogs = new_clients_monthly * 0.50

    projections = []
    for m in range(1, 13):
        # Ramp: product over 6 months, clients added quarterly
        product_ramp = min(m / 6, 1.0)
        # One client lands per quarter (months 1, 4, 7, ...); guard against new_clients == 0
        client_ramp = min((m + 2) // 3, new_clients) / new_clients if new_clients else 0.0
        month_rev = monthly_rev + (product_monthly * product_ramp) + (new_clients_monthly * client_ramp)
        month_costs = monthly_cogs + monthly_opex + monthly_other + (product_cogs * product_ramp) + (services_cogs * client_ramp)
        month_net = month_rev - month_costs

        projections.append({
            "month": m,
            "revenue": round(month_rev, 2),
            "total_costs": round(month_costs, 2),
            "net_income": round(month_net, 2),
            "cumulative_pl": round(sum(p["net_income"] for p in projections) + month_net, 2),
        })

    breakeven_month = None
    for p in projections:
        if p["net_income"] > 0:
            breakeven_month = p["month"]
            break

    return {
        "name": "Bull Case — Product + Growth",
        "description": f"New product hits ${new_product_arr/1000:.0f}K ARR, add {new_clients} clients at ${avg_client_mrr/1000:.0f}K/mo.",
        "assumptions": [
            f"Product ramps to ${product_monthly:,.0f}/mo over 6 months",
            f"{new_clients} new clients at ${avg_client_mrr:,.0f}/mo each, added quarterly",
            "Product has 80% gross margin (SaaS)",
            "New services clients at 50% margin",
            "No additional OpEx needed (existing team absorbs)",
        ],
        "additional_annual_revenue": round((product_monthly + new_clients_monthly) * 12, 2),
        "monthly_profit_at_full_ramp": round(projections[-1]["net_income"], 2) if projections[-1]["net_income"] > 0 else 0,
        "months_to_breakeven": breakeven_month if breakeven_month else ">12 months",
        "key_levers": [
            "Product-market fit and sales execution",
            f"Services pipeline — need 1 new ${avg_client_mrr/1000:.0f}K client per quarter",
            "Keep OpEx flat during growth phase",
            "SaaS margins dramatically improve blended margin",
        ],
        "projections": projections,
    }


def model_bear_case(data: dict, pct_revenue_lost: float = 0.30) -> dict:
    """Lose significant portion of revenue (e.g., top clients churn)."""
    monthly_rev = data["total_revenue"] / 12
    monthly_cogs = data["total_cogs"] / 12
    monthly_opex = data["total_opex"] / 12
    monthly_other = data.get("other_expenses", 0) / 12 - data.get("other_income", 0) / 12

    lost_revenue = data["total_revenue"] * pct_revenue_lost
    monthly_lost = lost_revenue / 12
    # Save ~45% of lost revenue in COGS (team partially redeployed)
    monthly_saved_cogs = monthly_lost * 0.45

    projections = []
    for m in range(1, 13):
        month_rev = monthly_rev - monthly_lost
        month_costs = (monthly_cogs - monthly_saved_cogs) + monthly_opex + monthly_other
        month_net = month_rev - month_costs

        projections.append({
            "month": m,
            "revenue": round(month_rev, 2),
            "total_costs": round(month_costs, 2),
            "net_income": round(month_net, 2),
            "cumulative_pl": round(month_net * m, 2),
        })

    return {
        "name": f"Bear Case — Lose {pct_revenue_lost*100:.0f}% Revenue",
        "description": f"Top clients churn, losing {pct_revenue_lost*100:.0f}% of revenue.",
        "assumptions": [
            f"Lose ${lost_revenue:,.0f}/yr ({pct_revenue_lost*100:.0f}% of revenue)",
            "COGS reduces ~45% of lost revenue (team partially redeployed)",
            "OpEx stays fixed (can't cut fast enough)",
            "No replacement clients in forecast period",
        ],
        "lost_annual_revenue": round(lost_revenue, 2),
        "new_annual_revenue": round(data["total_revenue"] - lost_revenue, 2),
        "monthly_burn": round(abs(projections[0]["net_income"]), 2) if projections[0]["net_income"] < 0 else 0,
        "annual_projected_loss": round(projections[0]["net_income"] * 12, 2),
        "months_to_breakeven": "Requires major restructuring",
        "key_levers": [
            f"Immediate need: cut ${abs(projections[0]['net_income']):,.0f}/mo in costs",
            "Reduce headcount in affected service lines",
            "Accelerate sales pipeline to replace lost revenue",
            "Consider consolidating service lines",
        ],
        "required_monthly_cost_cuts": round(abs(projections[0]["net_income"]), 2) if projections[0]["net_income"] < 0 else 0,
        "projections": projections,
    }


def main():
    parser = argparse.ArgumentParser(description='Financial Scenario Modeler')
    parser.add_argument('--input', '-i', required=True,
                        help='Path to financial summary JSON (output from cfo-analyzer history)')
    parser.add_argument('--output', '-o', default=None,
                        help='Output path for scenarios JSON (default: stdout only)')
    parser.add_argument('--product-arr', type=float, default=500000,
                        help='Bull case: new product ARR target (default: 500000)')
    parser.add_argument('--new-clients', type=int, default=3,
                        help='Bull case: number of new clients (default: 3)')
    parser.add_argument('--client-mrr', type=float, default=15000,
                        help='Bull case: average new client MRR (default: 15000)')
    parser.add_argument('--bear-loss-pct', type=float, default=0.30,
                        help='Bear case: fraction of revenue lost, e.g. 0.30 for 30%% (default: 0.30)')
    args = parser.parse_args()

    print("🔮 Scenario Modeler — Building projections...", file=sys.stderr)

    data = load_financial_data(args.input)

    scenarios = {
        "base_case": model_base_case(data),
        "bull_case": model_bull_case(data, args.product_arr, args.new_clients, args.client_mrr),
        "bear_case": model_bear_case(data, args.bear_loss_pct),
        "generated_at": datetime.now().isoformat(),
        "based_on_period": data.get("period", "Unknown"),
    }

    # Summary comparison
    scenarios["summary"] = {
        "base_monthly_burn": scenarios["base_case"]["monthly_burn"],
        "bull_monthly_profit": scenarios["bull_case"].get("monthly_profit_at_full_ramp", 0),
        "bear_monthly_burn": scenarios["bear_case"]["monthly_burn"],
        "current_net_income": data["net_income"],
    }

    if args.output:
        os.makedirs(os.path.dirname(args.output) or '.', exist_ok=True)
        with open(args.output, "w") as f:
            json.dump(scenarios, f, indent=2)
        print(f"✅ Scenarios saved to {args.output}", file=sys.stderr)

    # Print summary to stdout
    print(f"\n{'='*60}")
    for case_key in ["base_case", "bull_case", "bear_case"]:
        case = scenarios[case_key]
        print(f"\n📌 {case['name']}")
        print(f" {case['description']}")
        if case.get("monthly_burn"):
            print(f" Monthly burn: ${case['monthly_burn']:,.0f}")
        if case.get("monthly_profit_at_full_ramp"):
            print(f" Monthly profit at ramp: ${case['monthly_profit_at_full_ramp']:,.0f}")
        print(f" Breakeven: {case['months_to_breakeven']}")
        print(f" Key levers:")
        for lever in case.get("key_levers", []):
            print(f" • {lever}")


if __name__ == "__main__":
    main()
73
growth-engine/.env.example
Normal file
@ -0,0 +1,73 @@
# Growth Engine Configuration
# Copy this file to .env and fill in your values

# ── Core Settings ──────────────────────────────────────────────────────────────

# Where experiment data is stored (default: ./data/experiments)
GROWTH_ENGINE_DATA_DIR=./data/experiments

# Comma-separated list of agent/channel names to track
GROWTH_ENGINE_AGENTS=content,email,linkedin,seo,blog

# Agents with high-volume data (need only 10 samples/variant for significance)
HIGH_VOLUME_AGENTS=content,email

# Agents with low-volume data (need 30 samples/variant for significance)
LOW_VOLUME_AGENTS=seo,linkedin,blog

# ── Statistical Thresholds ─────────────────────────────────────────────────────

# p-value threshold for declaring a winner (default: 0.05)
P_WINNER=0.05

# p-value threshold for "trending" early signal (default: 0.10)
P_TREND=0.10

# Minimum % lift required for "keep" decision (default: 15.0)
LIFT_WIN=15.0

# Number of bootstrap resamples for confidence intervals (default: 1000)
BOOTSTRAP_ITERATIONS=1000

# Maximum variants allowed in batch mode (default: 10)
BATCH_MODE_MAX_VARIANTS=10

# ── Pacing Alert: Pipeline API ─────────────────────────────────────────────────

# Your pipeline/CRM dashboard API endpoint
PIPELINE_API_URL=
# Bearer token for pipeline API authentication
PIPELINE_AUTH_TOKEN=

# ── Pacing Alert: Recruiting API ───────────────────────────────────────────────

# Your recruiting/candidate dashboard API endpoint
RECRUITING_API_URL=
# Bearer token for recruiting API authentication
RECRUITING_AUTH_TOKEN=

# ── Pacing Alert: Email Platform ───────────────────────────────────────────────

# Email sending platform API base URL (e.g., Instantly, Lemlist, Smartlead)
EMAIL_API_URL=
# Bearer token for email platform API
EMAIL_AUTH_TOKEN=

# Campaign IDs as JSON objects: {"Campaign Name": "campaign-uuid"}
OUTBOUND_CAMPAIGNS={}
RECRUITING_CAMPAIGNS={}

# ── Pacing Alert: Targets ─────────────────────────────────────────────────────

# Minimum leads to stage per day (alert if below this)
DAILY_LEAD_TARGET=10

# Weekly candidate sourcing target
WEEKLY_CANDIDATE_TARGET=400

# ── Timezone ───────────────────────────────────────────────────────────────────

# UTC offset for local time display (e.g., -7 for PDT, -8 for PST)
TZ_OFFSET=-7
# Label for display
TZ_LABEL=PDT
328
growth-engine/README.md
Normal file
@@ -0,0 +1,328 @@
# 🧬 Growth Engine

**What if your marketing experiments ran themselves?**

Autonomous growth experimentation for AI agents. Inspired by [Karpathy's autoresearch](https://x.com/karpathy/status/1886192184808149383) pattern applied to marketing: create experiments with hypotheses, collect data, run statistical analysis, auto-promote winners to a living playbook, and suggest what to test next.

Your AI agents stop guessing and start *knowing* what works.

---

## What You Get

- **🔬 Experiment Engine** — Create A/B or batch (up to 10 variants) experiments with hypotheses, track them, and let the math decide winners
- **📊 Bootstrap CI + Mann-Whitney U** — Real statistical rigor, not vibes. Non-parametric tests that work with small samples and non-normal distributions
- **📖 Auto-Playbook** — Winners automatically promote to a living playbook of empirically proven best practices
- **💡 Next-Experiment Suggestions** — The system knows what you haven't tested yet and suggests what to run next
- **📈 Weekly Scorecard** — Automated report across all channels: wins, trends, running experiments, discards
- **⚠️ Pacing Alerts** — Monitor campaign health, lead staging rates, and candidate pipelines against targets

---

## Quick Start

### 1. Install dependencies

```bash
pip install -r requirements.txt
```

### 2. Set environment variables

```bash
cp .env.example .env
# Edit .env with your configuration
```

### 3. Run your first experiment

```bash
# Create an experiment
python3 experiment-engine.py create \
  --agent content \
  --hypothesis "Thread posts get 2x impressions vs single posts" \
  --variable "format" \
  --variants '["thread", "single"]' \
  --metric "impressions" \
  --cycle-hours 8

# Log data as it comes in
python3 experiment-engine.py log \
  --agent content \
  --experiment-id EXP-CONTENT-001 \
  --variant "thread" \
  --metrics '{"impressions": 4500, "clicks": 120, "replies": 8}'

# Score when you have enough data
python3 experiment-engine.py score \
  --agent content \
  --experiment-id EXP-CONTENT-001

# Check your playbook of proven winners
python3 experiment-engine.py playbook --agent content

# What should you test next?
python3 experiment-engine.py suggest --agent content
```

---

## Commands

### experiment-engine.py

The core engine. Manages the full experiment lifecycle.

| Command | Description |
|---------|-------------|
| `create` | Create a new A/B or batch experiment with a hypothesis |
| `log` | Log a data point (metrics) for a running experiment variant |
| `score` | Run statistical analysis. Auto-promotes winners to playbook |
| `list` | List experiments by agent, optionally filtered by status |
| `playbook` | Show the living playbook of empirically proven best practices |
| `suggest` | Suggest untested variables to experiment on next |

**Batch mode** — test up to 10 variants simultaneously:

```bash
python3 experiment-engine.py create \
  --agent email \
  --hypothesis "Which subject line style drives highest open rate?" \
  --variable "subject_line_style" \
  --variants '["question", "number", "how-to", "curiosity-gap", "personalized"]' \
  --metric "open_rate" \
  --batch-mode
```

### autogrowth-weekly-scorecard.py

Generates a weekly report across all agents/channels.

```bash
# Current week scorecard
python3 autogrowth-weekly-scorecard.py

# Previous week (--weeks 1 is the current week)
python3 autogrowth-weekly-scorecard.py --weeks 2

# Save to file
python3 autogrowth-weekly-scorecard.py --output reports/week-12.md
```

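The `--weeks` flag counts from 1: `--weeks 1` selects the current Monday-to-Sunday week and `--weeks 2` the week before it. The sketch below mirrors the `week_range` helper in `autogrowth-weekly-scorecard.py`. The `today` parameter is an addition made here for illustration (the script itself always uses the current date), so the arithmetic can be checked against a fixed day.

```python
from datetime import date, timedelta


def week_range(weeks_back=1, today=None):
    """Return (monday, sunday) of the target week; weeks_back=1 is the current week."""
    today = today or date.today()  # `today` is an illustrative parameter, not in the script
    this_monday = today - timedelta(days=today.weekday())  # weekday(): Monday == 0
    start = this_monday - timedelta(weeks=weeks_back - 1)
    return start, start + timedelta(days=6)


# Wednesday 2026-03-18 sits in the week that starts on Monday 2026-03-16.
current_week = week_range(1, today=date(2026, 3, 18))
previous_week = week_range(2, today=date(2026, 3, 18))
```

Rows in `results.tsv` whose `date` falls between the two returned dates (inclusive) are the ones reported for that week.
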
### pacing-alert.py

Monitors campaign health and pacing against targets.

```bash
# Formatted text output
python3 pacing-alert.py

# JSON output for integrations
python3 pacing-alert.py --json
```

---

## Example Output

### Experiment Scoring

```
🏆 EXP-CONTENT-003: KEEP — 'thread' +23.4% lift (p=0.0312, 95% CI [8.2, 41.7]%)
📖 Playbook updated: format → 'thread'
```

### Playbook

```
📖 CONTENT PLAYBOOK — Empirically Proven Best Practices

format: 'thread' (+23.4% on impressions, p=0.0312, 95% CI [8.2, 41.7])
Source: EXP-CONTENT-003 | Promoted: 2026-03-15

hook_style: 'contrarian' (+18.7% on clicks, p=0.0421, 95% CI [5.1, 34.2])
Source: EXP-CONTENT-007 | Promoted: 2026-03-22
```

### Weekly Scorecard

```
# AutoGrowth Weekly Scorecard — Week of Mar 17 – Mar 23, 2026

## Summary
- Total experiments active: 4
- New experiments launched: 2
- Experiments completed: 3 (2 kept, 1 discarded)
- Total data points collected: 847

## 🏆 Big Wins (keep status this week)
### EXP-EMAIL-012 (email)
- Tested: subject_line_style → variant: question
- Metric value: 0.3420 | Sample n: 156
- Lift: 31.2% | p-value: 0.008

## 📈 Trending (watch these)
- EXP-CONTENT-015 (content) — variant `data_hook` leading at 0.1250 | 42 samples so far
```

### Pacing Alert

```
⚠️ Pacing Alert — Thu Mar 27 2:15 PM PDT

🟢 📧 Outbound Pipeline:
• 12 leads staged today | 8 approved | 6 sent
• Campaigns: 🟢 3/3 sending | 450 emails/day

🟡 🔍 Recruiting Pipeline:
• 5 candidates added today | 187 this week | target: 400/week
• Campaigns: 🟢 5/5 sending | 200 emails/day
```

---

## How It Works

### The Autoresearch Loop

```
┌─────────────────────────────────────────────┐
│                                             │
│   1. HYPOTHESIZE                            │
│      "Thread posts get 2x impressions"      │
│      │                                      │
│      ▼                                      │
│   2. EXPERIMENT                             │
│      Run variants, collect data points      │
│      │                                      │
│      ▼                                      │
│   3. ANALYZE                                │
│      Bootstrap CI + Mann-Whitney U          │
│      p < 0.05 + lift ≥ 15% = winner         │
│      │                                      │
│      ▼                                      │
│   4. PROMOTE or DISCARD                     │
│      Winner → playbook (auto)               │
│      Loser → discard pile (learned)         │
│      │                                      │
│      ▼                                      │
│   5. SUGGEST NEXT                           │
│      System identifies untested variables   │
│      └──────────── loops back to 1 ─────┘   │
│                                             │
└─────────────────────────────────────────────┘
```

### Statistical Methods

- **Mann-Whitney U test** — Non-parametric. Works with small samples. No normality assumption needed.
- **Bootstrap confidence intervals** — 1,000 resamples to estimate the true lift range.
- **Dual threshold** — Both statistical significance (p < 0.05) AND practical significance (≥15% lift) required to declare a winner. No more "statistically significant but useless" results.
- **Trending detection** — Early signal detection at p < 0.10 with 15+ samples, so you know what's promising before it's conclusive.

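The methods above can be sketched in a few dozen lines. This is a standard-library illustration only, not the engine's implementation (which depends on numpy and scipy and is not reproduced here). The function names `mann_whitney_p`, `bootstrap_lift_ci`, and `decide` are hypothetical, the p-value uses the normal approximation without tie correction, and the rule that "trending" also needs a positive lift is an assumption. The sketch skips the 15-sample minimum for trending and leaves out `discard`, since the rule for discarding is not described in this README.

```python
import math
import random
from statistics import mean


def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U p-value (normal approximation, no tie correction)."""
    pooled = sorted((value, idx) for idx, value in enumerate(list(a) + list(b)))
    ranks = [0.0] * len(pooled)
    pos = 0
    while pos < len(pooled):
        end = pos
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[pos][0]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # tied values share their average 1-based rank
        for k in range(pos, end + 1):
            ranks[pooled[k][1]] = avg_rank
        pos = end + 1
    n1, n2 = len(a), len(b)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for sample `a`
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def bootstrap_lift_ci(control, variant, iterations=1000, seed=0):
    """95% percentile interval for the % lift of variant over control."""
    rng = random.Random(seed)
    lifts = []
    for _ in range(iterations):
        c = mean(rng.choices(control, k=len(control)))  # resample with replacement
        v = mean(rng.choices(variant, k=len(variant)))
        if c != 0:
            lifts.append((v - c) / c * 100)
    if not lifts:
        raise ValueError("control resamples all averaged to zero")
    lifts.sort()
    low = lifts[int(0.025 * len(lifts))]
    high = lifts[min(int(0.975 * len(lifts)), len(lifts) - 1)]
    return low, high


def decide(control, variant, p_winner=0.05, p_trend=0.10, lift_win=15.0):
    """Apply the dual threshold: significance AND minimum lift."""
    lift = (mean(variant) - mean(control)) / mean(control) * 100
    p = mann_whitney_p(control, variant)
    if p < p_winner and lift >= lift_win:
        return "keep"
    if p < p_trend and lift > 0:  # assumption: trending requires a positive lift
        return "trending"
    return "running"
```

For example, `decide(control, variant)` returns `"keep"` only when both conditions hold, so a tiny but statistically significant lift stays out of the playbook.
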
### Configurable Thresholds

| Parameter | Default | What It Controls |
|-----------|---------|-----------------|
| `P_WINNER` | 0.05 | p-value threshold for declaring a winner |
| `P_TREND` | 0.10 | p-value threshold for "trending" status |
| `LIFT_WIN` | 15.0% | Minimum lift required for "keep" decision |
| `BOOTSTRAP_ITERATIONS` | 1000 | Number of bootstrap resamples for CI |

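As a rough illustration of how these thresholds might be read from the environment, the sketch below falls back to the defaults in the table when a variable is unset. The `Thresholds` dataclass and `load_thresholds` function are hypothetical names used only for this example. The variable names and default values come from the table above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Scoring thresholds with the documented defaults (hypothetical container)."""
    p_winner: float = 0.05
    p_trend: float = 0.10
    lift_win: float = 15.0
    bootstrap_iterations: int = 1000


def load_thresholds(env=None):
    """Read thresholds from a mapping (os.environ by default), using defaults when unset."""
    env = os.environ if env is None else env
    defaults = Thresholds()
    return Thresholds(
        p_winner=float(env.get("P_WINNER", defaults.p_winner)),
        p_trend=float(env.get("P_TREND", defaults.p_trend)),
        lift_win=float(env.get("LIFT_WIN", defaults.lift_win)),
        bootstrap_iterations=int(env.get("BOOTSTRAP_ITERATIONS", defaults.bootstrap_iterations)),
    )
```

Passing a plain dict instead of `os.environ` makes it easy to try stricter settings, for instance `load_thresholds({"P_WINNER": "0.01", "LIFT_WIN": "20"})`.
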
---

## Configuration

All configuration is via environment variables. See `.env.example` for the full list.

### Core Settings

| Variable | Description | Default |
|----------|-------------|---------|
| `GROWTH_ENGINE_DATA_DIR` | Where experiment data is stored | `./data/experiments` |
| `GROWTH_ENGINE_AGENTS` | Comma-separated list of agent names | `content,email,linkedin,seo,blog` |
| `HIGH_VOLUME_AGENTS` | Agents with fast data (fewer samples needed) | `content,email` |
| `LOW_VOLUME_AGENTS` | Agents with slow data (more samples needed) | `seo,linkedin,blog` |

### Pacing Alert Settings

| Variable | Description | Default |
|----------|-------------|---------|
| `PIPELINE_API_URL` | Your pipeline/CRM API endpoint | — |
| `PIPELINE_AUTH_TOKEN` | Bearer token for pipeline API | — |
| `EMAIL_API_URL` | Email platform API base URL | — |
| `EMAIL_AUTH_TOKEN` | Bearer token for email platform | — |
| `OUTBOUND_CAMPAIGNS` | JSON map of campaign name → ID | `{}` |
| `DAILY_LEAD_TARGET` | Minimum leads staged per day | `10` |
| `WEEKLY_CANDIDATE_TARGET` | Candidate sourcing weekly target | `400` |

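The campaign variables hold a JSON object of campaign name to campaign ID, and the daily target triggers an alert when fewer leads are staged. A minimal sketch of both checks follows. `parse_campaigns` and `lead_pacing` are hypothetical helpers written for this example and are not taken from `pacing-alert.py`. The status strings are likewise assumptions.

```python
import json


def parse_campaigns(raw):
    """Parse a JSON object of {campaign name: campaign id}; empty or missing means none."""
    campaigns = json.loads(raw or "{}")
    if not isinstance(campaigns, dict):
        raise ValueError("campaign map must be a JSON object")
    return {str(name): str(campaign_id) for name, campaign_id in campaigns.items()}


def lead_pacing(staged_today, daily_target=10):
    """Return 'on_pace' when the daily lead target is met, otherwise 'behind'."""
    return "on_pace" if staged_today >= daily_target else "behind"
```

In `.env`, `OUTBOUND_CAMPAIGNS={"Campaign Name": "campaign-uuid"}` would parse to a one-entry dict, while the default `{}` yields no campaigns to check.
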
---

## Integrating with Your AI Agents

The growth engine is designed to be called by AI agents (Claude Code, GPT, etc.) as part of their workflow:

```python
# In your agent's post-publishing hook:
import json
import subprocess

# After publishing a social post, log the experiment data
subprocess.run([
    "python3", "experiment-engine.py", "log",
    "--agent", "content",
    "--experiment-id", current_experiment_id,
    "--variant", variant_used,
    "--metrics", json.dumps({"impressions": post_impressions, "clicks": post_clicks})
])

# Periodically score experiments
subprocess.run([
    "python3", "experiment-engine.py", "score",
    "--agent", "content",
    "--experiment-id", current_experiment_id
])

# Before creating new content, check the playbook
result = subprocess.run(
    ["python3", "experiment-engine.py", "playbook", "--agent", "content"],
    capture_output=True, text=True
)
# Parse playbook rules and apply them to new content
```

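If an agent logs data points from several places, it can help to build the argument list in one function so the flags stay consistent. The sketch below is one way to do that. `build_log_command` and `log_data_point` are hypothetical helper names, and only the `log` flags shown in this README are used.

```python
import json
import subprocess


def build_log_command(agent, experiment_id, variant, metrics, script="experiment-engine.py"):
    """Return the argv list for the engine's `log` command (flags as documented above)."""
    return [
        "python3", script, "log",
        "--agent", agent,
        "--experiment-id", experiment_id,
        "--variant", variant,
        "--metrics", json.dumps(metrics),  # metrics dict is serialized to a JSON string
    ]


def log_data_point(agent, experiment_id, variant, metrics):
    """Run the `log` command and raise CalledProcessError if it exits non-zero."""
    command = build_log_command(agent, experiment_id, variant, metrics)
    return subprocess.run(command, check=True, capture_output=True, text=True)
```

Using `check=True` surfaces a failed log call as an exception instead of silently dropping the data point.
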
---

## Project Structure

```
growth-engine/
├── experiment-engine.py              # Core experiment lifecycle engine
├── autogrowth-weekly-scorecard.py    # Weekly report generator
├── pacing-alert.py                   # Campaign pacing monitor
├── requirements.txt                  # Python dependencies
├── .env.example                      # Environment variable template
├── SKILL.md                          # Claude Code skill definition
├── README.md                         # This file
└── data/                             # Auto-created experiment data
    └── experiments/
        ├── content/
        │   ├── experiments.json      # Experiment definitions + data
        │   ├── playbook.json         # Proven winners
        │   └── active.json           # Currently running experiments
        ├── email/
        ├── seo/
        └── ...
```

---

## License

MIT

---

<p align="center">
<b>Built by <a href="https://www.singlegrain.com">Single Grain</a></b><br>
We help companies grow with AI-powered marketing. This is how we do it internally.
</p>
130
growth-engine/SKILL.md
Normal file
@@ -0,0 +1,130 @@
# Growth Engine

Autonomous growth experimentation framework based on Karpathy's autoresearch pattern applied to marketing. Creates experiments with hypotheses, logs data points, runs statistical analysis (bootstrap CI + Mann-Whitney U), auto-promotes winners to a living playbook, and suggests next experiments. Supports batch mode (up to 10 variants simultaneously).

## Usage

Use this skill when:
- Creating or managing A/B or multivariate experiments for any marketing channel
- Logging experiment data points after content is published or campaigns run
- Scoring experiments to determine statistical winners
- Checking the playbook for proven best practices before creating new content
- Generating weekly scorecards across all channels
- Monitoring campaign pacing and health

Do NOT use for:
- One-off content creation (use the playbook output as input, but don't run the engine)
- Non-experiment analytics or reporting
- Campaign setup in external platforms (this tracks experiments, not campaign config)

## Commands

### Create an experiment
```bash
python3 experiment-engine.py create \
  --agent <agent_name> \
  --hypothesis "What you expect to happen" \
  --variable "<variable_name>" \
  --variants '["variant_a", "variant_b"]' \
  --metric "<primary_metric>" \
  --cycle-hours 24
```

Add `--batch-mode` for 3-10 variant tests. Add `--min-samples N` to override auto-detection.

### Log a data point
```bash
python3 experiment-engine.py log \
  --agent <agent_name> \
  --experiment-id <EXP-ID> \
  --variant "<variant_name>" \
  --metrics '{"metric_name": value}'
```

### Score an experiment
```bash
python3 experiment-engine.py score --agent <agent_name> --experiment-id <EXP-ID>
```

Statuses: `running` → `trending` → `keep` (winner) or `discard` (loser)

Winners auto-promote to the playbook. Requires p < 0.05 AND ≥ 15% lift.

### List experiments
```bash
python3 experiment-engine.py list --agent <agent_name> [--status running|trending|keep|discard]
```

### Check the playbook
```bash
python3 experiment-engine.py playbook --agent <agent_name>
```

Always check the playbook before creating new content to apply proven best practices.

### Suggest next experiments
```bash
python3 experiment-engine.py suggest --agent <agent_name>
```

### Generate weekly scorecard
```bash
python3 autogrowth-weekly-scorecard.py [--weeks N] [--output file.md]
```

### Check campaign pacing
```bash
python3 pacing-alert.py [--json]
```

Exit code 0 = on pace, 1 = alerts present.

## Workflow

1. Before creating content: `playbook` → apply proven rules
2. When publishing: `log` → record which variant was used and its metrics
3. Periodically: `score` → check if experiments have reached statistical significance
4. Weekly: `autogrowth-weekly-scorecard.py` → review all channels
5. After completing experiments: `suggest` → pick the next variable to test

## Configuration

### Required Environment Variables

| Variable | Description |
|----------|-------------|
| `GROWTH_ENGINE_DATA_DIR` | Data directory (default: `./data/experiments`) |
| `GROWTH_ENGINE_AGENTS` | Comma-separated agent names (default: `content,email,linkedin,seo,blog`) |

### Optional Tuning

| Variable | Default | Description |
|----------|---------|-------------|
| `HIGH_VOLUME_AGENTS` | `content,email` | Agents needing only 10 samples/variant |
| `LOW_VOLUME_AGENTS` | `seo,linkedin,blog` | Agents needing 30 samples/variant |
| `P_WINNER` | `0.05` | p-value threshold for winner |
| `P_TREND` | `0.10` | p-value threshold for trending |
| `LIFT_WIN` | `15.0` | Minimum % lift for keep decision |
| `BOOTSTRAP_ITERATIONS` | `1000` | Bootstrap resamples for CI |
| `BATCH_MODE_MAX_VARIANTS` | `10` | Max variants in batch mode |

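The two agent lists decide how many samples each variant needs before scoring. A small sketch of that lookup is shown below. `min_samples_for` is a hypothetical name, and treating agents that appear in neither list as low-volume (30 samples) is an assumption made here, not documented behavior.

```python
import os


def _csv_set(value):
    """Split a comma-separated string into a set of trimmed, non-empty names."""
    return {name.strip() for name in value.split(",") if name.strip()}


def min_samples_for(agent, env=None):
    """Samples per variant required before an experiment for `agent` is scored."""
    env = os.environ if env is None else env
    high = _csv_set(env.get("HIGH_VOLUME_AGENTS", "content,email"))
    low = _csv_set(env.get("LOW_VOLUME_AGENTS", "seo,linkedin,blog"))
    if agent in high:
        return 10
    if agent in low:
        return 30
    return 30  # assumption: unlisted agents get the conservative requirement
```

Passing `--min-samples N` to `create` overrides this auto-detected value for a single experiment.
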
### Pacing Alert Variables

| Variable | Description |
|----------|-------------|
| `PIPELINE_API_URL` | Pipeline/CRM API endpoint |
| `PIPELINE_AUTH_TOKEN` | Bearer token for pipeline API |
| `RECRUITING_API_URL` | Recruiting API endpoint |
| `RECRUITING_AUTH_TOKEN` | Bearer token for recruiting API |
| `EMAIL_API_URL` | Email platform API base URL |
| `EMAIL_AUTH_TOKEN` | Bearer token for email platform |
| `OUTBOUND_CAMPAIGNS` | JSON: `{"name": "campaign-id"}` |
| `RECRUITING_CAMPAIGNS` | JSON: `{"name": "campaign-id"}` |
| `DAILY_LEAD_TARGET` | Leads/day target (default: 10) |
| `WEEKLY_CANDIDATE_TARGET` | Candidates/week target (default: 400) |

### Dependencies

```
pip install numpy scipy
```
306
growth-engine/autogrowth-weekly-scorecard.py
Normal file
@@ -0,0 +1,306 @@
#!/usr/bin/env python3
"""
AutoGrowth Weekly Scorecard Generator

Reads experiment results and playbook data across all agents and generates
a weekly report showing wins, trends, running experiments, and discards.

Reads results.tsv and playbook.tsv from each agent directory under
GROWTH_ENGINE_DATA_DIR.

Usage:
    python3 autogrowth-weekly-scorecard.py                     # Current week
    python3 autogrowth-weekly-scorecard.py --weeks 2           # Previous week
    python3 autogrowth-weekly-scorecard.py --output report.md  # Write to file
"""

import argparse
import csv
import os
import sys
from datetime import datetime, timedelta
from pathlib import Path

# ── Configuration ──────────────────────────────────────────────────────────────
# Base directory for experiment data. Must match experiment-engine.py setting.
BASE_DIR = Path(os.environ.get("GROWTH_ENGINE_DATA_DIR", "./data/experiments"))

# Agent names to scan. Customize to match your agent taxonomy.
AGENTS = os.environ.get("GROWTH_ENGINE_AGENTS", "content,email,linkedin,seo,blog").split(",")

RESULTS_COLS = ["experiment_id", "variable", "variant", "metric_value", "sample_n", "status", "date", "description"]
PLAYBOOK_COLS = ["experiment_id", "agent", "channel", "rule", "lift_pct", "p_value", "date_added", "notes"]


def parse_tsv(filepath, expected_cols):
    """Parse a TSV file, return list of dicts. Gracefully handles missing/empty files."""
    # expected_cols documents the intended header; it is not validated here.
    rows = []
    if not filepath.exists():
        return rows
    try:
        with open(filepath, "r", encoding="utf-8") as f:
            content = f.read().strip()
        if not content:
            return rows
        reader = csv.DictReader(content.splitlines(), delimiter="\t")
        # Files whose header row starts with "#" are treated as having no data.
        if reader.fieldnames and reader.fieldnames[0].startswith("#"):
            return rows
        for row in reader:
            rows.append(dict(row))
    except Exception:
        # Unreadable or malformed files are skipped; rows parsed so far are kept.
        pass
    return rows


def safe_float(val, default=0.0):
    """Convert val to float, returning default when it is missing or malformed."""
    try:
        return float(val)
    except (TypeError, ValueError):
        return default


def safe_int(val, default=0):
    """Convert val to int, returning default when it is missing or malformed."""
    try:
        return int(val)
    except (TypeError, ValueError):
        return default


def week_range(weeks_back=1):
    """Return (start_date, end_date) for the target week (Mon-Sun); 1 = current week."""
    today = datetime.now().date()
    this_monday = today - timedelta(days=today.weekday())
    start = this_monday - timedelta(weeks=weeks_back - 1)
    end = start + timedelta(days=6)
    return start, end


def in_week(date_str, start, end):
    """Check if a date string falls within the week range.

    Empty or unparseable dates are treated as in range so rows are not dropped.
    """
    if not date_str:
        return True
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%Y/%m/%d", "%d-%m-%Y"):
        try:
            d = datetime.strptime(date_str.strip(), fmt).date()
            return start <= d <= end
        except ValueError:
            continue
    return True  # include if unparseable


def load_all_results(weeks_back=1):
    """Load all results TSV rows across agents, filtered by week."""
    start, end = week_range(weeks_back)
    all_rows = []
    for agent in AGENTS:
        filepath = BASE_DIR / agent.strip() / "results.tsv"
        rows = parse_tsv(filepath, RESULTS_COLS)
        for row in rows:
            row["_agent"] = agent.strip()
            date_val = row.get("date", "")
            if in_week(date_val, start, end):
                all_rows.append(row)
    return all_rows, start, end


def load_all_playbooks():
    """Load all playbook TSV rows across agents."""
    all_rows = []
    for agent in AGENTS:
        filepath = BASE_DIR / agent.strip() / "playbook.tsv"
        rows = parse_tsv(filepath, PLAYBOOK_COLS)
        for row in rows:
            row["_agent"] = agent.strip()
            all_rows.append(row)
    return all_rows


def generate_scorecard(weeks_back=1):
    results, start, end = load_all_results(weeks_back)
    playbook = load_all_playbooks()

    week_label = f"{start.strftime('%b %d')} – {end.strftime('%b %d, %Y')}"
    lines = []

    lines.append(f"# AutoGrowth Weekly Scorecard — Week of {week_label}")
    lines.append("")

    # ── Summary ──────────────────────────────────────────────────────────────
    if not results:
        active = new = completed = kept = discarded = data_points = 0
    else:
        statuses = [r.get("status", "").strip().lower() for r in results]
        running_statuses = {"running", "active", "in_progress", "in progress"}
        keep_statuses = {"keep", "winner", "kept", "significant"}
        discard_statuses = {"discard", "discarded", "loser", "no_effect", "no effect"}
        new_statuses = {"new", "launched"}

        active = sum(1 for s in statuses if s in running_statuses)
        new = sum(1 for s in statuses if s in new_statuses)
        kept = sum(1 for s in statuses if s in keep_statuses)
        discarded = sum(1 for s in statuses if s in discard_statuses)
        completed = kept + discarded
        data_points = sum(safe_int(r.get("sample_n", 0)) for r in results)

    lines.append("## Summary")
    lines.append(f"- Total experiments active: {active}")
    lines.append(f"- New experiments launched: {new}")
    lines.append(f"- Experiments completed: {completed} ({kept} kept, {discarded} discarded)")
    lines.append(f"- Total data points collected: {data_points:,}")
    lines.append("")

    # ── Big Wins ─────────────────────────────────────────────────────────────
    lines.append("## 🏆 Big Wins (keep status this week)")
    keep_statuses_set = {"keep", "winner", "kept", "significant"}
    winners = [r for r in results if r.get("status", "").strip().lower() in keep_statuses_set]
    winners.sort(key=lambda r: safe_float(r.get("metric_value", 0)), reverse=True)

    if not winners:
        lines.append("No data yet")
    else:
        for r in winners:
            exp_id = r.get("experiment_id", "?")
            agent = r.get("_agent", "?")
            variable = r.get("variable", "?")
            variant = r.get("variant", "?")
            metric = safe_float(r.get("metric_value", 0))
            n = safe_int(r.get("sample_n", 0))
            desc = r.get("description", "")
            lines.append(f"### {exp_id} ({agent})")
            lines.append(f"- **Tested:** {variable} → variant: {variant}")
            lines.append(f"- **Metric value:** {metric:.4f} | **Sample n:** {n:,}")
            if desc:
                lines.append(f"- **Description:** {desc}")
            pb_match = [p for p in playbook if p.get("experiment_id", "") == exp_id]
            if pb_match:
                rule = pb_match[0].get("rule", "")
                lift = pb_match[0].get("lift_pct", "")
                p_val = pb_match[0].get("p_value", "")
                lines.append(f"- **Playbook rule:** {rule}")
                if lift:
                    lines.append(f"- **Lift:** {lift}% | **p-value:** {p_val}")
    lines.append("")

    # ── Trending ──────────────────────────────────────────────────────────────
    lines.append("## 📈 Trending (watch these)")
    trending_statuses = {"trending", "watch", "promising"}
    trending = [r for r in results if r.get("status", "").strip().lower() in trending_statuses]
    trending.sort(key=lambda r: safe_float(r.get("metric_value", 0)), reverse=True)

    if not trending:
        lines.append("No data yet")
    else:
        for r in trending:
            exp_id = r.get("experiment_id", "?")
            agent = r.get("_agent", "?")
            variant = r.get("variant", "?")
            metric = safe_float(r.get("metric_value", 0))
            n = safe_int(r.get("sample_n", 0))
            lines.append(f"- **{exp_id}** ({agent}) — variant `{variant}` leading at {metric:.4f} | {n:,} samples so far")
    lines.append("")

    # ── Running ───────────────────────────────────────────────────────────────
    lines.append("## 🔬 Running (in progress)")
    running_statuses_set = {"running", "active", "in_progress", "in progress"}
    running = [r for r in results if r.get("status", "").strip().lower() in running_statuses_set]

    if not running:
        lines.append("No data yet")
    else:
        for r in running:
            exp_id = r.get("experiment_id", "?")
            agent = r.get("_agent", "?")
            variable = r.get("variable", "?")
            variant = r.get("variant", "?")
            n = safe_int(r.get("sample_n", 0))
            lines.append(f"- **{exp_id}** ({agent}): testing `{variable}` → `{variant}` — {n:,} samples")
    lines.append("")

    # ── Discarded ─────────────────────────────────────────────────────────────
    lines.append("## ❌ Discarded (didn't work)")
    discard_statuses_set = {"discard", "discarded", "loser", "no_effect", "no effect"}
    discarded_rows = [r for r in results if r.get("status", "").strip().lower() in discard_statuses_set]

    if not discarded_rows:
        lines.append("No data yet")
    else:
        for r in discarded_rows:
            exp_id = r.get("experiment_id", "?")
            agent = r.get("_agent", "?")
            desc = r.get("description", "No significant effect found")
            lines.append(f"- **{exp_id}** ({agent}): {desc}")
    lines.append("")

    # ── Cumulative Playbook ───────────────────────────────────────────────────
    lines.append("## 📊 Cumulative Playbook")
    total_rules = len(playbook)
    lines.append(f"- Total rules in playbook across all agents: {total_rules}")
    lines.append("")

    if playbook:
        sorted_pb = sorted(playbook, key=lambda p: safe_float(p.get("lift_pct", 0)), reverse=True)
        lines.append("**Top 3 biggest lifts ever found:**")
        for i, p in enumerate(sorted_pb[:3], 1):
            exp_id = p.get("experiment_id", "?")
            agent = p.get("_agent", "?")
            rule = p.get("rule", "?")
            lift = p.get("lift_pct", "?")
            lines.append(f"{i}. **{exp_id}** ({agent}) — {lift}% lift: {rule}")
    else:
        lines.append("No playbook rules yet — experiments still running.")
    lines.append("")

    # ── Next Week ─────────────────────────────────────────────────────────────
    lines.append("## 📅 Next Week")
    next_start = end + timedelta(days=1)
    next_end = next_start + timedelta(days=6)
    lines.append(f"Week of {next_start.strftime('%b %d')} – {next_end.strftime('%b %d, %Y')}")
    lines.append("")

    planned_statuses = {"planned", "next", "queued", "upcoming"}
    all_results_unfiltered = []
    for agent in AGENTS:
        filepath = BASE_DIR / agent.strip() / "results.tsv"
        rows = parse_tsv(filepath, RESULTS_COLS)
        for row in rows:
            row["_agent"] = agent.strip()
            all_results_unfiltered.append(row)

    planned = [r for r in all_results_unfiltered if r.get("status", "").strip().lower() in planned_statuses]
    if not planned:
        lines.append("No new experiments scheduled yet. Add rows with status=planned to results.tsv files.")
    else:
        for r in planned:
            exp_id = r.get("experiment_id", "?")
            agent = r.get("_agent", "?")
            variable = r.get("variable", "?")
            variant = r.get("variant", "?")
            lines.append(f"- **{exp_id}** ({agent}): launch `{variable}` test → `{variant}`")
    lines.append("")

    lines.append("---")
    lines.append(f"*Generated {datetime.now().strftime('%Y-%m-%d %H:%M')}*")

    return "\n".join(lines)


def main():
|
||||||
|
parser = argparse.ArgumentParser(description="AutoGrowth Weekly Scorecard Generator")
|
||||||
|
parser.add_argument("--weeks", type=int, default=1, help="How many weeks back to report (default: 1 = current week)")
|
||||||
|
parser.add_argument("--output", type=str, default=None, help="Write output to file instead of stdout")
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
scorecard = generate_scorecard(weeks_back=args.weeks)
|
||||||
|
|
||||||
|
if args.output:
|
||||||
|
out_path = Path(args.output)
|
||||||
|
out_path.parent.mkdir(parents=True, exist_ok=True)
|
||||||
|
with open(out_path, "w", encoding="utf-8") as f:
|
||||||
|
f.write(scorecard)
|
||||||
|
print(f"Scorecard written to {out_path}", file=sys.stderr)
|
||||||
|
else:
|
||||||
|
print(scorecard)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
493
growth-engine/experiment-engine.py
Normal file
@ -0,0 +1,493 @@
#!/usr/bin/env python3
"""
Experiment Engine — Autonomous growth experimentation for AI agents.

Inspired by Karpathy's autoresearch pattern: create experiments with hypotheses,
log data points, run statistical analysis (bootstrap CI + Mann-Whitney U),
auto-promote winners to a living playbook, and suggest next experiments.

Supports batch mode (up to 10 variants simultaneously).

Experiment IDs are generated as EXP-<AGENT>-<NNN>, e.g. EXP-CONTENT-001.

Usage:
    # Create a new experiment
    python3 experiment-engine.py create --agent content --hypothesis "Thread posts get 2x impressions vs single posts" \
        --variable "format" --variants '["thread", "single"]' --metric "impressions" --cycle-hours 8

    # Log a data point for a running experiment
    python3 experiment-engine.py log --agent content --experiment-id EXP-CONTENT-001 --variant "thread" \
        --metrics '{"impressions": 4500, "clicks": 120, "replies": 8}'

    # Score an experiment (auto-promotes winner if criteria met)
    python3 experiment-engine.py score --agent content --experiment-id EXP-CONTENT-001

    # List active experiments for an agent
    python3 experiment-engine.py list --agent content

    # Get current best practices (promoted winners)
    python3 experiment-engine.py playbook --agent content

    # Suggest next experiment based on gaps
    python3 experiment-engine.py suggest --agent content
"""
import argparse, json, os, sys
from datetime import datetime, timezone
from pathlib import Path

import numpy as np
from scipy import stats

# ── Configuration ──────────────────────────────────────────────────────────────
# Base directory for experiment data. Override with GROWTH_ENGINE_DATA_DIR env var.
BASE_DIR = Path(os.environ.get("GROWTH_ENGINE_DATA_DIR", "./data/experiments"))

# Define your agent/channel taxonomy. High-volume channels need fewer samples
# per variant because data arrives faster. Adjust to match your setup.
HIGH_VOLUME_AGENTS = set(os.environ.get("HIGH_VOLUME_AGENTS", "content,email").split(","))
LOW_VOLUME_AGENTS = set(os.environ.get("LOW_VOLUME_AGENTS", "seo,linkedin,blog").split(","))

# Batch mode: allow up to this many variants simultaneously (vs simple A/B)
BATCH_MODE_MAX_VARIANTS = int(os.environ.get("BATCH_MODE_MAX_VARIANTS", "10"))

# Map agent names to their marketing channels. Customize for your org.
AGENT_CHANNEL = {
    "content": "social",
    "email": "email",
    "linkedin": "linkedin",
    "seo": "seo",
    "blog": "blog",
}

# Statistical parameters
BOOTSTRAP_ITERATIONS = int(os.environ.get("BOOTSTRAP_ITERATIONS", "1000"))
P_WINNER = float(os.environ.get("P_WINNER", "0.05"))  # p-value threshold for declaring a winner
P_TREND = float(os.environ.get("P_TREND", "0.10"))  # p-value threshold for "trending" status
LIFT_WIN = float(os.environ.get("LIFT_WIN", "15.0"))  # minimum % lift required for "keep" decision


def get_min_samples(agent: str, override: int | None = None) -> int:
    """Return minimum samples per variant before scoring.
    High-volume channels (email, social) need fewer samples (10).
    Low-volume channels (SEO, blog) need more (30) for reliable signal.
    Explicit override wins if > 3.
    """
    if override is not None and override > 3:
        return override
    return 10 if agent in HIGH_VOLUME_AGENTS else 30


def bootstrap_lift_ci(a_vals, b_vals, n_iter=BOOTSTRAP_ITERATIONS, ci=95):
    """Bootstrap confidence interval for lift = (mean(b) - mean(a)) / mean(a) * 100.
    Returns (lower_bound, upper_bound) as percentages, or (None, None) if baseline is zero.
    """
    a = np.array(a_vals, dtype=float)
    b = np.array(b_vals, dtype=float)
    lifts = []
    rng = np.random.default_rng(42)
    for _ in range(n_iter):
        sa = rng.choice(a, size=len(a), replace=True)
        sb = rng.choice(b, size=len(b), replace=True)
        baseline_mean = sa.mean()
        if baseline_mean == 0:
            continue
        lifts.append((sb.mean() - baseline_mean) / baseline_mean * 100)
    if not lifts:
        return None, None
    lo = float(np.percentile(lifts, (100 - ci) / 2))
    hi = float(np.percentile(lifts, 100 - (100 - ci) / 2))
    return round(lo, 1), round(hi, 1)


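The percentile bootstrap in `bootstrap_lift_ci` can be tried in isolation with a standard-library sketch like the one below. The name `bootstrap_lift_ci_stdlib`, the nearest-index percentile helper, and the sample numbers are illustrative assumptions and are not part of this commit; the committed function uses NumPy, whose interpolated `np.percentile` can give slightly different bounds.

```python
import random


def percentile_nearest_index(sorted_vals, pct):
    """Pick the value at the index nearest to pct% of the way through a sorted list."""
    if not sorted_vals:
        raise ValueError("empty input")
    idx = round(pct / 100 * (len(sorted_vals) - 1))
    return sorted_vals[idx]


def bootstrap_lift_ci_stdlib(a_vals, b_vals, n_iter=1000, ci=95, seed=42):
    """Percentile bootstrap CI for lift = (mean(b) - mean(a)) / mean(a) * 100.

    Returns (lower, upper) in percent, or (None, None) when every resampled
    baseline mean is zero.
    """
    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    lifts = []
    for _ in range(n_iter):
        # Resample each group with replacement, keeping the original sizes.
        sa = rng.choices(a_vals, k=len(a_vals))
        sb = rng.choices(b_vals, k=len(b_vals))
        base = sum(sa) / len(sa)
        if base == 0:
            continue  # lift is undefined for a zero baseline
        lifts.append((sum(sb) / len(sb) - base) / base * 100)
    if not lifts:
        return None, None
    lifts.sort()
    tail = (100 - ci) / 2
    lo = percentile_nearest_index(lifts, tail)
    hi = percentile_nearest_index(lifts, 100 - tail)
    return round(lo, 1), round(hi, 1)


# Made-up impressions for a baseline and a clearly stronger variant.
baseline = [100, 110, 95, 105, 90, 100, 108, 97, 103, 99]
variant = [130, 125, 140, 120, 135, 128, 132, 138, 126, 131]
lo, hi = bootstrap_lift_ci_stdlib(baseline, variant)
print(f"95% CI for lift: [{lo}, {hi}]%")
```

Because the seed is fixed, repeated calls on the same data return the same interval, which matches the committed function's use of `default_rng(42)`.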
def get_agent_dir(agent):
    d = BASE_DIR / agent
    d.mkdir(parents=True, exist_ok=True)
    return d


def load_json(path, default=None):
    if path.exists():
        return json.loads(path.read_text())
    return default if default is not None else {}


def save_json(path, data):
    path.write_text(json.dumps(data, indent=2, default=str))


def next_id(agent):
    d = get_agent_dir(agent)
    experiments = load_json(d / "experiments.json", [])
    return f"EXP-{agent.upper()}-{len(experiments)+1:03d}"


def cmd_create(args):
    d = get_agent_dir(args.agent)
    experiments = load_json(d / "experiments.json", [])

    exp_id = next_id(args.agent)
    min_s = get_min_samples(args.agent, args.min_samples if args.min_samples != 3 else None)

    variants = json.loads(args.variants)
    batch_mode = getattr(args, "batch_mode", False)
    if batch_mode and len(variants) > BATCH_MODE_MAX_VARIANTS:
        print(f"⚠️ Batch mode capped at {BATCH_MODE_MAX_VARIANTS} variants (got {len(variants)})")
        variants = variants[:BATCH_MODE_MAX_VARIANTS]

    experiment = {
        "id": exp_id,
        "agent": args.agent,
        "channel": AGENT_CHANNEL.get(args.agent, "unknown"),
        "hypothesis": args.hypothesis,
        "variable": args.variable,
        "variants": variants,
        "primary_metric": args.metric,
        "cycle_hours": args.cycle_hours,
        "min_samples": min_s,
        "batch_mode": batch_mode,
        "max_variants": BATCH_MODE_MAX_VARIANTS if batch_mode else 2,
        "status": "running",
        "created_at": datetime.now(timezone.utc).isoformat(),
        "data_points": [],
        "baseline_variant": variants[0],
        "result": None,
        "winner": None
    }

    experiments.append(experiment)
    save_json(d / "experiments.json", experiments)

    # Update active experiments index
    active = load_json(d / "active.json", [])
    active.append({"id": exp_id, "variable": args.variable, "variants": experiment["variants"],
                   "current_variant_idx": 0})
    save_json(d / "active.json", active)

    mode_str = f"BATCH ({len(variants)} variants)" if batch_mode else "A/B"
    print(f"✅ Created {exp_id}: {args.hypothesis}")
    print(f" Channel: {experiment['channel']} | Variable: {args.variable} | Mode: {mode_str}")
    print(f" Variants: {experiment['variants']}")
    print(f" Metric: {args.metric} | Cycle: {args.cycle_hours}h | Min samples/variant: {min_s}")
    return exp_id


def cmd_log(args):
    d = get_agent_dir(args.agent)
    experiments = load_json(d / "experiments.json", [])

    for exp in experiments:
        if exp["id"] == args.experiment_id:
            dp = {
                "variant": args.variant,
                "metrics": json.loads(args.metrics),
                "logged_at": datetime.now(timezone.utc).isoformat(),
                "notes": args.notes or ""
            }
            exp["data_points"].append(dp)
            save_json(d / "experiments.json", experiments)
            print(f"✅ Logged data point for {args.experiment_id} variant '{args.variant}': {dp['metrics']}")
            return

    print(f"❌ Experiment {args.experiment_id} not found")
    sys.exit(1)


def cmd_score(args):
    d = get_agent_dir(args.agent)
    experiments = load_json(d / "experiments.json", [])

    for exp in experiments:
        if exp["id"] == args.experiment_id and exp["status"] in ("running", "active", "trending"):
            # Group data points by variant
            variant_data = {}
            for dp in exp["data_points"]:
                v = dp["variant"]
                if v not in variant_data:
                    variant_data[v] = []
                variant_data[v].append(dp["metrics"].get(exp["primary_metric"], 0))

            baseline_v = exp["baseline_variant"]
            min_samples = exp.get("min_samples",
                                  get_min_samples(exp["agent"]) if "agent" in exp else 15)

            # Enforce per-variant sample floor
            insufficient = []
            for v, data in variant_data.items():
                if len(data) < min_samples:
                    insufficient.append((v, len(data)))

            if insufficient:
                for v, n in insufficient:
                    print(f"⏳ {exp['id']}: Variant '{v}' has {n}/{min_samples} samples. Need more data.")
                # Check for trending signal even with fewer samples (need at least 15)
                all_counts = {v: len(data) for v, data in variant_data.items()}
                min_count = min(all_counts.values()) if all_counts else 0
                if min_count >= 15 and baseline_v in variant_data:
                    baseline_vals = variant_data[baseline_v]
                    best_trend_v, best_trend_p = None, 1.0
                    for v, vals in variant_data.items():
                        if v == baseline_v or len(vals) < 15:
                            continue
                        _, p = stats.mannwhitneyu(baseline_vals, vals, alternative="less")
                        if p < P_TREND and p < best_trend_p:
                            best_trend_p = p
                            best_trend_v = v
                    if best_trend_v:
                        exp["status"] = "trending"
                        save_json(d / "experiments.json", experiments)
                        lift = (np.mean(variant_data[best_trend_v]) - np.mean(baseline_vals)) / np.mean(baseline_vals) * 100 if np.mean(baseline_vals) else 0
                        print(f"📈 {exp['id']}: TRENDING — '{best_trend_v}' p={best_trend_p:.3f}, lift={lift:.1f}% (needs more samples to confirm)")
                return

            if not variant_data:
                print(f"⏳ {exp['id']}: No data points yet.")
                return

            baseline_vals = np.array(variant_data.get(baseline_v, []), dtype=float)
            if len(baseline_vals) < min_samples:
                print(f"⏳ {exp['id']}: Baseline variant '{baseline_v}' has {len(baseline_vals)}/{min_samples} samples.")
                return

            # Evaluate all non-baseline variants
            results = []
            for v, vals in variant_data.items():
                if v == baseline_v:
                    continue
                arr = np.array(vals, dtype=float)
                baseline_mean = baseline_vals.mean()
                variant_mean = arr.mean()
                lift = ((variant_mean - baseline_mean) / baseline_mean * 100) if baseline_mean != 0 else 0

                # Mann-Whitney U test (non-parametric, no normality assumption)
                _, p_two = stats.mannwhitneyu(baseline_vals, arr, alternative="two-sided")
                _, p_less = stats.mannwhitneyu(baseline_vals, arr, alternative="less")

                ci_lo, ci_hi = bootstrap_lift_ci(baseline_vals.tolist(), arr.tolist())

                if p_less < P_WINNER and lift >= LIFT_WIN:
                    status = "keep"
                elif p_two < P_WINNER and lift < 0:
                    status = "crash" if lift <= -LIFT_WIN else "discard"
                elif p_less < P_TREND and len(vals) >= 15:
                    status = "trending"
                else:
                    status = "running"

                results.append({
                    "variant": v,
                    "mean": round(float(variant_mean), 2),
                    "lift_pct": round(lift, 1),
                    "p_value": round(float(p_less), 4),
                    "ci_95": [ci_lo, ci_hi],
                    "n": len(vals),
                    "status": status
                })

            baseline_mean = float(baseline_vals.mean())
            overall_result = {
                "baseline": baseline_v,
                "baseline_mean": round(baseline_mean, 2),
                "baseline_n": len(baseline_vals),
                "variants": results,
                "scored_at": datetime.now(timezone.utc).isoformat(),
                "min_samples": min_samples,
                "thresholds": {"p_winner": P_WINNER, "p_trend": P_TREND, "lift_pct_required": LIFT_WIN}
            }

            winners = [r for r in results if r["status"] == "keep"]
            crashes = [r for r in results if r["status"] in ("crash", "discard")]
            trending = [r for r in results if r["status"] == "trending"]

            if winners:
                best = max(winners, key=lambda r: r["lift_pct"])
                exp["status"] = "keep"
                exp["winner"] = best["variant"]
                exp["result"] = overall_result
                save_json(d / "experiments.json", experiments)

                # Auto-promote to playbook
                playbook = load_json(d / "playbook.json", {})
                playbook[exp["variable"]] = {
                    "best": best["variant"],
                    "metric": exp["primary_metric"],
                    "avg": best["mean"],
                    "improvement": best["lift_pct"],
                    "p_value": best["p_value"],
                    "ci_95": best["ci_95"],
                    "experiment_id": exp["id"],
                    "promoted_at": datetime.now(timezone.utc).isoformat()
                }
                save_json(d / "playbook.json", playbook)

                # Remove from active index
                active = load_json(d / "active.json", [])
                active = [a for a in active if a["id"] != exp["id"]]
                save_json(d / "active.json", active)

                print(f"🏆 {exp['id']}: KEEP — '{best['variant']}' +{best['lift_pct']}% lift "
                      f"(p={best['p_value']}, 95% CI [{best['ci_95'][0]}, {best['ci_95'][1]}]%)")
                print(f" 📖 Playbook updated: {exp['variable']} → '{best['variant']}'")

            elif all(r["status"] in ("crash", "discard") for r in results) and results:
                # min() selects the variant with the lowest lift, so report it as the worst one.
                worst = min(results, key=lambda r: r["lift_pct"])
                exp["status"] = "discard"
                exp["result"] = overall_result
                save_json(d / "experiments.json", experiments)
                active = load_json(d / "active.json", [])
                active = [a for a in active if a["id"] != exp["id"]]
                save_json(d / "active.json", active)
                print(f"💀 {exp['id']}: DISCARD — baseline wins. Worst variant: '{worst['variant']}' "
                      f"at {worst['lift_pct']}% (p={worst['p_value']})")

            elif trending:
                exp["status"] = "trending"
                exp["result"] = overall_result
                save_json(d / "experiments.json", experiments)
                best_t = max(trending, key=lambda r: r["lift_pct"])
                print(f"📈 {exp['id']}: TRENDING — '{best_t['variant']}' +{best_t['lift_pct']}% "
                      f"(p={best_t['p_value']}, n={best_t['n']}). Keep collecting data.")

            else:
                exp["status"] = "running"
                exp["result"] = overall_result
                save_json(d / "experiments.json", experiments)
                for r in results:
                    print(f"⏳ {exp['id']}: '{r['variant']}' {r['lift_pct']:+.1f}% lift, p={r['p_value']} — running")
            return

    print(f"❌ Active experiment {args.experiment_id} not found")


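The per-variant decision inside `cmd_score` can be read as a small pure function. The sketch below is one way to isolate it for unit testing; `classify_variant` and `TREND_MIN_N` are hypothetical names introduced here, and the thresholds are copied from the module defaults rather than read from the environment.

```python
# Defaults copied from the configuration block; the real module reads them from env vars.
P_WINNER = 0.05
P_TREND = 0.10
LIFT_WIN = 15.0
TREND_MIN_N = 15  # hypothetical name for the literal 15 used in cmd_score


def classify_variant(p_less, p_two, lift_pct, n):
    """Map test results for one variant to keep / crash / discard / trending / running.

    p_less: one-sided p-value (baseline stochastically less than variant)
    p_two:  two-sided p-value
    lift_pct: percent lift of the variant mean over the baseline mean
    n: number of samples logged for the variant
    """
    if p_less < P_WINNER and lift_pct >= LIFT_WIN:
        return "keep"
    if p_two < P_WINNER and lift_pct < 0:
        return "crash" if lift_pct <= -LIFT_WIN else "discard"
    if p_less < P_TREND and n >= TREND_MIN_N:
        return "trending"
    return "running"


for case in [(0.01, 0.02, 20.0, 30), (0.01, 0.02, 10.0, 30), (0.99, 0.01, -20.0, 30)]:
    print(case, "->", classify_variant(*case))
```

One consequence of the ordering: a variant that is statistically significant but lifts less than `LIFT_WIN` percent is reported as `trending`, not `keep`.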
def cmd_list(args):
    d = get_agent_dir(args.agent)
    experiments = load_json(d / "experiments.json", [])

    status_filter = args.status or "all"
    icons = {
        "running": "🔬", "active": "🔬",
        "trending": "📈",
        "keep": "🏆", "promoted": "🏆",
        "discard": "💀", "killed": "💀",
        "crash": "🔴",
        "inconclusive": "🤷"
    }
    for exp in experiments:
        s = exp["status"]
        if status_filter != "all" and s != status_filter:
            aliases = {"active": "running", "promoted": "keep", "killed": "discard"}
            if aliases.get(s) != status_filter and s != status_filter:
                continue
        dp_count = len(exp.get("data_points", []))
        icon = icons.get(s, "❓")
        ch = exp.get("channel", AGENT_CHANNEL.get(exp["agent"], "?"))
        print(f"{icon} {exp['id']}: {exp['hypothesis']}")
        print(f" Variable: {exp['variable']} | Channel: {ch} | Status: {s} | Data points: {dp_count}")
        if exp.get("winner"):
            result = exp.get("result", {})
            lift = ""
            if isinstance(result, dict):
                for vr in result.get("variants", []):
                    if vr["variant"] == exp["winner"]:
                        lift = f" ({vr['lift_pct']:+.1f}% lift, p={vr['p_value']})"
                        break
            print(f" Winner: {exp['winner']}{lift}")
        print()


def cmd_playbook(args):
    d = get_agent_dir(args.agent)
    playbook = load_json(d / "playbook.json", {})

    if not playbook:
        print(f"📖 No playbook entries for {args.agent} yet. Run some experiments!")
        return

    print(f"📖 {args.agent.upper()} PLAYBOOK — Empirically Proven Best Practices\n")
    for variable, entry in playbook.items():
        p_str = f", p={entry['p_value']}" if "p_value" in entry else ""
        ci_str = f", 95% CI {entry['ci_95']}" if "ci_95" in entry else ""
        print(f" {variable}: '{entry['best']}' (+{entry['improvement']}% on {entry['metric']}{p_str}{ci_str})")
        print(f" Source: {entry['experiment_id']} | Promoted: {entry['promoted_at'][:10]}")
        print()


def cmd_suggest(args):
    d = get_agent_dir(args.agent)
    experiments = load_json(d / "experiments.json", [])
    playbook = load_json(d / "playbook.json", {})

    # Define testable categories per channel. Customize these for your business.
    categories = {
        "content": ["hook_style", "post_format", "cta_type", "post_time", "thread_length",
                    "emoji_usage", "data_vs_narrative", "question_vs_statement"],
        "email": ["subject_line_style", "opener_type", "email_length", "personalization_depth",
                  "cta_style", "send_time", "follow_up_timing", "social_proof_type"],
        "linkedin": ["inmail_opener", "role_framing", "company_pitch", "personalization_level",
                     "subject_line", "follow_up_cadence"],
        "blog": ["headline_style", "content_format", "platform_priority", "visual_style",
                 "posting_time", "content_length"],
        "seo": ["title_tag_format", "meta_description_style", "content_structure",
                "internal_linking", "heading_format"]
    }

    tested = set(playbook.keys())
    tested.update(e["variable"] for e in experiments if e["status"] in ("running", "active", "trending"))
    agent_cats = categories.get(args.agent, [])
    untested = [c for c in agent_cats if c not in tested]
    min_s = get_min_samples(args.agent)
    ch = AGENT_CHANNEL.get(args.agent, "?")

    if untested:
        print(f"💡 Suggested next experiments for {args.agent} ({ch}, min {min_s} samples/variant):")
        for cat in untested[:3]:
            print(f" → {cat}")
    else:
        print(f"✅ {args.agent} has tested all standard categories. Time for advanced experiments!")


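The gap detection in `cmd_suggest` is a set difference over category names. A minimal sketch, assuming the same record shapes as `experiments.json` and `playbook.json`; the helper name `untested_categories` and the sample data are made up for illustration.

```python
def untested_categories(agent_cats, playbook, experiments, limit=3):
    """Return up to `limit` categories with no playbook entry and no in-flight experiment."""
    tested = set(playbook)  # variables that already have a promoted winner
    tested.update(
        e["variable"]
        for e in experiments
        if e["status"] in ("running", "active", "trending")
    )
    # Preserve the order of agent_cats so suggestions follow the configured priority.
    return [c for c in agent_cats if c not in tested][:limit]


cats = ["hook_style", "post_format", "cta_type", "post_time"]
playbook = {"hook_style": {"best": "question"}}
experiments = [
    {"variable": "post_format", "status": "running"},
    {"variable": "cta_type", "status": "discard"},
]
print(untested_categories(cats, playbook, experiments))
```

A variable whose experiment ended in `discard` appears in neither set, so under this logic it can be suggested again.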
def main():
    parser = argparse.ArgumentParser(description="Experiment Engine — Autonomous growth experimentation")
    sub = parser.add_subparsers(dest="command")

    p_create = sub.add_parser("create", help="Create a new experiment")
    p_create.add_argument("--agent", required=True, help="Agent/channel name (e.g., content, email, seo)")
    p_create.add_argument("--hypothesis", required=True, help="What you're testing and expected outcome")
    p_create.add_argument("--variable", required=True, help="The variable being tested (e.g., hook_style)")
    p_create.add_argument("--variants", required=True, help="JSON array of variant names")
    p_create.add_argument("--metric", required=True, help="Primary metric to optimize (e.g., impressions)")
    p_create.add_argument("--cycle-hours", type=int, default=24, help="Hours per experiment cycle (default: 24)")
    p_create.add_argument("--min-samples", type=int, default=3,
                          help="Override min samples/variant (default: auto based on channel volume)")
    p_create.add_argument("--batch-mode", action="store_true",
                          help="Enable batch mode: up to 10 variants simultaneously")

    p_log = sub.add_parser("log", help="Log a data point for a running experiment")
    p_log.add_argument("--agent", required=True)
    p_log.add_argument("--experiment-id", required=True)
    p_log.add_argument("--variant", required=True)
    p_log.add_argument("--metrics", required=True, help="JSON object of metric values")
    p_log.add_argument("--notes", default="")

    p_score = sub.add_parser("score", help="Score an experiment (auto-promotes winners)")
    p_score.add_argument("--agent", required=True)
    p_score.add_argument("--experiment-id", required=True)

    p_list = sub.add_parser("list", help="List experiments for an agent")
    p_list.add_argument("--agent", required=True)
    p_list.add_argument("--status", default="all", help="Filter by status (running/trending/keep/discard/all)")

    p_play = sub.add_parser("playbook", help="Show empirically proven best practices")
    p_play.add_argument("--agent", required=True)

    p_sug = sub.add_parser("suggest", help="Suggest next experiments based on gaps")
    p_sug.add_argument("--agent", required=True)

    args = parser.parse_args()
    if not args.command:
        parser.print_help()
        return

    {"create": cmd_create, "log": cmd_log, "score": cmd_score,
     "list": cmd_list, "playbook": cmd_playbook, "suggest": cmd_suggest}[args.command](args)


if __name__ == "__main__":
    main()
342
growth-engine/pacing-alert.py
Normal file
@ -0,0 +1,342 @@
#!/usr/bin/env python3
"""
pacing-alert.py — API-based pacing check for marketing campaigns.

Monitors campaign health across channels:
- Checks pipeline/lead staging rates against daily targets
- Monitors email campaign sending status and capacity
- Tracks candidate sourcing pacing against weekly targets
- Reports cron job health

Exit 0 = on pace (all green), Exit 1 = alert needed (output contains issues).

Configure via environment variables (see .env.example).

Usage:
    python3 pacing-alert.py # Run full pacing check
    python3 pacing-alert.py --json # Output as JSON instead of formatted text
"""

import argparse
import sys
import json
import subprocess
from datetime import datetime, timezone, timedelta
import urllib.request
import urllib.error
import os

# ── Configuration ──────────────────────────────────────────────────────────────

# API authentication tokens (set via environment variables)
PIPELINE_API_URL = os.environ.get("PIPELINE_API_URL", "https://your-dashboard.example.com/api/pipeline")
PIPELINE_AUTH = os.environ.get("PIPELINE_AUTH_TOKEN", "")  # Bearer token for pipeline API

RECRUITING_API_URL = os.environ.get("RECRUITING_API_URL", "https://your-dashboard.example.com/api/recruiting/candidates")
RECRUITING_AUTH = os.environ.get("RECRUITING_AUTH_TOKEN", "")  # Bearer token for recruiting API

# Email platform API (e.g., Instantly, Lemlist, Smartlead)
EMAIL_API_URL = os.environ.get("EMAIL_API_URL", "")  # e.g., https://api.your-email-platform.com/v2/campaigns
EMAIL_AUTH = os.environ.get("EMAIL_AUTH_TOKEN", "")  # Bearer token for email platform

# Campaign IDs for outbound email. Format: JSON object {"Campaign Name": "campaign-uuid"}
OUTBOUND_CAMPAIGNS = json.loads(os.environ.get("OUTBOUND_CAMPAIGNS", "{}"))
RECRUITING_CAMPAIGNS = json.loads(os.environ.get("RECRUITING_CAMPAIGNS", "{}"))

# Pacing targets
DAILY_LEAD_TARGET = int(os.environ.get("DAILY_LEAD_TARGET", "10"))  # Min leads staged per day
WEEKLY_CANDIDATE_TARGET = int(os.environ.get("WEEKLY_CANDIDATE_TARGET", "400"))  # Candidates per week

# Timezone offset from UTC (e.g., -7 for PDT, -8 for PST)
TZ_OFFSET = int(os.environ.get("TZ_OFFSET", "-7"))
LOCAL_TZ = timezone(timedelta(hours=TZ_OFFSET))
TZ_LABEL = os.environ.get("TZ_LABEL", "PDT")

# ── Helpers ────────────────────────────────────────────────────────────────────

def api_get(url, auth):
    """Make authenticated GET request. Returns parsed JSON or error dict."""
    headers = {"Content-Type": "application/json"}
    if auth:
        headers["Authorization"] = auth if auth.startswith("Bearer ") else f"Bearer {auth}"
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=15) as r:
            return json.loads(r.read())
    except urllib.error.HTTPError as e:
        return {"_error": f"HTTP {e.code}"}
    except Exception as e:
        return {"_error": str(e)}

def now_local():
    return datetime.now(LOCAL_TZ)

def today_date():
    return now_local().date()

def week_start():
    now = now_local()
    monday = now - timedelta(days=now.weekday())
    return monday.replace(hour=0, minute=0, second=0, microsecond=0)

def parse_ts(ts_str):
    """Parse ISO timestamp string to local timezone datetime."""
    if not ts_str:
        return None
    try:
        ts_str = ts_str.replace("Z", "+00:00")
        return datetime.fromisoformat(ts_str).astimezone(LOCAL_TZ)
    except Exception:
        return None

def is_today(ts_str):
    dt = parse_ts(ts_str)
    return dt is not None and dt.date() == today_date()

def is_this_week(ts_str):
    dt = parse_ts(ts_str)
    return dt is not None and dt >= week_start()

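The timestamp helpers above decide "today" and "this week" in a fixed-offset local zone, so a record created late in the evening local time can carry the next day's date in UTC. The sketch below reproduces `parse_ts` and a variant of `week_start` that takes `now` as a parameter (an assumption made here so the result is deterministic); the example timestamps are invented. Note that a fixed `TZ_OFFSET` does not follow daylight-saving changes.

```python
from datetime import datetime, timedelta, timezone

LOCAL_TZ = timezone(timedelta(hours=-7))  # same default as TZ_OFFSET=-7


def parse_ts(ts_str):
    """Parse an ISO 8601 string (with optional trailing 'Z') into LOCAL_TZ, or None."""
    if not ts_str:
        return None
    try:
        # Older fromisoformat() versions reject a bare 'Z', so rewrite it as an offset.
        return datetime.fromisoformat(ts_str.replace("Z", "+00:00")).astimezone(LOCAL_TZ)
    except ValueError:
        return None


def week_start(now):
    """Monday 00:00 of the week containing `now` (Monday is weekday 0)."""
    monday = now - timedelta(days=now.weekday())
    return monday.replace(hour=0, minute=0, second=0, microsecond=0)


created = parse_ts("2026-03-04T02:30:00Z")  # early March 4 in UTC
print(created.isoformat())                   # shown in the local zone
print(week_start(created).isoformat())
```

In this example the record counts toward March 3 locally even though its UTC date is March 4, which is the behavior `is_today` relies on.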
# ── Pipeline API ───────────────────────────────────────────────────────────────

def get_pipeline_stats():
    """Fetch pipeline/lead staging stats. Returns (stats_dict, error_string)."""
    if not PIPELINE_AUTH:
        return None, "PIPELINE_AUTH_TOKEN not configured"

    data = api_get(f"{PIPELINE_API_URL}?page=1&limit=200", PIPELINE_AUTH)
    if "_error" in data:
        return None, data["_error"]

    prospects = data.get("prospects", [])
    stats = data.get("stats", {})

    today_total = 0
    today_approved = 0
    today_sent = 0

    for p in prospects:
        created = p.get("queued_at") or p.get("created_at") or ""
        if is_today(created):
            today_total += 1
            status = (p.get("status") or "").lower()
            if status == "approved":
                today_approved += 1
            elif status == "sent":
                today_sent += 1

    return {
        "today_total": today_total,
        "today_approved": today_approved,
        "today_sent": today_sent,
        "total": stats.get("total", len(prospects)),
    }, None

def get_recruiting_stats():
    """Fetch candidate sourcing stats with pagination. Returns (stats_dict, error_string)."""
    if not RECRUITING_AUTH:
        return None, "RECRUITING_AUTH_TOKEN not configured"

    data = api_get(f"{RECRUITING_API_URL}?page=1&limit=50", RECRUITING_AUTH)
    if "_error" in data:
        return None, data["_error"]

    stats = data.get("stats", {})
    pagination = data.get("pagination", {})
    total_pages = pagination.get("total_pages", 1)

    today_total = 0
    week_total = 0

    def count_page(candidates):
        nonlocal today_total, week_total
        for c in candidates:
            created = c.get("created_at") or c.get("createdAt") or ""
            if is_today(created):
                today_total += 1
            if is_this_week(created):
                week_total += 1

    count_page(data.get("candidates", []))

    # Paginate (stop early when we hit records older than this week)
    max_pages = min(total_pages, 7)
    for page in range(2, max_pages + 1):
        pdata = api_get(f"{RECRUITING_API_URL}?page={page}&limit=50", RECRUITING_AUTH)
        if "_error" in pdata:
            break
        candidates = pdata.get("candidates", [])
        if not candidates:
            break
        last = candidates[-1]
        # Accept both key spellings, matching count_page above.
        last_ts = parse_ts(last.get("created_at") or last.get("createdAt") or "")
        count_page(candidates)
        if last_ts and last_ts < week_start():
            break

    return {
        "today_total": today_total,
        "week_total": week_total,
        "stats_total": stats.get("total", "?"),
        "stats_in_pipeline": stats.get("in_pipeline", "?"),
        "stats_approved": stats.get("approved", "?"),
        "stats_meetings": stats.get("meetings", "?"),
    }, None

# ── Email Campaign Status ─────────────────────────────────────────────────────

NOT_SENDING_LABELS = {0: "sending", 2: "daily limit hit", 4: "issue"}

def get_campaign_status(campaign_id, name):
    """Check single email campaign health."""
    if not EMAIL_AUTH:
        return {"name": name, "error": "EMAIL_AUTH_TOKEN not configured", "sending": False, "active": False, "daily_limit": 0}

    data = api_get(f"{EMAIL_API_URL}/{campaign_id}", EMAIL_AUTH)
    if "_error" in data:
        return {"name": name, "error": data["_error"], "sending": False, "active": False, "daily_limit": 0}

    status = data.get("status", -1)
    ns_status = data.get("not_sending_status", 0)
    daily_limit = data.get("daily_limit", 0)

    return {
        "name": name,
        "active": status == 1,
        "ns_status": ns_status,
        "ns_label": NOT_SENDING_LABELS.get(ns_status, f"unknown({ns_status})"),
        "daily_limit": daily_limit,
        "sending": status == 1 and ns_status == 0,
    }

def get_campaigns_summary(campaigns_dict):
    """Get aggregate health for a set of campaigns."""
    if not campaigns_dict:
        return {"results": [], "sending_count": 0, "total": 0, "capacity": 0, "any_issue": False, "all_paused": True}

    results = [get_campaign_status(cid, name) for name, cid in campaigns_dict.items()]
    sending_count = sum(1 for r in results if r.get("sending"))
    total_capacity = sum(r.get("daily_limit", 0) for r in results if r.get("sending"))
    any_issue = any(r.get("ns_status", 0) in (2, 4) for r in results)
    all_paused = all(not r.get("active") for r in results)

    return {
        "results": results,
        "sending_count": sending_count,
        "total": len(results),
        "capacity": total_capacity,
        "any_issue": any_issue,
        "all_paused": all_paused,
    }

# ── Pacing Logic ───────────────────────────────────────────────────────────────

def pace_icon(issues):
    if issues == 0: return "🟢"
    elif issues == 1: return "🟡"
    else: return "🔴"

def pipeline_pace(today_total, campaign_summary):
    issues = 0
    if today_total == 0: issues += 2
    elif today_total < DAILY_LEAD_TARGET // 2: issues += 1
    if campaign_summary["sending_count"] == 0: issues += 2
    elif campaign_summary["any_issue"]: issues += 1
    return pace_icon(issues)

def recruiting_pace(week_total, campaign_summary):
    issues = 0
    if week_total < WEEKLY_CANDIDATE_TARGET // 4: issues += 2
    elif week_total < WEEKLY_CANDIDATE_TARGET // 2: issues += 1
    if campaign_summary["sending_count"] == 0: issues += 2
    elif campaign_summary["any_issue"]: issues += 1
    return pace_icon(issues)

def campaign_line(summary):
    if summary["all_paused"]:
        return "🔴 all paused | 0 emails/day"
    elif summary["sending_count"] == 0:
        return "🔴 not sending | 0 emails/day"
    elif summary["any_issue"]:
        return f"🟡 {summary['sending_count']}/{summary['total']} sending | {summary['capacity']:,} emails/day"
    else:
        return f"🟢 {summary['sending_count']}/{summary['total']} sending | {summary['capacity']:,} emails/day"

# ── Main ───────────────────────────────────────────────────────────────────────

def main():
    parser = argparse.ArgumentParser(description="Campaign pacing alert")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    args = parser.parse_args()

    now = now_local()
    # %-d and %-I (no zero padding) are glibc/BSD strftime extensions; they are not supported on Windows
    date_str = now.strftime("%a %b %-d")
    time_str = now.strftime("%-I:%M %p") + " " + TZ_LABEL

    alerts = []

    # Fetch data
    pipeline_stats, pipeline_err = get_pipeline_stats()
    recruiting_stats, recruiting_err = get_recruiting_stats()
    outbound_summary = get_campaigns_summary(OUTBOUND_CAMPAIGNS)
    recruiting_campaign_summary = get_campaigns_summary(RECRUITING_CAMPAIGNS)

    if args.json:
        output = {
            "timestamp": now.isoformat(),
            "pipeline": pipeline_stats or {"error": pipeline_err},
            "recruiting": recruiting_stats or {"error": recruiting_err},
            "outbound_campaigns": outbound_summary,
            "recruiting_campaigns": recruiting_campaign_summary,
        }
        print(json.dumps(output, indent=2, default=str))
        has_alerts = pipeline_err or recruiting_err or outbound_summary["any_issue"]
        sys.exit(1 if has_alerts else 0)

    lines = [f"⚠️ *Pacing Alert — {date_str} {time_str}*", ""]

    # ── Pipeline / Outbound ──
    if pipeline_err:
        p_icon = "🔴"
        p_line = f"API error: {pipeline_err}"
        alerts.append(f"Pipeline API error: {pipeline_err}")
    else:
        pt = pipeline_stats["today_total"]
        pa = pipeline_stats["today_approved"]
        ps = pipeline_stats["today_sent"]
        p_icon = pipeline_pace(pt, outbound_summary)
        p_line = f"{pt} leads staged today | {pa} approved | {ps} sent"
        if pt == 0:
            alerts.append("Pipeline: 0 leads staged today")

    lines.append(f"{p_icon} 📧 *Outbound Pipeline:*")
    lines.append(f"• {p_line}")
    lines.append(f"• Campaigns: {campaign_line(outbound_summary)}")
    lines.append("")

    # ── Recruiting / Sourcing ──
    if recruiting_err:
        r_icon = "🔴"
        r_line = f"API error: {recruiting_err}"
        alerts.append(f"Recruiting API error: {recruiting_err}")
    else:
        rt = recruiting_stats["today_total"]
        rw = recruiting_stats["week_total"]
        r_icon = recruiting_pace(rw, recruiting_campaign_summary)
        r_line = f"{rt} candidates added today | {rw} this week | target: {WEEKLY_CANDIDATE_TARGET}/week"
        if rw < WEEKLY_CANDIDATE_TARGET // 4:
            alerts.append(f"Recruiting: only {rw} candidates this week (target {WEEKLY_CANDIDATE_TARGET})")

    lines.append(f"{r_icon} 🔍 *Recruiting Pipeline:*")
    lines.append(f"• {r_line}")
    lines.append(f"• Campaigns: {campaign_line(recruiting_campaign_summary)}")

    print("\n".join(lines))

    if alerts:
        sys.exit(1)
    else:
        sys.exit(0)

if __name__ == "__main__":
    main()
2
growth-engine/requirements.txt
Normal file
@@ -0,0 +1,2 @@
numpy>=1.24.0
scipy>=1.10.0
28
outbound-engine/.env.example
Normal file
@@ -0,0 +1,28 @@
# Apollo People Search API
APOLLO_API_KEY=your_apollo_api_key_here

# LeadMagic Email Verification API
LEADMAGIC_API_KEY=your_leadmagic_api_key_here

# Instantly Cold Email Platform
INSTANTLY_API_KEY=your_instantly_api_key_here

# Email Sending (for cold-outbound-sender.py)
SENDER_EMAIL=you@yourdomain.com
SENDER_NAME=Your Name
SMTP_HOST=smtp.gmail.com
SMTP_PORT=587
SMTP_USER=you@yourdomain.com
SMTP_PASSWORD=your_app_password_here

# Competitive Monitor (optional)
# Path to a JSON file defining your competitors
# See scripts/competitive-monitor.py for the expected format
COMPETITORS_CONFIG=./competitors.json

# Cross-Signal Detector (optional)
DATA_DIR=./data/agent-outputs
OUTPUT_FILE=./data/cross-signals-latest.json

# Cross-Signal Detector: comma-separated words to exclude from company extraction
# SIGNAL_STOP_WORDS=YourCompany,InternalTool
159
outbound-engine/README.md
Normal file
@@ -0,0 +1,159 @@
# AI Outbound Engine

From ICP definition to emails in inbox — fully automated cold outbound.

This skill category handles the complete cold outbound pipeline: defining your ideal customer profile, writing expert-scored email sequences, sourcing and verifying leads, deduplicating against existing campaigns, uploading to your email platform, and monitoring the competitive landscape.

## What's Inside

### 🎯 Cold Outbound Optimizer (`SKILL.md`)
Full campaign design workflow:
- **ICP Definition** — structured template to define exactly who you're targeting
- **Infrastructure Audit** — pulls sending account inventory, warmup scores, and capacity math from Instantly
- **Expert Panel Scoring** — 10 simulated outbound experts score your copy (recursive until 90+/100)
- **Sequence Copywriting** — subject lines, body copy, follow-ups, breakup emails — all Instantly-ready
- **Capacity Planning** — accounts × daily limits = pipeline projections
- **Implementation Docs** — step-by-step launch plan

Supports both "start from scratch" and "optimize existing campaigns" modes.

### 📥 Lead Pipeline (`scripts/lead-pipeline.py`)
End-to-end lead sourcing:
1. **Apollo People Search** — pull leads matching your ICP criteria
2. **LeadMagic Verification** — validate every email before sending
3. **Deduplication** — check against existing Instantly leads + exclusion lists
4. **Upload to Instantly** — batch upload with rate limiting and retry logic

### 🔍 Competitive Monitor (`scripts/competitive-monitor.py`)
Track competitors automatically:
- Pricing page change detection (diff-based)
- Blog post monitoring for recent content
- Generates weekly competitive intelligence reports
- Configurable competitor list — add any company you want to track

### 🔗 Cross-Signal Detector (`scripts/cross-signal-detector.py`)
Find overlapping signals across multiple data sources:
- Company overlap across SEO, sales, and outbound data
- Vertical alignment detection
- Keyword cluster correlation
- Confidence-scored recommendations for coordinated action

### 📧 Cold Outbound Sender (`scripts/cold-outbound-sender.py`)
Sends approved outbound emails:
- Reads from an approved prospects JSON file
- Daily send limits (configurable)
- Full send history tracking
- Dry-run mode for testing

### 🔧 Instantly Audit (`scripts/instantly-audit.py`)
Pull campaign health data from the Instantly v2 API:
- Sending account inventory and warmup scores
- Campaign performance (open rate, reply rate, positive reply rate)
- Capacity math (conservative vs aggressive projections)
- Flags: low warmup scores, underperforming campaigns, blockers

## Quick Start

### 1. Set Up Environment Variables

```bash
cp .env.example .env
# Fill in your API keys
```

### 2. Install Dependencies

```bash
pip install -r requirements.txt
```

### 3. Run the Lead Pipeline

```bash
python3 scripts/lead-pipeline.py \
  --titles "VP Marketing,CMO,Head of Growth" \
  --industries "SaaS,Marketing" \
  --company-size "11,50" \
  --locations "United States" \
  --campaign-id "YOUR_CAMPAIGN_UUID" \
  --volume 500 \
  --dry-run
```

### 4. Audit Your Instantly Account

```bash
python3 scripts/instantly-audit.py --output report.md
```

### 5. Monitor Competitors

```bash
python3 scripts/competitive-monitor.py --output report.md
```

### 6. Detect Cross-Signals

```bash
python3 scripts/cross-signal-detector.py \
  --data-dir ./data/agent-outputs \
  --output cross-signals.json
```

## Architecture

```
ICP Definition
     │
     ▼
Expert Panel Scoring (recursive → 90+)
     │
     ▼
Apollo Search → LeadMagic Verify → Dedupe → Instantly Upload
     │                                        │
     ▼                                        ▼
Competitive Monitor ◄──────────────► Cross-Signal Detector
     │
     ▼
Weekly Intelligence Report
```

## File Structure

```
outbound-engine/
├── README.md                      # This file
├── SKILL.md                       # Claude Code skill definition
├── .env.example                   # Environment variable template
├── requirements.txt               # Python dependencies
├── scripts/
│   ├── lead-pipeline.py           # Apollo → LeadMagic → Dedupe → Instantly
│   ├── instantly-audit.py         # Instantly account health check
│   ├── competitive-monitor.py     # Competitor tracking
│   ├── cross-signal-detector.py   # Multi-source signal detection
│   └── cold-outbound-sender.py    # Send approved outbound emails
└── references/
    ├── expert-panel.md            # Default 10-expert scoring roster
    ├── copy-rules.md              # Cold email copywriting rules
    ├── icp-template.md            # ICP data collection template
    └── instantly-rules.md         # Instantly variable syntax & deliverability rules
```

## Requirements

- Python 3.9+
- API keys: Apollo, LeadMagic, Instantly (see `.env.example`)
- For the sender script: a configured email sending tool (e.g., `gog` CLI or SMTP)
- Claude Code or similar AI coding agent (for running the SKILL.md workflow)

## Customization

- **ICP**: Edit `references/icp-template.md` or provide parameters at runtime
- **Expert Panel**: Swap panelists in `references/expert-panel.md` for your industry
- **Competitors**: Configure the `COMPETITORS` dict in `competitive-monitor.py`
- **Send limits**: Adjust `MAX_PER_DAY` in `cold-outbound-sender.py`
- **Data sources**: Point `cross-signal-detector.py` at your own data directories

## License

MIT
157
outbound-engine/SKILL.md
Normal file
@@ -0,0 +1,157 @@
---
name: cold-outbound-optimizer
description: Design, analyze, and optimize cold outbound email campaigns for Instantly. Handles end-to-end ICP definition, expert panel scoring (recursive to 90+), sequence copywriting, infrastructure audit, capacity planning, and implementation docs. Use when asked to build cold outbound sequences, optimize cold email, analyze outbound campaigns, build sales sequences, build Instantly sequences, create cold outbound strategies, or design email campaigns. Supports both "start from scratch" and "optimize existing" modes.
---

# Cold Outbound Optimizer

---

## Startup: Determine Mode

Ask the user:
1. Do you have an **existing Instantly account** with campaigns to audit, or are you **starting from scratch**?
2. Do you have an **Instantly API key**? (Required for audit mode.)

If API key provided → run `scripts/instantly-audit.py` to pull campaigns, account inventory, and warmup scores before proceeding.

---

## Phase 1: Discovery & Audit

### 1A — Infrastructure Check (if API key available)
Run `python3 scripts/instantly-audit.py --api-key <KEY>` and report:
- Active campaigns (name, status, reply rate, open rate)
- Sending accounts (count, warmup score, daily limit)
- Domain inventory
- Warmup gaps: any account with score <80 or <14 days warmup → flag as NOT ready
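
The readiness rule in the last bullet can be expressed as a small filter. This is only a sketch: the `Account` class and its field names (`email`, `warmup_score`, `warmup_days`) are hypothetical and do not reflect the actual output of `scripts/instantly-audit.py` or the Instantly API.

```python
from dataclasses import dataclass

MIN_WARMUP_SCORE = 80   # threshold from the warmup-gap rule
MIN_WARMUP_DAYS = 14    # threshold from the warmup-gap rule


@dataclass
class Account:
    # Hypothetical field names, chosen for illustration only.
    email: str
    warmup_score: int
    warmup_days: int


def is_ready(account: Account) -> bool:
    """An account is ready only if it meets both thresholds."""
    return (account.warmup_score >= MIN_WARMUP_SCORE
            and account.warmup_days >= MIN_WARMUP_DAYS)


def split_by_readiness(accounts):
    """Return (ready, not_ready) lists, preserving input order."""
    ready = [a for a in accounts if is_ready(a)]
    not_ready = [a for a in accounts if not is_ready(a)]
    return ready, not_ready
```

Only the `ready` list should feed the capacity math; everything in `not_ready` gets flagged in the report.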

### 1B — Performance Data
- Pull campaign analytics from Instantly
- Ask: "Do you have a spreadsheet with historical outbound data?" If yes, request link.

### 1C — ICP Definition
If no ICP defined, collect:
- **Titles:** Who are you targeting? (e.g., VP Marketing, Head of Growth)
- **Industries:** Which verticals?
- **Company size:** Employee count or revenue range?
- **Revenue floor:** Minimum ARR/revenue to qualify?
- **Anti-ICP:** Who to explicitly exclude?

Use `references/icp-template.md` as the collection template.

### 1D — Business Context
Collect:
- What do you sell? (One sentence, no jargon)
- What's the primary offer? (Free trial, audit, demo, consultation)
- Real URLs to reference (pricing page, case studies, relevant content)
- Any proof points? (Client results, stats, social proof)

### 1E — Expert Panel Config
Default: 10 experts (see `references/expert-panel.md`).
Ask: "Any industry-specific experts to add, or panelists to swap?" Confirm roster before scoring.

---

## Phase 2: Expert Panel Recursive Scoring

**Target: 90/100. Non-negotiable. Iterate until reached.**

### Round Structure
Each round produces:
1. **Score table** — all 10 panelists, individual score (0-100), one-line rationale
2. **Aggregate score** — average of all 10
3. **Top weaknesses** — ranked list of what's holding the copy back
4. **Changes made** — specific edits addressing each weakness
5. **Updated copy** — full revised sequence after changes

### Scoring Criteria (per panelist's lens — see `references/expert-panel.md`)
- Subject line curiosity / open rate potential
- First sentence pattern interrupt
- Body clarity and brevity
- CTA softness and specificity
- Sequence flow and follow-up logic
- Deliverability risk signals (spam words, link density)
- Personalization believability

### Rules
- Scores must be brutally honest. No padding to 90 without earning it.
- If round score < 90: identify top 3 weaknesses, revise copy, run next round.
- If round score ≥ 90: finalize copy and proceed to deliverables.
- Show every round in the final doc — the iteration trail is part of the value.
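
The round structure and rules above amount to a score-revise loop. A minimal sketch follows, assuming two hypothetical callbacks: `score_fn` (returns one score per panelist) and `revise_fn` (returns revised copy). The `max_rounds` cap is a safety limit added for this sketch; the workflow itself does not define one.

```python
from statistics import mean

TARGET_SCORE = 90  # pass threshold stated in the rules


def aggregate(scores):
    """Average the individual panelist scores (each 0-100)."""
    return mean(scores)


def run_rounds(copy, score_fn, revise_fn, max_rounds=10):
    """Score, revise, and repeat until the aggregate reaches TARGET_SCORE.

    score_fn(copy) -> list of panelist scores (hypothetical callback).
    revise_fn(copy, scores) -> revised copy (hypothetical callback).
    Returns (final_copy, history); history keeps every round so the
    full iteration trail can be shown in the final doc.
    """
    history = []
    for round_no in range(1, max_rounds + 1):
        scores = score_fn(copy)
        total = aggregate(scores)
        history.append({"round": round_no, "scores": scores, "aggregate": total})
        if total >= TARGET_SCORE:
            break
        copy = revise_fn(copy, scores)
    return copy, history
```

Keeping `history` as a list of per-round records mirrors the rule that every round appears in the deliverable, not just the final one.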

---

## Phase 3: Deliverables

### Strategy Doc
Create a document (Google Doc, Notion, or markdown) with:

1. **Pre-Analysis / Brutal Truth** — what the existing campaigns are doing wrong (or baseline if starting from scratch)
2. **ICP Summary** — confirmed targeting parameters
3. **Infrastructure Status** — account inventory, warmup readiness, capacity math
4. **Scoring Rounds** — full panel vote tables for every round
5. **Final Email Copy** — all steps for all campaigns, Instantly-ready format
6. **Implementation Plan** — step-by-step setup instructions
7. **Capacity Math** — accounts × daily send rate = pipeline projections
8. **Weekly Metrics Targets** — open rate, reply rate, positive reply rate, meetings booked
9. **STOP List** — what to kill immediately
10. **START List** — what to launch first

### Format Rules for Final Copy
Follow all rules in `references/instantly-rules.md` and `references/copy-rules.md`.

### Human Review Gate
**Do NOT push anything to Instantly automatically.** The doc is for human review. Get explicit approval before any API writes.

### Iteration
After review, collect feedback and re-run scoring on revised copy if needed.

---

## Capacity Math Formula

```
Accounts ready (score ≥80, ≥14 days warmup) × 30 emails/day = conservative daily volume
Accounts ready × 50 emails/day = aggressive daily volume
Daily volume × 22 working days = monthly send capacity
Monthly sends × expected reply rate = expected replies
Expected replies × qualification rate = pipeline opportunities
```
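
As a worked example, the formula can be walked top to bottom in a few lines of Python. The inputs used here (10 ready accounts, a 3% reply rate, a 25% qualification rate) are illustrative assumptions, not benchmarks from this skill.

```python
WORKING_DAYS_PER_MONTH = 22  # from the formula above


def capacity_projection(ready_accounts, emails_per_day, reply_rate, qualification_rate):
    """Apply each line of the formula and return every intermediate value."""
    daily_volume = ready_accounts * emails_per_day
    monthly_sends = daily_volume * WORKING_DAYS_PER_MONTH
    expected_replies = monthly_sends * reply_rate
    opportunities = expected_replies * qualification_rate
    return {
        "daily_volume": daily_volume,
        "monthly_sends": monthly_sends,
        "expected_replies": expected_replies,
        "opportunities": opportunities,
    }


# Illustrative inputs only: 10 ready accounts, 3% reply rate, 25% qualification rate.
conservative = capacity_projection(10, 30, reply_rate=0.03, qualification_rate=0.25)
aggressive = capacity_projection(10, 50, reply_rate=0.03, qualification_rate=0.25)
```

With these inputs the conservative case works out to 300 emails/day and 6,600 sends/month, and the aggressive case to 500 emails/day and 11,000 sends/month.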

---

## Weekly Metrics Targets (Baselines)

| Metric | Good | Great |
|--------|------|-------|
| Open rate | 40%+ | 60%+ |
| Reply rate | 3%+ | 7%+ |
| Positive reply rate | 1%+ | 3%+ |
| Meeting rate | 0.5%+ | 1.5%+ |

Adjust targets based on niche and offer. Cold traffic to a free audit converts differently than a paid trial.

---

## Add-On Recommendations (mention but don't build)

- **LinkedIn automation:** HeyReach or similar for multi-channel sequences. Separate workflow.
- **Lead enrichment:** Clay or Apollo for personalization data before upload.
- **Lead pipeline:** Use `scripts/lead-pipeline.py` for Apollo → LeadMagic → Instantly automation.

---

## Reference Files

| File | Purpose |
|------|---------|
| `references/instantly-rules.md` | Variable syntax, sequence structure, deliverability rules |
| `references/expert-panel.md` | Default 10-expert roster with scoring lenses |
| `references/copy-rules.md` | Email copy rules (first sentence, CTA, stats framing) |
| `references/icp-template.md` | ICP data collection template |
| `scripts/instantly-audit.py` | Pulls campaigns, accounts, warmup scores via Instantly v2 API |
| `scripts/lead-pipeline.py` | End-to-end lead sourcing pipeline |
| `scripts/competitive-monitor.py` | Competitor tracking and intelligence |
| `scripts/cross-signal-detector.py` | Multi-source signal detection |
| `scripts/cold-outbound-sender.py` | Send approved outbound emails |
139
outbound-engine/references/copy-rules.md
Normal file
@@ -0,0 +1,139 @@
# Cold Email Copy Rules

Rules for writing and evaluating cold email copy. Apply to every step in every sequence.

---

## First Sentence Rules

**NEVER start with:**
- "I" — e.g., "I came across your company..."
- "We" — e.g., "We help companies like yours..."
- "Our team" — e.g., "Our team specializes in..."
- "I wanted to" — e.g., "I wanted to reach out because..."
- "Hope this finds you well" or any version of it
- "My name is..." (save for follow-ups if needed, never Step 1)

**ALWAYS start with one of:**
- **Prospect's company name** — "{{companyName}}'s recent..."
- **A specific market observation** — "Most [industry] companies we talk to are..."
- **A specific finding** — "Your [blog post / LinkedIn post / job listing] on..."
- **A relevant trend** — "Since [relevant thing] happened in [industry]..."

The first sentence earns the second. If it doesn't make the prospect think "hm, relevant," the email is dead.
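
The NEVER list lends itself to an automated pre-check before copy goes to the panel. The sketch below is a rough heuristic, not part of the skill's scripts: it matches only the literal phrases listed, so paraphrases of "Hope this finds you well" still need a human eye.

```python
import re

# Openers banned by the rules above, matched case-insensitively at the start.
# "I wanted to" is already covered by the bare "I" pattern.
BANNED_OPENERS = (
    r"i\b",
    r"we\b",
    r"our team\b",
    r"hope this finds you well",
    r"my name is\b",
)
_BANNED_RE = re.compile(r"^\s*(?:" + "|".join(BANNED_OPENERS) + ")", re.IGNORECASE)


def has_banned_opener(email_body: str) -> bool:
    """Return True if the email starts with one of the banned openers."""
    return bool(_BANNED_RE.match(email_body))
```

The word-boundary anchors keep legitimate openers such as "Is this relevant?" or "Website traffic..." from being flagged just because they begin with "I" or "We".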

---

## Body Length Rules

| Step | Max sentences | Notes |
|------|--------------|-------|
| Step 1 | 3 sentences | Open + value + CTA. That's it. |
| Steps 2-4 | 3-5 sentences | Add new angle or asset, not a repeat |
| Step 5 (bump) | 1-2 sentences | Short. "Still relevant?" style. |
| Step 6 (breakup) | 2-3 sentences | Leave value, don't close your file. |

If a step is longer than this, cut it. Ruthlessly.
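
A rough length check against the table could look like the following. The sentence splitter is deliberately naive (it breaks on `.`, `!`, or `?` followed by whitespace), so abbreviations such as "e.g." will inflate the count; treat it as a first-pass filter rather than a definitive measure.

```python
import re

# Upper bound on sentences per step, taken from the table above.
MAX_SENTENCES = {1: 3, 2: 5, 3: 5, 4: 5, 5: 2, 6: 3}


def count_sentences(text: str) -> int:
    """Rough count: split on ., ! or ? followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return len([p for p in parts if p.strip()])


def is_within_limit(step: int, text: str) -> bool:
    """True if the step's body does not exceed its maximum sentence count."""
    return count_sentences(text) <= MAX_SENTENCES[step]
```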

---

## Stats and Social Proof

**Correct framing (observation):**
> "Most brands we audit are leaving 30-40% of their SEO traffic unconverted."

**Incorrect framing (study-like claim):**
> "According to our data, 73% of brands have this problem."

Why: Observation sounds like earned experience. Study sounds like a marketing claim. Prospects believe the former.

**Never fabricate:**
- Specific client names unless verified and approved
- Revenue numbers or % improvements unless you have the actual data
- Podcast episodes or content references unless they exist and are linkable
- Case study specifics — if you can't verify it, generalize it

---

## CTAs

**Soft asks (preferred):**
- "Worth a look?"
- "Want the data?"
- "Does this match what you're seeing?"
- "Relevant to what you're working on?"
- "Happy to share what we found — useful?"

**Hard asks (avoid in Step 1):**
- "Book a call with me" → too much commitment too early
- "Schedule 30 minutes" → presumes interest
- "Let's hop on a call" → pushy
- "Are you free Thursday?" → too forward for a stranger

Use hard asks only in Step 4+ if you've gotten engagement signals. Even then, soften them.

---

## Links

- **Step 1:** No links (deliverability + trust)
- **Steps 2-3:** Max 1 link, only if it adds genuine value (a case study, a report, a tool)
- **Breakup email:** Include 1 real link to genuinely useful content (not a sales page)
- **Never:** Hallucinate URLs. All links must be verified real pages before use.
- **Never:** Link to a landing page with a form in Steps 1-2 — it signals spray-and-pray
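
The per-step link caps can be checked mechanically. This sketch only counts `http(s)` URLs per step; it does not verify that a URL is real or that it points to a non-sales page, which still requires manual review. Steps without a stated cap are left unconstrained here, which is an assumption of this sketch.

```python
import re

URL_RE = re.compile(r"https?://\S+")

# Link caps stated above. Steps not listed there are left unconstrained.
MAX_LINKS = {1: 0, 2: 1, 3: 1}


def link_violations(steps):
    """steps maps step number -> body text. Returns a list of problem strings."""
    problems = []
    for step, body in sorted(steps.items()):
        n = len(URL_RE.findall(body))
        cap = MAX_LINKS.get(step)
        if cap is not None and n > cap:
            problems.append(f"Step {step}: {n} link(s), max {cap}")
    return problems
```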

---

## Breakup Email (Final Step)

**Correct:**
> Leave something genuinely useful. A real article, a real report, a real piece of content that relates to their problem.
> "In case it's useful regardless — here's the framework we use: [real URL]. No pressure on the rest."

**Incorrect:**
> "Just wanted to close the loop / closing your file / marking you as not interested"
> This is negative framing and slightly manipulative. The prospect notices.

---

## AI Engine References

When listing AI tools in copy or messaging, always include the full set:
**ChatGPT, Perplexity, Gemini, Claude**

Do not omit any major AI platform. If listing "AI tools" or "AI search engines," include all four.

---

## Personalization Rules

- `{{personalization}}` field: must be set per lead. Don't leave it generic.
- Personalization should reference something *specific* to the company: a recent hire, a published piece, a product launch, a job listing signal, a funding round.
- If you can't personalize at least 50% of the list, remove `{{personalization}}` from the template and rewrite to not depend on it.
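
The 50% rule can be checked before upload. A minimal sketch, assuming each lead is a dict with an optional `personalization` key; that key name is borrowed from the template variable and is not a confirmed field in the pipeline's lead format.

```python
MIN_COVERAGE = 0.5  # the 50% threshold from the rule above


def personalization_coverage(leads):
    """Fraction of leads with a non-empty 'personalization' value."""
    if not leads:
        return 0.0
    filled = sum(1 for lead in leads if (lead.get("personalization") or "").strip())
    return filled / len(leads)


def should_keep_placeholder(leads) -> bool:
    """True if enough leads are personalized to keep {{personalization}}."""
    return personalization_coverage(leads) >= MIN_COVERAGE
```

Note that this only measures whether the field is filled in, not whether the content is specific; a generic line in every row would pass the check and still break the first rule.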

---

## Subject Lines

- Length: 3-7 words is the sweet spot
- No exclamation points
- No all-caps
- No emoji in B2B cold email (unless targeting a persona that expects it)
- Best patterns:
  - Question: "Quick question, {{firstName}}"
  - Observation: "{{companyName}}'s content strategy"
  - Specificity: "Saw your post on [topic]"
  - Intrigue: "One thing we noticed"
- A/B test 2 variants per Step 1. Pick winner after 100+ sends each.
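
The mechanical subject-line rules (length, exclamation points, all-caps) can be linted automatically. This sketch skips the emoji rule, since that one depends on the persona, and counts a template variable such as `{{firstName}}` as one word, which is an assumption.

```python
def subject_line_issues(subject: str):
    """Return the rules from the list above that this subject line breaks."""
    issues = []
    words = subject.split()
    if not 3 <= len(words) <= 7:
        issues.append("length outside 3-7 words")
    if "!" in subject:
        issues.append("contains exclamation point")
    letters = [ch for ch in subject if ch.isalpha()]
    if letters and all(ch.isupper() for ch in letters):
        issues.append("all caps")
    return issues
```

An empty list means the subject line passes the mechanical checks; it says nothing about whether the line is actually curiosity-inducing.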

---

## Tone

- Peer-to-peer, not vendor-to-prospect
- Curious, not desperate
- Specific, not generic
- Short, not comprehensive
- Human, not corporate

If it sounds like a marketing email, rewrite it. Cold email that converts sounds like a text from a knowledgeable peer.
89
outbound-engine/references/expert-panel.md
Normal file
89
outbound-engine/references/expert-panel.md
Normal file
|
|
@ -0,0 +1,89 @@
|
||||||
|
# Expert Panel — Default Roster
|
||||||
|
|
||||||
|
10 outbound sales experts. Each scores copy through their specific lens.
|
||||||
|
User can swap or add panelists based on industry or offer type.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Default Panel
|
||||||
|
|
||||||
|
### 1. Alex Berman
|
||||||
|
**Background:** Cold email master, B2B agency lead gen. $100M+ pipeline generated via cold outreach.
|
||||||
|
**Scoring lens:** Raw reply rate potential. Does this email get a "yes" or "tell me more"? Evaluates offer clarity, brevity, and specificity.
|
||||||
|
**Red flags he catches:** Vague value props, over-explaining, walls of text.
|
||||||
|
|
||||||
|
### 2. Oren Klaff
|
||||||
|
**Background:** Author of *Pitch Anything*. Neuromarketing and frame control specialist.
|
||||||
|
**Scoring lens:** Frame and status. Does this email position the sender as high-status? Is there genuine scarcity or social proof? Does it trigger "I need to respond to this"?
|
||||||
|
**Red flags he catches:** Begging energy, "I just wanted to...", weak positioning.
|
||||||
|
|
||||||
|
### 3. Josh Braun
|
||||||
|
**Background:** *Badass B2B Growth*. Anti-spam cold email philosophy.
|
||||||
|
**Scoring lens:** Does this email respect the prospect? Is it genuinely useful or just noise? Evaluates honest curiosity, relevant observations, and non-pushy CTAs.
|
||||||
|
**Red flags he catches:** Fake personalization, presumptuous CTAs, spray-and-pray signals.
|
||||||
|
|
||||||
|
### 4. Becc Holland
**Background:** Creator of "Flip the Script." Pattern interrupt specialist.
**Scoring lens:** Does Step 1 stop the scroll? Is the opening surprising enough to earn the next sentence? Evaluates subject line + first sentence combo.
**Red flags she catches:** Generic openers, "I hope this finds you well", predictable subject lines.

### 5. Sam McKenna
**Background:** #samsales. "Show Me You Know Me" methodology.
**Scoring lens:** Research depth. Does the email prove the sender actually knows this prospect? Evaluates specificity of personalization and relevance of observation.
**Red flags she catches:** Generic compliments, surface-level research, "I noticed your website..."

### 6. Kyle Coleman
**Background:** Copy.ai VP Marketing. B2B sequencing strategy.
**Scoring lens:** Sequence architecture. Does the follow-up ladder make sense? Does each step add new value rather than just bumping? Evaluates sequence logic and escalation.
**Red flags he catches:** Repetitive follow-ups, "just checking in", no value escalation.

### 7. Will Allred
**Background:** Lavender co-founder. Reply rate optimization via AI-assisted email analysis.
**Scoring lens:** Readability and reply rate signals. Reading grade level, sentence length, mobile rendering, emotional tone. Does this feel like a real email from a real person?
**Red flags he catches:** Long sentences, passive voice, corporate jargon, "synergies".

### 8. Jeremy Donovan
**Background:** SalesLoft SVP of Revenue Strategy. Data-driven deliverability and analytics.
**Scoring lens:** Deliverability and measurability. Are there spam triggers? Is the send structure safe? Are metrics targets realistic?
**Red flags he catches:** Spam words, link overload, unrealistic reply rate expectations.

### 9. Jeb Blount
**Background:** Author of *Fanatical Prospecting*. Multi-channel outbound systems.
**Scoring lens:** Pipeline math and multi-channel logic. Is the sequence volume sufficient? Should LinkedIn or phone be layered in? Is the outreach sustainable?
**Red flags he catches:** Under-resourced sequences, single-channel dependency, no follow-through plan.

### 10. Patrick Dang
**Background:** B2B sales coach. Email + LinkedIn combo plays.
**Scoring lens:** LinkedIn integration potential. Does the email sequence have natural LinkedIn touchpoints? Is the overall outreach strategy connected across channels?
**Red flags he catches:** Siloed email sequences with no social proof layer, no profile warmup.

---

## Scoring Table Format (per round)

| Panelist | Score | Rationale |
|----------|-------|-----------|
| Alex Berman | XX | [one-line reason] |
| Oren Klaff | XX | [one-line reason] |
| Josh Braun | XX | [one-line reason] |
| Becc Holland | XX | [one-line reason] |
| Sam McKenna | XX | [one-line reason] |
| Kyle Coleman | XX | [one-line reason] |
| Will Allred | XX | [one-line reason] |
| Jeremy Donovan | XX | [one-line reason] |
| Jeb Blount | XX | [one-line reason] |
| Patrick Dang | XX | [one-line reason] |
| **AVERAGE** | **XX** | |

---

## Swapping / Adding Panelists

User may request panelist changes. Examples:
- Selling to HR → add Lou Adler (hiring-focused B2B sales)
- Selling SaaS dev tools → add Jason Lemkin (SaaS-specific outbound)
- Selling to enterprise → add John Barrows (enterprise sales methodology)
- Selling to e-commerce → add Ezra Firestone (e-com marketing lens)

When adding panelists, define their scoring lens before running rounds.
Minimum panel size: 5. Maximum: 15 (more than 15 creates noise, not signal).
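
The AVERAGE row of the scoring table and the panel-size limits can be tallied with a small helper. This is a minimal sketch and not part of the repository: the function name `average_panel_score`, the constants `MIN_PANEL` and `MAX_PANEL`, and the 0-100 score scale are assumptions made for illustration.

```python
from statistics import mean

MIN_PANEL = 5   # minimum panel size stated above
MAX_PANEL = 15  # maximum panel size stated above


def average_panel_score(scores: dict[str, float]) -> float:
    """Return the AVERAGE row value for one scoring round.

    `scores` maps a panelist name to that panelist's score
    (a 0-100 scale is assumed here; the source only shows "XX").
    """
    if not MIN_PANEL <= len(scores) <= MAX_PANEL:
        raise ValueError(
            f"panel has {len(scores)} members; expected {MIN_PANEL}-{MAX_PANEL}"
        )
    return round(mean(scores.values()), 1)


if __name__ == "__main__":
    round_one = {
        "Alex Berman": 82, "Oren Klaff": 74, "Josh Braun": 88,
        "Becc Holland": 79, "Sam McKenna": 85,
    }
    print(average_panel_score(round_one))
```

Rejecting panels outside the 5-15 range up front keeps an undersized or oversized roster from silently producing an average.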
160
outbound-engine/references/icp-template.md
Normal file
@ -0,0 +1,160 @@
# ICP Data Collection Template

Use this template when defining the Ideal Customer Profile. Collect all fields before writing copy.

---

## ICP Definition

**Client/Campaign:** _______________
**Date:** _______________
**Collected by:** _______________

---

### Target Titles
Who specifically receives these emails? List primary and secondary titles.

**Primary titles (high intent):**
- e.g., VP of Marketing
- e.g., Director of Demand Generation
- e.g., Head of Growth

**Secondary titles (acceptable, lower priority):**
- e.g., CMO (at smaller companies)
- e.g., Marketing Manager (if company size <50)

**Never target:**
- e.g., Coordinators, Interns, Assistants (unless specifically requested)

---

### Target Industries / Verticals

**Primary verticals:**
1.
2.
3.

**Secondary verticals (test, not primary):**
1.
2.

**Excluded verticals (anti-ICP):**
- e.g., Non-profits (budget constraints)
- e.g., Government (procurement timelines)
- e.g., [Specific vertical you can't serve]

---

### Company Size

**Employee count range:**
- Minimum: ___
- Maximum: ___
- Sweet spot: ___

**Revenue range (if targeting by revenue):**
- Minimum ARR/Revenue: $___
- Maximum: $___

**Funding stage (if relevant):**
- e.g., Series A+
- e.g., Bootstrapped >$5M revenue
- e.g., PE-backed

---

### Geographic Targeting

**Primary markets:**
- e.g., US only
- e.g., US + Canada
- e.g., English-speaking markets

**Excluded regions:**
- e.g., APAC (different sales motion)

---

### Buying Signals / Trigger Events

What makes a company more likely to buy right now?

- e.g., Recently hired a new VP Marketing (job posting signal)
- e.g., Raised funding in last 6 months
- e.g., Launched new product in last 90 days
- e.g., Running paid search (visible via SpyFu/SemRush)
- e.g., Job listings for [role] signal they need help

---

### Anti-ICP (Explicit Exclusions)

Who should never receive these emails?

**Company characteristics:**
- e.g., <10 employees (too small, no budget)
- e.g., Bootstrapped and not scaling
- e.g., Already a current client

**Contact characteristics:**
- e.g., No verified email (bounce risk)
- e.g., Missing firstName (won't personalize)
- e.g., Opt-out list

---

### Offer-to-ICP Fit

**What's the primary offer?**
- [ ] Free audit
- [ ] Free trial
- [ ] Demo
- [ ] Strategy call
- [ ] Content/report download
- [ ] Other: _______________

**Why this offer for this ICP?**
(One sentence; if you can't answer this, the offer needs rethinking)

---

### Known Objections

What does this ICP typically say no to?

1.
2.
3.

**How we address objections in copy:**
(Don't address all of them: pick the one that kills the most deals and neutralize it in Step 3 or 4)

---

### Personalization Data Available

What data fields are available per lead?

- [ ] firstName ✓ (required)
- [ ] companyName ✓ (required)
- [ ] personalization field, source: _______________
- [ ] Industry
- [ ] Employee count
- [ ] LinkedIn URL
- [ ] Other: _______________

**Personalization source:**
- e.g., Clay enrichment
- e.g., Apollo export
- e.g., Manual research (for small lists)
- e.g., None (template must work without it)

---

### Notes / Special Instructions

Any other context the copywriter needs:

_______________
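
Once the template is filled in, a few of its fields can be applied to a lead list mechanically. The sketch below is illustrative only: `IcpRules`, `passes_icp`, and the lead keys `employees` and `industry` are hypothetical names, and only the company-size, excluded-vertical, and contact exclusions from the template are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class IcpRules:
    """Hypothetical container for a few fields from the template above."""
    min_employees: int
    max_employees: int
    # Store verticals and emails in lowercase; lookups below lowercase the lead.
    excluded_verticals: set[str] = field(default_factory=set)
    opt_out_emails: set[str] = field(default_factory=set)


def passes_icp(lead: dict, rules: IcpRules) -> bool:
    """Return True when a lead clears the anti-ICP exclusions."""
    if not lead.get("email") or not lead.get("firstName"):
        return False  # no email or missing firstName
    if lead["email"].lower() in rules.opt_out_emails:
        return False  # on the opt-out list
    if lead.get("industry", "").lower() in rules.excluded_verticals:
        return False  # excluded vertical
    employees = lead.get("employees", 0)
    return rules.min_employees <= employees <= rules.max_employees
```

Leads with no employee count default to 0 and are rejected, which errs on the side of not emailing unknown companies.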
79
outbound-engine/references/instantly-rules.md
Normal file
@ -0,0 +1,79 @@
# Instantly-Specific Rules

## Valid Variables (ONLY these, no others)

| Variable | Usage |
|----------|-------|
| `{{firstName\|there}}` | Prospect first name, fallback "there" |
| `{{companyName\|your company}}` | Prospect company, fallback "your company" |
| `{{personalization}}` | Custom personalization field (set per lead) |
| `{{sendingAccountFirstName}}` | Sender's first name (from sending account) |

**Never use:**
- Square-bracket placeholders like `[Competitor A]`, `[Your Company]`, `[Industry]`
- Custom variables not listed above; they won't render in Instantly

If a concept can't be expressed with valid variables, rewrite the copy so it doesn't need one.

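
A template can be checked against the variable rules above before it is loaded. This is a rough sketch, not Instantly's own validation: `find_template_problems` is a hypothetical helper, and the square-bracket pattern (a bracket followed by a capital letter) is a heuristic assumption.

```python
import re

# The four variables listed in the table above.
VALID_VARIABLES = {
    "firstName", "companyName", "personalization", "sendingAccountFirstName",
}

# Captures the variable name, ignoring an optional "|fallback" part.
_CURLY = re.compile(r"\{\{\s*([^}|]+?)\s*(?:\|[^}]*)?\}\}")
# Heuristic: placeholders such as [Your Company] start with a capital letter.
_SQUARE = re.compile(r"\[[A-Z][^\]\n]*\]")


def find_template_problems(copy: str) -> list[str]:
    """Return human-readable problems found in one email template."""
    problems = []
    for name in _CURLY.findall(copy):
        if name not in VALID_VARIABLES:
            problems.append(f"unknown variable: {{{{{name}}}}}")
    for placeholder in _SQUARE.findall(copy):
        problems.append(f"square-bracket placeholder: {placeholder}")
    return problems


if __name__ == "__main__":
    print(find_template_problems("Hi {{jobTitle}} at [Your Company]"))
```

An empty list means the copy uses only the permitted variables.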
## firstName Rule (Critical)
- **Always require firstName** during lead upload. Filter out leads without first name.
- Do NOT rely on the `|there` fallback as a design choice; it signals a bad list.
- If the list has >5% missing firstName, flag it before launch.

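
The filter and the 5% flag can be applied in one pass over the list. A minimal sketch, assuming leads are dictionaries with a `firstName` key; the function names are illustrative.

```python
MAX_MISSING_RATE = 0.05  # the 5% threshold from the rule above


def split_by_first_name(leads: list[dict]) -> tuple[list[dict], float]:
    """Drop leads without a firstName and report the share that was missing."""
    kept = [lead for lead in leads if (lead.get("firstName") or "").strip()]
    missing_rate = 1 - len(kept) / len(leads) if leads else 0.0
    return kept, missing_rate


def should_flag(missing_rate: float) -> bool:
    """True when the list needs review before launch."""
    return missing_rate > MAX_MISSING_RATE
```

Whitespace-only names are treated as missing, since they would still trigger the fallback.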
## Sequence Structure

- **Steps:** 5-6 max (not 8). Diminishing returns after 6.
- **Step delays (days after previous step):**
  - Step 1: Day 0 (immediate)
  - Step 2: 2 days after Step 1
  - Step 3: 4-7 days after Step 2
  - Step 4: 7 days after Step 3
  - Step 5: 7-14 days after Step 4
  - Step 6 (breakup): 7-14 days after Step 5

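
Reading the delays above as days after the previous step, which is what the list heading states, the calendar day of each send follows by accumulating them. A small sketch; `STEP_DELAYS` and `send_days` are illustrative names.

```python
from itertools import accumulate

# (min_delay, max_delay) in days after the previous step, for Steps 1-6.
STEP_DELAYS = [(0, 0), (2, 2), (4, 7), (7, 7), (7, 14), (7, 14)]


def send_days(delays: list[tuple[int, int]], use_max: bool = False) -> list[int]:
    """Return the cumulative day offset on which each step is sent."""
    index = 1 if use_max else 0
    return list(accumulate(delay[index] for delay in delays))


if __name__ == "__main__":
    print(send_days(STEP_DELAYS))                # [0, 2, 6, 13, 20, 27]
    print(send_days(STEP_DELAYS, use_max=True))  # [0, 2, 9, 16, 30, 44]
```

Under this reading a full six-step sequence spans roughly four to six weeks.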
## A/B Testing
- **Step 1 only:** Test 2 subject line variants (A/B)
- Don't A/B test body copy in early campaigns; isolate subject line variable first
- Winning subject line = whichever hits higher open rate at 100+ sends per variant

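
The winner rule above can be written down directly. This sketch applies only the stated rule (higher open rate once each variant has 100+ sends) and performs no statistical significance test; `pick_subject_winner` and the shape of `stats` are assumptions.

```python
from typing import Optional

MIN_SENDS_PER_VARIANT = 100  # threshold from the rule above


def pick_subject_winner(stats: dict[str, dict[str, int]]) -> Optional[str]:
    """Return the variant with the higher open rate, or None if undecided.

    `stats` maps a variant label to {"sends": int, "opens": int}.
    """
    if any(s["sends"] < MIN_SENDS_PER_VARIANT for s in stats.values()):
        return None  # not enough volume yet
    rates = {label: s["opens"] / s["sends"] for label, s in stats.items()}
    best = max(rates.values())
    winners = [label for label, rate in rates.items() if rate == best]
    return winners[0] if len(winners) == 1 else None  # a tie stays undecided
```

Returning `None` for low volume or a tie keeps the caller from promoting a variant early.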
## Signature Format
```
{{sendingAccountFirstName}}
```
- No company name, no title, no tagline, unless explicitly requested
- Keep it human. Feels like it came from a person, not a company.

## Deliverability Rules

### Send Limits
- **Safe:** 30 emails/day per account
- **Aggressive:** 50 emails/day per account (only with score 90+, warmed 30+ days)
- Never exceed 50/day per account without explicit discussion

### Warmup Requirements
- **Minimum:** 14 days warmup before first campaign
- **Minimum score:** 80+ warmup score
- Accounts below 80 or under 14 days: DO NOT add to active campaigns

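
The send limits and warmup requirements combine into a single per-account ceiling. A minimal sketch under the thresholds above; `daily_send_limit` is a hypothetical helper, and it returns the highest cap an account qualifies for, so a cautious campaign may still choose 30 for an account that qualifies for 50.

```python
def daily_send_limit(warmup_score: int, warmup_days: int) -> int:
    """Return the per-account daily cap implied by the rules above."""
    if warmup_score < 80 or warmup_days < 14:
        return 0   # not eligible for active campaigns
    if warmup_score >= 90 and warmup_days >= 30:
        return 50  # aggressive tier
    return 30      # safe tier
```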
### Domain Setup (must verify before launch)
- SPF: configured and passing
- DKIM: configured and passing
- DMARC: policy set (at minimum p=none with reporting)
- MX records: pointing correctly
- Custom tracking domain: set up in Instantly (subdomain, not root domain)

### Spam Signals to Avoid
- Words: "free", "guarantee", "no risk", "limited time", "act now", "click here"
- Excessive links (max 1 per email, ideally 0 in Steps 1-2)
- Images in cold email (never)
- HTML formatting (plain text only)
- All-caps words
- Exclamation points in subject lines

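
Several of the spam signals above are easy to lint for in plain text. The sketch below is a heuristic under stated assumptions: `spam_signals` is a hypothetical helper, phrases are matched on word boundaries (so "guaranteed" is not caught), and "all-caps word" is taken to mean three or more consecutive capitals, which will also flag acronyms such as CRM.

```python
import re

SPAM_PHRASES = ["free", "guarantee", "no risk", "limited time", "act now", "click here"]
_LINK = re.compile(r"https?://\S+", re.IGNORECASE)
_ALL_CAPS = re.compile(r"\b[A-Z]{3,}\b")


def spam_signals(subject: str, body: str) -> list[str]:
    """Return the spam signals found in one email's subject and body."""
    text = f"{subject}\n{body}"
    found = []
    for phrase in SPAM_PHRASES:
        if re.search(rf"\b{re.escape(phrase)}\b", text, re.IGNORECASE):
            found.append(f"spam phrase: {phrase}")
    if len(_LINK.findall(body)) > 1:
        found.append("more than 1 link")
    if _ALL_CAPS.search(text):
        found.append("all-caps word")
    if "!" in subject:
        found.append("exclamation point in subject")
    return found
```

Images and HTML formatting are not checked here, since the input is assumed to be plain text already.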
## Upload Requirements
Leads must have:
- `firstName` (required; filter out if missing)
- `email` (required)
- `companyName` (required for `{{companyName}}` variable)
- `personalization` (required if using `{{personalization}}` in sequence)

Validate list before upload. Bad data = bad deliverability.
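
Because two of the upload fields are required only when the sequence uses the matching variable, the check depends on the copy as well as the lead. A short sketch; `missing_fields` is an illustrative name and `sequence_text` is assumed to be the concatenated text of all steps.

```python
REQUIRED_ALWAYS = ("firstName", "email")


def missing_fields(lead: dict, sequence_text: str) -> list[str]:
    """List required fields that are blank for this lead."""
    required = list(REQUIRED_ALWAYS)
    for variable in ("companyName", "personalization"):
        # Required only if the sequence references the variable.
        if "{{" + variable in sequence_text:
            required.append(variable)
    return [name for name in required if not str(lead.get(name) or "").strip()]
```

A lead is ready to upload when the returned list is empty.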
1
outbound-engine/requirements.txt
Normal file
@ -0,0 +1 @@
requests>=2.28.0
240
outbound-engine/scripts/cold-outbound-sender.py
Normal file
@ -0,0 +1,240 @@
#!/usr/bin/env python3
"""
Cold Outbound Sender: sends approved emails via SMTP or a configured email CLI.

Reads from a JSON file of approved prospects, sends up to N/day,
logs to a history file.

Usage:
    python3 cold-outbound-sender.py [--dry-run] [--max N]
    python3 cold-outbound-sender.py --approved-file path/to/approved.json
    python3 cold-outbound-sender.py --send-method smtp

Environment variables:
    SMTP_HOST, SMTP_PORT, SMTP_USER, SMTP_PASSWORD - for SMTP sending
    SENDER_EMAIL - sender email address
    SENDER_NAME - sender display name
"""

import argparse
import json
import os
import re
import smtplib
import subprocess
import sys
from datetime import datetime
from email.mime.text import MIMEText
from pathlib import Path


DEFAULT_MAX_PER_DAY = 10
DEFAULT_APPROVED_FILE = "./data/cold-outbound-approved.json"
DEFAULT_HISTORY_FILE = "./data/cold-outbound-history.json"


def validate_outbound(text):
    """Basic validation for outbound content. Returns (ok, text)."""
    if not text or not isinstance(text, str):
        return False, text
    # Check for common leaked credential patterns
    suspicious_patterns = [
        r'sk-[a-zA-Z0-9]{20,}',        # API keys
        r'Bearer [a-zA-Z0-9\-_.]+',    # Auth headers
        r'/Users/[a-zA-Z]+/',          # Local paths
        r'password\s*[:=]\s*\S+',      # Password patterns
    ]
    for pattern in suspicious_patterns:
        if re.search(pattern, text, re.IGNORECASE):
            return False, text
    return True, text


def load_history(history_path):
    """Load the send history; return an empty list if missing or unreadable."""
    if os.path.exists(history_path):
        try:
            with open(history_path) as f:
                return json.load(f)
        except (OSError, ValueError):
            # Unreadable or malformed history is treated as empty.
            pass
    return []


def save_history(history, history_path):
    """Write the send history as JSON, creating the parent directory if needed."""
    # dirname is "" for a bare filename; fall back to the current directory.
    os.makedirs(os.path.dirname(history_path) or ".", exist_ok=True)
    with open(history_path, 'w') as f:
        json.dump(history, f, indent=2)


def count_sent_today(history):
    """Count history entries whose sent_date falls on today's local date."""
    today = datetime.now().strftime("%Y-%m-%d")
    return sum(1 for h in history if h.get("sent_date", "").startswith(today))


def send_email_smtp(to, subject, body, sender_email, sender_name,
                    smtp_host, smtp_port, smtp_user, smtp_password, dry_run=False):
    """Send via SMTP."""
    ok_subj, subject = validate_outbound(subject)
    ok_body, body = validate_outbound(body)
    if not ok_subj or not ok_body:
        print(f" 🛡️ Email to {to} BLOCKED by validation (suspicious content detected)")
        return False

    if dry_run:
        print(f" [DRY RUN] Would send to {to}: {subject}")
        return True

    try:
        msg = MIMEText(body, 'plain')
        msg['Subject'] = subject
        msg['From'] = f"{sender_name} <{sender_email}>"
        msg['To'] = to

        with smtplib.SMTP(smtp_host, int(smtp_port)) as server:
            server.starttls()
            server.login(smtp_user, smtp_password)
            server.sendmail(sender_email, [to], msg.as_string())

        print(f" ✅ Sent to {to}: {subject}")
        return True
    except Exception as e:
        print(f" ❌ Error sending to {to}: {e}", file=sys.stderr)
        return False


def send_email_cli(to, subject, body, sender_email, sender_name, cli_command, dry_run=False):
    """Send via a CLI tool (e.g., gog, msmtp, mailx)."""
    ok_subj, subject = validate_outbound(subject)
    ok_body, body = validate_outbound(body)
    if not ok_subj or not ok_body:
        print(f" 🛡️ Email to {to} BLOCKED by validation (suspicious content detected)")
        return False

    if dry_run:
        print(f" [DRY RUN] Would send to {to}: {subject}")
        return True

    try:
        # Default CLI pattern: gog gmail send
        cmd = cli_command.split() + [
            "--to", to,
            "--subject", subject,
            "--body", body,
            "--from", f"{sender_name} <{sender_email}>",
        ]
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
        if result.returncode == 0:
            print(f" ✅ Sent to {to}: {subject}")
            return True
        else:
            print(f" ❌ Failed to send to {to}: {result.stderr}", file=sys.stderr)
            return False
    except Exception as e:
        print(f" ❌ Error sending to {to}: {e}", file=sys.stderr)
        return False


def main():
    parser = argparse.ArgumentParser(description="Cold Outbound Sender")
    parser.add_argument("--dry-run", action="store_true", help="Don't actually send emails")
    parser.add_argument("--max", type=int, default=DEFAULT_MAX_PER_DAY,
                        help=f"Max emails per day (default: {DEFAULT_MAX_PER_DAY})")
    parser.add_argument("--approved-file", default=DEFAULT_APPROVED_FILE,
                        help="Path to approved prospects JSON file")
    parser.add_argument("--history-file", default=DEFAULT_HISTORY_FILE,
                        help="Path to send history JSON file")
    parser.add_argument("--send-method", choices=["smtp", "cli"], default="smtp",
                        help="Send method: smtp or cli (default: smtp)")
    parser.add_argument("--cli-command", default="gog gmail send",
                        help="CLI command for sending (used with --send-method cli)")
    args = parser.parse_args()

    # Load config from env
    sender_email = os.environ.get("SENDER_EMAIL", "")
    sender_name = os.environ.get("SENDER_NAME", "")
    smtp_host = os.environ.get("SMTP_HOST", "smtp.gmail.com")
    smtp_port = os.environ.get("SMTP_PORT", "587")
    smtp_user = os.environ.get("SMTP_USER", sender_email)
    smtp_password = os.environ.get("SMTP_PASSWORD", "")

    if not os.path.exists(args.approved_file):
        print(f"No approved prospects file found at {args.approved_file}")
        sys.exit(0)

    with open(args.approved_file) as f:
        approved = json.load(f)

    history = load_history(args.history_file)
    sent_today = count_sent_today(history)
    remaining = args.max - sent_today

    if remaining <= 0:
        print(f"Already sent {sent_today} emails today (max {args.max}). Stopping.")
        sys.exit(0)

    sent_count = 0
    for prospect in approved:
        if sent_count >= remaining:
            break

        email = prospect.get("email")
        if not email or email == "Unknown":
            continue

        # Check if already sent
        if any(h.get("email") == email for h in history):
            print(f" SKIP {email}: already in history")
            continue

        angle_key = prospect.get("approved_angle", "A")
        drafts = prospect.get("angle_drafts", {})
        draft = drafts.get(angle_key, {})

        subject = draft.get("subject", f"Quick question for {prospect.get('company', 'you')}")
        body = draft.get("body", "")

        if not body:
            print(f" SKIP {email}: no draft body for angle {angle_key}")
            continue

        if args.send_method == "smtp":
            if not smtp_password and not args.dry_run:
                print("ERROR: SMTP_PASSWORD env var required for smtp sending.")
                sys.exit(1)
            success = send_email_smtp(
                email, subject, body, sender_email, sender_name,
                smtp_host, smtp_port, smtp_user, smtp_password, args.dry_run
            )
        else:
            success = send_email_cli(
                email, subject, body, sender_email, sender_name,
                args.cli_command, args.dry_run
            )

        if success:
            history.append({
                "company": prospect.get("company", ""),
                "contact_name": prospect.get("contact_name", ""),
                "email": email,
                "angle": angle_key,
                "subject": subject,
                "sent_date": datetime.now().isoformat(),
                "score": prospect.get("score", 0),
            })
            sent_count += 1

    if not args.dry_run:
        save_history(history, args.history_file)

    # Remove sent prospects from approved file
    if not args.dry_run and sent_count > 0:
        sent_emails = {h["email"] for h in history}
        remaining_approved = [p for p in approved if p.get("email") not in sent_emails]
        with open(args.approved_file, 'w') as f:
            json.dump(remaining_approved, f, indent=2)

    print(f"\nSent {sent_count} emails ({'dry run' if args.dry_run else 'live'}). Total today: {sent_today + sent_count}")


if __name__ == "__main__":
    main()
483
outbound-engine/scripts/competitive-monitor.py
Normal file
@ -0,0 +1,483 @@
#!/usr/bin/env python3
"""
Competitive Monitor: tracks pricing, blog posts, and feature changes across competitors.

Generates weekly competitive intelligence diffs. Configurable competitor list.

Usage:
    python3 competitive-monitor.py
    python3 competitive-monitor.py --company acme
    python3 competitive-monitor.py --output report.md
    python3 competitive-monitor.py --config competitors.json

Competitor config can be provided via:
    1. --config flag pointing to a JSON file
    2. COMPETITORS_CONFIG env var pointing to a JSON file
    3. Built-in example competitors (for demo purposes)
"""

import argparse
import json
import os
import re
import sys
import urllib.request
import urllib.parse
from datetime import datetime, timedelta
from difflib import unified_diff
from typing import Dict, List, Optional
from html.parser import HTMLParser
from urllib.error import URLError, HTTPError


def validate_text(text, max_length=500000):
    """Basic input validation for scraped content."""
    if not text or not isinstance(text, str):
        return text
    # Truncate extremely long content
    if len(text) > max_length:
        text = text[:max_length]
    return text


class BlogExtractor(HTMLParser):
    """Extract blog post titles and dates from HTML."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self.current_title = None
        self.current_date = None
        self.in_title = False
        self.in_date = False
        self.title_tags = ['h1', 'h2', 'h3', 'h4']

    def handle_starttag(self, tag, attrs):
        if tag.lower() in self.title_tags:
            self.in_title = True
        for name, value in attrs:
            # value is None for attributes written without a value
            if name in ['class', 'id'] and value and any(
                date_word in value.lower() for date_word in ['date', 'time', 'published']
            ):
                self.in_date = True

    def handle_endtag(self, tag):
        if tag.lower() in self.title_tags:
            self.in_title = False
        self.in_date = False

    def handle_data(self, data):
        if self.in_title and data.strip():
            self.current_title = data.strip()

        if self.in_date and data.strip():
            date_match = re.search(
                r'\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b|\b\w+ \d{1,2},? \d{4}\b', data
            )
            if date_match:
                self.current_date = date_match.group()

        # Emit a post once both a title and a date have been seen.
        if self.current_title and self.current_date:
            self.posts.append({
                'title': self.current_title,
                'date': self.current_date,
            })
            self.current_title = None
            self.current_date = None


class CompetitiveMonitor:
    """Main competitive monitoring class."""

    # Example competitors for demo. Override with --config or COMPETITORS_CONFIG.
    EXAMPLE_COMPETITORS = {
        'competitor_a': {
            'name': 'Competitor A',
            'domain': 'competitor-a.com',
            'pricing_url': 'https://www.competitor-a.com/pricing',
            'blog_url': 'https://www.competitor-a.com/blog',
            'linkedin_query': 'Competitor A site:linkedin.com',
            'jobs_query': 'Competitor A careers OR jobs',
        },
        'competitor_b': {
            'name': 'Competitor B',
            'domain': 'competitor-b.com',
            'pricing_url': 'https://www.competitor-b.com/pricing',
            'blog_url': 'https://www.competitor-b.com/blog',
            'linkedin_query': 'Competitor B site:linkedin.com',
            'jobs_query': 'Competitor B careers OR jobs',
        },
    }

    def __init__(self, data_dir: Optional[str] = None, competitors: Optional[dict] = None):
        self.data_dir = data_dir or os.path.join(os.getcwd(), 'data', 'competitive')
        self.pricing_dir = os.path.join(self.data_dir, 'pricing-snapshots')
        self.history_dir = os.path.join(self.data_dir, 'scan-history')
        self.competitors = competitors or self.EXAMPLE_COMPETITORS

        os.makedirs(self.pricing_dir, exist_ok=True)
        os.makedirs(self.history_dir, exist_ok=True)

    def fetch_url(self, url: str, timeout: int = 10) -> Optional[str]:
        """Fetch URL content with error handling."""
        try:
            headers = {
                'User-Agent': (
                    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
                    'AppleWebKit/537.36 (KHTML, like Gecko) '
                    'Chrome/91.0.4472.124 Safari/537.36'
                )
            }
            request = urllib.request.Request(url, headers=headers)

            with urllib.request.urlopen(request, timeout=timeout) as response:
                content = response.read().decode('utf-8', errors='ignore')
                content = validate_text(content)
                return content

        except (URLError, HTTPError, UnicodeDecodeError) as e:
            print(f"❌ Error fetching {url}: {e}")
            return None

    def extract_blog_posts(self, html: str) -> List[Dict]:
        """Extract blog posts from HTML."""
        if not html:
            return []

        extractor = BlogExtractor()
        try:
            extractor.feed(html)
            return extractor.posts
        except Exception as e:
            print(f"Error extracting blog posts: {e}")
            return []

    def is_recent_post(self, date_str: str, days_back: int = 7) -> bool:
        """Check if post is from last N days."""
        if not date_str:
            return False

        formats = [
            '%m/%d/%Y', '%m-%d-%Y', '%Y-%m-%d',
            '%B %d, %Y', '%b %d, %Y', '%B %d %Y', '%b %d %Y',
        ]

        for fmt in formats:
            try:
                post_date = datetime.strptime(date_str, fmt)
                cutoff_date = datetime.now() - timedelta(days=days_back)
                return post_date >= cutoff_date
            except ValueError:
                continue

        return False

    def get_pricing_diff(self, company_key: str, current_content: str) -> Optional[str]:
        """Compare current pricing with previous snapshot."""
        today = datetime.now().strftime('%Y-%m-%d')
        pricing_file = os.path.join(self.pricing_dir, f'{company_key}-{today}.txt')

        with open(pricing_file, 'w', encoding='utf-8') as f:
            f.write(current_content)

        previous_files = [
            f for f in os.listdir(self.pricing_dir)
            if f.startswith(f'{company_key}-') and f != f'{company_key}-{today}.txt'
        ]

        if not previous_files:
            return "🆕 First pricing snapshot saved"

        previous_files.sort(reverse=True)
        previous_file = os.path.join(self.pricing_dir, previous_files[0])

        try:
            with open(previous_file, 'r', encoding='utf-8') as f:
                previous_content = f.read()

            if current_content.strip() == previous_content.strip():
                return None

            current_lines = current_content.splitlines()
            previous_lines = previous_content.splitlines()

            diff = list(unified_diff(
                previous_lines, current_lines,
                fromfile='previous', tofile='current', n=0
            ))

            changes = len([
                line for line in diff
                if line.startswith(('+', '-')) and not line.startswith(('+++', '---'))
            ])

            return f"🔍 {changes} lines changed since last snapshot"

        except Exception as e:
            return f"❌ Error comparing snapshots: {e}"

    def scan_competitor(self, company_key: str) -> Dict:
        """Scan single competitor."""
        company = self.competitors[company_key]
        print(f"\n🔍 Scanning {company['name']}...")

        results = {
            'company': company['name'],
            'domain': company['domain'],
            'scan_time': datetime.now().isoformat(),
            'pricing': {},
            'blog': {},
            'search_queries': {
                'linkedin': company.get('linkedin_query', ''),
                'jobs': company.get('jobs_query', ''),
            },
        }

        # Fetch pricing page
        pricing_url = company.get('pricing_url')
        if pricing_url:
            print(f" 📄 Fetching pricing: {pricing_url}")
            pricing_content = self.fetch_url(pricing_url)

            if pricing_content:
                # Strip tags, then collapse all whitespace (the snapshot becomes one line).
                clean_content = re.sub(r'<[^>]+>', '', pricing_content)
                clean_content = re.sub(r'\s+', ' ', clean_content).strip()

                pricing_diff = self.get_pricing_diff(company_key, clean_content)

                results['pricing'] = {
                    'url': pricing_url,
                    'fetched': True,
                    'content_length': len(clean_content),
                    'diff': pricing_diff,
                }
            else:
                results['pricing'] = {
                    'url': pricing_url,
                    'fetched': False,
                    'error': 'Failed to fetch pricing page',
                }

        # Fetch blog page
        blog_url = company.get('blog_url')
        if blog_url:
            print(f" 📝 Fetching blog: {blog_url}")
            blog_content = self.fetch_url(blog_url)

            # Parse the blog page once and reuse the result for both counts.
            all_posts = []
            recent_posts = []
            if blog_content:
                all_posts = self.extract_blog_posts(blog_content)
                recent_posts = [post for post in all_posts if self.is_recent_post(post['date'])]

            results['blog'] = {
                'url': blog_url,
                'fetched': bool(blog_content),
                'total_posts_found': len(all_posts),
                'recent_posts': recent_posts,
            }

        return results

    def generate_report(self, scan_results: List[Dict], threat_keywords: Optional[List[str]] = None) -> str:
        """Generate markdown report."""
        today = datetime.now().strftime('%Y-%m-%d')

        # Configurable threat keywords (topics that signal competitive overlap)
        if threat_keywords is None:
            threat_keywords = ['funnel', 'conversion', 'landing page', 'ab test', 'optimize', 'cro']

        report = f"""# 🔍 Competitive Intelligence Report - {today}

## Executive Summary

Monitored {len(scan_results)} competitors for pricing changes, recent blog activity, and market signals.

"""

        threats = []
        interesting = []
        opportunities = []
        search_queries = []

        for result in scan_results:
            company = result['company']

            pricing = result.get('pricing', {})
            if pricing.get('diff') and '🔍' in str(pricing['diff']):
                interesting.append(
                    f"**{company}**: {pricing['diff']} → *Monitor for pricing strategy shifts*"
                )
            elif pricing.get('diff') and '🆕' in str(pricing['diff']):
                interesting.append(
                    f"**{company}**: {pricing['diff']} → *Baseline established for future tracking*"
                )

            blog = result.get('blog', {})
            recent_posts = blog.get('recent_posts', [])

            if recent_posts:
                post_titles = [
                    post['title'][:80] + '...' if len(post['title']) > 80 else post['title']
                    for post in recent_posts[:3]
                ]
                content_lower = ' '.join(post_titles).lower()

                # Titles are lowercased above, so lowercase the keywords as well
                if any(keyword.lower() in content_lower for keyword in threat_keywords):
                    threats.append(
                        f"**{company}**: {len(recent_posts)} recent posts, potential feature overlap → *Review competitive positioning*"
                    )
                else:
                    interesting.append(
                        f"**{company}**: {len(recent_posts)} recent posts → *{', '.join(post_titles[:2])}*"
                    )
            else:
                opportunities.append(
                    f"**{company}**: No recent blog content → *Content marketing gap you can exploit*"
                )

            sq = result.get('search_queries', {})
            if sq.get('linkedin'):
                search_queries.append(f"LinkedIn search: {sq['linkedin']}")
            if sq.get('jobs'):
                search_queries.append(f"Jobs search: {sq['jobs']}")

        if threats:
            report += "## 🔴 THREATS\n\n"
            for threat in threats:
                report += f"- {threat}\n"
            report += "\n"

        if interesting:
            report += "## 🟡 INTERESTING\n\n"
            for item in interesting:
                report += f"- {item}\n"
            report += "\n"

        if opportunities:
            report += "## 🟢 OPPORTUNITIES\n\n"
            for opp in opportunities:
                report += f"- {opp}\n"
            report += "\n"

        if search_queries:
            report += "## 🔎 LinkedIn/Jobs Search Queries\n\n"
            report += "Run these queries for social/hiring signals:\n\n"
            for query in search_queries:
                report += f"- `{query}`\n"
            report += "\n"

        report += "## 📊 Technical Summary\n\n"
        for result in scan_results:
            company = result['company']
            pricing = result.get('pricing', {})
            blog = result.get('blog', {})

            report += f"**{company}:**\n"
            # 'diff' is present but None when nothing changed, so fall back with `or`
            report += f"- Pricing: {'✅' if pricing.get('fetched') else '❌'} {pricing.get('diff') or 'No changes'}\n"
            report += f"- Blog: {'✅' if blog.get('fetched') else '❌'} {len(blog.get('recent_posts', []))} recent posts\n\n"

        return report

    def save_results(self, scan_results: List[Dict]) -> str:
        """Save scan results to files."""
        today = datetime.now().strftime('%Y-%m-%d')

        latest_file = os.path.join(self.data_dir, 'latest-scan.json')
        with open(latest_file, 'w') as f:
            json.dump(scan_results, f, indent=2)

        history_file = os.path.join(self.history_dir, f'{today}.json')
        with open(history_file, 'w') as f:
            json.dump(scan_results, f, indent=2)

        return latest_file

    def run(self, company_filter: Optional[str] = None,
            threat_keywords: Optional[List[str]] = None) -> str:
        """Run competitive monitoring scan."""
        print("🚀 Starting competitive monitoring scan...")

        companies_to_scan = (
            [company_filter] if company_filter else list(self.competitors.keys())
        )

        if company_filter and company_filter not in self.competitors:
            print(f"❌ Unknown company: {company_filter}")
            print(f"Available companies: {', '.join(self.competitors.keys())}")
            return ""

        scan_results = []
        for company_key in companies_to_scan:
            try:
                result = self.scan_competitor(company_key)
                scan_results.append(result)
            except Exception as e:
                print(f"❌ Error scanning {company_key}: {e}")

        self.save_results(scan_results)
        report = self.generate_report(scan_results, threat_keywords)

        print(f"\n✅ Scan complete! Results for {len(scan_results)} companies.")
        return report


def load_competitors_config(config_path: str) -> dict:
    """Load competitors from a JSON config file.

    Expected format:
    {
        "competitor_key": {
            "name": "Competitor Name",
            "domain": "competitor.com",
            "pricing_url": "https://competitor.com/pricing",
            "blog_url": "https://competitor.com/blog",
            "linkedin_query": "Competitor Name site:linkedin.com",
            "jobs_query": "Competitor Name careers OR jobs"
        }
    }
    """
    with open(config_path, 'r', encoding='utf-8') as f:
        return json.load(f)


def main():
    parser = argparse.ArgumentParser(description='Competitive Monitoring Scraper')
    parser.add_argument('--company', help='Scan specific company only (by key)')
    parser.add_argument('--output', '-o', help='Save report to file')
    parser.add_argument('--config', help='Path to competitors JSON config file')
    parser.add_argument('--data-dir', help='Directory for storing scan data')
    parser.add_argument('--threat-keywords', nargs='*',
                        help='Keywords that signal competitive overlap (space-separated)')

    args = parser.parse_args()

    # Load competitor config
    config_path = args.config or os.environ.get('COMPETITORS_CONFIG')
    competitors = None
    if config_path:
        try:
            competitors = load_competitors_config(config_path)
            print(f"📋 Loaded {len(competitors)} competitors from {config_path}")
        except Exception as e:
            print(f"❌ Error loading config: {e}")
            sys.exit(1)

    monitor = CompetitiveMonitor(
        data_dir=args.data_dir,
        competitors=competitors,
    )

    # Forward --threat-keywords so the flag reaches generate_report
    report = monitor.run(args.company, threat_keywords=args.threat_keywords)

    if report:
        print("\n" + "=" * 60)
        print(report)
        print("=" * 60)

        if args.output:
            # The report contains emoji, so write it as UTF-8 explicitly
            with open(args.output, 'w', encoding='utf-8') as f:
                f.write(report)
            print(f"\n📁 Report saved to: {args.output}")


if __name__ == '__main__':
    main()
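The pricing snapshot comparison above counts the `+` and `-` lines that `difflib.unified_diff` emits. The following is a minimal standalone sketch of that counting step, not the script's own code: `count_changed_lines` and the sample price strings are hypothetical names and data used only for illustration. It also shows that the count depends on how the page text is split into lines, since text flattened onto a single line can never report more than 2 changed lines.

```python
from difflib import unified_diff


def count_changed_lines(previous: str, current: str) -> int:
    """Count added and removed lines between two text snapshots."""
    diff = unified_diff(
        previous.splitlines(), current.splitlines(),
        fromfile='previous', tofile='current', n=0, lineterm='',
    )
    # Skip the '---' / '+++' file headers; count only real additions/removals.
    return sum(
        1 for line in diff
        if line.startswith(('+', '-')) and not line.startswith(('+++', '---'))
    )


# Hypothetical snapshots: two of the three plan lines change.
old = "Starter $19\nPro $49\nEnterprise call us"
new = "Starter $19\nPro $59\nEnterprise $499"

per_line = count_changed_lines(old, new)  # 2 removed + 2 added = 4

# The same text collapsed onto one line: one removed + one added = 2.
flattened = count_changed_lines(" ".join(old.split()), " ".join(new.split()))

print(per_line, flattened)
```

A changed line shows up twice in the count (once as removed, once as added), so the reported number is closer to "diff lines" than to "distinct lines edited".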
278
outbound-engine/scripts/cross-signal-detector.py
Normal file
@ -0,0 +1,278 @@
#!/usr/bin/env python3
"""
Cross-Signal Detector — finds overlapping signals across multiple data sources.

When your SEO data and sales data both flag the same company, that's a cross-signal
worth acting on. This script scans agent outputs and data files for company names,
industry verticals, and keyword clusters, then finds overlaps.

Usage:
    python3 cross-signal-detector.py
    python3 cross-signal-detector.py --data-dir ./data/agent-outputs
    python3 cross-signal-detector.py --hours 48
    python3 cross-signal-detector.py --output cross-signals.json

Environment variables:
    DATA_DIR — directory containing agent output files to scan
    OUTPUT_FILE — where to write the signal detection results
"""

import argparse
import json
import os
import re
import glob
from datetime import datetime, timedelta, timezone
from collections import defaultdict


# Words to exclude from company name extraction (common English words that look like names)
STOP_WORDS = {
    'The', 'This', 'That', 'What', 'How', 'Why', 'When', 'Where',
    'For', 'From', 'With', 'About', 'Into', 'Over', 'After',
    'Before', 'Between', 'Under', 'During', 'Through',
    'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
    'Saturday', 'Sunday', 'January', 'February', 'March',
    'April', 'May', 'June', 'July', 'August', 'September',
    'October', 'November', 'December',
    'None', 'True', 'False', 'Error', 'Warning',
}

# Configurable: add your own team names / internal terms to exclude.
# SIGNAL_STOP_WORDS is comma-separated; whitespace around each entry is ignored.
CUSTOM_STOP_WORDS = {
    word.strip()
    for word in os.environ.get('SIGNAL_STOP_WORDS', '').split(',')
    if word.strip()
}


def get_recent_files(directory, hours=24):
    """Get files modified in the last N hours."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=hours)
    recent = []
    if not os.path.isdir(directory):
        return recent
    for f in glob.glob(os.path.join(directory, "*")):
        if os.path.isfile(f):
            mtime = datetime.fromtimestamp(os.path.getmtime(f), tz=timezone.utc)
            if mtime > cutoff:
                recent.append(f)
    return recent


def extract_companies(text):
    """Extract company names (capitalized words, common patterns)."""
    companies = set()
    all_stop = STOP_WORDS | CUSTOM_STOP_WORDS
    for match in re.findall(
        r'\b([A-Z][a-zA-Z]+(?:\.[a-zA-Z]+)?(?:\s+(?:AI|Inc|Corp|Labs|Tech|io))?)\b',
        text
    ):
        if len(match) > 2 and match not in all_stop:
            companies.add(match)
    return companies


def extract_keywords(text):
    """Extract keyword themes from marketing/business text."""
    keywords = set()
    patterns = [
        r'(?:ai|artificial intelligence)\s+(?:marketing|agent|tool|saas|automation)',
        r'(?:seo|content|digital)\s+(?:marketing|strategy|optimization|growth)',
        r'(?:b2b|saas|enterprise)\s+(?:marketing|growth|sales)',
        r'(?:social media|linkedin|twitter|youtube)\s+(?:marketing|growth|strategy)',
        r'(?:email|outbound|cold)\s+(?:marketing|outreach|campaign)',
        r'(?:paid|ppc|google)\s+(?:ads|advertising|media)',
    ]
    text_lower = text.lower()
    for p in patterns:
        # finditer collects every occurrence, not only the first match per pattern
        for match in re.finditer(p, text_lower):
            keywords.add(match.group())
    return keywords


def extract_verticals(text):
    """Extract industry verticals."""
    verticals = set()
    vertical_keywords = {
        'fintech': ['fintech', 'financial', 'banking', 'payments'],
        'healthtech': ['healthtech', 'health tech', 'healthcare', 'medical'],
        'edtech': ['edtech', 'education', 'learning platform'],
        'ai_saas': ['ai saas', 'ai tool', 'ai agent', 'ai platform', 'artificial intelligence'],
        'ecommerce': ['ecommerce', 'e-commerce', 'shopify', 'dtc', 'd2c'],
        'cybersecurity': ['cybersecurity', 'security', 'infosec'],
        'martech': ['martech', 'marketing tech', 'marketing tool'],
        'hr_tech': ['hr tech', 'hiring', 'recruiting', 'talent'],
    }
    text_lower = text.lower()
    for vertical, kws in vertical_keywords.items():
        if any(kw in text_lower for kw in kws):
            verticals.add(vertical)
    return verticals


def read_file_safe(filepath):
    """Read file content safely, returning an empty string on any error."""
    try:
        # Decode as UTF-8 and replace undecodable bytes instead of dropping the file
        with open(filepath, encoding='utf-8', errors='replace') as f:
            return f.read()
    except Exception:
        return ""


def categorize_file(filepath, agent_patterns=None):
    """Categorize a file by agent/source based on filename patterns.

    Override with agent_patterns dict: {"pattern": "agent_name"}
    """
    basename = os.path.basename(filepath).lower()

    # Default patterns — customize these for your setup
    default_patterns = {
        'seo': 'seo',
        'oracle': 'seo',
        'content': 'content',
        'flash': 'content',
        'trend': 'content',
        'deal': 'deal',
        'cold': 'cold_outbound',
        'outbound': 'cold_outbound',
        'recruit': 'recruiting',
        'hiring': 'recruiting',
    }

    patterns = agent_patterns or default_patterns

    # First matching pattern wins (dict insertion order)
    for pattern, agent in patterns.items():
        if pattern in basename:
            return agent

    return 'other'


def detect_signals(data_dir, additional_data_dirs=None, hours=48, agent_patterns=None):
    """Main detection logic.

    Args:
        data_dir: Primary directory to scan for agent output files
        additional_data_dirs: Dict of {"agent_name": "glob_pattern"} for extra data
        hours: How far back to look for files
        agent_patterns: Dict of {"filename_pattern": "agent_name"} for categorization
    """
    recent_files = get_recent_files(data_dir, hours=hours)
    if not recent_files:
        # Fallback to 7 days
        recent_files = get_recent_files(data_dir, hours=168)

    # Categorize by agent/source
    agent_data = defaultdict(lambda: {
        "files": [], "companies": set(), "keywords": set(), "verticals": set(), "text": ""
    })

    for f in recent_files:
        agent = categorize_file(f, agent_patterns)
        text = read_file_safe(f)

        agent_data[agent]["files"].append(f)
        agent_data[agent]["companies"].update(extract_companies(text))
        agent_data[agent]["keywords"].update(extract_keywords(text))
        agent_data[agent]["verticals"].update(extract_verticals(text))
        agent_data[agent]["text"] += text + "\n"

    # Scan additional data directories
    if additional_data_dirs:
        for agent, pattern in additional_data_dirs.items():
            files = sorted(glob.glob(pattern))[-1:]  # latest only
            for f in files:
                text = read_file_safe(f)
                agent_data[agent]["companies"].update(extract_companies(text))
                agent_data[agent]["keywords"].update(extract_keywords(text))
                agent_data[agent]["verticals"].update(extract_verticals(text))

    # Find overlaps. Sets are sorted before slicing or joining so that the
    # output is stable from run to run.
    signals = []
    agents_list = list(agent_data.keys())

    # 1. Company overlap
    for i, a1 in enumerate(agents_list):
        for a2 in agents_list[i + 1:]:
            common_companies = agent_data[a1]["companies"] & agent_data[a2]["companies"]
            if common_companies:
                confidence = min(95, 60 + len(common_companies) * 10)
                signals.append({
                    "confidence": confidence,
                    "type": "company_overlap",
                    "agents": [a1, a2],
                    "signal": f"Company overlap: {', '.join(sorted(common_companies)[:5])} appearing in both {a1} and {a2}",
                    "recommended_play": f"Cross-reference {a1} and {a2} data for these companies — coordinate outreach/content",
                    "entities": sorted(common_companies)[:10],
                })

    # 2. Vertical overlap
    for i, a1 in enumerate(agents_list):
        for a2 in agents_list[i + 1:]:
            common_verticals = agent_data[a1]["verticals"] & agent_data[a2]["verticals"]
            if common_verticals:
                confidence = min(90, 50 + len(common_verticals) * 15)
                signals.append({
                    "confidence": confidence,
                    "type": "vertical_alignment",
                    "agents": [a1, a2],
                    "signal": f"Vertical alignment: {', '.join(sorted(common_verticals))} trending across {a1} + {a2}",
                    "recommended_play": f"Coordinated push into {', '.join(sorted(common_verticals))}: content + outbound + SEO",
                    "entities": sorted(common_verticals),
                })

    # 3. Keyword cluster overlap
    for i, a1 in enumerate(agents_list):
        for a2 in agents_list[i + 1:]:
            common_kw = agent_data[a1]["keywords"] & agent_data[a2]["keywords"]
            if common_kw:
                confidence = min(88, 55 + len(common_kw) * 12)
                signals.append({
                    "confidence": confidence,
                    "type": "keyword_cluster",
                    "agents": [a1, a2],
                    "signal": f"Keyword cluster overlap: {', '.join(sorted(common_kw)[:3])}",
                    "recommended_play": "Target these keywords in content and outbound simultaneously",
                    "entities": sorted(common_kw),
                })

    # Sort by confidence, highest first
    signals.sort(key=lambda x: x["confidence"], reverse=True)

    output = {
        "date": datetime.now().strftime("%Y-%m-%d"),
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "agents_analyzed": list(agent_data.keys()),
        "files_scanned": sum(len(d["files"]) for d in agent_data.values()),
        "signals": signals[:20],  # top 20
    }

    return output


def main():
    parser = argparse.ArgumentParser(
        description='Cross-Signal Detector — find overlapping signals across data sources'
    )
    parser.add_argument('--data-dir', default=os.environ.get('DATA_DIR', './data/agent-outputs'),
                        help='Directory containing agent output files')
    parser.add_argument('--output', default=os.environ.get('OUTPUT_FILE', './data/cross-signals-latest.json'),
                        help='Output file path')
    parser.add_argument('--hours', type=int, default=48,
                        help='How far back to look for files (default: 48)')
    args = parser.parse_args()

    output = detect_signals(data_dir=args.data_dir, hours=args.hours)

    os.makedirs(os.path.dirname(args.output) or '.', exist_ok=True)
    with open(args.output, "w") as f:
        json.dump(output, f, indent=2)

    signals = output.get("signals", [])
    print(f"Cross-signal detection complete: {len(signals)} signals found")
    print(f"Output: {args.output}")
    if signals:
        print(f"Top signal (confidence {signals[0]['confidence']}): {signals[0]['signal'][:100]}")


if __name__ == "__main__":
    main()
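The three overlap passes in `detect_signals` share one pattern: intersect the entity sets of every pair of sources, then score the pair by how many entities they share, up to a cap. Below is a minimal sketch of that pattern, assuming the same "60 plus 10 per entity, capped at 95" scoring that the company pass uses. `pairwise_overlaps` and the demo data are hypothetical, and `itertools.combinations` stands in for the script's nested index loops.

```python
from itertools import combinations


def pairwise_overlaps(entities_by_source, base=60, step=10, cap=95):
    """Score every pair of sources by the entities they have in common."""
    signals = []
    # combinations() yields each unordered pair exactly once.
    for (name_a, set_a), (name_b, set_b) in combinations(
        sorted(entities_by_source.items()), 2
    ):
        common = set_a & set_b
        if common:
            signals.append({
                "sources": [name_a, name_b],
                "entities": sorted(common),  # sorted for stable output
                "confidence": min(cap, base + len(common) * step),
            })
    # Highest-confidence pairs first
    signals.sort(key=lambda s: s["confidence"], reverse=True)
    return signals


# Hypothetical per-source company sets
demo = {
    "seo": {"Acme", "Globex", "Initech"},
    "deal": {"Acme", "Initech"},
    "content": {"Hooli"},
}

signals = pairwise_overlaps(demo)
for s in signals:
    print(s["sources"], s["entities"], s["confidence"])
```

With n sources this makes n(n-1)/2 set intersections, which stays cheap for the handful of agent categories the script distinguishes.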
376
outbound-engine/scripts/instantly-audit.py
Normal file
@ -0,0 +1,376 @@
#!/usr/bin/env python3
"""
instantly-audit.py
Pulls campaign data, account inventory, and warmup scores from the Instantly v2 API.

Usage:
    python3 instantly-audit.py --api-key YOUR_KEY
    python3 instantly-audit.py  # uses INSTANTLY_API_KEY env var
    python3 instantly-audit.py --api-key YOUR_KEY --output report.md
    python3 instantly-audit.py --api-key YOUR_KEY --json  # raw JSON output

Instantly v2 API docs: https://developer.instantly.ai/
"""

import argparse
import json
import os
import sys
import time
from datetime import datetime

try:
    import requests
except ImportError:
    print("ERROR: 'requests' not installed. Run: pip install requests")
    sys.exit(1)

BASE_URL = "https://api.instantly.ai/api/v2"


def get_headers(api_key: str) -> dict:
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def paginate(url: str, headers: dict, params: dict = None, limit: int = 100) -> list:
    """Handle Instantly v2 cursor-based pagination."""
    results = []
    # Copy so the caller's dict is never mutated
    params = dict(params or {})
    params["limit"] = limit
    starting_after = None

    while True:
        if starting_after:
            params["starting_after"] = starting_after

        try:
            resp = requests.get(url, headers=headers, params=params, timeout=30)
        except requests.exceptions.RequestException as e:
            print(f" ⚠️ Request failed: {e}")
            break

        if resp.status_code == 429:
            retry_after = int(resp.headers.get("Retry-After", 5))
            print(f" ⏳ Rate limited. Waiting {retry_after}s...")
            time.sleep(retry_after)
            continue

        if resp.status_code == 401:
            print(" 🔴 Authentication failed. Check your API key.")
            sys.exit(1)

        if not resp.ok:
            print(f" ⚠️ API error {resp.status_code}: {resp.text[:200]}")
            break

        data = resp.json()
        if isinstance(data, list):
            # A bare list has no .get() and carries no cursor to follow
            items, next_cursor = data, None
        else:
            items = data.get("items", [])
            next_cursor = data.get("next_starting_after") or data.get("next_cursor")
        results.extend(items)

        if not next_cursor or len(items) < limit:
            break
        starting_after = next_cursor

    return results


def fetch_campaigns(headers: dict) -> list:
    """Fetch all campaigns with analytics."""
    print("📋 Fetching campaigns...")
    campaigns = paginate(f"{BASE_URL}/campaigns", headers)
    print(f" Found {len(campaigns)} campaigns")
    return campaigns


def fetch_campaign_analytics(headers: dict, campaign_ids: list) -> dict:
    """Fetch analytics summary for campaigns."""
    if not campaign_ids:
        return {}

    print("📊 Fetching campaign analytics...")
    analytics = {}

    # Request analytics in batches of 10 campaign IDs
    for i in range(0, len(campaign_ids), 10):
        batch = campaign_ids[i:i+10]
        try:
            resp = requests.get(
                f"{BASE_URL}/campaigns/analytics/overview",
                headers=headers,
                params={"campaign_id": batch},
                timeout=30,
            )
            if resp.ok:
                data = resp.json()
                if isinstance(data, dict):
                    analytics.update(data)
                elif isinstance(data, list):
                    for item in data:
                        cid = item.get("campaign_id") or item.get("id")
                        if cid:
                            analytics[cid] = item
        except requests.exceptions.RequestException as e:
            print(f" ⚠️ Analytics fetch failed for batch: {e}")

        time.sleep(0.3)

    return analytics


def fetch_accounts(headers: dict) -> list:
    """Fetch all sending accounts with warmup status."""
    print("📧 Fetching sending accounts...")
    accounts = paginate(f"{BASE_URL}/accounts", headers)
    print(f" Found {len(accounts)} accounts")
    return accounts


def fetch_warmup_scores(headers: dict, account_emails: list) -> dict:
    """Fetch warmup analytics for accounts."""
    if not account_emails:
        return {}

    print("🔥 Fetching warmup scores...")
    warmup_data = {}

    for email in account_emails:
        try:
            resp = requests.get(
                f"{BASE_URL}/accounts/{email}/warmup/analytics",
                headers=headers,
                timeout=30,
            )
            if resp.ok:
                warmup_data[email] = resp.json()
            elif resp.status_code == 404:
                warmup_data[email] = {"score": None, "status": "no_warmup_data"}
            time.sleep(0.1)
        except requests.exceptions.RequestException:
            warmup_data[email] = {"score": None, "status": "fetch_error"}

    return warmup_data


def assess_warmup_readiness(account: dict, warmup: dict) -> tuple:
    """Return (ready: bool, issues: list) for an account."""
    issues = []

    score = (warmup.get("warmup_score") or warmup.get("score")
             or account.get("stat_warmup_score") or account.get("warmup_score"))
    if score is None:
        issues.append("No warmup data available")
    elif score < 80:
        issues.append(f"Warmup score {score} < 80 (minimum required)")

    warmup_start = account.get("warmup_start_date") or account.get("created_at")
    if warmup_start:
        try:
            start_dt = datetime.fromisoformat(warmup_start.replace("Z", "+00:00"))
            days_warmed = (datetime.now(start_dt.tzinfo) - start_dt).days
            if days_warmed < 14:
                issues.append(f"Only {days_warmed} days warmed (need 14+)")
        except (ValueError, AttributeError):
            pass

    status = str(account.get("status", "")).lower()
    if status in ("paused", "error", "suspended", "disabled"):
        issues.append(f"Account status: {status}")

    ready = len(issues) == 0
    return ready, issues


def format_pct(value, total, decimals=1) -> str:
    if not total:
        return "N/A"
    return f"{(value / total * 100):.{decimals}f}%"


def generate_report(campaigns: list, analytics: dict, accounts: list, warmup_scores: dict) -> str:
    lines = []
    now = datetime.now().strftime("%Y-%m-%d %H:%M")

    lines.append("# Instantly Audit Report")
    lines.append(f"Generated: {now}\n")

    # ── Account Inventory ──
    lines.append("## Sending Account Inventory\n")

    ready_accounts = []
    not_ready_accounts = []

    for acct in accounts:
        email = acct.get("email", "unknown")
        warmup = warmup_scores.get(email, {})
        ready, issues = assess_warmup_readiness(acct, warmup)
        score = (warmup.get("warmup_score") or warmup.get("score")
                 or acct.get("stat_warmup_score") or acct.get("warmup_score") or "N/A")
        daily_limit = acct.get("sending_limit") or acct.get("daily_limit", 30)

        row = {
            "email": email,
            "status": acct.get("status", "unknown"),
            "warmup_score": score,
            "daily_limit": daily_limit,
            "ready": ready,
            "issues": issues,
        }

        if ready:
            ready_accounts.append(row)
        else:
            not_ready_accounts.append(row)

    total_accounts = len(accounts)
    total_ready = len(ready_accounts)

    lines.append(f"**Total accounts:** {total_accounts}")
    lines.append(f"**Ready to send:** {total_ready} ✅")
    lines.append(f"**Not ready:** {len(not_ready_accounts)} ⚠️\n")

    if ready_accounts:
        # Monthly figures assume 22 sending days per month
        conservative_daily = total_ready * 30
        aggressive_daily = total_ready * 50
        conservative_monthly = conservative_daily * 22
        aggressive_monthly = aggressive_daily * 22
        lines.append("### Capacity Math (ready accounts only)")
        lines.append(f"- Conservative (30/day/account): **{conservative_daily:,}/day → {conservative_monthly:,}/month**")
        lines.append(f"- Aggressive (50/day/account): **{aggressive_daily:,}/day → {aggressive_monthly:,}/month**\n")

    lines.append("### ✅ Ready Accounts")
    if ready_accounts:
        lines.append("| Account | Status | Warmup Score | Daily Limit |")
        lines.append("|---------|--------|-------------|------------|")
        for a in ready_accounts:
            lines.append(f"| {a['email']} | {a['status']} | {a['warmup_score']} | {a['daily_limit']} |")
    else:
        lines.append("_None — no accounts meet warmup requirements_")

    lines.append("\n### ⚠️ Not Ready Accounts")
    if not_ready_accounts:
        lines.append("| Account | Status | Warmup Score | Issues |")
        lines.append("|---------|--------|-------------|--------|")
        for a in not_ready_accounts:
            issues_str = "; ".join(a["issues"]) if a["issues"] else "unknown"
            lines.append(f"| {a['email']} | {a['status']} | {a['warmup_score']} | {issues_str} |")
    else:
        lines.append("_None — all accounts are ready_")

    # ── Campaign Performance ──
    lines.append("\n---\n## Campaign Performance\n")
    lines.append(f"**Total campaigns:** {len(campaigns)}\n")

    if not campaigns:
        lines.append("_No campaigns found_")
    else:
        lines.append("| Campaign | Status | Sent | Open Rate | Reply Rate | Positive Reply Rate |")
        lines.append("|----------|--------|------|-----------|-----------|-------------------|")

        for c in campaigns:
            cid = c.get("id", "")
            # `or` also covers a campaign whose name is present but null
            name = (c.get("name") or "Unnamed")[:50]
            status = c.get("status", "unknown")

            a = analytics.get(cid, {})
            sent = a.get("emails_sent", 0) or c.get("emails_sent", 0)
            opened = a.get("emails_opened", 0)
            replied = a.get("emails_replied", 0)
            positive = a.get("positive_replies", 0)

            open_rate = format_pct(opened, sent)
            reply_rate = format_pct(replied, sent)
            pos_rate = format_pct(positive, sent)

            lines.append(f"| {name} | {status} | {sent:,} | {open_rate} | {reply_rate} | {pos_rate} |")

    # ── Flags & Recommendations ──
    lines.append("\n---\n## Flags & Recommendations\n")

    flags = []

    if total_ready == 0:
        flags.append("🔴 **BLOCKER:** No accounts are ready to send. All fail warmup requirements. Do not launch campaigns.")
    elif total_ready < 3:
        flags.append(f"⚠️ Only {total_ready} account(s) ready. Low volume capacity. Consider warming more accounts.")

    low_open = []
    low_reply = []
    for c in campaigns:
        cid = c.get("id", "")
        a = analytics.get(cid, {})
        sent = a.get("emails_sent", 0)
        if sent < 50:
            continue
|
opened = a.get("emails_opened", 0)
|
||||||
|
replied = a.get("emails_replied", 0)
|
||||||
|
open_pct = (opened / sent * 100) if sent else 0
|
||||||
|
reply_pct = (replied / sent * 100) if sent else 0
|
||||||
|
if open_pct < 40:
|
||||||
|
low_open.append(c.get("name", cid))
|
||||||
|
if reply_pct < 3:
|
||||||
|
low_reply.append(c.get("name", cid))
|
||||||
|
|
||||||
|
if low_open:
|
||||||
|
flags.append(f"⚠️ Low open rate (<40%) campaigns (subject line issue): {', '.join(low_open[:5])}")
|
||||||
|
if low_reply:
|
||||||
|
flags.append(f"⚠️ Low reply rate (<3%) campaigns (copy/offer issue): {', '.join(low_reply[:5])}")
|
||||||
|
|
||||||
|
if not flags:
|
||||||
|
flags.append("✅ No critical flags detected.")
|
||||||
|
|
||||||
|
for f in flags:
|
||||||
|
lines.append(f"- {f}")
|
||||||
|
|
||||||
|
lines.append(f"\n---\n_Audit complete. {total_accounts} accounts, {len(campaigns)} campaigns analyzed._")
|
||||||
|
return "\n".join(lines)
|
||||||
def main():
    parser = argparse.ArgumentParser(description="Instantly v2 API Audit Tool")
    parser.add_argument("--api-key", help="Instantly API key (or set INSTANTLY_API_KEY env var)")
    parser.add_argument("--output", help="Write report to this file (default: print to stdout)")
    parser.add_argument("--json", action="store_true", help="Output raw JSON instead of markdown report")
    args = parser.parse_args()

    api_key = args.api_key or os.environ.get("INSTANTLY_API_KEY")
    if not api_key:
        api_key = input("Instantly API key: ").strip()
    if not api_key:
        print("ERROR: API key required. Set INSTANTLY_API_KEY env var or pass --api-key.")
        sys.exit(1)

    headers = get_headers(api_key)

    print("\n🔍 Starting Instantly audit...\n")

    campaigns = fetch_campaigns(headers)
    campaign_ids = [c.get("id") for c in campaigns if c.get("id")]
    analytics = fetch_campaign_analytics(headers, campaign_ids)

    accounts = fetch_accounts(headers)
    account_emails = [a.get("email") for a in accounts if a.get("email")]
    warmup_scores = fetch_warmup_scores(headers, account_emails)

    if args.json:
        output = json.dumps({
            "campaigns": campaigns,
            "analytics": analytics,
            "accounts": accounts,
            "warmup_scores": warmup_scores,
        }, indent=2, default=str)
    else:
        output = generate_report(campaigns, analytics, accounts, warmup_scores)

    if args.output:
        # The report contains emoji, so write UTF-8 explicitly instead of
        # relying on the platform default encoding.
        with open(args.output, "w", encoding="utf-8") as f:
            f.write(output)
        print(f"\n✅ Report written to: {args.output}")
    else:
        print("\n" + output)


if __name__ == "__main__":
    main()
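The capacity figures in the report come down to simple arithmetic: ready accounts times a per-account daily limit, times 22 (presumably sending days per month). The sketch below isolates that calculation and pairs it with a zero-safe percentage helper. Note that `format_pct` is called by the script but its definition is not shown in this part of the diff, so the version here is only an assumption about its behavior, and `capacity` is a hypothetical helper name that does not exist in the script.

```python
def capacity(ready_accounts: int, per_day: int, sending_days: int = 22) -> tuple[int, int]:
    """Return (daily, monthly) send capacity for a number of ready accounts."""
    daily = ready_accounts * per_day
    return daily, daily * sending_days


def format_pct(numerator: int, denominator: int) -> str:
    """Assumed behavior: one-decimal percentage, or 'n/a' when nothing was sent."""
    if not denominator:
        return "n/a"
    return f"{numerator / denominator * 100:.1f}%"


if __name__ == "__main__":
    # Four warmed accounts at the conservative and aggressive limits.
    for label, per_day in (("Conservative", 30), ("Aggressive", 50)):
        daily, monthly = capacity(4, per_day)
        print(f"{label}: {daily:,}/day -> {monthly:,}/month")
    print(format_pct(12, 300))
```

Keeping the multipliers as parameters makes it easy to model a different sending calendar without touching the report code.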
607
outbound-engine/scripts/lead-pipeline.py
Normal file
|
|
@ -0,0 +1,607 @@
#!/usr/bin/env python3
"""
Lead Pipeline: Apollo → LeadMagic → Dedupe → Instantly

End-to-end lead sourcing, verification, deduplication, and upload pipeline.

Usage:
    python3 lead-pipeline.py \\
        --titles "VP Marketing,CMO" --industries "SaaS" \\
        --company-size "11,50" --locations "United States" \\
        --campaign-id YOUR_CAMPAIGN_UUID --volume 500

    # Dry run (no upload)
    python3 lead-pipeline.py \\
        --titles "CTO,VP Engineering" --company-size "51,200" \\
        --campaign-id YOUR_CAMPAIGN_UUID --volume 100 --dry-run

API keys are read from environment variables:
    APOLLO_API_KEY, LEADMAGIC_API_KEY, INSTANTLY_API_KEY

Or pass them via --apollo-key, --leadmagic-key, --instantly-key flags.
"""

import argparse
import json
import os
import sys
import time
from datetime import datetime
from pathlib import Path

try:
    import requests
except ImportError:
    print("ERROR: 'requests' package required. Run: pip3 install requests", file=sys.stderr)
    sys.exit(1)


# ---------------------------------------------------------------------------
# Retry / backoff helper
# ---------------------------------------------------------------------------

def request_with_retry(method, url, max_retries=5, **kwargs):
    """HTTP request with exponential backoff on 429 / 5xx."""
    backoff = 1
    for attempt in range(max_retries + 1):
        try:
            resp = requests.request(method, url, timeout=30, **kwargs)
            if resp.status_code == 429:
                # Retry-After may be an HTTP-date rather than a number of
                # seconds; fall back to our own backoff if it does not parse.
                try:
                    wait = int(resp.headers.get("Retry-After", backoff))
                except ValueError:
                    wait = backoff
                print(f" ⏳ Rate limited (429). Waiting {wait}s …")
                time.sleep(wait)
                backoff = min(backoff * 2, 60)
                continue
            if resp.status_code >= 500:
                print(f" ⚠️ Server error {resp.status_code}. Retry in {backoff}s …")
                time.sleep(backoff)
                backoff = min(backoff * 2, 60)
                continue
            return resp
        except requests.exceptions.RequestException as e:
            if attempt == max_retries:
                raise
            print(f" ⚠️ Request error: {e}. Retry in {backoff}s …")
            time.sleep(backoff)
            backoff = min(backoff * 2, 60)
    # All attempts ended in 429/5xx: hand the last response back to the caller.
    return resp  # type: ignore


# ---------------------------------------------------------------------------
# Step 1: Apollo People Search
# ---------------------------------------------------------------------------

def source_from_apollo(api_key, titles, industries, company_size, locations, keywords, volume):
    """Pull leads from Apollo People Search API."""
    print(f"\n{'='*50}")
    print(f"STEP 1: Sourcing from Apollo (target: {volume})")
    print(f"{'='*50}")

    url = "https://api.apollo.io/api/v1/mixed_people/search"
    leads = []
    page = 1

    # Parse company size into Apollo format
    size_ranges = []
    if company_size:
        parts = [s.strip() for s in company_size.split(",")]
        if len(parts) == 2:
            size_ranges = [f"{parts[0]},{parts[1]}"]
        else:
            size_ranges = parts

    while len(leads) < volume:
        body = {
            "api_key": api_key,
            "per_page": 100,
            "page": page,
        }
        if titles:
            body["person_titles"] = [t.strip() for t in titles.split(",")]
        if industries:
            body["q_organization_keyword_tags"] = [i.strip() for i in industries.split(",")]
        if size_ranges:
            body["organization_num_employees_ranges"] = size_ranges
        if locations:
            body["person_locations"] = [loc.strip() for loc in locations.split(",")]
        if keywords:
            body["q_keywords"] = keywords

        print(f" 📡 Apollo page {page} …", end=" ", flush=True)
        resp = request_with_retry("POST", url, json=body)

        if resp.status_code != 200:
            print(f"ERROR {resp.status_code}: {resp.text[:200]}")
            break

        data = resp.json()
        people = data.get("people", [])
        if not people:
            print("no more results.")
            break

        page_leads = 0
        for person in people:
            email = person.get("email")
            if not email:
                continue
            leads.append({
                "email": email.lower().strip(),
                "first_name": person.get("first_name", ""),
                "last_name": person.get("last_name", ""),
                "title": person.get("title", ""),
                "company_name": (person.get("organization") or {}).get("name", ""),
                "domain": (person.get("organization") or {}).get("primary_domain", ""),
            })
            page_leads += 1
            if len(leads) >= volume:
                break

        print(f"{page_leads} with email ({len(leads)} total)")

        total_pages = data.get("pagination", {}).get("total_pages", page)
        if page >= total_pages:
            print(" Reached last Apollo page.")
            break
        page += 1
        time.sleep(0.5)

    # Dedupe by email within sourced set
    seen = set()
    unique_leads = []
    for lead in leads:
        if lead["email"] not in seen:
            seen.add(lead["email"])
            unique_leads.append(lead)

    print(f"\n ✅ Sourced {len(unique_leads)} unique leads with emails")
    return unique_leads


# ---------------------------------------------------------------------------
# Step 2: LeadMagic Email Verification
# ---------------------------------------------------------------------------

def verify_with_leadmagic(api_key, leads):
    """Verify emails via LeadMagic. Returns only valid leads."""
    print(f"\n{'='*50}")
    print(f"STEP 2: Verifying {len(leads)} emails via LeadMagic")
    print(f"{'='*50}")

    url = "https://api.leadmagic.io/v1/people/email-validation"
    headers = {
        "X-API-Key": api_key,
        "Content-Type": "application/json",
    }

    valid_leads = []
    invalid_count = 0
    unknown_count = 0
    error_count = 0
    rejection_reasons = {}

    for i, lead in enumerate(leads):
        if (i + 1) % 50 == 0 or i == 0:
            print(f" 🔍 Verifying {i+1}/{len(leads)} …")

        try:
            resp = request_with_retry("POST", url, headers=headers, json={"email": lead["email"]})

            if resp.status_code != 200:
                error_count += 1
                continue

            data = resp.json()
            status = data.get("email_status", "unknown")

            if status == "valid":
                lead["is_free_email"] = data.get("is_free_email", False)
                lead["is_role_based"] = data.get("is_role_based", False)
                valid_leads.append(lead)
            elif status == "invalid":
                invalid_count += 1
                rejection_reasons["invalid"] = rejection_reasons.get("invalid", 0) + 1
            else:
                unknown_count += 1
                rejection_reasons["unknown"] = rejection_reasons.get("unknown", 0) + 1

        except Exception as e:
            error_count += 1
            print(f" ⚠️ Error verifying {lead['email']}: {e}")

        if (i + 1) % 20 == 0:
            time.sleep(0.5)

    print(f"\n ✅ Verified: {len(valid_leads)} valid")
    print(f" ❌ Invalid: {invalid_count}")
    print(f" ❓ Unknown: {unknown_count}")
    print(f" ⚠️ Errors: {error_count}")
    if rejection_reasons:
        print(f" 📊 Rejection breakdown: {rejection_reasons}")

    return valid_leads, {
        "total": len(leads),
        "valid": len(valid_leads),
        "invalid": invalid_count,
        "unknown": unknown_count,
        "errors": error_count,
        "rejection_reasons": rejection_reasons,
    }


# ---------------------------------------------------------------------------
# Step 3: Deduplicate against Instantly + exclusion list
# ---------------------------------------------------------------------------

def get_instantly_existing_emails(api_key):
    """Pull ALL existing leads from Instantly workspace for dedup."""
    print(f"\n 📥 Fetching existing Instantly leads for dedup …")

    url = "https://api.instantly.ai/api/v2/leads/list"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

    existing_emails = set()
    cursor = None
    page = 0

    while True:
        body = {"limit": 100}
        if cursor:
            body["starting_after"] = cursor

        resp = request_with_retry("POST", url, headers=headers, json=body)

        if resp.status_code != 200:
            print(f" ⚠️ Instantly list error {resp.status_code}: {resp.text[:200]}")
            break

        data = resp.json()
        items = data.get("items", [])

        if not items:
            break

        for item in items:
            email = item.get("email", "").lower().strip()
            if email:
                existing_emails.add(email)

        cursor = data.get("next_starting_after")
        if not cursor:
            break

        page += 1
        if page % 10 == 0:
            print(f" … {len(existing_emails)} existing leads so far")
            time.sleep(1)

    print(f" 📊 Found {len(existing_emails)} existing leads in Instantly")
    return existing_emails


def load_exclusion_list(filepath):
    """Load burned emails from a CSV file (one email per line or first column)."""
    excluded = set()
    if not filepath or not os.path.exists(filepath):
        return excluded

    with open(filepath, "r") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            email = line.split(",")[0].strip().strip('"').lower()
            if "@" in email:
                excluded.add(email)

    print(f" 📋 Loaded {len(excluded)} emails from exclusion list")
    return excluded


def deduplicate(leads, api_key, exclude_file=None):
    """Remove leads already in Instantly or on exclusion list."""
    print(f"\n{'='*50}")
    print(f"STEP 3: Deduplicating {len(leads)} leads")
    print(f"{'='*50}")

    existing = get_instantly_existing_emails(api_key)
    excluded = load_exclusion_list(exclude_file)

    deduped = []
    instantly_dupes = 0
    burned_dupes = 0

    for lead in leads:
        email = lead["email"]
        if email in existing:
            instantly_dupes += 1
        elif email in excluded:
            burned_dupes += 1
        else:
            deduped.append(lead)

    print(f"\n ✅ Net new leads: {len(deduped)}")
    print(f" 🔄 Already in Instantly: {instantly_dupes}")
    print(f" 🚫 On exclusion list: {burned_dupes}")

    return deduped, {
        "instantly_dupes": instantly_dupes,
        "burned_dupes": burned_dupes,
        "net_new": len(deduped),
    }


# ---------------------------------------------------------------------------
# Step 4: Upload to Instantly
# ---------------------------------------------------------------------------

def generate_personalization(lead):
    """Generate a simple 1-line personalization based on available data."""
    name = lead.get("first_name", "")
    company = lead.get("company_name", "")
    title = lead.get("title", "")

    if company and title:
        return f"Noticed you're {title} at {company} — curious how you're thinking about growth this quarter."
    elif company:
        return f"Been following {company}'s trajectory — impressive momentum."
    elif title:
        return f"As a {title}, you're probably juggling growth and efficiency right now."
    return "Your background caught my eye — wanted to reach out."


def upload_to_instantly(api_key, leads, campaign_id, dry_run=False):
    """Upload leads to Instantly campaign in batches."""
    print(f"\n{'='*50}")
    print(f"STEP 4: Uploading {len(leads)} leads to Instantly")
    print(f"{'='*50}")

    if dry_run:
        print(" 🏃 DRY RUN — skipping actual upload")
        return {"uploaded": 0, "failed": 0, "dry_run": True}

    url = "https://api.instantly.ai/api/v2/leads"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

    uploaded = 0
    failed = 0
    batch_size = 25

    for i in range(0, len(leads), batch_size):
        batch = leads[i:i + batch_size]
        batch_num = (i // batch_size) + 1
        total_batches = (len(leads) + batch_size - 1) // batch_size

        print(f" 📤 Batch {batch_num}/{total_batches} ({len(batch)} leads) …", end=" ", flush=True)

        batch_success = 0
        batch_fail = 0

        for lead in batch:
            body = {
                "email": lead["email"],
                "first_name": lead.get("first_name", ""),
                "last_name": lead.get("last_name", ""),
                "company_name": lead.get("company_name", ""),
                "campaign": campaign_id,
                "custom_variables": {
                    "title": lead.get("title", ""),
                    "company_name": lead.get("company_name", ""),
                    "personalization": generate_personalization(lead),
                },
            }

            try:
                resp = request_with_retry("POST", url, headers=headers, json=body)
                if resp.status_code in (200, 201):
                    batch_success += 1
                else:
                    batch_fail += 1
                    if batch_fail <= 3:
                        print(f"\n ⚠️ Failed {lead['email']}: {resp.status_code} {resp.text[:100]}")
            except Exception as e:
                batch_fail += 1
                print(f"\n ⚠️ Error uploading {lead['email']}: {e}")

        uploaded += batch_success
        failed += batch_fail
        print(f"✓ {batch_success} ok, {batch_fail} failed")

        if i + batch_size < len(leads):
            time.sleep(1)

    print(f"\n ✅ Uploaded: {uploaded}")
    if failed:
        print(f" ❌ Failed: {failed}")

    return {"uploaded": uploaded, "failed": failed, "dry_run": False}


# ---------------------------------------------------------------------------
# Reporting
# ---------------------------------------------------------------------------

def save_report(output_dir, sourced, verified_stats, dedup_stats, upload_stats, leads_uploaded, args):
    """Save run log as JSON."""
    timestamp = datetime.now().strftime("%Y-%m-%d-%H-%M")
    report_path = os.path.join(output_dir, f"{timestamp}.json")

    report = {
        "timestamp": datetime.now().isoformat(),
        "parameters": {
            "titles": args.titles,
            "industries": args.industries,
            "company_size": args.company_size,
            "locations": args.locations,
            "keywords": args.keywords,
            "campaign_id": args.campaign_id,
            "volume": args.volume,
            "exclude_file": args.exclude_file,
            "dry_run": args.dry_run,
        },
        "results": {
            "sourced_from_apollo": sourced,
            "verification": verified_stats,
            "deduplication": dedup_stats,
            "upload": upload_stats,
        },
        "leads_uploaded": [
            {k: v for k, v in lead.items() if k not in ("is_free_email", "is_role_based")}
            for lead in leads_uploaded
        ],
    }

    os.makedirs(output_dir, exist_ok=True)
    with open(report_path, "w") as f:
        json.dump(report, f, indent=2, default=str)

    print(f"\n 💾 Run log saved: {report_path}")
    return report_path


def print_summary(sourced_count, verified_stats, dedup_stats, upload_stats):
    """Print final summary."""
    print(f"\n{'='*50}")
    print(f" LEAD PIPELINE SUMMARY")
    print(f"{'='*50}")
    print(f" Sourced from Apollo: {sourced_count:>6}")
    print(f" Verified (LeadMagic): {verified_stats['valid']:>6} ({verified_stats['valid']/max(sourced_count,1)*100:.1f}%)")
    print(f" Already in Instantly: {dedup_stats['instantly_dupes']:>6}")
    print(f" Excluded (burned list): {dedup_stats['burned_dupes']:>6}")
    print(f" Net new uploaded: {upload_stats['uploaded']:>6}")
    if upload_stats.get('failed'):
        print(f" Failed uploads: {upload_stats['failed']:>6}")
    if upload_stats.get('dry_run'):
        print(f" ⚠️ DRY RUN — nothing was uploaded")
    print(f"{'='*50}\n")


# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------

def main():
    parser = argparse.ArgumentParser(
        description="Lead Pipeline: Apollo → LeadMagic → Dedupe → Instantly",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Full pipeline run
  python3 lead-pipeline.py \\
    --titles "VP Marketing,CMO" --industries "SaaS" \\
    --company-size "11,50" --locations "United States" \\
    --campaign-id abc-123 --volume 200

  # Dry run (no upload)
  python3 lead-pipeline.py \\
    --titles "CTO,VP Engineering" --company-size "51,200" \\
    --campaign-id abc-123 --volume 100 --dry-run
""",
    )

    parser.add_argument("--apollo-key", default=os.environ.get("APOLLO_API_KEY"),
                        help="Apollo API key (or set APOLLO_API_KEY env var)")
    parser.add_argument("--leadmagic-key", default=os.environ.get("LEADMAGIC_API_KEY"),
                        help="LeadMagic API key (or set LEADMAGIC_API_KEY env var)")
    parser.add_argument("--instantly-key", default=os.environ.get("INSTANTLY_API_KEY"),
                        help="Instantly API key (or set INSTANTLY_API_KEY env var)")
    parser.add_argument("--titles", required=True, help="Comma-separated job titles")
    parser.add_argument("--industries", default="", help="Comma-separated industries/keywords")
    parser.add_argument("--company-size", default="", help="Employee range, e.g. '11,50'")
    parser.add_argument("--locations", default="", help="Comma-separated locations")
    parser.add_argument("--keywords", default="", help="Additional search keywords")
    parser.add_argument("--campaign-id", required=True, help="Instantly campaign UUID")
    parser.add_argument("--volume", type=int, default=500, help="Target number of leads (default: 500)")
    parser.add_argument("--exclude-file", default=None, help="Path to CSV of burned/excluded emails")
    parser.add_argument("--output-dir", default="./data/lead-pipeline-runs/",
                        help="Directory for run logs (default: ./data/lead-pipeline-runs/)")
    parser.add_argument("--dry-run", action="store_true", help="Run pipeline but skip Instantly upload")

    args = parser.parse_args()

    # Validate required keys
    if not args.apollo_key:
        print("ERROR: Apollo API key required. Set APOLLO_API_KEY env var or pass --apollo-key.")
        sys.exit(1)
    if not args.leadmagic_key:
        print("ERROR: LeadMagic API key required. Set LEADMAGIC_API_KEY env var or pass --leadmagic-key.")
        sys.exit(1)
    if not args.instantly_key:
        print("ERROR: Instantly API key required. Set INSTANTLY_API_KEY env var or pass --instantly-key.")
        sys.exit(1)

    start_time = time.time()

    os.makedirs(args.output_dir, exist_ok=True)

    print(f"\n🚀 Lead Pipeline Started — {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print(f" Target: {args.volume} leads → Campaign {args.campaign_id}")
    if args.dry_run:
        print(f" ⚠️ DRY RUN MODE — will not upload to Instantly")

    # Step 1: Source from Apollo
    sourced_leads = source_from_apollo(
        api_key=args.apollo_key,
        titles=args.titles,
        industries=args.industries,
        company_size=args.company_size,
        locations=args.locations,
        keywords=args.keywords,
        volume=args.volume,
    )

    if not sourced_leads:
        print("\n❌ No leads sourced from Apollo. Exiting.")
        sys.exit(1)

    # Save intermediate state
    intermediate_path = os.path.join(args.output_dir, "last-sourced.json")
    with open(intermediate_path, "w") as f:
        json.dump(sourced_leads, f, indent=2)

    # Step 2: Verify via LeadMagic
    verified_leads, verified_stats = verify_with_leadmagic(args.leadmagic_key, sourced_leads)

    if not verified_leads:
        print("\n❌ No leads passed verification. Exiting.")
        sys.exit(1)

    intermediate_path = os.path.join(args.output_dir, "last-verified.json")
    with open(intermediate_path, "w") as f:
        json.dump(verified_leads, f, indent=2)

    # Step 3: Deduplicate
    deduped_leads, dedup_stats = deduplicate(verified_leads, args.instantly_key, args.exclude_file)

    if not deduped_leads:
        print("\n⚠️ All leads already exist in Instantly. Nothing to upload.")
        upload_stats = {"uploaded": 0, "failed": 0, "dry_run": args.dry_run}
    else:
        # Step 4: Upload to Instantly
        upload_stats = upload_to_instantly(args.instantly_key, deduped_leads, args.campaign_id, args.dry_run)

    # Step 5: Report
    print_summary(len(sourced_leads), verified_stats, dedup_stats, upload_stats)

    save_report(
        args.output_dir,
        sourced=len(sourced_leads),
        verified_stats=verified_stats,
        dedup_stats=dedup_stats,
        upload_stats=upload_stats,
        leads_uploaded=deduped_leads,
        args=args,
    )

    elapsed = time.time() - start_time
    print(f"⏱️ Completed in {elapsed/60:.1f} minutes")


if __name__ == "__main__":
    main()
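The deduplication step (Step 3) is easier to reason about and to test when the filtering logic is separated from the network call that fetches existing leads. The following is a minimal sketch under stated assumptions, not the script's own code: `load_exclusions` and `filter_new_leads` are hypothetical names, the exclusion list is parsed with the standard `csv` module rather than a plain `split(",")`, and repeated emails within the same batch are also dropped, which the original `deduplicate` leaves to the Apollo sourcing step.

```python
import csv
import io


def load_exclusions(csv_text: str) -> set[str]:
    """Collect lowercased emails from the first column, skipping blanks and # comments."""
    excluded = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # blank line
        cell = row[0].strip().lower()
        if cell.startswith("#") or "@" not in cell:
            continue  # comment line or header row
        excluded.add(cell)
    return excluded


def filter_new_leads(leads, existing, excluded):
    """Return (net-new leads, stats) given sets of existing and excluded emails."""
    kept = []
    stats = {"instantly_dupes": 0, "burned_dupes": 0}
    seen = set()
    for lead in leads:
        email = lead["email"].strip().lower()
        if email in existing:
            stats["instantly_dupes"] += 1
        elif email in excluded:
            stats["burned_dupes"] += 1
        elif email not in seen:
            seen.add(email)
            kept.append(lead)
    stats["net_new"] = len(kept)
    return kept, stats


if __name__ == "__main__":
    burned = load_exclusions('email,reason\n# bounced last quarter\n"B@x.com",bounced\n')
    leads = [{"email": e} for e in ("A@x.com", "b@x.com", "c@x.com", "d@x.com", "d@x.com")]
    new_leads, stats = filter_new_leads(leads, existing={"a@x.com"}, excluded=burned)
    print([lead["email"] for lead in new_leads], stats)
```

Because both functions take plain sets and strings, they can be exercised without any API keys, which is handy alongside `--dry-run`.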
50
sales-pipeline/.env.example
Normal file
|
|
@ -0,0 +1,50 @@
# ─── Required API Keys ───────────────────────────────────────────────────────

# HubSpot API key (for Deal Resurrector + Suppression Pipeline CRM check)
# Get one: HubSpot → Settings → Integrations → Private Apps
HUBSPOT_API_KEY=your-hubspot-private-app-token

# Instantly API key (for RB2B Router + Suppression Pipeline)
# Get one: Instantly.ai → Settings → API
INSTANTLY_API_KEY=your-instantly-api-key

# Brave Search API key (for Trigger Prospector)
# Get one: https://api.search.brave.com/
BRAVE_API_KEY=your-brave-search-api-key

# ─── Optional: Database (for ICP Learning Analyzer) ─────────────────────────

# PostgreSQL connection string for prospect tracking database
# Only needed if you're running the ICP Learning Analyzer
DATABASE_URL=postgresql://user:password@localhost:5432/prospects_db

# ─── Your Company Info (for email templates) ────────────────────────────────

YOUR_COMPANY_NAME=Your Company
YOUR_SENDER_NAME=Your Name
YOUR_SENDER_TITLE=CEO
YOUR_VALUE_PROP=We've built new capabilities since we last talked that I think you'd find interesting.

# ─── Pipeline Configuration ─────────────────────────────────────────────────

# Default source site for RB2B webhook ingest
DEFAULT_SOURCE_SITE=your-site.com

# Minimum intent score to process visitors (0-100, default: 50)
MIN_INTENT_SCORE=50

# Minimum company size for ICP match (employees, default: 50)
ICP_MIN_COMPANY_SIZE=50

# Company dedup window in days (default: 7)
COMPANY_DEDUP_WINDOW_DAYS=7

# HubSpot API rate limit delay in seconds (default: 1.5)
HUBSPOT_RATE_DELAY=1.5

# Campaign routing (override defaults)
CAMPAIGN_AGENCY=Agency-Default
CAMPAIGN_GENERAL=General-Default

# Base directory override (defaults to script directory)
# BASE_DIR=/path/to/your/sales-pipeline
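The pipeline settings above are plain environment variables with documented defaults. One way a script might read them is sketched below. How the actual sales-pipeline scripts load their configuration is not shown in this part of the diff, so `PipelineConfig` and `load_config` are hypothetical names. Only the variable names and default values are taken from the comments in `.env.example`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    min_intent_score: int
    icp_min_company_size: int
    company_dedup_window_days: int
    hubspot_rate_delay: float


def load_config(env=None) -> PipelineConfig:
    """Read pipeline settings from a mapping (os.environ by default).

    Values arrive as strings, so each one is converted explicitly; the
    fallbacks mirror the defaults documented in .env.example.
    """
    env = os.environ if env is None else env
    return PipelineConfig(
        min_intent_score=int(env.get("MIN_INTENT_SCORE", "50")),
        icp_min_company_size=int(env.get("ICP_MIN_COMPANY_SIZE", "50")),
        company_dedup_window_days=int(env.get("COMPANY_DEDUP_WINDOW_DAYS", "7")),
        hubspot_rate_delay=float(env.get("HUBSPOT_RATE_DELAY", "1.5")),
    )


if __name__ == "__main__":
    print(load_config())
```

Passing the mapping in as a parameter keeps the function testable without mutating the real process environment.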
349
sales-pipeline/README.md
Normal file
|
|
@ -0,0 +1,349 @@
# 🎯 AI Sales Pipeline

> **Turn anonymous website visitors into qualified pipeline in under 60 seconds.**

A complete AI-powered sales pipeline automation suite that covers website visitor identification, intent scoring, suppression, campaign routing, dead deal resurrection, trigger-based prospecting, and self-learning ICP optimization.

These tools were built in production at [Single Grain](https://www.singlegrain.com), processing thousands of visitors and deals weekly. They are now open source for any B2B company to use.

---

## Architecture

```
                    ┌─────────────────────────────────────────┐
                    │             YOUR WEBSITE(S)             │
                    └──────────────┬──────────────────────────┘
                                   │ RB2B pixel fires
                                   ▼
                    ┌───────────────────────────────────────┐
                    │        rb2b_webhook_ingest.py         │
                    │  Intent Scoring + ICP Classification  │
                    │  (pricing=90, blog=30, services=65)   │
                    └──────────────┬────────────────────────┘
                                   │ High-intent visitors
                                   ▼
                    ┌────────────────────────────────────────────────┐
                    │          rb2b_suppression_pipeline.py          │
                    │  5-Layer Check:                                │
                    │  CRM → Outbound → Stripe → Analytics → Block   │
                    │  + Company-level dedup (1 per domain/week)     │
                    └──────────────┬─────────────────────────────────┘
                                   │ Clean leads only
                                   ▼
                    ┌────────────────────────────────────────────────┐
                    │            rb2b_instantly_router.py            │
                    │  Agency Detection + Source Site Routing        │
                    │  → Routes to correct Instantly campaign        │
                    │  → Auto-activates paused campaigns             │
                    └────────────────────────────────────────────────┘

┌────────────────────┐  ┌────────────────────┐  ┌─────────────────────┐
│ deal_resurrector   │  │ trigger_prospector │  │ icp_learning_       │
│ .py                │  │ .py                │  │ analyzer.py         │
│                    │  │                    │  │                     │
│ 3 intelligence     │  │ Monitors:          │  │ Reads approve/      │
│ layers on dead     │  │ • New CMO hires    │  │ reject decisions    │
│ deals:             │  │ • Job postings     │  │                     │
│ 1. Time decay      │  │ • Funding rounds   │  │ Outputs:            │
│    scoring         │  │ • Agency searches  │  │ • Industry targets  │
│ 2. POC expansion   │  │                    │  │ • Size sweet spots  │
│ 3. Follow the      │  │ Scores, enriches,  │  │ • Title patterns    │
│    champion        │  │ drafts outreach    │  │ • Revenue ranges    │
└────────────────────┘  └────────────────────┘  └─────────────────────┘
          │                        │                         │
          └────────────────────────┼─────────────────────────┘
                                   ▼
                       ┌───────────────────────┐
                       │  Your CRM / Outbound  │
                       │  (HubSpot, Instantly) │
                       └───────────────────────┘
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Tools
|
||||||
|
|
||||||
|
### 1. 🌐 RB2B Webhook Ingest (`rb2b_webhook_ingest.py`)
|
||||||
|
|
||||||
|
Receives RB2B visitor identification webhooks, scores intent based on pages visited, and classifies ICP fit.
|
||||||
|
|
||||||
|
**What it does:**
|
||||||
|
- Scores every page visit against configurable intent patterns (pricing page = 90, blog = 30)
|
||||||
|
- Checks ICP fit by title seniority + company size
|
||||||
|
- Outputs structured signals with priority levels (high/medium/low)
|
||||||
|
- Runs as HTTP server or processes stdin/batch files
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Run as webhook server
|
||||||
|
python3 rb2b_webhook_ingest.py --serve --port 4100
|
||||||
|
|
||||||
|
# Test with sample data
|
||||||
|
echo '{"email":"cmo@acme.com","job_title":"CMO","company":"Acme Inc","company_size":500,"pages_visited":["https://yoursite.com/pricing"]}' | python3 rb2b_webhook_ingest.py --dry-run
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2. 🛡️ Suppression Pipeline (`rb2b_suppression_pipeline.py`)
|
||||||
|
|
||||||
|
5-layer suppression plus a company-level dedup check. Together they prevent embarrassing outreach to existing customers, active leads, or competitors.
|
||||||
|
|
||||||
|
**Layers:**
|
||||||
|
1. **Personal Email Filter** — Skip gmail.com, yahoo.com, etc.
|
||||||
|
2. **CRM Check** — Already in HubSpot? Don't cold email them.
|
||||||
|
3. **Outbound Platform** — Already in an Instantly campaign (last 90 days)?
|
||||||
|
4. **Payment Provider** — Paying Stripe customer? Definitely don't cold email.
|
||||||
|
5. **Blocklist** — Competitor domains + manual blocks
|
||||||
|
6. **Company Dedup** — Only 1 contact per company domain per 7-day window
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Check a single email
|
||||||
|
python3 rb2b_suppression_pipeline.py --email john@acme.com --company "Acme Inc"
|
||||||
|
|
||||||
|
# Output:
|
||||||
|
# 📋 Suppression check for: john@acme.com
|
||||||
|
# ──────────────────────────────────────────────────
|
||||||
|
# ✅ Personal Email Filter: business email
|
||||||
|
# ✅ CRM Check: not in CRM
|
||||||
|
# ✅ Outbound Platform: not in outbound platform
|
||||||
|
# ✅ Payment Provider: not a paying customer
|
||||||
|
# ✅ Blocklist: not blocklisted
|
||||||
|
# ✅ Company Dedup: no company dedup conflict
|
||||||
|
# ──────────────────────────────────────────────────
|
||||||
|
# ✅ CLEAR — eligible for enrollment
|
||||||
|
```
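
One way to structure layered checks like these is to stop at the first layer that fails. The sketch below illustrates that pattern only; `check_suppression`, `PERSONAL_DOMAINS`, `BLOCKLIST`, and `recent_enrollments` are hypothetical names, not taken from `rb2b_suppression_pipeline.py`, and the CRM, outbound, and payment lookups are left out because they need live API calls.

```python
from datetime import datetime, timedelta, timezone

# Illustrative values only; the real lists are defined by the pipeline script.
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com"}
BLOCKLIST = {"competitor.com"}  # hypothetical competitor domain


def check_suppression(email, recent_enrollments, now=None, window_days=7):
    """Return (is_clear, reason) for one email address.

    recent_enrollments maps a company domain to the datetime of its most
    recent enrollment (a hypothetical in-memory stand-in for real storage).
    """
    now = now or datetime.now(timezone.utc)
    domain = email.rsplit("@", 1)[-1].lower()

    # Layer 1: personal email filter
    if domain in PERSONAL_DOMAINS:
        return False, "personal email"

    # Layers 2-4 (CRM, outbound platform, payment provider) would run here.

    # Layer 5: blocklist
    if domain in BLOCKLIST:
        return False, "blocklisted"

    # Company dedup: at most one contact per domain inside the window
    last = recent_enrollments.get(domain)
    if last is not None and now - last < timedelta(days=window_days):
        return False, "company dedup"

    return True, "clear"
```

Passing `now` explicitly keeps the dedup window easy to test; `window_days` mirrors the `COMPANY_DEDUP_WINDOW_DAYS` setting from `.env.example`.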
|
||||||
|
|
||||||
|
### 3. 🔀 Instantly Router (`rb2b_instantly_router.py`)
|
||||||
|
|
||||||
|
The orchestrator: combines intent scoring + suppression + agency classification to route leads to the right Instantly campaign automatically.
|
||||||
|
|
||||||
|
**What it does:**
|
||||||
|
- Scores visitor intent
|
||||||
|
- Runs full suppression pipeline
|
||||||
|
- Classifies agency vs. non-agency visitors (2+ signal threshold)
|
||||||
|
- Detects source site (if you have multiple properties)
|
||||||
|
- Routes to the correct campaign and auto-enrolls via Instantly API
|
||||||
|
- Auto-activates paused campaigns when leads are ready
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Run as webhook server (production mode)
|
||||||
|
python3 rb2b_instantly_router.py --serve --port 4100
|
||||||
|
|
||||||
|
# Dry run test
|
||||||
|
echo '{"email":"vp@techco.com","job_title":"VP Marketing","company":"TechCo","industry":"SaaS","company_size":"200","pages_visited":["https://yoursite.com/pricing","https://yoursite.com/case-studies"]}' | python3 rb2b_instantly_router.py --dry-run
|
||||||
|
```
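
The "2+ signal threshold" for agency classification can be pictured as counting independent hints and comparing the count to a cutoff. This is a sketch under assumptions: the keyword values, the email-domain signal, and the helper names `count_agency_signals` and `pick_campaign` are hypothetical; only the constant names `AGENCY_KEYWORDS_COMPANY` and `AGENCY_INDUSTRIES` and the two campaign names come from this repository.

```python
# Illustrative values; the real lists live in rb2b_instantly_router.py.
AGENCY_KEYWORDS_COMPANY = ("agency", "media", "creative", "digital")
AGENCY_INDUSTRIES = ("marketing and advertising", "public relations")
AGENCY_SIGNAL_THRESHOLD = 2  # the "2+ signal threshold" described above


def count_agency_signals(visitor: dict) -> int:
    """Count independent hints that the visitor works at an agency."""
    company = str(visitor.get("company", "")).lower()
    industry = str(visitor.get("industry", "")).lower()
    domain = str(visitor.get("email", "")).rsplit("@", 1)[-1].lower()
    signals = [
        any(k in company for k in AGENCY_KEYWORDS_COMPANY),  # company name
        any(i in industry for i in AGENCY_INDUSTRIES),       # industry label
        any(k in domain for k in AGENCY_KEYWORDS_COMPANY),   # email domain (assumed signal)
    ]
    return sum(signals)


def pick_campaign(visitor: dict) -> str:
    """Route to the agency campaign only when enough signals agree."""
    is_agency = count_agency_signals(visitor) >= AGENCY_SIGNAL_THRESHOLD
    return "Agency-Default" if is_agency else "General-Default"
```

Requiring two signals rather than one keeps a company such as "Acme Media" in a non-agency industry from being routed to the agency campaign on its name alone.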
|
||||||
|
|
||||||
|
### 4. 🔥 Deal Resurrector (`deal_resurrector.py`)
|
||||||
|
|
||||||
|
Three intelligence layers on your closed-lost deals. Finds the best revival opportunities using a composite scoring formula.
|
||||||
|
|
||||||
|
**Layer 1 — Time Decay Scoring (0-100):**
|
||||||
|
- Time component (35 pts): 60-90 days = sweet spot, decays over time
|
||||||
|
- Value component (30 pts): Normalized deal value
|
||||||
|
- Reason component (20 pts): "Timing" deals score higher than "bad fit"
|
||||||
|
- Trigger component (15 pts): Bonus if recent email opens or site visits
|
||||||
|
|
||||||
|
**Layer 2 — POC Expansion:**
|
||||||
|
- Verifies if your contact is still at the company
|
||||||
|
- Finds replacement decision-makers when contacts leave
|
||||||
|
|
||||||
|
**Layer 3 — Follow the Champion:**
|
||||||
|
- Tracks departed contacts to their new companies
|
||||||
|
- If they moved to an ICP-fit company, generates outreach for the new org
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Find top 10 revival opportunities (dry run)
|
||||||
|
python3 deal_resurrector.py --top 10 --dry-run
|
||||||
|
|
||||||
|
# Full run with champion tracking
|
||||||
|
python3 deal_resurrector.py --top 5 --include-champion
|
||||||
|
|
||||||
|
# Exclude a company from future runs
|
||||||
|
python3 deal_resurrector.py --add-exclusion "Already Won Corp"
|
||||||
|
```
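
As a worked example of the Layer 1 arithmetic, the four components add up as follows. The function name `revival_score` is hypothetical; the point weights (35/30/20/15) are the ones listed above, and each input is assumed to be pre-normalized to the 0-1 range.

```python
def revival_score(time_weight: float, value_norm: float,
                  reason_score: float, has_trigger: bool) -> int:
    """Additive composite: time (35) + value (30) + reason (20) + trigger (15)."""
    time_pts = time_weight * 35
    value_pts = value_norm * 30
    reason_pts = reason_score * 20
    trigger_pts = 15 if has_trigger else 0
    return min(100, round(time_pts + value_pts + reason_pts + trigger_pts))


# Deal closed 75 days ago (full time weight), half the largest deal value,
# lost on timing (full reason score), with a recent email open:
# 35 + 15 + 20 + 15 = 85
strong = revival_score(1.0, 0.5, 1.0, True)

# Deal closed 200 days ago (weight 0.6), 30% of the largest deal value,
# lost as a bad fit (0.3), no engagement trigger:
# 21 + 9 + 6 + 0 = 36
weak = revival_score(0.6, 0.3, 0.3, False)
```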
|
||||||
|
|
||||||
|
### 5. 🔍 Trigger Prospector (`trigger_prospector.py`)
|
||||||
|
|
||||||
|
Scans the web for buying signals: new marketing leadership hires, job postings, funding rounds, and active agency searches.
|
||||||
|
|
||||||
|
**Signal Categories:**
|
||||||
|
| Signal | What It Means | Score Weight |
|--------|--------------|-------------|
| New CMO/VP hire | Budget reallocation window | 35 pts |
| Job posting | Growth mode, team building | 25 pts |
| Funding round | Capital to deploy | 30 pts |
| Agency search | Active evaluation | 40 pts |
|
||||||
|
|
||||||
|
Each prospect gets a composite score (0-100) plus enrichment: estimated company size, industry, suggested services, outreach channel recommendation, and a ready-to-send email draft.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Scan last 7 days for signals
|
||||||
|
python3 trigger_prospector.py --days 7 --top 15
|
||||||
|
|
||||||
|
# Wider scan with lower threshold
|
||||||
|
python3 trigger_prospector.py --days 30 --top 25 --min-score 40
|
||||||
|
```
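
The README does not spell out how the per-signal weights combine into the 0-100 composite. A minimal sketch, assuming each distinct signal type counts once and the sum is capped at 100; `SIGNAL_WEIGHTS`, `signal_score`, and `filter_prospects` are hypothetical names, and the weights are copied from the table above.

```python
# Weights from the Signal Categories table; the key names are assumptions.
SIGNAL_WEIGHTS = {
    "new_exec_hire": 35,
    "job_posting": 25,
    "funding_round": 30,
    "agency_search": 40,
}


def signal_score(signals) -> int:
    """Sum the weight of each distinct signal type, capped at 100."""
    return min(100, sum(SIGNAL_WEIGHTS.get(s, 0) for s in set(signals)))


def filter_prospects(prospects: dict, min_score: int) -> list:
    """Return (company, score) pairs at or above min_score, best first."""
    scored = [(name, signal_score(sigs)) for name, sigs in prospects.items()]
    kept = [item for item in scored if item[1] >= min_score]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

Under this assumption, `--min-score 40` would keep a company with a single agency-search signal but drop one with only a job posting.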
|
||||||
|
|
||||||
|
### 6. 📊 ICP Learning Analyzer (`icp_learning_analyzer.py`)
|
||||||
|
|
||||||
|
Your ICP should evolve from data, not guesswork. This tool reads your prospect approve/reject history and outputs recommended filter changes.
|
||||||
|
|
||||||
|
**What it analyzes:**
|
||||||
|
- Industry patterns (which convert vs. get rejected)
|
||||||
|
- Company size sweet spots (10th-90th percentile of approvals)
|
||||||
|
- Title/seniority patterns
|
||||||
|
- Revenue ranges
|
||||||
|
- Per-source approval rates (cold vs. trigger vs. warm vs. revival)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Run analysis
|
||||||
|
python3 icp_learning_analyzer.py
|
||||||
|
|
||||||
|
# With custom config
|
||||||
|
python3 icp_learning_analyzer.py --config data/icp-config.json
|
||||||
|
|
||||||
|
# Example output:
|
||||||
|
# 📊 ICP Learning Analyzer Results
|
||||||
|
# Total prospects analyzed: 847
|
||||||
|
# ────────────────────────────────────────
|
||||||
|
# cold : ready (n=312, approval=23%)
|
||||||
|
# → Target: SaaS, Fintech, E-commerce
|
||||||
|
# → Exclude: Crypto/Web3
|
||||||
|
# → Employees: 50-500
|
||||||
|
# trigger : ready (n=156, approval=41%)
|
||||||
|
# → Target: SaaS, Healthcare
|
||||||
|
# → Employees: 100-1000
|
||||||
|
# warm : ready (n=289, approval=67%)
|
||||||
|
# revival : insufficient_data (n=12, min_required=30)
|
||||||
|
```
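
The "10th-90th percentile of approvals" range can be computed with the standard library. This is a sketch, not the analyzer's actual code: `size_sweet_spot` is a hypothetical helper, and the choice of the `inclusive` quantile method is an assumption. The minimum sample size of 30 matches `min_sample_size` in `data/icp-config.example.json`.

```python
import statistics

MIN_SAMPLE_SIZE = 30  # matches min_sample_size in icp-config.example.json


def size_sweet_spot(approved_sizes):
    """Return (p10, p90) employee counts for approved prospects.

    Returns None when there are too few approvals to trust the range,
    mirroring the "insufficient_data" status in the example output.
    """
    if len(approved_sizes) < MIN_SAMPLE_SIZE:
        return None
    # n=10 yields nine cut points: the first is p10, the last is p90.
    deciles = statistics.quantiles(approved_sizes, n=10, method="inclusive")
    return round(deciles[0]), round(deciles[-1])
```

Trimming both tails keeps a handful of very small or very large approved companies from stretching the recommended employee range.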
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
### 1. Clone and install
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/nichochar/ai-marketing-skills.git
|
||||||
|
cd ai-marketing-skills/sales-pipeline
|
||||||
|
pip install -r requirements.txt
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2. Configure environment
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cp .env.example .env
|
||||||
|
# Edit .env with your API keys
|
||||||
|
```
|
||||||
|
|
||||||
|
### 3. Set up campaign config (for RB2B Router)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cp data/campaigns.json.example data/campaigns.json
|
||||||
|
# Add your Instantly campaign UUIDs
|
||||||
|
```
|
||||||
|
|
||||||
|
### 4. Test with dry runs
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Test suppression pipeline
|
||||||
|
python3 rb2b_suppression_pipeline.py --email test@example.com
|
||||||
|
|
||||||
|
# Test intent scoring
|
||||||
|
echo '{"email":"test@example.com","pages_visited":["https://yoursite.com/pricing"]}' \
  | python3 rb2b_webhook_ingest.py --dry-run
|
||||||
|
|
||||||
|
# Test deal resurrector
|
||||||
|
python3 deal_resurrector.py --top 5 --dry-run
|
||||||
|
|
||||||
|
# Test trigger prospector
|
||||||
|
python3 trigger_prospector.py --days 7 --top 10
|
||||||
|
```
|
||||||
|
|
||||||
|
### 5. Deploy webhook server
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Start the full pipeline as a webhook endpoint
|
||||||
|
python3 rb2b_instantly_router.py --serve --port 4100
|
||||||
|
|
||||||
|
# Point your RB2B webhook (or Zapier/Make) at:
|
||||||
|
# POST http://your-server:4100/
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Customization
|
||||||
|
|
||||||
|
### Intent Scoring
|
||||||
|
Edit `PAGE_INTENT_SCORES` in `rb2b_webhook_ingest.py` to match your site's URL structure:
|
||||||
|
|
||||||
|
```python
|
||||||
|
PAGE_INTENT_SCORES = {
|
||||||
|
"pricing": 90, # Your pricing page path
|
||||||
|
"demo": 85, # Demo request page
|
||||||
|
"case-study": 70, # Social proof pages
|
||||||
|
"blog": 30, # Low-intent content
|
||||||
|
# Add your own patterns...
|
||||||
|
}
|
||||||
|
```
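
To see how a dictionary like this could drive scoring, here is a minimal sketch that matches patterns against the URL path and keeps the highest hit. The matching rule (substring match, highest score wins), the fallback `DEFAULT_SCORE`, and the helper names `score_page` and `score_visit` are assumptions, not the behavior of `rb2b_webhook_ingest.py`.

```python
from urllib.parse import urlparse

# Same shape as the example above; adjust the patterns to your own site.
PAGE_INTENT_SCORES = {
    "pricing": 90,
    "demo": 85,
    "case-study": 70,
    "blog": 30,
}

DEFAULT_SCORE = 10  # hypothetical fallback for pages that match no pattern


def score_page(url: str) -> int:
    """Return the highest score whose pattern appears in the URL path."""
    path = urlparse(url).path.lower()
    matches = [score for pattern, score in PAGE_INTENT_SCORES.items()
               if pattern in path]
    return max(matches, default=DEFAULT_SCORE)


def score_visit(pages_visited: list) -> int:
    """Score a visit by its single highest-intent page (an assumption)."""
    return max((score_page(url) for url in pages_visited), default=0)
```

Matching on the path rather than the full URL avoids false hits when a pattern such as `demo` happens to appear in the hostname.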
|
||||||
|
|
||||||
|
### Agency Detection
|
||||||
|
Modify `AGENCY_KEYWORDS_COMPANY` and `AGENCY_INDUSTRIES` in `rb2b_instantly_router.py` for your market.
|
||||||
|
|
||||||
|
### Loss Reason Scoring
|
||||||
|
Customize `LOSS_REASON_BONUS` in `deal_resurrector.py` based on which loss reasons actually convert when revisited.
|
||||||
|
|
||||||
|
### Trigger Queries
|
||||||
|
Edit `SEARCH_QUERIES` in `trigger_prospector.py` to target your specific market signals.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Integrations
|
||||||
|
|
||||||
|
| Tool | Required | Used By |
|------|----------|---------|
| [RB2B](https://rb2b.com) | For visitor ID | Webhook Ingest, Router |
| [Instantly](https://instantly.ai) | For cold email | Router, Suppression |
| [HubSpot](https://hubspot.com) | For CRM | Deal Resurrector, Suppression |
| [Brave Search](https://api.search.brave.com) | For web signals | Trigger Prospector |
| PostgreSQL | For ICP learning | ICP Analyzer |
| Stripe | Optional | Suppression (customer check) |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## File Structure
|
||||||
|
|
||||||
|
```
sales-pipeline/
├── README.md                      # This file
├── SKILL.md                       # Claude Code skill definition
├── requirements.txt               # Python dependencies
├── .env.example                   # Environment variable template
├── rb2b_webhook_ingest.py         # Webhook server + intent scoring
├── rb2b_suppression_pipeline.py   # 5-layer suppression checks
├── rb2b_instantly_router.py       # Full pipeline orchestrator
├── deal_resurrector.py            # Dead deal revival engine
├── trigger_prospector.py          # Web signal prospecting
├── icp_learning_analyzer.py       # Self-learning ICP optimization
└── data/
    ├── campaigns.json.example     # Instantly campaign config template
    └── icp-config.example.json    # ICP analyzer config template
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## How It Works Together
|
||||||
|
|
||||||
|
1. **RB2B identifies** anonymous website visitors with name, email, company, title
|
||||||
|
2. **Webhook Ingest** scores their intent based on which pages they viewed
|
||||||
|
3. **Suppression Pipeline** checks 5 layers to avoid emailing existing contacts
|
||||||
|
4. **Router** classifies agency vs. non-agency, picks the right campaign, enrolls
|
||||||
|
5. **Meanwhile**, Deal Resurrector mines your CRM for revival opportunities
|
||||||
|
6. **Trigger Prospector** scans the web for companies showing buying signals
|
||||||
|
7. **ICP Analyzer** learns from your approve/reject decisions and tightens targeting
|
||||||
|
|
||||||
|
The result: a self-improving pipeline that gets better the more you use it.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
Built by <a href="https://www.singlegrain.com">Single Grain</a> · Open-sourced as part of <a href="https://github.com/nichochar/ai-marketing-skills">AI Marketing Skills</a>
|
||||||
|
</p>
|
||||||
66
sales-pipeline/SKILL.md
Normal file
|
|
@ -0,0 +1,66 @@
|
||||||
|
# AI Sales Pipeline
|
||||||
|
|
||||||
|
Complete AI-powered sales pipeline automation: website visitor identification → intent scoring → suppression → campaign routing → dead deal resurrection → trigger prospecting → self-learning ICP optimization.
|
||||||
|
|
||||||
|
## When to Use
|
||||||
|
|
||||||
|
Use this skill when:
|
||||||
|
- Setting up automated outbound from website visitor identification (RB2B)
|
||||||
|
- Running suppression checks before cold outreach
|
||||||
|
- Routing leads to the right cold email campaigns
|
||||||
|
- Reviving closed-lost deals from HubSpot
|
||||||
|
- Finding companies showing buying signals (new hires, funding, job postings)
|
||||||
|
- Analyzing prospect approve/reject patterns to improve ICP targeting
|
||||||
|
|
||||||
|
## Tools
|
||||||
|
|
||||||
|
### RB2B Pipeline (visitor → outbound)
|
||||||
|
|
||||||
|
| Script | Purpose | Key Command |
|--------|---------|-------------|
| `rb2b_webhook_ingest.py` | Webhook server + intent scoring | `python3 rb2b_webhook_ingest.py --serve --port 4100` |
| `rb2b_suppression_pipeline.py` | 5-layer suppression checks | `python3 rb2b_suppression_pipeline.py --email user@co.com` |
| `rb2b_instantly_router.py` | Full pipeline: score → suppress → route → enroll | `python3 rb2b_instantly_router.py --serve --port 4100` |
|
||||||
|
|
||||||
|
### Deal Intelligence
|
||||||
|
|
||||||
|
| Script | Purpose | Key Command |
|--------|---------|-------------|
| `deal_resurrector.py` | 3-layer dead deal revival (time decay + POC expansion + champion tracking) | `python3 deal_resurrector.py --top 10 --dry-run` |
| `trigger_prospector.py` | Web signal monitoring (new hires, funding, agency searches) | `python3 trigger_prospector.py --days 7 --top 15` |
| `icp_learning_analyzer.py` | Learn from approve/reject decisions, recommend ICP changes | `python3 icp_learning_analyzer.py` |
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
|
||||||
|
All scripts use environment variables for API keys and configuration. Copy `.env.example` to `.env` and fill in your values.
|
||||||
|
|
||||||
|
### Required Environment Variables
|
||||||
|
|
||||||
|
- `HUBSPOT_API_KEY` — HubSpot private app token (Deal Resurrector, Suppression)
|
||||||
|
- `INSTANTLY_API_KEY` — Instantly API key (Router, Suppression)
|
||||||
|
- `BRAVE_API_KEY` — Brave Search API key (Trigger Prospector)
|
||||||
|
- `DATABASE_URL` — PostgreSQL connection string (ICP Analyzer only)
|
||||||
|
|
||||||
|
### Key Customization Points
|
||||||
|
|
||||||
|
- **Intent scoring**: Edit `PAGE_INTENT_SCORES` dict in webhook_ingest to match your URL patterns
|
||||||
|
- **Agency detection**: Edit `AGENCY_KEYWORDS_*` in router for your market
|
||||||
|
- **Loss reason scoring**: Edit `LOSS_REASON_BONUS` in deal_resurrector for your close reasons
|
||||||
|
- **Signal queries**: Edit `SEARCH_QUERIES` in trigger_prospector for your target market
|
||||||
|
- **Campaign routing**: Edit `data/campaigns.json` with your Instantly campaign UUIDs
|
||||||
|
|
||||||
|
## Data Flow
|
||||||
|
|
||||||
|
```
|
||||||
|
RB2B Webhook → Ingest (score) → Suppress (5 layers) → Route (classify) → Instantly
|
||||||
|
HubSpot CRM → Deal Resurrector (score + draft emails) → Review Queue
|
||||||
|
Brave Search → Trigger Prospector (score + enrich) → Outreach Queue
|
||||||
|
Prospect DB → ICP Analyzer (learn patterns) → Filter Recommendations
|
||||||
|
```
|
||||||
|
|
||||||
|
## Dependencies
|
||||||
|
|
||||||
|
- Python 3.9+
|
||||||
|
- `requests` (for HubSpot API)
|
||||||
|
- `psycopg2-binary` (for ICP Analyzer only)
|
||||||
|
- No other external dependencies — scripts use stdlib HTTP server and urllib
|
||||||
6
sales-pipeline/data/campaigns.json.example
Normal file
|
|
@ -0,0 +1,6 @@
|
||||||
|
{
  "campaigns": {
    "Agency-Default": "your-instantly-campaign-uuid-here",
    "General-Default": "your-instantly-campaign-uuid-here"
  }
}
|
||||||
11
sales-pipeline/data/icp-config.example.json
Normal file
|
|
@ -0,0 +1,11 @@
|
||||||
|
{
  "source_type_mapping": {
    "cold_outbound": "cold",
    "trigger_prospector": "trigger",
    "website_visitor": "warm",
    "deal_revival": "revival",
    "referral": "warm",
    "inbound": "warm"
  },
  "min_sample_size": 30
}
|
||||||
668
sales-pipeline/deal_resurrector.py
Normal file
|
|
@ -0,0 +1,668 @@
|
||||||
|
#!/usr/bin/env python3
"""
Deal Resurrector v2 — Three intelligence layers on dead deals:
  Layer 1: Time Decay Scoring (composite score with configurable decay windows)
  Layer 2: POC Expansion (verify contacts, find replacements)
  Layer 3: Follow the Champion (track departed POCs to new companies)

Pulls closed-lost deals from HubSpot, scores them using a composite formula
(time decay + deal value + loss reason + engagement triggers), then generates
personalized revival emails per loss reason category.

Usage:
    python3 deal_resurrector.py --top 10 --dry-run
    python3 deal_resurrector.py --top 5 --include-champion
    python3 deal_resurrector.py --add-exclusion "Acme Corp"
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import random
|
||||||
|
import re
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
import time
|
||||||
|
from datetime import datetime, timedelta, timezone
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import requests
|
||||||
|
|
||||||
|
# ─── Configuration ───────────────────────────────────────────────────────────
|
||||||
|
BASE_DIR = Path(os.environ.get("BASE_DIR", Path(__file__).resolve().parent))
|
||||||
|
DATA_DIR = BASE_DIR / "data"
|
||||||
|
EXCLUSIONS_FILE = DATA_DIR / "resurrector-exclusions.json"
|
||||||
|
OUTPUT_FILE = DATA_DIR / "deal-resurrector-latest.json"
|
||||||
|
|
||||||
|
# HubSpot API
|
||||||
|
HUBSPOT_BASE_URL = "https://api.hubapi.com"
|
||||||
|
HUBSPOT_TOKEN = os.environ.get("HUBSPOT_API_KEY", "")
|
||||||
|
|
||||||
|
# ─── Closed-Lost Stage IDs ──────────────────────────────────────────────────
|
||||||
|
# Map your HubSpot closed-lost stage IDs to pipeline names.
|
||||||
|
# Find these in HubSpot → Settings → Objects → Deals → Pipelines
|
||||||
|
CLOSED_LOST_STAGES = {
    # "stage_id_here": "Pipeline Name",
    # Example:
    # "1079884213": "Enterprise Pipeline",
    # "960522377": "ABM Pipeline",
}

# ─── HubSpot Properties to Fetch ────────────────────────────────────────────
DEAL_PROPERTIES = [
    "dealname", "amount", "closedate", "dealstage",
    "closed_lost_reason", "hs_closed_amount", "pipeline",
    "hubspot_owner_id", "notes_last_updated",
]
CONTACT_PROPERTIES = [
    "firstname", "lastname", "email", "jobtitle", "company",
    "hs_last_sales_activity_date", "notes_last_updated",
    "hs_email_last_open_date", "hs_email_last_click_date",
    "hs_analytics_last_visit_timestamp", "hs_analytics_num_page_views",
    "num_associated_deals", "recent_conversion_event_name",
]
COMPANY_PROPERTIES = [
    "name", "domain", "industry", "numberofemployees",
    "annualrevenue", "hs_last_sales_activity_date",
    "notes_last_updated", "num_associated_deals",
    "hs_analytics_last_visit_timestamp",
]

# ─── Time Decay Windows ─────────────────────────────────────────────────────
# (min_days, max_days, weight)
# Deals in the 60-90 day window get full weight; older deals decay.
DECAY_WINDOWS = [
    (60, 90, 1.0),      # Sweet spot — enough time has passed, still fresh
    (91, 180, 0.8),     # Good window
    (181, 365, 0.6),    # Getting stale but still viable
    (366, 540, 0.4),    # Long shot unless trigger present
    (541, 99999, 0.2),  # Only if engagement trigger detected
]

# ─── Loss Reason → Bonus Multiplier ─────────────────────────────────────────
# Deals lost to "timing" are more likely to convert than "bad fit".
LOSS_REASON_BONUS = {
    "timing": 1.3,
    "not ready": 1.25,
    "budget": 1.15,
    "price": 1.1,
    "internal": 1.05,
    "no decision": 1.0,
    "competitor": 0.7,
    "no need": 0.5,
    "bad fit": 0.3,
}

# Rate limit delay between HubSpot API calls (seconds)
SEARCH_DELAY = float(os.environ.get("HUBSPOT_RATE_DELAY", "1.5"))

# ─── Your Company Info (for email templates) ────────────────────────────────
YOUR_COMPANY_NAME = os.environ.get("YOUR_COMPANY_NAME", "Your Company")
YOUR_SENDER_NAME = os.environ.get("YOUR_SENDER_NAME", "Your Name")
YOUR_SENDER_TITLE = os.environ.get("YOUR_SENDER_TITLE", "CEO")
# A brief value prop to include in emails
YOUR_VALUE_PROP = os.environ.get("YOUR_VALUE_PROP",
    "We've built new capabilities since we last talked that I think you'd find interesting.")
|
||||||
|
|
||||||
|
|
||||||
|
# ─── Exclusion List ──────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
def load_exclusions() -> set:
    """Load excluded company names (lowercased) from the exclusions file."""
    if not EXCLUSIONS_FILE.exists():
        return set()
    try:
        data = json.loads(EXCLUSIONS_FILE.read_text())
        return {e["company"].lower() for e in data.get("excluded_deals", [])}
    except Exception as ex:
        print(f"⚠️ Could not load exclusions: {ex}", file=sys.stderr)
        return set()
|
||||||
|
|
||||||
|
|
||||||
|
def add_exclusion(company: str, deal_id: str = "", reason: str = "manually_excluded") -> None:
    """Append a company to the exclusions file."""
    data = {"excluded_deals": []}
    if EXCLUSIONS_FILE.exists():
        try:
            data = json.loads(EXCLUSIONS_FILE.read_text())
        except Exception:
            pass  # Unreadable file: fall back to an empty exclusion list
    # Tolerate a file that parsed but lacks the "excluded_deals" key
    excluded = data.setdefault("excluded_deals", [])
    existing = {e["company"].lower() for e in excluded}
    if company.lower() in existing:
        print(f"ℹ️ {company} is already excluded.")
        return
    excluded.append({
        "deal_id": deal_id or company.lower().replace(" ", "-"),
        "company": company,
        "reason": reason,
        "excluded_date": datetime.now().strftime("%Y-%m-%d"),
    })
    DATA_DIR.mkdir(parents=True, exist_ok=True)
    EXCLUSIONS_FILE.write_text(json.dumps(data, indent=2))
    print(f"✅ Added {company} to exclusion list")
|
||||||
|
|
||||||
|
|
||||||
|
# ─── HubSpot Client ─────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
class HubSpotClient:
    def __init__(self, token: str):
        self.token = token.strip()
        self.session = requests.Session()
        self.session.headers.update({
            "Authorization": f"Bearer {self.token}",
            "Content-Type": "application/json",
        })
        self._rate_wait = 0.12

    def _request(self, method, path, **kwargs):
        url = f"{HUBSPOT_BASE_URL}{path}"
        for attempt in range(4):
            resp = self.session.request(method, url, **kwargs)
            if resp.status_code == 429:
                wait = int(resp.headers.get("Retry-After", 2))
                print(f" ⏳ Rate limited, waiting {wait}s…", file=sys.stderr)
                time.sleep(wait)
                continue
            resp.raise_for_status()
            time.sleep(self._rate_wait)
            return resp.json()
        raise RuntimeError(f"Too many retries for {path}")

    def get(self, path, **kwargs):
        return self._request("GET", path, **kwargs)

    def post(self, path, **kwargs):
        return self._request("POST", path, **kwargs)

    def search_closed_lost_deals(self, since_date: str):
        """Search for all closed-lost deals across configured pipelines."""
        all_deals = []
        for stage_id in CLOSED_LOST_STAGES:
            all_deals.extend(self._search_by_stage(stage_id, since_date))
        return all_deals

    def _search_by_stage(self, stage_id, since_date):
        deals = []
        after = None
        while True:
            body = {
                "filterGroups": [{"filters": [
                    {"propertyName": "dealstage", "operator": "EQ", "value": stage_id},
                    {"propertyName": "closedate", "operator": "GTE", "value": since_date},
                ]}],
                "properties": DEAL_PROPERTIES,
                "sorts": [{"propertyName": "closedate", "direction": "DESCENDING"}],
                "limit": 100,
            }
            if after:
                body["after"] = after
            data = self.post("/crm/v3/objects/deals/search", json=body)
            deals.extend(data.get("results", []))
            paging = data.get("paging", {}).get("next")
            if paging:
                after = paging["after"]
            else:
                break
        return deals

    def get_deal_associations(self, deal_id, to_type="contacts"):
        try:
            data = self.get(f"/crm/v4/objects/deals/{deal_id}/associations/{to_type}")
            return data.get("results", [])
        except Exception:
            return []

    def get_contact(self, contact_id):
        try:
            return self.get(
                f"/crm/v3/objects/contacts/{contact_id}",
                params={"properties": ",".join(CONTACT_PROPERTIES)},
            )
        except Exception:
            return None

    def get_company_for_contact(self, contact_id):
        try:
            assocs = self.get(f"/crm/v4/objects/contacts/{contact_id}/associations/companies")
            results = assocs.get("results", [])
            if not results:
                return None
            company_id = results[0].get("toObjectId")
            return self.get(
                f"/crm/v3/objects/companies/{company_id}",
                params={"properties": ",".join(COMPANY_PROPERTIES)},
            )
        except Exception:
            return None
|
||||||
|
|
||||||
|
|
||||||
|
# ─── Helpers ─────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
def parse_ts(val):
    """Parse a timestamp value (epoch ms or ISO string) to datetime."""
    if not val:
        return None
    try:
        if isinstance(val, (int, float)) or (isinstance(val, str) and val.isdigit()):
            return datetime.fromtimestamp(int(val) / 1000, tz=timezone.utc)
        return datetime.fromisoformat(val.replace("Z", "+00:00"))
    except Exception:
        return None
|
||||||
|
|
||||||
|
|
||||||
|
# ─── Layer 1: Time Decay Scoring ────────────────────────────────────────────
|
||||||
|
|
||||||
|
def compute_time_decay_score(days_since_close: int, deal_value: float,
                             max_deal_value: float, loss_reason: str,
                             has_trigger: bool) -> dict:
    """Compute composite score (0-100) using additive formula:
    Time component: up to 35 pts (decay weight × 35)
    Value component: up to 30 pts (normalized value × 30)
    Reason component: up to 20 pts (loss reason bonus, capped at 1.0, × 20)
    Trigger component: up to 15 pts (engagement signals)
    """
    # Time decay weight
    time_weight = 0.0
    for lo, hi, weight in DECAY_WINDOWS:
        if lo <= days_since_close <= hi:
            time_weight = weight
            break

    # Too fresh (<60 days) — penalize (deal is still raw)
    if days_since_close < 60:
        time_weight = 0.2

    # Very old deals only score if trigger present
    if days_since_close > 540 and not has_trigger:
        time_weight = 0.0

    # Normalize deal value (0-1)
    value_norm = min(deal_value / max(max_deal_value, 1), 1.0)

    # Loss reason bonus
    reason_lower = (loss_reason or "").lower()
    reason_score = 0.5  # default for unknown reasons
    for keyword, bonus in LOSS_REASON_BONUS.items():
        if keyword in reason_lower:
            # Bonuses above 1.0 are capped so this component never exceeds 20 pts
            reason_score = min(bonus, 1.0)
            break

    # Trigger bonus
    trigger_pts = 15.0 if has_trigger else 0.0

    # Additive composite
    time_pts = time_weight * 35
    value_pts = value_norm * 30
    reason_pts = reason_score * 20

    composite = min(100, round(time_pts + value_pts + reason_pts + trigger_pts))

    return {
        "time_decay_weight": time_weight,
        "value_normalized": round(value_norm, 3),
        # Report the loss-reason score under its own key; "trigger_bonus"
        # previously carried this value by mistake.
        "loss_reason_score": round(reason_score, 2),
        "trigger_bonus": trigger_pts,
        "composite_score": composite,
    }
|
||||||
|
|
||||||
|
|
||||||
|
# ─── Email Generation ───────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
def _random_cta():
|
||||||
|
return random.choice([
|
||||||
|
"Worth revisiting?",
|
||||||
|
"Open to a quick catch-up?",
|
||||||
|
"Curious if the timing is better now?",
|
||||||
|
"Worth 15 min to compare notes?",
|
||||||
|
"Any interest in reconnecting?",
|
||||||
|
"Make sense to chat again?",
|
||||||
|
])
|
||||||
|
|
||||||
|
|
||||||
|
def _random_signoff():
|
||||||
|
return random.choice([
|
||||||
|
YOUR_SENDER_NAME,
|
||||||
|
f"{YOUR_SENDER_NAME}\n{YOUR_SENDER_TITLE}, {YOUR_COMPANY_NAME}",
|
||||||
|
f"- {YOUR_SENDER_NAME}",
|
||||||
|
])
|
||||||
|
|
||||||
|
|
||||||
|
# Revival email angles — rotated based on loss reason
|
||||||
|
REVIVAL_ANGLES = {
|
||||||
|
"timing": [
|
||||||
|
{
|
||||||
|
"subject": "{first}, checking back in",
|
||||||
|
"hook": "When we last talked, you mentioned the timing wasn't right. "
|
||||||
|
"It's been {months} months. Figured I'd check in rather than assume.",
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"subject": "been a while, {first}",
|
||||||
|
"hook": "It's been {months} months since we last connected on {company}. "
|
||||||
|
"A lot has probably changed on both sides.",
|
||||||
|
},
|
||||||
|
],
|
||||||
|
"competitor": [
|
||||||
|
{
|
||||||
|
"subject": "how's the current setup, {first}?",
|
||||||
|
"hook": "Last time, you went with another partner. Totally respect that. "
|
||||||
|
"Curious how it's going and whether there's room to compare notes.",
|
||||||
|
},
|
||||||
|
],
|
||||||
|
"budget": [
|
||||||
|
{
|
||||||
|
"subject": "new pricing options",
|
||||||
|
"hook": "Pricing was the sticking point last time. We've restructured since then. "
|
||||||
|
"We now offer performance-based models where you pay for results.",
|
||||||
|
},
|
||||||
|
],
|
||||||
|
"internal": [
|
||||||
|
{
|
||||||
|
"subject": "{first}, dust settled yet?",
|
||||||
|
"hook": "Last time, internal changes at {company} put things on hold. "
|
||||||
|
"Wanted to see if the original initiative is back on the table.",
|
||||||
|
},
|
||||||
|
],
|
||||||
|
"ghost": [
|
||||||
|
{
|
||||||
|
"subject": "{first}, one more try",
|
||||||
|
"hook": "We connected {months} months ago but lost touch. No hard feelings. "
|
||||||
|
"Just wanted to resurface in case the need is still there.",
|
||||||
|
},
|
||||||
|
],
|
||||||
|
"default": [
|
||||||
|
{
|
||||||
|
"subject": "quick update for {first}",
|
||||||
|
"hook": "We connected {months} months ago about growing {company}. "
|
||||||
|
"A lot has changed on our end since then.",
|
||||||
|
},
|
||||||
|
],
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def _categorize_loss_reason(loss_reason):
|
||||||
|
"""Map a free-text loss reason to a category for email angle selection."""
|
||||||
|
lr = (loss_reason or "").lower()
|
||||||
|
if any(w in lr for w in ["timing", "not ready", "circle back", "follow up"]):
|
||||||
|
return "timing"
|
||||||
|
if any(w in lr for w in ["competitor", "chose", "existing relationship"]):
|
||||||
|
return "competitor"
|
||||||
|
if any(w in lr for w in ["budget", "price", "pricing", "cost"]):
|
||||||
|
return "budget"
|
||||||
|
if any(w in lr for w in ["internal", "restructur", "reorg", "change"]):
|
||||||
|
return "internal"
|
||||||
|
if any(w in lr for w in ["ghost", "unresponsive", "no response"]):
|
||||||
|
return "ghost"
|
||||||
|
return "default"
|
||||||
|
|
||||||
|
|
||||||
|
def draft_revival_email(contact_name, company_name, deal_value, loss_reason,
                        days_since_close, contact_title=""):
    """Draft a personalized revival email based on loss reason category.

    deal_value and contact_title are accepted for call compatibility but are
    not used in the email copy.
    """
    # main() passes "Unknown" when no contact is found; avoid greeting "Hey Unknown".
    has_name = bool(contact_name) and contact_name != "Unknown"
    first = contact_name.split()[0] if has_name else "there"
    months = days_since_close // 30
    category = _categorize_loss_reason(loss_reason)

    angle = random.choice(REVIVAL_ANGLES.get(category, REVIVAL_ANGLES["default"]))
    subject = angle["subject"].format(first=first, company=company_name, months=months)
    hook = angle["hook"].format(first=first, company=company_name, months=months)

    cta = _random_cta()
    signoff = _random_signoff()

    body = f"Hey {first},\n\n{hook}\n\n{YOUR_VALUE_PROP}\n\n{cta}\n\n{signoff}"

    return {"subject": subject, "body": body}
|
||||||
|
|
||||||
|
|
||||||
|
def draft_replacement_email(replacement_name, company_name, original_contact):
|
||||||
|
"""Draft email to a replacement POC at the same company."""
|
||||||
|
first = replacement_name.split()[0] if replacement_name else "there"
|
||||||
|
orig_first = original_contact.split()[0] if original_contact else "your predecessor"
|
||||||
|
cta = _random_cta()
|
||||||
|
signoff = _random_signoff()
|
||||||
|
|
||||||
|
return {
|
||||||
|
"subject": f"picking up where {orig_first} left off at {company_name}",
|
||||||
|
"body": (
|
||||||
|
f"Hey {first},\n\n"
|
||||||
|
f"We were in conversation with {original_contact} about growth for "
|
||||||
|
f"{company_name} before the team change.\n\n"
|
||||||
|
f"{YOUR_VALUE_PROP}\n\n"
|
||||||
|
f"{cta}\n\n{signoff}"
|
||||||
|
),
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def draft_champion_email(champion_name, new_company, new_title, old_company):
|
||||||
|
"""Draft email to a champion who moved to a new company."""
|
||||||
|
first = champion_name.split()[0] if champion_name else "there"
|
||||||
|
cta = _random_cta()
|
||||||
|
signoff = _random_signoff()
|
||||||
|
|
||||||
|
return {
|
||||||
|
"subject": f"congrats on the move, {first}",
|
||||||
|
"body": (
|
||||||
|
f"Hey {first},\n\n"
|
||||||
|
f"Saw you moved to {new_company}. Congrats on the {new_title} role.\n\n"
|
||||||
|
f"We had a great conversation when you were at {old_company}. "
|
||||||
|
f"Now that you're settling in, I'd love to show you what we can do "
|
||||||
|
f"for {new_company}.\n\n"
|
||||||
|
f"{cta}\n\n{signoff}"
|
||||||
|
),
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
# ─── Main Pipeline ───────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
def main():
|
||||||
|
parser = argparse.ArgumentParser(
|
||||||
|
description="Deal Resurrector v2 — Time Decay + POC Expansion + Champion Tracking"
|
||||||
|
)
|
||||||
|
parser.add_argument("--top", type=int, default=10, help="Number of top deals (default: 10)")
|
||||||
|
parser.add_argument("--min-score", type=int, default=40, help="Minimum composite score (default: 40)")
|
||||||
|
parser.add_argument("--min-deal-value", type=float, default=5000, help="Min deal value (default: 5000)")
|
||||||
|
parser.add_argument("--months", type=int, default=24, help="Look back N months (default: 24)")
|
||||||
|
parser.add_argument("--include-champion", action="store_true", help="Enable Layer 3: Follow the Champion")
|
||||||
|
parser.add_argument("--dry-run", action="store_true", help="Print results, don't save")
|
||||||
|
parser.add_argument("--skip-search", action="store_true", help="Skip web searches (faster)")
|
||||||
|
parser.add_argument("--add-exclusion", metavar="COMPANY", help="Add a company to exclusion list and exit")
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
if args.add_exclusion:
|
||||||
|
add_exclusion(args.add_exclusion)
|
||||||
|
return
|
||||||
|
|
||||||
|
if not HUBSPOT_TOKEN:
|
||||||
|
print("❌ HUBSPOT_API_KEY environment variable not set.", file=sys.stderr)
|
||||||
|
print(" Set it: export HUBSPOT_API_KEY='your-token-here'", file=sys.stderr)
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
print("🔥 Deal Resurrector v2")
|
||||||
|
print(f" Layers: Time Decay + POC Expansion"
|
||||||
|
f"{ ' + Champion Tracking' if args.include_champion else ''}")
|
||||||
|
print(f" Top {args.top} | min score {args.min_score} | min value ${args.min_deal_value:,.0f}")
|
||||||
|
print()
|
||||||
|
|
||||||
|
excluded_companies = load_exclusions()
|
||||||
|
if excluded_companies:
|
||||||
|
print(f"🚫 Exclusion list: {len(excluded_companies)} companies will be skipped")
|
||||||
|
print()
|
||||||
|
|
||||||
|
client = HubSpotClient(HUBSPOT_TOKEN)
|
||||||
|
|
||||||
|
# Step 1: Pull closed-lost deals
|
||||||
|
since = (datetime.now(timezone.utc) - timedelta(days=args.months * 30)).strftime("%Y-%m-%d")
|
||||||
|
print(f"📥 Fetching closed-lost deals since {since}…")
|
||||||
|
deals = client.search_closed_lost_deals(since)
|
||||||
|
print(f" Found {len(deals)} closed-lost deals")
|
||||||
|
|
||||||
|
# Filter by value
|
||||||
|
filtered = []
|
||||||
|
for d in deals:
|
||||||
|
amt = float(d["properties"].get("amount") or 0)
|
||||||
|
if amt >= args.min_deal_value:
|
||||||
|
filtered.append(d)
|
||||||
|
print(f" {len(filtered)} deals above ${args.min_deal_value:,.0f}")
|
||||||
|
|
||||||
|
    # Filter exclusions (substring match on the lowercased deal name)
    if excluded_companies:
        pre = len(filtered)
        filtered = [
            d for d in filtered
            # dealname may be missing or null, so coalesce before lowercasing
            if not any(excl in (d["properties"].get("dealname") or "").lower()
                       for excl in excluded_companies)
        ]
        excluded_count = pre - len(filtered)
        if excluded_count:
            print(f" 🚫 {excluded_count} deal(s) excluded")
|
||||||
|
|
||||||
|
if not filtered:
|
||||||
|
print("No deals to process. Exiting.")
|
||||||
|
return
|
||||||
|
|
||||||
|
max_value = max(float(d["properties"].get("amount") or 0) for d in filtered)
|
||||||
|
now = datetime.now(timezone.utc)
|
||||||
|
|
||||||
|
# Step 2: Score and enrich
|
||||||
|
results = []
|
||||||
|
for i, deal in enumerate(filtered):
|
||||||
|
dp = deal["properties"]
|
||||||
|
deal_id = deal["id"]
|
||||||
|
deal_name = dp.get("dealname", "Unknown")
|
||||||
|
amount = float(dp.get("amount") or 0)
|
||||||
|
loss_reason = dp.get("closed_lost_reason") or "Unknown"
|
||||||
|
close_dt = parse_ts(dp.get("closedate"))
|
||||||
|
days_since = (now - close_dt).days if close_dt else 999
|
||||||
|
|
||||||
|
print(f" [{i+1}/{len(filtered)}] {deal_name} (${amount:,.0f}, {days_since}d ago)…",
|
||||||
|
end="", flush=True)
|
||||||
|
|
||||||
|
        # Get primary contact
        assocs = client.get_deal_associations(deal_id, "contacts")
        contact_name = "Unknown"
        contact_email = ""
        contact_title = ""
        company_name = deal_name
        contact_data = None

        if assocs:
            cid = str(assocs[0].get("toObjectId"))
            contact_data = client.get_contact(cid)
            if contact_data:
                cp = contact_data.get("properties", {})
                fn = cp.get("firstname") or ""
                ln = cp.get("lastname") or ""
                contact_name = f"{fn} {ln}".strip() or "Unknown"
                # The API may return null for empty properties, so coalesce to "".
                contact_email = cp.get("email") or ""
                contact_title = cp.get("jobtitle") or ""
                company_name = cp.get("company") or company_name

            company_data = client.get_company_for_contact(cid)
            if company_data:
                company_name = company_data.get("properties", {}).get("name") or company_name

        # Detect engagement triggers (parse each timestamp once)
        triggers = []
        if contact_data and contact_data.get("properties"):
            cp = contact_data["properties"]
            last_open = parse_ts(cp.get("hs_email_last_open_date"))
            if last_open and (now - last_open).days < 60:
                triggers.append("recent_email_open")
            last_visit = parse_ts(cp.get("hs_analytics_last_visit_timestamp"))
            if last_visit and (now - last_visit).days < 90:
                triggers.append("recent_site_visit")

        has_trigger = len(triggers) > 0
|
||||||
|
|
||||||
|
# Layer 1: Time Decay Score
|
||||||
|
decay = compute_time_decay_score(days_since, amount, max_value, loss_reason, has_trigger)
|
||||||
|
composite = decay["composite_score"]
|
||||||
|
|
||||||
|
if composite < args.min_score:
|
||||||
|
print(f" → score {composite} (skip)")
|
||||||
|
continue
|
||||||
|
|
||||||
|
print(f" → score {composite}")
|
||||||
|
|
||||||
|
# Generate revival email
|
||||||
|
original_email = draft_revival_email(
|
||||||
|
contact_name, company_name, amount, loss_reason, days_since, contact_title
|
||||||
|
)
|
||||||
|
|
||||||
|
# Determine revival type
|
||||||
|
revival_type = "trigger" if has_trigger else "time_decay"
|
||||||
|
|
||||||
|
entry = {
|
||||||
|
"deal_id": deal_id,
|
||||||
|
"company": company_name,
|
||||||
|
"original_contact": {
|
||||||
|
"name": contact_name,
|
||||||
|
"email": contact_email,
|
||||||
|
"title": contact_title,
|
||||||
|
},
|
||||||
|
"deal_value": amount,
|
||||||
|
"days_since_close": days_since,
|
||||||
|
"close_date": dp.get("closedate", ""),
|
||||||
|
"loss_reason": loss_reason,
|
||||||
|
"pipeline": CLOSED_LOST_STAGES.get(dp.get("dealstage"), "Unknown"),
|
||||||
|
"time_decay_score": decay["time_decay_weight"],
|
||||||
|
"composite_score": composite,
|
||||||
|
"poc_status": "unknown",
|
||||||
|
"triggers": triggers,
|
||||||
|
"revival_emails": {
|
||||||
|
"original": original_email,
|
||||||
|
"replacement": None,
|
||||||
|
"champion": None,
|
||||||
|
},
|
||||||
|
"revival_type": revival_type,
|
||||||
|
}
|
||||||
|
results.append(entry)
|
||||||
|
|
||||||
|
# Sort by composite score
|
||||||
|
results.sort(key=lambda x: x["composite_score"], reverse=True)
|
||||||
|
top_results = results[:args.top]
|
||||||
|
|
||||||
|
# Output
|
||||||
|
output = {
|
||||||
|
"generated_at": now.isoformat(),
|
||||||
|
"version": "v2",
|
||||||
|
"total_closed_lost": len(deals),
|
||||||
|
"above_min_value": len(filtered),
|
||||||
|
"scored_above_threshold": len(results),
|
||||||
|
"returned": len(top_results),
|
||||||
|
"parameters": {
|
||||||
|
"months": args.months,
|
||||||
|
"min_score": args.min_score,
|
||||||
|
"min_deal_value": args.min_deal_value,
|
||||||
|
"top": args.top,
|
||||||
|
"include_champion": args.include_champion,
|
||||||
|
},
|
||||||
|
"deals": top_results,
|
||||||
|
}
|
||||||
|
|
||||||
|
# Print summary
|
||||||
|
print(f"\n{'='*70}")
|
||||||
|
print(f"🔥 TOP {len(top_results)} REVIVAL OPPORTUNITIES")
|
||||||
|
print(f"{'='*70}")
|
||||||
|
for i, d in enumerate(top_results, 1):
|
||||||
|
print(f"\n#{i} | Score: {d['composite_score']}/100 | {d['company']}")
|
||||||
|
print(f" Deal Value: ${d['deal_value']:,.0f} | Days Since Close: {d['days_since_close']}")
|
||||||
|
print(f" Contact: {d['original_contact']['name']} ({d['original_contact']['email']})")
|
||||||
|
print(f" Title: {d['original_contact']['title']}")
|
||||||
|
print(f" Loss Reason: {d['loss_reason']}")
|
||||||
|
print(f" Revival Type: {d['revival_type']}")
|
||||||
|
print(f" Triggers: {', '.join(d['triggers']) or 'none'}")
|
||||||
|
|
||||||
|
if not args.dry_run:
|
||||||
|
DATA_DIR.mkdir(parents=True, exist_ok=True)
|
||||||
|
OUTPUT_FILE.write_text(json.dumps(output, indent=2, default=str))
|
||||||
|
print(f"\n📁 Saved to {OUTPUT_FILE}")
|
||||||
|
else:
|
||||||
|
print(f"\n🏃 Dry run — not saving.")
|
||||||
|
|
||||||
|
print(f"\n{'='*70}")
|
||||||
|
print(f"✅ Deal Resurrector v2 complete. {len(top_results)} deals ready for review.")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
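The additive score in compute_time_decay_score above can be checked by hand with a small standalone sketch. The window and bonus tables below are hypothetical stand-ins: the real DECAY_WINDOWS and LOSS_REASON_BONUS are defined elsewhere in deal_resurrector.py and may use different values. Only the 35/30/20/15 weighting and the under-60-day and over-540-day overrides are taken from the code shown.

```python
# Hypothetical constants for illustration only; the real DECAY_WINDOWS and
# LOSS_REASON_BONUS in deal_resurrector.py may differ.
EXAMPLE_DECAY_WINDOWS = [(60, 180, 1.0), (181, 365, 0.7), (366, 540, 0.4)]
EXAMPLE_LOSS_REASON_BONUS = {"timing": 1.0, "budget": 0.8, "competitor": 0.6}


def example_score(days_since_close, deal_value, max_deal_value, loss_reason, has_trigger):
    """Mirror the additive formula: 35 (time) + 30 (value) + 20 (reason) + 15 (trigger)."""
    time_weight = 0.0
    for lo, hi, weight in EXAMPLE_DECAY_WINDOWS:
        if lo <= days_since_close <= hi:
            time_weight = weight
            break
    if days_since_close < 60:
        time_weight = 0.2  # too fresh
    if days_since_close > 540 and not has_trigger:
        time_weight = 0.0  # very old and no engagement signal

    value_norm = min(deal_value / max(max_deal_value, 1), 1.0)

    reason_score = 0.5  # default for unknown reasons
    for keyword, bonus in EXAMPLE_LOSS_REASON_BONUS.items():
        if keyword in (loss_reason or "").lower():
            reason_score = min(bonus, 1.0)
            break

    trigger_pts = 15.0 if has_trigger else 0.0
    total = time_weight * 35 + value_norm * 30 + reason_score * 20 + trigger_pts
    return min(100, round(total))


if __name__ == "__main__":
    # 35 (time) + 15 (half the max deal value) + 20 (timing) + 15 (trigger)
    print(example_score(120, 25_000, 50_000, "Bad timing", True))
```

Under these assumed tables, a deal at half the maximum value, lost on timing 120 days ago with a recent engagement signal, lands well above the default --min-score of 40, while a deal older than 540 days with no trigger keeps only its value and reason points.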
287
sales-pipeline/icp_learning_analyzer.py
Normal file
@@ -0,0 +1,287 @@
|
||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
ICP Learning Analyzer — learns from your prospect approve/reject decisions.
|
||||||
|
|
||||||
|
Reads prospect approval/rejection history from a PostgreSQL database,
|
||||||
|
analyzes patterns by source type (cold, trigger, warm, revival), and
|
||||||
|
outputs recommended ICP filter changes.
|
||||||
|
|
||||||
|
Your ICP evolves from your own data instead of guesswork.
|
||||||
|
|
||||||
|
Analyzes:
|
||||||
|
- Industry patterns (which industries convert vs. get rejected)
|
||||||
|
- Company size sweet spots (employee count ranges that win)
|
||||||
|
- Title patterns (which seniority levels get approved)
|
||||||
|
- Revenue ranges (what deal sizes work)
|
||||||
|
- Approval rates per source type
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python3 icp_learning_analyzer.py
|
||||||
|
python3 icp_learning_analyzer.py --config data/icp-config.json
|
||||||
|
|
||||||
|
Requires:
|
||||||
|
- DATABASE_URL environment variable (PostgreSQL connection string)
|
||||||
|
- psycopg2-binary package
|
||||||
|
- A prospects table with status, source, and company/contact joins
|
||||||
|
|
||||||
|
Configuration:
|
||||||
|
Create data/icp-config.json with source_type_mapping and min_sample_size.
|
||||||
|
See .env.example and data/icp-config.example.json for templates.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
from collections import Counter, defaultdict
|
||||||
|
from datetime import datetime, timezone
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
logging.basicConfig(level=logging.INFO, format="%(asctime)s [ICP-Analyzer] %(message)s")
|
||||||
|
log = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
# ─── Configuration ───────────────────────────────────────────────────────────
|
||||||
|
BASE_DIR = Path(os.environ.get("BASE_DIR", Path(__file__).resolve().parent))
|
||||||
|
DATA_DIR = BASE_DIR / "data"
|
||||||
|
OUTPUT_PATH = DATA_DIR / "icp-recommendations.json"
|
||||||
|
|
||||||
|
# Database connection string
|
||||||
|
DATABASE_URL = os.environ.get("DATABASE_URL", "")
|
||||||
|
|
||||||
|
# Default ICP config (override with --config flag)
|
||||||
|
DEFAULT_CONFIG = {
|
||||||
|
# Maps your prospect source names to analysis categories
|
||||||
|
"source_type_mapping": {
|
||||||
|
"cold_outbound": "cold",
|
||||||
|
"trigger_prospector": "trigger",
|
||||||
|
"website_visitor": "warm",
|
||||||
|
"deal_revival": "revival",
|
||||||
|
"referral": "warm",
|
||||||
|
"inbound": "warm",
|
||||||
|
},
|
||||||
|
# Minimum approved samples before generating recommendations
|
||||||
|
"min_sample_size": 30,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def load_config(config_path=None):
|
||||||
|
"""Load ICP config from file or use defaults."""
|
||||||
|
if config_path and Path(config_path).exists():
|
||||||
|
with open(config_path) as f:
|
||||||
|
return json.load(f)
|
||||||
|
default_path = DATA_DIR / "icp-config.json"
|
||||||
|
if default_path.exists():
|
||||||
|
with open(default_path) as f:
|
||||||
|
return json.load(f)
|
||||||
|
log.info("No config file found, using defaults")
|
||||||
|
return DEFAULT_CONFIG
|
||||||
|
|
||||||
|
|
||||||
|
def fetch_prospects():
|
||||||
|
"""Fetch approved/rejected prospects from database.
|
||||||
|
|
||||||
|
Expected schema:
|
||||||
|
prospects: source, status, signal, conviction_score, company_id, contact_id
|
||||||
|
companies: id, industry, employees, revenue_range
|
||||||
|
contacts: id, title
|
||||||
|
|
||||||
|
Status values: approved, skipped, sent, opened, replied, meeting, won, lost
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
import psycopg2
|
||||||
|
except ImportError:
|
||||||
|
log.error("psycopg2 not installed. Run: pip install psycopg2-binary")
|
||||||
|
return []
|
||||||
|
|
||||||
|
if not DATABASE_URL:
|
||||||
|
log.error("DATABASE_URL not set. Set it in your environment or .env file.")
|
||||||
|
return []
|
||||||
|
|
||||||
|
    conn = None
    try:
        conn = psycopg2.connect(DATABASE_URL)
        with conn.cursor() as cur:
            cur.execute("""
                SELECT p.source, p.status, p.signal, p.conviction_score,
                       c.industry, c.employees, c.revenue_range,
                       ct.title
                FROM prospects p
                LEFT JOIN companies c ON p.company_id = c.id
                LEFT JOIN contacts ct ON p.contact_id = ct.id
                WHERE p.status IN ('approved', 'skipped', 'sent', 'opened',
                                   'replied', 'meeting', 'won', 'lost')
            """)
            cols = [d[0] for d in cur.description]
            rows = [dict(zip(cols, row)) for row in cur.fetchall()]
        log.info(f"Fetched {len(rows)} prospect records")
        return rows
    except Exception as e:
        log.error(f"Database query failed: {e}")
        return []
    finally:
        # Close the connection even when the query fails.
        if conn is not None:
            conn.close()
|
||||||
|
|
||||||
|
|
||||||
|
def classify_status(status):
|
||||||
|
"""Map database status to binary approved/rejected for analysis."""
|
||||||
|
approved_statuses = {"approved", "sent", "opened", "replied", "meeting", "won"}
|
||||||
|
return "approved" if status in approved_statuses else "rejected"
|
||||||
|
|
||||||
|
|
||||||
|
# Suffix multipliers for revenue strings such as "10M" or "1.5B"
_REVENUE_MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_revenue(revenue_range):
    """Parse revenue_range string to midpoint integer.

    Handles formats like: "$10M-$50M", "10M-50M", "$5M - $10M", "1.5M-3M"
    Returns None if unparseable.
    """
    if not revenue_range:
        return None
    cleaned = str(revenue_range).replace("$", "").replace(",", "").upper()
    nums = []
    for part in cleaned.split("-"):
        part = part.strip()
        if not part:
            continue
        # Apply the suffix as a multiplier so decimals like "1.5M" parse correctly
        multiplier = _REVENUE_MULTIPLIERS.get(part[-1], 1)
        digits = part[:-1] if part[-1] in _REVENUE_MULTIPLIERS else part
        try:
            nums.append(int(float(digits) * multiplier))
        except ValueError:
            return None
    return sum(nums) // len(nums) if nums else None
|
||||||
|
|
||||||
|
|
||||||
|
def analyze_source_group(prospects, min_sample):
|
||||||
|
"""Analyze a group of prospects and return filter recommendations.
|
||||||
|
|
||||||
|
Returns recommendations for:
|
||||||
|
- industries: which to target, which to exclude
|
||||||
|
- employees: min/max employee count range
|
||||||
|
- titles: top-performing job titles
|
||||||
|
- revenue: min/max revenue range
|
||||||
|
- confidence: overall approval rate
|
||||||
|
"""
|
||||||
|
approved = [p for p in prospects if classify_status(p["status"]) == "approved"]
|
||||||
|
rejected = [p for p in prospects if classify_status(p["status"]) == "rejected"]
|
||||||
|
|
||||||
|
if len(approved) < min_sample:
|
||||||
|
return {
|
||||||
|
"status": "insufficient_data",
|
||||||
|
"sample_size": len(approved),
|
||||||
|
"min_required": min_sample,
|
||||||
|
"filters": {},
|
||||||
|
}
|
||||||
|
|
||||||
|
total_approved = len(approved)
|
||||||
|
total_rejected = max(len(rejected), 1)
|
||||||
|
|
||||||
|
# ── Industry Analysis ────────────────────────────────────────────────
|
||||||
|
approved_industries = Counter(p["industry"] for p in approved if p.get("industry"))
|
||||||
|
rejected_industries = Counter(p["industry"] for p in rejected if p.get("industry"))
|
||||||
|
|
||||||
|
    # Industries with at least 10% of approvals: recommend targeting
    rec_industries = [ind for ind, cnt in approved_industries.most_common(10)
                      if cnt / total_approved >= 0.10]
    # Industries with at least 30% of rejections and under 5% of approvals: recommend excluding
    exclude_industries = [ind for ind, cnt in rejected_industries.most_common()
                          if cnt / total_rejected >= 0.30
                          and approved_industries.get(ind, 0) / total_approved < 0.05]
|
||||||
|
|
||||||
|
# ── Employee Count Analysis ──────────────────────────────────────────
|
||||||
|
approved_emp = sorted([p["employees"] for p in approved if p.get("employees")])
|
||||||
|
emp_filters = {}
|
||||||
|
if approved_emp:
|
||||||
|
p10 = approved_emp[max(0, len(approved_emp) // 10)]
|
||||||
|
p90 = approved_emp[min(len(approved_emp) - 1, len(approved_emp) * 9 // 10)]
|
||||||
|
emp_filters["min_employees"] = p10
|
||||||
|
emp_filters["max_employees"] = p90
|
||||||
|
|
||||||
|
# ── Title Analysis ───────────────────────────────────────────────────
|
||||||
|
approved_titles = Counter(p["title"] for p in approved if p.get("title"))
|
||||||
|
top_titles = [t for t, _ in approved_titles.most_common(8)]
|
||||||
|
|
||||||
|
# ── Revenue Analysis ─────────────────────────────────────────────────
|
||||||
|
approved_rev = [parse_revenue(p.get("revenue_range")) for p in approved]
|
||||||
|
approved_rev = sorted([r for r in approved_rev if r is not None])
|
||||||
|
rev_filters = {}
|
||||||
|
if approved_rev:
|
||||||
|
rev_filters["revenue_min"] = approved_rev[max(0, len(approved_rev) // 10)]
|
||||||
|
rev_filters["revenue_max"] = approved_rev[min(len(approved_rev) - 1,
|
||||||
|
len(approved_rev) * 9 // 10)]
|
||||||
|
|
||||||
|
# ── Compile Filters ──────────────────────────────────────────────────
|
||||||
|
approval_rate = total_approved / (total_approved + len(rejected))
|
||||||
|
filters = {**emp_filters, **rev_filters}
|
||||||
|
if rec_industries:
|
||||||
|
filters["industries"] = rec_industries
|
||||||
|
if exclude_industries:
|
||||||
|
filters["exclude_industries"] = exclude_industries
|
||||||
|
if top_titles:
|
||||||
|
filters["titles"] = top_titles
|
||||||
|
|
||||||
|
return {
|
||||||
|
"status": "ready",
|
||||||
|
"filters": filters,
|
||||||
|
"confidence": round(approval_rate, 3),
|
||||||
|
"sample_size": total_approved,
|
||||||
|
"rejected_count": len(rejected),
|
||||||
|
"approval_rate": round(approval_rate, 3),
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
# ─── Main ────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
def main():
|
||||||
|
parser = argparse.ArgumentParser(description="ICP Learning Analyzer")
|
||||||
|
parser.add_argument("--config", help="Path to icp-config.json")
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
config = load_config(args.config)
|
||||||
|
source_mapping = config.get("source_type_mapping", DEFAULT_CONFIG["source_type_mapping"])
|
||||||
|
min_sample = config.get("min_sample_size", DEFAULT_CONFIG["min_sample_size"])
|
||||||
|
|
||||||
|
prospects = fetch_prospects()
|
||||||
|
|
||||||
|
# Group by mapped source type
|
||||||
|
grouped = defaultdict(list)
|
||||||
|
for p in prospects:
|
||||||
|
mapped = source_mapping.get(p.get("source", ""), "other")
|
||||||
|
grouped[mapped].append(p)
|
||||||
|
|
||||||
|
recommendations = {}
|
||||||
|
for source_type in ["cold", "trigger", "warm", "revival"]:
|
||||||
|
group = grouped.get(source_type, [])
|
||||||
|
log.info(f"[{source_type}] {len(group)} total prospects")
|
||||||
|
recommendations[source_type] = analyze_source_group(group, min_sample)
|
||||||
|
|
||||||
|
output = {
|
||||||
|
"generated_at": datetime.now(timezone.utc).isoformat(),
|
||||||
|
"status": "complete" if prospects else "no_data",
|
||||||
|
"total_prospects_analyzed": len(prospects),
|
||||||
|
"recommendations": recommendations,
|
||||||
|
}
|
||||||
|
|
||||||
|
DATA_DIR.mkdir(parents=True, exist_ok=True)
|
||||||
|
with open(OUTPUT_PATH, "w") as f:
|
||||||
|
json.dump(output, f, indent=2)
|
||||||
|
|
||||||
|
log.info(f"Wrote recommendations to {OUTPUT_PATH}")
|
||||||
|
|
||||||
|
# Summary
|
||||||
|
print(f"\n📊 ICP Learning Analyzer Results")
|
||||||
|
print(f" Total prospects analyzed: {len(prospects)}")
|
||||||
|
print(f" {'─'*40}")
|
||||||
|
for src, rec in recommendations.items():
|
||||||
|
status = rec.get("status", "unknown")
|
||||||
|
sample = rec.get("sample_size", 0)
|
||||||
|
rate = rec.get("approval_rate", 0)
|
||||||
|
print(f" {src:10s}: {status:20s} (n={sample}, approval={rate:.0%})")
|
||||||
|
if rec.get("filters"):
|
||||||
|
f = rec["filters"]
|
||||||
|
if f.get("industries"):
|
||||||
|
print(f" → Target: {', '.join(f['industries'][:5])}")
|
||||||
|
if f.get("exclude_industries"):
|
||||||
|
print(f" → Exclude: {', '.join(f['exclude_industries'][:3])}")
|
||||||
|
if f.get("min_employees"):
|
||||||
|
print(f" → Employees: {f['min_employees']}-{f.get('max_employees', '?')}")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
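The employee and revenue ranges in analyze_source_group above come from index-based 10th and 90th percentiles over the approved prospects. A minimal standalone sketch of that trimming follows; percentile_bounds is a hypothetical helper name that does not exist in icp_learning_analyzer.py, and it uses the same integer index arithmetic as the analyzer.

```python
def percentile_bounds(values):
    """Return (p10, p90) picked by index from the sorted, non-empty values.

    Mirrors the analyzer's arithmetic: index n // 10 for the lower bound and
    n * 9 // 10 (capped at the last index) for the upper bound.
    """
    ordered = sorted(v for v in values if v)  # drop None and 0, as the analyzer does
    if not ordered:
        return None
    n = len(ordered)
    lo_idx = max(0, n // 10)
    hi_idx = min(n - 1, n * 9 // 10)
    return ordered[lo_idx], ordered[hi_idx]


if __name__ == "__main__":
    # 20 approved prospects with employee counts 1..20
    print(percentile_bounds(range(1, 21)))
    # With only 10 samples the upper index is the last element, so an
    # outlier such as 5000 is not trimmed.
    print(percentile_bounds([5, 12, 20, 35, 50, 80, 120, 200, 450, 5000]))
```

With small groups the upper bound can still be the single largest value, which is one reason the analyzer's min_sample_size gate (30 by default) matters before the ranges are trusted.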
410
sales-pipeline/rb2b_instantly_router.py
Normal file
@@ -0,0 +1,410 @@
|
||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
RB2B → Instantly Router
|
||||||
|
|
||||||
|
Full pipeline: receives RB2B webhook data, runs suppression pipeline,
|
||||||
|
classifies visitor type, routes to correct Instantly campaign via API.
|
||||||
|
|
||||||
|
Can run as:
|
||||||
|
1. HTTP server (direct webhook endpoint)
|
||||||
|
2. Stdin processor (for testing / batch processing)
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python3 rb2b_instantly_router.py --serve --port 4100
|
||||||
|
echo '{"email":"..."}' | python3 rb2b_instantly_router.py
|
||||||
|
echo '{"email":"..."}' | python3 rb2b_instantly_router.py --dry-run
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import os
|
||||||
|
import re
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
from datetime import datetime, timezone
|
||||||
|
from http.server import HTTPServer, BaseHTTPRequestHandler
|
||||||
|
from pathlib import Path
|
||||||
|
from urllib.parse import urlparse
|
||||||
|
|
||||||
|
LOG = logging.getLogger("rb2b-router")
|
||||||
|
|
||||||
|
# ─── Configuration ───────────────────────────────────────────────────────────
|
||||||
|
BASE_DIR = Path(os.environ.get("BASE_DIR", Path(__file__).resolve().parent))
|
||||||
|
|
||||||
|
# Import the suppression pipeline (lives in same directory)
|
||||||
|
sys.path.insert(0, str(BASE_DIR))
|
||||||
|
from rb2b_suppression_pipeline import run_suppression_pipeline, record_enrollment
|
||||||
|
|
||||||
|
# Instantly API key — set via environment variable
|
||||||
|
INSTANTLY_API_KEY = os.environ.get("INSTANTLY_API_KEY", "")
|
||||||
|
|
||||||
|
# Campaign configuration file — maps campaign names to Instantly campaign UUIDs
|
||||||
|
# Format: {"campaigns": {"Campaign-Name": "uuid-here", ...}}
|
||||||
|
CAMPAIGNS_FILE = BASE_DIR / "data" / "campaigns.json"
|
||||||
|
|
||||||
|
|
||||||
|
def _load_campaigns():
|
||||||
|
"""Load campaign name → UUID mapping from config file."""
|
||||||
|
try:
|
||||||
|
data = json.loads(CAMPAIGNS_FILE.read_text())
|
||||||
|
return data.get("campaigns", {})
|
||||||
|
except Exception:
|
||||||
|
return {}
|
||||||
|
|
||||||
|
|
||||||
|
CAMPAIGNS = _load_campaigns()
|
||||||
|
|
||||||
|
# ─── Agency Detection ────────────────────────────────────────────────────────
# Keywords that signal the visitor works at a marketing agency.
# Useful for routing agency visitors to agency-specific campaigns
# (e.g., partnership offers vs. client acquisition).

AGENCY_KEYWORDS_COMPANY = [
    "agency", "digital", "media", "creative", "studio", "consultancy",
    "marketing agency", "seo agency", "advertising",
]
AGENCY_KEYWORDS_TITLE = ["agency", "consultant", "freelance"]
AGENCY_INDUSTRIES = ["marketing and advertising", "advertising services"]

# ─── Seniority Tiers (for company-level dedup) ──────────────────────────────
# Lower rank = more senior. When two people from the same company visit,
# keep the more senior one.
SENIORITY_ORDER = {
    "founder": 1, "ceo": 1, "co-founder": 1, "president": 1,
    "cmo": 2, "cto": 2, "coo": 2, "cfo": 2, "chief": 2,
    "svp": 3, "evp": 3, "senior vice president": 3,
    "vp": 4, "vice president": 4,
    "director": 5, "senior director": 5, "managing director": 5,
    "head of": 6,
    "manager": 7, "senior manager": 7,
}

# ─── Intent Scoring ─────────────────────────────────────────────────────────
# Maps URL path patterns to intent scores. Customize for your site.
PAGE_INTENT_SCORES = {
    "pricing": 90, "plans": 90, "contact": 85, "demo": 85,
    "get-started": 85, "free-consultation": 85, "request-demo": 85,
    "case-study": 70, "case-studies": 70, "results": 70,
    "services": 65, "solutions": 65, "about": 60,
    "blog": 30, "podcast": 25,
}

# Visitors below this score are skipped (blog-only readers, etc.)
MIN_INTENT_SCORE = int(os.environ.get("MIN_INTENT_SCORE", "50"))


def score_intent(pages):
    """Score visitor intent from pages visited. Returns 0-100."""
    if not pages:
        return 30  # default low
    if isinstance(pages, str):
        pages = [pages]
    max_score = 20  # baseline when no pattern matches
    for page in pages:
        path = page.lower().strip("/")
        # Patterns are matched as substrings anywhere in the page string,
        # so "/blog/pricing-tips" also scores as a "pricing" page.
        for pattern, score in PAGE_INTENT_SCORES.items():
            if pattern in path:
                max_score = max(max_score, score)
    return max_score


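# Illustration only: a self-contained sketch of how the highest-matching
# pattern decides the score. The three-entry table is a trimmed, hypothetical
# copy of PAGE_INTENT_SCORES; the scoring loop follows score_intent() above.

```python
# Trimmed copy of the score table, for illustration only
PAGE_INTENT_SCORES = {"pricing": 90, "case-study": 70, "blog": 30}


def score_intent(pages):
    """Return the highest score of any pattern found in any visited page."""
    if not pages:
        return 30
    if isinstance(pages, str):
        pages = [pages]
    max_score = 20
    for page in pages:
        path = page.lower().strip("/")
        for pattern, score in PAGE_INTENT_SCORES.items():
            if pattern in path:
                max_score = max(max_score, score)
    return max_score


blog_only = score_intent(["/blog/seo-tips"])
blog_then_pricing = score_intent(["/blog/seo-tips", "/pricing/"])
unknown_page = score_intent(["/careers"])
# Substring matching: a blog post with "pricing" in its slug counts as pricing
blog_about_pricing = score_intent(["/blog/pricing-strategies"])
```

# With the default threshold of 50, the blog-only and unknown-page visitors
# would be skipped, while the other two would continue through the pipeline.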
def is_agency(visitor):
    """Classify visitor as agency or non-agency based on multiple signals."""
    signals = 0

    company = (visitor.get("company_name") or visitor.get("company") or "").lower()
    title = (visitor.get("job_title") or visitor.get("title") or "").lower()
    industry = (visitor.get("industry") or "").lower()
    size = visitor.get("company_size") or visitor.get("employees") or 0
    if isinstance(size, str):
        # Ranges such as "51-200" resolve to the upper bound. Thousands
        # separators are removed first so "1,000+" parses as 1000, not 0.
        nums = re.findall(r'\d+', size.replace(",", ""))
        size = int(nums[-1]) if nums else 0

    for kw in AGENCY_KEYWORDS_COMPANY:
        if kw in company:
            signals += 1
            break

    for kw in AGENCY_KEYWORDS_TITLE:
        if kw in title:
            signals += 1
            break

    if industry in AGENCY_INDUSTRIES:
        signals += 1

    # Note: an unknown size is 0 and therefore counts as "small" here.
    if size < 200 and ("marketing" in industry or "advertising" in industry):
        signals += 1

    # Require at least 2 signals to classify as agency
    return signals >= 2


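# Illustration only: a sketch of turning a company-size field into an integer,
# assuming the field may arrive as an int, as a range string such as "51-200",
# or with thousands separators such as "1,000+". The helper name
# parse_company_size is hypothetical. Without removing commas first, the last
# digit group of "1,000+" is "000", which would parse as 0.

```python
import re


def parse_company_size(size):
    """Return an integer headcount; ranges resolve to their upper bound."""
    if isinstance(size, str):
        nums = re.findall(r"\d+", size.replace(",", ""))
        return int(nums[-1]) if nums else 0
    return size or 0


range_size = parse_company_size("51-200")
large_size = parse_company_size("1,000+")
unknown_size = parse_company_size("unknown")
```

# Unknown sizes come back as 0, so any "size < N" test treats them as small.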
def detect_source_site(visitor):
    """Determine which of your sites the visitor came from.

    Customize the domain checks for your own properties.
    """
    pages = visitor.get("pages_visited") or visitor.get("page_views") or visitor.get("source_url") or ""
    if isinstance(pages, list):
        pages = " ".join(pages)
    pages = pages.lower()

    # Add your site domains here
    # if "product-b.com" in pages:
    #     return "product-b.com"
    # elif "product-a.com" in pages:
    #     return "product-a.com"

    return os.environ.get("DEFAULT_SOURCE_SITE", "your-site.com")


def route_to_campaign(source_site, agency):
    """Determine the correct Instantly campaign based on source site + agency classification.

    Customize campaign names to match your CAMPAIGNS_FILE config.
    Returns a campaign name string that maps to a UUID in campaigns.json.
    """
    # Example routing logic (customize for your campaigns). The default only
    # branches on the agency flag; source_site is available for per-site rules.
    if agency:
        return os.environ.get("CAMPAIGN_AGENCY", "Agency-Default")
    return os.environ.get("CAMPAIGN_GENERAL", "General-Default")


def get_seniority_rank(title):
    """Get seniority rank (lower = more senior). Returns 99 for unknown."""
    title_lower = (title or "").lower()
    # Match whole words only: a plain substring test finds "cto" inside
    # "director" and "coo" inside "coordinator". When several keywords match,
    # the most senior (lowest) rank wins.
    ranks = [
        rank for keyword, rank in SENIORITY_ORDER.items()
        if re.search(rf"\b{re.escape(keyword)}\b", title_lower)
    ]
    return min(ranks, default=99)


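# Illustration only: a sketch contrasting substring matching with whole-word
# matching for short title keywords. The seven-entry table is a trimmed,
# hypothetical copy of SENIORITY_ORDER, and both helper names are made up.

```python
import re

SENIORITY = {"ceo": 1, "cto": 2, "coo": 2, "svp": 3, "vp": 4, "director": 5, "manager": 7}


def rank_substring(title):
    """First keyword found anywhere in the title, in table order."""
    title = title.lower()
    for keyword, rank in SENIORITY.items():
        if keyword in title:
            return rank
    return 99


def rank_whole_word(title):
    """Most senior keyword that appears as a whole word in the title."""
    title = title.lower()
    ranks = [
        rank for keyword, rank in SENIORITY.items()
        if re.search(rf"\b{re.escape(keyword)}\b", title)
    ]
    return min(ranks, default=99)


director_substring = rank_substring("Director of Marketing")    # hits "cto" in "director"
director_word = rank_whole_word("Director of Marketing")
coordinator_substring = rank_substring("Marketing Coordinator")  # hits "coo" in "coordinator"
coordinator_word = rank_whole_word("Marketing Coordinator")
```

# Whole-word matching trades a few misses (for example plural forms) for far
# fewer false positives on three-letter acronyms.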
def ensure_campaign_active(campaign_name):
    """Check if campaign is active; if paused, activate it via Instantly API."""
    campaign_id = CAMPAIGNS.get(campaign_name)
    if not campaign_id or not INSTANTLY_API_KEY:
        return
    try:
        check = subprocess.run(
            ["curl", "-s", f"https://api.instantly.ai/api/v2/campaigns/{campaign_id}",
             "-H", f"Authorization: Bearer {INSTANTLY_API_KEY}"],
            capture_output=True, text=True, timeout=10
        )
        data = json.loads(check.stdout)
        status = data.get("status", 0)
        if status != 1:  # 1 = active
            LOG.info(f" 🔄 Campaign {campaign_name} is paused, activating...")
            subprocess.run(
                ["curl", "-s", "-X", "POST",
                 f"https://api.instantly.ai/api/v2/campaigns/{campaign_id}/activate",
                 "-H", f"Authorization: Bearer {INSTANTLY_API_KEY}",
                 "-H", "Content-Type: application/json",
                 "-d", "{}"],
                capture_output=True, text=True, timeout=10
            )
    except Exception as e:
        LOG.warning(f" ⚠️ Could not check/activate campaign {campaign_name}: {e}")


def add_to_instantly(visitor, campaign_name):
    """Add lead to Instantly campaign via API."""
    campaign_id = CAMPAIGNS.get(campaign_name)
    if not campaign_id:
        LOG.error(f"Campaign not found in config: {campaign_name}")
        return False

    if not INSTANTLY_API_KEY:
        LOG.error("INSTANTLY_API_KEY not set")
        return False

    ensure_campaign_active(campaign_name)

    email = visitor.get("email") or visitor.get("business_email")
    # Fall back to the first word of "name", then to a generic greeting.
    name_parts = (visitor.get("name") or "").split()
    first_name = visitor.get("first_name") or (name_parts[0] if name_parts else "there")
    company = visitor.get("company_name") or visitor.get("company") or ""

    # Format page visited for personalization
    pages = visitor.get("pages_visited") or visitor.get("page_views") or []
    if isinstance(pages, str):
        pages = [pages]
    page_display = pages[0] if pages else ""
    if "://" in page_display:
        page_display = urlparse(page_display).path

    lead_data = {
        "campaign": campaign_id,
        "email": email,
        "first_name": first_name,
        "last_name": visitor.get("last_name", ""),
        "company_name": company,
        "website": visitor.get("company_website") or visitor.get("website") or "",
        "custom_variables": {
            "companyName": company,
            "firstName": first_name,
            "title": visitor.get("job_title") or visitor.get("title") or "",
            "industry": visitor.get("industry") or "",
            "pageVisited": page_display,
        },
    }

    try:
        result = subprocess.run(
            ["curl", "-s", "-X", "POST", "https://api.instantly.ai/api/v2/leads",
             "-H", f"Authorization: Bearer {INSTANTLY_API_KEY}",
             "-H", "Content-Type: application/json",
             "-d", json.dumps(lead_data)],
            capture_output=True, text=True, timeout=15
        )
    except (OSError, subprocess.SubprocessError) as e:
        # curl is not installed, or the request exceeded the timeout
        LOG.error(f" ❌ Instantly request failed: {e}")
        return False

    try:
        resp = json.loads(result.stdout)
        if resp.get("email") or resp.get("id"):
            LOG.info(f" ✅ Added to Instantly: {email} → {campaign_name}")
            return True
        else:
            LOG.warning(f" ⚠️ Instantly response: {result.stdout[:200]}")
            return False
    except Exception:
        LOG.error(f" ❌ Instantly error: {result.stdout[:200]}")
        return False


def process_visitor(visitor, dry_run=False):
    """Full pipeline: score → suppress → classify → route → enroll."""
    email = visitor.get("email") or visitor.get("business_email")
    if not email:
        return {"status": "skipped", "reason": "no email"}

    company = visitor.get("company_name") or visitor.get("company") or ""
    title = visitor.get("job_title") or visitor.get("title") or ""
    domain = email.split("@")[1].lower() if "@" in email else ""

    LOG.info(f"\n{'─'*50}")
    LOG.info(f"Processing: {email} ({company}, {title})")

    # 1. Intent scoring
    pages = visitor.get("pages_visited") or visitor.get("page_views") or []
    intent_score = score_intent(pages)
    if intent_score < MIN_INTENT_SCORE:
        LOG.info(f" ⏭️ Low intent: {intent_score} < {MIN_INTENT_SCORE}")
        return {"status": "skipped", "reason": f"low intent ({intent_score})"}

    # 2. Suppression pipeline
    suppressed, layers = run_suppression_pipeline(email, company, domain)
    if suppressed:
        last_reason = layers[-1][2] if layers else "unknown"
        LOG.info(f" 🚫 Suppressed: {last_reason}")
        return {"status": "suppressed", "reason": last_reason}

    # 3. Classify agency
    agency = is_agency(visitor)

    # 4. Detect source site
    source_site = detect_source_site(visitor)

    # 5. Route to campaign
    campaign = route_to_campaign(source_site, agency)

    LOG.info(f" 📍 Source: {source_site} | Agency: {agency} | Campaign: {campaign}")
    LOG.info(f" 📊 Intent: {intent_score} | Seniority: {get_seniority_rank(title)}")

    if dry_run:
        return {
            "status": "dry_run",
            "email": email,
            "campaign": campaign,
            "intent_score": intent_score,
            "agency": agency,
            "source_site": source_site,
        }

    # 6. Add to Instantly
    success = add_to_instantly(visitor, campaign)

    if success:
        record_enrollment(email, domain, campaign)
        return {"status": "enrolled", "email": email, "campaign": campaign}
    else:
        return {"status": "failed", "email": email, "campaign": campaign}


# ─── Webhook Server ──────────────────────────────────────────────────────────

class WebhookHandler(BaseHTTPRequestHandler):
    """HTTP handler for RB2B webhook."""
    dry_run = False

    def do_POST(self):
        try:
            length = int(self.headers.get('Content-Length', 0))
        except ValueError:
            length = -1
        if length < 0:
            # Non-numeric or negative Content-Length: reject instead of crashing
            self.send_response(400)
            self.end_headers()
            return
        if length > 1_000_000:
            self.send_response(413)
            self.end_headers()
            return

        body = self.rfile.read(length)
        try:
            payload = json.loads(body)
        except Exception:
            self.send_response(400)
            self.end_headers()
            return

        visitors = payload if isinstance(payload, list) else [payload]
        # process_visitor() expects JSON objects; other entries are ignored
        results = [process_visitor(v, dry_run=self.dry_run)
                   for v in visitors if isinstance(v, dict)]

        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.end_headers()
        # "failed" and "dry_run" results are included in "processed" only
        self.wfile.write(json.dumps({
            "processed": len(results),
            "enrolled": sum(1 for r in results if r["status"] == "enrolled"),
            "suppressed": sum(1 for r in results if r["status"] == "suppressed"),
            "skipped": sum(1 for r in results if r["status"] == "skipped"),
        }).encode())

    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.end_headers()
        self.wfile.write(json.dumps({"status": "ok", "service": "rb2b-instantly-router"}).encode())

    def log_message(self, fmt, *args):
        LOG.info(fmt % args)


# ─── CLI ─────────────────────────────────────────────────────────────────────

def main():
    parser = argparse.ArgumentParser(description="RB2B → Instantly Router")
    parser.add_argument("--serve", action="store_true", help="Run as HTTP webhook server")
    parser.add_argument("--port", type=int, default=4100, help="Server port (default: 4100)")
    parser.add_argument("--dry-run", action="store_true", help="Score and classify without enrolling")
    parser.add_argument("-v", "--verbose", action="store_true")
    args = parser.parse_args()

    logging.basicConfig(
        level=logging.DEBUG if args.verbose else logging.INFO,
        format="%(asctime)s %(message)s", datefmt="%H:%M:%S",
    )

    if args.serve:
        WebhookHandler.dry_run = args.dry_run
        server = HTTPServer(("0.0.0.0", args.port), WebhookHandler)
        LOG.info(f"🚀 RB2B → Instantly router on port {args.port} (dry_run={args.dry_run})")
        try:
            server.serve_forever()
        except KeyboardInterrupt:
            server.shutdown()
    else:
        payload = json.load(sys.stdin)
        visitors = payload if isinstance(payload, list) else [payload]
        for v in visitors:
            result = process_visitor(v, dry_run=args.dry_run)
            print(json.dumps(result, indent=2))


if __name__ == "__main__":
    main()
329
sales-pipeline/rb2b_suppression_pipeline.py
Normal file
@ -0,0 +1,329 @@
#!/usr/bin/env python3
"""
RB2B 5-Layer Suppression Pipeline

Checks a visitor against multiple suppression layers before enrolling in outbound campaigns.
Layers: CRM → Outbound Platform → Payment Provider → Product Analytics → Internal Blocklist

Prevents you from cold-emailing existing customers, active leads, competitors, or
people you already contacted recently.

Usage:
    # Check a single email
    python3 rb2b_suppression_pipeline.py --email john@acme.com --company "Acme Inc"

    # Dry run (show what would happen)
    python3 rb2b_suppression_pipeline.py --email john@acme.com --dry-run
"""

import argparse
import json
import logging
import os
import subprocess
import sys
from datetime import datetime, timezone, timedelta
from pathlib import Path

LOG = logging.getLogger("rb2b-suppression")

# ─── Configuration ───────────────────────────────────────────────────────────
# Base directory: override with BASE_DIR env var or defaults to script parent
BASE_DIR = Path(os.environ.get("BASE_DIR", Path(__file__).resolve().parent))
DATA_DIR = BASE_DIR / "data"

# API keys loaded from environment
OUTBOUND_API_KEY = os.environ.get("INSTANTLY_API_KEY", "")
CRM_API_KEY = os.environ.get("HUBSPOT_API_KEY", "")

# File paths for local data caches
BLOCKLIST_FILE = DATA_DIR / "blocklist.json"
ENROLLED_FILE = DATA_DIR / "enrolled.json"
STRIPE_CACHE_FILE = DATA_DIR / "stripe-customers.json"
ACTIVE_USERS_CACHE_FILE = DATA_DIR / "active-users.json"

# ─── Competitor domains to auto-suppress ─────────────────────────────────────
# Add your competitors' email domains here
COMPETITOR_DOMAINS = {
    # Example: "competitor1.com", "competitor2.com",
}

# ─── Personal email domains (skip: no business value) ───────────────────────
PERSONAL_DOMAINS = {
    "gmail.com", "yahoo.com", "hotmail.com", "outlook.com",
    "icloud.com", "protonmail.com", "aol.com", "live.com",
    "me.com", "mail.com", "ymail.com",
}

# ─── Company dedup window (days) ────────────────────────────────────────────
# Only enroll 1 contact per company domain within this window
COMPANY_DEDUP_WINDOW_DAYS = int(os.environ.get("COMPANY_DEDUP_WINDOW_DAYS", "7"))


def _curl_json(method, url, headers=None, body=None):
    """Make HTTP request via curl, return parsed JSON."""
    cmd = ["curl", "-s", "-X", method, url]
    for k, v in (headers or {}).items():
        cmd.extend(["-H", f"{k}: {v}"])
    if body:
        cmd.extend(["-d", json.dumps(body)])
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=15)
        return json.loads(result.stdout) if result.stdout.strip() else None
    except Exception as e:
        LOG.warning(f"API error: {e}")
        return None


# ─── Layer 0: Personal Email Filter ─────────────────────────────────────────

def check_personal_email(email):
    """Filter personal email domains (gmail, yahoo, etc.)."""
    domain = email.split("@")[1].lower() if "@" in email else ""
    if domain in PERSONAL_DOMAINS:
        return True, f"personal email domain: {domain}"
    return False, "business email"


# ─── Layer 1: CRM Check (HubSpot) ───────────────────────────────────────────

def check_crm(email, domain=None):
    """Check if contact exists in your CRM. Uses HubSpot API."""
    if not CRM_API_KEY:
        LOG.warning("No CRM API key available, skipping CRM layer")
        return False, "crm key unavailable (skipped)"

    data = _curl_json(
        "POST", "https://api.hubapi.com/crm/v3/objects/contacts/search",
        headers={
            "Authorization": f"Bearer {CRM_API_KEY}",
            "Content-Type": "application/json",
        },
        body={
            "filterGroups": [{
                "filters": [{
                    "propertyName": "email",
                    "operator": "EQ",
                    "value": email,
                }]
            }],
            "limit": 1,
        }
    )

    if data and data.get("total", 0) > 0:
        return True, f"exists in CRM (contact ID: {data['results'][0].get('id')})"
    return False, "not in CRM"


# ─── Layer 2: Outbound Platform Check (Instantly) ───────────────────────────

def check_outbound_platform(email):
    """Check if email is already in any outbound campaign (90-day window)."""
    from urllib.parse import quote  # local import keeps this function self-contained

    if not OUTBOUND_API_KEY:
        LOG.warning("No outbound API key available, skipping outbound layer")
        return False, "outbound key unavailable (skipped)"

    # Percent-encode the address so characters such as "+" survive the query string
    data = _curl_json(
        "GET",
        f"https://api.instantly.ai/api/v2/leads?email={quote(email, safe='')}&limit=10",
        headers={"Authorization": f"Bearer {OUTBOUND_API_KEY}"}
    )

    if data and isinstance(data, dict):
        items = data.get("items", [])
        if items:
            cutoff = datetime.now(timezone.utc) - timedelta(days=90)
            for lead in items:
                created = lead.get("timestamp_created", "")
                campaign = lead.get("campaign_name", "unknown")
                try:
                    dt = datetime.fromisoformat(created.replace("Z", "+00:00"))
                    if dt > cutoff:
                        return True, f"active in outbound campaign: {campaign}"
                except (AttributeError, TypeError, ValueError):
                    # Missing or unparseable timestamp: suppress to be safe
                    return True, f"exists in outbound (campaign: {campaign})"

    return False, "not in outbound platform"


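# Illustration only: a sketch of building a lookup URL with a percent-encoded
# email address, using the standard-library urlencode(). The base URL is the one
# used in check_outbound_platform() above; the helper name build_lookup_url and
# the sample address are hypothetical. An unencoded "+" in a query string is
# commonly decoded as a space on the receiving side.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.instantly.ai/api/v2/leads"  # as used in the layer above


def build_lookup_url(email, limit=10):
    """Return the lookup URL with query parameters percent-encoded."""
    query = urlencode({"email": email, "limit": limit})
    return f"{BASE_URL}?{query}"


url = build_lookup_url("jane+news@acme.com")
```

# Here "+" becomes %2B and "@" becomes %40, so the address reaches the server
# unchanged after decoding.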
# ─── Layer 3: Payment Provider Check (Stripe) ───────────────────────────────

def check_payment_provider(email, domain=None):
    """Check if email/domain matches a paying customer. Uses cached Stripe data."""
    if not STRIPE_CACHE_FILE.exists():
        LOG.info("Payment provider cache not found, skipping layer")
        return False, "payment check skipped (no cache)"

    try:
        customers = json.loads(STRIPE_CACHE_FILE.read_text())
        # "or ''" tolerates records whose email is null in the cache; one such
        # record would otherwise raise and silently disable this layer.
        emails = {(c.get("email") or "").lower() for c in customers}
        domains = {e.split("@")[1] for e in emails if "@" in e}

        if email.lower() in emails:
            return True, "paying customer (exact email match)"
        if domain and domain.lower() in domains:
            return True, f"paying customer (domain match: {domain})"
    except Exception as e:
        LOG.warning(f"Payment provider cache unreadable, skipping layer: {e}")

    return False, "not a paying customer"


# ─── Layer 4: Product Analytics Check (Mixpanel/Amplitude) ──────────────────

def check_product_analytics(email):
    """Check if user has been active in product recently. Uses cached data."""
    if not ACTIVE_USERS_CACHE_FILE.exists():
        LOG.info("Product analytics cache not found, skipping layer")
        return False, "product analytics check skipped (no cache)"

    try:
        users = json.loads(ACTIVE_USERS_CACHE_FILE.read_text())
        # "or ''" tolerates records whose email is null in the cache
        active_emails = {(u.get("email") or "").lower() for u in users}
        if email.lower() in active_emails:
            return True, "active product user (last 30 days)"
    except Exception as e:
        LOG.warning(f"Product analytics cache unreadable, skipping layer: {e}")

    return False, "not an active product user"


# ─── Layer 5: Blocklist (competitors + manual) ──────────────────────────────

def check_blocklist(email, domain=None):
    """Check against competitor domains and manual blocklist."""
    email_domain = email.split("@")[1].lower() if "@" in email else ""
    if email_domain in COMPETITOR_DOMAINS:
        return True, f"competitor domain: {email_domain}"

    if BLOCKLIST_FILE.exists():
        try:
            blocklist = json.loads(BLOCKLIST_FILE.read_text())
            blocked_emails = {e.lower() for e in blocklist.get("emails", [])}
            blocked_domains = {d.lower() for d in blocklist.get("domains", [])}

            if email.lower() in blocked_emails:
                return True, "manually blocklisted (email)"
            if email_domain in blocked_domains:
                return True, f"manually blocklisted (domain: {email_domain})"
        except Exception:
            pass

    return False, "not blocklisted"


# ─── Company-Level Deduplication ─────────────────────────────────────────────

def check_company_dedup(email, company_domain, window_days=None):
    """Only allow 1 contact per company domain within a rolling window."""
    # Test against None so an explicit window of 0 days is respected
    if window_days is None:
        window_days = COMPANY_DEDUP_WINDOW_DAYS
    if not ENROLLED_FILE.exists():
        return False, "no prior enrollments"

    try:
        enrolled = json.loads(ENROLLED_FILE.read_text())
        # ISO-8601 strings are compared as text; this relies on every
        # timestamp being written in UTC by record_enrollment().
        cutoff = (datetime.now(timezone.utc) - timedelta(days=window_days)).isoformat()

        for entry in enrolled:
            if (entry.get("domain") == company_domain and
                    entry.get("enrolled_at", "") > cutoff and
                    entry.get("email") != email):
                return True, (f"company already enrolled: {entry.get('email')} "
                              f"on {entry.get('enrolled_at', '')[:10]}")
    except Exception:
        pass

    return False, "no company dedup conflict"


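# Illustration only: a sketch of the rolling-window test on ISO-8601 strings,
# assuming every timestamp is produced by datetime.isoformat() in UTC. The fixed
# "now" and the helper name within_window are hypothetical. Text comparison is
# only safe when all strings share one format and offset; parsing them back with
# datetime.fromisoformat() is the more defensive variant shown alongside.

```python
from datetime import datetime, timedelta, timezone

now = datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)  # fixed for repeatability
cutoff = (now - timedelta(days=7)).isoformat()

recent = (now - timedelta(days=2)).isoformat()   # inside the 7-day window
old = (now - timedelta(days=30)).isoformat()     # outside the window


def within_window(enrolled_at, cutoff_iso):
    """Defensive variant: compare parsed datetimes instead of raw strings."""
    return datetime.fromisoformat(enrolled_at) > datetime.fromisoformat(cutoff_iso)


recent_by_text = recent > cutoff
old_by_text = old > cutoff
recent_by_parse = within_window(recent, cutoff)
old_by_parse = within_window(old, cutoff)
```

# Both approaches agree here because the strings share the "+00:00" offset.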
# ─── Pipeline Orchestrator ───────────────────────────────────────────────────

def run_suppression_pipeline(email, company=None, domain=None, dry_run=False):
    """Run all suppression layers in sequence, stopping at the first hit.

    The company and dry_run arguments are accepted for interface
    compatibility but are not used by any layer.

    Returns:
        (should_suppress: bool, results: list of (layer_name, suppressed, reason))
    """
    if not domain and "@" in email:
        domain = email.split("@")[1].lower()

    results = []

    layers = [
        ("Personal Email Filter", lambda: check_personal_email(email)),
        ("CRM Check", lambda: check_crm(email, domain)),
        ("Outbound Platform", lambda: check_outbound_platform(email)),
        ("Payment Provider", lambda: check_payment_provider(email, domain)),
        ("Product Analytics", lambda: check_product_analytics(email)),
        ("Blocklist", lambda: check_blocklist(email, domain)),
        ("Company Dedup", lambda: check_company_dedup(email, domain)),
    ]

    for layer_name, check_fn in layers:
        suppressed, reason = check_fn()
        results.append((layer_name, suppressed, reason))
        if suppressed:
            return True, results

    return False, results


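# Illustration only: a self-contained sketch of the short-circuit pattern used
# by the orchestrator, with stand-in checks instead of real API calls. The
# helper names run_layers and make_check and the sample reasons are
# hypothetical. Layers after the first hit are never called, which is why the
# cheap local checks are worth ordering ahead of network lookups.

```python
calls = []  # records which stand-in checks actually ran


def make_check(name, suppressed, reason):
    """Build a stand-in layer that returns a fixed (suppressed, reason) pair."""
    def check():
        calls.append(name)
        return suppressed, reason
    return check


def run_layers(layers):
    """Run checks in order and stop at the first one that suppresses."""
    results = []
    for name, check in layers:
        suppressed, reason = check()
        results.append((name, suppressed, reason))
        if suppressed:
            return True, results
    return False, results


layers = [
    ("Personal Email Filter", make_check("Personal Email Filter", False, "business email")),
    ("Blocklist", make_check("Blocklist", True, "competitor domain: example.com")),
    ("Company Dedup", make_check("Company Dedup", False, "no company dedup conflict")),
]
suppressed, results = run_layers(layers)
last_reason = results[-1][2]  # the reason of the layer that stopped the run
```

# The last entry in results is always the deciding layer when suppressed is
# True, so callers can read the reason from results[-1][2].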
def record_enrollment(email, domain, campaign):
    """Record an enrollment for company-level dedup tracking."""
    try:
        enrolled = json.loads(ENROLLED_FILE.read_text()) if ENROLLED_FILE.exists() else []
    except Exception:
        enrolled = []

    enrolled.append({
        "email": email,
        "domain": domain,
        "campaign": campaign,
        "enrolled_at": datetime.now(timezone.utc).isoformat(),
    })

    # Keep only last 90 days
    cutoff = (datetime.now(timezone.utc) - timedelta(days=90)).isoformat()
    enrolled = [e for e in enrolled if e.get("enrolled_at", "") > cutoff]

    ENROLLED_FILE.parent.mkdir(parents=True, exist_ok=True)
    ENROLLED_FILE.write_text(json.dumps(enrolled, indent=2))


# ─── CLI ─────────────────────────────────────────────────────────────────────

def main():
    parser = argparse.ArgumentParser(description="RB2B Suppression Pipeline")
    parser.add_argument("--email", required=True)
    parser.add_argument("--company", default="")
    parser.add_argument("--domain", default="")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--verbose", "-v", action="store_true")
    args = parser.parse_args()

    logging.basicConfig(
        level=logging.DEBUG if args.verbose else logging.INFO,
        format="%(message)s",
    )

    suppressed, results = run_suppression_pipeline(
        args.email, args.company, args.domain, args.dry_run
    )

    print(f"\n📋 Suppression check for: {args.email}")
    print(f"{'─'*50}")
    for layer_name, was_suppressed, reason in results:
        icon = "🚫" if was_suppressed else "✅"
        print(f" {icon} {layer_name}: {reason}")

    print(f"{'─'*50}")
    if suppressed:
        print(f" 🚫 SUPPRESSED — do not enroll")
    else:
        print(f" ✅ CLEAR — eligible for enrollment")

    return 0 if not suppressed else 1


if __name__ == "__main__":
    sys.exit(main())
419
sales-pipeline/rb2b_webhook_ingest.py
Normal file
@ -0,0 +1,419 @@
#!/usr/bin/env python3
"""
RB2B Webhook Ingestion Server

Receives RB2B webhook payloads (via Zapier/Make or direct integration),
scores visitor intent based on pages visited, checks ICP fit, and outputs
structured signals for downstream processing.

Can run as:
    1. HTTP webhook server (direct RB2B integration)
    2. Stdin processor (for testing / batch processing)

Usage:
    # Process a single webhook payload from stdin
    echo '{"email":"john@acme.com",...}' | python3 rb2b_webhook_ingest.py

    # Process a batch file (one JSON per line)
    python3 rb2b_webhook_ingest.py --batch webhooks.jsonl

    # Run as HTTP webhook server
    python3 rb2b_webhook_ingest.py --serve --port 4100

    # Dry run (show scoring without side effects)
    python3 rb2b_webhook_ingest.py --dry-run < payload.json
"""

import argparse
import json
import logging
import os
import re
import sys
from datetime import datetime, timezone
from http.server import HTTPServer, BaseHTTPRequestHandler
from pathlib import Path
from urllib.parse import urlparse

# ─── Configuration ───────────────────────────────────────────────────────────
LOG = logging.getLogger("rb2b-ingest")
BASE_DIR = Path(os.environ.get("BASE_DIR", Path(__file__).resolve().parent))
OUTPUT_DIR = BASE_DIR / "data" / "signals"

# ─── Intent Scoring ─────────────────────────────────────────────────────────
# Maps URL path patterns to intent scores (0-100).
# Higher score = stronger purchase intent.
# Customize these for your site structure.
PAGE_INTENT_SCORES = {
    # Hot pages: active buying signals
    "pricing": 90,
    "plans": 90,
    "contact": 85,
    "demo": 85,
    "request-demo": 85,
    "book-a-call": 85,
    "get-started": 85,
    "free-consultation": 85,
    "proposal": 80,
    "quote": 80,

    # Warm pages: research/evaluation
    "case-study": 70,
    "case-studies": 70,
    "results": 70,
    "testimonials": 65,
    "about": 60,
    "team": 55,
    "services": 65,
    "solutions": 65,

    # Service pages: customize for your offerings
    # "your-service-1": 75,
    # "your-service-2": 75,

    # Cool pages: awareness/education
    "blog": 30,
    "podcast": 25,
    "webinar": 40,
    "resource": 35,
    "guide": 35,
    "ebook": 40,
}

# Minimum intent score to process (skip pure blog readers)
MIN_INTENT_SCORE = int(os.environ.get("MIN_INTENT_SCORE", "50"))

# ─── ICP Filters ────────────────────────────────────────────────────────────
# Title keywords that indicate decision-maker seniority
ICP_SENIORITY_KEYWORDS = [
    "cmo", "vp", "vice president", "director", "head of", "chief",
    "svp", "evp", "founder", "ceo", "coo", "cto", "partner",
    "senior director", "managing director", "president",
]

# Minimum company size (employees) for ICP match
ICP_MIN_COMPANY_SIZE = int(os.environ.get("ICP_MIN_COMPANY_SIZE", "50"))


def score_pages(pages_visited):
    """Score visitor intent based on pages they viewed.

    Args:
        pages_visited: list of URL strings or page paths

    Returns:
        tuple: (max_score, hot_pages list, page_summary string)
    """
    if not pages_visited:
        return 0, [], "no pages tracked"

    scores = []
    hot_pages = []

    for page_url in pages_visited:
        try:
            path = urlparse(page_url).path if "://" in page_url else page_url
        except Exception:
            path = page_url
        path = path.lower().strip("/")

        best_score = 20  # default for unknown pages
        matched_pattern = None

        for pattern, score in PAGE_INTENT_SCORES.items():
            if pattern in path:
                if score > best_score:
                    best_score = score
                    matched_pattern = pattern

        scores.append(best_score)
        if best_score >= 65:
            hot_pages.append({
                "page": path or "/",
                "score": best_score,
                "pattern": matched_pattern or "unknown",
            })

    max_score = max(scores) if scores else 0
    page_count = len(pages_visited)
    summary = f"{page_count} pages, max intent {max_score}"
    if hot_pages:
        summary += f", hot: {', '.join(p['pattern'] for p in hot_pages[:3])}"

    return max_score, hot_pages, summary


def check_icp_match(visitor):
    """Check if visitor matches ICP criteria.

    Returns:
        tuple: (is_match: bool, reason: str)
    """
    title = (visitor.get("job_title") or visitor.get("title") or "").lower()
    company_size = visitor.get("company_size") or visitor.get("employees") or 0

    if isinstance(company_size, str):
        # Drop thousands separators first so "1,000-5,000" parses as 5000, not 0.
        nums = re.findall(r'\d+', company_size.replace(",", ""))
        company_size = int(nums[-1]) if nums else 0

    # Match keywords on word boundaries so that "coo" does not fire on
    # "coordinator" and "cto" does not fire on "doctor". Add spelling variants
    # such as "cofounder" to ICP_SENIORITY_KEYWORDS if you need them.
    seniority_match = any(
        re.search(rf"\b{re.escape(kw)}\b", title) for kw in ICP_SENIORITY_KEYWORDS
    )
    size_match = company_size >= ICP_MIN_COMPANY_SIZE

    if seniority_match and size_match:
        return True, f"ICP match: {title}, {company_size}+ employees"
    elif seniority_match:
        return True, f"seniority match: {title} (company size unknown/small)"
    elif size_match:
        return False, f"size match but low seniority: {title}"
    else:
        return False, f"no ICP match: {title}, ~{company_size} employees"


def extract_domain(visitor):
    """Extract company domain from visitor data."""
    domain = visitor.get("company_domain") or visitor.get("domain") or ""
    if domain:
        return domain.lower().replace("www.", "")

    email = visitor.get("email") or visitor.get("business_email") or ""
    if email and "@" in email:
        domain = email.split("@")[1].lower()
        generic = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "aol.com"}
        if domain not in generic:
            return domain

    website = visitor.get("company_website") or visitor.get("website") or ""
    if website:
        try:
            parsed = urlparse(website if "://" in website else f"https://{website}")
            return parsed.netloc.lower().replace("www.", "")
        except Exception:
            pass

    return None


def process_visitor(visitor, dry_run=False, source_site="your-site.com"):
    """Process a single RB2B visitor webhook payload.

    Args:
        visitor: dict with RB2B webhook data
        dry_run: if True, don't write output files
        source_site: which site the visitor came from

    Returns:
        dict with processing result
    """
    # Basic input validation
    if not isinstance(visitor, dict):
        return {"status": "error", "reason": "invalid payload type"}

    # Extract key fields
    name = visitor.get("name") or visitor.get("full_name") or "Unknown"
    first_name = visitor.get("first_name") or (name.split()[0] if name != "Unknown" else "there")
    email = visitor.get("email") or visitor.get("business_email")
    title = visitor.get("job_title") or visitor.get("title") or "Unknown role"
    company = visitor.get("company_name") or visitor.get("company") or "Unknown company"
    linkedin = visitor.get("linkedin_url") or visitor.get("linkedin_profile") or ""
    pages = visitor.get("pages_visited") or visitor.get("page_views") or visitor.get("pages") or []

    if isinstance(pages, str):
        pages = [pages]

    domain = extract_domain(visitor)

    # Score intent
    intent_score, hot_pages, page_summary = score_pages(pages)

    # Check ICP
    is_icp, icp_reason = check_icp_match(visitor)

    # Determine priority
    if intent_score >= 80 and is_icp:
        priority = "high"
    elif intent_score >= 60 or is_icp:
        priority = "medium"
    else:
        priority = "low"

    result = {
        "name": name,
        "email": email,
        "title": title,
        "company": company,
        "domain": domain,
        "intent_score": intent_score,
        "is_icp": is_icp,
        "icp_reason": icp_reason,
        "priority": priority,
        "page_summary": page_summary,
        "source_site": source_site,
    }

    # Skip low-intent visitors
    if intent_score < MIN_INTENT_SCORE and not is_icp:
        result["status"] = "skipped"
        result["reason"] = f"below threshold: intent {intent_score} < {MIN_INTENT_SCORE}, not ICP"
        LOG.info(f"⏭️ Skipped {name} ({company}): {result['reason']}")
        return result

    # Build structured signal output
    hot_page_str = ""
    if hot_pages:
        hot_page_str = f" (viewed: {', '.join(p['pattern'] for p in hot_pages[:2])})"

    signal = {
        "type": "site_visit",
        "topic": f"Website visitor: {name}, {title} at {company}{hot_page_str} — {source_site}",
        "priority": priority,
        "domain": domain,
        "data": {
            "name": name,
            "first_name": first_name,
            "email": email,
            "title": title,
            "company": company,
            "linkedin": linkedin,
            "pages_visited": pages,
            "intent_score": intent_score,
            "hot_pages": hot_pages,
            "is_icp": is_icp,
            "icp_reason": icp_reason,
            "source_site": source_site,
        },
        "created_at": datetime.now(timezone.utc).isoformat(),
    }

    if dry_run:
        result["status"] = "dry_run"
        result["signal"] = signal
        LOG.info(f"🔍 [DRY RUN] Would create signal: {signal['topic']}")
    else:
        # Write signal to output directory as JSON
        OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
        ts = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
        # The email comes from an untrusted webhook payload: keep only
        # filename-safe characters so it cannot inject path separators.
        safe_email = re.sub(r"[^A-Za-z0-9._-]", "_", (email or "unknown").replace("@", "_at_"))
        signal_file = OUTPUT_DIR / f"signal_{ts}_{safe_email}.json"
        signal_file.write_text(json.dumps(signal, indent=2))
        result["status"] = "signal_created"
        result["signal_file"] = str(signal_file)

    LOG.info(
        f"{'✅' if result['status'] == 'signal_created' else '📋'} "
        f"{name} ({company}) — intent:{intent_score} icp:{is_icp} → {result['status']}"
    )
    return result


# ─── Webhook Server ──────────────────────────────────────────────────────────

class RB2BWebhookHandler(BaseHTTPRequestHandler):
    """HTTP handler for direct RB2B webhook integration."""

    dry_run = False
    source_site = "your-site.com"

    def do_POST(self):
        content_length = int(self.headers.get('Content-Length', 0))
        if content_length > 1_000_000:  # 1MB limit
            self.send_response(413)
            self.end_headers()
            return

        body = self.rfile.read(content_length)
        try:
            payload = json.loads(body)
        except ValueError:
            # json.JSONDecodeError and UnicodeDecodeError (raised for bodies that
            # are not valid UTF-8) are both subclasses of ValueError.
            self.send_response(400)
            self.send_header('Content-Type', 'application/json')
            self.end_headers()
            self.wfile.write(b'{"error":"invalid json"}')
            return

        visitors = payload if isinstance(payload, list) else [payload]
        results = [process_visitor(v, dry_run=self.dry_run, source_site=self.source_site)
                   for v in visitors]

        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.end_headers()
        self.wfile.write(json.dumps({
            "processed": len(results),
            "signals_created": sum(1 for r in results if r.get("status") == "signal_created"),
            "skipped": sum(1 for r in results if r.get("status") == "skipped"),
        }).encode())

    def do_GET(self):
        """Health check endpoint."""
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.end_headers()
        self.wfile.write(json.dumps({"status": "ok", "service": "rb2b-webhook-ingest"}).encode())

    def log_message(self, format, *args):
        LOG.info(format % args)


# ─── CLI ─────────────────────────────────────────────────────────────────────

def main():
    parser = argparse.ArgumentParser(description="RB2B Webhook → Signal Pipeline")
    parser.add_argument("--batch", help="Process batch file (one JSON per line)")
    parser.add_argument("--serve", action="store_true", help="Run as HTTP webhook server")
    parser.add_argument("--port", type=int, default=4100, help="Server port (default: 4100)")
    parser.add_argument("--dry-run", action="store_true", help="Don't write signal files")
    parser.add_argument("--source-site", default="your-site.com", help="Source site name")
    parser.add_argument("--verbose", "-v", action="store_true")
    args = parser.parse_args()

    logging.basicConfig(
        level=logging.DEBUG if args.verbose else logging.INFO,
        format="%(asctime)s [%(levelname)s] %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )

    if args.serve:
        RB2BWebhookHandler.dry_run = args.dry_run
        RB2BWebhookHandler.source_site = args.source_site
        server = HTTPServer(("0.0.0.0", args.port), RB2BWebhookHandler)
        LOG.info(f"🚀 RB2B webhook server listening on port {args.port}")
        LOG.info(f" POST http://localhost:{args.port}/ to ingest visitors")
        LOG.info(f" Dry run: {args.dry_run}")
        try:
            server.serve_forever()
        except KeyboardInterrupt:
            LOG.info("Shutting down...")
            server.shutdown()

    elif args.batch:
        results = []
        with open(args.batch) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    visitor = json.loads(line)
                    result = process_visitor(visitor, dry_run=args.dry_run, source_site=args.source_site)
                    results.append(result)
                except json.JSONDecodeError as e:
                    LOG.error(f"Invalid JSON line: {e}")

        created = sum(1 for r in results if r.get("status") == "signal_created")
        skipped = sum(1 for r in results if r.get("status") == "skipped")
        print(f"\n📊 Batch complete: {len(results)} processed, {created} signals, {skipped} skipped")

    else:
        try:
            payload = json.load(sys.stdin)
        except json.JSONDecodeError as e:
            LOG.error(f"Invalid JSON on stdin: {e}")
            sys.exit(1)

        visitors = payload if isinstance(payload, list) else [payload]
        for visitor in visitors:
            result = process_visitor(visitor, dry_run=args.dry_run, source_site=args.source_site)
            print(json.dumps(result, indent=2))


if __name__ == "__main__":
    main()
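The routing rules in `process_visitor` combine the intent score with the ICP flag in two places: the priority tier and the skip decision. The sketch below restates those thresholds as a small pure function for quick experimentation. It is illustrative only and not part of the script above; the name `classify_visitor` and its `min_intent_score` parameter are assumptions, with the default chosen to mirror the `MIN_INTENT_SCORE` default of 50.

```python
def classify_visitor(intent_score, is_icp, min_intent_score=50):
    """Return (priority, kept) using the same thresholds as process_visitor.

    Hypothetical helper for illustration; min_intent_score mirrors the
    MIN_INTENT_SCORE default.
    """
    if intent_score >= 80 and is_icp:
        priority = "high"
    elif intent_score >= 60 or is_icp:
        priority = "medium"
    else:
        priority = "low"
    # A visitor is skipped only when intent is below the threshold AND not ICP.
    kept = not (intent_score < min_intent_score and not is_icp)
    return priority, kept


if __name__ == "__main__":
    print(classify_visitor(90, True))    # pricing page viewed by a senior title
    print(classify_visitor(30, True))    # blog reader who still matches the ICP
    print(classify_visitor(30, False))   # blog reader outside the ICP
```

Note that an ICP match alone is enough to keep a visitor and lift them to "medium", even when they only read blog posts.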
9
sales-pipeline/requirements.txt
Normal file
@ -0,0 +1,9 @@
# Core dependencies
requests>=2.28.0

# For ICP Learning Analyzer (PostgreSQL connection)
# Only needed if using icp_learning_analyzer.py
psycopg2-binary>=2.9.0

# Optional: python-dotenv for loading .env files
python-dotenv>=1.0.0
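Because `python-dotenv` is listed as optional, a script may want to keep working when the package is missing. The following is a minimal sketch of one way to do that, not something taken from the repository: `load_env_file` is a hypothetical helper that uses `dotenv.load_dotenv` when it is importable and otherwise falls back to a deliberately small `KEY=VALUE` parser with no quoting or variable expansion. In both paths, variables already present in the environment are left untouched.

```python
import os
from pathlib import Path


def load_env_file(path):
    """Load KEY=VALUE pairs from a .env file without overriding existing variables.

    Hypothetical helper for illustration only.
    """
    try:
        from dotenv import load_dotenv  # provided by python-dotenv, if installed
    except ImportError:
        load_dotenv = None

    if load_dotenv is not None:
        load_dotenv(path)  # by default, existing environment variables win
        return

    # Fallback: a small parser that handles plain KEY=VALUE lines and comments.
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())
```

Calling `load_env_file(".env")` once at startup, before any `os.environ.get(...)` lookups, would be the intended usage.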
410
sales-pipeline/trigger_prospector.py
Normal file
@ -0,0 +1,410 @@
#!/usr/bin/env python3
"""
Trigger-Based Prospecting Engine

Monitors job postings, new hires, and funding signals to identify
companies where new marketing leaders are evaluating agency/vendor relationships.

Searches across multiple signal categories:
- New CMO/VP Marketing hires (leadership change = budget reallocation)
- Marketing leadership job postings (team building = growth mode)
- Agency search signals (active evaluation)
- Funding rounds (capital to deploy on growth)

Each signal is scored, enriched with industry/size estimates, and paired
with a personalized outreach hook and email draft.

Usage:
    python3 trigger_prospector.py --days 7 --top 15 --min-score 50

Requires: BRAVE_API_KEY environment variable
"""

import argparse
import json
import os
import random
import re
import sys
from datetime import datetime, timedelta
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# ─── Configuration ───────────────────────────────────────────────────────────
BASE_DIR = Path(os.environ.get("BASE_DIR", Path(__file__).resolve().parent))
DATA_DIR = BASE_DIR / "data"
OUTPUT_FILE = DATA_DIR / "trigger-prospects-latest.json"

BRAVE_SEARCH_URL = "https://api.search.brave.com/res/v1/web/search"

# Your company info (for email templates)
YOUR_COMPANY_NAME = os.environ.get("YOUR_COMPANY_NAME", "Your Company")
YOUR_SENDER_NAME = os.environ.get("YOUR_SENDER_NAME", "Your Name")

# ─── Signal Search Queries ───────────────────────────────────────────────────
# Customize these queries for your target market.
# Each category maps to a list of search queries that detect buying signals.
SEARCH_QUERIES = {
    "new_hire": [
        '"hired head of marketing"',
        '"new CMO" announced',
        '"VP marketing joined"',
        '"head of growth" hired',
        '"VP of marketing" appointed',
        '"chief marketing officer" joins',
    ],
    "job_posting": [
        '"head of marketing" job posting site:linkedin.com',
        '"VP of marketing" hiring site:linkedin.com',
        '"CMO" open role site:linkedin.com',
    ],
    "agency_search": [
        '"looking for marketing agency"',
        '"looking for agency" marketing',
        '"seeking marketing partner"',
        '"RFP" "marketing agency"',
    ],
    "funding": [
        '"series A" raised marketing',
        '"series B" raised marketing',
        '"raised" million marketing growth',
        '"funding round" marketing scale',
    ],
}

# ─── Service Keyword Mapping ────────────────────────────────────────────────
# Maps your service offerings to keywords found in signal text.
# Used to suggest which services to pitch to each prospect.
SERVICE_KEYWORDS = {
    "SEO": ["seo", "organic", "search engine", "content marketing", "blog", "rankings"],
    "Paid Media": ["paid", "ppc", "ads", "advertising", "google ads", "facebook ads",
                   "media buy", "paid social", "paid search"],
    "Creative": ["creative", "brand", "design", "video", "content", "storytelling"],
    "CRO": ["conversion", "cro", "optimization", "landing page", "funnel", "a/b test"],
    "AI Marketing": ["ai", "artificial intelligence", "machine learning", "automation",
                     "personalization"],
}


def get_brave_api_key():
    """Get Brave Search API key from environment."""
    key = os.environ.get("BRAVE_API_KEY")
    if not key:
        print("❌ BRAVE_API_KEY not set.", file=sys.stderr)
        print(" Get one at: https://api.search.brave.com/", file=sys.stderr)
        sys.exit(1)
    return key


def brave_search(query: str, api_key: str, freshness: str = "pw", count: int = 10) -> list:
    """Search Brave and return results list."""
    params = urlencode({"q": query, "count": count, "freshness": freshness})
    url = f"{BRAVE_SEARCH_URL}?{params}"
    req = Request(url, headers={
        "Accept": "application/json",
        "Accept-Encoding": "identity",
        "X-Subscription-Token": api_key,
    })
    try:
        with urlopen(req, timeout=15) as resp:
            data = json.loads(resp.read().decode())
            return data.get("web", {}).get("results", [])
    except Exception as e:
        print(f" Warning: Search failed for '{query[:50]}...': {e}", file=sys.stderr)
        return []


def freshness_for_days(days: int) -> str:
    """Map day count to Brave freshness parameter."""
    if days <= 1:
        return "pd"
    elif days <= 7:
        return "pw"
    elif days <= 30:
        return "pm"
    return "py"


def extract_company_name(title: str, description: str) -> str:
    """Best-effort company name extraction from search result text."""
    patterns = [
        r"(?:at|joins?|hired by|appointed at|named .* at)\s+([A-Z][A-Za-z0-9&\.\- ]{1,40}?)"
        r"(?:\s+as|\s*[,\.\-\|]|\s+to\b)",
        r"([A-Z][A-Za-z0-9&\.\- ]{1,40}?)\s+(?:hires?|appoints?|names?|announces?|welcomes?)\b",
        r"([A-Z][A-Za-z0-9&\.\- ]{1,40}?)\s+(?:raises?|secures?|closes?)\s+\$",
        r"([A-Z][A-Za-z0-9&\.\- ]{1,40}?)\s+(?:series [A-C]|funding)",
    ]
    text = f"{title} {description}"
    for pat in patterns:
        m = re.search(pat, text)
        if m:
            name = m.group(1).strip().rstrip(" -,.|")
            if name.lower() not in {"the", "a", "new", "former", "our", "this", "why", "how", "what"}:
                return name
    parts = re.split(r"[\|\-–—:]", title)
    if parts:
        candidate = parts[0].strip()
        if len(candidate) < 50 and candidate[0:1].isupper():
            return candidate
    return title[:60]


def estimate_company_size(text: str) -> str:
    """Estimate company size from context clues in signal text."""
    text_lower = text.lower()
    if any(w in text_lower for w in ["enterprise", "fortune 500", "10,000", "global"]):
        return "1000+"
    if any(w in text_lower for w in ["series c", "series d", "ipo", "public"]):
        return "500-1000"
    if any(w in text_lower for w in ["series b", "growth stage", "scale"]):
        return "200-500"
    # Check the earliest-stage clues before "seed": the substring "seed" also
    # matches "pre-seed", which would otherwise make this branch unreachable for it.
    if any(w in text_lower for w in ["pre-seed", "bootstrapped", "early stage"]):
        return "10-50"
    if any(w in text_lower for w in ["series a", "startup", "seed"]):
        return "50-200"
    return "50-500"


def estimate_industry(text: str) -> str:
    """Estimate industry from signal text."""
    text_lower = text.lower()
    industries = {
        "SaaS": ["saas", "software", "platform", "cloud", "app"],
        "E-commerce": ["ecommerce", "e-commerce", "retail", "shop", "store", "dtc", "d2c"],
        "Fintech": ["fintech", "financial", "banking", "payments", "insurance"],
        "Healthcare": ["health", "medical", "biotech", "pharma", "wellness"],
        "Education": ["edtech", "education", "learning", "course"],
        "AI/ML": ["artificial intelligence", "machine learning", "ai-powered", "ai company"],
        "Crypto/Web3": ["crypto", "blockchain", "web3", "defi", "nft"],
        "Media": ["media", "publishing", "news", "content"],
        "B2B Services": ["b2b", "consulting", "services", "agency"],
    }
    for industry, keywords in industries.items():
        if any(k in text_lower for k in keywords):
            return industry
    return "Technology"


def suggest_services(text: str) -> list:
    """Suggest which of your services to pitch based on signal text."""
    text_lower = text.lower()
    matched = []
    for service, keywords in SERVICE_KEYWORDS.items():
        # Short acronyms ("ai", "ads", "cro") must match as whole words; as plain
        # substrings they also fire on "raised", "leads", or "microsoft".
        # Longer keywords keep substring matching so plurals still count.
        if any(
            re.search(rf"\b{re.escape(k)}\b", text_lower) if len(k) <= 3 else k in text_lower
            for k in keywords
        ):
            matched.append(service)
    if not matched:
        matched = ["SEO", "Paid Media"]  # Sensible defaults
    return matched


def score_prospect(signal_type: str, size_est: str, services: list, text: str) -> int:
    """Score a prospect 0-100 based on signal type, company fit, and context."""
    score = 0

    # Signal type scoring
    signal_scores = {"new_hire": 35, "job_posting": 25, "funding": 30, "agency_search": 40}
    score += signal_scores.get(signal_type, 20)

    # Company size fit (mid-market is ideal for most agencies)
    size_scores = {"10-50": 10, "50-200": 25, "200-500": 25, "500-1000": 15, "1000+": 5}
    score += size_scores.get(size_est, 15)

    # Service alignment
    score += min(len(services) * 5, 20)

    # Bonus signals in text
    text_lower = text.lower()
    if "cmo" in text_lower or "chief marketing" in text_lower:
        score += 10
    if "agency" in text_lower:
        score += 5
    if any(w in text_lower for w in ["review", "evaluate", "looking for", "rfp"]):
        score += 5

    return min(score, 100)


def generate_outreach_hook(company: str, signal_type: str) -> str:
    """Generate a casual outreach hook based on the signal type."""
    hooks = {
        "new_hire": [
            f"New marketing leadership at {company}. The first 90 days is when the best "
            f"leaders figure out what's actually working.",
            f"Congrats on the new hire at {company}. Leadership changes are the best time "
            f"to audit what's driving results and what's noise.",
        ],
        "job_posting": [
            f"Noticed {company} is hiring marketing roles. Usually means growth is the "
            f"priority. We help companies hit targets while the team ramps up.",
            f"{company} is building out the marketing team. We've been the bridge for "
            f"companies in that exact phase.",
        ],
        "funding": [
            f"Congrats on the raise. Post-funding is when the pressure to scale "
            f"acquisition hits. We help turn capital into pipeline efficiently.",
            f"Saw the funding news for {company}. The companies that win post-raise "
            f"scale acquisition without burning through runway.",
        ],
        "agency_search": [
            f"Saw {company} is evaluating marketing partners. Happy to throw our hat in.",
            f"Noticed you're looking for a marketing partner at {company}.",
        ],
    }
    options = hooks.get(signal_type, [f"Noticed some movement at {company}."])
    return random.choice(options)


def generate_email_draft(company, signal_type, services):
    """Generate a trigger-based cold email draft."""
    services_str = ", ".join(services[:3]) if services else "growth marketing"
    cta = random.choice([
        "Worth exploring?", "Curious if relevant?", "Worth a conversation?",
        "Make sense to chat?", "Worth 15 min?",
    ])
    signoff = YOUR_SENDER_NAME

    templates = {
        "new_hire": {
            "subject": f"{company}, new leadership = fresh eyes",
            "body": (f"Hey,\n\nSaw the leadership change at {company}. The first 90 days "
                     f"are when the best marketing leaders audit what's working and cut what's not.\n\n"
                     f"We specialize in {services_str} and figured the timing might be right.\n\n"
                     f"{cta}\n\n{signoff}"),
        },
        "job_posting": {
            "subject": f"{company} is hiring, we can help now",
            "body": (f"Hey,\n\nNoticed {company} is hiring marketing roles. Hiring takes time, "
                     f"but growth targets don't wait.\n\n"
                     f"We've been the bridge for companies in that exact gap, handling "
                     f"{services_str} while the team ramps up.\n\n{cta}\n\n{signoff}"),
        },
        "funding": {
            "subject": f"congrats on the raise, {company}",
            "body": (f"Hey,\n\nSaw the funding news. Congrats. Post-raise is when the pressure "
                     f"to scale acquisition really hits.\n\n"
                     f"We help companies turn funding into efficient pipeline growth, "
                     f"specifically through {services_str}.\n\n{cta}\n\n{signoff}"),
        },
        "agency_search": {
            "subject": f"{company} + {YOUR_COMPANY_NAME}",
            "body": (f"Hey,\n\nSaw you're evaluating marketing partners at {company}.\n\n"
                     f"We specialize in {services_str}. Happy to share a few quick wins "
                     f"we'd go after in the first 30 days. No commitment.\n\n{cta}\n\n{signoff}"),
        },
    }

    t = templates.get(signal_type, templates["agency_search"])
    return f"Subject: {t['subject']}\n\n{t['body']}"


def suggest_channel(signal_type: str) -> str:
    """Suggest the best outreach channel for this signal type."""
    channels = {
        "new_hire": "LinkedIn (congratulate + connect)",
        "agency_search": "Email (direct response)",
        "funding": "LinkedIn + Email (warm congrats)",
        "job_posting": "Email",
    }
    return channels.get(signal_type, "Email")


# ─── Main Pipeline ───────────────────────────────────────────────────────────

def run(days: int = 7, top: int = 15, min_score: int = 50):
    api_key = get_brave_api_key()
    freshness = freshness_for_days(days)

    print(f"🔍 Trigger-Based Prospecting Engine")
    print(f" Scanning last {days} days | Top {top} | Min score: {min_score}")
    print(f" {'-'*50}")

    all_prospects = []
    seen_urls = set()

    for signal_type, queries in SEARCH_QUERIES.items():
        print(f"\n📡 Scanning: {signal_type.replace('_', ' ').title()}")
        for query in queries:
            print(f" → {query[:60]}...")
            results = brave_search(query, api_key, freshness=freshness, count=8)

            for r in results:
                url = r.get("url", "")
                if url in seen_urls:
                    continue
                seen_urls.add(url)

                title = r.get("title", "")
                desc = r.get("description", "")
                full_text = f"{title} {desc}"

                company = extract_company_name(title, desc)
                size_est = estimate_company_size(full_text)
                industry = estimate_industry(full_text)
                services = suggest_services(full_text)
                score = score_prospect(signal_type, size_est, services, full_text)

                if score < min_score:
                    continue

                prospect = {
                    "company": company,
                    "signal_type": signal_type,
                    "signal_detail": title,
                    "signal_url": url,
                    "signal_date": datetime.now().strftime("%Y-%m-%d"),
                    "prospect_score": score,
                    "industry": industry,
                    "est_company_size": size_est,
                    "suggested_services": services,
                    "suggested_channel": suggest_channel(signal_type),
                    "outreach_hook": generate_outreach_hook(company, signal_type),
                    "email_draft": generate_email_draft(company, signal_type, services),
                }
                all_prospects.append(prospect)

    # Deduplicate by company (keep highest score)
    company_best = {}
    for p in all_prospects:
        key = p["company"].lower().strip()
        if key not in company_best or p["prospect_score"] > company_best[key]["prospect_score"]:
            company_best[key] = p

    prospects = sorted(company_best.values(),
                       key=lambda x: x["prospect_score"], reverse=True)[:top]

    # Save output
    DATA_DIR.mkdir(parents=True, exist_ok=True)
    output = {
        "generated_at": datetime.now().isoformat(),
        "params": {"days": days, "top": top, "min_score": min_score},
        "total_signals_found": len(all_prospects),
        "prospects": prospects,
    }
    OUTPUT_FILE.write_text(json.dumps(output, indent=2))

    # Print summary
    print(f"\n{'='*60}")
    print(f"🎯 TOP {len(prospects)} PROSPECTS (of {len(all_prospects)} signals found)")
    print(f"{'='*60}\n")

    for i, p in enumerate(prospects, 1):
        print(f" {i:2d}. [{p['prospect_score']:3d}] {p['company']}")
        print(f" Signal: {p['signal_type']} — {p['signal_detail'][:70]}")
        print(f" Size: {p['est_company_size']} | Industry: {p['industry']}")
        print(f" Services: {', '.join(p['suggested_services'])}")
        print(f" Channel: {p['suggested_channel']}")
        print()

    print(f"📁 Saved to: {OUTPUT_FILE}")
    return prospects


if __name__ == "__main__":
|
||||||
|
parser = argparse.ArgumentParser(description="Trigger-Based Prospecting Engine")
|
||||||
|
parser.add_argument("--days", type=int, default=7, help="Lookback window in days (default: 7)")
|
||||||
|
parser.add_argument("--top", type=int, default=15, help="Number of top prospects (default: 15)")
|
||||||
|
parser.add_argument("--min-score", type=int, default=50, help="Minimum prospect score (default: 50)")
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
run(days=args.days, top=args.top, min_score=args.min_score)
|
||||||
62
seo-ops/.env.example
Normal file
|
|
@ -0,0 +1,62 @@
|
||||||
|
# ─────────────────────────────────────────────
|
||||||
|
# AI SEO Ops — Environment Configuration
|
||||||
|
# ─────────────────────────────────────────────
|
||||||
|
# Copy this file to .env and fill in your values.
|
||||||
|
# All scripts read from these environment variables.
|
||||||
|
|
||||||
|
# ─── Required ───────────────────────────────
|
||||||
|
|
||||||
|
# Your website domain (used for Ahrefs organic keyword lookup)
|
||||||
|
YOUR_DOMAIN=example.com
|
||||||
|
|
||||||
|
# Google Search Console site URL (run gsc_auth.py to see your verified sites)
|
||||||
|
# Format: "https://www.example.com/" or "sc-domain:example.com"
|
||||||
|
GSC_SITE_URL=https://www.example.com/
|
||||||
|
|
||||||
|
# Google OAuth credentials (for GSC API access)
|
||||||
|
# Create at: https://console.cloud.google.com/apis/credentials
|
||||||
|
# Choose "OAuth 2.0 Client ID" → "Desktop application"
|
||||||
|
GOOGLE_CLIENT_ID=your-client-id.apps.googleusercontent.com
|
||||||
|
GOOGLE_CLIENT_SECRET=your-client-secret
|
||||||
|
|
||||||
|
# ─── Optional ───────────────────────────────
|
||||||
|
|
||||||
|
# Ahrefs API token (enables keyword data + competitor analysis)
|
||||||
|
# Get from: https://ahrefs.com/api
|
||||||
|
AHREFS_TOKEN=
|
||||||
|
|
||||||
|
# Competitor domains to analyze (comma-separated)
|
||||||
|
COMPETITORS=competitor1.com,competitor2.com,competitor3.com
|
||||||
|
|
||||||
|
# Brave Search API key (enables X/Twitter trend scanning in trend_scout.py)
|
||||||
|
# Get from: https://brave.com/search/api/
|
||||||
|
BRAVE_API_KEY=
|
||||||
|
|
||||||
|
# ─── Content Configuration ──────────────────
|
||||||
|
|
||||||
|
# Directory containing your content files (markdown, JSON atoms)
|
||||||
|
# Used by content_attack_brief.py for topic fingerprinting
|
||||||
|
CONTENT_DIR=./content
|
||||||
|
|
||||||
|
# Output directory for generated reports and JSON
|
||||||
|
OUTPUT_DIR=./output
|
||||||
|
|
||||||
|
# Content verticals for trend relevance scoring (comma-separated)
|
||||||
|
CONTENT_VERTICALS=AI marketing automation,SEO trends,content marketing AI,startup growth strategy
|
||||||
|
|
||||||
|
# Reddit subreddits to monitor for trends (comma-separated)
|
||||||
|
TREND_SUBREDDITS=marketing,SEO,startups,entrepreneur,digitalmarketing
|
||||||
|
|
||||||
|
# ─── Advanced ───────────────────────────────
|
||||||
|
|
||||||
|
# Path to GSC OAuth token file (auto-created by gsc_auth.py)
|
||||||
|
# GSC_TOKEN_FILE=.gsc-token.json
|
||||||
|
|
||||||
|
# Path to Google credentials JSON file (alternative to GOOGLE_CLIENT_ID/SECRET)
|
||||||
|
# GOOGLE_CREDENTIALS_FILE=/path/to/credentials.json
|
||||||
|
|
||||||
|
# Custom topic keywords for content fingerprinting (JSON format)
|
||||||
|
# TOPIC_KEYWORDS_JSON='{"AI agents": ["ai agent", "autonomous agent"], "SEO": ["seo", "keyword", "ranking"]}'
|
||||||
|
|
||||||
|
# Custom seed keywords per topic for Ahrefs research (JSON format)
|
||||||
|
# TOPIC_TO_SEEDS_JSON='{"AI agents": ["ai agents for marketing", "ai agent platform"]}'
|
||||||
203
seo-ops/README.md
Normal file
|
|
@ -0,0 +1,203 @@
|
||||||
|
# AI SEO Ops
|
||||||
|
|
||||||
|
**Find the keywords your competitors missed. Automatically.**
|
||||||
|
|
||||||
|
An AI-powered SEO operations suite that replaces the manual grind of keyword research, competitor analysis, content briefing, and trend detection. These tools pull data from Google Search Console, Ahrefs, and the open web to surface the exact opportunities your team should act on — ranked by impact, scored by confidence, and ready to execute.
|
||||||
|
|
||||||
|
## What's Inside
|
||||||
|
|
||||||
|
### 🎯 Content Attack Brief Generator
|
||||||
|
Synthesizes your content footprint, Ahrefs keyword data, GSC performance, and competitor gaps into a weekly prioritized keyword brief. Scores every keyword on Impact × Confidence and assigns an execution path (fully automated → team-assisted → expert-only).
|
||||||
|
|
||||||
|
**What it finds:**
|
||||||
|
- BOFU money keywords your competitors rank for but you don't
|
||||||
|
- Trending keywords surging before the competition notices
|
||||||
|
- Decaying pages losing traffic that need a refresh
|
||||||
|
- Outside-the-box content angles where you have unique authority
|
||||||
|
|
||||||
|
### 📊 GSC Keyword Optimizer
|
||||||
|
Pulls Google Search Console data and identifies "striking distance" keywords — queries where you rank positions 4–20 with decent impressions. These are your quick wins: small optimizations that can push you onto page one.
|
||||||
|
|
||||||
|
**What it does:**
|
||||||
|
- Finds keywords on the cusp of page one
|
||||||
|
- Calculates potential traffic gains from position improvements
|
||||||
|
- Identifies CTR underperformers (you rank well but nobody clicks)
|
||||||
|
- Groups related keywords for efficient optimization
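The striking-distance filter described above can be sketched in a few lines. This is an illustration, not the code in `gsc_client.py`: the function name `filter_striking_distance` and the `min_impressions` threshold are hypothetical (the README only says "decent impressions"), and the rows are made-up samples shaped like Search Console rows (`keys`, `position`, `impressions`).

```python
def filter_striking_distance(rows, min_position=4, max_position=20, min_impressions=100):
    """Keep queries ranking between min_position and max_position with enough impressions."""
    hits = [
        row for row in rows
        if min_position <= row["position"] <= max_position
        and row["impressions"] >= min_impressions
    ]
    # Largest impression counts first: these have the most traffic to gain.
    return sorted(hits, key=lambda row: row["impressions"], reverse=True)


# Made-up sample rows for illustration only.
sample_rows = [
    {"keys": ["ai seo tools"], "position": 7.2, "impressions": 1800},
    {"keys": ["seo agency"], "position": 2.1, "impressions": 5000},       # already top 3
    {"keys": ["aeo optimization"], "position": 14.6, "impressions": 420},
    {"keys": ["seo for ai"], "position": 18.0, "impressions": 35},        # too few impressions
]

for row in filter_striking_distance(sample_rows):
    print(f"{row['keys'][0]}: pos {row['position']:.1f}, {row['impressions']} impressions")
```

Sorting by impressions is one reasonable way to rank quick wins; the actual tool may weigh position and CTR differently.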
|
||||||
|
|
||||||
|
### 🔑 GSC Client & Auth
|
||||||
|
A reusable Google Search Console API client with OAuth flow. Use it standalone or import it as a library in your own scripts.
|
||||||
|
|
||||||
|
**Features:**
|
||||||
|
- Top queries, pages, device splits, country splits
|
||||||
|
- Striking distance finder (positions 4–20)
|
||||||
|
- Query + page matrix for cannibalization analysis
|
||||||
|
- Daily trend tracking
|
||||||
|
- CLI and library modes
|
||||||
|
|
||||||
|
### 🔥 Trend Scout
|
||||||
|
Scans Google Trends, Hacker News, Reddit, and X/Twitter to find trending topics in your niche before they peak. Scores each trend for relevance to your content verticals and suggests content angles.
|
||||||
|
|
||||||
|
**Sources monitored:**
|
||||||
|
- Google Trends RSS (US)
|
||||||
|
- Hacker News (filtered for your niche)
|
||||||
|
- Reddit (configurable subreddits)
|
||||||
|
- X/Twitter (via Brave Search)
|
||||||
|
- YouTube competitor outlier detection
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
### 1. Install dependencies
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pip install -r requirements.txt
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2. Configure environment
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cp .env.example .env
|
||||||
|
# Edit .env with your API keys and site configuration
|
||||||
|
```
|
||||||
|
|
||||||
|
### 3. Authenticate with Google Search Console
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python gsc_auth.py
|
||||||
|
```
|
||||||
|
|
||||||
|
This opens a browser for OAuth consent. Your token is saved locally (to `.gsc-token.json` by default, or to the path set in `GSC_TOKEN_FILE`) and reused on later runs.
|
||||||
|
|
||||||
|
### 4. Run the tools
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Content Attack Brief — full keyword intelligence report
|
||||||
|
python content_attack_brief.py
|
||||||
|
|
||||||
|
# GSC Keyword Optimizer — find striking distance keywords
|
||||||
|
python gsc_client.py --striking --days 28
|
||||||
|
|
||||||
|
# GSC top queries
|
||||||
|
python gsc_client.py --queries 50 --days 28
|
||||||
|
|
||||||
|
# Trend Scout — find what's trending in your niche
|
||||||
|
python trend_scout.py
|
||||||
|
```
|
||||||
|
|
||||||
|
## Architecture
|
||||||
|
|
||||||
|
```
|
||||||
|
┌──────────────────────────────────────────────────┐
│               Content Attack Brief               │
│ ┌───────────┐ ┌──────────┐  ┌──────────────────┐ │
│ │  Content  │ │  Ahrefs  │  │  Competitor Gap  │ │
│ │Fingerprint│ │ Keywords │  │     Analysis     │ │
│ └─────┬─────┘ └────┬─────┘  └────────┬─────────┘ │
│       └────────────┼─────────────────┘           │
│                    ▼                             │
│ ┌──────────────────────────────────────────────┐ │
│ │         Impact × Confidence Scoring          │ │
│ │   Volume · KD · CPC · Trend · Funnel Stage   │ │
│ └──────────────────────────────────────────────┘ │
│                    │                             │
│ ┌──────────┐ ┌─────┴─────┐  ┌──────────────────┐ │
│ │   GSC    │ │ Execution │  │   Trend Scout    │ │
│ │  Client  │ │ Pipeline  │  │  (Google Trends  │ │
│ │          │ │           │  │   HN · Reddit)   │ │
│ └──────────┘ └───────────┘  └──────────────────┘ │
└──────────────────────────────────────────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
## Scoring Algorithm
|
||||||
|
|
||||||
|
Every keyword gets an **Impact** score (0–10) and a **Confidence** score (0–10). Priority = Impact × Confidence, so it ranges from 0 to 100.
|
||||||
|
|
||||||
|
### Impact factors:
|
||||||
|
| Signal | Score |
|
||||||
|
|--------|-------|
|
||||||
|
| Volume ≥ 10K | +3 |
|
||||||
|
| Volume ≥ 2K | +2 |
|
||||||
|
| Volume ≥ 500 | +1 |
|
||||||
|
| CPC ≥ $15 | +3 |
|
||||||
|
| CPC ≥ $5 | +2 |
|
||||||
|
| CPC ≥ $1 | +1 |
|
||||||
|
| BOFU intent | +2 |
|
||||||
|
| MOFU intent | +1 |
|
||||||
|
| Trend surging (>50%) | +2 |
|
||||||
|
| Trend rising (>20%) | +1 |
|
||||||
|
|
||||||
|
### Confidence factors:
|
||||||
|
| Signal | Score |
|
||||||
|
|--------|-------|
|
||||||
|
| KD ≤ 10 | +4 |
|
||||||
|
| KD ≤ 20 | +3 |
|
||||||
|
| KD ≤ 35 | +2 |
|
||||||
|
| KD ≤ 50 | +1 |
|
||||||
|
| Already ranking top 10 | +3 |
|
||||||
|
| Ranking 11–30 | +2 |
|
||||||
|
| Ranking 31–50 | +1 |
|
||||||
|
| Topic authority match | +2 |
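
The two tables can be read as a small additive scorer. The sketch below is one interpretation, not the code in `content_attack_brief.py`. It assumes that only the highest matching tier counts within each signal (volume, CPC, KD, ranking position), that `trend_pct` is a percentage change, and that each score is capped at 10. All function and parameter names are hypothetical.

```python
def impact_score(volume, cpc, funnel, trend_pct):
    """Add up the Impact signals, taking only the highest matching tier of each."""
    score = 0
    # Monthly search volume
    if volume >= 10_000:
        score += 3
    elif volume >= 2_000:
        score += 2
    elif volume >= 500:
        score += 1
    # Cost per click in USD
    if cpc >= 15:
        score += 3
    elif cpc >= 5:
        score += 2
    elif cpc >= 1:
        score += 1
    # Funnel stage: BOFU +2, MOFU +1, anything else +0
    score += {"BOFU": 2, "MOFU": 1}.get(funnel, 0)
    # Trend direction
    if trend_pct > 50:
        score += 2
    elif trend_pct > 20:
        score += 1
    return min(score, 10)


def confidence_score(kd, position=None, topic_match=False):
    """Add up the Confidence signals; position is None when the site does not rank."""
    score = 0
    # Keyword difficulty: lower is easier
    if kd <= 10:
        score += 4
    elif kd <= 20:
        score += 3
    elif kd <= 35:
        score += 2
    elif kd <= 50:
        score += 1
    # Current ranking position
    if position is not None:
        if position <= 10:
            score += 3
        elif position <= 30:
            score += 2
        elif position <= 50:
            score += 1
    # Topic authority match
    if topic_match:
        score += 2
    return min(score, 10)


impact = impact_score(volume=2_400, cpc=6.50, funnel="BOFU", trend_pct=25)
confidence = confidence_score(kd=18, position=14, topic_match=True)
print(f"impact={impact} confidence={confidence} priority={impact * confidence}")
```

For this made-up keyword, Impact is 2 + 2 + 2 + 1 = 7 and Confidence is 3 + 2 + 2 = 7, giving a priority of 49.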
|
||||||
|
|
||||||
|
## Funnel Classification
|
||||||
|
|
||||||
|
Keywords are auto-classified into funnel stages:
|
||||||
|
|
||||||
|
- **BOFU** (Bottom of Funnel): "agency", "services", "pricing", "best", "vs", "alternative", "hire"
|
||||||
|
- **MOFU** (Middle of Funnel): "how to", "guide", "strategy", "case study", "roi", "tutorial"
|
||||||
|
- **TOFU** (Top of Funnel): Everything else
|
||||||
|
|
||||||
|
Commercial/transactional intent from Ahrefs automatically promotes to BOFU.
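A minimal sketch of the classification rules above, assuming the term lists are matched on word boundaries (so "vs" does not fire inside "canvas") and that BOFU is checked before MOFU. The actual matching logic in `content_attack_brief.py` may differ. The function name `classify_funnel` is hypothetical; the `is_commercial` and `is_transactional` flags mirror the Ahrefs intent fields.

```python
import re

BOFU_TERMS = ["agency", "services", "pricing", "best", "vs", "alternative", "hire"]
MOFU_TERMS = ["how to", "guide", "strategy", "case study", "roi", "tutorial"]


def _contains_term(keyword, term):
    """True when term appears in keyword as a whole word or phrase."""
    return re.search(rf"\b{re.escape(term)}\b", keyword) is not None


def classify_funnel(keyword, is_commercial=False, is_transactional=False):
    """Return 'BOFU', 'MOFU', or 'TOFU' for a keyword."""
    kw = keyword.lower()
    # Commercial or transactional intent promotes straight to BOFU.
    if is_commercial or is_transactional:
        return "BOFU"
    if any(_contains_term(kw, term) for term in BOFU_TERMS):
        return "BOFU"
    if any(_contains_term(kw, term) for term in MOFU_TERMS):
        return "MOFU"
    return "TOFU"


for kw in ["seo agency pricing", "how to build ai agents", "what is aeo"]:
    print(f"{kw}: {classify_funnel(kw)}")
```

Whole-word matching is stricter than substring matching: plural forms such as "alternatives" would not match "alternative" in this sketch, so a production version would likely need stemming or extra terms.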
|
||||||
|
|
||||||
|
## Trend Detection
|
||||||
|
|
||||||
|
The Trend Scout scores each trending topic against your configured content verticals:
|
||||||
|
|
||||||
|
- **High relevance (25pts each):** Exact matches to your core topics
|
||||||
|
- **Medium relevance (10pts each):** Related industry terms
|
||||||
|
- **Low relevance (5pts each):** Tangential business terms
|
||||||
|
|
||||||
|
Trends scoring ≥20 get surfaced with content angle suggestions and recommended platforms.
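The point scheme above can be illustrated as follows. This is a sketch under assumptions: it treats "each" as "each matching term", uses simple substring matching, and the three term lists are hypothetical examples rather than the lists `trend_scout.py` derives from `CONTENT_VERTICALS`.

```python
def relevance_score(title, high_terms, medium_terms, low_terms):
    """Score a trend title: 25 per high match, 10 per medium match, 5 per low match."""
    text = title.lower()
    score = 25 * sum(term in text for term in high_terms)
    score += 10 * sum(term in text for term in medium_terms)
    score += 5 * sum(term in text for term in low_terms)
    return score


# Hypothetical term tiers for illustration only.
HIGH = ["ai marketing automation", "seo trends"]
MEDIUM = ["marketing", "seo", "content"]
LOW = ["startup", "growth", "revenue"]

trends = [
    "New SEO trends for startups",
    "Startup raises seed round",
    "Content marketing growth playbook",
]

# Only trends scoring at least 20 are surfaced.
for title in trends:
    score = relevance_score(title, HIGH, MEDIUM, LOW)
    status = "surfaced" if score >= 20 else "skipped"
    print(f"{score:3d}  {status}  {title}")
```

With the threshold at 20, a single high-relevance match or two medium-relevance matches is enough to surface a trend, while low-relevance terms alone need four matches.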
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
|
||||||
|
All configuration is via environment variables (see `.env.example`):
|
||||||
|
|
||||||
|
| Variable | Required | Description |
|
||||||
|
|----------|----------|-------------|
|
||||||
|
| `GSC_SITE_URL` | Yes | Your GSC property (e.g., `https://www.example.com/`) |
|
||||||
|
| `GSC_TOKEN_FILE` | No | Path to GSC OAuth token (default: `.gsc-token.json`) |
|
||||||
|
| `GOOGLE_CLIENT_ID` | Yes | Google OAuth client ID |
|
||||||
|
| `GOOGLE_CLIENT_SECRET` | Yes | Google OAuth client secret |
|
||||||
|
| `AHREFS_TOKEN` | No | Ahrefs API token (enables keyword data + competitor analysis) |
|
||||||
|
| `YOUR_DOMAIN` | Yes | Your root domain for organic keyword tracking |
|
||||||
|
| `COMPETITORS` | No | Comma-separated competitor domains |
|
||||||
|
| `CONTENT_VERTICALS` | No | Comma-separated topic verticals for trend scoring |
|
||||||
|
| `TREND_SUBREDDITS` | No | Comma-separated subreddit names to monitor |
|
||||||
|
| `BRAVE_API_KEY` | No | Brave Search API key (enables X/Twitter trend scanning) |
|
||||||
|
| `OUTPUT_DIR` | No | Where to save output files (default: `./output`) |
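
As a rough illustration of how these variables could be consumed, the sketch below reads them into a dictionary and fails fast when a required one is missing. The helper names `load_config` and `_split_csv` are hypothetical. The scripts in this directory read `os.environ` directly and fall back to defaults rather than raising, so treat this as a stricter variant, not the shipped behavior.

```python
import os
from pathlib import Path

REQUIRED = ("GSC_SITE_URL", "GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET", "YOUR_DOMAIN")


def _split_csv(value):
    """Split a comma-separated string, dropping blanks and surrounding whitespace."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_config(env=os.environ):
    """Build a config dict from environment-style variables."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required variables: {', '.join(missing)}")
    return {
        "gsc_site_url": env["GSC_SITE_URL"],
        "domain": env["YOUR_DOMAIN"],
        "ahrefs_token": env.get("AHREFS_TOKEN", ""),
        "competitors": _split_csv(env.get("COMPETITORS", "")),
        "content_verticals": _split_csv(env.get("CONTENT_VERTICALS", "")),
        "trend_subreddits": _split_csv(env.get("TREND_SUBREDDITS", "")),
        "gsc_token_file": Path(env.get("GSC_TOKEN_FILE", ".gsc-token.json")),
        "output_dir": Path(env.get("OUTPUT_DIR", "./output")),
    }


# Example with an explicit mapping instead of the real environment.
demo_env = {
    "GSC_SITE_URL": "https://www.example.com/",
    "GOOGLE_CLIENT_ID": "your-client-id.apps.googleusercontent.com",
    "GOOGLE_CLIENT_SECRET": "your-client-secret",
    "YOUR_DOMAIN": "example.com",
    "COMPETITORS": "competitor1.com, competitor2.com,",
}
config = load_config(demo_env)
print(config["competitors"], config["output_dir"])
```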
|
||||||
|
|
||||||
|
## Using as a Claude Code Skill
|
||||||
|
|
||||||
|
Add this directory to your `.claude/agents/` directory and use its `SKILL.md` for Claude Code integration. The skill enables Claude to:
|
||||||
|
|
||||||
|
1. Run keyword analysis on demand
|
||||||
|
2. Generate weekly content attack briefs
|
||||||
|
3. Find and prioritize quick-win keywords from GSC
|
||||||
|
4. Monitor trending topics and suggest content
|
||||||
|
|
||||||
|
## File Structure
|
||||||
|
|
||||||
|
```
|
||||||
|
seo-ops/
|
||||||
|
├── README.md # This file
|
||||||
|
├── SKILL.md # Claude Code agent skill definition
|
||||||
|
├── content_attack_brief.py # Full keyword intelligence pipeline
|
||||||
|
├── gsc_client.py # GSC API client (library + CLI)
|
||||||
|
├── gsc_auth.py # GSC OAuth setup flow
|
||||||
|
├── trend_scout.py # Multi-source trend detection
|
||||||
|
├── requirements.txt # Python dependencies
|
||||||
|
└── .env.example # Environment variable template
|
||||||
|
```
|
||||||
|
|
||||||
|
## License
|
||||||
|
|
||||||
|
MIT
|
||||||
123
seo-ops/SKILL.md
Normal file
|
|
@ -0,0 +1,123 @@
|
||||||
|
# AI SEO Ops
|
||||||
|
|
||||||
|
AI-powered SEO operations: keyword intelligence, competitor gap analysis, GSC optimization, and trend detection.
|
||||||
|
|
||||||
|
## When to Use
|
||||||
|
|
||||||
|
- User asks for keyword research, content brief, or SEO analysis
|
||||||
|
- User wants to find quick-win keywords from Google Search Console
|
||||||
|
- User needs a competitor gap analysis
|
||||||
|
- User wants to identify trending topics for content creation
|
||||||
|
- User asks about decaying content or traffic drops
|
||||||
|
- User wants a prioritized list of keywords to target
|
||||||
|
|
||||||
|
## Tools
|
||||||
|
|
||||||
|
### Content Attack Brief (`content_attack_brief.py`)
|
||||||
|
|
||||||
|
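Full keyword intelligence pipeline. Needs `AHREFS_TOKEN` for keyword and competitor data (Ahrefs lookups are skipped with a warning when it is unset) and GSC auth for Search Console data.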
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Run the full brief
|
||||||
|
python content_attack_brief.py
|
||||||
|
```
|
||||||
|
|
||||||
|
**What it produces:**
|
||||||
|
- Topic fingerprint from your content library
|
||||||
|
- BOFU money keywords ranked by Impact × Confidence
|
||||||
|
- Trending keywords with sparkline visualizations
|
||||||
|
- Competitor gap analysis (keywords they rank for, you don't)
|
||||||
|
- Decaying page alerts (traffic drops >30%)
|
||||||
|
- Execution pipeline (auto-create → semi-auto → team)
|
||||||
|
|
||||||
|
**Output:** Prints formatted report to stdout + saves JSON to `OUTPUT_DIR/content-attack-brief-latest.json`
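The decaying-page alert listed above (traffic drops >30%) can be pictured as a period-over-period comparison. The sketch below is an assumption about how such a check might look, not the script's implementation: it compares click totals for two equal-length periods, and the function name `find_decaying_pages` and the sample numbers are hypothetical.

```python
def find_decaying_pages(previous_clicks, current_clicks, threshold=0.30):
    """Return (url, drop) pairs where clicks fell by more than threshold, worst first."""
    decaying = []
    for url, before in previous_clicks.items():
        if before <= 0:
            continue  # no baseline traffic, so no meaningful drop
        after = current_clicks.get(url, 0)
        drop = (before - after) / before
        if drop > threshold:
            decaying.append((url, round(drop, 2)))
    return sorted(decaying, key=lambda item: item[1], reverse=True)


# Made-up click totals for two consecutive periods.
previous = {"/blog/ai-agents": 100, "/blog/seo-roi": 200, "/blog/old-post": 50}
current = {"/blog/ai-agents": 60, "/blog/seo-roi": 190}

for url, drop in find_decaying_pages(previous, current):
    print(f"{url}: down {drop:.0%}")
```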
|
||||||
|
|
||||||
|
### GSC Client (`gsc_client.py`)
|
||||||
|
|
||||||
|
Google Search Console API client. Works as CLI or importable library.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# CLI usage
|
||||||
|
python gsc_client.py --queries 50 --days 28
|
||||||
|
python gsc_client.py --striking # Striking distance keywords (pos 4-20)
|
||||||
|
python gsc_client.py --pages 100 --days 7
|
||||||
|
python gsc_client.py --trend # Daily click/impression trend
|
||||||
|
python gsc_client.py --devices # Mobile vs desktop split
|
||||||
|
python gsc_client.py --sites # List verified properties
|
||||||
|
python gsc_client.py --json --queries 25 # JSON output
|
||||||
|
```
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Library usage
|
||||||
|
from gsc_client import GSCClient
|
||||||
|
|
||||||
|
gsc = GSCClient()
|
||||||
|
rows = gsc.striking_distance(days=28, min_position=4, max_position=20)
|
||||||
|
for row in rows:
    print(f"{row['keys'][0]}: pos {row['position']:.1f}, {row['impressions']} impressions")
|
||||||
|
```
|
||||||
|
|
||||||
|
### GSC Auth (`gsc_auth.py`)
|
||||||
|
|
||||||
|
One-time OAuth setup for Google Search Console access.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python gsc_auth.py
|
||||||
|
# Opens browser → Google Sign-In → saves token locally
|
||||||
|
```
|
||||||
|
|
||||||
|
### Trend Scout (`trend_scout.py`)
|
||||||
|
|
||||||
|
Multi-source trend detection. No API keys required for basic functionality.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python trend_scout.py
|
||||||
|
```
|
||||||
|
|
||||||
|
**Sources:** Google Trends RSS, Hacker News, Reddit, X/Twitter (needs `BRAVE_API_KEY`), YouTube outlier detection
|
||||||
|
|
||||||
|
**Output:** Prints a summary and saves JSON to `OUTPUT_DIR/flash-trends-latest.json`, plus a markdown report.
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
|
||||||
|
All scripts read from environment variables. Copy `.env.example` to `.env` and fill in your values.
|
||||||
|
|
||||||
|
Required:
|
||||||
|
- `GSC_SITE_URL` — your Google Search Console property URL
|
||||||
|
- `GOOGLE_CLIENT_ID` / `GOOGLE_CLIENT_SECRET` — for GSC OAuth
|
||||||
|
- `YOUR_DOMAIN` — your root domain
|
||||||
|
|
||||||
|
Optional:
|
||||||
|
- `AHREFS_TOKEN` — enables Ahrefs keyword data and competitor analysis
|
||||||
|
- `COMPETITORS` — comma-separated competitor domains
|
||||||
|
- `BRAVE_API_KEY` — enables X/Twitter trend scanning
|
||||||
|
- `CONTENT_VERTICALS` — comma-separated topics for trend relevance scoring
|
||||||
|
- `TREND_SUBREDDITS` — comma-separated subreddits to monitor
|
||||||
|
|
||||||
|
## Scoring Model
|
||||||
|
|
||||||
|
Keywords are scored on two axes:
|
||||||
|
|
||||||
|
**Impact (0-10):** Volume + CPC + Funnel Stage + Trend direction
|
||||||
|
**Confidence (0-10):** Keyword Difficulty + Current ranking position + Topic authority
|
||||||
|
|
||||||
|
**Priority = Impact × Confidence** (max 100)
|
||||||
|
|
||||||
|
## Funnel Classification
|
||||||
|
|
||||||
|
- **BOFU:** Commercial/transactional intent, or keywords containing "agency", "services", "pricing", "best", "vs", "hire"
|
||||||
|
- **MOFU:** Informational with buying signals — "how to", "guide", "roi", "case study"
|
||||||
|
- **TOFU:** Pure informational
|
||||||
|
|
||||||
|
## Recommended Workflow
|
||||||
|
|
||||||
|
1. **Weekly:** Run `content_attack_brief.py` for the full intelligence report
|
||||||
|
2. **Daily:** Run `gsc_client.py --striking` to monitor striking distance keywords
|
||||||
|
3. **2x/week:** Run `trend_scout.py` to catch trending topics early
|
||||||
|
4. **Monthly:** Review competitor gaps and adjust `COMPETITORS` list
|
||||||
|
|
||||||
|
## Dependencies
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pip install -r requirements.txt
|
||||||
|
```
|
||||||
951
seo-ops/content_attack_brief.py
Normal file
|
|
@ -0,0 +1,951 @@
|
||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Content Attack Brief Generator
|
||||||
|
|
||||||
|
Synthesizes your content library, Ahrefs keyword data, GSC performance,
|
||||||
|
and competitor gaps into a weekly prioritized keyword brief.
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
# Set environment variables (see .env.example)
|
||||||
|
python content_attack_brief.py
|
||||||
|
|
||||||
|
# Or export inline
|
||||||
|
AHREFS_TOKEN="..." YOUR_DOMAIN="example.com" python content_attack_brief.py
|
||||||
|
"""
|
||||||
|
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
import re
|
||||||
|
import glob
|
||||||
|
import importlib.util
|
||||||
|
import math
|
||||||
|
import requests
|
||||||
|
from datetime import datetime, timedelta, date
|
||||||
|
from collections import Counter, defaultdict
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────
|
||||||
|
# Config (all from environment variables)
|
||||||
|
# ─────────────────────────────────────────────
|
||||||
|
OUTPUT_DIR = Path(os.environ.get("OUTPUT_DIR", "./output"))
|
||||||
|
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
CONTENT_DIR = Path(os.environ.get("CONTENT_DIR", "./content"))
|
||||||
|
# Directory containing your content files (markdown, JSON atoms, etc.)
|
||||||
|
|
||||||
|
AHREFS_TOKEN = os.environ.get("AHREFS_TOKEN", "")
|
||||||
|
AHREFS_BASE = "https://api.ahrefs.com/v3"
|
||||||
|
def AHREFS_HEADERS():
    """Build the Ahrefs auth header from the configured token."""
    return {"Authorization": f"Bearer {AHREFS_TOKEN}"}
|
||||||
|
|
||||||
|
YOUR_DOMAIN = os.environ.get("YOUR_DOMAIN", "example.com")
|
||||||
|
COMPETITORS = [c.strip() for c in os.environ.get("COMPETITORS", "").split(",") if c.strip()]
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────
|
||||||
|
# 1. CONTENT FINGERPRINT
|
||||||
|
# ─────────────────────────────────────────────
|
||||||
|
|
||||||
|
STOPWORDS = {
|
||||||
|
"the","a","an","is","are","was","were","be","been","being","have","has","had",
|
||||||
|
"do","does","did","will","would","could","should","may","might","shall",
|
||||||
|
"and","but","or","nor","for","yet","so","at","by","for","in","of","on","to",
|
||||||
|
"with","as","that","this","these","those","it","its","i","we","you","they",
|
||||||
|
"he","she","him","her","our","their","your","my","what","which","who","when",
|
||||||
|
"where","how","not","all","also","more","very","just","from","about","into",
|
||||||
|
"than","then","there","so","up","out","if","no","can","one","time","like",
|
||||||
|
"get","got","use","used","make","made","work","well","way","new","good",
|
||||||
|
"go","going","know","think","want","need","see","look","come","give",
|
||||||
|
"take","say","even","most","much","such","here","now","over","any","some",
|
||||||
|
"them","us","first","two","other","his","her","its",
|
||||||
|
}
|
||||||
|
|
||||||
|
# Customize these topic keywords to match your content verticals
|
||||||
|
TOPIC_KEYWORDS = {
|
||||||
|
"AI agents": ["ai agent", "ai agents", "agent fleet", "autonomous agent", "llm agent", "multi-agent"],
|
||||||
|
"Claude/OpenAI": ["claude", "openai", "gpt", "anthropic", "gemini", "chatgpt"],
|
||||||
|
"SEO/AEO": ["seo", "aeo", "search engine", "organic", "keyword", "serp", "ranking", "backlink", "gsc"],
|
||||||
|
"Content marketing": ["content", "blog", "article", "post", "write", "writing", "publishing"],
|
||||||
|
"Marketing agency": ["agency", "client", "service", "campaign", "marketing"],
|
||||||
|
"Lead generation": ["lead gen", "leads", "pipeline", "outbound", "inbound", "funnel", "prospect"],
|
||||||
|
"AI automation": ["automation", "automate", "automated", "workflow", "script", "cron", "pipeline"],
|
||||||
|
"Revenue/ROI": ["revenue", "roi", "growth", "profit", "income", "mrr", "arr", "monetize"],
|
||||||
|
"Sales": ["sales", "deal", "close", "outreach", "cold email", "crm"],
|
||||||
|
"Social media": ["instagram", "tiktok", "youtube", "twitter", "linkedin", "social media", "viral"],
|
||||||
|
"B2B SaaS": ["saas", "b2b", "software", "product", "platform"],
|
||||||
|
"Strategy": ["strategy", "strategic", "plan", "roadmap", "framework", "playbook"],
|
||||||
|
"Analytics/Data": ["analytics", "data", "metrics", "kpi", "ga4", "mixpanel", "dashboard"],
|
||||||
|
}
|
||||||
|
|
||||||
|
# Override with environment variable if set (JSON format)
|
||||||
|
_custom_topics = os.environ.get("TOPIC_KEYWORDS_JSON")
|
||||||
|
if _custom_topics:
    try:
        TOPIC_KEYWORDS = json.loads(_custom_topics)
    except json.JSONDecodeError:
        print(" [WARN] Invalid TOPIC_KEYWORDS_JSON, using defaults", file=sys.stderr)
|
||||||
|
|
||||||
|
|
||||||
|
def extract_fingerprint():
    """Read content files from CONTENT_DIR, count topic frequencies."""
    topic_counts = Counter()
    phrase_counts = Counter()

    # Load JSON content atoms (if available)
    atom_files = sorted(glob.glob(str(CONTENT_DIR / "content-atoms-*.json")))
    if atom_files:
        latest = atom_files[-1]
        try:
            with open(latest) as f:
                d = json.load(f)
            atoms = d.get("atoms", [])
            for atom in atoms:
                text = (atom.get("content", "") + " " + " ".join(atom.get("tags", []))).lower()
                _score_text(text, topic_counts, phrase_counts)
        except Exception as e:
            print(f" [WARN] Content atoms load error: {e}", file=sys.stderr)

    # Load markdown files (last 30 days by filename prefix)
    cutoff = date.today() - timedelta(days=30)
    if CONTENT_DIR.exists():
        for f in sorted(CONTENT_DIR.glob("**/*.md")):
            m = re.match(r"(\d{4}-\d{2}-\d{2})", f.name)
            if m:
                try:
                    file_date = date.fromisoformat(m.group(1))
                    if file_date < cutoff:
                        continue
                except ValueError:
                    pass
            try:
                text = f.read_text(errors="ignore").lower()
                _score_text(text, topic_counts, phrase_counts)
            except Exception:
                pass

    return topic_counts, phrase_counts
|
||||||
|
|
||||||
|
|
||||||
|
def _score_text(text, topic_counts, phrase_counts):
    """Score text against topic keywords and count phrase frequencies."""
    for topic, keywords in TOPIC_KEYWORDS.items():
        for kw in keywords:
            count = text.count(kw)
            if count > 0:
                topic_counts[topic] += count

    # Count meaningful 2-3 word phrases
    words = re.findall(r'\b[a-z][a-z\-]{2,}\b', text)
    words = [w for w in words if w not in STOPWORDS and len(w) > 3]
    for i in range(len(words)-1):
        bigram = f"{words[i]} {words[i+1]}"
        phrase_counts[bigram] += 1
        if i < len(words)-2:
            trigram = f"{words[i]} {words[i+1]} {words[i+2]}"
            phrase_counts[trigram] += 1
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────
|
||||||
|
# 2. KEYWORD SEEDS from fingerprint
|
||||||
|
# ─────────────────────────────────────────────
|
||||||
|
|
||||||
|
# Map topics to seed keywords for Ahrefs research
|
||||||
|
# Customize these for your industry/niche
|
||||||
|
TOPIC_TO_SEEDS = {
|
||||||
|
"AI agents": [
|
||||||
|
"ai agents for marketing", "ai agent platform", "marketing ai agents",
|
||||||
|
"ai agents b2b", "autonomous ai agents", "ai agent tools",
|
||||||
|
"build ai agents", "ai agents for business",
|
||||||
|
],
|
||||||
|
"Claude/OpenAI": [
|
||||||
|
"claude ai for business", "openai for marketing", "chatgpt marketing",
|
||||||
|
"gpt for content marketing", "ai writing tools",
|
||||||
|
],
|
||||||
|
"SEO/AEO": [
|
||||||
|
"seo agency", "ai seo tools", "seo for ai", "aeo optimization",
|
||||||
|
"answer engine optimization", "seo content strategy", "technical seo services",
|
||||||
|
"seo reporting tools", "enterprise seo agency",
|
||||||
|
],
|
||||||
|
"Content marketing": [
|
||||||
|
"content marketing agency", "content marketing strategy", "b2b content marketing",
|
||||||
|
"content marketing roi", "content marketing tools", "ai content marketing",
|
||||||
|
"content marketing services", "content strategy agency",
|
||||||
|
],
|
||||||
|
"Marketing agency": [
|
||||||
|
"digital marketing agency", "performance marketing agency", "b2b marketing agency",
|
||||||
|
"marketing agency pricing", "hire marketing agency", "marketing agency services",
|
||||||
|
"best marketing agencies", "saas marketing agency",
|
||||||
|
],
|
||||||
|
"Lead generation": [
|
||||||
|
"b2b lead generation", "lead generation agency", "lead generation strategy",
|
||||||
|
"b2b lead gen tools", "outbound lead generation", "lead generation services",
|
||||||
|
"demand generation agency", "lead generation for saas",
|
||||||
|
],
|
||||||
|
"AI automation": [
|
||||||
|
"marketing automation ai", "ai workflow automation", "automate marketing tasks",
|
||||||
|
"ai marketing automation tools", "marketing automation platform",
|
||||||
|
],
|
||||||
|
"Revenue/ROI": [
|
||||||
|
"marketing roi", "content marketing roi", "seo roi", "digital marketing roi",
|
||||||
|
"revenue driven marketing", "roi tracking marketing",
|
||||||
|
],
|
||||||
|
"Sales": [
|
||||||
|
"ai sales tools", "sales automation software", "cold email software",
|
||||||
|
"outbound sales automation", "ai cold email", "sales engagement platform",
|
||||||
|
],
|
||||||
|
"B2B SaaS": [
|
||||||
|
"saas marketing agency", "b2b saas marketing", "saas seo strategy",
|
||||||
|
"saas content marketing", "saas growth marketing",
|
||||||
|
],
|
||||||
|
"Analytics/Data": [
|
||||||
|
"marketing analytics tools", "seo analytics platform", "content analytics",
|
||||||
|
"marketing data analytics",
|
||||||
|
],
|
||||||
|
"Strategy": [
|
||||||
|
"digital marketing strategy", "content strategy consulting",
|
||||||
|
"marketing strategy agency", "growth strategy consulting",
|
||||||
|
],
|
||||||
|
}
|
||||||
|
|
||||||
|
# Override with environment variable if set (JSON format)
|
||||||
|
_custom_seeds = os.environ.get("TOPIC_TO_SEEDS_JSON")
|
||||||
|
if _custom_seeds:
    try:
        TOPIC_TO_SEEDS = json.loads(_custom_seeds)
    except json.JSONDecodeError:
        print(" [WARN] Invalid TOPIC_TO_SEEDS_JSON, using defaults", file=sys.stderr)
|
||||||
|
|
||||||
|
|
||||||
|
def derive_seeds(topic_counts):
    """Return ranked list of keyword seeds based on topic frequency."""
    seeds = []
    seen = set()
    for topic, _ in topic_counts.most_common():
        for seed in TOPIC_TO_SEEDS.get(topic, []):
            if seed not in seen:
                seeds.append(seed)
                seen.add(seed)
    # Add fallback seeds
    fallbacks = [
        "ai marketing", "seo services", "content marketing",
        "digital marketing agency", "marketing automation",
        "b2b lead generation", "marketing strategy",
    ]
    for s in fallbacks:
        if s not in seen:
            seeds.append(s)
            seen.add(s)
    return seeds[:150]
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────
|
||||||
|
# 3. AHREFS KEYWORDS EXPLORER
|
||||||
|
# ─────────────────────────────────────────────
|
||||||
|
|
||||||
|
def fetch_ahrefs_keywords(seeds):
    """Pull Ahrefs Keywords Explorer data in batches of 50."""
    import urllib.parse  # imported once here instead of on every batch

    if not AHREFS_TOKEN:
        print(" [WARN] No AHREFS_TOKEN — skipping keyword data", file=sys.stderr)
        return {}

    results = {}
    today = date.today()
    # Window: the 12 full calendar months ending with last month
    date_to = today.replace(day=1) - timedelta(days=1)
    date_from = (date_to.replace(day=1) - timedelta(days=335)).replace(day=1)

    batch_size = 50
    for i in range(0, len(seeds), batch_size):
        batch = seeds[i:i+batch_size]
        try:
            qs = urllib.parse.urlencode({
                "country": "us",
                "keywords": ",".join(batch),
                "select": "keyword,volume,difficulty,cpc,traffic_potential,intents,volume_monthly_history",
                "volume_monthly_date_from": date_from.strftime("%Y-%m-%d"),
                "volume_monthly_date_to": date_to.strftime("%Y-%m-%d"),
            })
            resp = requests.get(
                f"{AHREFS_BASE}/keywords-explorer/overview?{qs}",
                headers=AHREFS_HEADERS(),
                timeout=30,
            )
            if resp.status_code == 200:
                data = resp.json()
                for kw_data in data.get("keywords", []):
                    kw = kw_data.get("keyword", "").lower()
                    if kw:
                        # Normalize field names so downstream code can rely on keyword_difficulty
                        if "difficulty" in kw_data and "keyword_difficulty" not in kw_data:
                            kw_data["keyword_difficulty"] = kw_data["difficulty"]
                        intents = kw_data.get("intents", {})
                        if isinstance(intents, dict):
                            kw_data["is_commercial"] = intents.get("commercial", False)
                            kw_data["is_transactional"] = intents.get("transactional", False)
                        results[kw] = kw_data
            else:
                print(f" [WARN] Ahrefs keywords batch {i//batch_size+1}: HTTP {resp.status_code}", file=sys.stderr)
        except Exception as e:
            print(f" [WARN] Ahrefs keywords batch error: {e}", file=sys.stderr)

    return results
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────
|
||||||
|
# 4. AHREFS ORGANIC KEYWORDS
|
||||||
|
# ─────────────────────────────────────────────
|
||||||
|
|
||||||
|
def fetch_organic_keywords(domain, limit=1000):
|
||||||
|
"""Pull Ahrefs organic keywords for a domain."""
|
||||||
|
if not AHREFS_TOKEN:
|
||||||
|
return []
|
||||||
|
|
||||||
|
today = date.today()
|
||||||
|
first_of_month = today.replace(day=1).strftime("%Y-%m-%d")
|
||||||
|
|
||||||
|
try:
|
||||||
|
resp = requests.get(
|
||||||
|
f"{AHREFS_BASE}/site-explorer/organic-keywords",
|
||||||
|
headers=AHREFS_HEADERS(),
|
||||||
|
params={
|
||||||
|
"target": domain,
|
||||||
|
"country": "us",
|
||||||
|
"date": first_of_month,
|
||||||
|
"select": "keyword,volume,best_position,keyword_difficulty,sum_traffic,is_commercial,is_transactional,best_position_url",
|
||||||
|
"order_by": "volume:desc",
|
||||||
|
"limit": limit,
|
||||||
|
"mode": "subdomains",
|
||||||
|
},
|
||||||
|
timeout=30,
|
||||||
|
)
|
||||||
|
if resp.status_code == 200:
|
||||||
|
return resp.json().get("keywords", [])
|
||||||
|
else:
|
||||||
|
print(f" [WARN] Ahrefs organic {domain}: HTTP {resp.status_code}", file=sys.stderr)
|
||||||
|
except Exception as e:
|
||||||
|
print(f" [WARN] Ahrefs organic {domain} error: {e}", file=sys.stderr)
|
||||||
|
return []
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────
|
||||||
|
# 5. COMPETITOR GAP ANALYSIS
|
||||||
|
# ─────────────────────────────────────────────
|
||||||
|
|
||||||
|
# Terms that indicate keywords are relevant to your business
|
||||||
|
# Customize this set for your niche
|
||||||
|
RELEVANT_TERMS = {
|
||||||
|
"marketing","seo","content","agency","digital","growth","lead","analytics",
|
||||||
|
"advertising","social","email","conversion","b2b","saas","strategy","ai",
|
||||||
|
"search","traffic","keyword","backlink","campaign","funnel","inbound",
|
||||||
|
"outbound","automation","crm","ppc","sem","cro","optimization",
|
||||||
|
"brand","performance","demand","revenue","roi","reporting","tools",
|
||||||
|
"software","platform","services","hire","consultant","pricing","best",
|
||||||
|
"vs","alternative","guide","how to","enterprise","startup","ecommerce",
|
||||||
|
"agent","aeo","answer engine","generative engine",
|
||||||
|
}
|
||||||
|
|
||||||
|
# Keywords to block from competitor gap results (noise)
|
||||||
|
GAP_BLOCKLIST = {
|
||||||
|
"photo search","reverse video","image search","reverse image",
|
||||||
|
"paragraph generator","paragraph writer","paragraph rewriter",
|
||||||
|
"sentence rewriter","text rewriter","text humanizer","ai rewrite",
|
||||||
|
"rewording tool","reword ai","paraphrasing tool","essay writer",
|
||||||
|
"grammar checker","spell checker","word counter","character counter",
|
||||||
|
"reviews","review","coupon","promo code","login","sign up","free trial",
|
||||||
|
"what is","definition of","wikipedia",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def is_relevant_keyword(kw):
|
||||||
|
"""Check if a keyword is relevant to your business."""
|
||||||
|
kw_lower = kw.lower()
|
||||||
|
if not any(term in kw_lower for term in RELEVANT_TERMS):
|
||||||
|
return False
|
||||||
|
if any(blocked in kw_lower for blocked in GAP_BLOCKLIST):
|
||||||
|
return False
|
||||||
|
return True
|
||||||
|
|
||||||
|
|
||||||
|
def find_competitor_gaps(my_keywords, competitor_data):
    """Find keywords where a competitor ranks in the top 20 and you rank below 50 or not at all."""
    my_positions = {}
    for item in my_keywords:
        kw = (item.get("keyword") or "").lower()
        # A missing or null position counts as "not ranking" (999)
        my_positions[kw] = item.get("best_position") or 999

    gaps = []
    seen_kws = set()

    for comp_domain, comp_keywords in competitor_data.items():
        for item in comp_keywords:
            kw = (item.get("keyword") or "").lower()
            if not kw or kw in seen_kws:
                continue
            if not is_relevant_keyword(kw):
                continue

            comp_pos = item.get("best_position") or 999
            my_pos = my_positions.get(kw, 999)

            if comp_pos <= 20 and my_pos > 50:
                # The first competitor seen for a keyword is the one reported
                seen_kws.add(kw)
                gaps.append({
                    "keyword": kw,
                    "volume": item.get("volume", 0),
                    "kd": item.get("keyword_difficulty", 0),
                    "competitor": comp_domain,
                    "comp_pos": comp_pos,
                    "your_pos": my_pos,
                    "is_commercial": item.get("is_commercial", False),
                    "is_transactional": item.get("is_transactional", False),
                })

    gaps.sort(key=lambda x: x.get("volume") or 0, reverse=True)
    return gaps
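The gap rule above reduces to one comparison per keyword. The sketch below applies it to made-up data; `is_gap`, the sample keywords, and the positions are all hypothetical, and the relevance filter is left out for brevity.

```python
NOT_RANKING = 999  # same sentinel the script uses for "no position"


def is_gap(comp_pos, my_pos, comp_max=20, mine_min=50):
    """True when the competitor is in the top `comp_max` and we rank below `mine_min`."""
    return comp_pos <= comp_max and my_pos > mine_min


# Hypothetical positions for our own domain.
my_positions = {"seo agency pricing": 64, "b2b lead generation": 8}

# Hypothetical (keyword, position) pairs for one competitor.
competitor = [
    ("seo agency pricing", 4),    # we rank #64, so this is a gap
    ("b2b lead generation", 3),   # we rank #8, so this is not a gap
    ("ai marketing tools", 12),   # we do not rank, so this is a gap
    ("crm software", 35),         # competitor is outside the top 20
]

gaps = [kw for kw, pos in competitor
        if is_gap(pos, my_positions.get(kw, NOT_RANKING))]
```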
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────
|
||||||
|
# 6. GSC DATA
|
||||||
|
# ─────────────────────────────────────────────
|
||||||
|
|
||||||
|
def fetch_gsc_data():
|
||||||
|
"""Import gsc_client and pull 28d + 90d query data."""
|
||||||
|
try:
|
||||||
|
# Try importing from same directory
|
||||||
|
script_dir = Path(__file__).resolve().parent
|
||||||
|
spec = importlib.util.spec_from_file_location("gsc_client", str(script_dir / "gsc_client.py"))
|
||||||
|
gsc_mod = importlib.util.module_from_spec(spec)
|
||||||
|
spec.loader.exec_module(gsc_mod)
|
||||||
|
gsc = gsc_mod.GSCClient()
|
||||||
|
|
||||||
|
rows_28 = gsc.query(dimensions=["query"], row_limit=1000, days=28)
|
||||||
|
rows_90 = gsc.query(dimensions=["query"], row_limit=1000, days=90)
|
||||||
|
|
||||||
|
return rows_28, rows_90
|
||||||
|
except Exception as e:
|
||||||
|
print(f" [WARN] GSC error: {e}", file=sys.stderr)
|
||||||
|
return [], []
|
||||||
|
|
||||||
|
|
||||||
|
def find_decaying_pages(rows_28, rows_90):
    """Find queries whose 28-day clicks are more than 30% below the 90-day total scaled to 28 days."""
    clicks_28 = {}
    for row in rows_28:
        keys = row.get("keys", [])
        if keys:
            clicks_28[keys[0].lower()] = row.get("clicks", 0)

    # Scale the 90-day total to a 28-day equivalent so both numbers are comparable
    clicks_90_norm = {}
    for row in rows_90:
        keys = row.get("keys", [])
        if keys:
            clicks_90_norm[keys[0].lower()] = row.get("clicks", 0) * (28 / 90)

    decaying = []
    for kw, c28 in clicks_28.items():
        c90 = clicks_90_norm.get(kw, 0)
        # Ignore low-volume queries (baseline of 5 clicks or fewer per 28 days)
        if c90 > 5:
            if c28 < c90 * 0.7:
                pct_loss = (c90 - c28) / c90 * 100
                decaying.append({
                    "keyword": kw,
                    "clicks_28d": round(c28),
                    "clicks_90d_avg": round(c90),  # 90-day clicks scaled to 28 days
                    "pct_loss": round(pct_loss, 1),
                })

    decaying.sort(key=lambda x: x.get("pct_loss", 0), reverse=True)
    return decaying
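A worked example of the decay arithmetic, as a sketch. The helper name `pct_loss` and the click counts are hypothetical; the thresholds (baseline above 5 clicks, drop of more than 30%) are the ones used in the function above.

```python
def pct_loss(clicks_28d, clicks_90d):
    """Return the percentage drop against the scaled 90-day baseline, or None."""
    baseline = clicks_90d * (28 / 90)      # 90-day total scaled to 28 days
    if baseline <= 5:                      # too little traffic to judge
        return None
    if clicks_28d >= baseline * 0.7:       # within 30% of baseline
        return None
    return round((baseline - clicks_28d) / baseline * 100, 1)


# 900 clicks in 90 days scales to a baseline of 280 per 28 days.
halved = pct_loss(140, 900)    # 140 against 280 is a 50% loss
healthy = pct_loss(250, 900)   # 250 is above the 196-click cutoff
too_small = pct_loss(0, 15)    # baseline is about 4.7 clicks
```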
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────
|
||||||
|
# 7 & 8. SCORING + TREND
|
||||||
|
# ─────────────────────────────────────────────
|
||||||
|
|
||||||
|
def compute_trend(history):
|
||||||
|
"""Compare first 3 months avg to last 3 months avg from volume_monthly_history."""
|
||||||
|
if not history or len(history) < 3:
|
||||||
|
return 0.0, "→ Stable"
|
||||||
|
|
||||||
|
volumes = []
|
||||||
|
try:
|
||||||
|
if isinstance(history[0], dict):
|
||||||
|
sorted_h = sorted(history, key=lambda x: x.get("date", x.get("month", "")))
|
||||||
|
volumes = [h.get("volume", h.get("search_volume", 0)) for h in sorted_h]
|
||||||
|
else:
|
||||||
|
volumes = [int(v) for v in history]
|
||||||
|
except Exception:
|
||||||
|
return 0.0, "→ Stable"
|
||||||
|
|
||||||
|
if len(volumes) < 6:
|
||||||
|
return 0.0, "→ Stable"
|
||||||
|
|
||||||
|
early_avg = sum(volumes[:3]) / 3
|
||||||
|
late_avg = sum(volumes[-3:]) / 3
|
||||||
|
|
||||||
|
if early_avg == 0:
|
||||||
|
pct = 100.0 if late_avg > 0 else 0.0
|
||||||
|
else:
|
||||||
|
pct = (late_avg - early_avg) / early_avg * 100
|
||||||
|
|
||||||
|
if pct > 50:
|
||||||
|
label = "🔥 Surging"
|
||||||
|
elif pct > 20:
|
||||||
|
label = "📈 Rising"
|
||||||
|
elif pct > 5:
|
||||||
|
label = "↗️ Growing"
|
||||||
|
elif pct >= -5:
|
||||||
|
label = "→ Stable"
|
||||||
|
elif pct >= -20:
|
||||||
|
label = "↘️ Declining"
|
||||||
|
else:
|
||||||
|
label = "📉 Falling"
|
||||||
|
|
||||||
|
return round(pct, 1), label
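The trend calculation compares the mean of the first three months with the mean of the last three. The sketch below reproduces that arithmetic on invented volumes; `trend_pct` and `trend_name` are illustrative names, and the labels are written without the emoji prefixes.

```python
def trend_pct(volumes):
    """Percent change from the first-three-month mean to the last-three-month mean."""
    if len(volumes) < 6:
        return 0.0
    early = sum(volumes[:3]) / 3
    late = sum(volumes[-3:]) / 3
    if early == 0:
        return 100.0 if late > 0 else 0.0
    return round((late - early) / early * 100, 1)


def trend_name(pct):
    """Map a percentage to the same bands as compute_trend."""
    if pct > 50:
        return "Surging"
    if pct > 20:
        return "Rising"
    if pct > 5:
        return "Growing"
    if pct >= -5:
        return "Stable"
    if pct >= -20:
        return "Declining"
    return "Falling"


# Hypothetical 12-month history: early mean 1000, late mean 1900.
history = [1000, 1100, 900, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000]
pct = trend_pct(history)
name = trend_name(pct)
```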
|
||||||
|
|
||||||
|
|
||||||
|
def make_sparkline(history):
    """Unicode block-character sparkline of the last 12 points of volume history."""
    SPARKS = "▁▂▃▄▅▆▇█"
    if not history:
        return ""
    try:
        if isinstance(history[0], dict):
            sorted_h = sorted(history, key=lambda x: x.get("date", x.get("month", "")))
            volumes = [h.get("volume", h.get("search_volume", 0)) for h in sorted_h]
        else:
            volumes = [int(v) for v in history]
    except Exception:
        return ""

    if not volumes or max(volumes) == 0:
        return "▁" * min(len(volumes), 12)

    # Scale each value to one of the 8 block heights
    mn, mx = min(volumes), max(volumes)
    rng = mx - mn or 1
    return "".join(SPARKS[min(7, int((v - mn) / rng * 7))] for v in volumes[-12:])
|
||||||
|
|
||||||
|
|
||||||
|
def funnel_stage(kw, is_commercial=False, is_transactional=False):
|
||||||
|
"""Classify keyword into funnel stage."""
|
||||||
|
kw_lower = kw.lower()
|
||||||
|
bofu_terms = ["agency", "services", "hire", "pricing", "tools", "software",
|
||||||
|
"best", " vs ", "alternative", "platform", "cost", "price",
|
||||||
|
"company", "firms", "consultant", "consultancy", "outsource"]
|
||||||
|
mofu_terms = ["how to", "guide", "strategy", "examples", "case study",
|
||||||
|
"roi", "tutorial", "template", "checklist", "tips", "framework",
|
||||||
|
"what is", "explained", "overview", "comparison"]
|
||||||
|
|
||||||
|
if is_commercial or is_transactional:
|
||||||
|
return "BOFU"
|
||||||
|
if any(t in kw_lower for t in bofu_terms):
|
||||||
|
return "BOFU"
|
||||||
|
if any(t in kw_lower for t in mofu_terms):
|
||||||
|
return "MOFU"
|
||||||
|
return "TOFU"
|
||||||
|
|
||||||
|
|
||||||
|
def execution_path(kd, current_pos, volume=0):
|
||||||
|
"""Determine execution path based on difficulty and current ranking."""
|
||||||
|
has_page = current_pos < 999
|
||||||
|
if kd <= 20 and not has_page:
|
||||||
|
return "🤖 AUTO — create new content"
|
||||||
|
if has_page and kd <= 50:
|
||||||
|
return "🤖 AUTO — refresh existing content"
|
||||||
|
if kd <= 40:
|
||||||
|
return "🤖+👤 SEMI — AI drafts, team reviews"
|
||||||
|
if kd <= 60:
|
||||||
|
return "👤+🤖 TEAM — writes content, AI optimizes"
|
||||||
|
return "👤 TEAM — expert content + link building"
|
||||||
|
|
||||||
|
|
||||||
|
def score_keyword(kw_data, current_pos=999, topic_counts=None):
|
||||||
|
"""Score a keyword dict with Impact × Confidence."""
|
||||||
|
volume = kw_data.get("volume", 0) or 0
|
||||||
|
kd = next((kw_data[k] for k in ("keyword_difficulty", "kd") if kw_data.get(k) is not None), 50)  # keep a real KD of 0; default to 50 only when missing
|
||||||
|
cpc = float(kw_data.get("cpc", 0) or 0)
|
||||||
|
history = kw_data.get("volume_monthly_history", [])
|
||||||
|
is_commercial = kw_data.get("is_commercial", False)
|
||||||
|
is_transactional = kw_data.get("is_transactional", False)
|
||||||
|
kw = kw_data.get("keyword", "").lower()
|
||||||
|
|
||||||
|
trend_pct, trend_label = compute_trend(history)
|
||||||
|
sparkline = make_sparkline(history)
|
||||||
|
stage = funnel_stage(kw, is_commercial, is_transactional)
|
||||||
|
|
||||||
|
# ── Impact (0-10) ──
|
||||||
|
impact = 0
|
||||||
|
if volume >= 10000:
|
||||||
|
impact += 3
|
||||||
|
elif volume >= 2000:
|
||||||
|
impact += 2
|
||||||
|
elif volume >= 500:
|
||||||
|
impact += 1
|
||||||
|
|
||||||
|
if cpc >= 15:
|
||||||
|
impact += 3
|
||||||
|
elif cpc >= 5:
|
||||||
|
impact += 2
|
||||||
|
elif cpc >= 1:
|
||||||
|
impact += 1
|
||||||
|
|
||||||
|
if stage == "BOFU":
|
||||||
|
impact += 2
|
||||||
|
elif stage == "MOFU":
|
||||||
|
impact += 1
|
||||||
|
|
||||||
|
if trend_pct > 50:
|
||||||
|
impact += 2
|
||||||
|
elif trend_pct > 20:
|
||||||
|
impact += 1
|
||||||
|
|
||||||
|
impact = min(10, impact)
|
||||||
|
|
||||||
|
# ── Confidence (0-10) ──
|
||||||
|
confidence = 0
|
||||||
|
if kd <= 10:
|
||||||
|
confidence += 4
|
||||||
|
elif kd <= 20:
|
||||||
|
confidence += 3
|
||||||
|
elif kd <= 35:
|
||||||
|
confidence += 2
|
||||||
|
elif kd <= 50:
|
||||||
|
confidence += 1
|
||||||
|
|
||||||
|
if current_pos <= 10:
|
||||||
|
confidence += 3
|
||||||
|
elif current_pos <= 30:
|
||||||
|
confidence += 2
|
||||||
|
elif current_pos <= 50:
|
||||||
|
confidence += 1
|
||||||
|
|
||||||
|
# Topic authority: check if keyword topic appears in content fingerprint
|
||||||
|
if topic_counts:
|
||||||
|
for topic, cnt in topic_counts.items():
|
||||||
|
topic_seeds = TOPIC_TO_SEEDS.get(topic, [])
|
||||||
|
if any(seed.lower() in kw or kw in seed.lower() for seed in topic_seeds):
|
||||||
|
if cnt > 5:
|
||||||
|
confidence += 2
|
||||||
|
break
|
||||||
|
|
||||||
|
confidence = min(10, confidence)
|
||||||
|
|
||||||
|
priority = impact * confidence
|
||||||
|
|
||||||
|
exec_path = execution_path(kd, current_pos, volume)
|
||||||
|
|
||||||
|
return {
|
||||||
|
"keyword": kw,
|
||||||
|
"volume": volume,
|
||||||
|
"kd": kd,
|
||||||
|
"cpc": round(cpc, 2),
|
||||||
|
"traffic_potential": kw_data.get("traffic_potential", 0),
|
||||||
|
"current_pos": current_pos if current_pos < 999 else None,
|
||||||
|
"stage": stage,
|
||||||
|
"trend_pct": trend_pct,
|
||||||
|
"trend_label": trend_label,
|
||||||
|
"sparkline": sparkline,
|
||||||
|
"impact": impact,
|
||||||
|
"confidence": confidence,
|
||||||
|
"priority": priority,
|
||||||
|
"exec_path": exec_path,
|
||||||
|
"is_commercial": is_commercial,
|
||||||
|
"is_transactional": is_transactional,
|
||||||
|
}
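A worked example of the Impact × Confidence arithmetic, written as a sketch. The two helper functions and the sample keyword are hypothetical; the thresholds and point values are copied from `score_keyword`. Summing the largest point values shows that impact can reach 10 while confidence tops out at 9 (4 + 3 + 2), so the highest possible priority is 90.

```python
def points_at_least(value, steps):
    """Points for the first (floor, points) pair whose floor the value reaches."""
    return next((pts for floor, pts in steps if value >= floor), 0)


def points_at_most(value, steps):
    """Points for the first (ceiling, points) pair the value does not exceed."""
    return next((pts for ceiling, pts in steps if value <= ceiling), 0)


VOLUME_STEPS = [(10000, 3), (2000, 2), (500, 1)]
CPC_STEPS = [(15, 3), (5, 2), (1, 1)]
KD_STEPS = [(10, 4), (20, 3), (35, 2), (50, 1)]
POSITION_STEPS = [(10, 3), (30, 2), (50, 1)]

# Hypothetical keyword: 2,400 searches/month, $6.50 CPC, BOFU, trend +25%,
# KD 18, currently ranking #24, topic already well covered in the library.
impact = min(10,
             points_at_least(2400, VOLUME_STEPS)    # 2 points
             + points_at_least(6.5, CPC_STEPS)      # 2 points
             + 2                                    # BOFU stage bonus
             + 1)                                   # trend above 20%, not above 50%
confidence = min(10,
                 points_at_most(18, KD_STEPS)          # 3 points
                 + points_at_most(24, POSITION_STEPS)  # 2 points
                 + 2)                                  # topic-authority bonus
priority = impact * confidence
```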
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────
|
||||||
|
# OUTPUT FORMATTING
|
||||||
|
# ─────────────────────────────────────────────
|
||||||
|
|
||||||
|
def fmt_vol(v):
|
||||||
|
if not v:
|
||||||
|
return "—"
|
||||||
|
if v >= 1000000:
|
||||||
|
return f"{v/1000000:.1f}M"
|
||||||
|
if v >= 1000:
|
||||||
|
return f"{v/1000:.1f}K"
|
||||||
|
return str(v)
|
||||||
|
|
||||||
|
def fmt_pos(p):
|
||||||
|
if p is None:
|
||||||
|
return "—"
|
||||||
|
return f"#{p}"
|
||||||
|
|
||||||
|
def fmt_kd(k):
|
||||||
|
if k is None:
|
||||||
|
return "—"
|
||||||
|
if k <= 20:
|
||||||
|
return f"KD{k}🟢"
|
||||||
|
if k <= 40:
|
||||||
|
return f"KD{k}🟡"
|
||||||
|
if k <= 60:
|
||||||
|
return f"KD{k}🟠"
|
||||||
|
return f"KD{k}🔴"
|
||||||
|
|
||||||
|
def fmt_cpc(c):
|
||||||
|
if not c:
|
||||||
|
return "—"
|
||||||
|
return f"${c:.2f}"
|
||||||
|
|
||||||
|
def print_kw_row(scored, idx=None):
|
||||||
|
prefix = f" {idx:>2}. " if idx else " "
|
||||||
|
pos_str = fmt_pos(scored.get("current_pos"))
|
||||||
|
trend = scored.get("trend_label", "→ Stable")
|
||||||
|
spark = scored.get("sparkline", "")
|
||||||
|
kw = scored.get("keyword", "")
|
||||||
|
vol = fmt_vol(scored.get("volume"))
|
||||||
|
kd = fmt_kd(scored.get("kd"))
|
||||||
|
cpc = fmt_cpc(scored.get("cpc"))
|
||||||
|
imp = scored.get("impact", 0)
|
||||||
|
conf = scored.get("confidence", 0)
|
||||||
|
pri = scored.get("priority", 0)
|
||||||
|
stage = scored.get("stage", "TOFU")
|
||||||
|
ep = scored.get("exec_path", "")
|
||||||
|
|
||||||
|
print(f"{prefix}{kw}")
|
||||||
|
print(f" Vol:{vol} {kd} CPC:{cpc} Pos:{pos_str} [{stage}]")
|
||||||
|
print(f" Trend: {trend} {spark} ({scored.get('trend_pct', 0):+.0f}%)")
|
||||||
|
print(f" Impact:{imp} Conf:{conf} Priority:{pri}")
|
||||||
|
print(f" {ep}")
|
||||||
|
print()
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────
|
||||||
|
# MAIN
|
||||||
|
# ─────────────────────────────────────────────
|
||||||
|
|
||||||
|
def main():
|
||||||
|
today = date.today()
|
||||||
|
week_str = today.strftime("%B %d, %Y")
|
||||||
|
|
||||||
|
print("=" * 68)
|
||||||
|
print(f"🎯 CONTENT ATTACK BRIEF — {YOUR_DOMAIN}")
|
||||||
|
print(" Content Fingerprint × Ahrefs × GSC × Competitor Gaps")
|
||||||
|
print(f" Week of {week_str}")
|
||||||
|
print("=" * 68)
|
||||||
|
print()
|
||||||
|
|
||||||
|
# ── Step 1: Topic fingerprint ──
|
||||||
|
print("📡 Ingesting content library...", file=sys.stderr)
|
||||||
|
topic_counts, phrase_counts = extract_fingerprint()
|
||||||
|
|
||||||
|
print()
|
||||||
|
print("🧬 TOPIC FINGERPRINT (30-day content)")
|
||||||
|
print()
|
||||||
|
if topic_counts:
|
||||||
|
max_count = max(topic_counts.values())
|
||||||
|
for topic, count in topic_counts.most_common(15):
|
||||||
|
bar_len = max(1, int(count / max_count * 30))
|
||||||
|
bar = "█" * bar_len
|
||||||
|
print(f" {topic:<25} {bar} {count}")
|
||||||
|
else:
|
||||||
|
print(" [No content found in CONTENT_DIR]")
|
||||||
|
print()
|
||||||
|
|
||||||
|
# ── Step 2: Seeds ──
|
||||||
|
print("🔍 Deriving keyword seeds...", file=sys.stderr)
|
||||||
|
seeds = derive_seeds(topic_counts)
|
||||||
|
print(f" Derived {len(seeds)} keyword seeds from topic fingerprint")
|
||||||
|
print()
|
||||||
|
|
||||||
|
# ── Step 3: Ahrefs keywords ──
|
||||||
|
print("📊 Pulling Ahrefs keyword data...", file=sys.stderr)
|
||||||
|
seeds_data = fetch_ahrefs_keywords(seeds)
|
||||||
|
print(f" Got data for {len(seeds_data)} keywords", file=sys.stderr)
|
||||||
|
|
||||||
|
# ── Step 4: Your organic keywords ──
|
||||||
|
print(f"🌐 Pulling {YOUR_DOMAIN} organic keywords...", file=sys.stderr)
|
||||||
|
my_keywords = fetch_organic_keywords(YOUR_DOMAIN, limit=1000)
|
||||||
|
print(f" Got {len(my_keywords)} organic keywords", file=sys.stderr)
|
||||||
|
|
||||||
|
my_positions = {}
|
||||||
|
my_kw_data = {}
|
||||||
|
for item in my_keywords:
|
||||||
|
kw = item.get("keyword", "").lower()
|
||||||
|
my_positions[kw] = item.get("best_position", 999)
|
||||||
|
my_kw_data[kw] = item
|
||||||
|
|
||||||
|
# ── Step 5: Competitor gaps ──
|
||||||
|
competitor_gaps = []
|
||||||
|
if COMPETITORS:
|
||||||
|
print("🕵️ Pulling competitor keywords...", file=sys.stderr)
|
||||||
|
competitor_data = {}
|
||||||
|
for comp in COMPETITORS:
|
||||||
|
print(f" {comp}...", file=sys.stderr)
|
||||||
|
competitor_data[comp] = fetch_organic_keywords(comp, limit=200)
|
||||||
|
competitor_gaps = find_competitor_gaps(my_keywords, competitor_data)
|
||||||
|
print(f" Found {len(competitor_gaps)} competitor gap keywords", file=sys.stderr)
|
||||||
|
|
||||||
|
# ── Step 6: GSC data ──
|
||||||
|
print("📈 Pulling GSC data...", file=sys.stderr)
|
||||||
|
rows_28, rows_90 = fetch_gsc_data()
|
||||||
|
print(f" GSC 28d: {len(rows_28)} queries, 90d: {len(rows_90)} queries", file=sys.stderr)
|
||||||
|
decaying = find_decaying_pages(rows_28, rows_90)
|
||||||
|
|
||||||
|
# ── Step 7: Score all keywords ──
|
||||||
|
print("⚡ Scoring keywords...", file=sys.stderr)
|
||||||
|
all_scored = []
|
||||||
|
|
||||||
|
for kw, data in seeds_data.items():
|
||||||
|
pos = my_positions.get(kw, 999)
|
||||||
|
scored = score_keyword(data, current_pos=pos, topic_counts=topic_counts)
|
||||||
|
all_scored.append(scored)
|
||||||
|
|
||||||
|
for item in my_keywords:
|
||||||
|
kw = item.get("keyword", "").lower()
|
||||||
|
if kw not in seeds_data:
|
||||||
|
pos = item.get("best_position", 999)
|
||||||
|
scored = score_keyword(
|
||||||
|
{
|
||||||
|
"keyword": kw,
|
||||||
|
"volume": item.get("volume", 0),
|
||||||
|
"keyword_difficulty": item.get("keyword_difficulty", 50),
|
||||||
|
"cpc": 0,
|
||||||
|
"is_commercial": item.get("is_commercial", False),
|
||||||
|
"is_transactional": item.get("is_transactional", False),
|
||||||
|
},
|
||||||
|
current_pos=pos,
|
||||||
|
topic_counts=topic_counts,
|
||||||
|
)
|
||||||
|
all_scored.append(scored)
|
||||||
|
|
||||||
|
# Deduplicate
|
||||||
|
seen_kws = set()
|
||||||
|
deduped = []
|
||||||
|
for s in sorted(all_scored, key=lambda x: x["priority"], reverse=True):
|
||||||
|
if s["keyword"] not in seen_kws:
|
||||||
|
seen_kws.add(s["keyword"])
|
||||||
|
deduped.append(s)
|
||||||
|
all_scored = deduped
|
||||||
|
|
||||||
|
# ── BOFU: Money Keywords ──
|
||||||
|
bofu = [s for s in all_scored if s["stage"] == "BOFU"]
|
||||||
|
bofu.sort(key=lambda x: x["priority"], reverse=True)
|
||||||
|
|
||||||
|
print("💰 BOFU: MONEY KEYWORDS (top 12)")
|
||||||
|
print()
|
||||||
|
for i, s in enumerate(bofu[:12], 1):
|
||||||
|
print_kw_row(s, i)
|
||||||
|
|
||||||
|
# ── Trending ──
|
||||||
|
trending = sorted(all_scored, key=lambda x: x.get("trend_pct", 0), reverse=True)
|
||||||
|
trending = [t for t in trending if t.get("trend_pct", 0) > 5]
|
||||||
|
|
||||||
|
print("🔥 TRENDING: Fastest-growing (top 10)")
|
||||||
|
print()
|
||||||
|
for i, s in enumerate(trending[:10], 1):
|
||||||
|
print_kw_row(s, i)
|
||||||
|
|
||||||
|
# ── Competitor Gaps ──
|
||||||
|
if competitor_gaps:
|
||||||
|
print("🕳️ COMPETITOR GAP (top 15 relevant)")
|
||||||
|
print()
|
||||||
|
for i, gap in enumerate(competitor_gaps[:15], 1):
|
||||||
|
vol = fmt_vol(gap.get("volume", 0))
|
||||||
|
kd = fmt_kd(gap.get("kd", 0))
|
||||||
|
comp = gap.get("competitor", "")
|
||||||
|
comp_pos = gap.get("comp_pos", "?")
|
||||||
|
your_pos = gap.get("your_pos", 999)
|
||||||
|
your_pos_str = "not ranking" if your_pos >= 999 else f"#{your_pos}"
|
||||||
|
stage = funnel_stage(gap["keyword"], gap.get("is_commercial", False), gap.get("is_transactional", False))
|
||||||
|
ep = execution_path(gap.get("kd", 50), your_pos)
|
||||||
|
print(f" {i:>2}. {gap['keyword']}")
|
||||||
|
print(f" Vol:{vol} {kd} [{stage}] {comp} #{comp_pos} You:{your_pos_str}")
|
||||||
|
print(f" {ep}")
|
||||||
|
print()
|
||||||
|
|
||||||
|
# ── Decay Alert ──
|
||||||
|
print("📉 DECAY ALERT: Pages losing traffic (top 10)")
|
||||||
|
print()
|
||||||
|
if decaying:
|
||||||
|
for i, d in enumerate(decaying[:10], 1):
|
||||||
|
kw = d["keyword"]
|
||||||
|
c28 = d["clicks_28d"]
|
||||||
|
c90 = d["clicks_90d_avg"]
|
||||||
|
loss = d["pct_loss"]
|
||||||
|
pos = my_positions.get(kw)
|
||||||
|
pos_str = fmt_pos(pos) if pos and pos < 999 else "?"
|
||||||
|
print(f" {i:>2}. {kw}")
|
||||||
|
print(f" 28d clicks: {c28} 90d avg: {c90} Loss: {loss:.0f}% Pos: {pos_str}")
|
||||||
|
print()
|
||||||
|
else:
|
||||||
|
print(" No significant decay detected (GSC may be unavailable)")
|
||||||
|
print()
|
||||||
|
|
||||||
|
# ── Execution Pipeline ──
|
||||||
|
print("⚡ EXECUTION PIPELINE (crawl → walk → run)")
|
||||||
|
print()
|
||||||
|
pipeline = defaultdict(list)
|
||||||
|
for s in all_scored:
|
||||||
|
pipeline[s["exec_path"]].append(s)
|
||||||
|
|
||||||
|
order = [
|
||||||
|
"🤖 AUTO — create new content",
|
||||||
|
"🤖 AUTO — refresh existing content",
|
||||||
|
"🤖+👤 SEMI — AI drafts, team reviews",
|
||||||
|
"👤+🤖 TEAM — writes content, AI optimizes",
|
||||||
|
"👤 TEAM — expert content + link building",
|
||||||
|
]
|
||||||
|
for path in order:
|
||||||
|
items = pipeline.get(path, [])
|
||||||
|
if not items:
|
||||||
|
continue
|
||||||
|
print(f" {path} ({len(items)} keywords)")
|
||||||
|
top = sorted(items, key=lambda x: x["priority"], reverse=True)[:5]
|
||||||
|
for kw_item in top:
|
||||||
|
vol = fmt_vol(kw_item["volume"])
|
||||||
|
kd = kw_item["kd"]
|
||||||
|
pri = kw_item["priority"]
|
||||||
|
print(f" • {kw_item['keyword']} Vol:{vol} KD:{kd} Pri:{pri}")
|
||||||
|
print()
|
||||||
|
|
||||||
|
# ── Summary ──
|
||||||
|
print("📊 SUMMARY")
|
||||||
|
print()
|
||||||
|
bofu_count = len([s for s in all_scored if s["stage"] == "BOFU"])
|
||||||
|
mofu_count = len([s for s in all_scored if s["stage"] == "MOFU"])
|
||||||
|
tofu_count = len([s for s in all_scored if s["stage"] == "TOFU"])
|
||||||
|
auto_count = len(pipeline.get("🤖 AUTO — create new content", []))
|
||||||
|
refresh_count = len(pipeline.get("🤖 AUTO — refresh existing content", []))
|
||||||
|
surging = [s for s in all_scored if "Surging" in s.get("trend_label", "")]
|
||||||
|
|
||||||
|
print(f" Keywords analyzed: {len(all_scored)}")
|
||||||
|
print(f" Competitor gaps: {len(competitor_gaps)}")
|
||||||
|
print(f" Decaying pages: {len(decaying)}")
|
||||||
|
print(f" BOFU / MOFU / TOFU: {bofu_count} / {mofu_count} / {tofu_count}")
|
||||||
|
print(f" Auto-create ready: {auto_count}")
|
||||||
|
print(f" Auto-refresh ready: {refresh_count}")
|
||||||
|
print(f" Surging keywords: {len(surging)}")
|
||||||
|
print(f" Top topics covered: {', '.join(t for t, _ in topic_counts.most_common(5))}")
|
||||||
|
print()
|
||||||
|
print("=" * 68)
|
||||||
|
|
||||||
|
# ── Save JSON ──
|
||||||
|
json_output = {
# Local time with its UTC offset; the previous "Z" suffix labelled local time as UTC
"generated_at": datetime.now().astimezone().isoformat(timespec="seconds"),
|
||||||
|
"week_of": week_str,
|
||||||
|
"domain": YOUR_DOMAIN,
|
||||||
|
"topic_fingerprint": dict(topic_counts.most_common(20)),
|
||||||
|
"all_keywords": all_scored,
|
||||||
|
"competitor_gaps": competitor_gaps[:30],
|
||||||
|
"decaying_pages": decaying[:20],
|
||||||
|
"summary": {
|
||||||
|
"total_keywords": len(all_scored),
|
||||||
|
"competitor_gaps": len(competitor_gaps),
|
||||||
|
"decaying_pages": len(decaying),
|
||||||
|
"bofu": bofu_count,
|
||||||
|
"mofu": mofu_count,
|
||||||
|
"tofu": tofu_count,
|
||||||
|
"auto_create": auto_count,
|
||||||
|
"auto_refresh": refresh_count,
|
||||||
|
"surging": len(surging),
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
output_path = OUTPUT_DIR / "content-attack-brief-latest.json"
|
||||||
|
output_path.write_text(json.dumps(json_output, indent=2))
|
||||||
|
print(f"\n✅ JSON saved to {output_path}", file=sys.stderr)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
131
seo-ops/gsc_auth.py
Normal file
|
|
@ -0,0 +1,131 @@
|
||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Google Search Console OAuth Setup
|
||||||
|
|
||||||
|
One-time authentication flow for GSC API access.
|
||||||
|
Opens a browser for Google Sign-In, exchanges the auth code for a token,
|
||||||
|
and saves the token locally for use by gsc_client.py.
|
||||||
|
|
||||||
|
Prerequisites:
|
||||||
|
1. Create a Google Cloud project with Search Console API enabled
|
||||||
|
2. Create OAuth 2.0 credentials (Desktop application type)
|
||||||
|
3. Set GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET env vars
|
||||||
|
OR set GOOGLE_CREDENTIALS_FILE to a JSON file with client_id/client_secret
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python gsc_auth.py
|
||||||
|
"""
|
||||||
|
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import webbrowser
|
||||||
|
from http.server import HTTPServer, BaseHTTPRequestHandler
|
||||||
|
from urllib.parse import urlparse, parse_qs
|
||||||
|
import requests
|
||||||
|
|
||||||
|
# Configuration
|
||||||
|
CLIENT_ID = os.environ.get("GOOGLE_CLIENT_ID", "")
|
||||||
|
CLIENT_SECRET = os.environ.get("GOOGLE_CLIENT_SECRET", "")
|
||||||
|
REDIRECT_URI = os.environ.get("GSC_REDIRECT_URI", "http://localhost:8765")
|
||||||
|
SCOPES = "https://www.googleapis.com/auth/webmasters.readonly"
|
||||||
|
TOKEN_FILE = os.environ.get("GSC_TOKEN_FILE", os.path.join(os.path.dirname(__file__), ".gsc-token.json"))
|
||||||
|
|
||||||
|
# Try loading from credentials file if env vars not set
|
||||||
|
CREDS_FILE = os.environ.get("GOOGLE_CREDENTIALS_FILE", "")
|
||||||
|
if CREDS_FILE and os.path.exists(CREDS_FILE) and (not CLIENT_ID or not CLIENT_SECRET):
|
||||||
|
with open(CREDS_FILE) as f:
|
||||||
|
creds = json.load(f)
|
||||||
|
CLIENT_ID = CLIENT_ID or creds.get("client_id", "")
|
||||||
|
CLIENT_SECRET = CLIENT_SECRET or creds.get("client_secret", "")
|
||||||
|
|
||||||
|
if not CLIENT_ID or not CLIENT_SECRET:
|
||||||
|
print("ERROR: Google OAuth credentials required.")
|
||||||
|
print("Set GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET environment variables,")
|
||||||
|
print("or set GOOGLE_CREDENTIALS_FILE to a JSON file with client_id/client_secret.")
|
||||||
|
print("\nTo create credentials:")
|
||||||
|
print(" 1. Go to https://console.cloud.google.com/apis/credentials")
|
||||||
|
print(" 2. Create OAuth 2.0 Client ID (Desktop application)")
|
||||||
|
print(" 3. Download the JSON and set GOOGLE_CREDENTIALS_FILE, or copy the values")
|
||||||
|
raise SystemExit(1)
|
||||||
|
|
||||||
|
auth_code = None
|
||||||
|
|
||||||
|
class CallbackHandler(BaseHTTPRequestHandler):
|
||||||
|
def do_GET(self):
|
||||||
|
global auth_code
|
||||||
|
query = parse_qs(urlparse(self.path).query)
|
||||||
|
auth_code = query.get("code", [None])[0]
|
||||||
|
self.send_response(200)
|
||||||
|
self.send_header("Content-Type", "text/html")
|
||||||
|
self.end_headers()
|
||||||
|
self.wfile.write(b"<h1>GSC Authorized! You can close this tab.</h1>")
|
||||||
|
def log_message(self, *args):
|
||||||
|
pass
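The callback handler reads the authorization code from the query string of the redirected request. A small sketch of that parsing step on its own, with a hypothetical helper name `extract_code` and made-up request paths:

```python
from urllib.parse import urlparse, parse_qs


def extract_code(path):
    """Return the `code` query parameter from a request path, or None."""
    query = parse_qs(urlparse(path).query)
    return query.get("code", [None])[0]


# parse_qs percent-decodes values, so "%2F" comes back as "/".
granted = extract_code("/?code=4%2FabcDEF&scope=example")
denied = extract_code("/?error=access_denied")
```

When the user declines consent, the redirect carries `error` instead of `code`, which is why the script checks for a missing code before exchanging it.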
|
||||||
|
|
||||||
|
# Build auth URL (urlencode escapes the redirect URI and scope, which contain ":" and "/")
from urllib.parse import urlencode

auth_url = "https://accounts.google.com/o/oauth2/v2/auth?" + urlencode({
    "client_id": CLIENT_ID,
    "redirect_uri": REDIRECT_URI,
    "response_type": "code",
    "scope": SCOPES,
    "access_type": "offline",
    "prompt": "consent",
})
|
||||||
|
|
||||||
|
print("Opening browser for Google Sign-In...")
|
||||||
|
print(f"If the browser doesn't open, visit:\n{auth_url}\n")
|
||||||
|
webbrowser.open(auth_url)
|
||||||
|
|
||||||
|
# Wait for callback
|
||||||
|
port = urlparse(REDIRECT_URI).port or 80  # also works when the URI has a trailing path
|
||||||
|
server = HTTPServer(("localhost", port), CallbackHandler)
|
||||||
|
server.handle_request()
|
||||||
|
|
||||||
|
if not auth_code:
|
||||||
|
print("ERROR: No auth code received")
|
||||||
|
raise SystemExit(1)
|
||||||
|
|
||||||
|
# Exchange code for token
|
||||||
|
print("Exchanging code for token...")
|
||||||
|
resp = requests.post("https://oauth2.googleapis.com/token", data={
|
||||||
|
"code": auth_code,
|
||||||
|
"client_id": CLIENT_ID,
|
||||||
|
"client_secret": CLIENT_SECRET,
|
||||||
|
"redirect_uri": REDIRECT_URI,
|
||||||
|
"grant_type": "authorization_code"
|
||||||
|
})
|
||||||
|
|
||||||
|
if resp.status_code != 200:
|
||||||
|
print(f"ERROR: Token exchange failed — {resp.text}")
|
||||||
|
raise SystemExit(1)
|
||||||
|
|
||||||
|
token_data = resp.json()
|
||||||
|
with open(TOKEN_FILE, "w") as f:
|
||||||
|
json.dump(token_data, f, indent=2)
|
||||||
|
os.chmod(TOKEN_FILE, 0o600)
|
||||||
|
|
||||||
|
print(f"✅ GSC token saved to {TOKEN_FILE}")
|
||||||
|
|
||||||
|
# Quick verification
|
||||||
|
try:
|
||||||
|
from google.oauth2.credentials import Credentials
|
||||||
|
from googleapiclient.discovery import build
|
||||||
|
|
||||||
|
cred = Credentials(
|
||||||
|
token=token_data["access_token"],
|
||||||
|
refresh_token=token_data.get("refresh_token"),
|
||||||
|
token_uri="https://oauth2.googleapis.com/token",
|
||||||
|
client_id=CLIENT_ID,
|
||||||
|
client_secret=CLIENT_SECRET,
|
||||||
|
)
|
||||||
|
service = build("searchconsole", "v1", credentials=cred)
|
||||||
|
sites = service.sites().list().execute()
|
||||||
|
site_urls = [s["siteUrl"] for s in sites.get("siteEntry", [])]
|
||||||
|
print(f"✅ GSC connected! Verified sites: {site_urls}")
|
||||||
|
print(f"\nSet GSC_SITE_URL to one of the above, e.g.:")
|
||||||
|
if site_urls:
|
||||||
|
print(f' export GSC_SITE_URL="{site_urls[0]}"')
|
||||||
|
except Exception as e:
|
||||||
|
print(f"⚠️ Token saved but verification failed: {e}")
|
||||||
|
print("The token should still work — try running gsc_client.py")
|
||||||
250
seo-ops/gsc_client.py
Normal file
|
|
@ -0,0 +1,250 @@
#!/usr/bin/env python3
"""
Google Search Console API Client

Direct API access via google-api-python-client. The access token is refreshed
once, when the API service is first built, and written back to the token file.

Usage as library:
    from gsc_client import GSCClient
    gsc = GSCClient()  # uses the GSC_SITE_URL env var
    gsc = GSCClient(site_url="https://www.example.com/")

    # Top queries
    rows = gsc.query(dimensions=["query"], row_limit=25, days=28)

    # Page performance
    rows = gsc.query(dimensions=["page"], row_limit=100, days=7)

    # Striking distance keywords (positions 4-20)
    rows = gsc.striking_distance(days=28)

    # List all verified sites
    sites = gsc.list_sites()

Usage as CLI:
    python gsc_client.py --queries 25 --days 28
    python gsc_client.py --pages 100 --days 7
    python gsc_client.py --striking
    python gsc_client.py --sites
    python gsc_client.py --site "https://www.example.com/" --queries 10
    python gsc_client.py --raw '{"dimensions":["query","page"],"rowLimit":5}'
"""

import json, os, sys, argparse
from datetime import datetime, timedelta

# Configuration via environment variables
GSC_SITE_URL = os.environ.get("GSC_SITE_URL", "")
GSC_TOKEN_FILE = os.environ.get("GSC_TOKEN_FILE", os.path.join(os.path.dirname(__file__), ".gsc-token.json"))
GOOGLE_CREDENTIALS_FILE = os.environ.get("GOOGLE_CREDENTIALS_FILE", "")

class GSCClient:
    def __init__(self, site_url=None, token_file=None, creds_file=None):
        self.site_url = site_url or GSC_SITE_URL
        if not self.site_url:
            raise ValueError(
                "GSC site URL required. Set GSC_SITE_URL env var or pass site_url parameter.\n"
                "Example: GSC_SITE_URL='https://www.example.com/'"
            )
        self.token_file = token_file or GSC_TOKEN_FILE
        self.creds_file = creds_file or GOOGLE_CREDENTIALS_FILE
        self._service = None

    def _get_service(self):
        if self._service:
            return self._service

        from google.oauth2.credentials import Credentials
        from google.auth.transport.requests import Request
        from googleapiclient.discovery import build

        if not os.path.exists(self.token_file):
            raise FileNotFoundError(
                f"GSC token file not found: {self.token_file}\n"
                "Run gsc_auth.py first to authenticate with Google Search Console."
            )

        with open(self.token_file) as f:
            token_data = json.load(f)

        # Build credentials: client ID/secret come from env vars first, then from the creds file
        client_id = os.environ.get("GOOGLE_CLIENT_ID", "")
        client_secret = os.environ.get("GOOGLE_CLIENT_SECRET", "")

        if self.creds_file and os.path.exists(self.creds_file):
            with open(self.creds_file) as f:
                creds_data = json.load(f)
            client_id = client_id or creds_data.get("client_id", "")
            client_secret = client_secret or creds_data.get("client_secret", "")

        if not client_id or not client_secret:
            raise ValueError(
                "Google OAuth credentials required. Set GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET "
                "env vars, or set GOOGLE_CREDENTIALS_FILE to a JSON file with client_id/client_secret."
            )

        cred = Credentials(
            token=token_data.get("access_token"),
            refresh_token=token_data.get("refresh_token"),
            token_uri="https://oauth2.googleapis.com/token",
            client_id=client_id,
            client_secret=client_secret,
            scopes=["https://www.googleapis.com/auth/webmasters.readonly"],
        )

        # Always refresh to ensure valid token
        cred.refresh(Request())
        token_data["access_token"] = cred.token
        with open(self.token_file, "w") as f:
            json.dump(token_data, f, indent=2)

        self._service = build("searchconsole", "v1", credentials=cred)
        return self._service

    def list_sites(self):
        """List all verified Search Console sites."""
        service = self._get_service()
        result = service.sites().list().execute()
        return result.get("siteEntry", [])

    def query(self, dimensions=None, row_limit=25, days=28, start_date=None,
              end_date=None, filters=None, search_type="web", data_state="final"):
        """
        Query Search Console analytics.

        Args:
            dimensions: list of "query", "page", "device", "country", "date", "searchAppearance"
            row_limit: max rows (API max 25000)
            days: lookback window (ignored if start_date/end_date provided)
            start_date: "YYYY-MM-DD" (inclusive)
            end_date: "YYYY-MM-DD" (inclusive)
            filters: list of {"dimension": str, "operator": str, "expression": str}
            search_type: "web", "image", "video", "news", "discover", "googleNews"
            data_state: "final" or "all" (all includes fresh/unfinalized data)

        Returns:
            list of row dicts with keys, clicks, impressions, ctr, position
        """
        service = self._get_service()

        if not end_date:
            end_date = (datetime.now() - timedelta(days=3)).strftime("%Y-%m-%d")
        if not start_date:
            start_date = (datetime.now() - timedelta(days=days + 2)).strftime("%Y-%m-%d")

        body = {
            "startDate": start_date,
            "endDate": end_date,
            "dimensions": dimensions or ["query"],
            "rowLimit": min(row_limit, 25000),
            "type": search_type,
            "dataState": data_state,
        }

        if filters:
            body["dimensionFilterGroups"] = [{"filters": filters}]

        result = service.searchanalytics().query(
            siteUrl=self.site_url, body=body
        ).execute()

        return result.get("rows", [])

    def top_queries(self, n=25, days=28, **kwargs):
        """Convenience: top N queries by clicks."""
        return self.query(dimensions=["query"], row_limit=n, days=days, **kwargs)

    def top_pages(self, n=100, days=28, **kwargs):
        """Convenience: top N pages by clicks."""
        return self.query(dimensions=["page"], row_limit=n, days=days, **kwargs)

    def query_page_matrix(self, n=1000, days=28, **kwargs):
        """Get query+page combos for cannibalization analysis."""
        return self.query(dimensions=["query", "page"], row_limit=n, days=days, **kwargs)

    def daily_trend(self, days=28, **kwargs):
        """Daily clicks/impressions trend."""
        return self.query(dimensions=["date"], row_limit=days, days=days, **kwargs)

    def device_split(self, days=28, **kwargs):
        """Traffic by device type."""
        return self.query(dimensions=["device"], row_limit=10, days=days, **kwargs)

    def country_split(self, n=25, days=28, **kwargs):
        """Traffic by country."""
        return self.query(dimensions=["country"], row_limit=n, days=days, **kwargs)

    def striking_distance(self, days=28, min_position=4, max_position=20, min_impressions=50):
        """Find queries in striking distance (positions 4-20 with decent impressions)."""
        rows = self.query(dimensions=["query"], row_limit=5000, days=days)
        return [
            r for r in rows
            if min_position <= r["position"] <= max_position
            and r["impressions"] >= min_impressions
        ]


def main():
    parser = argparse.ArgumentParser(description="Google Search Console CLI")
    parser.add_argument("--site", default=GSC_SITE_URL, help="Site URL (or set GSC_SITE_URL env var)")
    parser.add_argument("--queries", type=int, help="Top N queries")
    parser.add_argument("--pages", type=int, help="Top N pages")
    parser.add_argument("--days", type=int, default=28, help="Lookback days")
    parser.add_argument("--striking", action="store_true", help="Striking distance queries (pos 4-20)")
    parser.add_argument("--trend", action="store_true", help="Daily trend")
    parser.add_argument("--devices", action="store_true", help="Device split")
    parser.add_argument("--countries", type=int, help="Top N countries")
    parser.add_argument("--sites", action="store_true", help="List all verified sites")
    parser.add_argument("--raw", help="Raw query body as JSON")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    args = parser.parse_args()

    gsc = GSCClient(site_url=args.site)

    if args.sites:
        sites = gsc.list_sites()
        if args.json:
            print(json.dumps(sites, indent=2))
        else:
            print(f"{'Site URL':<60} {'Permission':<20}")
            print("-" * 80)
            for s in sorted(sites, key=lambda x: x["siteUrl"]):
                print(f"{s['siteUrl']:<60} {s['permissionLevel']:<20}")
        return

    if args.raw:
        body = json.loads(args.raw)
        # query() takes snake_case kwargs; map the API-style keys used in the docstring example
        aliases = {"rowLimit": "row_limit", "startDate": "start_date", "endDate": "end_date",
                   "type": "search_type", "dataState": "data_state"}
        rows = gsc.query(**{aliases.get(k, k): v for k, v in body.items()})
    elif args.queries:
        rows = gsc.top_queries(n=args.queries, days=args.days)
    elif args.pages:
        rows = gsc.top_pages(n=args.pages, days=args.days)
    elif args.striking:
        rows = gsc.striking_distance(days=args.days)
    elif args.trend:
        rows = gsc.daily_trend(days=args.days)
    elif args.devices:
        rows = gsc.device_split(days=args.days)
    elif args.countries:
        rows = gsc.country_split(n=args.countries, days=args.days)
    else:
        rows = gsc.top_queries(n=25, days=args.days)

    if args.json:
        print(json.dumps(rows, indent=2))
    else:
        if not rows:
            print("No data returned.")
            return
        dims = rows[0]["keys"]
        dim_count = len(dims)
        print(f"{'|'.join(f'Dim{i+1}' for i in range(dim_count)):<60} {'Clicks':>8} {'Impr':>10} {'CTR':>8} {'Pos':>6}")
        print("-" * 95)
        for r in rows:
            key_str = " | ".join(str(k)[:40] for k in r["keys"])
            print(f"{key_str:<60} {r['clicks']:>8} {r['impressions']:>10} {r['ctr']:>7.1%} {r['position']:>6.1f}")


if __name__ == "__main__":
    main()
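The date arithmetic in `GSCClient.query()` and the filter in `striking_distance()` can be checked without calling the API. The sketch below reproduces both as pure functions on sample data. The function names, the `lag_days` parameter, and the sample rows are assumptions made for illustration; the real client calls `datetime.now()` and uses a fixed three-day offset.

```python
from datetime import date, timedelta


def lookback_window(days, today, lag_days=3):
    """Return (start_date, end_date) strings for an inclusive window of `days` days.

    Same arithmetic as GSCClient.query(): the window ends `lag_days` before
    `today`. `today` is passed in so the result is reproducible.
    """
    end = today - timedelta(days=lag_days)
    start = today - timedelta(days=days + lag_days - 1)
    return start.isoformat(), end.isoformat()


def striking_distance(rows, min_position=4, max_position=20, min_impressions=50):
    """Keep rows ranked between min_position and max_position with enough impressions."""
    return [
        r for r in rows
        if min_position <= r["position"] <= max_position
        and r["impressions"] >= min_impressions
    ]


# Made-up rows in the shape returned by searchanalytics.query
sample_rows = [
    {"keys": ["ai seo tools"], "clicks": 12, "impressions": 900, "ctr": 0.013, "position": 7.4},
    {"keys": ["brand name"], "clicks": 300, "impressions": 1200, "ctr": 0.25, "position": 1.2},
    {"keys": ["rare query"], "clicks": 0, "impressions": 10, "ctr": 0.0, "position": 12.0},
]

if __name__ == "__main__":
    print(lookback_window(28, date(2026, 1, 31)))  # ('2026-01-01', '2026-01-28')
    print([r["keys"][0] for r in striking_distance(sample_rows)])  # ['ai seo tools']
```

With `days=28` the window spans exactly 28 calendar days because both endpoints are inclusive, which is why the start offset is `days + 2` in the client rather than `days + 3`.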
7
seo-ops/requirements.txt
Normal file
@ -0,0 +1,7 @@
# Core dependencies
requests>=2.28.0

# Google Search Console API
google-api-python-client>=2.100.0
google-auth>=2.23.0
google-auth-httplib2>=0.1.1
440
seo-ops/trend_scout.py
Normal file
@ -0,0 +1,440 @@
#!/usr/bin/env python3
"""
Trend Scout — Multi-source trend detection for content marketing.

Scans Google Trends, Hacker News, Reddit, and X/Twitter to find trending
topics in your niche before they peak. Scores each trend for relevance
to your configured content verticals and suggests content angles.

Usage:
    python trend_scout.py

Environment variables:
    CONTENT_VERTICALS — Comma-separated topic verticals (default: marketing-focused set)
    TREND_SUBREDDITS — Comma-separated subreddits to monitor
    BRAVE_API_KEY — Brave Search API key (enables X/Twitter scanning)
    OUTPUT_DIR — Where to save output files (default: ./output)
    HIGH_RELEVANCE_KEYWORDS_JSON — JSON array that replaces the high-relevance keyword list
"""

import json
import os
import sys
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta
from pathlib import Path

# ─────────────────────────────────────────────
# Config
# ─────────────────────────────────────────────
OUTPUT_DIR = Path(os.environ.get("OUTPUT_DIR", "./output"))
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

# Content verticals — what topics are relevant to you?
# Override with CONTENT_VERTICALS env var (comma-separated)
DEFAULT_VERTICALS = [
    "AI marketing automation",
    "AI agents for business",
    "SEO trends",
    "content marketing AI",
    "marketing agency transformation",
    "programmatic SEO",
    "AI search optimization AEO",
    "startup growth strategy",
]

_verticals_env = os.environ.get("CONTENT_VERTICALS", "")
VERTICALS = [v.strip() for v in _verticals_env.split(",") if v.strip()] if _verticals_env else DEFAULT_VERTICALS

# Subreddits to monitor
DEFAULT_SUBREDDITS = ["marketing", "SEO", "startups", "entrepreneur", "artificial", "digitalmarketing"]
_subs_env = os.environ.get("TREND_SUBREDDITS", "")
SUBREDDITS = [s.strip() for s in _subs_env.split(",") if s.strip()] if _subs_env else DEFAULT_SUBREDDITS

BRAVE_API_KEY = os.environ.get("BRAVE_API_KEY", "")

# ─────────────────────────────────────────────
# Relevance scoring keywords
# Customize these for your niche
# ─────────────────────────────────────────────
HIGH_RELEVANCE_KEYWORDS = [
    "ai marketing", "seo", "ai agent", "marketing agency", "content marketing",
    "programmatic seo", "founder", "startup growth", "saas", "ai search",
    "ai automation", "marketing automation", "creator economy",
    "digital marketing agency", "ai seo",
]

MEDIUM_RELEVANCE_KEYWORDS = [
    "ai", "marketing", "google", "search", "business", "revenue",
    "growth", "startup", "entrepreneur", "automation", "llm", "gpt",
    "chatgpt", "social media", "advertising", "content",
    "digital marketing",
]

LOW_RELEVANCE_KEYWORDS = [
    "tech", "digital", "platform", "data", "analytics", "strategy",
]

# Override the high-relevance list with HIGH_RELEVANCE_KEYWORDS_JSON (a JSON array).
# Only this list has an env override; edit the medium/low lists above directly.
_high_env = os.environ.get("HIGH_RELEVANCE_KEYWORDS_JSON")
if _high_env:
    try:
        HIGH_RELEVANCE_KEYWORDS = json.loads(_high_env)
    except json.JSONDecodeError:
        pass  # invalid JSON: keep the defaults


# ─────────────────────────────────────────────
# Data Sources
# ─────────────────────────────────────────────

def get_google_trends():
    """Pull trending searches from Google Trends RSS."""
    url = "https://trends.google.com/trending/rss?geo=US"
    try:
        req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urllib.request.urlopen(req, timeout=15) as response:
            data = response.read().decode("utf-8")

        root = ET.fromstring(data)
        ns = {"ht": "https://trends.google.com/trending/rss"}
        trends = []
        for item in root.findall(".//item")[:20]:
            title = item.find("title")
            traffic = item.find("ht:approx_traffic", ns)
            news_items = item.findall("ht:news_item", ns)

            news_titles = []
            news_urls = []
            for ni in news_items[:2]:
                nt = ni.find("ht:news_item_title", ns)
                nu = ni.find("ht:news_item_url", ns)
                if nt is not None:
                    news_titles.append(nt.text)
                if nu is not None:
                    news_urls.append(nu.text)

            trends.append({
                "topic": title.text if title is not None else "Unknown",
                "traffic": traffic.text if traffic is not None else "N/A",
                "news_titles": news_titles,
                "news_urls": news_urls,
            })
        return trends
    except Exception as e:
        print(f"⚠️ Google Trends fetch failed: {e}")
        return []


def get_hackernews_top():
    """Pull top HN stories filtered for relevance."""
    try:
        url = "https://hacker-news.firebaseio.com/v0/topstories.json"
        req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urllib.request.urlopen(req, timeout=10) as response:
            ids = json.loads(response.read().decode("utf-8"))[:30]

        stories = []
        # Use the individual words of the high and medium relevance keywords for filtering
        keywords = set()
        for kw in HIGH_RELEVANCE_KEYWORDS + MEDIUM_RELEVANCE_KEYWORDS:
            keywords.update(kw.lower().split())

        for story_id in ids:
            try:
                surl = f"https://hacker-news.firebaseio.com/v0/item/{story_id}.json"
                sreq = urllib.request.Request(surl, headers={"User-Agent": "Mozilla/5.0"})
                with urllib.request.urlopen(sreq, timeout=5) as sr:
                    story = json.loads(sr.read().decode("utf-8"))

                title = story.get("title", "").lower()
                if any(kw in title for kw in keywords):
                    stories.append({
                        "title": story.get("title"),
                        "url": story.get("url", f"https://news.ycombinator.com/item?id={story_id}"),
                        "score": story.get("score", 0),
                        "comments": story.get("descendants", 0),
                    })
            except Exception:
                # Skip stories that fail to load; a bare except would also swallow Ctrl-C
                continue

            if len(stories) >= 10:
                break

        return stories
    except Exception as e:
        print(f"⚠️ HN fetch failed: {e}")
        return []


def get_reddit_trending():
    """Pull trending posts from configured subreddits."""
    posts = []

    for sub in SUBREDDITS:
        try:
            url = f"https://www.reddit.com/r/{sub}/hot.json?limit=5"
            req = urllib.request.Request(url, headers={"User-Agent": "TrendScout/1.0"})
            with urllib.request.urlopen(req, timeout=10) as response:
                data = json.loads(response.read().decode("utf-8"))

            for child in data.get("data", {}).get("children", []):
                post = child.get("data", {})
                if post.get("score", 0) > 50:
                    posts.append({
                        "title": post.get("title"),
                        "subreddit": sub,
                        "score": post.get("score"),
                        "comments": post.get("num_comments"),
                        "url": f"https://reddit.com{post.get('permalink', '')}",
                    })
        except Exception as e:
            print(f"⚠️ Reddit r/{sub} failed: {e}")
            continue

    posts.sort(key=lambda x: x.get("score", 0), reverse=True)
    return posts[:10]


def get_x_twitter_trending():
    """Pull trending X/Twitter discussions via Brave Search."""
    if not BRAVE_API_KEY:
        print(" ⚠️ No BRAVE_API_KEY — skipping X/Twitter scan")
        return []

    # Build search queries from your verticals
    queries = []
    for vertical in VERTICALS[:4]:
        queries.append(f'site:twitter.com OR site:x.com "{vertical}"')

    posts = []
    for query in queries:
        try:
            encoded_q = urllib.parse.quote(query)
            url = f"https://api.search.brave.com/res/v1/web/search?q={encoded_q}&count=5&freshness=pd"
            # No "Accept-Encoding: gzip" header: urllib does not decompress responses,
            # so a gzipped body would fail the UTF-8 decode below.
            req = urllib.request.Request(url, headers={
                "Accept": "application/json",
                "X-Subscription-Token": BRAVE_API_KEY,
            })
            with urllib.request.urlopen(req, timeout=10) as response:
                data = json.loads(response.read().decode("utf-8"))

            for result in data.get("web", {}).get("results", []):
                if "twitter.com" in result.get("url", "") or "x.com" in result.get("url", ""):
                    posts.append({
                        "title": result.get("title", ""),
                        "url": result.get("url", ""),
                        "description": result.get("description", "")[:200],
                        "source": "X/Twitter",
                        "query": query.replace("site:twitter.com OR site:x.com ", ""),
                    })
        except Exception as e:
            print(f" ⚠️ X search failed: {e}")
            continue

    return posts[:10]


# ─────────────────────────────────────────────
# Scoring & Analysis
# ─────────────────────────────────────────────

def score_trend(trend_title):
    """Score how relevant a trend is to your content verticals (0-100)."""
    title_lower = trend_title.lower()
    score = 0

    for kw in HIGH_RELEVANCE_KEYWORDS:
        if kw in title_lower:
            score += 25
    for kw in MEDIUM_RELEVANCE_KEYWORDS:
        if kw in title_lower:
            score += 10
    for kw in LOW_RELEVANCE_KEYWORDS:
        if kw in title_lower:
            score += 5

    return min(score, 100)


def generate_content_angles(trends_data):
    """Generate content angle suggestions based on trends."""
    angles = []

    for trend in trends_data.get("google_trends", [])[:5]:
        relevance = score_trend(trend["topic"])
        if relevance >= 20:
            angles.append({
                "source": "Google Trends",
                "topic": trend["topic"],
                "traffic": trend["traffic"],
                "relevance_score": relevance,
                "angle_suggestion": f"Your take on '{trend['topic']}' — tie to your niche angle",
                "platforms": ["X", "LinkedIn", "Short-form video"],
            })

    for story in trends_data.get("hackernews", [])[:5]:
        relevance = score_trend(story["title"])
        if relevance >= 15:
            angles.append({
                "source": "Hacker News",
                "topic": story["title"],
                "score": story["score"],
                "relevance_score": relevance,
                "url": story["url"],
                "angle_suggestion": f"Expert perspective on '{story['title']}'",
                "platforms": ["X", "YouTube", "LinkedIn"],
            })

    for post in trends_data.get("reddit", [])[:5]:
        relevance = score_trend(post["title"])
        if relevance >= 15:
            angles.append({
                "source": f"Reddit r/{post['subreddit']}",
                "topic": post["title"],
                "engagement": f"{post['score']} upvotes, {post['comments']} comments",
                "relevance_score": relevance,
                "url": post["url"],
                "angle_suggestion": "Address this conversation from your expertise",
                "platforms": ["X", "LinkedIn", "Short-form video"],
            })

    for post in trends_data.get("x_twitter", [])[:5]:
        relevance = score_trend(post["title"])
        if relevance >= 15:
            angles.append({
                "source": "X/Twitter",
                "topic": post["title"][:100],
                "relevance_score": relevance,
                "url": post.get("url", ""),
                "angle_suggestion": "Jump into this conversation with your take",
                "platforms": ["X", "LinkedIn"],
            })

    angles.sort(key=lambda x: x.get("relevance_score", 0), reverse=True)
    return angles[:10]


def format_output(trends_data, angles):
    """Format for human-readable markdown output."""
    today = datetime.now().strftime("%Y-%m-%d")

    lines = [f"# 🔥 Trend Scout — {today}\n"]

    if angles:
        lines.append("## Top Content Opportunities\n")
        for i, angle in enumerate(angles, 1):
            lines.append(f"### {i}. {angle['topic']}")
            lines.append(f"**Source:** {angle['source']} | **Relevance:** {angle['relevance_score']}/100")
            if angle.get("traffic"):
                lines.append(f"**Search volume:** {angle['traffic']}")
            if angle.get("engagement"):
                lines.append(f"**Engagement:** {angle['engagement']}")
            lines.append(f"**Angle:** {angle['angle_suggestion']}")
            lines.append(f"**Best for:** {', '.join(angle['platforms'])}")
            if angle.get("url"):
                lines.append(f"**Ref:** {angle['url']}")
            lines.append("")

    lines.append("## 📊 Raw Signals\n")

    gt = trends_data.get("google_trends", [])
    if gt:
        lines.append("**Google Trends (US):**")
        for t in gt[:8]:
            lines.append(f"- {t['topic']} ({t['traffic']})")
        lines.append("")

    hn = trends_data.get("hackernews", [])
    if hn:
        lines.append("**Hacker News (filtered):**")
        for s in hn[:5]:
            lines.append(f"- [{s['title']}]({s['url']}) — {s['score']}pts, {s['comments']} comments")
        lines.append("")

    rd = trends_data.get("reddit", [])
    if rd:
        lines.append("**Reddit Hot Posts:**")
        for p in rd[:5]:
            lines.append(f"- r/{p['subreddit']}: {p['title']} ({p['score']}↑)")
        lines.append("")

    xt = trends_data.get("x_twitter", [])
    if xt:
        lines.append("**X/Twitter Trending:**")
        for p in xt[:5]:
            lines.append(f"- [{p.get('query','')}] {p['title'][:80]}")
        lines.append("")

    return "\n".join(lines)


# ─────────────────────────────────────────────
# Main
# ─────────────────────────────────────────────

def main():
    print("🔥 Trend Scout starting...")
    print(f" Verticals: {', '.join(VERTICALS[:5])}{'...' if len(VERTICALS) > 5 else ''}")
    print(f" Subreddits: {', '.join(SUBREDDITS)}")
    print()

    # Gather signals
    print(" 📡 Fetching Google Trends...")
    google_trends = get_google_trends()

    print(" 📡 Fetching Hacker News...")
    hackernews = get_hackernews_top()

    print(" 📡 Fetching Reddit...")
    reddit = get_reddit_trending()

    print(" 📡 Fetching X/Twitter...")
    x_twitter = get_x_twitter_trending()

    trends_data = {
        "timestamp": datetime.now().isoformat(),
        "verticals": VERTICALS,
        "google_trends": google_trends,
        "hackernews": hackernews,
        "reddit": reddit,
        "x_twitter": x_twitter,
    }

    # Generate content angles
    print(" 🧠 Generating content angles...")
    angles = generate_content_angles(trends_data)

    # Save raw data (JSON)
    json_path = OUTPUT_DIR / "flash-trends-latest.json"
    with open(json_path, "w") as f:
        json.dump({"trends": trends_data, "angles": angles}, f, indent=2)
    print(f" 💾 Saved to {json_path}")

    # Save formatted output (Markdown)
    today = datetime.now().strftime("%Y-%m-%d")
    md_path = OUTPUT_DIR / f"flash-trends-{today}.md"
    formatted = format_output(trends_data, angles)
    # The report contains emoji, so write UTF-8 regardless of the platform default
    with open(md_path, "w", encoding="utf-8") as f:
        f.write(formatted)
    print(f" 📝 Saved to {md_path}")

    # Print summary
    print("\n✅ Trend Scout complete:")
    print(f" - Google Trends: {len(google_trends)} trends")
    print(f" - Hacker News: {len(hackernews)} relevant stories")
    print(f" - Reddit: {len(reddit)} hot posts")
    print(f" - X/Twitter: {len(x_twitter)} discussions")
    print(f" - Content angles: {len(angles)} opportunities")

    if angles:
        print("\n🎯 Top 3 angles:")
        for i, a in enumerate(angles[:3], 1):
            print(f" {i}. [{a['relevance_score']}/100] {a['topic']} ({a['source']})")

    return 0


if __name__ == "__main__":
    sys.exit(main())
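`score_trend()` matches keywords with a plain substring check (`kw in title_lower`), so a short keyword such as "ai" also matches inside words like "email" or "campaign". A minimal sketch of a word-boundary variant follows, assuming whole-word matching is what is wanted. The function names and the shortened keyword lists are hypothetical stand-ins and not part of `trend_scout.py`; the weights 25/10/5 and the cap at 100 are taken from the original function.

```python
import re

# Shortened stand-ins for the keyword lists in trend_scout.py
HIGH = ["ai marketing", "seo", "ai agent"]
MEDIUM = ["ai", "marketing", "google"]
LOW = ["data", "strategy"]


def contains_word(keyword, text):
    """True if keyword occurs in text as a whole word or phrase, not inside another word."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text) is not None


def score_trend_words(title):
    """Score a title 0-100 using the same weights as score_trend, with whole-word matching."""
    text = title.lower()
    score = 0
    for weight, keywords in ((25, HIGH), (10, MEDIUM), (5, LOW)):
        score += weight * sum(contains_word(kw, text) for kw in keywords)
    return min(score, 100)


if __name__ == "__main__":
    print("ai" in "email retention campaign")                    # True: substring match
    print(score_trend_words("Email retention campaign"))         # 0
    print(score_trend_words("AI marketing strategy for SEO"))    # 75
```

Whole-word matching trades recall for precision: it no longer counts "ai" inside unrelated words, but it also stops matching inflected forms such as "marketing's" only where a word boundary is absent, so keyword lists may need plural or variant entries.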