Alfred Claw 64d5dd430c Wire telemetry preamble into all 10 SKILL.md files + sanitizer allow_patterns

- All 10 SKILL.md files now run version_check + telemetry_init on start
- Sanitizer now supports allow_patterns from config (for example emails, docs)
- Config updated with safe patterns for documentation content
- Fixed example email to use example.com domain

2026-03-31 10:00:33 -07:00


name	description
podcast-pipeline	Podcast-to-Everything content pipeline. Takes a podcast RSS feed or raw transcript and generates a full cross-platform content calendar: short-form video clips, Twitter/X threads, LinkedIn articles, newsletter sections, quote cards, blog outlines with SEO keywords, and YouTube Shorts/TikTok scripts. Scores each piece by viral potential (a weighted blend of novelty, controversy, and utility) and deduplicates against recent output. Use when asked to: "repurpose this podcast", "turn this episode into content", "podcast content calendar", "extract clips from this episode", "podcast to social", "content from RSS feed", "batch process episodes", or any request to turn podcast/audio content into a multi-platform content plan.


Preamble (runs on skill start)

# Version check (silent if up to date)
python3 telemetry/version_check.py 2>/dev/null || true

# Telemetry opt-in (first run only, then remembers your choice)
python3 telemetry/telemetry_init.py 2>/dev/null || true

Privacy: This skill logs usage locally to ~/.ai-marketing-skills/analytics/. Remote telemetry is opt-in only. No code, file paths, or repo content is ever collected. See telemetry/README.md.

Podcast-to-Everything Pipeline

Turns podcast episodes into a full content calendar across every platform. One episode in, 15-20 content pieces out — scored, deduplicated, and scheduled.

Step 1: Ingest — Get the Transcript

Determine the input source and obtain a clean transcript.

Option A: RSS Feed (`--rss <url>`)

Fetch the RSS feed XML
Extract the latest episode's audio URL (or use `--episodes N` for batch)
Download the audio file
Transcribe via OpenAI Whisper API (with timestamps)
Store transcript with episode metadata (title, date, description, duration)
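The feed-parsing part of Option A can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function name `latest_audio_urls` and the sample feed are hypothetical, and it assumes the feed lists episodes newest first and carries audio in standard RSS `<enclosure>` elements. Downloading and transcription are left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def latest_audio_urls(rss_xml: str, episodes: int = 1) -> List[Dict[str, str]]:
    """Return title, publish date, and audio URL for the first `episodes` items.

    Assumes the feed is ordered newest first (hypothetical helper).
    """
    root = ET.fromstring(rss_xml)
    results = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None or not enclosure.get("url"):
            continue  # skip items that carry no audio file
        results.append({
            "title": item.findtext("title", default="").strip(),
            "published": item.findtext("pubDate", default="").strip(),
            "audio_url": enclosure.get("url"),
        })
        if len(results) == episodes:
            break
    return results


# Illustrative feed only; the URLs are placeholders.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example Show</title>
  <item><title>Episode 2</title><pubDate>Mon, 15 Jan 2024 09:00:00 GMT</pubDate>
    <enclosure url="https://feeds.example.com/ep2.mp3" type="audio/mpeg" length="1"/></item>
  <item><title>Episode 1</title><pubDate>Mon, 08 Jan 2024 09:00:00 GMT</pubDate>
    <enclosure url="https://feeds.example.com/ep1.mp3" type="audio/mpeg" length="1"/></item>
</channel></rss>"""

print(latest_audio_urls(SAMPLE_FEED))
```

Passing `episodes=N` gives the batch behaviour described for `--episodes N`.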

Option B: Raw Transcript (`--transcript <file>`)

Read the transcript file (plain text, SRT, or VTT)
Parse timestamps if present
Extract episode metadata from filename or prompt user
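Timestamp parsing for SRT and VTT input might look like the sketch below. The helper name `parse_cues` is hypothetical, and the pattern only handles the `HH:MM:SS,mmm` / `HH:MM:SS.mmm` form; VTT cues that omit the hour field would need an extra case. Plain text without timestamps simply yields an empty list.

```python
import re
from typing import Dict, List

# Matches "00:00:01,000 --> 00:00:04,500" (SRT) and "00:00:01.000 --> 00:00:04.500" (VTT).
TIMESTAMP = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_cues(text: str) -> List[Dict]:
    """Split a subtitle file into cues with start/end seconds and text."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMESTAMP.search(line)
            if match:
                groups = match.groups()
                cues.append({
                    "start": _seconds(*groups[:4]),
                    "end": _seconds(*groups[4:]),
                    # Everything after the timing line belongs to the cue.
                    "text": " ".join(part.strip() for part in lines[i + 1:]),
                })
                break
    return cues


SAMPLE_SRT = """1
00:00:01,000 --> 00:00:04,500
Welcome to the show.

2
00:00:04,500 --> 00:00:09,250
Today we talk about pricing.
"""

for cue in parse_cues(SAMPLE_SRT):
    print(cue)
```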

Option C: Batch Mode (`--batch <rss_url> --episodes N`)

Fetch RSS feed
Extract the last N episodes
Process each through the full pipeline
Deduplicate across all episodes in the batch

Transcript cleanup

Remove filler words (um, uh, like, you know) for written content
Preserve original with timestamps for video clip suggestions
Split into logical segments by topic shift
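A rough sketch of the filler-word pass, assuming a simple regex approach (the function name `clean_for_writing` is hypothetical). It deliberately leaves out "like", and it will also strip legitimate uses of "you know", so a real implementation would likely need context-aware handling. The original transcript is never modified; the function returns a new string.

```python
import re

# Fillers plus the commas that usually surround them in transcripts.
# "like" is omitted on purpose: it is too often a real word.
FILLER = re.compile(r",?\s*\b(?:um+|uh+|you know)\b,?", re.IGNORECASE)


def clean_for_writing(transcript: str) -> str:
    """Return a copy of the transcript with common fillers removed."""
    cleaned = FILLER.sub("", transcript)
    cleaned = re.sub(r"[ \t]{2,}", " ", cleaned)  # collapse leftover double spaces
    return cleaned.strip()


raw = "Um, so we, uh, launched it, you know, last week."
print(clean_for_writing(raw))
```

The sketch does not re-capitalize a sentence whose first word was a filler.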

Step 2: Editorial Brain — Deep Analysis

Feed the full transcript to the LLM with this extraction framework:

Extract these content atoms:

Narrative Arcs — Complete story segments with setup → tension → resolution. Tag with start/end timestamps.
Quotable Moments — Punchy, shareable statements. One-liners that stand alone. Must pass the "would someone screenshot this?" test.
Controversial Takes — Opinions that go against conventional wisdom. The stuff that makes people reply "hard disagree" or "finally someone said it."
Data Points — Specific numbers, percentages, dollar amounts, timeframes. Concrete proof points that add credibility.
Stories — Personal anecdotes, case studies, client examples. Must have a character, a problem, and an outcome.
Frameworks — Step-by-step processes, mental models, decision matrices. Anything structured that people would save or bookmark.
Predictions — Forward-looking claims about trends, markets, technology. Hot takes about where things are going.

Output format per atom:

- Type: [narrative_arc | quote | controversial_take | data_point | story | framework | prediction]
- Content: [extracted text]
- Timestamp: [start - end, if available]
- Context: [what was being discussed]
- Viral Score: [0-100, see Step 4]
- Suggested platforms: [where this atom works best]
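One possible in-memory shape for an atom, sketched as a dataclass. The class name `ContentAtom` and the field names are assumptions derived from the output format above, not the schema of `atoms.json`; the fields are reordered only so that optional ones come last.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional

ATOM_TYPES = {
    "narrative_arc", "quote", "controversial_take",
    "data_point", "story", "framework", "prediction",
}


@dataclass
class ContentAtom:
    type: str
    content: str
    context: str
    viral_score: int
    timestamp: Optional[str] = None  # "start - end", if available
    suggested_platforms: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject values outside the documented type list and 0-100 score range.
        if self.type not in ATOM_TYPES:
            raise ValueError(f"unknown atom type: {self.type!r}")
        if not 0 <= self.viral_score <= 100:
            raise ValueError("viral_score must be between 0 and 100")


atom = ContentAtom(
    type="quote",
    content="Placeholder quote text.",
    context="Placeholder discussion topic.",
    viral_score=85,
    suggested_platforms=["twitter"],
)
print(asdict(atom))
```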

Step 3: Content Generation — One Episode, Many Pieces

For each episode, generate ALL of these from the extracted atoms:

3a. Short-Form Video Clips (3-5 per episode)

- Hook: [First 3 seconds — pattern interrupt or bold claim]
- Clip segment: [Timestamp range from transcript]
- Caption overlay: [Text for the screen]
- Platform: [YouTube Shorts / TikTok / Instagram Reels]
- Why it works: [What makes this clippable]

Prioritize: controversial takes > stories with payoffs > surprising data points

3b. Twitter/X Threads (2-3 per episode)

- Thread hook (tweet 1): [Curiosity gap or bold opener]
- Thread body (5-10 tweets): [Each tweet is one complete thought]
- Thread closer: [CTA — follow, reply, retweet trigger]
- Source atoms: [Which content atoms feed this thread]

Rules: No tweet over 280 chars. Each tweet must stand alone. Use data points as proof.
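The 280-character rule is easy to check mechanically. The sketch below uses Python's `len()`, which is only an approximation: the platform counts URLs and some characters differently, so treat the result as a first-pass filter. The function name `oversized_tweets` is hypothetical.

```python
from typing import List, Tuple


def oversized_tweets(thread: List[str], limit: int = 280) -> List[Tuple[int, int]]:
    """Return (tweet_number, length) for every tweet longer than `limit`."""
    return [
        (number, len(tweet))
        for number, tweet in enumerate(thread, start=1)
        if len(tweet) > limit
    ]


thread = ["a" * 280, "b" * 281, "Short closer with a CTA."]
print(oversized_tweets(thread))
```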

3c. LinkedIn Article Draft (1 per episode)

- Headline: [Specific, benefit-driven]
- Hook paragraph: [Before the "see more" fold — must earn the click]
- Body: [3-5 sections with headers, 800-1200 words]
- CTA: [Engagement driver — question, not link]
- Hashtags: [3-5 relevant, not spammy]

Voice: Professional but not corporate. First-person. Story-driven.

3d. Newsletter Section

- Section headline: [Scannable, specific]
- TL;DR: [One sentence, the core insight]
- Body: [3-5 bullet points, each with a takeaway]
- Pull quote: [The most shareable line from the episode]
- Link: [Back to full episode]

3e. Quote Cards (3-5 per episode)

- Quote text: [Max 20 words — must work as text overlay]
- Attribution: [Speaker name]
- Background suggestion: [Color/mood that matches the tone]
- Platform sizing: [1080x1080 for IG, 1200x675 for Twitter, 1080x1920 for Stories]

3f. Blog Post Outline (1 per episode)

- Title: [SEO-optimized, includes primary keyword]
- Primary keyword: [Search volume + difficulty estimate]
- Secondary keywords: [3-5 related terms]
- Meta description: [155 chars max]
- H2 sections: [5-7, each maps to a content atom]
- Internal linking opportunities: [Topics that connect to existing content]
- Estimated word count: [1500-2500]

3g. YouTube Shorts / TikTok Script (1 per episode)

- HOOK (0-3s): [Pattern interrupt — question, bold claim, or visual]
- SETUP (3-15s): [Context — why should they care]
- PAYOFF (15-45s): [The insight, data, or story resolution]
- CTA (45-60s): [Follow, comment prompt, or part 2 tease]
- On-screen text: [Key phrases to overlay]
- B-roll suggestions: [Visual ideas if not talking-head]

Step 4: Content Scoring — Viral Potential

Score every generated piece on three dimensions (each 0-100):

Dimension	What It Measures	Signals
Novelty	Is this new or surprising?	Contrarian takes, unexpected data, first-to-say
Controversy	Will people argue about this?	Strong opinions, challenges norms, picks a side
Utility	Can someone use this immediately?	Frameworks, how-tos, templates, specific numbers

Viral Score = (Novelty × 0.4) + (Controversy × 0.3) + (Utility × 0.3)

Score thresholds:

80+ → Priority publish. Schedule for peak engagement windows.
60-79 → Solid content. Fill the calendar.
40-59 → Filler. Use only if calendar has gaps.
Below 40 → Cut it. Not worth the publish slot.
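The formula and thresholds above translate directly into code. In this sketch the function names and tier labels are hypothetical; how scores between 79 and 80 are treated is an assumption (they fall into the 60-79 tier here).

```python
def viral_score(novelty: float, controversy: float, utility: float) -> float:
    """Weighted sum from Step 4; each input is expected to be 0-100."""
    return round(novelty * 0.4 + controversy * 0.3 + utility * 0.3, 1)


def tier(score: float) -> str:
    """Map a score to one of the four publish tiers."""
    if score >= 80:
        return "priority"
    if score >= 60:
        return "solid"
    if score >= 40:
        return "filler"
    return "cut"


# Worked example: 90 * 0.4 + 70 * 0.3 + 80 * 0.3 = 36 + 21 + 24 = 81
score = viral_score(novelty=90, controversy=70, utility=80)
print(score, tier(score))
```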

Step 5: Dedup Engine

Before finalizing, check all generated content against:

This batch — No two pieces should cover the same angle
Recent history — Compare against last N days of output (default: 30)
Similarity threshold — Flag any pair with >70% semantic overlap

Dedup rules:

If two pieces overlap >70%: keep the higher-scored one, cut the other
If a piece overlaps with recently published content: flag with ⚠️ and suggest a differentiation angle
Track all published content hashes in output/content_history.json
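The keep-the-higher-score rule can be sketched as below. Note the substitution: this uses word-set (Jaccard) overlap as a cheap stand-in for the semantic overlap described above, which would more likely be computed from embeddings in practice. The function names and the piece dictionaries are hypothetical.

```python
import re
from typing import Dict, List, Set


def _tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def overlap(a: str, b: str) -> float:
    """Jaccard overlap of word sets; a stand-in for semantic similarity."""
    tokens_a, tokens_b = _tokens(a), _tokens(b)
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def dedupe(pieces: List[Dict], threshold: float = 0.70) -> List[Dict]:
    """Keep the higher-scored piece from any pair overlapping above `threshold`."""
    kept: List[Dict] = []
    # Visiting pieces from highest to lowest score means a later piece can
    # only be dropped in favour of a better-scored one already kept.
    for piece in sorted(pieces, key=lambda p: p["viral_score"], reverse=True):
        if all(overlap(piece["content"], k["content"]) <= threshold for k in kept):
            kept.append(piece)
    return kept


pieces = [
    {"id": "a", "content": "Pricing pages should show one plan first", "viral_score": 85},
    {"id": "b", "content": "Pricing pages should show one plan first today", "viral_score": 60},
    {"id": "c", "content": "Hiring slowly beats hiring fast", "viral_score": 70},
]
print([p["id"] for p in dedupe(pieces)])
```

Checking against recent history works the same way if the previously published pieces are seeded into `kept` before the loop.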

Step 6: Calendar Generation (`--calendar`)

Assemble scored, deduplicated content into a weekly publish calendar.

Scheduling rules:

Twitter/X: 1-2 per day, peak hours (8-10am, 12-1pm, 5-7pm ET)
LinkedIn: 1 per day max, Tuesday-Thursday mornings
YouTube Shorts/TikTok: 1 per day, evenings
Newsletter: Weekly, same day each week
Blog: 1-2 per week
Quote cards: Intersperse on low-content days
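Part of these rules can be captured as data. The sketch below encodes only the daily caps and the LinkedIn weekday restriction; the names `MAX_PER_DAY`, `ALLOWED_WEEKDAYS`, and `publish_dates` are hypothetical, and time-of-day windows are not modelled.

```python
from datetime import date, timedelta
from typing import List

# Daily caps taken from the scheduling rules above.
MAX_PER_DAY = {"twitter": 2, "linkedin": 1, "youtube_shorts": 1}

# Allowed weekdays per platform (Monday == 0).
ALLOWED_WEEKDAYS = {
    "twitter": set(range(7)),
    "linkedin": {1, 2, 3},  # Tuesday to Thursday
    "youtube_shorts": set(range(7)),
}


def publish_dates(platform: str, week_of: date) -> List[date]:
    """Dates in the 7 days starting at `week_of` on which `platform` may publish."""
    days = (week_of + timedelta(days=offset) for offset in range(7))
    return [day for day in days if day.weekday() in ALLOWED_WEEKDAYS[platform]]


print(publish_dates("linkedin", date(2024, 1, 15)))
```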

Calendar output format:

{
  "week_of": "2024-01-15",
  "episode_source": "Episode Title - Guest Name",
  "content_pieces": [
    {
      "date": "2024-01-15",
      "time": "09:00 ET",
      "platform": "twitter",
      "type": "thread",
      "content": "...",
      "viral_score": 85,
      "status": "draft"
    }
  ],
  "total_pieces": 18,
  "avg_viral_score": 72,
  "coverage": {
    "twitter": 6,
    "linkedin": 3,
    "youtube_shorts": 3,
    "newsletter": 1,
    "blog": 1,
    "quote_cards": 4
  }
}
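The summary fields in this format (`total_pieces`, `avg_viral_score`, `coverage`) can be derived from `content_pieces` rather than maintained by hand. A minimal sketch, with a hypothetical function name and placeholder pieces; rounding the average to an integer is an assumption based on the example value above.

```python
import json
from collections import Counter
from typing import Dict, List


def build_calendar(week_of: str, episode_source: str, pieces: List[Dict]) -> Dict:
    """Assemble the calendar document and compute its summary fields."""
    scores = [piece["viral_score"] for piece in pieces]
    return {
        "week_of": week_of,
        "episode_source": episode_source,
        "content_pieces": pieces,
        "total_pieces": len(pieces),
        "avg_viral_score": round(sum(scores) / len(scores)) if scores else 0,
        "coverage": dict(Counter(piece["platform"] for piece in pieces)),
    }


pieces = [
    {"date": "2024-01-15", "platform": "twitter", "type": "thread", "viral_score": 85},
    {"date": "2024-01-16", "platform": "twitter", "type": "thread", "viral_score": 60},
    {"date": "2024-01-16", "platform": "linkedin", "type": "article", "viral_score": 71},
]
calendar = build_calendar("2024-01-15", "Episode Title - Guest Name", pieces)
print(json.dumps(calendar, indent=2))
```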

Step 7: Output

All output goes to the `output/` directory:

output/
├── episodes/
│   ├── YYYY-MM-DD-episode-slug/
│   │   ├── transcript.txt
│   │   ├── atoms.json          # Extracted content atoms
│   │   ├── content_pieces.json # All generated content
│   │   └── calendar.json       # Scheduled calendar
│   └── ...
├── calendar/
│   └── week-YYYY-WNN.json     # Aggregated weekly calendar
├── content_history.json        # Dedup tracking
└── pipeline_log.json           # Run history and stats
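Paths in this layout could be built along these lines. The helper names are hypothetical, the slug rule is a guess, and the sketch assumes that `WNN` in the weekly file name means the ISO week number.

```python
import re
from datetime import date
from pathlib import Path


def slugify(title: str) -> str:
    """Lowercase the title and join alphanumeric runs with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def episode_dir(root: str, published: date, title: str) -> Path:
    return Path(root) / "episodes" / f"{published.isoformat()}-{slugify(title)}"


def weekly_calendar_path(root: str, day: date) -> Path:
    year, week, _ = day.isocalendar()  # ISO year and week number
    return Path(root) / "calendar" / f"week-{year}-W{week:02d}.json"


print(episode_dir("output", date(2024, 1, 15), "Episode Title - Guest Name").as_posix())
print(weekly_calendar_path("output", date(2024, 1, 15)).as_posix())
```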

CLI Reference

# Process latest episode from RSS feed
python podcast_pipeline.py --rss "https://feeds.example.com/podcast.xml"

# Process a local transcript
python podcast_pipeline.py --transcript episode-42.txt

# Batch process last 5 episodes
python podcast_pipeline.py --batch "https://feeds.example.com/podcast.xml" --episodes 5

# Generate weekly calendar from existing outputs
python podcast_pipeline.py --calendar

# Process with custom dedup window
python podcast_pipeline.py --rss "https://feeds.example.com/podcast.xml" --dedup-days 60

# Process and only keep 80+ viral score content
python podcast_pipeline.py --rss "https://feeds.example.com/podcast.xml" --min-score 80
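For orientation, the flags used above could be declared with `argparse` roughly as follows. This is a sketch of one plausible definition, not the contents of `podcast_pipeline.py`: the defaults for `--episodes` and `--min-score` and the mutual exclusivity of the three input sources are assumptions; only the 30-day dedup default comes from Step 5.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="podcast_pipeline.py")

    # Assumed: only one input source per run.
    source = parser.add_mutually_exclusive_group()
    source.add_argument("--rss", metavar="URL", help="process the latest episode from a feed")
    source.add_argument("--transcript", metavar="FILE", help="process a local transcript")
    source.add_argument("--batch", metavar="RSS_URL", help="process several episodes from a feed")

    parser.add_argument("--episodes", type=int, default=1, help="number of episodes (assumed default)")
    parser.add_argument("--calendar", action="store_true", help="generate the weekly calendar")
    parser.add_argument("--dedup-days", type=int, default=30, help="dedup history window in days")
    parser.add_argument("--min-score", type=int, default=0, help="drop pieces below this viral score (assumed default)")
    return parser


args = build_parser().parse_args(
    ["--rss", "https://feeds.example.com/podcast.xml", "--dedup-days", "60"]
)
print(args)
```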

Environment Variables

Variable	Required	Description
`OPENAI_API_KEY`	Yes (for Whisper)	OpenAI API key for audio transcription
`ANTHROPIC_API_KEY`	Yes (for generation)	Anthropic API key for content generation
`OPENAI_LLM_KEY`	Optional	Separate OpenAI key if using GPT for generation instead
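A start-up check for these variables might look like the sketch below. The function name is hypothetical, and it assumes the Whisper key is only needed when audio has to be transcribed (RSS or batch input) and that generation uses the Anthropic key; the optional `OPENAI_LLM_KEY` path is not modelled.

```python
import os
from typing import List, Mapping


def missing_keys(needs_transcription: bool, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    required = ["ANTHROPIC_API_KEY"]
    if needs_transcription:
        required.append("OPENAI_API_KEY")  # only needed for Whisper
    return [name for name in required if not env.get(name)]


# Passing an explicit mapping keeps the example independent of the real environment.
print(missing_keys(needs_transcription=True, env={"ANTHROPIC_API_KEY": "placeholder"}))
```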

Reference Files

File	Purpose
`podcast_pipeline.py`	Main pipeline script
`requirements.txt`	Python dependencies
`README.md`	Setup and usage guide
