# /transcribe

# Transcribe — Long-Form Video/Audio → Knowledge Base

## Triggers

- `/transcribe <file> --topic <topic>`
- "transcribe this video / audio / workshop / interview / audiobook"
- "ingest this Hormozi video / Brunson talk / sales training into the knowledge base"
- "add this to the hormozi knowledge base"

## What it produces

For input `~/Downloads/sales-workshop.mp4` with `--topic hormozi --slug sales-workshop`:

```
.claude/context/hormozi/
├── README.md                     # Index — script appends a row here
├── transcripts/
│   └── sales-workshop.md         # Verbatim, segment-timestamped (Whisper)
├── synthesis/
│   └── sales-workshop.md         # Structured synthesis (Sonnet, mimics existing book format)
└── slides/sales-workshop/        # Only if --with-slides
    └── slide_0001.png ...
```

The **transcript** file is archival — large, never auto-loaded into agents.
The **synthesis** file (called the "book" in the rest of this document) is what agents `Read` on demand for retrieval. Keep this distinction in mind when deciding which flags to pass.
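The layout above can be expressed as a small path helper. This is a sketch only: `outputPaths` is a hypothetical name used for illustration and is not taken from `shared/transcribe.mjs`.

```javascript
// Hypothetical helper: derive output locations from --topic and --slug.
function outputPaths(topic, slug, withSlides = false) {
  const root = `.claude/context/${topic}`;
  const paths = {
    readme: `${root}/README.md`,                  // index, one row per ingestion
    transcript: `${root}/transcripts/${slug}.md`, // archival, never auto-loaded
    synthesis: `${root}/synthesis/${slug}.md`,    // what agents Read on demand
  };
  if (withSlides) {
    paths.slidesDir = `${root}/slides/${slug}`;   // only with --with-slides
  }
  return paths;
}

console.log(outputPaths("hormozi", "sales-workshop", true));
```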

## Inputs accepted

- Local file path: `.mp4`, `.mkv`, `.mov`, `.m4a`, `.mp3`, `.wav`, anything ffmpeg reads
- `--topic <topic>` (required) — folder under `.claude/context/`. Existing topics: `hormozi`, `copywriting-refs`, `media-buying-refs`. Create a new one if needed.
- `--slug <slug>` (optional) — output basename. Defaults to sanitized input filename.
- `--with-slides` — run scene detection + Claude vision captions. **Default OFF.** Turn ON for slide-deck-heavy content (workshops with frameworks/diagrams). Turn OFF for talking-head, interview, podcast, audiobook.
- `--no-synthesis` — skip the Claude book pass. Useful if you only want the raw transcript (e.g. for verbatim quote retrieval).
- `--style-ref <path>` — point at a specific existing book file for synthesis to mimic. Auto-detects an existing book in the topic if absent.
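
As an illustration of what "sanitized input filename" could mean for the default slug, here is a minimal sketch. The exact rules in `shared/transcribe.mjs` are not shown in this document, so `defaultSlug` and its lowercase-and-hyphenate behavior are assumptions.

```javascript
// Hypothetical sanitizer: the real rules in the script may differ.
function defaultSlug(inputPath) {
  const base = inputPath.split(/[\\/]/).pop();  // strip directories (POSIX or Windows separators)
  const stem = base.replace(/\.[^.]+$/, "");    // drop the final extension
  return stem
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")                // any run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, "");                   // trim leading and trailing hyphens
}

console.log(defaultSlug("~/Downloads/sales-workshop.mp4")); // prints "sales-workshop"
```

If the derived slug is not what you want, passing `--slug` explicitly avoids any dependence on these rules.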

## Cost (rough)

| Component | 4hr video |
|-----------|-----------|
| Whisper API | ~$1.45 |
| Vision (slides, ~50 imgs) | ~$0.05 |
| Synthesis (Sonnet 4.6) | ~$0.50 |
| **Total** | **~$2** |

Wall time: ~5-10 min for a 4hr video (Whisper runs 4-way parallel).
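
The table's arithmetic can be reproduced with a rough estimator. The rates below are assumptions: Whisper at $0.006 per audio minute, a per-slide vision cost back-derived from the table (~$0.05 for ~50 images), and synthesis treated as a flat ~$0.50 even though it likely varies with transcript length.

```javascript
// Assumed rates, see the note above. Not read from the script.
const WHISPER_USD_PER_MIN = 0.006;
const VISION_USD_PER_SLIDE = 0.05 / 50;
const SYNTHESIS_USD_FLAT = 0.5;

function estimateCost({ hours, slides = 0, synthesis = true }) {
  const whisper = hours * 60 * WHISPER_USD_PER_MIN;
  const vision = slides * VISION_USD_PER_SLIDE;
  const synth = synthesis ? SYNTHESIS_USD_FLAT : 0;
  return { whisper, vision, synthesis: synth, total: whisper + vision + synth };
}

const estimate = estimateCost({ hours: 4, slides: 50 });
console.log(estimate.total.toFixed(2)); // prints "1.99"
```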

## Workflow

### Step 1 — Confirm intent with the user

Before running, confirm:

- **Input file path** (must exist locally)
- **Topic folder** — where it goes. Default to existing topics rather than creating new ones unless the content really doesn't fit.
- **Slides or not** — ask if the source has slides/diagrams worth capturing. Defaults: workshop/training/presentation → yes, interview/podcast/audiobook → no.
- **Synthesis or not** — almost always yes. Skip only when the user wants raw text only.

### Step 2 — Run

```bash
node shared/transcribe.mjs "<input_file>" --topic <topic> --slug <slug> [--with-slides]
```

Run from the repo root (`c:\Users\faris\agency-os`). The script:

1. Extracts audio with ffmpeg (16kHz mono mp3)
2. Splits into ~10-min chunks
3. Sends chunks to Whisper API in parallel (4 simultaneous)
4. Stitches segments back with timestamps
5. (if `--with-slides`) detects slide changes via ffmpeg scene detection, screenshots each, captions each via Claude Haiku
6. Writes transcript markdown with inline screenshots
7. (if synthesis enabled) Calls Claude Sonnet 4.6 with the topic's existing synthesis file as a style reference, produces a structured synthesis, writes to `synthesis/<slug>.md`
8. Appends a row to `<topic>/README.md`
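
Steps 2 to 4 can be sketched as follows: compute chunk ranges, run a worker over them with at most 4 in flight, then shift each chunk's segment times by the chunk's start offset. All function names are hypothetical, `fakeTranscribe` stands in for the real Whisper call, and the segment shape (`start`, `end`, `text`) is an assumption.

```javascript
const CHUNK_SECONDS = 600; // ~10-min chunks
const PARALLEL = 4;        // simultaneous requests

// Step 2: [start, end) ranges covering the whole audio.
function chunkRanges(durationSec, chunkSec = CHUNK_SECONDS) {
  const ranges = [];
  for (let start = 0; start < durationSec; start += chunkSec) {
    ranges.push({ start, end: Math.min(start + chunkSec, durationSec) });
  }
  return ranges;
}

// Step 3: run `worker` over items with at most `limit` in flight, preserving order.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function lane() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, lane));
  return results;
}

// Step 4: chunk-relative timestamps become absolute timestamps.
function stitch(ranges, segmentsPerChunk) {
  return segmentsPerChunk.flatMap((segments, i) =>
    segments.map((s) => ({
      start: s.start + ranges[i].start,
      end: s.end + ranges[i].start,
      text: s.text,
    }))
  );
}

// Stand-in for the Whisper request (hypothetical).
async function fakeTranscribe(range) {
  return [{ start: 0, end: 5, text: `chunk starting at ${range.start}s` }];
}

async function main() {
  const ranges = chunkRanges(4 * 3600);
  const perChunk = await mapWithLimit(ranges, PARALLEL, fakeTranscribe);
  const segments = stitch(ranges, perChunk);
  console.log(ranges.length, segments[1].start); // prints "24 600"
}
main();
```

Keeping results indexed by chunk position means the transcript order does not depend on which request finishes first.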

### Step 3 — Hand back to the user

After the script finishes, output:

- Full path to the transcript file
- Full path to the book file (if synthesis ran)
- A one-line "what's next" — usually: open the book file and refine the front-matter "core thesis" line in the README index, since the script leaves it as `_fill in core thesis_`.
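
For orientation, the appended index row might look like the output of the sketch below. The column layout is purely hypothetical, since the real README index format is not shown here; only the `_fill in core thesis_` placeholder comes from the script's documented behavior.

```javascript
// Hypothetical row layout: slug | synthesis link | transcript link | core thesis.
function readmeRow(slug, { synthesis = true } = {}) {
  const synthCell = synthesis ? `[synthesis](synthesis/${slug}.md)` : "n/a";
  const transcriptCell = `[transcript](transcripts/${slug}.md)`;
  return `| ${slug} | ${synthCell} | ${transcriptCell} | _fill in core thesis_ |`;
}

console.log(readmeRow("sales-workshop"));
```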

### Step 4 — Refine the synthesis

The script's Sonnet pass is a strong V1 but won't match a hand-edited book like `100m-money-models.md` in depth. After the run, suggest:

> "Open `synthesis/<slug>.md` — the synthesis is solid but a hand pass to add cross-references to the Cross-Book Playbook (in the topic README) and to flesh out the Application Map will make this much more useful long-term."

Don't do this pass automatically — it's slow and judgment-heavy. Let Faris choose to do it.

## Topic-specific notes

### `hormozi`

- The existing `synthesis/100m-money-models.md` is the gold standard the synthesis tries to mimic.
- After ingesting, the user typically wants to:
  1. Update the **Cross-Book Playbook** in the topic README with new principles
  2. Update the **Application Map** with new tactics for Faris's businesses
- These are hand-edits, not script work. Suggest them but don't do them automatically.

### Other topics

- If the topic folder has no existing book, synthesis still runs but with no style reference — output will be more generic. The first ingestion sets the style for the rest.
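
One way to picture the style-reference fallback is the sketch below. The resolution order is an assumption: an explicit `--style-ref` wins, otherwise the first existing file in `synthesis/` (alphabetically, excluding the current slug) is used, otherwise there is no reference. The script's actual auto-detection may choose differently.

```javascript
// Hypothetical resolution logic; `existingFiles` is the list of filenames in <topic>/synthesis/.
function pickStyleRef({ styleRef, existingFiles, slug }) {
  if (styleRef) return styleRef; // explicit --style-ref wins
  const candidates = existingFiles
    .filter((f) => f.endsWith(".md") && f !== `${slug}.md`)
    .sort();
  // null means synthesis runs with no style reference (more generic output)
  return candidates.length > 0 ? `synthesis/${candidates[0]}` : null;
}

console.log(pickStyleRef({ existingFiles: ["100m-money-models.md"], slug: "sales-workshop" }));
// prints "synthesis/100m-money-models.md"
```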

## Common pitfalls

- **Wrong path quoting on Windows** — wrap input file in double quotes when it contains spaces.
- **Very long videos (>6hr)** — synthesis truncates to 350K chars (~5hr of speech). For 8hr+ content, split into two sessions and run twice with different slugs.
- **Slide detection too aggressive/loose** — threshold is 0.35. If you get hundreds of screenshots from a talking-head clip, re-run without `--with-slides`. If you miss obvious slide changes in a deck-heavy video, edit `SCENE_THRESHOLD` in `shared/transcribe.mjs` to 0.25 and re-run.
- **Whisper rate limits** — script runs 4 chunks in parallel. If you hit 429s, lower `PARALLEL` in the script.
- **Existing slug** — script writes to `transcripts/<slug>.md` and `synthesis/<slug>.md` and overwrites without asking. Use a different `--slug` to keep both.
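
Two of the knobs above can be made concrete. The sketch below builds ffmpeg arguments around the scene-change threshold and reports how many characters exceed the synthesis limit. It assumes the script uses ffmpeg's `select` filter with the `scene` score, which the document does not confirm; `MAX_SYNTHESIS_CHARS`, `sceneDetectArgs`, and `synthesisOverflow` are hypothetical names.

```javascript
const SCENE_THRESHOLD = 0.35;        // value quoted in the pitfalls list
const MAX_SYNTHESIS_CHARS = 350_000; // hypothetical name for the 350K-char limit

// Arguments for an ffmpeg run that keeps only frames whose scene score exceeds the threshold.
// Lower thresholds capture more frames; higher thresholds capture fewer.
function sceneDetectArgs(input, outDir, threshold = SCENE_THRESHOLD) {
  return [
    "-i", input,
    "-vf", `select='gt(scene,${threshold})'`,
    "-vsync", "vfr", // write only the selected frames; newer ffmpeg builds prefer -fps_mode vfr
    `${outDir}/slide_%04d.png`,
  ];
}

// How many transcript characters fall outside the synthesis limit.
function synthesisOverflow(transcript, maxChars = MAX_SYNTHESIS_CHARS) {
  const dropped = Math.max(0, transcript.length - maxChars);
  return { dropped, truncated: dropped > 0 };
}

console.log(sceneDetectArgs("workshop.mp4", "slides/sales-workshop", 0.25).join(" "));
console.log(synthesisOverflow("x".repeat(400_000))); // { dropped: 50000, truncated: true }
```

Checking the transcript length before synthesis is a cheap way to decide whether splitting the source is worth it.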

## Why this exists

Long-form video is the highest-density learning material Faris consumes (Hormozi workshops, Brunson talks, sales training, prospect deep-dives), but it's the hardest to retrieve from. This skill turns that material into structured, agent-readable knowledge that the copywriter, media-buyer, and sales-ops agents can pull from when designing offers, ad copy, and money models.