/transcribe
ACME Agency, <id> and you@example.com mark values that are per-agency — your install fills them with YOUR clients and accounts. If a section references a helper script you don't have yet, it ships with that workflow's install.
Transcribe — Long-Form Video/Audio → Knowledge Base
Triggers
- `/transcribe <file> --topic <topic>`
- "transcribe this video / audio / workshop / interview / audiobook"
- "ingest this Hormozi video / Brunson talk / sales training into the knowledge base"
- "add this to the hormozi knowledge base"
What it produces
For input `~/Downloads/sales-workshop.mp4` with `--topic hormozi --slug sales-workshop`:
.claude/context/hormozi/
├── README.md                  # Index — script appends a row here
├── transcripts/
│   └── sales-workshop.md      # Verbatim, segment-timestamped (Whisper)
├── synthesis/
│   └── sales-workshop.md      # Structured synthesis (Sonnet, mimics existing book format)
└── slides/sales-workshop/     # Only if --with-slides
    └── slide_0001.png ...
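The transcript file is archival: large and never auto-loaded into agents. The synthesis file (called the "book file" elsewhere in this document) is what agents Read on demand for retrieval. Keep this distinction in mind when choosing flags: `--no-synthesis` produces only the archival transcript.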
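As a rough illustration of the layout above, the output locations can be derived from the topic and slug alone. This is a sketch that mirrors the tree shown here; `outputPaths` is a hypothetical helper, not a function confirmed to exist in `shared/transcribe.mjs`.

```javascript
// Hypothetical helper: rebuilds the documented layout from a topic and slug.
// The real script may construct these paths differently.
function outputPaths(topic, slug, withSlides = false) {
  const base = `.claude/context/${topic}`;
  const paths = {
    readme: `${base}/README.md`,                  // index, gets one appended row
    transcript: `${base}/transcripts/${slug}.md`, // archival, not auto-loaded
    synthesis: `${base}/synthesis/${slug}.md`,    // what agents read on demand
  };
  // The slides folder only exists when --with-slides was passed.
  if (withSlides) paths.slidesDir = `${base}/slides/${slug}/`;
  return paths;
}

console.log(outputPaths("hormozi", "sales-workshop", true));
```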
Inputs accepted
- Local file path: `.mp4`, `.mkv`, `.mov`, `.m4a`, `.mp3`, `.wav`, anything ffmpeg reads
- `--topic <topic>` (required) — folder under `.claude/context/`. Existing topics: `hormozi`, `copywriting-refs`, `media-buying-refs`. Create a new one if needed.
- `--slug <slug>` (optional) — output basename. Defaults to sanitized input filename.
- `--with-slides` — run scene detection + Claude vision captions. Default OFF. Turn ON for slide-deck-heavy content (workshops with frameworks/diagrams). Turn OFF for talking-head, interview, podcast, audiobook.
- `--no-synthesis` — skip the Claude book pass. Useful if you only want the raw transcript (e.g. for verbatim quote retrieval).
- `--style-ref <path>` — point at a specific existing book file for synthesis to mimic. Auto-detects an existing book in the topic if absent.
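The default slug is described only as a "sanitized input filename", so the exact rules are not stated here. One plausible reading is sketched below; the lowercase-and-hyphen convention and the `defaultSlug` name are assumptions, and the script's real behaviour may differ.

```javascript
// Assumed sanitization: basename without extension, lowercased,
// runs of non-alphanumerics collapsed to single hyphens.
function defaultSlug(inputPath) {
  const file = inputPath.split(/[\\/]/).pop();  // basename; accepts / and \ separators
  const stem = file.replace(/\.[^.]+$/, "");    // drop the final extension
  return stem
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")                // anything else becomes a hyphen
    .replace(/^-+|-+$/g, "");                   // trim leading/trailing hyphens
}

console.log(defaultSlug("~/Downloads/sales-workshop.mp4"));
console.log(defaultSlug("C:\\Videos\\Sales Workshop (Day 1).MP4"));
```

If the derived slug is not what you want in the filenames, pass `--slug` explicitly.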
Cost (rough)
| Component | 4hr video |
|---|---|
| Whisper API | ~$1.45 |
| Vision (slides, ~50 imgs) | ~$0.05 |
| Synthesis (Sonnet 4.6) | ~$0.50 |
| Total | ~$2 |
Wall time: ~5-10 min for a 4hr video (Whisper runs 4-way parallel).
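For inputs that are not 4 hours long, the table can be scaled as a back-of-envelope estimate. The sketch below assumes Whisper and synthesis cost grow linearly with duration and that vision cost grows with the number of slide images; both are simplifying assumptions, and `estimateCost` is a hypothetical helper rather than part of the script.

```javascript
// Baseline figures taken from the 4hr column of the table above.
const BASELINE = {
  hours: 4,
  whisper: 1.45,
  synthesis: 0.5,
  visionPerImage: 0.05 / 50, // ~$0.05 for ~50 images
};

// Assumption: linear scaling. Real costs depend on pricing and token counts.
function estimateCost(hours, slideImages = 0) {
  const scale = hours / BASELINE.hours;
  const whisper = BASELINE.whisper * scale;
  const synthesis = BASELINE.synthesis * scale;
  const vision = BASELINE.visionPerImage * slideImages;
  return { whisper, vision, synthesis, total: whisper + vision + synthesis };
}

const estimate = estimateCost(4, 50);
console.log(`~$${estimate.total.toFixed(2)}`);
```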
Workflow
Step 1 — Confirm intent with the user
Before running, confirm:
- Input file path (must exist locally)
- Topic folder — where it goes. Default to existing topics rather than creating new ones unless the content really doesn't fit.
- Slides or not — ask if the source has slides/diagrams worth capturing. Defaults: workshop/training/presentation → yes, interview/podcast/audiobook → no.
- Synthesis or not — almost always yes. Skip only when the user wants raw text only.
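Once these four points are confirmed, the answers map directly onto the script's flags. The sketch below shows one way to turn them into a command line; the `kind` field and the `buildCommand` helper are illustrative assumptions (the script itself has no such option), while the flags are the ones listed under Inputs accepted.

```javascript
// Slide defaults from Step 1: presentations yes, conversational formats no.
const SLIDE_DEFAULTS = {
  workshop: true, training: true, presentation: true,
  interview: false, podcast: false, audiobook: false,
};

// Hypothetical helper: assembles the invocation from the confirmed answers.
function buildCommand({ input, topic, slug, kind, withSlides, synthesis = true }) {
  if (!input || !topic) throw new Error("input and topic are required");
  // An explicit answer wins; otherwise fall back to the content-type default.
  const slides = withSlides ?? SLIDE_DEFAULTS[kind] ?? false;
  const args = ["node", "shared/transcribe.mjs", `"${input}"`, "--topic", topic];
  if (slug) args.push("--slug", slug);
  if (slides) args.push("--with-slides");
  if (!synthesis) args.push("--no-synthesis");
  return args.join(" ");
}

console.log(buildCommand({
  input: "C:/Videos/Sales Workshop.mp4",
  topic: "hormozi",
  slug: "sales-workshop",
  kind: "workshop",
}));
```

The input path is always wrapped in double quotes so that paths containing spaces survive the shell.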
Step 2 — Run
node shared/transcribe.mjs "<input_file>" --topic <topic> --slug <slug> [--with-slides]
Run from the repo root (c:\Users\faris\agency-os). The script:
- Extracts audio with ffmpeg (16kHz mono mp3)
- Splits into ~10-min chunks
- Sends chunks to Whisper API in parallel (4 simultaneous)
- Stitches segments back with timestamps
- (if `--with-slides`) Detects slide changes via ffmpeg scene detection, screenshots each, captions each via Claude Haiku
- Writes transcript markdown with inline screenshots
- (if synthesis enabled) Calls Claude Sonnet 4.6 with the topic's existing synthesis file as a style reference, produces a structured synthesis, writes to `synthesis/<slug>.md`
- Appends a row to `<topic>/README.md`
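The split-and-stitch steps above can be sketched as follows. Each chunk is transcribed independently, so its segment times start at zero and have to be shifted by the chunk's offset before merging. The function names and the `{ start, end, text }` segment shape are assumptions for illustration, not taken from the script.

```javascript
const CHUNK_SECONDS = 600; // ~10-minute chunks, per the steps above

// Plan chunk boundaries for an audio file of the given duration (seconds).
function planChunks(durationSec, chunkSec = CHUNK_SECONDS) {
  const chunks = [];
  for (let start = 0, index = 0; start < durationSec; start += chunkSec, index++) {
    chunks.push({ index, start, end: Math.min(start + chunkSec, durationSec) });
  }
  return chunks;
}

// Merge per-chunk segments into one timeline by adding each chunk's offset.
// segmentsByChunk[i] holds the segments returned for chunk i (assumed shape).
function stitch(chunks, segmentsByChunk) {
  return chunks.flatMap((chunk) =>
    (segmentsByChunk[chunk.index] || []).map((seg) => ({
      start: seg.start + chunk.start,
      end: seg.end + chunk.start,
      text: seg.text,
    }))
  );
}

// Render seconds as HH:MM:SS for the transcript markdown.
function formatTimestamp(sec) {
  const s = Math.floor(sec);
  const pad = (n) => String(n).padStart(2, "0");
  return `${pad(Math.floor(s / 3600))}:${pad(Math.floor((s % 3600) / 60))}:${pad(s % 60)}`;
}

const chunks = planChunks(1500); // a 25-minute file gives three chunks
const merged = stitch(chunks, [
  [{ start: 0, end: 4, text: "Intro" }],
  [{ start: 5, end: 9, text: "Second chunk" }],
  [],
]);
console.log(merged.map((s) => `[${formatTimestamp(s.start)}] ${s.text}`).join("\n"));
```

Under this plan a 4hr video (14,400 s) yields 24 chunks, which a 4-way parallel run would process in six batches.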
Step 3 — Hand back to the user
After the script finishes, output:
- Full path to the transcript file
- Full path to the book file (if synthesis ran)
- A one-line "what's next" — usually: open the book file and refine the front-matter "core thesis" line in the README index, since the script leaves it as `_fill in core thesis_`.
Step 4 — Refine the synthesis
The script's Sonnet pass is a strong V1 but won't match a hand-edited book like 100m-money-models.md in depth. After the run, suggest:
"Open synthesis/<slug>.md — the synthesis is solid but a hand pass to add cross-references to the Cross-Book Playbook (in the topic README) and to flesh out the Application Map will make this much more useful long-term."
Don't do this pass automatically — it's slow and judgment-heavy. Let Faris choose to do it.
Topic-specific notes
hormozi
- The existing `synthesis/100m-money-models.md` is the gold standard the synthesis tries to mimic.
- After ingesting, the user typically wants to:
  - Update the Cross-Book Playbook in the topic README with new principles
  - Update the Application Map with new tactics for Faris's businesses
- These are hand-edits, not script work. Suggest them but don't do them automatically.
Other topics
- If the topic folder has no existing book, synthesis still runs but with no style reference — output will be more generic. The first ingestion sets the style for the rest.
Common pitfalls
- Wrong path quoting on Windows — wrap input file in double quotes when it contains spaces.
- Very long videos (>6hr) — synthesis truncates to 350K chars (~5hr of speech). For 8hr+ content, split into two sessions and run twice with different slugs.
- Slide detection too aggressive/loose — threshold is 0.35. If you get hundreds of screenshots from a talking-head clip, re-run without `--with-slides`. If you miss obvious slide changes in a deck-heavy video, edit `SCENE_THRESHOLD` in `shared/transcribe.mjs` to 0.25 and re-run.
- Whisper rate limits — script runs 4 chunks in parallel. If you hit 429s, lower `PARALLEL` in the script.
- Existing slug — script writes to `transcripts/<slug>.md` and `synthesis/<slug>.md` and overwrites without asking. Use a different `--slug` to keep both.
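The long-video pitfall comes down to simple arithmetic: if 350K characters is roughly 5 hours of speech, that is about 70K characters per hour. The sketch below uses that derived rate to estimate how many separate runs a recording needs before synthesis would truncate it. The rate is an approximation that varies by speaker, and `sessionsNeeded` is a hypothetical helper, not part of the script.

```javascript
const SYNTHESIS_CHAR_LIMIT = 350000;             // truncation point stated above
const CHARS_PER_HOUR = SYNTHESIS_CHAR_LIMIT / 5; // ~70K chars/hr, derived from "350K ≈ 5hr"

// Estimate how many separate runs (with different slugs) keep each
// transcript under the synthesis limit. Dense speech will need more.
function sessionsNeeded(hours) {
  const estimatedChars = hours * CHARS_PER_HOUR;
  return Math.max(1, Math.ceil(estimatedChars / SYNTHESIS_CHAR_LIMIT));
}

for (const hours of [4, 8, 11]) {
  console.log(`${hours}hr -> ${sessionsNeeded(hours)} session(s)`);
}
```

For an 8hr recording this gives two sessions, which matches the advice to split and run twice with different slugs.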
Why this exists
Long-form video is the highest-density learning material Faris consumes (Hormozi workshops, Brunson talks, sales training, prospect deep-dives), but it's the hardest to retrieve from. This skill turns that material into structured, agent-readable knowledge that the copywriter, media-buyer, and sales-ops agents can pull from when designing offers, ad copy, and money models.