# /heygen-ad-generator

> Generates a finished HeyGen Video Agent ad — no CapCut, no ElevenLabs, no Krea.

## Overview

One prompt → HeyGen handles voice, b-roll, editing, text overlays, and music.

### `scenes` mode is the DEFAULT first choice for video ads (avatar-less)

For a normal video ad, use **`--mode scenes`**: a pure b-roll + kinetic-text + voiceover montage with **NO avatar**. It's cheap (**~$0.50/video**, billed to the API dollar balance — see cost note below), fast, scalable, and pulls the client's **brand colors automatically** from their `design-theme.json`. This is what to reach for unless the brief specifically wants a presenter speaking to camera. (Validated on ACME Agency 2026-06-23 — `ACME Agency/clients/ACME Agency/video-ads/heygen-scenes-batch/`.)

Use the avatar modes (`agent` / `multi-scene`) only when the brief explicitly wants a person talking to camera (testimonial / founder pitch / explainer).

**Cost model (important):** in the API beta, Video Agent bills the **API dollar balance**, not plan credits. `getRemainingQuota()` reads `GET /v2/user/remaining_quota` where **60 units = $1**; `plan_credit` (e.g. 200) is a separate counter and is untouched. A ~14–20s scenes video ≈ **$0.47–0.66**. The `scenes` pipeline prints the per-video cost and remaining balance, and puts the cost in the manifest + Slack post.
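The unit arithmetic in the cost note can be sketched as below. This is a minimal illustration, not the wrapper's actual code: `unitsToUsd` and `videosAffordable` are hypothetical helper names, and the only figures taken from this skill are the 60 units = $1 rate and the $0.70 per-video reserve used by the preflight.

```javascript
// Hypothetical helpers; only the 60-units-per-dollar rate comes from this skill.
const UNITS_PER_USD = 60;

// Convert the raw remaining-quota unit count into dollars.
function unitsToUsd(units) {
  return units / UNITS_PER_USD;
}

// How many videos fit in the balance at a given per-video reserve (USD).
function videosAffordable(units, perVideoUsd = 0.7) {
  // Work in whole cents so float rounding cannot drop a video at an exact boundary.
  const balanceCents = Math.round(unitsToUsd(units) * 100);
  const reserveCents = Math.round(perVideoUsd * 100);
  return Math.floor(balanceCents / reserveCents);
}

console.log(unitsToUsd(300).toFixed(2)); // dollars for 300 units
console.log(videosAffordable(300));      // videos that fit at $0.70 each
```

Running the numbers before a batch makes it obvious whether the wallet covers all N variants.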

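**What this skill produces:**
- Finished MP4 (1080×1920, 9:16), ready to upload directly to Meta/TikTok/Stories
- In `scenes` mode: b-roll + kinetic-text + voiceover montage, no avatar
- In the avatar modes (`agent` / `multi-scene`): AI avatar talking-head mixed with b-roll cutaways
- Brand-colored text overlays synced to speech
- Background music at -20 dB
- Slack report with the HeyGen link; files archived to Google Drive only when `--drive` is passed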

**What this skill does NOT do:**
- Guarantee a specific voice (Video Agent picks autonomously — guide it via `voiceDescription`)
- Guarantee avatar appearance (guide it via `avatarPersona` — works well for ethnicity/age/clothing)
- Use ElevenLabs, Krea, or Creatomate

**When to use:**
- Any client needing an AI presenter video ad (Croatian, German, English, any language)
- Fast-turnaround UGC-style ads (same day delivery)
- When cinematic Krea footage is overkill — avatar + b-roll is enough
- Multiple angles/scripts for the same campaign (different hooks, A/B testing)

**Trigger:**
- `/heygen-ad-generator ClientName`
- `/heygen-ad-generator ACME Agency --brief "GKV → PKV switching, employed Germans, 30s"`
- `/heygen-ad-generator ACME Agency --script "ACME Agency/clients/ACME Agency/video-ads/campaign/script.json"`

---

## Critical Files

- `ACME Agency/scripts/heygen_ads_generate.mjs` — pipeline script
- `ACME Agency/scripts/lib/heygen.mjs` — HeyGen API wrapper (`buildVideoAgentPrompt`, `generateVideoAgentAd`)
- `ACME Agency/scripts/lib/google_drive.mjs` — Drive upload
- `ACME Agency/scripts/lib/slack.mjs` — Slack reporting
- `ACME Agency/clients/clients.json` — client registry
- `ACME Agency/clients/<ClientName>/brand-dna.md` — brand context
- `ACME Agency/clients/<ClientName>/video-ads/<campaign>/script.json` — script

---

## Scenes Script Format (DEFAULT for video — avatar-less)

Write this to `ACME Agency/clients/<Client>/video-ads/<campaign-slug>/script.json`:

```json
{
  "client": "ACME Agency",
  "campaign": "<id>",
  "mode": "scenes",
  "language": "hr",
  "duration": 20,
  "vo": "Inspektor je na vratima. A evidencije? ... Pošaljite upit za besplatnu provjeru.",
  "scenes": [
    "[0-2s] HOOK: a restaurant glass door opens and a food-safety inspector with a clipboard steps in, tense cinematic lighting. Boxed overlay (top): \"Inspektor je stigao.\"",
    "[2-6s] frantic hands flipping through messy paper binders on a kitchen counter. Boxed overlay: \"Gdje su evidencije?\"",
    "[6-10s] calm close-up of a hand holding a smartphone with coral checkmarks. Boxed overlay: \"Sve na mobitelu.\"",
    "[10-15s] confident owner relaxed in a spotless modern kitchen. Boxed overlay: \"Spremni za inspekciju.\"",
    "[15-19s] solid navy end card. Centered boxed overlay: \"Pošaljite upit\""
  ]
}
```

- `vo` = the **exact** voiceover words (target language). Word budget ≈ `duration × 2.3` (see VO formula below). HeyGen often shortens to ~14s even if you ask 20s — keep the VO tight.
- `scenes` = an array of **scene-direction strings**, each ending with `Boxed overlay: "<short line in the target language>"` (Croatian in the example above). Keep overlays 3–6 words.
- **Brand colors auto-resolve** from `ACME Agency/clients/<Client>/design-theme.json` (navy=`brand`, coral=`accent`). Override per-campaign with `"brandColors": { "primary": "#RRGGBB", "accent": "#RRGGBB" }` (hex color values).
- **Casting / ethnicity auto-localizes to the market.** HeyGen's b-roll otherwise defaults to a globally-diverse mix, which reads wrong for a local ad (e.g. Black/Asian people in a Croatian B2B ad). `buildScenesPrompt()` now injects a `CASTING` directive from the `language` (hr/bs/sr/sl → white Southern/Central European locals; de → Central/Northern European; en → generic Western). Override per-campaign with a `"casting"` string in `script.json` when the audience isn't the default for that language. See memory `<id>`.
- The proven reliability rules (boxed captions, hard no-English, centered/capped CTA) are baked into `buildScenesPrompt()` — you do NOT repeat them in the script.
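The format rules above can be checked mechanically before a render is paid for. The following is a sketch under stated assumptions: `lintScenesScript` is a hypothetical helper (it is not part of `heygen.mjs`), and it only encodes the rules listed in this section: the `duration × 2.3` word budget, an overlay at the end of every scene string, and 3–6 words per overlay.

```javascript
// Hypothetical lint for a scenes-mode script.json; not part of heygen.mjs.
function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

function lintScenesScript(script, wordsPerSecond = 2.3) {
  const warnings = [];

  // VO budget: duration × words-per-second (the epsilon absorbs float error).
  const budget = Math.floor(script.duration * wordsPerSecond + 1e-9);
  const voWords = countWords(script.vo);
  if (voWords > budget) {
    warnings.push(`vo has ${voWords} words, budget is ${budget}`);
  }

  script.scenes.forEach((scene, i) => {
    // Matches a trailing `Boxed overlay: "..."`, `Boxed overlay (top): "..."`, etc.
    const match = scene.match(/overlay[^:"]*:\s*"([^"]*)"\s*$/i);
    if (!match) {
      warnings.push(`scene ${i + 1}: no overlay found`);
      return;
    }
    const n = countWords(match[1]);
    if (n < 3 || n > 6) {
      warnings.push(`scene ${i + 1}: overlay has ${n} words (target 3-6)`);
    }
  });

  return warnings;
}

// Usage: an empty array means the script passes these checks.
console.log(lintScenesScript({
  duration: 20,
  vo: "Inspektor je na vratima. A evidencije?",
  scenes: ['[0-2s] HOOK: inspector steps in. Boxed overlay (top): "Inspektor je stigao."'],
}));
```

The lint returns warnings rather than throwing, so a short CTA overlay can still be accepted deliberately.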

Run it:
```bash
# The script path contains a space, so it must be quoted.
node "ACME Agency/scripts/heygen_ads_generate.mjs" "ACME Agency" \
  --script "ACME Agency/clients/ACME Agency/video-ads/<campaign>/script.json" \
  --mode scenes [--no-slack] [--drive]
```
Prints the per-video cost + remaining balance. Generate 3–4 variants (different hooks) by writing 3–4 script.json files and running each — each ≈ $0.50.

---

## Script Format (Avatar Video Agent mode — only when a presenter is wanted)

Write this to `ACME Agency/clients/<ClientName>/video-ads/<campaign-slug>/script.json`:

```json
{
  "client": "ACME Agency",
  "campaign": "<id>",
  "adType": "heygen",
  "mode": "agent",
  "language": "de",
  "duration": 30,
  "avatarGender": "male",
  "avatarPersona": "Northern European male, 40s, dark business suit, clean-shaven or light stubble, confident and authoritative",
  "voiceDescription": "German male, 40s, clear northern German accent, calm and authoritative",
  "brandColors": {
    "primary": "#RRGGBB",
    "accent": "#C9A84C",
    "text": "#FFFFFF"
  },
  "musicMood": "professional",
  "fullScript": "Seit Ihrem letzten Jobwechsel zahlen Sie den GKV-Höchstbeitrag...",
  "sceneGuidance": [
    { "timing": "0-8s", "mediaType": "stock footage", "visual": "German professional at laptop in bright office", "overlay": "Sie zahlen GKV-Maximum" },
    { "timing": "8-18s", "mediaType": "motion graphics", "visual": "Animated bar chart: GKV €900 vs PKV €600 monthly", "overlay": "Sparen Sie bis zu €300/Monat" },
    { "timing": "18-25s", "mediaType": "stock footage", "visual": "Person reviewing documents at home desk, relaxed", "overlay": "" },
    { "timing": "25-30s", "mediaType": "avatar closeup", "visual": "Avatar speaks CTA directly to camera", "overlay": "Jetzt kostenlosen Vergleich anfragen" }
  ]
}
```

**Campaign slug:** `<keyword>-<audience/offer>-<YYYY-MM>` e.g. `<id>`
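A slug in that pattern could be generated as sketched below. `campaignSlug` and `slugPart` are hypothetical names, and lowercasing plus ASCII-only hyphenated parts is an assumption about the desired folder naming, not something this skill specifies.

```javascript
// Hypothetical slug builder following `<keyword>-<audience/offer>-<YYYY-MM>`.
function slugPart(text) {
  return text
    .normalize("NFKD")                 // decompose accented letters where possible
    .replace(/[\u0300-\u036f]/g, "")   // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")       // collapse everything else to hyphens
    .replace(/^-+|-+$/g, "");          // trim leading/trailing hyphens
}

function campaignSlug(keyword, audienceOrOffer, date = new Date()) {
  const yyyy = date.getFullYear();
  const mm = String(date.getMonth() + 1).padStart(2, "0");
  return `${slugPart(keyword)}-${slugPart(audienceOrOffer)}-${yyyy}-${mm}`;
}

console.log(campaignSlug("PKV", "Employed Germans", new Date(2026, 5, 23)));
```

Letters without a Unicode decomposition (for example `đ`) are replaced by a hyphen in this sketch, so transliterate them first if they matter.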

---

## VO Word Count Formula — CRITICAL

```
HeyGen speech rate ≈ 2.3–2.5 words/second
Formula: duration × 2.4 = target word count
Example: 30s = 72 words | 25s = 60 words | 20s = 48 words
```

Count words in `fullScript` BEFORE saving. **Stay at or under the target — never go over.**
- Too short → HeyGen fills remaining time with b-roll (fine)
- Too long → audio gets cut off or rushed at the end ← most common problem
- Croatian/German tends to run slightly slower than English — use 2.3 as the multiplier for safety: `duration × 2.3`
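The formula is easy to apply in code before saving a script. This is a minimal sketch, assuming whitespace-separated tokens are a good enough word count; `wordBudget` and `checkVo` are hypothetical helper names, not functions from the pipeline.

```javascript
// Hypothetical helpers for the VO word budget; not part of the pipeline.
function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// Target word count: duration × words-per-second, rounded down.
// The small epsilon keeps float error from turning 72 into 71.
function wordBudget(durationSeconds, wordsPerSecond = 2.4) {
  return Math.floor(durationSeconds * wordsPerSecond + 1e-9);
}

// Returns the counts plus a pass/fail flag ("at or under the target").
function checkVo(text, durationSeconds, wordsPerSecond = 2.4) {
  const words = countWords(text);
  const budget = wordBudget(durationSeconds, wordsPerSecond);
  return { words, budget, ok: words <= budget };
}

console.log(wordBudget(30));      // English-rate budget for 30s
console.log(wordBudget(30, 2.3)); // safer Croatian/German budget for 30s
```

Pass `2.3` as the third argument to `checkVo` for Croatian or German scripts.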

---

## Avatar Persona Guide

| Market | Language | Default Persona |
|--------|----------|----------------|
| Croatia / Bosnia | hr / bs | Southern European female, 35-45, warm and professional, business casual |
| Germany | de | Northern European male, 40-50, dark business suit, clean-shaven, authoritative |
| Austria | de | Central European male/female, 35-50, serious and trustworthy |
| English B2B | en | Professional Western, 35-50, confident expert, business formal |
| English B2C | en | Relatable, matches target demographic (age/style) |

**Tips:**
- Describe ethnicity + age + clothing — this directly shapes HeyGen's avatar and b-roll choices
- For trust-based offers (insurance, medical, finance) → male presenter + formal clothing
- For lifestyle/wellness/consumer → female presenter + warm business casual
- Avoid vague descriptions like "professional" alone — be specific: "Northern European male, dark suit, clean-shaven"

---

## Music Moods

| Mood | Use when |
|------|----------|
| `professional` | Insurance, B2B, clinics, authority services |
| `energetic` | Fitness, youth, product reveals, urgency |
| `warm` | Lifestyle, family, wellness, emotional appeal |
| `emotional` | Testimonials, transformation stories |
| `neutral` | Luxury, real estate, minimalist brands |

---

## Scene Guidance Tips

Write **3–5 scene guidance blocks**. Each block tells HeyGen what to show and when.

- `mediaType` options: `stock footage` / `motion graphics` / `avatar closeup` / `AI generated`
- Use `motion graphics` for numbers, stats, comparisons (employer contribution, savings amounts)
- Use `stock footage` for lifestyle/context shots (office, family, product use)
- Use `avatar closeup` for CTA scene — avatar speaking directly to camera works best for CTAs
- `overlay` = text that appears on screen. Keep short: max 5-7 words
- Empty `overlay` means no text overlay for that scene
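The tips above translate into a small lint for the `sceneGuidance` array. This is a sketch, not pipeline code: `lintSceneGuidance` is a hypothetical helper, and it checks only the three rules stated here (3–5 blocks, a known `mediaType`, overlays of at most 7 words).

```javascript
// Hypothetical lint for the avatar-mode `sceneGuidance` array.
const MEDIA_TYPES = ["stock footage", "motion graphics", "avatar closeup", "AI generated"];

function lintSceneGuidance(sceneGuidance) {
  const problems = [];

  if (sceneGuidance.length < 3 || sceneGuidance.length > 5) {
    problems.push(`expected 3-5 blocks, got ${sceneGuidance.length}`);
  }

  sceneGuidance.forEach((block, i) => {
    if (!MEDIA_TYPES.includes(block.mediaType)) {
      problems.push(`block ${i + 1}: unknown mediaType "${block.mediaType}"`);
    }
    // An empty overlay is valid and simply means "no text for this scene".
    const words = (block.overlay || "").trim().split(/\s+/).filter(Boolean).length;
    if (words > 7) {
      problems.push(`block ${i + 1}: overlay has ${words} words (max 7)`);
    }
  });

  return problems;
}

console.log(lintSceneGuidance([
  { timing: "0-8s", mediaType: "stock footage", visual: "office", overlay: "Sie zahlen GKV-Maximum" },
  { timing: "8-18s", mediaType: "motion graphics", visual: "bar chart", overlay: "" },
  { timing: "18-30s", mediaType: "avatar closeup", visual: "CTA", overlay: "Jetzt kostenlosen Vergleich anfragen" },
]));
```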

---

## Preflight (run BEFORE any HeyGen API call)

HeyGen credits are expensive and the Video Agent has a long polling timeout — failing fast saves real money. Validate everything BEFORE submitting the job.

1. **Client exists** in `clients.json`. Resolve canonical key.
2. **`HEYGEN_API_KEY` set** in `.env`. If missing → abort.
3. **`orientation` is `portrait` or `landscape`** — NEVER `9:16` (HeyGen API silently rejects shortcut ratios). See CLAUDE.md `## AI Generation API Constraints`.
4. **Multi-scene mode**: every scene has a valid `speaker_id`. Multi-scene `text_overlay` requires ALL fields: `type`, `font_family`, `font_size`, `font_weight`, `color`, `line_height`, `position`, `text_align` — missing fields = silent failure.
5. **Polling timeout is ≥ 20 minutes** in the script call. Less than 20 min = false negative on slow renders.
6. **Language is supported** by HeyGen's voice library for the selected voice. List with `--list-voices <lang>` if uncertain.
7. **VO word count is at or under the target for the duration** per the formula in this SKILL.md (`## VO Word Count Formula`). If it runs over → trim before submission; a shorter script is fine because HeyGen fills the remaining time with b-roll.
8. **`drive_folder_id` reachable** if `--drive` flag passed.
9. **Slack channel resolves** if reporting enabled.
10. **HeyGen API dollar balance** check — the Video Agent (`scenes`/`agent`) endpoint bills the **API dollar balance** (`getRemainingQuota().usd`, 60 units = $1), **NOT** `plan_credit`. The scenes path now **hard-aborts** when balance < $0.70/video (`--ignore-balance` to override). For a batch of N videos, ensure balance ≥ N × $0.70 first. A near-empty wallet otherwise fails mid-render as a generic "unknown error"; the 200 `plan_credit` counter is a red herring (untouched, never billed, no fallback). See memory `<id>`.

If all checks pass, log "preflight: OK (mode=<scenes|agent|multi-scene>, duration=<n>s, language=<x>)" and proceed.
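The checks that need no network access can be expressed as one pure function that collects every failure instead of stopping at the first. This is a sketch under assumptions: `preflight` and its input shape are hypothetical, and the real script may structure these checks differently. The field names, the 20-minute floor, and the $0.70 per-video reserve come from the list above.

```javascript
// Hypothetical offline preflight; the input object shape is an assumption.
const OVERLAY_FIELDS = [
  "type", "font_family", "font_size", "font_weight",
  "color", "line_height", "position", "text_align",
];

function preflight({ apiKey, orientation, pollTimeoutMinutes, balanceUsd, videoCount = 1, scenes = [] }) {
  const failures = [];

  if (!apiKey) failures.push("HEYGEN_API_KEY is not set");

  if (!["portrait", "landscape"].includes(orientation)) {
    failures.push(`orientation must be portrait or landscape, got "${orientation}"`);
  }

  if (!(pollTimeoutMinutes >= 20)) {
    failures.push("polling timeout must be at least 20 minutes");
  }

  // Compare in whole cents to avoid float rounding at the exact boundary.
  const neededCents = videoCount * 70;
  if (Math.round(balanceUsd * 100) < neededCents) {
    failures.push(`balance $${balanceUsd.toFixed(2)} is below $${(neededCents / 100).toFixed(2)}`);
  }

  // Multi-scene only: every scene needs a speaker and a complete text_overlay.
  scenes.forEach((scene, i) => {
    if (!scene.speaker_id) failures.push(`scene ${i + 1}: missing speaker_id`);
    if (scene.text_overlay) {
      const missing = OVERLAY_FIELDS.filter((f) => scene.text_overlay[f] === undefined);
      if (missing.length > 0) {
        failures.push(`scene ${i + 1}: text_overlay missing ${missing.join(", ")}`);
      }
    }
  });

  return failures; // empty array means the offline checks passed
}

console.log(preflight({ apiKey: "set", orientation: "9:16", pollTimeoutMinutes: 20, balanceUsd: 2.1, videoCount: 3 }));
```

Returning the full list lets one run surface every problem before any balance is spent.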

---

## Workflow

### Step 0 — Client lookup

Read `ACME Agency/clients/clients.json`. Extract:
- `drive_folder_id` — Drive upload target
- `slack_channel` — Slack report destination
- `market` — language context (Croatia, Germany, etc.)
- `heygen_voice_id` — pinned voice if set (multi-scene mode only)

---

### Step 1 — Brand research

Check `ACME Agency/clients/<ClientName>/brand-dna.md`. Need at minimum:
- Accent hex color, primary/background color
- Brand tone (for music mood selection)
- Language/market

If brand-dna.md doesn't exist: scrape website via Firecrawl, extract colors and tone, write brand-dna.md first.

---

### Step 2 — Script

**If `--script` provided:** read, validate, show summary.

**If `--brief` provided:** use it directly to write the script.

**If neither:** ask 4 questions:
1. What's this campaign about? (offer, key message, hook angle)
2. Target audience? (age, job, situation)
3. Duration? (20s / 25s / 30s) — default: 30s
4. Tone? (authoritative / warm / energetic) — informs musicMood + avatarPersona

**Write script.json** to `video-ads/<campaign>/script.json`.

**Count words in `fullScript`** and verify against `duration × 2.4`, or `duration × 2.3` for Croatian/German (see `## VO Word Count Formula`).

**Show confirmation before running:**
```
Campaign:  <id> | Duration: 30s | Music: professional
Avatar:    Northern European male, 40s, dark suit | Language: German
Script:    69 words ✓ (30s × 2.3)
Scenes:    4 (stock footage → motion graphics → stock footage → avatar CTA)
Proceed?
```

---

### Step 3 — Run the pipeline

```bash
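# The script path contains a space, so it must be quoted.
# --avatar <avatar_id> is an optional pin; --no-slack and --drive are optional too.
node "ACME Agency/scripts/heygen_ads_generate.mjs" "ClientName" \
  --script "ACME Agency/clients/ClientName/video-ads/<campaign>/script.json" \
  --mode agent \
  [--avatar <avatar_id>] [--no-slack] [--drive]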
```

**Drive upload is OFF by default.** Video goes local + HeyGen link only. Add `--drive` only when the video is final and ready to archive to `Klijenti/<ClientName>/`.

**Pipeline phases:**
- **Phase A** — `buildVideoAgentPrompt(script)` → structured prompt string
- **Phase B** — `generateVideoAgentAd()` → POST /v1/video_agent/generate → polls ~10-15 min → downloads MP4
- **Phase C** — Drive upload only if `--drive` flag passed
- **Phase D** — Slack: campaign info + HeyGen edit link (+ Drive link if uploaded)

**Render time:** ~10-15 minutes per video. Use `--no-slack` for silent local testing.

---

## Output

```
ACME Agency/clients/<ClientName>/video-ads/<campaign>/
├── script.json
├── final-ad.mp4       ← finished 1080×1920 MP4, upload-ready
└── manifest.json

Drive (only with --drive): Klijenti/<ClientName>/Video Ads/<Year>/<campaign>/
Slack: video ready + HeyGen link (+ Drive folder URL if uploaded)
```

## Verification (run AFTER pipeline completes — confirm the video shipped)

Check ALL of these before declaring done:

- [ ] `final-ad.mp4` exists locally AND file size > 500 KB (HeyGen sometimes returns broken stubs on credit exhaustion)
- [ ] `manifest.json` records: `videoId`, `mode`, `duration`, `language`, `aspectRatio`, scene count
- [ ] If `--drive`: Drive folder created at the expected path AND `final-ad.mp4` uploaded with public read URL
- [ ] HeyGen task status = `completed` (not `failed`, and not a `processing` status that was silently treated as done)
- [ ] Slack post via slack-reporter returned a non-null `ts` (skip when `--no-slack` was passed)
- [ ] If multi-scene: every scene's `speaker_id` rendered (no silent fallback to default avatar)
- [ ] Reported language matches the requested language
- [ ] If `--drive` was not passed but a Drive URL was somehow generated, that's fine; log it
- [ ] If multiple videos were generated in one run (multi-hook campaign): every video has its own HeyGen URL in the report (NEVER local paths in the Slack summary)

If any check fails, name the gap explicitly. Never claim success when verification fails.

---

## Multi-Video Summary Report

When generating multiple videos (e.g. 3 hooks for one campaign), the final summary you post **must use HeyGen URLs, never local file paths**. Local paths are inaccessible to team members on Slack.

**After all videos finish:**
1. Read each campaign's `manifest.json` → get `_assets.heygenUrl`
2. Build the summary table with HeyGen links:

```
| # | Hook | HeyGen Link |
|---|------|-------------|
| 1 | Pain Point — "Dok vi ovo gledate..." | https://app.heygen.com/video/xxx |
| 2 | Social Proof — "687 novih upita..." | https://app.heygen.com/video/yyy |
| 3 | Us vs Them — "Platili ste agenciju..." | https://app.heygen.com/video/zzz |
```

3. If `--drive` was used, add the Drive folder link below the table.

**Rule:** Never include local file paths (like `ACME Agency/clients/.../final-ad.mp4`) in the report — they mean nothing to Slack users. Use `heygenUrl` from the manifest for every video.
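Building the table from the manifests, and refusing to emit a row without a HeyGen URL, could look like the sketch below. `buildSummaryTable` is a hypothetical helper; the only field taken from this skill is `_assets.heygenUrl`, and the `{ hook, manifest }` entry shape is an assumption.

```javascript
// Hypothetical summary builder; entries are { hook, manifest } pairs.
function buildSummaryTable(entries) {
  const rows = entries.map(({ hook, manifest }, i) => {
    const url = manifest._assets && manifest._assets.heygenUrl;
    // Fail loudly instead of falling back to a local path.
    if (!url || !/^https:\/\//.test(url)) {
      throw new Error(`video ${i + 1}: manifest has no HeyGen URL`);
    }
    return `| ${i + 1} | ${hook} | ${url} |`;
  });

  return [
    "| # | Hook | HeyGen Link |",
    "|---|------|-------------|",
    ...rows,
  ].join("\n");
}

console.log(buildSummaryTable([
  { hook: "Pain Point", manifest: { _assets: { heygenUrl: "https://app.heygen.com/video/xxx" } } },
  { hook: "Social Proof", manifest: { _assets: { heygenUrl: "https://app.heygen.com/video/yyy" } } },
]));
```

Throwing on a missing URL keeps a local path from slipping into the Slack summary unnoticed.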

---

## Tips

- **Test locally first:** `--no-slack` — skips Slack, Drive is already off by default
- **Voice not consistent?** This is Video Agent behavior — guide via `voiceDescription`. For guaranteed voice consistency, use `--mode multi-scene` with per-scene `voiceId` (see `heygen_ads_generate.mjs`)
- **Avatar looks wrong?** Be more specific in `avatarPersona` — add ethnicity, exact age range, clothing color
- **Too much talking head?** The pacing instructions in the prompt (`cut every 3-4 seconds, ≤50% avatar screen time`) help — but HeyGen has creative freedom. If still too static, try regenerating
- **Multiple variants:** Generate 2-3 variants of the same script by running pipeline multiple times (each generation is slightly different) — good for A/B testing
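- **Cost:** in the API beta, Video Agent bills the API dollar balance, not plan credits (see the cost model in the Overview), so the older estimate of ~2 credits/min (30s video ≈ 1 credit) does not describe that billing. Monitor at app.heygen.com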

---

## Multi-Scene Mode (fallback)

Use when you need guaranteed voice consistency (pinned `voice_id` per scene). Produces stiffer lip sync but never switches voices.

Script format: use `scenes[]` array instead of `fullScript` + `sceneGuidance`.

```bash
# The script path contains a space, so it must be quoted.
node "ACME Agency/scripts/heygen_ads_generate.mjs" "ACME Agency" \
  --script "ACME Agency/clients/ACME Agency/video-ads/<campaign>/script.json" \
  --mode multi-scene
```

See `ACME Agency/clients/ACME Agency/video-ads/<id>/script.json` for example.
