[ CREATIVE ]
/heygen-ad-generator
Generates a finished HeyGen Video Agent ad — no CapCut, no ElevenLabs, no Krea.
ACME Agency, <id> and you@example.com mark values that are per-agency; your install fills them with YOUR clients and accounts. If a section references a helper script you don't have yet, it ships with that workflow's install.
Skill: /heygen-ad-generator
Overview
Generates a finished HeyGen Video Agent ad — no CapCut, no ElevenLabs, no Krea. One prompt → HeyGen handles voice, b-roll, editing, text overlays, and music.
scenes mode is the DEFAULT first choice for video ads (avatar-less)
For a normal video ad, use --mode scenes: a pure b-roll + kinetic-text + voiceover montage with NO avatar. It's cheap (~$0.50/video, billed to the API dollar balance — see cost note below), fast, scalable, and pulls the client's brand colors automatically from their design-theme.json. This is what to reach for unless the brief specifically wants a presenter speaking to camera. (Validated on ACME Agency 2026-06-23 — ACME Agency/clients/ACME Agency/video-ads/heygen-scenes-batch/.)
Use the avatar modes (agent / multi-scene) only when the brief explicitly wants a person talking to camera (testimonial / founder pitch / explainer).
Cost model (important): in the API beta, Video Agent bills the API dollar balance, not plan credits. getRemainingQuota() reads GET /v2/user/remaining_quota where 60 units = $1; plan_credit (e.g. 200) is a separate counter and is untouched. A ~14–20s scenes video ≈ $0.47–0.66. The scenes pipeline prints the per-video cost and remaining balance, and puts the cost in the manifest + Slack post.
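The unit arithmetic above can be sketched as a small helper. This is a minimal sketch, assuming you already have the raw unit count from the quota endpoint; the names unitsToUsd and canAffordBatch are hypothetical and are not part of lib/heygen.mjs. The $0.70 per-video floor is the one quoted in the preflight section of this document.

```javascript
// Hypothetical helpers for illustration; not part of lib/heygen.mjs.
const UNITS_PER_USD = 60; // from this document: 60 quota units = $1

// Convert raw remaining_quota units into dollars.
function unitsToUsd(units) {
  return units / UNITS_PER_USD;
}

// True when the balance covers `videoCount` videos at the per-video floor.
function canAffordBatch(units, videoCount, perVideoUsd = 0.7) {
  // Compare in whole cents to avoid floating-point drift.
  const balanceCents = Math.floor((units * 100) / UNITS_PER_USD);
  const neededCents = Math.round(perVideoUsd * 100) * videoCount;
  return balanceCents >= neededCents;
}

console.log(unitsToUsd(120)); // 2
console.log(canAffordBatch(120, 3)); // false: 3 x $0.70 = $2.10 exceeds $2.00
```

Running a check like this before a batch avoids starting renders that the balance cannot finish.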
What this skill produces:
- Finished MP4 (1080×1920, 9:16) — ready to upload directly to Meta/TikTok/Stories
- AI avatar talking-head mixed with b-roll cutaways
- Brand-colored text overlays synced to speech
- Background music at -20dB
- All files in Google Drive + Slack report with download link
What this skill does NOT do:
- Guarantee a specific voice (Video Agent picks autonomously; guide it via voiceDescription)
- Guarantee avatar appearance (guide it via avatarPersona; works well for ethnicity/age/clothing)
- Use ElevenLabs, Krea, or Creatomate
When to use:
- Any client needing an AI presenter video ad (Croatian, German, English, any language)
- Fast-turnaround UGC-style ads (same day delivery)
- When cinematic Krea footage is overkill — avatar + b-roll is enough
- Multiple angles/scripts for the same campaign (different hooks, A/B testing)
Trigger:
/heygen-ad-generator ClientName
/heygen-ad-generator ACME Agency --brief "GKV → PKV switching, employed Germans, 30s"
/heygen-ad-generator ACME Agency --script "ACME Agency/clients/ACME Agency/video-ads/campaign/script.json"
Critical Files
- ACME Agency/scripts/heygen_ads_generate.mjs: pipeline script
- ACME Agency/scripts/lib/heygen.mjs: HeyGen API wrapper (buildVideoAgentPrompt, generateVideoAgentAd)
- ACME Agency/scripts/lib/google_drive.mjs: Drive upload
- ACME Agency/scripts/lib/slack.mjs: Slack reporting
- ACME Agency/clients/clients.json: client registry
- ACME Agency/clients/<ClientName>/brand-dna.md: brand context
- ACME Agency/clients/<ClientName>/video-ads/<campaign>/script.json: script
Scenes Script Format (DEFAULT for video — avatar-less)
Write this to ACME Agency/clients/<Client>/video-ads/<campaign-slug>/script.json:
{
"client": "ACME Agency",
"campaign": "<id>",
"mode": "scenes",
"language": "hr",
"duration": 20,
"vo": "Inspektor je na vratima. A evidencije? ... Pošaljite upit za besplatnu provjeru.",
"scenes": [
"[0-2s] HOOK: a restaurant glass door opens and a food-safety inspector with a clipboard steps in, tense cinematic lighting. Boxed overlay (top): \"Inspektor je stigao.\"",
"[2-6s] frantic hands flipping through messy paper binders on a kitchen counter. Boxed overlay: \"Gdje su evidencije?\"",
"[6-10s] calm close-up of a hand holding a smartphone with coral checkmarks. Boxed overlay: \"Sve na mobitelu.\"",
"[10-15s] confident owner relaxed in a spotless modern kitchen. Boxed overlay: \"Spremni za inspekciju.\"",
"[15-19s] solid navy end card. Centered boxed overlay: \"Pošaljite upit\""
]
}
- vo = the exact voiceover words (target language). Word budget ≈ duration × 2.3 (see VO formula below). HeyGen often shortens to ~14s even if you ask for 20s, so keep the VO tight.
- scenes = an array of scene-direction strings, each ending with Boxed overlay: "<short Croatian line>". Keep overlays 3–6 words.
- Brand colors auto-resolve from ACME Agency/clients/<Client>/design-theme.json (navy=brand, coral=accent). Override per-campaign with "brandColors": { "primary": "#your-channel", "accent": "#your-channel" }.
- Casting / ethnicity auto-localizes to the market. HeyGen's b-roll otherwise defaults to a globally-diverse mix, which reads wrong for a local ad (e.g. Black/Asian people in a Croatian B2B ad). buildScenesPrompt() now injects a CASTING directive from the language (hr/bs/sr/sl → white Southern/Central European locals; de → Central/Northern European; en → generic Western). Override per-campaign with a "casting" string in script.json when the audience isn't the default for that language. See memory <id>.
- The proven reliability rules (boxed captions, hard no-English, centered/capped CTA) are baked into buildScenesPrompt(), so you do NOT repeat them in the script.
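The overlay rules above (every scene ends with a boxed overlay, overlays stay at 3–6 words) can be checked before a paid render. The following is a rough sketch only: checkSceneOverlays is a hypothetical helper, not something the pipeline is known to ship, and it assumes the overlay text is the last double-quoted string after the word "overlay".

```javascript
// Hypothetical pre-submit check; not part of heygen_ads_generate.mjs.
// Assumes each scene string ends with: ...overlay: "<text>"
const OVERLAY_AT_END = /overlay[^:"]*:\s*"([^"]+)"\s*$/i;

function checkSceneOverlays(scenes, minWords = 3, maxWords = 6) {
  const issues = [];
  scenes.forEach((scene, index) => {
    const match = scene.match(OVERLAY_AT_END);
    if (!match) {
      issues.push(`scene ${index + 1}: no trailing boxed overlay found`);
      return;
    }
    const words = match[1].trim().split(/\s+/).length;
    if (words < minWords || words > maxWords) {
      issues.push(`scene ${index + 1}: overlay has ${words} words (want ${minWords}-${maxWords})`);
    }
  });
  return issues;
}

console.log(checkSceneOverlays([
  '[0-2s] HOOK: inspector steps in. Boxed overlay (top): "Inspektor je stigao."',
  '[2-6s] hands flipping through binders, no caption',
]));
// [ 'scene 2: no trailing boxed overlay found' ]
```

Treat the result as warnings rather than hard failures: a two-word end-card CTA such as "Pošaljite upit" falls under the 3-word floor but may still be intentional.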
Run it:
node ACME Agency/scripts/heygen_ads_generate.mjs "ACME Agency" \
--script "ACME Agency/clients/ACME Agency/video-ads/<campaign>/script.json" \
--mode scenes [--no-slack] [--drive]
Prints the per-video cost + remaining balance. Generate 3–4 variants (different hooks) by writing 3–4 script.json files and running each — each ≈ $0.50.
Script Format (Avatar Video Agent mode — only when a presenter is wanted)
Write this to ACME Agency/clients/<ClientName>/video-ads/<campaign-slug>/script.json:
{
"client": "ACME Agency",
"campaign": "<id>",
"adType": "heygen",
"mode": "agent",
"language": "de",
"duration": 30,
"avatarGender": "male",
"avatarPersona": "Northern European male, 40s, dark business suit, clean-shaven or light stubble, confident and authoritative",
"voiceDescription": "German male, 40s, clear northern German accent, calm and authoritative",
"brandColors": {
"primary": "#your-channel",
"accent": "#C9A84C",
"text": "#FFFFFF"
},
"musicMood": "professional",
"fullScript": "Seit Ihrem letzten Jobwechsel zahlen Sie den GKV-Höchstbeitrag...",
"sceneGuidance": [
{ "timing": "0-8s", "mediaType": "stock footage", "visual": "German professional at laptop in bright office", "overlay": "Sie zahlen GKV-Maximum" },
{ "timing": "8-18s", "mediaType": "motion graphics", "visual": "Animated bar chart: GKV €900 vs PKV €600 monthly", "overlay": "Sparen Sie bis zu €300/Monat" },
{ "timing": "18-25s", "mediaType": "stock footage", "visual": "Person reviewing documents at home desk, relaxed", "overlay": "" },
{ "timing": "25-30s", "mediaType": "avatar closeup", "visual": "Avatar speaks CTA directly to camera", "overlay": "Jetzt kostenlosen Vergleich anfragen" }
]
}
Campaign slug: <keyword>-<audience/offer>-<YYYY-MM> e.g. <id>
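The slug pattern above can be produced mechanically. This is a sketch under assumptions: toKebab and campaignSlug are hypothetical names, the pipeline may build slugs differently (or expect you to write them by hand), and the sample inputs are illustrative only.

```javascript
// Hypothetical helpers; the pipeline does not necessarily build slugs this way.
function toKebab(text) {
  return text
    .normalize("NFD")                // split letters from their diacritics
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")     // any other run of characters becomes a hyphen
    .replace(/^-+|-+$/g, "");        // trim leading and trailing hyphens
}

function campaignSlug(keyword, audienceOrOffer, date) {
  const yearMonth = date.toISOString().slice(0, 7); // "YYYY-MM" in UTC
  return `${toKebab(keyword)}-${toKebab(audienceOrOffer)}-${yearMonth}`;
}

console.log(campaignSlug("PKV Wechsel", "Angestellte", new Date("2026-06-23T00:00:00Z")));
// pkv-wechsel-angestellte-2026-06
```

Letters that do not decompose under NFD (for example đ) are replaced by a hyphen in this sketch, so check the result for Croatian keywords.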
VO Word Count Formula — CRITICAL
HeyGen speech rate ≈ 2.3–2.5 words/second
Formula: duration × 2.4 = target word count
Example: 30s = 72 words | 25s = 60 words | 20s = 48 words
Count words in fullScript BEFORE saving. Stay at or under the target — never go over.
- Too short → HeyGen fills remaining time with b-roll (fine)
- Too long → audio gets cut off or rushed at the end ← most common problem
- Croatian/German tends to run slightly slower than English — use 2.3 as the multiplier for safety:
duration × 2.3
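The formula above is easy to automate before saving a script. A minimal sketch, assuming whitespace-separated words are a good enough count; wordBudget, countWords and fitsBudget are hypothetical helper names, not pipeline functions.

```javascript
// Hypothetical helpers illustrating the formula above.
function wordBudget(durationSeconds, wordsPerSecond = 2.4) {
  // Round down so the script stays at or under the target.
  // The tiny epsilon guards against floating-point results like 71.999999.
  return Math.floor(durationSeconds * wordsPerSecond + 1e-9);
}

function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

function fitsBudget(script, durationSeconds, wordsPerSecond = 2.4) {
  return countWords(script) <= wordBudget(durationSeconds, wordsPerSecond);
}

console.log(wordBudget(30));      // 72
console.log(wordBudget(20, 2.3)); // 46
```

Pass 2.3 as the second argument for Croatian or German scripts, as recommended above.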
Avatar Persona Guide
| Market | Language | Default Persona |
|---|---|---|
| Croatia / Bosnia | hr / bs | Southern European female, 35-45, warm and professional, business casual |
| Germany | de | Northern European male, 40-50, dark business suit, clean-shaven, authoritative |
| Austria | de | Central European male/female, 35-50, serious and trustworthy |
| English B2B | en | Professional Western, 35-50, confident expert, business formal |
| English B2C | en | Relatable, matches target demographic (age/style) |
Tips:
- Describe ethnicity + age + clothing — this directly shapes HeyGen's avatar and b-roll choices
- For trust-based offers (insurance, medical, finance) → male presenter + formal clothing
- For lifestyle/wellness/consumer → female presenter + warm business casual
- Avoid vague descriptions like "professional" alone — be specific: "Northern European male, dark suit, clean-shaven"
Music Moods
| Mood | Use when |
|---|---|
| professional | Insurance, B2B, clinics, authority services |
| energetic | Fitness, youth, product reveals, urgency |
| warm | Lifestyle, family, wellness, emotional appeal |
| emotional | Testimonials, transformation stories |
| neutral | Luxury, real estate, minimalist brands |
Scene Guidance Tips
Write 3–5 scene guidance blocks. Each block tells HeyGen what to show and when.
- mediaType options: stock footage / motion graphics / avatar closeup / AI generated
- Use motion graphics for numbers, stats, comparisons (employer contribution, savings amounts)
- Use stock footage for lifestyle/context shots (office, family, product use)
- Use avatar closeup for the CTA scene; an avatar speaking directly to camera works best for CTAs
- overlay = text that appears on screen. Keep short: max 5-7 words
- Empty overlay means no text overlay for that scene
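The tips above translate into a simple structural check on the sceneGuidance array. This is a sketch, assuming the field names from the script.json example in this document; checkSceneGuidance is a hypothetical helper and the 7-word cap is the upper end of the "5-7 words" guidance.

```javascript
// Hypothetical validator; field names follow the script.json example in this document.
const MEDIA_TYPES = ["stock footage", "motion graphics", "avatar closeup", "AI generated"];

function checkSceneGuidance(sceneGuidance, maxOverlayWords = 7) {
  const issues = [];
  if (sceneGuidance.length < 3 || sceneGuidance.length > 5) {
    issues.push(`expected 3-5 blocks, got ${sceneGuidance.length}`);
  }
  sceneGuidance.forEach((block, index) => {
    if (!MEDIA_TYPES.includes(block.mediaType)) {
      issues.push(`block ${index + 1}: unknown mediaType "${block.mediaType}"`);
    }
    // An empty overlay is valid and means "no text for this scene".
    const overlay = (block.overlay || "").trim();
    const words = overlay === "" ? 0 : overlay.split(/\s+/).length;
    if (words > maxOverlayWords) {
      issues.push(`block ${index + 1}: overlay has ${words} words (max ${maxOverlayWords})`);
    }
  });
  return issues;
}

console.log(checkSceneGuidance([
  { timing: "0-8s", mediaType: "stock footage", overlay: "Sie zahlen GKV-Maximum" },
  { timing: "8-18s", mediaType: "slideshow", overlay: "" },
  { timing: "18-25s", mediaType: "avatar closeup", overlay: "Jetzt kostenlosen Vergleich anfragen" },
]));
// [ 'block 2: unknown mediaType "slideshow"' ]
```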
Preflight (run BEFORE any HeyGen API call)
HeyGen credits are expensive and the Video Agent has a long polling timeout — failing fast saves real money. Validate everything BEFORE submitting the job.
- Client exists in clients.json. Resolve canonical key.
- HEYGEN_API_KEY set in .env. If missing → abort.
- orientation is portrait or landscape, NEVER 9:16 (HeyGen API silently rejects shortcut ratios). See CLAUDE.md ## AI Generation API Constraints.
- Multi-scene mode: every scene has a valid speaker_id.
- Multi-scene text_overlay requires ALL fields: type, font_family, font_size, font_weight, color, line_height, position, text_align. Missing fields = silent failure.
- Polling timeout is ≥ 20 minutes in the script call. Less than 20 min = false negative on slow renders.
- Language is supported by HeyGen's voice library for the selected voice. List with --list-voices <lang> if uncertain.
- VO word count matches duration per the calibration table in this SKILL.md (## VO Word Count Formula). If mismatch → trim or extend before submission.
- drive_folder_id reachable if --drive flag passed.
- Slack channel resolves if reporting enabled.
- HeyGen API dollar balance check: the Video Agent (scenes / agent) endpoint bills the API dollar balance (getRemainingQuota().usd, 60 units = $1), NOT plan_credit. The scenes path now hard-aborts when balance < $0.70/video (--ignore-balance to override). For a batch of N videos, ensure balance ≥ N × $0.70 first. A near-empty wallet otherwise fails mid-render as a generic "unknown error"; the 200 plan_credit counter is a red herring (untouched, never billed, no fallback). See memory <id>.
If all checks pass, log "preflight: OK (mode=<agent|multi-scene>, duration=<n>s, language=<x>)" and proceed.
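Several of the checks above are purely structural and can run without touching the network. The sketch below covers only those (API key present, orientation value, polling timeout, text_overlay fields); preflight and its input shape are hypothetical, and the real checks live in the pipeline script.

```javascript
// Hypothetical preflight sketch; the input object shape is an assumption.
const OVERLAY_FIELDS = [
  "type", "font_family", "font_size", "font_weight",
  "color", "line_height", "position", "text_align",
];

function preflight({ apiKey, orientation, pollTimeoutMinutes, textOverlays = [] }) {
  const failures = [];
  if (!apiKey) failures.push("HEYGEN_API_KEY is not set");
  if (orientation !== "portrait" && orientation !== "landscape") {
    failures.push(`orientation must be portrait or landscape, got "${orientation}"`);
  }
  if (pollTimeoutMinutes < 20) {
    failures.push(`polling timeout ${pollTimeoutMinutes} min is below 20 min`);
  }
  textOverlays.forEach((overlay, index) => {
    // Every listed field must be present, otherwise the overlay may fail silently.
    const missing = OVERLAY_FIELDS.filter((field) => overlay[field] === undefined);
    if (missing.length > 0) {
      failures.push(`text_overlay ${index + 1} is missing: ${missing.join(", ")}`);
    }
  });
  return failures;
}

console.log(preflight({ apiKey: "key", orientation: "9:16", pollTimeoutMinutes: 20 }));
// [ 'orientation must be portrait or landscape, got "9:16"' ]
```

An empty array means these checks passed; the client, Drive, Slack, voice and balance checks still need their own lookups.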
Workflow
Step 0 — Client lookup
Read ACME Agency/clients/clients.json. Extract:
- drive_folder_id: Drive upload target
- slack_channel: Slack report destination
- market: language context (Croatia, Germany, etc.)
- heygen_voice_id: pinned voice if set (multi-scene mode only)
Step 1 — Brand research
Check ACME Agency/clients/<ClientName>/brand-dna.md. Need at minimum:
- Accent hex color, primary/background color
- Brand tone (for music mood selection)
- Language/market
If brand-dna.md doesn't exist: scrape website via Firecrawl, extract colors and tone, write brand-dna.md first.
Step 2 — Script
If --script provided: read, validate, show summary.
If --brief provided: use it directly to write the script.
If neither: ask 4 questions:
- What's this campaign about? (offer, key message, hook angle)
- Target audience? (age, job, situation)
- Duration? (20s / 25s / 30s) — default: 30s
- Tone? (authoritative / warm / energetic) — informs musicMood + avatarPersona
Write script.json to video-ads/<campaign>/script.json.
Count words in fullScript — verify against duration × 2.4.
Show confirmation before running:
Campaign: <id> | Duration: 30s | Music: professional
Avatar: Northern European male, 40s, dark suit | Language: German
Script: 72 words ✓ (30s × 2.4)
Scenes: 4 (stock footage → motion graphics → stock footage → avatar CTA)
Proceed?
Step 3 — Run the pipeline
node ACME Agency/scripts/heygen_ads_generate.mjs "ClientName" \
--script "ACME Agency/clients/ClientName/video-ads/<campaign>/script.json" \
--mode agent \
[--avatar <avatar_id>] [--no-slack] [--drive]
# --avatar is optional and pins a specific avatar
Drive upload is OFF by default. Video goes local + HeyGen link only. Add --drive only when the video is final and ready to archive to Klijenti/<ClientName>/.
Pipeline phases:
- Phase A: buildVideoAgentPrompt(script) → structured prompt string
- Phase B: generateVideoAgentAd() → POST /v1/video_agent/generate → polls ~10-15 min → downloads MP4
- Phase C: Drive upload only if --drive flag passed
- Phase D: Slack: campaign info + HeyGen edit link (+ Drive link if uploaded)
Render time: ~10-15 minutes per video. Use --no-slack for silent local testing.
Output
ACME Agency/clients/<ClientName>/video-ads/<campaign>/
├── script.json
├── final-ad.mp4 ← finished 1080×1920 MP4, upload-ready
└── manifest.json
Drive: Klijenti/<ClientName>/Video Ads/<Year>/<campaign>/
Slack: video ready + Drive folder URL + direct download link
Verification (run AFTER pipeline completes — confirm the video shipped)
Check ALL of these before declaring done:
- [ ] final-ad.mp4 exists locally AND file size > 500 KB (HeyGen sometimes returns broken stubs on credit exhaustion)
- [ ] manifest.json records: videoId, mode, duration, language, aspectRatio, scene count
- [ ] If --drive: Drive folder created at the expected path AND final-ad.mp4 uploaded with public read URL
- [ ] HeyGen task status = completed (not failed, not processing swallowed)
- [ ] Slack post via slack-reporter returned ts non-null
- [ ] If multi-scene: every scene's speaker_id rendered (no silent fallback to default avatar)
- [ ] Reported language matches the requested language
- [ ] If --no-drive was passed but a Drive URL was somehow generated, that's fine; log it
- [ ] If multiple videos were generated in one run (multi-hook campaign): every video has its own HeyGen URL in the report (NEVER local paths in the Slack summary)
If any check fails, name the gap explicitly. Never claim success when verification fails.
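The local parts of this checklist can be expressed as one function that returns the named gaps. This is a sketch only: verifyRun and its input shape are hypothetical, 500 KB is read as 500 × 1024 bytes, and the scene-count key is left out because its manifest field name is not given here.

```javascript
// Hypothetical post-run check; the input shape is an assumption for illustration.
const MIN_VIDEO_BYTES = 500 * 1024; // "file size > 500 KB", read as KiB
const MANIFEST_KEYS = ["videoId", "mode", "duration", "language", "aspectRatio"];

function verifyRun({ fileSizeBytes, manifest, taskStatus, requestedLanguage }) {
  const gaps = [];
  // A missing file should be passed in as fileSizeBytes = 0 or undefined.
  if (!(fileSizeBytes > MIN_VIDEO_BYTES)) gaps.push("final-ad.mp4 missing or too small");
  for (const key of MANIFEST_KEYS) {
    if (manifest[key] === undefined) gaps.push(`manifest.json lacks ${key}`);
  }
  if (taskStatus !== "completed") gaps.push(`HeyGen task status is ${taskStatus}`);
  if (manifest.language !== requestedLanguage) gaps.push("language mismatch");
  return gaps;
}

console.log(verifyRun({
  fileSizeBytes: 1200,
  manifest: { videoId: "v1", mode: "scenes", duration: 20, language: "hr", aspectRatio: "9:16" },
  taskStatus: "failed",
  requestedLanguage: "hr",
}));
// [ 'final-ad.mp4 missing or too small', 'HeyGen task status is failed' ]
```

Report a run as done only when the returned list is empty and the Drive and Slack checks have also passed.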
Multi-Video Summary Report
When generating multiple videos (e.g. 3 hooks for one campaign), the final summary you post must use HeyGen URLs, never local file paths. Local paths are inaccessible to team members on Slack.
After all videos finish:
- Read each campaign's manifest.json → get _assets.heygenUrl
- Build the summary table with HeyGen links:
| # | Hook | HeyGen Link |
|---|------|-------------|
| 1 | Pain Point — "Dok vi ovo gledate..." | https://app.heygen.com/video/xxx |
| 2 | Social Proof — "687 novih upita..." | https://app.heygen.com/video/yyy |
| 3 | Us vs Them — "Platili ste agenciju..." | https://app.heygen.com/video/zzz |
- If --drive was used, add the Drive folder link below the table.
Rule: Never include local file paths (like ACME Agency/clients/.../final-ad.mp4) in the report — they mean nothing to Slack users. Use heygenUrl from the manifest for every video.
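The report steps above can be sketched as a function that turns parsed manifests into the table. It assumes each manifest exposes _assets.heygenUrl as described in this section; buildSummaryTable is a hypothetical name and the hook label is supplied by the caller, since no manifest field for it is documented here.

```javascript
// Hypothetical report builder; `hook` labels come from the caller, not the manifest.
function buildSummaryTable(entries) {
  const rows = entries.map(({ hook, manifest }, index) => {
    const url = manifest._assets && manifest._assets.heygenUrl;
    // Refuse to fall back to a local path: Slack readers cannot open it.
    if (!url) throw new Error(`video ${index + 1} has no heygenUrl in its manifest`);
    return `| ${index + 1} | ${hook} | ${url} |`;
  });
  return ["| # | Hook | HeyGen Link |", "|---|------|-------------|", ...rows].join("\n");
}

console.log(buildSummaryTable([
  { hook: "Pain Point", manifest: { _assets: { heygenUrl: "https://app.heygen.com/video/xxx" } } },
  { hook: "Social Proof", manifest: { _assets: { heygenUrl: "https://app.heygen.com/video/yyy" } } },
]));
```

Throwing on a missing URL keeps an incomplete report from being posted.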
Tips
- Test locally first: --no-slack skips Slack; Drive is already off by default
- Voice not consistent? This is Video Agent behavior; guide via voiceDescription. For guaranteed voice consistency, use --mode multi-scene with per-scene voiceId (see heygen_ads_generate.mjs)
- Avatar looks wrong? Be more specific in avatarPersona: add ethnicity, exact age range, clothing color
- Too much talking head? The pacing instructions in the prompt (cut every 3-4 seconds, ≤50% avatar screen time) help, but HeyGen has creative freedom. If still too static, try regenerating
- Multiple variants: Generate 2-3 variants of the same script by running the pipeline multiple times (each generation is slightly different); good for A/B testing
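- Credits: ~2 credits/min for Video Agent, so a 30s video ≈ 1 credit. Monitor at app.heygen.com. Note that per the cost model above, Video Agent jobs submitted through the API beta bill the API dollar balance, not plan credits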
Multi-Scene Mode (fallback)
Use when you need guaranteed voice consistency (pinned voice_id per scene). Produces stiffer lip sync but never switches voices.
Script format: use scenes[] array instead of fullScript + sceneGuidance.
node ACME Agency/scripts/heygen_ads_generate.mjs "ACME Agency" \
--script "ACME Agency/clients/ACME Agency/video-ads/<campaign>/script.json" \
--mode multi-scene
See ACME Agency/clients/ACME Agency/video-ads/<id>/script.json for example.