
`/heygen-ad-generator`

Generates a finished HeyGen Video Agent ad — no CapCut, no ElevenLabs, no Krea.

Placeholders like ACME Agency, <id> and you@example.com mark values that are per-agency — your install fills them with YOUR clients and accounts. If a section references a helper script you don't have yet, it ships with that workflow's install.

Skill: `/heygen-ad-generator`

Overview

One prompt → HeyGen handles voice, b-roll, editing, text overlays, and music.

`scenes` mode is the DEFAULT first choice for video ads (avatar-less)

For a normal video ad, use --mode scenes: a pure b-roll + kinetic-text + voiceover montage with NO avatar. It is cheap (~$0.50 per video, billed to the API dollar balance; see the cost note below), fast, and scalable, and it pulls the client's brand colors automatically from their design-theme.json. Reach for it unless the brief specifically wants a presenter speaking to camera. (Validated on ACME Agency 2026-06-23: ACME Agency/clients/ACME Agency/video-ads/heygen-scenes-batch/.)

Use the avatar modes (agent / multi-scene) only when the brief explicitly wants a person talking to camera (testimonial / founder pitch / explainer).

Cost model (important): in the API beta, Video Agent bills the API dollar balance, not plan credits. getRemainingQuota() reads GET /v2/user/remaining_quota where 60 units = $1; plan_credit (e.g. 200) is a separate counter and is untouched. A ~14–20s scenes video ≈ $0.47–0.66. The scenes pipeline prints the per-video cost and remaining balance, and puts the cost in the manifest + Slack post.
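The unit conversion above is simple arithmetic; a sketch like the following turns a remaining_quota reading into dollars and a rough video count. It is not code from heygen.mjs: the function names are hypothetical, and the flat ~$0.033/s rate is an assumption read off the $0.47–0.66 for 14–20s range.

```javascript
// HYPOTHETICAL helpers. GET /v2/user/remaining_quota reports units; 60 units = $1.
const UNITS_PER_USD = 60;

// ASSUMPTION: a flat per-second rate read off the observed range
// ($0.47-0.66 for a 14-20s scenes video). HeyGen's actual billing may differ.
const ASSUMED_USD_PER_SECOND = 0.033;

function unitsToUsd(units) {
  return units / UNITS_PER_USD;
}

function estimateVideoCostUsd(durationSec) {
  return durationSec * ASSUMED_USD_PER_SECOND;
}

// How many videos of this length the remaining API balance covers.
// plan_credit is deliberately ignored: Video Agent never bills it.
function videosAffordable(remainingUnits, durationSec) {
  return Math.floor(unitsToUsd(remainingUnits) / estimateVideoCostUsd(durationSec));
}
```

With these numbers, 600 units is $10, which covers roughly fifteen 20s videos. The preflight gate of $0.70 per video leaves a margin above this estimate.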

What this skill produces:

Finished MP4 (1080×1920, 9:16) — ready to upload directly to Meta/TikTok/Stories
AI avatar talking-head mixed with b-roll cutaways (avatar modes), or an avatar-less b-roll + kinetic-text montage (scenes mode)
Brand-colored text overlays synced to speech
Background music at -20dB
Local MP4 + Slack report with the HeyGen link; Google Drive archive (with download link) only when --drive is passed

What this skill does NOT do:

Guarantee a specific voice (Video Agent picks autonomously — guide it via voiceDescription)
Guarantee avatar appearance (guide it via avatarPersona — works well for ethnicity/age/clothing)
Use ElevenLabs, Krea, or Creatomate

When to use:

Any client needing an AI presenter video ad (Croatian, German, English, any language)
Fast-turnaround UGC-style ads (same-day delivery)
When cinematic Krea footage is overkill — avatar + b-roll is enough
Multiple angles/scripts for the same campaign (different hooks, A/B testing)

Trigger:

/heygen-ad-generator ClientName
/heygen-ad-generator ACME Agency --brief "GKV → PKV switching, employed Germans, 30s"
/heygen-ad-generator ACME Agency --script "ACME Agency/clients/ACME Agency/video-ads/campaign/script.json"

Critical Files

ACME Agency/scripts/heygen_ads_generate.mjs — pipeline script
ACME Agency/scripts/lib/heygen.mjs — HeyGen API wrapper (buildVideoAgentPrompt, generateVideoAgentAd)
ACME Agency/scripts/lib/google_drive.mjs — Drive upload
ACME Agency/scripts/lib/slack.mjs — Slack reporting
ACME Agency/clients/clients.json — client registry
ACME Agency/clients/<ClientName>/brand-dna.md — brand context
ACME Agency/clients/<ClientName>/video-ads/<campaign>/script.json — script

Scenes Script Format (DEFAULT for video — avatar-less)

Write this to ACME Agency/clients/<Client>/video-ads/<campaign-slug>/script.json:

{
  "client": "ACME Agency",
  "campaign": "<id>",
  "mode": "scenes",
  "language": "hr",
  "duration": 20,
  "vo": "Inspektor je na vratima. A evidencije? ... Pošaljite upit za besplatnu provjeru.",
  "scenes": [
    "[0-2s] HOOK: a restaurant glass door opens and a food-safety inspector with a clipboard steps in, tense cinematic lighting. Boxed overlay (top): \"Inspektor je stigao.\"",
    "[2-6s] frantic hands flipping through messy paper binders on a kitchen counter. Boxed overlay: \"Gdje su evidencije?\"",
    "[6-10s] calm close-up of a hand holding a smartphone with coral checkmarks. Boxed overlay: \"Sve na mobitelu.\"",
    "[10-15s] confident owner relaxed in a spotless modern kitchen. Boxed overlay: \"Spremni za inspekciju.\"",
    "[15-19s] solid navy end card. Centered boxed overlay: \"Pošaljite upit\""
  ]
}

vo = the exact voiceover words (target language). Word budget ≈ duration × 2.3 (see VO formula below). HeyGen often shortens to ~14s even if you ask 20s — keep the VO tight.
scenes = an array of scene-direction strings, each ending with Boxed overlay: "<short line in the target language>". Keep overlays 3–6 words.
Brand colors auto-resolve from ACME Agency/clients/<Client>/design-theme.json (navy=brand, coral=accent). Override per-campaign with "brandColors": { "primary": "#RRGGBB", "accent": "#RRGGBB" }.
Casting / ethnicity auto-localizes to the market. HeyGen's b-roll otherwise defaults to a globally-diverse mix, which reads wrong for a local ad (e.g. Black/Asian people in a Croatian B2B ad). buildScenesPrompt() now injects a CASTING directive from the language (hr/bs/sr/sl → white Southern/Central European locals; de → Central/Northern European; en → generic Western). Override per-campaign with a "casting" string in script.json when the audience isn't the default for that language. See memory <id>.
The proven reliability rules (boxed captions, hard no-English, centered/capped CTA) are baked into buildScenesPrompt() — you do NOT repeat them in the script.
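Before running, a scenes script can be sanity-checked with a few lines of Node. A minimal sketch, assuming the scene strings follow the example above ([start-end s] tag first, boxed overlay last) and that design-theme.json parses to { brand, accent }. The function names and casting strings are illustrative, not buildScenesPrompt()'s real internals, and falling back to the English casting for unlisted languages is an assumption. Only the 6-word upper bound is enforced, since the example's own end-card CTA is two words.

```javascript
// HYPOTHETICAL pre-run checks for a scenes-mode script.json.
// Scene strings as in the example: '[0-2s] ... Boxed overlay (top): "Inspektor je stigao."'
const SCENE_RE = /^\[(\d+)-(\d+)s\]/;
const OVERLAY_RE = /boxed overlay(?: \([^)]*\))?: "([^"]+)"\s*$/i;

// Language defaults paraphrasing the CASTING directive described above.
const CASTING_BY_LANG = {
  hr: "white Southern/Central European locals",
  bs: "white Southern/Central European locals",
  sr: "white Southern/Central European locals",
  sl: "white Southern/Central European locals",
  de: "Central/Northern European",
  en: "generic Western",
};

function resolveCasting(script) {
  // A per-campaign "casting" string wins; unlisted languages fall back to the
  // English default (an ASSUMPTION, not documented behavior).
  return script.casting ?? CASTING_BY_LANG[script.language] ?? CASTING_BY_LANG.en;
}

// theme: parsed design-theme.json, ASSUMED to look like { brand, accent }.
function resolveBrandColors(script, theme) {
  return { primary: theme.brand, accent: theme.accent, ...(script.brandColors ?? {}) };
}

function checkScenes(script, maxOverlayWords = 6) {
  const issues = [];
  let prevEnd = 0;
  script.scenes.forEach((scene, i) => {
    const timing = SCENE_RE.exec(scene);
    if (!timing) {
      issues.push(`scene ${i}: no [start-end s] timing tag`);
    } else {
      // Scenes should run back to back with no gaps or overlaps.
      if (Number(timing[1]) !== prevEnd) {
        issues.push(`scene ${i}: starts at ${timing[1]}s, previous scene ended at ${prevEnd}s`);
      }
      prevEnd = Number(timing[2]);
    }
    const overlay = OVERLAY_RE.exec(scene);
    if (!overlay) {
      issues.push(`scene ${i}: does not end with a boxed overlay line`);
    } else if (overlay[1].trim().split(/\s+/).length > maxOverlayWords) {
      issues.push(`scene ${i}: overlay longer than ${maxOverlayWords} words`);
    }
  });
  if (prevEnd > script.duration) {
    issues.push(`scenes run to ${prevEnd}s, past the ${script.duration}s duration`);
  }
  return issues;
}
```

An empty array from checkScenes() means the script matches the expected shape; it says nothing about whether HeyGen will honor every direction.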

Run it:

node ACME Agency/scripts/heygen_ads_generate.mjs "ACME Agency" \
  --script "ACME Agency/clients/ACME Agency/video-ads/<campaign>/script.json" \
  --mode scenes [--no-slack] [--drive]

Prints the per-video cost + remaining balance. Generate 3–4 variants (different hooks) by writing 3–4 script.json files and running each — each ≈ $0.50.
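Running the variants one after another can be scripted instead of typed. A sketch, assuming Node's child_process.spawnSync and only the CLI flags shown above; the helper names and variant paths are hypothetical. Passing argv as an array (no shell) also keeps the space in "ACME Agency" from splitting the path.

```javascript
// HYPOTHETICAL batch runner for scenes-mode variants.
// Builds the argv for one run, mirroring the CLI call shown above.
function buildSceneRunArgs(client, scriptPath, { slack = true, drive = false } = {}) {
  const args = ["ACME Agency/scripts/heygen_ads_generate.mjs", client,
    "--script", scriptPath, "--mode", "scenes"];
  if (!slack) args.push("--no-slack");
  if (drive) args.push("--drive");
  return args;
}

// Runs each variant sequentially and stops at the first non-zero exit.
function runVariants(client, scriptPaths, opts) {
  const { spawnSync } = require("node:child_process"); // CommonJS; use `import` in an .mjs file
  for (const scriptPath of scriptPaths) {
    const result = spawnSync("node", buildSceneRunArgs(client, scriptPath, opts), { stdio: "inherit" });
    if (result.status !== 0) {
      throw new Error(`variant failed (${scriptPath}), exit code ${result.status}`);
    }
  }
}
```

For example: runVariants("ACME Agency", ["ACME Agency/clients/ACME Agency/video-ads/hook-a/script.json", "ACME Agency/clients/ACME Agency/video-ads/hook-b/script.json"], { slack: false }). Stopping at the first failure is a design choice: a failed run may mean the balance ran out, and later runs would fail the same way.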

Script Format (Avatar Video Agent mode — only when a presenter is wanted)

Write this to ACME Agency/clients/<ClientName>/video-ads/<campaign-slug>/script.json:

{
  "client": "ACME Agency",
  "campaign": "<id>",
  "adType": "heygen",
  "mode": "agent",
  "language": "de",
  "duration": 30,
  "avatarGender": "male",
  "avatarPersona": "Northern European male, 40s, dark business suit, clean-shaven or light stubble, confident and authoritative",
  "voiceDescription": "German male, 40s, clear northern German accent, calm and authoritative",
  "brandColors": {
    "primary": "#RRGGBB",
    "accent": "#C9A84C",
    "text": "#FFFFFF"
  },
  "musicMood": "professional",
  "fullScript": "Seit Ihrem letzten Jobwechsel zahlen Sie den GKV-Höchstbeitrag...",
  "sceneGuidance": [
    { "timing": "0-8s", "mediaType": "stock footage", "visual": "German professional at laptop in bright office", "overlay": "Sie zahlen GKV-Maximum" },
    { "timing": "8-18s", "mediaType": "motion graphics", "visual": "Animated bar chart: GKV €900 vs PKV €600 monthly", "overlay": "Sparen Sie bis zu €300/Monat" },
    { "timing": "18-25s", "mediaType": "stock footage", "visual": "Person reviewing documents at home desk, relaxed", "overlay": "" },
    { "timing": "25-30s", "mediaType": "avatar closeup", "visual": "Avatar speaks CTA directly to camera", "overlay": "Jetzt kostenlosen Vergleich anfragen" }
  ]
}

Campaign slug: <keyword>-<audience/offer>-<YYYY-MM> e.g. <id>
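A small helper for building slugs in that shape. This is a hypothetical convenience, not part of the pipeline script: it lowercases, strips diacritics (č → c, ü → u), and turns everything else that is not a-z or 0-9 (including ß and arrows) into single hyphens.

```javascript
// HYPOTHETICAL slug helper for <keyword>-<audience/offer>-<YYYY-MM>.
function slugPart(text) {
  return text
    .normalize("NFD")                // split letters from their accents
    .replace(/[\u0300-\u036f]/g, "") // drop the combining accents
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")     // any other run of characters becomes one hyphen
    .replace(/^-+|-+$/g, "");        // trim leading/trailing hyphens
}

function campaignSlug(keyword, audienceOrOffer, date = new Date()) {
  const yyyy = date.getFullYear();
  const mm = String(date.getMonth() + 1).padStart(2, "0");
  return `${slugPart(keyword)}-${slugPart(audienceOrOffer)}-${yyyy}-${mm}`;
}
```

campaignSlug("PKV", "GKV → PKV switch", new Date(2026, 5, 15)) yields "pkv-gkv-pkv-switch-2026-06".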

VO Word Count Formula — CRITICAL

HeyGen speech rate ≈ 2.3–2.5 words/second
Formula: duration × 2.4 = target word count
Example: 30s = 72 words | 25s = 60 words | 20s = 48 words

Count words in fullScript BEFORE saving. Stay at or under the target — never go over.

Too short → HeyGen fills remaining time with b-roll (fine)
Too long → audio gets cut off or rushed at the end ← most common problem
Croatian and German tend to run slightly slower than English, so use 2.3 as the multiplier for safety: duration × 2.3 (e.g. 30s = 69 words)
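The budget check is easy to automate before saving. A minimal sketch with hypothetical helper names, assuming whitespace-separated tokens count as words (so "GKV-Höchstbeitrag" is one word); the 2.4 and 2.3 multipliers are the ones given above.

```javascript
// HYPOTHETICAL word-budget helpers: 2.4 words/s by default, 2.3 for Croatian and German.
const SLOW_LANGUAGES = new Set(["hr", "de"]);

function wordBudget(durationSec, language) {
  const rate = SLOW_LANGUAGES.has(language) ? 2.3 : 2.4;
  // Small epsilon guards against float products like 45.999999999999996.
  return Math.floor(durationSec * rate + 1e-9);
}

function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// At or under the budget is fine (b-roll fills the gap); over the budget is not.
function checkVoLength(text, durationSec, language) {
  const words = countWords(text);
  const budget = wordBudget(durationSec, language);
  return { words, budget, ok: words <= budget };
}
```

wordBudget(30, "en") gives 72, matching the example above; wordBudget(30, "de") gives 69.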

Avatar Persona Guide

| Market | Language | Default Persona |
|--------|----------|-----------------|
| Croatia / Bosnia | hr / bs | Southern European female, 35-45, warm and professional, business casual |
| Germany | de | Northern European male, 40-50, dark business suit, clean-shaven, authoritative |
| Austria | de | Central European male/female, 35-50, serious and trustworthy |
| English B2B | en | Professional Western, 35-50, confident expert, business formal |
| English B2C | en | Relatable, matches target demographic (age/style) |

Tips:

Describe ethnicity + age + clothing — this directly shapes HeyGen's avatar and b-roll choices
For trust-based offers (insurance, medical, finance) → male presenter + formal clothing
For lifestyle/wellness/consumer → female presenter + warm business casual
Avoid vague descriptions like "professional" alone — be specific: "Northern European male, dark suit, clean-shaven"

Music Moods

| Mood | Use when |
|------|----------|
| `professional` | Insurance, B2B, clinics, authority services |
| `energetic` | Fitness, youth, product reveals, urgency |
| `warm` | Lifestyle, family, wellness, emotional appeal |
| `emotional` | Testimonials, transformation stories |
| `neutral` | Luxury, real estate, minimalist brands |

Scene Guidance Tips

Write 3–5 scene guidance blocks. Each block tells HeyGen what to show and when.

mediaType options: stock footage / motion graphics / avatar closeup / AI generated
Use motion graphics for numbers, stats, comparisons (employer contribution, savings amounts)
Use stock footage for lifestyle/context shots (office, family, product use)
Use avatar closeup for CTA scene — avatar speaking directly to camera works best for CTAs
overlay = text that appears on screen. Keep short: max 5-7 words
Empty overlay means no text overlay for that scene
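These tips translate into a quick lint for avatar-mode scripts. A sketch under stated assumptions: timings use the "0-8s" form from the example, blocks must run back to back from 0s, and "max 5-7 words" is enforced at its upper bound of 7. The mediaType and musicMood sets are copied from this page; the function itself is hypothetical.

```javascript
// HYPOTHETICAL lint for an avatar-mode (agent) script.json.
const MEDIA_TYPES = new Set(["stock footage", "motion graphics", "avatar closeup", "AI generated"]);
const MUSIC_MOODS = new Set(["professional", "energetic", "warm", "emotional", "neutral"]);

function overlayWords(text) {
  return (text ?? "").trim().split(/\s+/).filter(Boolean).length;
}

function checkAgentScript(script, maxOverlayWords = 7) {
  const issues = [];
  const blocks = script.sceneGuidance ?? [];
  if (blocks.length < 3 || blocks.length > 5) {
    issues.push(`expected 3-5 sceneGuidance blocks, got ${blocks.length}`);
  }
  if (!MUSIC_MOODS.has(script.musicMood)) issues.push(`unknown musicMood "${script.musicMood}"`);
  let prevEnd = 0;
  blocks.forEach((block, i) => {
    if (!MEDIA_TYPES.has(block.mediaType)) issues.push(`block ${i}: unknown mediaType "${block.mediaType}"`);
    const m = /^(\d+)-(\d+)s$/.exec(block.timing ?? "");
    if (!m) {
      issues.push(`block ${i}: timing "${block.timing}" is not in the "0-8s" form`);
    } else {
      if (Number(m[1]) !== prevEnd) issues.push(`block ${i}: starts at ${m[1]}s, previous block ended at ${prevEnd}s`);
      prevEnd = Number(m[2]);
    }
    // An empty overlay is allowed: it means no text for that scene.
    if (overlayWords(block.overlay) > maxOverlayWords) issues.push(`block ${i}: overlay over ${maxOverlayWords} words`);
  });
  if (prevEnd > script.duration) issues.push(`blocks run to ${prevEnd}s, past duration ${script.duration}s`);
  return issues;
}
```

Run against the German example script above, this returns no issues.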

Preflight (run BEFORE any HeyGen API call)

HeyGen renders bill the API dollar balance and the Video Agent has a long polling timeout, so failing fast saves real money. Validate everything BEFORE submitting the job.

Client exists in clients.json. Resolve canonical key.
HEYGEN_API_KEY set in .env. If missing → abort.
orientation is portrait or landscape — NEVER 9:16 (HeyGen API silently rejects shortcut ratios). See CLAUDE.md ## AI Generation API Constraints.
Multi-scene mode: every scene has a valid speaker_id. Multi-scene text_overlay requires ALL fields: type, font_family, font_size, font_weight, color, line_height, position, text_align — missing fields = silent failure.
Polling timeout is ≥ 20 minutes in the script call. Less than 20 min = false negative on slow renders.
Language is supported by HeyGen's voice library for the selected voice. List with --list-voices <lang> if uncertain.
VO word count fits the duration per ## VO Word Count Formula in this SKILL.md. If it runs over budget, trim before submission; under budget is fine (b-roll fills the gap).
drive_folder_id reachable if --drive flag passed.
Slack channel resolves if reporting enabled.
HeyGen API dollar balance check — the Video Agent (scenes/agent) endpoint bills the API dollar balance (getRemainingQuota().usd, 60 units = $1), NOT plan_credit. The scenes path now hard-aborts when balance < $0.70/video (--ignore-balance to override). For a batch of N videos, ensure balance ≥ N × $0.70 first. A near-empty wallet otherwise fails mid-render as a generic "unknown error"; the 200 plan_credit counter is a red herring (untouched, never billed, no fallback). See memory <id>.

If all checks pass, log "preflight: OK (mode=<scenes|agent|multi-scene>, duration=<n>s, language=<x>)" and proceed.
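The checks that need no network call can be bundled into one function that returns the log line above. A minimal sketch: the function and parameter names are hypothetical, balanceUsd is assumed to be already converted from quota units, and the voice-library, Drive, and Slack checks are left out because they need live API calls.

```javascript
// HYPOTHETICAL offline preflight. Fields the multi-scene text_overlay must carry:
const TEXT_OVERLAY_FIELDS = ["type", "font_family", "font_size", "font_weight",
  "color", "line_height", "position", "text_align"];
const MIN_USD_PER_VIDEO = 0.7; // scenes path hard-aborts below this per video
const MIN_POLL_MINUTES = 20;

function preflight({ script, env, orientation, pollTimeoutMin, balanceUsd,
                     videoCount = 1, ignoreBalance = false }) {
  const issues = [];
  if (!env.HEYGEN_API_KEY) issues.push("HEYGEN_API_KEY is not set");
  // Only the words portrait/landscape; shortcut ratios like 9:16 are rejected.
  if (orientation !== "portrait" && orientation !== "landscape") {
    issues.push(`orientation must be portrait or landscape, got "${orientation}"`);
  }
  if (pollTimeoutMin < MIN_POLL_MINUTES) {
    issues.push(`polling timeout ${pollTimeoutMin} min is under ${MIN_POLL_MINUTES} min`);
  }
  if (script.mode === "multi-scene") {
    (script.scenes ?? []).forEach((scene, i) => {
      if (!scene.speaker_id) issues.push(`scene ${i}: missing speaker_id`);
      if (scene.text_overlay) {
        const missing = TEXT_OVERLAY_FIELDS.filter((f) => !(f in scene.text_overlay));
        if (missing.length > 0) issues.push(`scene ${i}: text_overlay missing ${missing.join(", ")}`);
      }
    });
  }
  // Video Agent modes (scenes/agent) bill the API dollar balance.
  const needed = videoCount * MIN_USD_PER_VIDEO;
  if (["scenes", "agent"].includes(script.mode) && !ignoreBalance && balanceUsd < needed) {
    issues.push(`balance $${balanceUsd.toFixed(2)} < $${needed.toFixed(2)} for ${videoCount} video(s)`);
  }
  const ok = issues.length === 0;
  const log = ok
    ? `preflight: OK (mode=${script.mode}, duration=${script.duration}s, language=${script.language})`
    : `preflight: FAILED\n- ${issues.join("\n- ")}`;
  return { ok, issues, log };
}
```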

Workflow

Step 0 — Client lookup

Read ACME Agency/clients/clients.json. Extract:

drive_folder_id — Drive upload target
slack_channel — Slack report destination
market — language context (Croatia, Germany, etc.)
heygen_voice_id — pinned voice if set (multi-scene mode only)
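A sketch of the lookup, assuming clients.json is an object keyed by canonical client name (the real file's shape may differ, and the function name is hypothetical). A case-insensitive match resolves the canonical key from whatever spelling was typed after the slash command.

```javascript
// HYPOTHETICAL lookup. ASSUMED clients.json shape:
// { "<canonical key>": { drive_folder_id, slack_channel, market, heygen_voice_id? } }
function resolveClient(registry, requested) {
  const wanted = requested.trim().toLowerCase();
  const key = Object.keys(registry).find((k) => k.toLowerCase() === wanted);
  if (!key) throw new Error(`client "${requested}" not found in clients.json`);
  // heygen_voice_id is optional (multi-scene mode only), so default it to null.
  const { drive_folder_id, slack_channel, market, heygen_voice_id = null } = registry[key];
  return { key, drive_folder_id, slack_channel, market, heygen_voice_id };
}
```

Load the registry with JSON.parse(fs.readFileSync("ACME Agency/clients/clients.json", "utf8")) and pass it in; a missing client aborts before any HeyGen call.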

Step 1 — Brand research

Check ACME Agency/clients/<ClientName>/brand-dna.md. Need at minimum:

Accent hex color, primary/background color
Brand tone (for music mood selection)
Language/market

If brand-dna.md doesn't exist: scrape website via Firecrawl, extract colors and tone, write brand-dna.md first.

Step 2 — Script

If --script provided: read, validate, show summary.

If --brief provided: use it directly to write the script.

If neither: ask 4 questions:

What's this campaign about? (offer, key message, hook angle)
Target audience? (age, job, situation)
Duration? (20s / 25s / 30s) — default: 30s
Tone? (authoritative / warm / energetic) — informs musicMood + avatarPersona

Write script.json to video-ads/<campaign>/script.json.

Count words in fullScript and verify against duration × 2.4 (× 2.3 for Croatian or German, per the VO formula).

Show confirmation before running:

Campaign:  <id> | Duration: 30s | Music: professional
Avatar:    Northern European male, 40s, dark suit | Language: German
Script:    69 words ✓ (30s × 2.3)
Scenes:    4 (stock footage → motion graphics → stock footage → avatar CTA)
Proceed?

Step 3 — Run the pipeline

node ACME Agency/scripts/heygen_ads_generate.mjs "ClientName" \
  --script "ACME Agency/clients/ClientName/video-ads/<campaign>/script.json" \
  --mode agent \
  [--avatar <avatar_id>] \
  [--no-slack] [--drive]

--avatar <avatar_id> is an optional pin for a specific avatar.

Drive upload is OFF by default. Video goes local + HeyGen link only. Add --drive only when the video is final and ready to archive to Klijenti/<ClientName>/.

Pipeline phases:

Phase A — buildVideoAgentPrompt(script) → structured prompt string
Phase B — generateVideoAgentAd() → POST /v1/video_agent/generate → polls ~10-15 min → downloads MP4
Phase C — Drive upload only if --drive flag passed
Phase D — Slack: campaign info + HeyGen edit link (+ Drive link if uploaded)

Render time: ~10-15 minutes per video. Use --no-slack for silent local testing.
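The ≥ 20 minute polling rule from the preflight is the key detail here. A generic sketch of such a loop, assuming a caller-supplied fetchStatus() that resolves to "completed", "failed", or anything else while still processing; it is not the body of generateVideoAgentAd(), the status strings are assumptions, and the 30s interval is an arbitrary choice. sleep and now are injectable so the loop can be tested without waiting.

```javascript
// HYPOTHETICAL poll-until-done loop; fetchStatus is supplied by the caller.
async function pollUntilDone(fetchStatus, {
  intervalMs = 30_000,
  timeoutMs = 20 * 60_000, // at least 20 min, per the preflight rule
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  now = () => Date.now(),
} = {}) {
  const deadline = now() + timeoutMs;
  while (now() < deadline) {
    const status = await fetchStatus();
    if (status === "completed") return status;
    if (status === "failed") throw new Error("render failed");
    await sleep(intervalMs); // still processing: wait, then ask again
  }
  throw new Error(`render still not completed after ${timeoutMs / 60_000} min`);
}
```

Throwing on timeout, rather than returning the last status, keeps a still-processing job from being reported as done.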

Output

ACME Agency/clients/<ClientName>/video-ads/<campaign>/
├── script.json
├── final-ad.mp4       ← finished 1080×1920 MP4, upload-ready
└── manifest.json

Drive (only with --drive): Klijenti/<ClientName>/Video Ads/<Year>/<campaign>/
Slack: video ready + HeyGen link (+ Drive folder URL and direct download link when --drive is used)

Verification (run AFTER pipeline completes — confirm the video shipped)

Check ALL of these before declaring done:

[ ] final-ad.mp4 exists locally AND file size > 500 KB (HeyGen sometimes returns broken stubs when the API balance runs out)
[ ] manifest.json records: videoId, mode, duration, language, aspectRatio, scene count
[ ] If --drive: Drive folder created at the expected path AND final-ad.mp4 uploaded with public read URL
[ ] HeyGen task status = completed (not failed, and not a still-processing status swallowed as success)
[ ] Slack post via slack-reporter returned ts non-null
[ ] If multi-scene: every scene's speaker_id rendered (no silent fallback to default avatar)
[ ] Reported language matches the requested language
[ ] If --drive was not passed but a Drive URL was somehow generated, that's fine; log it
[ ] If multiple videos were generated in one run (multi-hook campaign): every video has its own HeyGen URL in the report (NEVER local paths in the Slack summary)

If any check fails, name the gap explicitly. Never claim success when verification fails.
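Most of the checklist can be expressed as one function over values already read from disk. A sketch with a hypothetical function name, assuming the manifest stores the scene count as sceneCount and that 500 KB means 500 KiB; read the inputs with fs.statSync(...).size and JSON.parse(fs.readFileSync(...)). An empty result means verified; anything else lists the gaps to name.

```javascript
// HYPOTHETICAL post-run verification over already-loaded values.
const MIN_MP4_BYTES = 500 * 1024; // "> 500 KB", read here as 500 KiB
// "scene count" is stored under an ASSUMED sceneCount key.
const REQUIRED_MANIFEST_KEYS = ["videoId", "mode", "duration", "language", "aspectRatio", "sceneCount"];

function verifyRun({ mp4Bytes, manifest, taskStatus, requestedLanguage,
                     slackEnabled = true, slackTs = null }) {
  const gaps = [];
  if (!(mp4Bytes > MIN_MP4_BYTES)) {
    gaps.push(`final-ad.mp4 missing or too small (${mp4Bytes ?? 0} bytes); possible broken stub`);
  }
  for (const key of REQUIRED_MANIFEST_KEYS) {
    if (manifest?.[key] == null) gaps.push(`manifest.json does not record ${key}`);
  }
  if (taskStatus !== "completed") gaps.push(`HeyGen task status is "${taskStatus}", not "completed"`);
  if (manifest?.language != null && manifest.language !== requestedLanguage) {
    gaps.push(`language is ${manifest.language}, requested ${requestedLanguage}`);
  }
  if (slackEnabled && slackTs == null) gaps.push("Slack post returned no ts");
  return gaps; // empty array = verified
}
```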

Multi-Video Summary Report

When generating multiple videos (e.g. 3 hooks for one campaign), the final summary you post must use HeyGen URLs, never local file paths. Local paths are inaccessible to team members on Slack.

After all videos finish:

Read each campaign's manifest.json → get _assets.heygenUrl
Build the summary table with HeyGen links:

| # | Hook | HeyGen Link |
|---|------|-------------|
| 1 | Pain Point — "Dok vi ovo gledate..." | https://app.heygen.com/video/xxx |
| 2 | Social Proof — "687 novih upita..." | https://app.heygen.com/video/yyy |
| 3 | Us vs Them — "Platili ste agenciju..." | https://app.heygen.com/video/zzz |

If --drive was used, add the Drive folder link below the table.

Rule: Never include local file paths (like ACME Agency/clients/.../final-ad.mp4) in the report — they mean nothing to Slack users. Use heygenUrl from the manifest for every video.
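Building the table straight from the manifests makes the rule hard to break. A sketch with a hypothetical function name, assuming each video is passed as { hook, manifest }; it throws instead of falling back to a local path when _assets.heygenUrl is missing.

```javascript
// HYPOTHETICAL summary builder; reads _assets.heygenUrl from each manifest.
function buildSummaryTable(videos, driveFolderUrl = null) {
  const rows = videos.map(({ hook, manifest }, i) => {
    const url = manifest?._assets?.heygenUrl;
    // Refuse to fall back to a local path: Slack users cannot open it.
    if (typeof url !== "string" || !url.startsWith("https://")) {
      throw new Error(`video ${i + 1} (${hook}) has no HeyGen URL in manifest.json`);
    }
    const safeHook = hook.replace(/\|/g, "\\|"); // keep the markdown table intact
    return `| ${i + 1} | ${safeHook} | ${url} |`;
  });
  const lines = ["| # | Hook | HeyGen Link |", "|---|------|-------------|", ...rows];
  if (driveFolderUrl) lines.push("", `Drive folder: ${driveFolderUrl}`);
  return lines.join("\n");
}
```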

Tips

Test locally first: --no-slack — skips Slack, Drive is already off by default
Voice not consistent? This is Video Agent behavior — guide via voiceDescription. For guaranteed voice consistency, use --mode multi-scene with per-scene voiceId (see heygen_ads_generate.mjs)
Avatar looks wrong? Be more specific in avatarPersona — add ethnicity, exact age range, clothing color
Too much talking head? The pacing instructions in the prompt (cut every 3-4 seconds, ≤50% avatar screen time) help — but HeyGen has creative freedom. If still too static, try regenerating
Multiple variants: Generate 2-3 variants of the same script by running pipeline multiple times (each generation is slightly different) — good for A/B testing
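Cost: Video Agent bills the API dollar balance, not plan credits (see the cost model in the Overview). Budget roughly $2 per minute of output, so a 30s video ≈ $1. Monitor the balance at app.heygen.com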

Multi-Scene Mode (fallback)

Use when you need guaranteed voice consistency (pinned voice_id per scene). Produces stiffer lip sync but never switches voices.

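Script format: use a scenes[] array of per-scene objects instead of fullScript + sceneGuidance; each scene pins its own speaker_id and voiceId (see Preflight). This is not the scenes-mode format, whose scenes[] holds direction strings.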

node ACME Agency/scripts/heygen_ads_generate.mjs "ACME Agency" \
  --script "ACME Agency/clients/ACME Agency/video-ads/<campaign>/script.json" \
  --mode multi-scene

See ACME Agency/clients/ACME Agency/video-ads/<id>/script.json for an example.
