
[ CREATIVE ]

`/video-ad-generator`

This skill is the legacy video ad generator.

Placeholders like ACME Agency, <id> and you@example.com mark values that are per-agency — your install fills them with YOUR clients and accounts. If a section references a helper script you don't have yet, it ships with that workflow's install.

Skill: `/video-ad-generator`

⚠️ DEPRECATED — Use `/<id>` instead

This skill is the legacy video ad generator. The proven, production-tested flow now lives at /<id>, which uses:

Per-scene NB2 → Kling 3.0 image-to-video (action-only, no talking heads)
catbox.moe upload (Krea CDN URLs are unreliable for Kling)
ElevenLabs German VO as separate track
CapCut assembly for final polish

Use this legacy skill ONLY if: the user explicitly requests Veo 3.1 cinematic style (text-to-video, no startImage), or if /<id> doesn't fit the task.

For all other video ad work, use /<id>.

Overview (legacy)

Full pipeline for generating a production-ready video ad clip package for an ACME Agency client.

What this skill produces:

Video clips per scene (generated via Krea.ai — Kling 2.6 for UGC/animated, Veo 3.1 for product/cinematic)
Voiceover audio (ElevenLabs TTS, per-scene + full track)
capcut-guide.md — scene order, transitions, music, captions, export settings
All files uploaded to Drive, report posted to client Slack channel

What this skill does NOT do:

Auto-stitch clips (human edits in CapCut/DaVinci using the guide)
Record real video (everything is AI-generated)

Ad types supported:

ugc — AI influencer / person talking to camera (Kling 2.6 + native audio)
product — cinematic product shots, no person (Veo 3.1)
animated — animate a real client photo or logo (Kling 2.6 + startImage)
mixed — combination of the above in the same video
cinematic — all scenes via Veo 3.1, no talking heads, VO narrates over product/lifestyle/text

Style modes (--style flag):

Mode	Flag	Models	Use when
cinematic	`--style cinematic`	All Veo 3.1	Default. Medical, dental, service, product authority ads. No faces, no lip sync issues.
ugc	`--style ugc` (Phase 2)	Kling 2.6 + HeyGen	Testimonials, direct-response. Requires HeyGen integration (not yet built).

Default style: cinematic — use this unless the brief explicitly requires a talking head.

Why cinematic is the right default:

Kling 2.6 UGC has no audio-synced lip sync (mouth moves randomly regardless of audio)
Character identity doesn't persist across Kling jobs without startImageUrl, and passing startImageUrl makes those API calls fail
Veo 3.1 + ElevenLabs VO produces reliable, professional results with zero AI-look issues

Trigger:

/video-ad-generator [ClientName]
/video-ad-generator ACME Agency --brief "mini implants, seniors audience"
/video-ad-generator ACME Agency --style cinematic ← all scenes Veo 3.1 Fast (default)
/video-ad-generator ACME Agency --style cinematic --hq ← full Veo 3.1 (explicit request only)
/video-ad-generator ACME Agency --draft ← draft run; same Fast model, useful for a quick test

Critical Files

ACME Agency/scripts/video_ads_generate.mjs — pipeline script (Phases 3–5)
ACME Agency/scripts/lib/krea.mjs — Krea.ai API (image + video)
ACME Agency/scripts/lib/elevenlabs.mjs — ElevenLabs TTS
ACME Agency/scripts/lib/google_drive.mjs — Drive upload
ACME Agency/scripts/lib/slack.mjs — Slack reporting
ACME Agency/clients/clients.json — client registry
ACME Agency/clients/<ClientName>/brand-dna.md — brand context (auto-created if missing)
ACME Agency/clients/<ClientName>/video-ads/<campaign>/script.json — Claude-written scene script
ACME Agency/clients/<ClientName>/video-ads/<campaign>/clips/ — generated MP4s
ACME Agency/clients/<ClientName>/video-ads/<campaign>/voiceover/ — MP3 files
ACME Agency/clients/<ClientName>/video-ads/<campaign>/capcut-guide.md — editing guide

Workflow

Step 0 — Identify Client

Look up the client in ACME Agency/clients/clients.json using the name provided. Extract:

drive_folder_id — for Drive upload
slack_channel — for report
market — for language decisions
elevenlabs_voice_id — if set, use this voice (skip voice selection)

If client not found: tell the user and stop.
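A minimal sketch of the Step 0 lookup, written in JavaScript to match the pipeline scripts. `findClient` is a hypothetical helper. The shape of clients.json is an assumption: the helper accepts either an array of objects with a `name` field or an object keyed by client name. The field names are the ones listed above.

```javascript
// Hypothetical helper: look up a client in the parsed contents of clients.json.
// ASSUMPTION: the registry is either an array of objects with a "name" field,
// or an object keyed by client name. Adjust to your install's actual shape.
function findClient(registry, clientName) {
  const wanted = clientName.trim().toLowerCase();
  const entries = Array.isArray(registry)
    ? registry
    : Object.entries(registry).map(([name, data]) => ({ name, ...data }));
  const client = entries.find(
    (c) => typeof c.name === "string" && c.name.trim().toLowerCase() === wanted
  );
  if (!client) {
    // Step 0 rule: if the client is not found, report it and stop.
    throw new Error(`Client not found in clients.json: ${clientName}`);
  }
  return {
    driveFolderId: client.drive_folder_id,
    slackChannel: client.slack_channel,
    market: client.market,
    // When set, this voice is used and voice selection is skipped.
    voiceId: client.elevenlabs_voice_id ?? null,
  };
}
```

In Node, pass it the parsed registry, e.g. `findClient(JSON.parse(fs.readFileSync("ACME Agency/clients/clients.json", "utf8")), "ClientName")`.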

Step 1 — Brand Research (Phase 0)

Check for cached brand DNA first:

Read ACME Agency/clients/<ClientName>/brand-dna.md if it exists
Check whether it has both a Video Character and a Video Tone section
If both exist: use it as-is, skip to Step 2
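The reuse decision can be made programmatically with a hypothetical `hasVideoSections` helper. The sketch assumes both sections are level-2 markdown headings named exactly as in the brand-dna.md template in this step.

```javascript
// Hypothetical helper: true when brand-dna.md already has both video sections.
// ASSUMPTION: headings are written as "## Video Character" and "## Video Tone".
function hasVideoSections(brandDnaText) {
  const headings = new Set(
    brandDnaText
      .split(/\r?\n/)
      .map((line) => line.trim())
      .filter((line) => line.startsWith("## "))
      .map((line) => line.slice(3).trim().toLowerCase())
  );
  return headings.has("video character") && headings.has("video tone");
}
```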

If brand-dna.md is missing or lacks video sections:

Read ACME Agency/clients/<ClientName>/CLIENT.md for base brand context
Scrape the client's website using Firecrawl (ACME Agency/scripts/lib/firecrawl.mjs) — extract messaging, colors, visual style, target audience
Screenshot the logo using ACME Agency/scripts/lib/logo_prepare.mjs if reference_assets.logo is not in clients.json
Inspect ACME Agency/clients/<ClientName>/product-images/ for visual reference
Write or append to brand-dna.md — must include these video-specific sections:

## Video Character
[Describe the ideal on-camera subject — age, gender, ethnicity, wardrobe, style, energy.
Example: Croatian woman, 35-45, warm and approachable, wearing casual-professional clothing,
natural makeup, speaking directly to camera, relatable not polished. OR: no person needed —
product-only cinematic style.]

## Video Tone
[Pace: fast/punchy | slow/aspirational | conversational]
[Music style: upbeat pop | cinematic | lo-fi | none]
[Emotion target: trust | urgency | aspiration | warmth]
[Language: Croatian | English | German]

Save updated brand-dna.md locally
Upload to client's Drive folder

Step 2 — Script & Storyboard (Phase 1)

Brief collection:

If --brief was provided: use it
If not: ask the user 4 quick questions:

What's this campaign about? (product, offer, key message)
Target audience? (who are we speaking to)
Ad length target? (15s Reel / 30s Reel / 60s longer-form) — default: 30s
Ad type? (ugc / product / animated / mixed) — default: mixed

Script writing rules:

15s = 2–3 scenes (5–7s each)
30s = 3–5 scenes (5–10s each)
60s = 5–8 scenes (5–10s each)
Every ad MUST have a strong hook scene (first 3 seconds)
UGC scenes: character speaks directly, natural motion, one clear sentence per scene (say it aloud — if it takes >8s to speak, it's too long)
Product scenes: no dialogue, strong visual action (orbit, pour, closeup, reveal)
Animated scenes: use startImageUrl pointing to a real client photo URL

VO calibration — CRITICAL:

ElevenLabs speaks Croatian at ~2.3–2.5 words/second (faster than you expect).
Formula: voScript word count = total_video_seconds × 2.4
Example: 3 scenes × 5s + 2 Veo scenes × 6s = 27s total → 27 × 2.4 ≈ 65 words in voScript.
Per-scene dialogue: scene_duration × 2.4 = max words that character can say in that clip.
5s UGC scene → max 12 words. 6s scene → max 14 words.
Write the voScript FIRST, count the words, verify against the formula before proceeding.
A mismatch means the audio ends well before the video does — the #1 editing failure.
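You can check the calibration rule before spending any credits. The helper names in this sketch are hypothetical. The 2.4 words/second figure comes from this section. Counting words by whitespace and allowing a ±10% tolerance on the total are assumptions.

```javascript
// ElevenLabs Croatian pace from this section: ~2.3–2.5 words/second; the formula uses 2.4.
const WORDS_PER_SECOND = 2.4;

// Target word count for the whole voScript (rounded, matching 27s -> 65 words).
function targetWords(seconds) {
  return Math.round(seconds * WORDS_PER_SECOND);
}

// Max words for one scene's dialogue, rounded down so it never overruns.
// The tiny epsilon guards against float results such as 11.999999999999998.
function maxSceneWords(seconds) {
  return Math.floor(seconds * WORDS_PER_SECOND + 1e-9);
}

// Counts whitespace-separated tokens containing at least one letter or digit,
// so stray punctuation tokens are not counted as words.
function countWords(text) {
  return text.split(/\s+/).filter((w) => /[\p{L}\p{N}]/u.test(w)).length;
}

// ASSUMPTION: a ±10% tolerance on the total word count.
function checkVoCalibration(script, tolerance = 0.1) {
  const totalSeconds = script.scenes.reduce((sum, s) => sum + s.duration, 0);
  const target = targetWords(totalSeconds);
  const actual = countWords(script.voScript || "");
  const problems = [];
  if (Math.abs(actual - target) > target * tolerance) {
    problems.push(`voScript has ${actual} words; aim for ~${target} (${totalSeconds}s)`);
  }
  for (const scene of script.scenes) {
    const max = maxSceneWords(scene.duration);
    const words = countWords(scene.dialogue || "");
    if (words > max) problems.push(`scene ${scene.index}: ${words} words, max ${max} for ${scene.duration}s`);
  }
  return { totalSeconds, target, actual, problems };
}
```

An empty `problems` array means the narration length roughly matches the video. Any entry points to the script line to shorten or lengthen.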

No real person impersonation — REQUIRED:

NEVER write a scene prompt that tries to show a named real person's face talking.
"Dr. [Name] speaking" or using a reference photo of a real person to generate their face = fails visually + is ethically wrong for paid ads.
AI video cannot replicate a specific real person. Do not attempt it.

Instead, for authority/clinic scenes:
- Clinic interior shot (no face)
- White coat from behind, side, or hands-only
- Dental model close-up
- Text/graphic card (stat, CTA, brand)
- Generic invented doctor character (no name, no reference image)

The real doctor's voice CAN appear in the voiceover narration — just not their AI-generated face.

Two prompts per scene — nbPrompt (image) + prompt (animation):

Every scene MUST have both:

nbPrompt — describes the static start frame for Nano Banana 2 to generate. Think: what does the photograph look like? Composition, subjects, lighting, colors, style.
prompt — describes the animation/motion for Kling to apply to that image. Think: what HAPPENS? Camera movement, hand gestures, object interactions.

nbPrompt formula (image — what the frame looks like):

[Composition/angle] of [subject/object description] —
[environment/surface/background] —
[lighting style] — [brand colors if relevant] —
photorealistic, [aspect ratio]

prompt formula (animation — what happens in the clip):

[Camera movement] — [subject action + motion description] —
[dialogue in quotes if speaking] —
[ambient sound/audio direction]
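The two formulas can be filled in mechanically from their parts. `buildNbPrompt` and `buildAnimationPrompt` are hypothetical helpers, not pipeline functions. The separator mirrors the dash used in the formulas. Optional parts (brand colors, dialogue) are dropped when empty.

```javascript
// Separator used in the formulas above (em dash, written as an escape).
const SEP = " \u2014 ";

// Hypothetical: assemble an nbPrompt from the image formula's parts.
function buildNbPrompt({ composition, subject, environment, lighting, brandColors, aspectRatio }) {
  const parts = [
    `${composition} of ${subject}`,
    environment,
    lighting,
    brandColors, // optional: only when relevant
    `photorealistic, ${aspectRatio}`,
  ];
  return parts.filter(Boolean).join(SEP);
}

// Hypothetical: assemble an animation prompt from the motion formula's parts.
function buildAnimationPrompt({ camera, action, dialogue, audio }) {
  const parts = [
    camera,
    action,
    dialogue ? `says '${dialogue}'` : null, // only for speaking scenes
    audio,
  ];
  return parts.filter(Boolean).join(SEP);
}
```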

Example UGC scene:

{
  "nbPrompt": "Croatian woman in her 40s, warm smile, casual beige top, standing in bright Zagreb street, facing camera, warm natural light, photorealistic, 9:16 vertical",
  "prompt": "Handheld tracking shot — woman walks forward speaking directly to camera — says 'Jeste li umorni od proteza koje ispadaju?' — soft natural window light, warm color grade, no background music"
}

Example product/cinematic scene:

{
  "nbPrompt": "Polished titanium mini implant on clean white clinical surface, warm amber accent lighting, shallow depth of field, product photography, ACME Agency amber #D59C44 accent glow, photorealistic, 9:16 vertical",
  "prompt": "Cinematic slow 360° orbit around the implant — hero product reveal — slight camera drift, shallow depth of field — no speech, no music"
}

Write the script.json file to ACME Agency/clients/<ClientName>/video-ads/<campaign-slug>/script.json:

{
  "client": "<ClientName>",
  "campaign": "<campaign-slug>",
  "adType": "mixed",
  "targetDuration": 30,
  "language": "Croatian",
  "voScript": "<Full narration text in order — every line of dialogue from every scene>",
  "scenes": [
    {
      "index": 1,
      "type": "ugc",
      "duration": 5,
      "aspectRatio": "9:16",
      "prompt": "<full Krea video animation prompt — what Kling should do with the image>",
      "nbPrompt": "<Nano Banana image prompt — detailed description of the start frame image to generate>",
      "dialogue": "<spoken text for this scene, or empty string>",
      "startImageUrl": null,
      "notes": "<optional human note for the editor>"
    },
    {
      "index": 2,
      "type": "product",
      "duration": 7,
      "aspectRatio": "9:16",
      "prompt": "<full Krea video animation prompt>",
      "nbPrompt": "<Nano Banana image prompt for this scene's start frame>",
      "dialogue": "",
      "startImageUrl": null,
      "notes": ""
    }
  ]
}
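A pre-flight check can catch structural mistakes before the summary is shown. This `validateScript` sketch is hypothetical and only mirrors the rules written in this document: scene counts per length, both prompts, and startImageUrl for animated scenes. The real pipeline may validate differently. A missing nbPrompt is reported as a warning rather than an error, because the persona-mode hero clip omits it.

```javascript
// ASSUMPTION: scene-level types are ugc, product and animated ("mixed" and
// "cinematic" describe the whole ad); extend SCENE_TYPES if your pipeline accepts more.
const SCENE_TYPES = new Set(["ugc", "product", "animated"]);
// Allowed scene counts per target length, from the script writing rules.
const SCENE_COUNT = { 15: [2, 3], 30: [3, 5], 60: [5, 8] };

function validateScript(script) {
  const errors = [];
  const warnings = [];
  const scenes = Array.isArray(script.scenes) ? script.scenes : [];
  if (scenes.length === 0) errors.push("scenes must be a non-empty array");

  const range = SCENE_COUNT[script.targetDuration];
  if (range && (scenes.length < range[0] || scenes.length > range[1])) {
    errors.push(`${script.targetDuration}s ad needs ${range[0]}-${range[1]} scenes, found ${scenes.length}`);
  }

  scenes.forEach((scene, i) => {
    const label = `scene ${i + 1}`;
    if (scene.index !== i + 1) errors.push(`${label}: index should be ${i + 1}`);
    if (!SCENE_TYPES.has(scene.type)) errors.push(`${label}: unknown type "${scene.type}"`);
    if (!scene.prompt) errors.push(`${label}: missing prompt (animation)`);
    if (typeof scene.dialogue !== "string") errors.push(`${label}: dialogue must be a string ("" for none)`);
    if (scene.type === "animated" && !scene.startImageUrl) {
      errors.push(`${label}: animated scenes need startImageUrl (real client photo)`);
    }
    // Warning only: the persona-mode hero clip has no nbPrompt.
    if (!scene.nbPrompt) warnings.push(`${label}: no nbPrompt (text-to-video only)`);
  });

  const total = scenes.reduce((sum, s) => sum + (Number(s.duration) || 0), 0);
  if (total !== script.targetDuration) {
    warnings.push(`scene durations sum to ${total}s, target is ${script.targetDuration}s`);
  }
  return { ok: errors.length === 0, total, errors, warnings };
}
```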

nbPrompt vs prompt — IMPORTANT distinction:

nbPrompt = the image prompt. Nano Banana 2 generates a photorealistic start frame from this. Describe the static scene: who/what is in frame, composition, lighting, style. This is what the viewer sees at frame 0.
prompt = the animation prompt. Kling 2.6 takes the generated image and animates it. Describe the motion: camera movement, hand gestures, object interactions, transitions. This is what HAPPENS in the clip.
The pipeline generates the nbPrompt image first, then passes it to Kling as startImage for animation.
Always include nbPrompt for every scene. This is the default workflow — it produces dramatically better results than text-to-video alone.

Example:

{
  "nbPrompt": "Overhead view of a formal German salary document on oak desk, number 77.400 EUR clearly visible, warm office light, photorealistic, 9:16",
  "prompt": "Cinematic close-up — hands slowly unfold the salary document, camera racks focus to the number, slight camera push-in, warm golden light"
}

Campaign slug format: <keyword>-<audience>-<YYYY-MM> e.g. <id>
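Here is a possible slug builder. It assumes slugs should be lowercase ASCII; how š, đ, č, ž and ć are transliterated is an assumption, since the document does not specify it. `campaignSlug` is hypothetical.

```javascript
// Hypothetical: turn free text into a lowercase ASCII slug part.
function slugPart(text) {
  return text
    .replace(/đ/g, "d").replace(/Đ/g, "D") // đ has no Unicode decomposition
    .normalize("NFKD")                      // split letters from diacritics (č -> c + caron)
    .replace(/[\u0300-\u036f]/g, "")        // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")            // anything else becomes a hyphen
    .replace(/^-+|-+$/g, "");               // trim leading/trailing hyphens
}

// Hypothetical: <keyword>-<audience>-<YYYY-MM>, using the local date.
function campaignSlug(keyword, audience, date = new Date()) {
  const yyyy = date.getFullYear();
  const mm = String(date.getMonth() + 1).padStart(2, "0");
  return `${slugPart(keyword)}-${slugPart(audience)}-${yyyy}-${mm}`;
}
```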

After writing script.json, show the user a summary table before proceeding:

Scene | Type     | Duration | Hook / Dialogue
------|----------|----------|------------------
  1   | UGC      |   5s     | "Jeste li umorni..."
  2   | Product  |   7s     | [titanium implant orbit]
  3   | UGC      |   8s     | "Za samo 2 sata..."
  4   | Product  |   5s     | [before/after closeup]
  5   | UGC      |   5s     | "Zovite nas danas."
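To print that table from script.json instead of typing it, you could use a hypothetical `summaryTable` helper. Scenes without dialogue show their notes in brackets, or the animation prompt if there are no notes. That choice is an assumption about what is most useful to review.

```javascript
// Hypothetical: render the scene summary table from a parsed script.json.
function summaryTable(script, maxLen = 30) {
  const typeLabel = (t) => (t === "ugc" ? "UGC" : t.charAt(0).toUpperCase() + t.slice(1));
  const shorten = (s) => (s.length > maxLen ? s.slice(0, maxLen - 3) + "..." : s);
  const rows = script.scenes.map((s) => {
    const hook = s.dialogue
      ? `"${shorten(s.dialogue)}"`
      : `[${shorten(s.notes || s.prompt || "")}]`;
    const duration = `${s.duration}s`.padStart(4).padEnd(9);
    return `  ${String(s.index).padEnd(4)}| ${typeLabel(s.type).padEnd(9)}| ${duration}| ${hook}`;
  });
  return [
    "Scene | Type     | Duration | Hook / Dialogue",
    "------|----------|----------|------------------",
    ...rows,
  ].join("\n");
}
```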

Ask: "Proceed with video generation, or make changes to any scene?"

Step 2b — Reference Images (Now Automated)

Image generation is now handled automatically by the pipeline. When a scene has nbPrompt set (which should be every scene), the script generates the Nano Banana 2 image and passes it to Kling as startImage — no manual step needed.

You only need to manually set startImageUrl for:

Animated scenes using real client photos — look in ACME Agency/clients/<ClientName>/product-images/
Or use the logo reference from reference_assets.logo in clients.json
Upload to Drive and use the public URL

Step 3–5 — Video, Voice, Package (run the script)

Once script.json is written and reference images are set, run the pipeline:

node "ACME Agency/scripts/video_ads_generate.mjs" "<ClientName>" \
  --script "ACME Agency/clients/<ClientName>/video-ads/<campaign>/script.json" \
  [--voice <voice_id_or_name>] \
  [--resolution 1080p] \
  [--provider krea|kling] \
  [--draft] \
  [--persona-from-hero] \
  [--clone-voice] \
  [--no-slack] \
  [--no-drive] \
  [--no-voice]
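When you call the pipeline from another Node script, passing the arguments as an array avoids shell quoting problems, such as a path containing a space. This is a hypothetical wrapper. It only emits flags listed in this document and assumes the working directory contains ACME Agency/.

```javascript
// Hypothetical: build the argv for the documented CLI (script path first).
function buildPipelineArgs(clientName, scriptPath, opts = {}) {
  const args = ["ACME Agency/scripts/video_ads_generate.mjs", clientName, "--script", scriptPath];
  if (opts.voice) args.push("--voice", opts.voice);
  if (opts.resolution) args.push("--resolution", opts.resolution);
  if (opts.provider) args.push("--provider", opts.provider);
  // Boolean switches, emitted only when truthy.
  const switches = {
    draft: "--draft",
    personaFromHero: "--persona-from-hero",
    cloneVoice: "--clone-voice",
    noSlack: "--no-slack",
    noDrive: "--no-drive",
    noVoice: "--no-voice",
  };
  for (const [key, flag] of Object.entries(switches)) {
    if (opts[key]) args.push(flag);
  }
  return args;
}

// Runs the current Node binary with inherited stdio; no shell is involved.
async function runPipeline(clientName, scriptPath, opts) {
  const { spawnSync } = await import("node:child_process");
  const result = spawnSync(process.execPath, buildPipelineArgs(clientName, scriptPath, opts), { stdio: "inherit" });
  if (result.status !== 0) throw new Error(`pipeline exited with status ${result.status}`);
}
```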

Image-to-video (default): Every scene with nbPrompt automatically generates a Nano Banana 2 image first, then Kling animates it. No flag needed — this happens by default.

--persona-from-hero: After scene 1 renders, extracts a frame (ffmpeg) and uses it as a reference image for all subsequent Nano Banana prompts. Use for UGC ads where a consistent person must appear across scenes.

--clone-voice: Extracts audio from the hero clip (scene 1) and clones it via ElevenLabs Instant Voice Cloning. All scene voiceovers then use the cloned voice. Use when the hero clip has native audio with a character voice you want to match.

--provider krea|kling: Video generation provider (default: krea).

krea — uses Krea proxy API with Kling 2.6 / Veo 3.1. Default, proven in production.
kling — uses KlingAI direct API with Kling 3.0. Better motion quality, 3-15s durations, pro mode. Requires KLING_ACCESS_KEY + KLING_SECRET_KEY in .env.
Image generation (Nano Banana 2) always stays on Krea regardless of provider.
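A small pre-flight check for the provider choice could look like this. It encodes only what this document states: krea is the default, and kling needs KLING_ACCESS_KEY and KLING_SECRET_KEY. It assumes .env has already been loaded into process.env, for example by dotenv. Krea's own credential names are not documented here, so they are not checked.

```javascript
// Hypothetical: validate --provider and the Kling keys before a run.
// ASSUMPTION: .env is already loaded into process.env (e.g. by dotenv).
function checkProvider(provider = "krea", env = process.env) {
  if (provider !== "krea" && provider !== "kling") {
    throw new Error(`Unknown provider "${provider}" (expected krea or kling)`);
  }
  if (provider === "kling") {
    const missing = ["KLING_ACCESS_KEY", "KLING_SECRET_KEY"].filter((k) => !env[k]);
    if (missing.length) throw new Error(`--provider kling needs ${missing.join(", ")} in .env`);
  }
  return provider;
}
```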

Voice selection:

If client.elevenlabs_voice_id is set in clients.json → pass it as --voice
If not set: run node "ACME Agency/scripts/video_ads_generate.mjs" "ClientName" --list-voices first and inspect the output
For Croatian scripts: <id> model handles pronunciation IF the text has correct diacritics (š, đ, č, ž, ć). Always verify diacritics in the script before running.
Voice character matters for Croatian: prefer voices labeled "middle-aged", "warm", "conversational", or European-sounding names (Charlotte, Matilda, Freya, Liam). Avoid voices designed exclusively for American English.
DO NOT assume any voice will "just work" for Croatian — test with --no-drive --no-slack on a single scene first if unsure.
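You can sanity-check diacritics before running. This hypothetical heuristic cannot confirm correct spelling. It flags two problems:

- decomposed characters: a letter plus a combining mark, which looks identical on screen;
- longer texts with no Croatian diacritics at all.

The 15-word threshold is an arbitrary assumption.

```javascript
const CROATIAN_DIACRITICS = /[šđčžćŠĐČŽĆ]/;

// Hypothetical heuristic; returns a list of warnings (empty = nothing suspicious).
function checkCroatianDiacritics(text, minWords = 15) {
  const warnings = [];
  const normalized = text.normalize("NFC");
  if (text !== normalized) {
    // e.g. "c" + U+030C instead of the single character "č"
    warnings.push("text contains decomposed characters; normalize with text.normalize('NFC')");
  }
  const wordCount = normalized.split(/\s+/).filter(Boolean).length;
  if (wordCount >= minWords && !CROATIAN_DIACRITICS.test(normalized)) {
    warnings.push(`no Croatian diacritics in ${wordCount} words; check for missing š, đ, č, ž, ć`);
  }
  return warnings;
}
```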

Draft mode (--draft):

Uses google/veo-3.1-fast for all scenes instead of Kling 2.6 / Veo 3.1
Faster and cheaper — good for testing the script before a full run
Output quality is lower but sufficient for script review

Resolution:

Default: 1080p — always use for final delivery
720p for draft/test runs to save credits

Output Structure

ACME Agency/clients/<ClientName>/video-ads/<campaign>/
├── script.json                ← Claude-written scene script (input to the pipeline)
├── clips/
│   ├── scene-01-ugc.mp4
│   ├── scene-02-product.mp4
│   └── ...
├── voiceover/
│   ├── voiceover-full.mp3     ← full narration track
│   ├── scene-01-vo.mp3        ← per-scene splits
│   └── ...
├── reference-images/
│   ├── character-front.png
│   └── character-34.png
└── capcut-guide.md            ← editing guide for CapCut/DaVinci

ACME Agency/clients/<ClientName>/video-ads/
└── video_ads_manifest.json    ← generation history

Drive folder: <ClientDriveFolder>/Video Ads/<Year>/<Month>/<campaign>/

Tips

Test before full batch: add --draft --no-slack --no-drive for a fast test run on 1 scene (modify script.json to 1 scene temporarily)
Re-run single scene: edit script.json to only include that scene's entry, run again with --no-drive then merge manually
Voice not right? Run --list-voices and try a different voice. For Croatian, start with a multilingual voice and adjust stability (lower = more expressive, higher = more controlled)
Credit cost estimate: Kling 2.6 ≈ 20–40 units per 5s clip; Veo 3.1 ≈ 80–120 units per 5s clip; ElevenLabs ≈ 1 credit per 1000 characters
Character continuity: using the same startImageUrl for all UGC scenes is the strongest consistency mechanism. Generate the reference image first (Step 2b) and reuse it across all UGC scenes
Audio in UGC clips: generateAudio: true is passed for ugc/animated scenes — Kling generates ambient sound/lip-sync audio native to the clip. ElevenLabs VO goes on top. When editing, mute the clip audio track and use only the ElevenLabs VO track for clean narration
Richer brand DNA = better prompts: if results look off-brand, improve the Video Character and Video Tone sections and re-run
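For the two single-scene tips, writing a one-scene copy avoids editing and later restoring the original script.json. `singleSceneScript` and `writeSingleSceneScript` are hypothetical. Trimming targetDuration and voScript to the chosen scene is an assumption about what keeps a test run consistent.

```javascript
// Hypothetical: return a copy of the script containing only one scene.
function singleSceneScript(script, sceneIndex) {
  const scene = script.scenes.find((s) => s.index === sceneIndex);
  if (!scene) throw new Error(`No scene with index ${sceneIndex}`);
  return {
    ...script,
    // ASSUMPTION: trim duration and narration to this scene so the test VO matches the clip.
    targetDuration: scene.duration,
    voScript: scene.dialogue || "",
    scenes: [{ ...scene }],
  };
}

// Hypothetical: write script.scene-NN.json next to script.json and return its path.
async function writeSingleSceneScript(scriptPath, sceneIndex) {
  const { readFile, writeFile } = await import("node:fs/promises");
  const script = JSON.parse(await readFile(scriptPath, "utf8"));
  const outPath = scriptPath.replace(/\.json$/, `.scene-${String(sceneIndex).padStart(2, "0")}.json`);
  if (outPath === scriptPath) throw new Error("expected a .json script path"); // never overwrite the original
  await writeFile(outPath, JSON.stringify(singleSceneScript(script, sceneIndex), null, 2) + "\n");
  return outPath;
}
```

Pass the returned path to --script together with --draft --no-slack --no-drive.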

Enhanced UGC Mode — Persona-Driven Pipeline

An advanced workflow for creating realistic UGC-style ads with consistent character identity across all scenes. Inspired by the "hero clip → persona reference → scene-by-scene animation" approach used by top e-commerce brands.

When to use: When the brief requires a consistent AI person across multiple scenes (testimonials, product reviews, lifestyle content) and you want full creative control over every scene (vs. HeyGen's black-box approach).

How It Works

Scene 1 = Hero Clip — generated via Kling 2.6 with generateAudio: true. This is the opening "talking to camera" hook (5–15s).
Persona Extraction (--persona-from-hero) — ffmpeg extracts a clear frame from the hero clip at 2s. This PNG becomes the character reference for all subsequent Nano Banana image prompts.
Scene-by-Scene Images — for scenes 2+, if nbPrompt is set in script.json, Nano Banana 2 generates each scene's start frame using the persona PNG as imageUrls[0]. This locks the character's face/appearance.
Scene-by-Scene Animation — Kling 2.6 animates each Nano Banana image via startImage, producing 5s clips.
Voice Cloning (--clone-voice) — ffmpeg extracts the audio from the hero clip. ElevenLabs clones this voice (Instant Voice Cloning). All subsequent scene voiceovers use the cloned voice for consistency.
CapCut Assembly — you get separate clips + per-scene VO files + a CapCut guide.
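The two ffmpeg steps can be sketched as follows. The pipeline's real commands are not shown in this document, so the output file names and MP3 settings are assumptions. The options themselves (-ss, -frames:v, -vn, -c:a) are standard ffmpeg. The sketch does not check that the hero clip has the 10 seconds of clear speech that voice cloning needs.

```javascript
// Hypothetical: grab one frame at `atSeconds` as the persona reference PNG.
function personaFrameArgs(heroClip, outPng, atSeconds = 2) {
  // -ss before -i seeks in the input; -frames:v 1 writes a single frame.
  return ["-y", "-ss", String(atSeconds), "-i", heroClip, "-frames:v", "1", outPng];
}

// Hypothetical: extract the hero clip's audio track for voice cloning.
function heroAudioArgs(heroClip, outMp3) {
  // -vn drops video; libmp3lame with -q:a 2 encodes variable-bitrate MP3.
  return ["-y", "-i", heroClip, "-vn", "-c:a", "libmp3lame", "-q:a", "2", outMp3];
}

// Runs both steps; requires ffmpeg on PATH.
async function extractPersonaAssets(heroClip, outDir) {
  const { spawnSync } = await import("node:child_process");
  const { join } = await import("node:path");
  const frame = join(outDir, "persona.png");
  const audio = join(outDir, "hero-audio.mp3");
  for (const args of [personaFrameArgs(heroClip, frame), heroAudioArgs(heroClip, audio)]) {
    const r = spawnSync("ffmpeg", args, { stdio: "inherit" });
    if (r.status !== 0) throw new Error(`ffmpeg failed: ffmpeg ${args.join(" ")}`);
  }
  return { frame, audio };
}
```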

Script Format for Persona Mode

Each scene in script.json can now include an nbPrompt field — the Nano Banana image prompt that will be generated with persona reference before video generation:

{
  "scenes": [
    {
      "index": 1,
      "type": "ugc",
      "duration": 10,
      "prompt": "Handheld selfie — Croatian woman in 30s, just finished running, whips out phone, speaks to camera...",
      "dialogue": "Upravo sam trčala 5 kilometara i nisam umorna...",
      "startImageUrl": null,
      "notes": "Hero clip — persona source"
    },
    {
      "index": 2,
      "type": "ugc",
      "duration": 5,
      "prompt": "Animate the person in the image — she picks up the product bottle and looks at it with curiosity...",
      "nbPrompt": "The person in the reference image, same face and hair, standing in a modern kitchen, holding a green supplement bottle, warm morning light through window, photorealistic",
      "dialogue": "Ovaj napitak pijem svaki dan već tjedan dana...",
      "startImageUrl": null,
      "notes": "Scene image auto-generated from hero persona"
    }
  ]
}

CLI Flags

# --persona-from-hero: extract frame from scene 1 → ref for all NB images
# --clone-voice: clone voice from scene 1 audio → use for all VO
node "ACME Agency/scripts/video_ads_generate.mjs" "ClientName" \
  --script path/to/script.json \
  --persona-from-hero \
  --clone-voice \
  [--no-drive] [--no-slack]

Key Constraints

Kling 2.6 has no lip sync — mouth movements are random. This is fine because ElevenLabs VO is overlaid and the b-roll scenes are action-focused, not direct-to-camera talking (except the hero clip).
Character drift — Nano Banana + persona reference gets you ~80-90% consistency. Slight variations in hair, clothing, or skin tone may occur. The hero clip sets the tone; b-roll scenes are more forgiving.
Voice cloning needs ~10s of speech — the hero clip should have at least 10s of clear spoken audio for good clone quality.
Kling 3.0 not yet on Krea: the default krea provider uses Kling 2.6 via the Krea API. Kling 3.0 (better motion, multi-shot) is available only through the direct KlingAI API (--provider kling). Upgrade path for the krea provider: add kling/kling-3.0 to VIDEO_MODELS in krea.mjs when Krea supports it.

Comparison: HeyGen vs Persona Pipeline

	HeyGen (`/heygen-ad-generator`)	Persona Pipeline (`/video-ad-generator --persona-from-hero`)
Control	Low — HeyGen chooses b-roll, pacing, transitions	Full — you define every scene, image, and cut
Output	Single finished MP4	Separate clips + VO → CapCut assembly
Lip sync	Excellent	None (VO overlay)
Character consistency	Handled by HeyGen	Persona reference + Nano Banana
Editing time	0 min	10-15 min in CapCut
Best for	Quick talking-head presenter ads	Cinematic UGC hybrids, lifestyle, testimonials
