/video-ad-generator

Placeholders like ACME Agency, <id> and you@example.com mark values that are per-agency — your install fills them with YOUR clients and accounts. If a section references a helper script you don't have yet, it ships with that workflow's install.

Skill: /video-ad-generator

⚠️ DEPRECATED — Use /<id> instead

This skill is the legacy video ad generator. The proven, production-tested flow now lives at /<id> which uses:

Use this legacy skill ONLY if: the user explicitly requests Veo 3.1 cinematic style (text-to-video, no startImage), or if /<id> doesn't fit the task.

For all other video ad work, use /<id>.


Overview (legacy)

Full pipeline for generating a production-ready video ad clip package for an ACME Agency client.

What this skill produces:

What this skill does NOT do:

Ad types supported:

Style modes (--style flag):

Mode      | Flag                  | Models             | Use when
----------|-----------------------|--------------------|------------------
cinematic | --style cinematic     | All Veo 3.1        | Default. Medical, dental, service, product authority ads. No faces, no lip sync issues.
ugc       | --style ugc (Phase 2) | Kling 2.6 + HeyGen | Testimonials, direct-response. Requires HeyGen integration (not yet built).

Default style: cinematic — use this unless the brief explicitly requires a talking head.

Why cinematic is the right default:

Trigger:


Critical Files


Workflow

Step 0 — Identify Client

Look up the client in ACME Agency/clients/clients.json using the name provided. Extract:

If client not found: tell the user and stop.
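
The lookup in Step 0 can be sketched as below. The shape of clients.json is not shown in this document, so the sketch assumes entries are objects with a `name` field, stored either as an array or as an object keyed by client id. `findClient` is a hypothetical helper, not part of the shipped scripts.

```javascript
// Hypothetical helper: the real clients.json schema may differ.
// Accepts the parsed clients.json content and the name the user provided.
function findClient(clients, query) {
  const wanted = query.trim().toLowerCase();
  // Assumption: either an array of client objects or an object keyed by id.
  const list = Array.isArray(clients) ? clients : Object.values(clients);
  const match = list.find(
    (c) => typeof c.name === "string" && c.name.trim().toLowerCase() === wanted
  );
  // null signals "client not found": tell the user and stop.
  return match ?? null;
}
```

A `null` result maps to the "tell the user and stop" rule above; matching is case-insensitive here as a convenience, which is also an assumption.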


Step 1 — Brand Research (Phase 0)

Check for cached brand DNA first:

If brand-dna.md is missing or lacks video sections:

  1. Read ACME Agency/clients/<ClientName>/CLIENT.md for base brand context
  2. Scrape the client's website using Firecrawl (ACME Agency/scripts/lib/firecrawl.mjs) — extract messaging, colors, visual style, target audience
  3. Screenshot the logo using ACME Agency/scripts/lib/logo_prepare.mjs if reference_assets.logo is not in clients.json
  4. Inspect ACME Agency/clients/<ClientName>/product-images/ for visual reference
  5. Write or append to brand-dna.md — must include these video-specific sections:
## Video Character
[Describe the ideal on-camera subject — age, gender, ethnicity, wardrobe, style, energy.
Example: Croatian woman, 35-45, warm and approachable, wearing casual-professional clothing,
natural makeup, speaking directly to camera, relatable not polished. OR: no person needed —
product-only cinematic style.]

## Video Tone
[Pace: fast/punchy | slow/aspirational | conversational]
[Music style: upbeat pop | cinematic | lo-fi | none]
[Emotion target: trust | urgency | aspiration | warmth]
[Language: Croatian | English | German]
  6. Save updated brand-dna.md locally
  7. Upload to client's Drive folder

Step 2 — Script & Storyboard (Phase 1)

Brief collection:

  1. What's this campaign about? (product, offer, key message)
  2. Target audience? (who are we speaking to)
  3. Ad length target? (15s Reel / 30s Reel / 60s longer-form) — default: 30s
  4. Ad type? (ugc / product / animated / mixed) — default: mixed

Script writing rules:

VO calibration — CRITICAL:

ElevenLabs speaks Croatian at ~2.3–2.5 words/second (faster than you expect).
Formula: voScript word count = total_video_seconds × 2.4
Example: 3 scenes × 5s + 2 Veo scenes × 6s = 27s total → write 65 words in voScript.
Per-scene dialogue: scene_duration × 2.4 = max words that character can say in that clip.
5s UGC scene → max 12 words. 6s scene → max 14 words.
Write the voScript FIRST, count the words, verify against the formula before proceeding.
A mismatch means the audio ends well before the video does — the #1 editing failure.
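
A minimal sketch of the word-count check described above, using the 2.4 words/second rate from the formula. The function names and the `tolerance` parameter are illustrative assumptions, not part of the pipeline script.

```javascript
// Rate from the calibration note above (ElevenLabs Croatian, ~2.3–2.5 words/second).
const WORDS_PER_SECOND = 2.4;

// Word budget for a clip or a whole video, rounded to the nearest whole word.
function wordBudget(seconds, rate = WORDS_PER_SECOND) {
  return Math.round(seconds * rate);
}

// Count whitespace-separated words in a script string.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Compare a voScript against the budget for the summed scene durations.
// `tolerance` (in words) is an assumed slack value, not a documented setting.
function checkVoScript(voScript, sceneDurations, tolerance = 3) {
  const totalSeconds = sceneDurations.reduce((sum, d) => sum + d, 0);
  const target = wordBudget(totalSeconds);
  const actual = countWords(voScript);
  return { totalSeconds, target, actual, ok: Math.abs(actual - target) <= tolerance };
}
```

With these definitions, `wordBudget(5)` gives 12, `wordBudget(6)` gives 14, and `wordBudget(27)` gives 65, matching the figures in the note.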

No real person impersonation — REQUIRED:

NEVER write a scene prompt that tries to show a named real person's face talking.
"Dr. [Name] speaking" or using a reference photo of a real person to generate their face = fails visually + is ethically wrong for paid ads.
AI video cannot replicate a specific real person. Do not attempt it.

Instead, for authority/clinic scenes:
- Clinic interior shot (no face)
- White coat from behind, side, or hands-only
- Dental model close-up
- Text/graphic card (stat, CTA, brand)
- Generic invented doctor character (no name, no reference image)

The real doctor's voice CAN appear in the voiceover narration — just not their AI-generated face.

Two prompts per scene — nbPrompt (image) + prompt (animation):

Every scene MUST have both:

nbPrompt formula (image — what the frame looks like):

[Composition/angle] of [subject/object description] —
[environment/surface/background] —
[lighting style] — [brand colors if relevant] —
photorealistic, [aspect ratio]

prompt formula (animation — what happens in the clip):

[Camera movement] — [subject action + motion description] —
[dialogue in quotes if speaking] —
[ambient sound/audio direction]

Example UGC scene:

{
  "nbPrompt": "Croatian woman in her 40s, warm smile, casual beige top, standing in bright Zagreb street, facing camera, warm natural light, photorealistic, 9:16 vertical",
  "prompt": "Handheld tracking shot — woman walks forward speaking directly to camera — says 'Jeste li umorni od proteza koje ispadaju?' — soft natural window light, warm color grade, no background music"
}

Example product/cinematic scene:

{
  "nbPrompt": "Polished titanium mini implant on clean white clinical surface, warm amber accent lighting, shallow depth of field, product photography, ACME Agency amber #D59C44 accent glow, photorealistic, 9:16 vertical",
  "prompt": "Cinematic slow 360° orbit around the implant — hero product reveal — slight camera drift, shallow depth of field — no speech, no music"
}

Write the script.json file to ACME Agency/clients/<ClientName>/video-ads/<campaign-slug>/script.json:

{
  "client": "<ClientName>",
  "campaign": "<campaign-slug>",
  "adType": "mixed",
  "targetDuration": 30,
  "language": "Croatian",
  "voScript": "<Full narration text in order — every line of dialogue from every scene>",
  "scenes": [
    {
      "index": 1,
      "type": "ugc",
      "duration": 5,
      "aspectRatio": "9:16",
      "prompt": "<full Krea video animation prompt — what Kling should do with the image>",
      "nbPrompt": "<Nano Banana image prompt — detailed description of the start frame image to generate>",
      "dialogue": "<spoken text for this scene, or empty string>",
      "startImageUrl": null,
      "notes": "<optional human note for the editor>"
    },
    {
      "index": 2,
      "type": "product",
      "duration": 7,
      "aspectRatio": "9:16",
      "prompt": "<full Krea video animation prompt>",
      "nbPrompt": "<Nano Banana image prompt for this scene's start frame>",
      "dialogue": "",
      "startImageUrl": null,
      "notes": ""
    }
  ]
}
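
Before running the pipeline, the "both prompts per scene" rule can be checked mechanically. The sketch below is a hypothetical pre-flight check on the parsed script.json, not something the pipeline script is known to do. Treating a duration mismatch against `targetDuration` as a problem is an assumption, and the persona-mode hero scene shown later in this document omits nbPrompt, so a real check may need an exception for it.

```javascript
// Hypothetical pre-flight check: returns a list of human-readable problems.
function validateScript(script) {
  const problems = [];
  const scenes = Array.isArray(script.scenes) ? script.scenes : [];
  if (scenes.length === 0) problems.push("script has no scenes");

  scenes.forEach((scene, i) => {
    const label = `scene ${scene.index ?? i + 1}`;
    // Every scene needs both the image prompt and the animation prompt.
    for (const field of ["nbPrompt", "prompt"]) {
      if (typeof scene[field] !== "string" || scene[field].trim() === "") {
        problems.push(`${label}: missing ${field}`);
      }
    }
    if (!(typeof scene.duration === "number" && scene.duration > 0)) {
      problems.push(`${label}: duration must be a positive number`);
    }
  });

  // Assumption: summed scene durations should equal the declared targetDuration.
  const total = scenes.reduce((sum, s) => sum + (Number(s.duration) || 0), 0);
  if (typeof script.targetDuration === "number" && total !== script.targetDuration) {
    problems.push(`scene durations sum to ${total}s, targetDuration is ${script.targetDuration}s`);
  }
  return problems;
}
```

An empty array means the script passed; anything else is worth fixing before spending generation credits.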

nbPrompt vs prompt — IMPORTANT distinction:

Example:

{
  "nbPrompt": "Overhead view of a formal German salary document on oak desk, number 77.400 EUR clearly visible, warm office light, photorealistic, 9:16",
  "prompt": "Cinematic close-up — hands slowly unfold the salary document, camera racks focus to the number, slight camera push-in, warm golden light"
}

Campaign slug format: <keyword>-<audience>-<YYYY-MM> e.g. <id>
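
One way to build a slug in this format is sketched below. Only the `<keyword>-<audience>-<YYYY-MM>` layout comes from this document; the normalization rules (lowercasing, stripping diacritics, hyphenating everything else) and the helper names are assumptions.

```javascript
// Lowercase, strip combining diacritics, and collapse non-alphanumerics into hyphens.
// Note: letters with no Unicode decomposition (e.g. "đ") end up as hyphens here.
function slugPart(text) {
  return text
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Build <keyword>-<audience>-<YYYY-MM>, using the UTC year and month of `date`.
function campaignSlug(keyword, audience, date = new Date()) {
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  return `${slugPart(keyword)}-${slugPart(audience)}-${year}-${month}`;
}
```

For example, `campaignSlug("Mini Implants", "Seniors 60+", new Date(Date.UTC(2025, 2, 15)))` returns `mini-implants-seniors-60-2025-03`.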

After writing script.json, show the user a summary table before proceeding:

Scene | Type     | Duration | Hook / Dialogue
------|----------|----------|------------------
  1   | UGC      |   5s     | "Jeste li umorni..."
  2   | Product  |   7s     | [titanium implant orbit]
  3   | UGC      |   8s     | "Za samo 2 sata..."
  4   | Product  |   5s     | [before/after closeup]
  5   | UGC      |   5s     | "Zovite nas danas."

Ask: "Proceed with video generation, or make changes to any scene?"
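
The summary table can be rendered from the scenes array rather than typed by hand. This is an illustrative sketch: the column widths mirror the table above, while the truncation lengths and the bracketed fallback to `notes` for silent scenes are assumptions.

```javascript
// Shorten long text so the table stays readable.
function truncate(text, max) {
  return text.length <= max ? text : text.slice(0, max - 3) + "...";
}

// Render the scene summary in the same pipe layout as the table above.
function summaryTable(scenes) {
  const header = [
    "Scene | Type     | Duration | Hook / Dialogue",
    "------|----------|----------|------------------",
  ];
  const rows = scenes.map((s) => {
    const type = s.type === "ugc" ? "UGC" : s.type.charAt(0).toUpperCase() + s.type.slice(1);
    // Quoted dialogue for speaking scenes, bracketed note otherwise.
    const hook = s.dialogue
      ? `"${truncate(s.dialogue, 20)}"`
      : `[${truncate(s.notes || "no dialogue", 30)}]`;
    return [
      String(s.index).padStart(3).padEnd(5),
      type.padEnd(8),
      `${s.duration}s`.padStart(4).padEnd(8),
      hook,
    ].join(" | ");
  });
  return [...header, ...rows].join("\n");
}
```

Passing `script.scenes` straight from the parsed script.json keeps the table and the file from drifting apart.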


Step 2b — Reference Images (Now Automated)

Image generation is now handled automatically by the pipeline. When a scene has nbPrompt set (which should be every scene), the script generates the Nano Banana 2 image and passes it to Kling as startImage — no manual step needed.

You only need to manually set startImageUrl for:


Step 3–5 — Video, Voice, Package (run the script)

Once script.json is written and reference images are set, run the pipeline:

node "ACME Agency/scripts/video_ads_generate.mjs" "<ClientName>" \
  --script "ACME Agency/clients/<ClientName>/video-ads/<campaign>/script.json" \
  [--voice <voice_id_or_name>] \
  [--resolution 1080p] \
  [--draft] \
  [--persona-from-hero] \
  [--clone-voice] \
  [--no-slack] \
  [--no-drive] \
  [--no-voice]

Image-to-video (default): Every scene with nbPrompt automatically generates a Nano Banana 2 image first, then Kling animates it. No flag needed — this happens by default.

--persona-from-hero: After scene 1 renders, extracts a frame (ffmpeg) and uses it as a reference image for all subsequent Nano Banana prompts. Use for UGC ads where a consistent person must appear across scenes.

--clone-voice: Extracts audio from the hero clip (scene 1) and clones it via ElevenLabs Instant Voice Cloning. All scene voiceovers then use the cloned voice. Use when the hero clip has native audio with a character voice you want to match.

--provider krea|kling: Video generation provider (default: krea).

Voice selection:

Draft mode (--draft):

Resolution:


Output Structure

ACME Agency/clients/<ClientName>/video-ads/<campaign>/
├── script.json                ← Claude-written scene script (input to the script)
├── clips/
│   ├── scene-01-ugc.mp4
│   ├── scene-02-product.mp4
│   └── ...
├── voiceover/
│   ├── voiceover-full.mp3     ← full narration track
│   ├── scene-01-vo.mp3        ← per-scene splits
│   └── ...
├── reference-images/
│   ├── character-front.png
│   └── character-34.png
└── capcut-guide.md            ← editing guide for CapCut/DaVinci

ACME Agency/clients/<ClientName>/video-ads/
└── video_ads_manifest.json    ← generation history

Drive folder: <ClientDriveFolder>/Video Ads/<Year>/<Month>/<campaign>/
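
The naming pattern in the tree above can be expressed as a small path builder. This sketch only reproduces the file names shown in this section; `campaignPaths` is a hypothetical helper and the pipeline script may build its paths differently.

```javascript
// Zero-pad a scene index to two digits, as in scene-01-ugc.mp4.
function pad2(n) {
  return String(n).padStart(2, "0");
}

// Derive the expected output paths for one campaign from its scenes.
function campaignPaths(clientName, campaign, scenes) {
  const root = `ACME Agency/clients/${clientName}/video-ads/${campaign}`;
  return {
    script: `${root}/script.json`,
    clips: scenes.map((s) => `${root}/clips/scene-${pad2(s.index)}-${s.type}.mp4`),
    voiceoverFull: `${root}/voiceover/voiceover-full.mp3`,
    voiceoverScenes: scenes.map((s) => `${root}/voiceover/scene-${pad2(s.index)}-vo.mp3`),
    guide: `${root}/capcut-guide.md`,
  };
}
```

A helper like this is handy for checking after a run that every expected clip and voiceover file actually exists.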


Tips


Enhanced UGC Mode — Persona-Driven Pipeline

An advanced workflow for creating realistic UGC-style ads with consistent character identity across all scenes. Inspired by the "hero clip → persona reference → scene-by-scene animation" approach used by top e-commerce brands.

When to use: When the brief requires a consistent AI person across multiple scenes (testimonials, product reviews, lifestyle content) and you want full creative control over every scene (vs. HeyGen's black-box approach).

How It Works

  1. Scene 1 = Hero Clip — generated via Kling 2.6 with generateAudio: true. This is the opening "talking to camera" hook (5–15s).
  2. Persona Extraction (--persona-from-hero) — ffmpeg extracts a clear frame from the hero clip at 2s. This PNG becomes the character reference for all subsequent Nano Banana image prompts.
  3. Scene-by-Scene Images — for scenes 2+, if nbPrompt is set in script.json, Nano Banana 2 generates each scene's start frame using the persona PNG as imageUrls[0]. This locks the character's face/appearance.
  4. Scene-by-Scene Animation — Kling 2.6 animates each Nano Banana image via startImage, producing 5s clips.
  5. Voice Cloning (--clone-voice) — ffmpeg extracts the audio from the hero clip. ElevenLabs clones this voice (Instant Voice Cloning). All subsequent scene voiceovers use the cloned voice for consistency.
  6. CapCut Assembly — you get separate clips + per-scene VO files + a CapCut guide.
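
The two ffmpeg steps in the list above (frame extraction at 2s and audio extraction) might look like the following argument lists. This is a sketch of plausible invocations using standard ffmpeg options, not the exact commands the pipeline script runs; the function names and file paths are illustrative.

```javascript
// Arguments for grabbing a single frame from the hero clip at a given timestamp.
// -ss seeks to the timestamp, -frames:v 1 writes exactly one video frame.
function frameExtractArgs(heroClip, outputPng, atSeconds = 2) {
  return ["-y", "-ss", String(atSeconds), "-i", heroClip, "-frames:v", "1", outputPng];
}

// Arguments for pulling the audio track out of the hero clip.
// -vn drops the video stream; the audio format follows the output file extension.
function audioExtractArgs(heroClip, outputAudio) {
  return ["-y", "-i", heroClip, "-vn", outputAudio];
}
```

Either array can be passed to `spawnSync("ffmpeg", args)` from Node's `child_process` module. The extracted PNG then serves as the persona reference and the audio file as the voice-cloning sample.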

Script Format for Persona Mode

Each scene in script.json can now include an nbPrompt field, the Nano Banana prompt from which that scene's start frame is generated (with the persona reference) before video generation:

{
  "scenes": [
    {
      "index": 1,
      "type": "ugc",
      "duration": 10,
      "prompt": "Handheld selfie — Croatian woman in 30s, just finished running, whips out phone, speaks to camera...",
      "dialogue": "Upravo sam trčala 5 kilometara i nisam umorna...",
      "startImageUrl": null,
      "notes": "Hero clip — persona source"
    },
    {
      "index": 2,
      "type": "ugc",
      "duration": 5,
      "prompt": "Animate the person in the image — she picks up the product bottle and looks at it with curiosity...",
      "nbPrompt": "The person in the reference image, same face and hair, standing in a modern kitchen, holding a green supplement bottle, warm morning light through window, photorealistic",
      "dialogue": "Ovaj napitak pijem svaki dan već tjedan dana...",
      "startImageUrl": null,
      "notes": "Scene image auto-generated from hero persona"
    }
  ]
}

CLI Flags

# --persona-from-hero: extract a frame from scene 1 as the reference for all NB images
# --clone-voice: clone the voice from scene 1 audio and use it for all VO
node "ACME Agency/scripts/video_ads_generate.mjs" "ClientName" \
  --script path/to/script.json \
  --persona-from-hero \
  --clone-voice \
  [--no-drive] [--no-slack]

Key Constraints

Comparison: HeyGen vs Persona Pipeline

                      | HeyGen (/heygen-ad-generator)                   | Persona Pipeline (/video-ad-generator --persona-from-hero)
----------------------|-------------------------------------------------|-----------------------------------------------------------
Control               | Low: HeyGen chooses b-roll, pacing, transitions | Full: you define every scene, image, and cut
Output                | Single finished MP4                             | Separate clips + VO → CapCut assembly
Lip sync              | Excellent                                       | None (VO overlay)
Character consistency | Handled by HeyGen                               | Persona reference + Nano Banana
Editing time          | 0 min                                           | 10-15 min in CapCut
Best for              | Quick talking-head presenter ads                | Cinematic UGC hybrids, lifestyle, testimonials