# /video-ad-generator

> This skill is the legacy video ad generator.

## ⚠️ DEPRECATED — Use `/<id>` instead

This skill is the legacy video ad generator. The proven, production-tested flow now lives at **`/<id>`**, which uses:
- Per-scene Nano Banana 2 (NB2) start frames → Kling 3.0 image-to-video (action-only, no talking heads)
- catbox.moe upload (Krea CDN URLs are unreliable for Kling)
- ElevenLabs German VO as separate track
- CapCut assembly for final polish

**Use this legacy skill ONLY if:** the user explicitly requests Veo 3.1 cinematic style (text-to-video, no startImage), or if `/<id>` doesn't fit the task.

For all other video ad work, use `/<id>`.

---

## Overview (legacy)

Full pipeline for generating a production-ready video ad clip package for an ACME Agency client.

**What this skill produces:**
- Video clips per scene (generated via Krea.ai — Kling 2.6 for UGC/animated, Veo 3.1 for product/cinematic)
- Voiceover audio (ElevenLabs TTS, per-scene + full track)
- `capcut-guide.md` — scene order, transitions, music, captions, export settings
- All files uploaded to Drive, report posted to client Slack channel

**What this skill does NOT do:**
- Auto-stitch clips (human edits in CapCut/DaVinci using the guide)
- Record real video (everything is AI-generated)

**Ad types supported:**
- `ugc` — AI influencer / person talking to camera (Kling 2.6 + native audio)
- `product` — cinematic product shots, no person (Veo 3.1)
- `animated` — animate a real client photo or logo (Kling 2.6 + startImage)
- `mixed` — combination of the above in the same video
- `cinematic` — all scenes via Veo 3.1, no talking heads, VO narrates over product/lifestyle/text

**Style modes (`--style` flag):**

| Mode | Flag | Models | Use when |
|------|------|--------|----------|
| **cinematic** | `--style cinematic` | All Veo 3.1 | Default. Medical, dental, service, product authority ads. No faces, no lip sync issues. |
| **ugc** | `--style ugc` *(Phase 2)* | Kling 2.6 + HeyGen | Testimonials, direct-response. Requires HeyGen integration (not yet built). |

**Default style: `cinematic`** — use this unless the brief explicitly requires a talking head.

Why cinematic is the right default:
- Kling 2.6 UGC has no audio-synced lip sync (mouth moves randomly regardless of audio)
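- Character identity doesn't persist across Kling jobs unless a `startImageUrl` is supplied, and requests with `startImageUrl` fail at the API
- Veo 3.1 + ElevenLabs VO sidesteps both problems (no faces to lip-sync, no recurring character to keep consistent) and produces reliable, professional results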

**Trigger:**
- `/video-ad-generator [ClientName]`
- `/video-ad-generator ACME Agency --brief "mini implants, seniors audience"`
- `/video-ad-generator ACME Agency --style cinematic` ← all scenes Veo 3.1 Fast (default)
- `/video-ad-generator ACME Agency --style cinematic --hq` ← full Veo 3.1 (explicit request only)
- `/video-ad-generator ACME Agency --draft` ← draft run: same Fast model, meant for quick tests

---

## Critical Files

- `ACME Agency/scripts/video_ads_generate.mjs` — pipeline script (Phases 3–5)
- `ACME Agency/scripts/lib/krea.mjs` — Krea.ai API (image + video)
- `ACME Agency/scripts/lib/elevenlabs.mjs` — ElevenLabs TTS
- `ACME Agency/scripts/lib/google_drive.mjs` — Drive upload
- `ACME Agency/scripts/lib/slack.mjs` — Slack reporting
- `ACME Agency/clients/clients.json` — client registry
- `ACME Agency/clients/<ClientName>/brand-dna.md` — brand context (auto-created if missing)
- `ACME Agency/clients/<ClientName>/video-ads/<campaign>/script.json` — Claude-written scene script
- `ACME Agency/clients/<ClientName>/video-ads/<campaign>/clips/` — generated MP4s
- `ACME Agency/clients/<ClientName>/video-ads/<campaign>/voiceover/` — MP3 files
- `ACME Agency/clients/<ClientName>/video-ads/<campaign>/capcut-guide.md` — editing guide

---

## Workflow

### Step 0 — Identify Client

Look up the client in `ACME Agency/clients/clients.json` using the name provided. Extract:
- `drive_folder_id` — for Drive upload
- `slack_channel` — for report
- `market` — for language decisions
- `elevenlabs_voice_id` — if set, use this voice (skip voice selection)

If client not found: tell the user and stop.
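The lookup can be sketched as below. This assumes `clients.json` parses to either an array of client objects with a `name` field or an object keyed by client name; the file's actual shape isn't documented here, so both are handled. `findClient` is a hypothetical helper, not a function in the pipeline; only the four field names come from the list above.

```javascript
// Hypothetical helper: find a client in the parsed clients.json registry.
// Assumes either an array of { name, ... } objects or an object keyed by name.
function findClient(registry, name) {
  const wanted = name.trim().toLowerCase();
  const entries = Array.isArray(registry)
    ? registry.map((c) => [c.name ?? "", c])
    : Object.entries(registry);
  const match = entries.find(([key]) => key.trim().toLowerCase() === wanted);
  if (!match) return null; // caller tells the user and stops
  const client = match[1];
  return {
    driveFolderId: client.drive_folder_id,
    slackChannel: client.slack_channel,
    market: client.market,
    // When set, voice selection is skipped
    voiceId: client.elevenlabs_voice_id ?? null,
  };
}
```

Matching is case-insensitive so `/video-ad-generator acme agency` and `ACME Agency` resolve to the same entry.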

---

### Step 1 — Brand Research (Phase 0)

**Check for cached brand DNA first:**
- Read `ACME Agency/clients/<ClientName>/brand-dna.md` if it exists
- Check if it has `Video Character` and `Video Tone` sections
- If both exist: use it as-is, skip to Step 2
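The cache check amounts to a heading scan. A minimal sketch, assuming both sections are level-2 headings named exactly as in the template in item 5 below; note that a heading with an empty body would still count as present.

```javascript
// True when brand-dna.md already contains both video sections (skip to Step 2).
function hasVideoSections(markdown) {
  const headings = new Set(
    markdown
      .split(/\r?\n/)
      .map((line) => line.trim())
      .filter((line) => line.startsWith("## "))
      .map((line) => line.slice(3).trim().toLowerCase())
  );
  return headings.has("video character") && headings.has("video tone");
}
```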

**If brand-dna.md is missing or lacks video sections:**

1. Read `ACME Agency/clients/<ClientName>/CLIENT.md` for base brand context
2. Scrape the client's website using Firecrawl (`ACME Agency/scripts/lib/firecrawl.mjs`) — extract messaging, colors, visual style, target audience
3. Screenshot the logo using `ACME Agency/scripts/lib/logo_prepare.mjs` if `reference_assets.logo` is not in clients.json
4. Inspect `ACME Agency/clients/<ClientName>/product-images/` for visual reference
5. Write or append to `brand-dna.md` — must include these **video-specific sections**:

```
## Video Character
[Describe the ideal on-camera subject — age, gender, ethnicity, wardrobe, style, energy.
Example: Croatian woman, 35-45, warm and approachable, wearing casual-professional clothing,
natural makeup, speaking directly to camera, relatable not polished. OR: no person needed —
product-only cinematic style.]

## Video Tone
[Pace: fast/punchy | slow/aspirational | conversational]
[Music style: upbeat pop | cinematic | lo-fi | none]
[Emotion target: trust | urgency | aspiration | warmth]
[Language: Croatian | English | German]
```

6. Save updated brand-dna.md locally
7. Upload to client's Drive folder

---

### Step 2 — Script & Storyboard (Phase 1)

**Brief collection:**
- If `--brief` was provided: use it
- If not: ask the user 4 quick questions:
  1. What's this campaign about? (product, offer, key message)
  2. Target audience? (who are we speaking to)
  3. Ad length target? (15s Reel / 30s Reel / 60s longer-form) — default: 30s
  4. Ad type? (ugc / product / animated / mixed / cinematic); default: mixed

**Script writing rules:**
- 15s = 2–3 scenes (5–7s each)
- 30s = 3–5 scenes (5–10s each)
- 60s = 5–8 scenes (5–10s each)
- Every ad MUST have a strong hook scene (first 3 seconds)
- UGC scenes: character speaks directly, natural motion, one clear sentence per scene (say it aloud — if it takes >8s to speak, it's too long)
- Product scenes: no dialogue, strong visual action (orbit, pour, closeup, reveal)
- Animated scenes: use `startImageUrl` pointing to a real client photo URL

**VO calibration — CRITICAL:**
```
ElevenLabs speaks Croatian at ~2.3–2.5 words/second (faster than you expect).
Formula: voScript word count = total_video_seconds × 2.4
Example: 3 scenes × 5s + 2 Veo scenes × 6s = 27s total → write 65 words in voScript.
Per-scene dialogue: scene_duration × 2.4 = max words that character can say in that clip.
5s UGC scene → max 12 words. 6s scene → max 14 words.
Write the voScript FIRST, count the words, verify against the formula before proceeding.
A mismatch means the audio ends well before the video does — the #1 editing failure.
```
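The calibration rule can be checked mechanically before any credits are spent. A sketch, assuming words are whitespace-delimited and that the total uses rounding (27s → 65 words) while the per-scene cap uses flooring (6s → 14 words), which is how the numbers above work out. The 2.4 words/s rate is written as 24/10 so whole-second durations stay exact in floating point. The helper names are illustrative, not part of the pipeline.

```javascript
// Whitespace-delimited word count (punctuation stays attached to words).
function countWords(text) {
  return text.split(/\s+/).filter(Boolean).length;
}

// Target word count for the whole voScript: seconds × 2.4, rounded.
function targetWords(totalSeconds) {
  return Math.round((totalSeconds * 24) / 10);
}

// Hard cap on one scene's dialogue: seconds × 2.4, floored.
function maxSceneWords(sceneSeconds) {
  return Math.floor((sceneSeconds * 24) / 10);
}

// Compares a parsed script.json against the formula.
function checkVoScript(script) {
  const totalSeconds = script.scenes.reduce((sum, s) => sum + s.duration, 0);
  const overLong = script.scenes
    .filter((s) => countWords(s.dialogue ?? "") > maxSceneWords(s.duration))
    .map((s) => s.index);
  return {
    totalSeconds,
    target: targetWords(totalSeconds),
    actual: countWords(script.voScript),
    overLong, // indexes of scenes whose dialogue cannot fit the clip
  };
}
```

The document gives no tolerance for `actual` vs `target`; treat any sizeable gap as a reason to rewrite before running.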

**No real person impersonation — REQUIRED:**
```
NEVER write a scene prompt that tries to show a named real person's face talking.
Prompts like "Dr. [Name] speaking", or using a real person's reference photo to generate their face, fail visually and are ethically wrong for paid ads.
AI video cannot replicate a specific real person. Do not attempt it.

Instead, for authority/clinic scenes:
- Clinic interior shot (no face)
- White coat from behind, side, or hands-only
- Dental model close-up
- Text/graphic card (stat, CTA, brand)
- Generic invented doctor character (no name, no reference image)

The real doctor's voice CAN appear in the voiceover narration — just not their AI-generated face.
```

**Two prompts per scene — `nbPrompt` (image) + `prompt` (animation):**

Every scene MUST have both:
- `nbPrompt` — describes the **static start frame** for Nano Banana 2 to generate. Think: what does the photograph look like? Composition, subjects, lighting, colors, style.
- `prompt` — describes the **animation/motion** for Kling to apply to that image. Think: what HAPPENS? Camera movement, hand gestures, object interactions.

**nbPrompt formula (image — what the frame looks like):**
```
[Composition/angle] of [subject/object description] —
[environment/surface/background] —
[lighting style] — [brand colors if relevant] —
photorealistic, [aspect ratio]
```

**prompt formula (animation — what happens in the clip):**
```
[Camera movement] — [subject action + motion description] —
[dialogue in quotes if speaking] —
[ambient sound/audio direction]
```
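If prompts are assembled in code rather than by hand, the two formulas map onto simple builders. This is an illustrative sketch: the slot names mirror the formula brackets, and the separator is the formula's dash (U+2014); neither function exists in the pipeline.

```javascript
// Separator used between formula slots (the dash from the formulas above).
const SEP = " \u2014 ";

// [Composition] of [subject] | [environment] | [lighting] | [brand colors?] | photorealistic, [ratio]
function buildNbPrompt({ composition, subject, environment, lighting, brandColors, aspectRatio }) {
  const look = brandColors ? `${lighting}${SEP}${brandColors}` : lighting;
  return [`${composition} of ${subject}`, environment, look, `photorealistic, ${aspectRatio}`].join(SEP);
}

// [Camera] | [action] | [dialogue, only if speaking] | [audio direction]
function buildMotionPrompt({ camera, action, dialogue, audio }) {
  const parts = [camera, action];
  if (dialogue) parts.push(`says '${dialogue}'`);
  parts.push(audio);
  return parts.join(SEP);
}
```

Leaving `brandColors` or `dialogue` empty drops that slot, matching the "if relevant" and "if speaking" qualifiers.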

Example UGC scene:
```json
{
  "nbPrompt": "Croatian woman in her 40s, warm smile, casual beige top, standing in bright Zagreb street, facing camera, warm natural light, photorealistic, 9:16 vertical",
  "prompt": "Handheld tracking shot — woman walks forward speaking directly to camera — says 'Jeste li umorni od proteza koje ispadaju?' — soft natural window light, warm color grade, no background music"
}
```

Example product/cinematic scene:
```json
{
  "nbPrompt": "Polished titanium mini implant on clean white clinical surface, warm amber accent lighting, shallow depth of field, product photography, ACME Agency amber #D59C44 accent glow, photorealistic, 9:16 vertical",
  "prompt": "Cinematic slow 360° orbit around the implant — hero product reveal — slight camera drift, shallow depth of field — no speech, no music"
}
```

**Write the `script.json` file** to `ACME Agency/clients/<ClientName>/video-ads/<campaign-slug>/script.json`:

```json
{
  "client": "<ClientName>",
  "campaign": "<campaign-slug>",
  "adType": "mixed",
  "targetDuration": 30,
  "language": "Croatian",
  "voScript": "<Full narration text in order — every line of dialogue from every scene>",
  "scenes": [
    {
      "index": 1,
      "type": "ugc",
      "duration": 5,
      "aspectRatio": "9:16",
      "prompt": "<full Krea video animation prompt — what Kling should do with the image>",
      "nbPrompt": "<Nano Banana image prompt — detailed description of the start frame image to generate>",
      "dialogue": "<spoken text for this scene, or empty string>",
      "startImageUrl": null,
      "notes": "<optional human note for the editor>"
    },
    {
      "index": 2,
      "type": "product",
      "duration": 7,
      "aspectRatio": "9:16",
      "prompt": "<full Krea video animation prompt>",
      "nbPrompt": "<Nano Banana image prompt for this scene's start frame>",
      "dialogue": "",
      "startImageUrl": null,
      "notes": ""
    }
  ]
}
```
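Before generation it helps to validate the file structurally. A sketch under assumptions: field names follow the template above, but the allowed scene types (`SCENE_TYPES`) and the 5-second duration tolerance are illustrative choices, not pipeline rules. The `personaMode` option exempts scene 1, because the persona-mode hero clip has no `nbPrompt` in its example.

```javascript
// Assumed scene types; extend if your scripts use others.
const SCENE_TYPES = new Set(["ugc", "product", "animated"]);
const DURATION_TOLERANCE_S = 5; // hypothetical slack vs targetDuration

function validateScript(script, { personaMode = false } = {}) {
  const errors = [];
  for (const key of ["client", "campaign", "language", "voScript"]) {
    if (typeof script[key] !== "string" || script[key] === "") errors.push(`missing ${key}`);
  }
  if (!Array.isArray(script.scenes) || script.scenes.length === 0) {
    errors.push("scenes must be a non-empty array");
    return errors;
  }
  script.scenes.forEach((scene, i) => {
    const where = `scene ${i + 1}`;
    if (scene.index !== i + 1) errors.push(`${where}: index is ${scene.index}, expected ${i + 1}`);
    if (!SCENE_TYPES.has(scene.type)) errors.push(`${where}: unknown type "${scene.type}"`);
    if (!(scene.duration > 0)) errors.push(`${where}: duration must be a positive number`);
    if (!scene.prompt) errors.push(`${where}: missing prompt`);
    // Persona mode: the hero clip (scene 1) is generated without an nbPrompt.
    const heroExempt = personaMode && i === 0;
    if (!scene.nbPrompt && !heroExempt) errors.push(`${where}: missing nbPrompt`);
  });
  const total = script.scenes.reduce((sum, s) => sum + (s.duration || 0), 0);
  if (script.targetDuration && Math.abs(total - script.targetDuration) > DURATION_TOLERANCE_S) {
    errors.push(`scene durations sum to ${total}s but targetDuration is ${script.targetDuration}s`);
  }
  return errors;
}
```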

**`nbPrompt` vs `prompt` — IMPORTANT distinction:**
- `nbPrompt` = the **image** prompt. Nano Banana 2 generates a photorealistic start frame from this. Describe the static scene: who/what is in frame, composition, lighting, style. This is what the viewer sees at frame 0.
- `prompt` = the **animation** prompt. Kling 2.6 takes the generated image and animates it. Describe the motion: camera movement, hand gestures, object interactions, transitions. This is what HAPPENS in the clip.
- The pipeline generates the `nbPrompt` image first, then passes it to Kling as `startImage` for animation.
- **Always include `nbPrompt` for every scene.** This is the default workflow — it produces dramatically better results than text-to-video alone.

Example:
```json
{
  "nbPrompt": "Overhead view of a formal German salary document on oak desk, number 77.400 EUR clearly visible, warm office light, photorealistic, 9:16",
  "prompt": "Cinematic close-up — hands slowly unfold the salary document, camera racks focus to the number, slight camera push-in, warm golden light"
}
```
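The order of operations described above (image from `nbPrompt` first, then Kling with that image as `startImage`) can be sketched with injected dependencies. `generateImage` and `generateVideo` are hypothetical stand-ins for whatever `lib/krea.mjs` actually exports; their names and argument shapes are assumptions. Letting a manually set `startImageUrl` take precedence over `nbPrompt` is also an assumption, based on Step 2b reserving it for real client photos.

```javascript
// deps.generateImage / deps.generateVideo are hypothetical wrappers around Krea calls.
async function renderScene(scene, { generateImage, generateVideo }) {
  // A manually supplied real-photo URL wins; otherwise generate the start frame.
  let startImage = scene.startImageUrl ?? null;
  if (!startImage && scene.nbPrompt) {
    startImage = await generateImage({ prompt: scene.nbPrompt, aspectRatio: scene.aspectRatio });
  }
  return generateVideo({
    prompt: scene.prompt, // the animation prompt, never the nbPrompt
    duration: scene.duration,
    aspectRatio: scene.aspectRatio,
    startImage, // null means text-to-video
  });
}
```

Injecting the two calls keeps the ordering logic testable with stubs and independent of which provider serves the video step.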

**Campaign slug format:** `<keyword>-<audience>-<YYYY-MM>` e.g. `<id>`
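A slug builder for that pattern, as a sketch. It assumes the month comes from the local date and that Croatian diacritics should be folded to ASCII; `đ` has no Unicode decomposition, so it is mapped explicitly before NFD normalization strips the other accents.

```javascript
// <keyword>-<audience>-<YYYY-MM>, lowercase ASCII, hyphen-separated.
function campaignSlug(keyword, audience, date = new Date()) {
  const slugify = (s) =>
    s
      .toLowerCase()
      .replace(/đ/g, "d") // đ does not decompose under NFD
      .normalize("NFD")
      .replace(/[\u0300-\u036f]/g, "") // strip combining accents (š→s, ć→c, ...)
      .replace(/[^a-z0-9]+/g, "-")
      .replace(/^-+|-+$/g, "");
  const yyyy = date.getFullYear();
  const mm = String(date.getMonth() + 1).padStart(2, "0");
  return `${slugify(keyword)}-${slugify(audience)}-${yyyy}-${mm}`;
}
```

For example, `campaignSlug("Mini Implantati", "Seniori 60+", new Date(2025, 2, 15))` yields `mini-implantati-seniori-60-2025-03`.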

After writing script.json, **show the user a summary table** before proceeding:

```
Scene | Type     | Duration | Hook / Dialogue
------|----------|----------|------------------
  1   | UGC      |   5s     | "Jeste li umorni..."
  2   | Product  |   7s     | [titanium implant orbit]
  3   | UGC      |   8s     | "Za samo 2 sata..."
  4   | Product  |   5s     | [before/after closeup]
  5   | UGC      |   5s     | "Zovite nas danas."
```

Ask: "Proceed with video generation, or make changes to any scene?"
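The summary table can be rendered straight from `script.json`. This sketch assumes the last column shows `dialogue` when present and otherwise the scene's `nbPrompt` in brackets; the example table above uses hand-written visual summaries instead, so edit those rows if needed. The type labels and the 18-character truncation are illustrative choices.

```javascript
const TYPE_LABELS = { ugc: "UGC", product: "Product", animated: "Animated" };

function summaryTable(script, maxLen = 18) {
  const clip = (s) => (s.length > maxLen ? `${s.slice(0, maxLen - 3)}...` : s);
  const rows = script.scenes.map((s) => {
    const type = TYPE_LABELS[s.type] ?? s.type;
    const hook = s.dialogue ? `"${clip(s.dialogue)}"` : `[${clip(s.nbPrompt ?? "")}]`;
    // Column widths match the example table (6 | 10 | 10 | rest).
    return `${String(s.index).padStart(3)}   | ${type.padEnd(8)} | ${`${s.duration}s`.padStart(4)}     | ${hook}`;
  });
  return [
    "Scene | Type     | Duration | Hook / Dialogue",
    "------|----------|----------|------------------",
    ...rows,
  ].join("\n");
}
```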

---

### Step 2b — Reference Images (Now Automated)

**Image generation is now handled automatically by the pipeline.** When a scene has `nbPrompt` set (which should be every scene), the script generates the Nano Banana 2 image and passes it to Kling as `startImage` — no manual step needed.

**You only need to manually set `startImageUrl` for:**
- **Animated scenes using real client photos** — look in `ACME Agency/clients/<ClientName>/product-images/`
- Or use the logo reference from `reference_assets.logo` in clients.json
- Upload to Drive and use the public URL

---

### Step 3–5 — Video, Voice, Package (run the script)

Once script.json is written and reference images are set, run the pipeline:

```bash
node ACME Agency/scripts/video_ads_generate.mjs "<ClientName>" \
  --script "ACME Agency/clients/<ClientName>/video-ads/<campaign>/script.json" \
  [--voice <voice_id_or_name>] \
  [--resolution 1080p] \
  [--draft] \
  [--provider krea|kling] \
  [--persona-from-hero] \
  [--clone-voice] \
  [--no-slack] \
  [--no-drive] \
  [--no-voice]
```

**Image-to-video (default):** Every scene with `nbPrompt` automatically generates a Nano Banana 2 image first, then Kling animates it. No flag needed — this happens by default.

**`--persona-from-hero`:** After scene 1 renders, extracts a frame (ffmpeg) and uses it as a reference image for all subsequent Nano Banana prompts. Use for UGC ads where a consistent person must appear across scenes.

**`--clone-voice`:** Extracts audio from the hero clip (scene 1) and clones it via ElevenLabs Instant Voice Cloning. All scene voiceovers then use the cloned voice. Use when the hero clip has native audio with a character voice you want to match.
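Both flags rest on one ffmpeg call each. The exact commands the pipeline runs aren't shown here, so the following argument lists are a sketch: standard ffmpeg options, the 2s seek point from the persona workflow, and an assumed mono 44.1 kHz WAV format for the cloning input. Pass each array to Node's `child_process.execFile("ffmpeg", args)`.

```javascript
// One PNG frame from the hero clip: input-seek to `atSeconds`, write a single frame.
function personaFrameArgs(heroClip, outPng, atSeconds = 2) {
  return ["-y", "-ss", String(atSeconds), "-i", heroClip, "-frames:v", "1", outPng];
}

// Audio track only (-vn drops video), mono 44.1 kHz 16-bit PCM WAV for cloning.
function heroAudioArgs(heroClip, outWav) {
  return ["-y", "-i", heroClip, "-vn", "-ac", "1", "-ar", "44100", "-c:a", "pcm_s16le", outWav];
}
```

Building argument arrays instead of shell strings avoids quoting problems with client names that contain spaces.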

**`--provider krea|kling`:** Video generation provider (default: `krea`).
- `krea` — uses Krea proxy API with Kling 2.6 / Veo 3.1. Default, proven in production.
- `kling` — uses KlingAI direct API with Kling 3.0. Better motion quality, 3-15s durations, pro mode. Requires `KLING_ACCESS_KEY` + `KLING_SECRET_KEY` in `.env`.
- Image generation (Nano Banana 2) always stays on Krea regardless of provider.
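The provider rule reduces to a small resolution step. A sketch: the env var names and the "images always on Krea" rule come from the list above, while `resolveProviders` and its return shape are hypothetical.

```javascript
// Maps --provider to backends; Nano Banana 2 images always stay on Krea.
function resolveProviders(provider = "krea", env = {}) {
  if (provider === "krea") return { video: "krea", image: "krea" };
  if (provider === "kling") {
    const missing = ["KLING_ACCESS_KEY", "KLING_SECRET_KEY"].filter((k) => !env[k]);
    if (missing.length) throw new Error(`--provider kling needs ${missing.join(" + ")} in .env`);
    return { video: "kling", image: "krea" };
  }
  throw new Error(`unknown provider "${provider}" (expected krea|kling)`);
}
```

Called as `resolveProviders(flag, process.env)`, this fails before any credits are spent when the Kling keys are absent.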

**Voice selection:**
- If `client.elevenlabs_voice_id` is set in clients.json → pass it as `--voice`
- If not set: run `node ACME Agency/scripts/video_ads_generate.mjs "ClientName" --list-voices` first and inspect the output
- For Croatian scripts: `<id>` model handles pronunciation IF the text has correct diacritics (š, đ, č, ž, ć). Always verify diacritics in the script before running.
- Voice character matters for Croatian: prefer voices labeled "middle-aged", "warm", "conversational", or European-sounding names (Charlotte, Matilda, Freya, Liam). Avoid voices designed exclusively for American English.
- DO NOT assume any voice will "just work" for Croatian — test with `--no-drive --no-slack` on a single scene first if unsure.
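A quick pre-flight check for the diacritics rule, as a heuristic sketch. It flags two assumed failure modes: UTF-8 text that was mis-decoded as Latin-1/Windows-1252 (e.g. `š` showing up as `Å¡`), and long passages with no č/ć/đ/š/ž at all, which suggests the accents were stripped. The 20-word threshold is an arbitrary choice; short lines such as "Jeste li umorni od proteza koje ispadaju?" legitimately contain none.

```javascript
const HR_DIACRITICS = /[čćđšžČĆĐŠŽ]/g;
// Mis-decoded UTF-8 Croatian letters start with Ä or Å followed by a Latin-1/cp1252 byte char.
const MOJIBAKE = /[ÄÅ][\u0080-\u00BF\u0152\u2018\u2020\u2021]/;

function checkCroatianText(text, minWordsForDiacritics = 20) {
  const warnings = [];
  if (MOJIBAKE.test(text)) warnings.push('looks like mis-decoded UTF-8 (e.g. "Å¡" instead of "š")');
  const words = text.split(/\s+/).filter(Boolean).length;
  const found = new Set(text.toLowerCase().match(HR_DIACRITICS) ?? []);
  if (words >= minWordsForDiacritics && found.size === 0) {
    warnings.push(`${words} words but no č/ć/đ/š/ž; diacritics may have been stripped`);
  }
  return { diacritics: [...found].sort(), warnings };
}
```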

**Draft mode (`--draft`):**
- Uses `google/veo-3.1-fast` for all scenes instead of Kling 2.6 / Veo 3.1
- Faster and cheaper — good for testing the script before a full run
- Output quality is lower but sufficient for script review

**Resolution:**
- Default: `1080p` — always use for final delivery
- `720p` for draft/test runs to save credits

---

## Output Structure

```
ACME Agency/clients/<ClientName>/video-ads/<campaign>/
├── script.json                ← Claude-written scene script (input to the pipeline)
├── clips/
│   ├── scene-01-ugc.mp4
│   ├── scene-02-product.mp4
│   └── ...
├── voiceover/
│   ├── voiceover-full.mp3     ← full narration track
│   ├── scene-01-vo.mp3        ← per-scene splits
│   └── ...
├── reference-images/
│   ├── character-front.png
│   └── character-34.png
└── capcut-guide.md            ← editing guide for CapCut/DaVinci

ACME Agency/clients/<ClientName>/video-ads/
└── video_ads_manifest.json    ← generation history
```

**Drive folder:** `<ClientDriveFolder>/Video Ads/<Year>/<Month>/<campaign>/`

---

## Tips

- **Test before full batch**: add `--draft --no-slack --no-drive` for a fast test run on 1 scene (modify script.json to 1 scene temporarily)
- **Re-run single scene**: edit script.json to only include that scene's entry, run again with `--no-drive` then merge manually
- **Voice not right?** Run `--list-voices` and try a different voice. For Croatian, start from a multilingual voice but still test it on one scene first (see Voice selection). Adjust stability: lower = more expressive, higher = more controlled
- **Credit cost estimate**: Kling 2.6 ≈ 20–40 units per 5s clip; Veo 3.1 ≈ 80–120 units per 5s clip; ElevenLabs ≈ 1 credit per character of VO text
- **Character continuity**: using the same `startImageUrl` for all UGC scenes is the strongest consistency mechanism. Generate the reference image first (Step 2b) and reuse it across all UGC scenes
- **Audio in UGC clips**: `generateAudio: true` is passed for ugc/animated scenes, so Kling generates audio native to the clip (ambient sound, speech). ElevenLabs VO goes on top. When editing, mute the clip audio track and use only the ElevenLabs VO track for clean narration
- **Richer brand DNA = better prompts**: if results look off-brand, improve the `Video Character` and `Video Tone` sections and re-run
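The single-scene test/re-run and the credit estimate from the tips above can both be scripted. A sketch under stated assumptions: `sliceScript` writes a reduced copy instead of editing `script.json` by hand, and assumes the partial run's narration should be just the kept scenes' dialogue. `estimateUnits` scales the per-5s unit ranges linearly with duration, which is an assumption, not a published rate.

```javascript
// Keep only the requested scene indexes (original indexes are preserved for merging).
function sliceScript(script, indexes) {
  const keep = new Set(indexes);
  const scenes = script.scenes.filter((s) => keep.has(s.index));
  return {
    ...script,
    scenes,
    // Assumption: partial-run narration = dialogue of the kept scenes only.
    voScript: scenes.map((s) => s.dialogue).filter(Boolean).join(" "),
    targetDuration: scenes.reduce((sum, s) => sum + s.duration, 0),
  };
}

// Unit ranges per 5s clip, from the tip above.
const UNITS_PER_5S = { kling: [20, 40], veo: [80, 120] };

// modelForScene(scene) must return "kling" or "veo".
function estimateUnits(script, modelForScene) {
  let low = 0;
  let high = 0;
  for (const scene of script.scenes) {
    const [lo, hi] = UNITS_PER_5S[modelForScene(scene)];
    low += (lo * scene.duration) / 5; // linear scaling by clip length (assumed)
    high += (hi * scene.duration) / 5;
  }
  return { low, high };
}
```

Write the sliced copy to a separate file and pass it via `--script` with `--draft --no-slack --no-drive`, leaving the original untouched.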

---

## Enhanced UGC Mode — Persona-Driven Pipeline

An advanced workflow for creating realistic UGC-style ads with consistent character identity across all scenes. Inspired by the "hero clip → persona reference → scene-by-scene animation" approach used by top e-commerce brands.

**When to use:** When the brief requires a consistent AI person across multiple scenes (testimonials, product reviews, lifestyle content) and you want full creative control over every scene (vs. HeyGen's black-box approach).

### How It Works

1. **Scene 1 = Hero Clip** — generated via Kling 2.6 with `generateAudio: true`. This is the opening "talking to camera" hook (5–15s).
2. **Persona Extraction** (`--persona-from-hero`) — ffmpeg extracts a clear frame from the hero clip at 2s. This PNG becomes the character reference for all subsequent Nano Banana image prompts.
3. **Scene-by-Scene Images** — for scenes 2+, if `nbPrompt` is set in script.json, Nano Banana 2 generates each scene's start frame using the persona PNG as `imageUrls[0]`. This locks the character's face/appearance.
4. **Scene-by-Scene Animation** — Kling 2.6 animates each Nano Banana image via `startImage`, producing 5s clips.
5. **Voice Cloning** (`--clone-voice`) — ffmpeg extracts the audio from the hero clip. ElevenLabs clones this voice (Instant Voice Cloning). All subsequent scene voiceovers use the cloned voice for consistency.
6. **CapCut Assembly** — you get separate clips + per-scene VO files + a CapCut guide.
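The six steps can be read as one orchestration routine. This sketch injects every external call; `generateVideo`, `extractFrame`, `extractAudio`, `cloneVoice`, and `generateImage` are hypothetical wrappers (around Krea, ffmpeg, and ElevenLabs respectively) whose names and argument shapes are assumptions. It shows the data flow: the hero clip feeds both the persona PNG, passed as `imageUrls[0]`, and the cloned voice.

```javascript
async function runPersonaMode(scenes, deps, { personaFromHero = true, cloneVoice = true } = {}) {
  const [hero, ...rest] = scenes;
  // Step 1: hero clip with native audio.
  const heroClip = await deps.generateVideo({ prompt: hero.prompt, duration: hero.duration, generateAudio: true });
  // Step 2: frame at 2s becomes the persona reference.
  const persona = personaFromHero ? await deps.extractFrame(heroClip, 2) : null;
  // Step 5: clone the hero voice from its audio track.
  const voiceId = cloneVoice ? await deps.cloneVoice(await deps.extractAudio(heroClip)) : null;

  const clips = [heroClip];
  for (const scene of rest) {
    let startImage = null;
    if (scene.nbPrompt) {
      // Step 3: persona PNG goes first so NB2 keeps the same face/appearance.
      startImage = await deps.generateImage({ prompt: scene.nbPrompt, imageUrls: persona ? [persona] : [] });
    }
    // Step 4: animate the start frame.
    clips.push(await deps.generateVideo({ prompt: scene.prompt, duration: scene.duration, startImage }));
  }
  return { clips, voiceId }; // Step 6 (CapCut) stays manual
}
```

Scenes 2+ are rendered sequentially here for readability; once the persona PNG exists they don't depend on each other and could run in parallel.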

### Script Format for Persona Mode

Each scene in `script.json` can now include an `nbPrompt` field — the Nano Banana image prompt that will be generated with persona reference before video generation:

```json
{
  "scenes": [
    {
      "index": 1,
      "type": "ugc",
      "duration": 10,
      "prompt": "Handheld selfie — Croatian woman in 30s, just finished running, whips out phone, speaks to camera...",
      "dialogue": "Upravo sam trčala 5 kilometara i nisam umorna...",
      "startImageUrl": null,
      "notes": "Hero clip — persona source"
    },
    {
      "index": 2,
      "type": "ugc",
      "duration": 5,
      "prompt": "Animate the person in the image — she picks up the product bottle and looks at it with curiosity...",
      "nbPrompt": "The person in the reference image, same face and hair, standing in a modern kitchen, holding a green supplement bottle, warm morning light through window, photorealistic",
      "dialogue": "Ovaj napitak pijem svaki dan već tjedan dana...",
      "startImageUrl": null,
      "notes": "Scene image auto-generated from hero persona"
    }
  ]
}
```

### CLI Flags

```bash
node ACME Agency/scripts/video_ads_generate.mjs "ClientName" \
  --script path/to/script.json \
  --persona-from-hero \    # Extract frame from scene 1 → ref for all NB images
  --clone-voice \          # Clone voice from scene 1 audio → use for all VO
  [--no-drive] [--no-slack]
```

### Key Constraints

- **Kling 2.6 has no lip sync** — mouth movements are random. This is fine because ElevenLabs VO is overlaid and the b-roll scenes are action-focused, not direct-to-camera talking (except the hero clip).
- **Character drift** — Nano Banana + persona reference gets you ~80-90% consistency. Slight variations in hair, clothing, or skin tone may occur. The hero clip sets the tone; b-roll scenes are more forgiving.
- **Voice cloning needs ~10s of speech** — the hero clip should have at least 10s of clear spoken audio for good clone quality.
- **Kling 3.0 not yet on Krea**: the default `krea` provider uses Kling 2.6. Kling 3.0 (better motion, multi-shot) is only reachable through the direct KlingAI API (`--provider kling`). Upgrade path for the Krea route: add `kling/kling-3.0` to `VIDEO_MODELS` in `krea.mjs` when Krea supports it.
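For the voice-cloning constraint, a guarded request builder is one way to fail early. This sketch assumes ElevenLabs' public Instant Voice Cloning endpoint (`POST https://api.elevenlabs.io/v1/voices/add`, multipart fields `name` and `files`, `xi-api-key` header) and Node 18+ globals `FormData` and `Blob`; check it against `lib/elevenlabs.mjs` before relying on it. `audioSeconds` measures total audio length, not speech, so it is only a proxy for the ~10s of clear speech requirement.

```javascript
const MIN_CLONE_SECONDS = 10;

// Builds (but does not send) an Instant Voice Cloning request.
function buildCloneRequest({ name, audioBytes, audioSeconds, apiKey, filename = "hero.wav" }) {
  if (audioSeconds < MIN_CLONE_SECONDS) {
    throw new Error(`hero audio is ${audioSeconds}s; need >= ${MIN_CLONE_SECONDS}s of speech`);
  }
  const form = new FormData();
  form.append("name", name);
  form.append("files", new Blob([audioBytes], { type: "audio/wav" }), filename);
  return {
    url: "https://api.elevenlabs.io/v1/voices/add",
    init: { method: "POST", headers: { "xi-api-key": apiKey }, body: form },
  };
}
```

Usage: `const { url, init } = buildCloneRequest({...}); const res = await fetch(url, init);` The response JSON should carry the new voice's `voice_id`, which then replaces `--voice` for the per-scene VO.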

### Comparison: HeyGen vs Persona Pipeline

| | HeyGen (`/heygen-ad-generator`) | Persona Pipeline (`/video-ad-generator --persona-from-hero`) |
|---|---|---|
| **Control** | Low — HeyGen chooses b-roll, pacing, transitions | Full — you define every scene, image, and cut |
| **Output** | Single finished MP4 | Separate clips + VO → CapCut assembly |
| **Lip sync** | Excellent | None (VO overlay) |
| **Character consistency** | Handled by HeyGen | Persona reference + Nano Banana |
| **Editing time** | 0 min | 10-15 min in CapCut |
| **Best for** | Quick talking-head presenter ads | Cinematic UGC hybrids, lifestyle, testimonials |
