# /cinematic-ad-generator

> Generates **cinematic action-based video ads** with strong hooks, pattern interrupts, and character consistency.

# Skill: `/<id>`

## Overview

Generates **cinematic action-based video ads** with strong hooks, pattern interrupts, and character consistency. Characters NEVER talk to camera — they perform actions, react, and the ElevenLabs voiceover narrates over the visuals.

This is the **proven flow** for generating scroll-stopping Facebook/Meta video ads. It was built after extensive testing of HeyGen, multi-shot, lip sync, and Veo, none of which produced results as reliable as this per-scene cinematic approach.

**What this skill produces:**
- One 5s cinematic clip per scene (6 clips for the default 30s ad), generated via Kling 3.0 image-to-video
- Each clip starts from a precision-crafted Nano Banana 2 image
- ElevenLabs voiceover in the client's language, e.g. German (per-scene MP3s + full track)
- Character consistency across scenes via NB2 reference image
- All files in Drive, ready for CapCut assembly (about 5 minutes of editing)

**What this skill does NOT do:**
- Talking heads / lip sync (intentionally excluded: they look fake without an audio API)
- Multi-shot mode (loses control over dramatic visuals)
- Final video stitching (CapCut step gives you full creative control)

---

## The Golden Rules

1. **NO talking-to-camera shots.** Characters do actions: throw papers, burn money, walk, gesture, react. The voiceover is the narrator, the visuals are the metaphor.

2. **Hook in the first 3 seconds.** Pattern interrupt — something visually unexpected. Burning money. Throwing papers. Slamming a laptop. Action that stops the scroll.

3. **Character consistency via NB2 reference.** Generate one strong character portrait first, reuse as `imageUrls` reference for all scenes featuring that character.

4. **Per-scene NB2 → Kling.** Each scene starts from its OWN precision-crafted NB2 image. This is the only way to get dramatic visuals (burning money, etc.) — multi-shot won't follow text-only prompts for action shots.

5. **Catbox for image URLs.** Always upload NB2 images to catbox.moe before passing to Kling — Krea CDN URLs are unreliable for Kling's image fetcher.

6. **ElevenLabs as separate track.** Kling direct API has no audio. ElevenLabs generates the VO in the client's language. CapCut lays the VO over the clips.

---

## Preflight (run BEFORE any expensive Krea/Kling/ElevenLabs call)

A cinematic-ad batch is the most expensive video pipeline in this workspace (6 NB2 images + 6 Kling 3.0 videos + ElevenLabs VO). Validate everything BEFORE spending credits.

1. **Client exists** in `clients.json` (3-step cascade). Resolve canonical key.
2. **Required env vars**: `KREA_API_KEY`, `KLING_ACCESS_KEY`, `KLING_SECRET_KEY`, `ELEVENLABS_API_KEY` all set in `.env`. If any missing → abort, name the missing ones.
3. **Catbox.moe is reachable** — quick GET request. If down → abort with "catbox.moe upload host is down, retry later".
4. **`drive_folder_id` is present and reachable** via `listFiles()`.
5. **Slack channel resolves** if reporting enabled.
6. **Kling duration cap**: scene durations must be `5` or `10` for Kling 2.6, OR `3-15` for Kling 3.0. Reject anything else.
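7. **Scene count is sane**: 3-12 scenes, per the "Scene Count by Duration" table and the 12-scene hard cap. More than 8 burns through credits with little quality return, so go above 8 only for 45s and 60s ads.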
8. **Language is supported by ElevenLabs** for `<id>` (HR, BS, DE, EN, ES, FR, IT, etc.). If unsupported → abort.
9. **Disk space** ≥ 1 GB free in client folder (videos are big).
10. **CLIENT.md exists** (or proceed with brief-only and explicitly note the gap).

If all checks pass, log "preflight: OK ({n} scenes × {duration}s each, est. cost: ~{Y} compute units + ~{Z}s of VO)" and proceed.
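The cheap, local part of the preflight can be sketched as a pure function that returns a list of problems, so the run aborts before any paid call. This is a minimal sketch under stated assumptions: the `plan` shape is hypothetical, `'kling-v2-6'` is a placeholder model string (only `'kling-v3'` appears in this skill), the language set is an assumed subset, and the network checks (catbox, Drive, Slack) plus disk space are left to the real pipeline. Scene limits are parameters: the defaults of 3 and 12 follow the Scene Count by Duration table and the hard cap, and a tighter range can be passed for budget-conscious runs.

```javascript
const REQUIRED_ENV = ['KREA_API_KEY', 'KLING_ACCESS_KEY', 'KLING_SECRET_KEY', 'ELEVENLABS_API_KEY'];

// Assumed subset; check the ElevenLabs model you actually use for the full list.
const SUPPORTED_LANGS = new Set(['hr', 'bs', 'de', 'en', 'es', 'fr', 'it']);

function isAllowedDuration(model, seconds) {
  // 'kling-v2-6' is a hypothetical model string used only for illustration.
  if (model === 'kling-v2-6') return seconds === 5 || seconds === 10;
  if (model === 'kling-v3') return Number.isInteger(seconds) && seconds >= 3 && seconds <= 15;
  return false; // unknown model: reject rather than guess
}

// Returns human-readable problems; an empty array means "preflight: OK".
// plan (hypothetical shape): { model, language, scenes: [{ duration }] }
function preflight(plan, env = process.env, { minScenes = 3, maxScenes = 12 } = {}) {
  const problems = [];

  const missing = REQUIRED_ENV.filter((name) => !env[name]);
  if (missing.length > 0) problems.push(`missing env vars: ${missing.join(', ')}`);

  const n = plan.scenes.length;
  if (n < minScenes || n > maxScenes) {
    problems.push(`scene count ${n} outside ${minScenes}-${maxScenes}`);
  }

  plan.scenes.forEach((scene, i) => {
    if (!isAllowedDuration(plan.model, scene.duration)) {
      problems.push(`scene ${i + 1}: ${scene.duration}s not allowed for ${plan.model}`);
    }
  });

  if (!SUPPORTED_LANGS.has(plan.language)) {
    problems.push(`unsupported language: ${plan.language}`);
  }
  return problems;
}
```

If the array is non-empty, print every entry and abort; naming all missing env vars at once saves a second failed run.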

---

## Step-by-Step Workflow

### Step 0 — Identify Client
Look up in `ACME Agency/clients/clients.json` (or 3-step cascade per Paradox CLAUDE.md). Extract:
- `drive_folder_id` (for upload)
- `slack_channel` (for report)
- `market` and language

### Step 1 — Brief Collection

Ask up to 5 questions (skip any the user already answered in their initial message):

1. **Target audience** — who exactly? (e.g. "Selbstständige earning >75K", "Angestellte 30-50 with chronic back pain")
2. **The offer / transformation** — what they get by taking action
3. **Existing script?** — paste it, or skip (default: write one for them)
4. **Hook concept?** — specific visual idea (e.g. "burning money") or skip (default: propose 2-3)
5. **Target duration?** — 15 / 20 / 30 / 45 / 60 seconds (default: 30)

Detect language from `client.market` → Germany = de, Croatia/Bosnia = hr, etc.

### Step 2 — Write the Script (the most important step)

This step has TWO paths — either the user gave you a script, or you write one.

#### Path A: User provided a script

1. Parse it for: hook, problem, mechanism, solution, result, CTA
2. Validate word count vs target duration:
   - German/Croatian: `seconds × 2.4` words max
   - English: `seconds × 2.6` words max
3. If too long, trim — cut filler, not core message
4. Split into N scenes (see "Scene Count by Duration" table below)
5. **Skip to Step 2.5 (Approval Gate)**
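The word-count validation in step 2 of Path A can be sketched as follows. It is a minimal sketch, assuming a "word" is any whitespace-separated token (so "77.000" counts as one) and using the words-per-second rates stated above; the function names are illustrative, not part of the existing scripts.

```javascript
// Words-per-second rates from the Path A rule above.
const WORDS_PER_SECOND = { de: 2.4, hr: 2.4, en: 2.6 };

function maxWords(seconds, lang) {
  const rate = WORDS_PER_SECOND[lang];
  if (rate === undefined) throw new Error(`no words-per-second rate for "${lang}"`);
  return Math.round(seconds * rate); // e.g. 30s German -> 72 words
}

// Assumption: whitespace-separated tokens approximate spoken words.
function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// overBy > 0 means the script must be trimmed (cut filler, not the core message).
function checkScriptLength(script, seconds, lang) {
  const words = countWords(script);
  const limit = maxWords(seconds, lang);
  return { words, limit, overBy: Math.max(0, words - limit) };
}
```

Rounding (rather than flooring) keeps the limits identical to the word budgets in the Scene Count by Duration table.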

#### Path B: No script provided — WRITE ONE using direct-response principles

**FIRST: Read `.claude/skills/copywrite/PRINCIPLES.md`** to load the 12 direct-response copywriting principles. This is non-negotiable — that file has the frameworks you need.

Then apply this **6-beat video ad framework** (maps directly to copywriting principles):

```
BEAT 1 — HOOK (Pattern Interrupt — Principle 2 hook type #4)
  → 3-7 word VO question that names the audience by their identity
  → Must stop the scroll in first 3 seconds
  → Examples: "Selbstständig in Deutschland?", "Verdienst du über 77.000 brutto?"

BEAT 2 — PROBLEM (specificity + emotion — Principles 3 + 4)
  → 8-12 word VO with concrete number, not vague pain
  → Fear of loss > desire for gain
  → Example: "Die GKV kostet dich jeden Monat über 900 Euro."

BEAT 3 — MECHANISM REVEAL (Principle 5 — your unique explanation)
  → 8-12 word VO that names WHY it works / what most people don't know
  → This is the differentiator — gives the viewer a reason to believe
  → Example: "Was die meisten nicht wissen: Dein Arbeitgeber zahlt die Hälfte mit."

BEAT 4 — PROOF / SOLUTION (specificity — Principle 4)
  → 8-12 word VO with the product/service in action + a number
  → Example: "ACME Agency prüft in 60 Sekunden, ob du sparen kannst."

BEAT 5 — RESULT (identity payoff — Principle 3)
  → 5-10 word VO showing the new state, the transformation
  → Example: "Bis zu 400 Euro weniger. Jeden Monat."

BEAT 6 — CTA (risk removal — Principle 8)
  → 5-8 word VO with the action + risk removal
  → Example: "Jetzt kostenlos prüfen — dauert 60 Sekunden."
```

**Total VO word count should land at or just under:** `<id> × 2.4` (de/hr) or `× 2.6` (en).
- 15s ad → ~36 words / ~9 words per beat (compressed to 4 beats)
- 30s ad → ~72 words / ~12 words per beat
- 60s ad → ~144 words / ~24 words per beat (use longer beats or add scenes)

**For longer ads (45s+):** Add an extra PROOF beat or a TESTIMONIAL beat between MECHANISM and RESULT.

**For shorter ads (15-20s):** Compress to 4 beats: HOOK + PROBLEM + SOLUTION + CTA.

### Anti-AI sweep (Principle 12) — REQUIRED before approval gate

Before showing the script to the user, scan it for and remove:
- Em-dashes mid-sentence (—) — replace with comma or period
- "It's not X, it's Y" patterns
- Emoji clusters
- Formulaic openings ("In a world where...", "Imagine...", "Discover...")
- Corporate filler in any language ("leverage", "unlock", or Croatian "iskoristite prednosti", roughly "take advantage of the benefits")
- Two-word sentence fragments used as paragraphs
- Anything that sounds translated rather than natively written

Read it aloud. If it sounds like an AI wrote it, rewrite it.
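A first pass of the sweep can be automated with a few regular expressions that flag candidates for rewriting. This is a sketch, not an exhaustive detector: the patterns are illustrative assumptions covering the mechanical tells in the list above, while "sounds translated" and the read-aloud test stay manual.

```javascript
// Illustrative patterns for the mechanical tells; extend per client language.
const AI_TELLS = [
  { rule: 'em-dash mid-sentence', re: /\S\s*\u2014\s*\S/u },
  { rule: "it's not X, it's Y", re: /\bit['\u2019]?s not\b[^.!?]*,\s*it['\u2019]?s\b/iu },
  // Two or more pictographic emoji in a row (optionally joined by a variation selector).
  { rule: 'emoji cluster', re: /\p{Extended_Pictographic}(?:\uFE0F?\p{Extended_Pictographic})+/u },
  { rule: 'formulaic opening', re: /^\s*(?:in a world where|imagine|discover)\b/imu },
  { rule: 'corporate filler', re: /\b(?:leverage|unlock)\b|iskoristite prednosti/iu },
];

// Returns the names of the rules the script trips; an empty array means no flags.
function antiAiSweep(script) {
  return AI_TELLS.filter(({ re }) => re.test(script)).map(({ rule }) => rule);
}
```

None of the regexes use the `g` flag, so repeated `test()` calls do not carry `lastIndex` state between scripts.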

### Scene Count by Duration

| Target Duration | Scenes | Per-Scene Duration | Word Budget (de/hr) |
|-----------------|--------|--------------------|----------------------|
| 15s             | 3      | 5s each            | ~36 words            |
| 20s             | 4      | 5s each            | ~48 words            |
| 25s             | 5      | 5s each            | ~60 words            |
| **30s (default)** | **6**  | **5s each**        | **~72 words**        |
| 45s             | 9      | 5s each            | ~108 words           |
| 60s             | 12     | 5s each            | ~144 words           |

Kling 3.0 image-to-video produces 5s clips reliably. Keep all scenes uniform at 5s.
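Because every clip is a uniform 5s, the table above reduces to simple arithmetic: scenes = duration / 5, and the de/hr word budget = duration × 2.4. A minimal sketch of that lookup (the function name is illustrative) that rejects durations the table does not list:

```javascript
const TABLE_DURATIONS = [15, 20, 25, 30, 45, 60];
const CLIP_SECONDS = 5; // uniform Kling 3.0 clip length

function planScenes(targetSeconds) {
  if (!TABLE_DURATIONS.includes(targetSeconds)) {
    throw new Error(`${targetSeconds}s is not in the table (${TABLE_DURATIONS.join(', ')})`);
  }
  return {
    scenes: targetSeconds / CLIP_SECONDS,
    perSceneSeconds: CLIP_SECONDS,
    wordBudgetDeHr: Math.round(targetSeconds * 2.4), // the table's de/hr column
  };
}
```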

### Step 2.5 — Script Approval Gate (REQUIRED)

After writing the script (Path A or B), ALWAYS show the user this breakdown BEFORE generating any images:

```
═══════════════════════════════════════════════════════════════
SCRIPT — [Client Name] | [Audience] | [Duration]s | [N] scenes
═══════════════════════════════════════════════════════════════

Scene | Visual Hook                  | Voiceover ([Lang])
------|------------------------------|--------------------------------
  1   | Burning euro bills           | "Selbstständig in Deutschland?"
  2   | Frustrated at desk           | "Die GKV kostet dich..."
  3   | Phone shows comparison       | "ACME Agency prüft in 60 Sek..."
  4   | Walking into clinic          | "Sofort zum Facharzt..."
  5   | Relaxed, smiling             | "Bis zu 400 Euro weniger..."
  6   | Branded CTA card             | "Jetzt kostenlos prüfen."

Voice: Chris Norddeutscher (German)
VO word count: 68 / 72 max (within target)
═══════════════════════════════════════════════════════════════
```

Then ask:
> "Approve this script and proceed to image generation? Or any scene you want to revise?"

Wait for explicit approval. Do not spend Krea credits without it.
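Rendering the breakdown from structured scene data keeps the gate consistent between runs and computes the word count instead of estimating it. A sketch, assuming a hypothetical script object shape; the column widths mirror the template above.

```javascript
// s (hypothetical shape): { client, audience, durationSeconds, language, voiceName,
//                           maxWords, scenes: [{ visual, vo }] }
function renderApprovalGate(s) {
  const rule = '═'.repeat(63);
  const countWords = (t) => t.trim().split(/\s+/).filter(Boolean).length;
  const words = s.scenes.reduce((sum, sc) => sum + countWords(sc.vo), 0);
  // Scene column is 6 chars wide, visual column 30 (1 space + 28 + 1 space).
  const rows = s.scenes.map((sc, i) =>
    `${String(i + 1).padStart(3).padEnd(6)}| ${sc.visual.padEnd(28)} | "${sc.vo}"`);
  const status = words <= s.maxWords ? 'within target' : 'OVER target, trim before approval';
  return [
    rule,
    `SCRIPT \u2014 ${s.client} | ${s.audience} | ${s.durationSeconds}s | ${s.scenes.length} scenes`,
    rule,
    '',
    `Scene | ${'Visual Hook'.padEnd(28)} | Voiceover (${s.language})`,
    `------|${'-'.repeat(30)}|${'-'.repeat(32)}`,
    ...rows,
    '',
    `Voice: ${s.voiceName}`,
    `VO word count: ${words} / ${s.maxWords} max (${status})`,
    rule,
  ].join('\n');
}
```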

### Step 3 — Generate Character Reference Image (if needed)

If the ad features a recurring person, generate ONE strong NB2 portrait first:

```javascript
import { <id> } from './ACME Agency/scripts/lib/krea.mjs';

const result = await <id>({
  prompt: 'Portrait of a [age] [nationality] [gender], [hair], [outfit], [setting], confident eye contact, photorealistic, editorial portrait, 1:1',
  aspectRatio: '1:1',
  batchSize: 2,  // generate 2 to pick the best
  resolution: '2K',
});
// Pick the best one — show user, get confirmation
```

Save the chosen CDN URL as the character reference and pass it in `imageUrls` on every later NB2 call for scenes featuring that character.

### Step 4 — Generate 6 Scene Images via NB2

For EACH scene, write a precision NB2 prompt. Use these templates:

**Hook scene (action close-up):**
```
Extreme close-up macro shot of [DRAMATIC OBJECT/ACTION]. [SPECIFIC DETAILS]. 
Held in [hand/context]. Dark moody background, dramatic low-key lighting. 
Cinematic, intense, attention-grabbing. Shallow depth of field. 9:16 vertical
```

**Character scene (use character reference):**
```
The person in the reference image [SPECIFIC ACTION/EMOTION], [POSITION/POSE], 
[ENVIRONMENT DETAILS]. [LIGHTING]. [MOOD]. Photorealistic, 9:16 vertical
→ Pass character ref via imageUrls
```

**Product/UI scene (no character):**
```
[Subject — phone/equipment/product] showing [SPECIFIC UI/DETAIL]. 
[BRAND COLORS]. [LIGHTING]. Shallow depth of field. Photorealistic, 9:16 vertical
```

**CTA card (no character):**
```
Bold [brand color] solid background. Large [accent color] text reading 
[CTA TEXT] centered. Clean geometric heavy sans-serif. [Brand logo] below. 
Minimalist premium. 9:16 vertical
```

Generate all scene images sequentially via `<id>()` (6 for the default 30s ad). Save locally to `ACME Agency/clients/<Client>/video-ads/<campaign>/reference-images/`.
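Filling the four templates from structured scene data makes it hard to forget the character reference on a character scene. A sketch under assumptions: the `scene` field names and the `{ prompt, imageUrls }` return shape are hypothetical, and `imageUrls` is assumed to take an array of reference-image URLs.

```javascript
function buildScenePrompt(scene, characterRefUrl) {
  switch (scene.type) {
    case 'hook':
      return {
        prompt: `Extreme close-up macro shot of ${scene.subject}. ${scene.details}. ` +
          `Held in ${scene.context}. Dark moody background, dramatic low-key lighting. ` +
          'Cinematic, intense, attention-grabbing. Shallow depth of field. 9:16 vertical',
        imageUrls: [],
      };
    case 'character':
      // Without the reference image the face drifts between scenes, so fail loudly.
      if (!characterRefUrl) throw new Error(`scene "${scene.name}" needs the character reference`);
      return {
        prompt: `The person in the reference image ${scene.action}, ${scene.pose}, ` +
          `${scene.environment}. ${scene.lighting}. ${scene.mood}. Photorealistic, 9:16 vertical`,
        imageUrls: [characterRefUrl],
      };
    case 'product':
      return {
        prompt: `${scene.subject} showing ${scene.detail}. ${scene.brandColors}. ` +
          `${scene.lighting}. Shallow depth of field. Photorealistic, 9:16 vertical`,
        imageUrls: [],
      };
    case 'cta':
      return {
        prompt: `Bold ${scene.brandColor} solid background. Large ${scene.accentColor} text reading ` +
          `"${scene.ctaText}" centered. Clean geometric heavy sans-serif. ${scene.logo} below. ` +
          'Minimalist premium. 9:16 vertical',
        imageUrls: [],
      };
    default:
      throw new Error(`unknown scene type "${scene.type}"`);
  }
}
```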

### Step 5 — Upload Images to catbox.moe (CRITICAL)

Krea CDN URLs are unreliable for Kling's image fetcher. Always upload NB2 images to catbox first:

```javascript
import { uploadToCatbox } from './ACME Agency/scripts/lib/kling.mjs';

// images: { [sceneName]: localPngPath } for every NB2 image saved in Step 4
const catboxUrls = {};
for (const [name, localPath] of Object.entries(images)) {
  catboxUrls[name] = await uploadToCatbox(localPath, 'image/png');
}
// Save catbox URLs to a JSON file in the reference-images folder for re-use
```
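One way to implement the "save for re-use" comment is a small JSON cache next to the images, so a re-run after a failed Kling scene skips images that are already hosted. A sketch, assuming a cache file name of `catbox-urls.json` (hypothetical) and taking the uploader as a parameter so the real `uploadToCatbox` can be passed in. If an image is regenerated, delete its entry from the cache so the new file gets uploaded.

```javascript
import { readFile, writeFile } from 'node:fs/promises';
import { join } from 'node:path';

async function uploadWithCache(images, refDir, upload) {
  const cachePath = join(refDir, 'catbox-urls.json'); // hypothetical file name
  let cache = {};
  try {
    cache = JSON.parse(await readFile(cachePath, 'utf8'));
  } catch (err) {
    if (err.code !== 'ENOENT') throw err; // a missing cache is fine; a corrupt one is not
  }
  for (const [name, localPath] of Object.entries(images)) {
    if (!cache[name]) cache[name] = await upload(localPath, 'image/png');
  }
  await writeFile(cachePath, JSON.stringify(cache, null, 2));
  return cache;
}
```

Usage in Step 5 would look like `const catboxUrls = await uploadWithCache(images, refDir, uploadToCatbox);`, where `refDir` is the campaign's `reference-images/` folder.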

### Step 6 — Animate Each Scene with Kling 3.0 Direct

For each scene, call `<id>()` from `ACME Agency/scripts/lib/kling.mjs`:

```javascript
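import { copyFile, mkdir } from 'node:fs/promises';
import { <id> } from './ACME Agency/scripts/lib/kling.mjs';

// scenes: approved script scenes ({ name, animationPrompt }); catboxUrls: from Step 5
await mkdir('clips', { recursive: true });  // copyFile fails if clips/ does not exist
for (const scene of scenes) {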
  const videoPath = await <id>({
    prompt: scene.animationPrompt,  // describes MOTION, not the static frame
    model: 'kling-v3',
    aspectRatio: '9:16',
    duration: 5,
    startImage: catboxUrls[scene.name],
  });
  await copyFile(videoPath, `clips/${scene.name}.mp4`);
}
```

**Animation prompt rules:**
- Describe MOTION, not the scene (the scene is already in the start image)
- Examples: "The flames intensify and crawl up the bills", "The man leans forward with sudden interest", "Slow cinematic dolly through the equipment"
- Do NOT re-describe what's in the image — Kling already sees it
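The loop above stops at the first Kling failure, which loses the scenes that would have succeeded. A variant that records failures and keeps going makes it easy to report exactly which scenes are missing. A sketch, assuming the Kling function (whose name is elided above) is passed in as `animate`, resolves to a local MP4 path, and throws on failure; copying into `clips/` stays as in the loop above.

```javascript
async function animateScenes(scenes, catboxUrls, animate) {
  const rendered = [];
  const failed = [];
  for (const [index, scene] of scenes.entries()) {
    try {
      const startImage = catboxUrls[scene.name];
      if (!startImage) throw new Error('no catbox URL for this scene');
      const videoPath = await animate({
        prompt: scene.animationPrompt, // motion only, not the static frame
        model: 'kling-v3',
        aspectRatio: '9:16',
        duration: 5,
        startImage,
      });
      rendered.push({ scene: index + 1, name: scene.name, videoPath });
    } catch (err) {
      // Keep going: one failed scene should not discard the rest of the batch.
      failed.push({ scene: index + 1, name: scene.name, error: String(err?.message ?? err) });
    }
  }
  return { rendered, failed };
}
```

The 1-based `scene` numbers in `failed` are what the final report should list.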

### Step 7 — Generate ElevenLabs Voice (auto-selected by language)

**Voice selection priority** (highest first):
1. `--voice <id|name>` CLI flag (user override)
2. `client.elevenlabs_voice_id` from clients.json
3. Auto-selection by client language

**Voice by language table** (verified IDs from current ElevenLabs account):

| Language | Voice ID | Name | Notes |
|----------|----------|------|-------|
| `de` (German) | `j46AY0iVY3oHcnZbgEJg` | Chris Norddeutscher | North German pro, authoritative |
| `de` (alt) | `TUKJhQmz3RPYBNAgC5A1` | Clark Clear | German pro, alternative |
| `de` (alt) | `DtAQqD4yK3kXSVPx7wFc` | Pascal R | German narrator/storyteller |
| `hr` (Croatian) | `ZLYZToA7aDsMbHwM9AOr` | Luka | Croatian male, calm |
| `hr` (alt) | `FXFcxnjikw0naYO1PPrU` | Adnan | Croatian male, casual |
| `en` (English) | `JBFqnCBsd6RMkjVDRZzb` | George | British storyteller, premade (free) |
| `en` (alt) | `EXAVITQu4vr4xnSDxMaL` | Sarah | American female, mature, premade |

Note: German + Croatian voices are "professional" and require a paid ElevenLabs plan. English premade voices work on free plan.

```javascript
import { <id>, generateSpeech } from './ACME Agency/scripts/lib/elevenlabs.mjs';
import { join } from 'node:path';

// Auto-select voice from language (or override)
const VOICE_BY_LANG = {
  de: 'j46AY0iVY3oHcnZbgEJg',  // Chris Norddeutscher
  hr: 'ZLYZToA7aDsMbHwM9AOr',  // Luka
  en: 'JBFqnCBsd6RMkjVDRZzb',  // George
};

// Detect language from client.market (Croatia and Bosnia both map to hr)
const lang = client.market === 'Germany' ? 'de'
           : client.market === 'Croatia' || client.market === 'Bosnia' ? 'hr'
           : 'en';

// Covers priorities 2 and 3; a --voice CLI override (priority 1) takes precedence over both
const voiceId = client.elevenlabs_voice_id || VOICE_BY_LANG[lang];

// Conversational settings (works for all 3 languages)
const voiceSettings = { stability: 0.4, similarityBoost: 0.85, style: 0.1 };

await <id>({ scenes, voiceId, outputDir, voiceSettings });

// Also generate the full track, next to the per-scene MP3s
const fullScript = scenes.map(s => s.text).join(' ');
await generateSpeech({ text: fullScript, voiceId, destPath: join(outputDir, 'voiceover-full.mp3'), ...voiceSettings });
```
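The snippet above covers only priorities 2 and 3. A resolver for the full priority order, including a `--voice` value that may be either an ID or a name from the table, could look like this. It is a sketch: `cliVoice` is a hypothetical, already-parsed flag value, and the voice list simply mirrors the table above.

```javascript
const VOICES = [
  { id: 'j46AY0iVY3oHcnZbgEJg', name: 'Chris Norddeutscher', lang: 'de', isDefault: true },
  { id: 'TUKJhQmz3RPYBNAgC5A1', name: 'Clark Clear', lang: 'de' },
  { id: 'DtAQqD4yK3kXSVPx7wFc', name: 'Pascal R', lang: 'de' },
  { id: 'ZLYZToA7aDsMbHwM9AOr', name: 'Luka', lang: 'hr', isDefault: true },
  { id: 'FXFcxnjikw0naYO1PPrU', name: 'Adnan', lang: 'hr' },
  { id: 'JBFqnCBsd6RMkjVDRZzb', name: 'George', lang: 'en', isDefault: true },
  { id: 'EXAVITQu4vr4xnSDxMaL', name: 'Sarah', lang: 'en' },
];

// Priority: 1) --voice flag (name or ID), 2) client.elevenlabs_voice_id, 3) language default.
function resolveVoiceId({ cliVoice, client, lang }) {
  if (cliVoice) {
    const byName = VOICES.find((v) => v.name.toLowerCase() === cliVoice.toLowerCase());
    return byName ? byName.id : cliVoice; // not a known name: treat it as a raw voice ID
  }
  if (client?.elevenlabs_voice_id) return client.elevenlabs_voice_id;
  const fallback = VOICES.find((v) => v.lang === lang && v.isDefault);
  if (!fallback) throw new Error(`no default voice for language "${lang}"`);
  return fallback.id;
}
```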

### Step 8 — Upload Everything to Drive

Folder structure: `Klijenti/<Client>/Video Ads/<Year>/<Month>/<campaign>/`

Upload: clips, voiceover MP3s, reference-image PNGs, and `capcut-guide.md`.
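The folder path can be built from the run date so every campaign lands in the right year and month. A sketch, assuming `<Year>` is four digits and `<Month>` is a zero-padded month number such as `04`; adjust if the existing Drive folders use month names. Creating the folders is left to `google_drive.mjs`, whose API is not shown here.

```javascript
function driveFolderSegments(clientName, campaign, date = new Date()) {
  const year = String(date.getFullYear());
  const month = String(date.getMonth() + 1).padStart(2, '0'); // assumed "04"-style months
  return ['Klijenti', clientName, 'Video Ads', year, month, campaign];
}
```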

### Step 9 — Post Slack Report

**This is the ONLY Slack message for the entire execution.** Do NOT post scene-by-scene status, script breakdowns, retry notifications, or any intermediate updates to Slack during Steps 1-8. All progress goes to stdout (console.log) only. The team reads this one final report, not a play-by-play.

Incident reference: 2026-04-10 ACME Agency — the subprocess posted 8+ separate messages to the client channel during execution. Do not repeat.

Use the standard format:
- Campaign name + duration + scene count
- Brief script overview
- Drive folder link
- "Next step: import to CapCut, overlay VO, add music + SFX, export"
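Assembling the whole report in one function makes the single-message rule easy to follow and puts any failed scenes in the same post. A sketch with hypothetical field names; the actual post goes through `slack.mjs`, whose API is not shown here.

```javascript
function buildSlackReport({ campaign, durationSeconds, scenesPlanned, scenesRendered,
                            failedScenes = [], scriptSummary, driveUrl }) {
  const lines = [
    `*${campaign}* | ${durationSeconds}s | ${scenesRendered}/${scenesPlanned} scenes rendered`,
    scriptSummary,
    `Drive: ${driveUrl}`,
  ];
  // Never hide partial batches: name the failed scene numbers explicitly.
  if (failedScenes.length > 0) lines.push(`Failed scenes: ${failedScenes.join(', ')}`);
  lines.push('Next step: import to CapCut, overlay VO, add music + SFX, export');
  return lines.join('\n');
}
```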

## Verification (run AFTER Step 9 — confirm all assets actually shipped)

Cinematic ads have many moving parts that can silently fail. Check ALL of these before declaring done:

- [ ] Each scene has a `clips/scene-NN-*.mp4` file locally AND in Drive (count = scene count from script, no missing scenes)
- [ ] Each MP4 file size > 100 KB (Kling sometimes returns broken stubs)
- [ ] Voiceover MP3 exists locally AND in Drive AND duration ≈ target (within ±20%)
- [ ] Reference-image PNGs uploaded if generated (so the user can see the source frames)
- [ ] `capcut-guide.md` written and uploaded
- [ ] `manifest.json` records: scene count, durations, prompts used, Kling task IDs, VO voice ID, language, Drive URLs
- [ ] Slack post via slack-reporter returned `ts` non-null
- [ ] No silent Krea→Kling fallback unless explicitly logged (the fallback is fine, but it must be reported, not hidden)
- [ ] Kling tasks all reported `succeed` status (not `failed` or `unknown` swallowed)

If any scene failed mid-batch, list the failed scene number(s) explicitly in the report. Never claim a 6-scene success when only 5 actually rendered.
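The local half of this checklist (one clip per scene, each above the size floor) can be scripted. A sketch, assuming clips follow the `scene-NN-<slug>.mp4` naming with `NN` starting at `01`, and reading "100 KB" as 100 × 1024 bytes; the Drive, Slack, and VO-duration checks need the real APIs and are not covered.

```javascript
import { readdir, stat } from 'node:fs/promises';
import { join } from 'node:path';

const MIN_CLIP_BYTES = 100 * 1024; // assumed reading of "100 KB"

async function verifyClips(clipsDir, sceneCount) {
  const files = await readdir(clipsDir);
  const problems = [];
  for (let n = 1; n <= sceneCount; n++) {
    const prefix = `scene-${String(n).padStart(2, '0')}-`;
    const file = files.find((f) => f.startsWith(prefix) && f.endsWith('.mp4'));
    if (!file) {
      problems.push(`scene ${n}: missing clip`);
      continue;
    }
    const { size } = await stat(join(clipsDir, file));
    // Kling sometimes returns broken stubs; tiny files are treated as failures.
    if (size <= MIN_CLIP_BYTES) problems.push(`scene ${n}: ${file} is only ${size} bytes`);
  }
  return problems;
}
```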

---

## CapCut Assembly Guide (delivered to user)

Include this in the Drive folder as `capcut-guide.md`:

```
1. Import all clips from the clips/ folder in scene order
2. Drop voiceover-full.mp3 on audio track 2
3. Adjust clip timing if VO doesn't perfectly align
4. Audio track 3: search CapCut music library for [mood] background music, set to -18dB
5. Audio track 4: SFX from CapCut library:
   - Scene 1: [relevant SFX — fire, paper rustling, etc.]
   - Scene N: [...]
6. Captions → Auto Captions → German → ACME Agencyw
7. Export 1080×1920, MP4, H.264
```

---

## Critical Files

- `ACME Agency/scripts/lib/krea.mjs` — Krea.ai NB2 image generation
- `ACME Agency/scripts/lib/kling.mjs` — Kling 3.0 direct API + catbox upload
- `ACME Agency/scripts/lib/elevenlabs.mjs` — TTS voiceover
- `ACME Agency/scripts/lib/google_drive.mjs` — Drive upload
- `ACME Agency/scripts/lib/slack.mjs` — Slack reporting
- `shared/kling_camera_toolkit.md` — 30 cinematic camera movements vocabulary

## Reference Examples (proven campaigns)

- `ACME Agency/clients/ACME Agency/video-ads/<id>/` — burning money hook (Selbstständige)
- `ACME Agency/clients/ACME Agency/video-ads/<id>/` — throwing papers hook (Angestellte 77K+)

Both campaigns: 6 scenes, 30s, character consistency, German VO, ready for CapCut.

---

## Hard Constraints

| Rule | Why |
|------|-----|
| **Always use Kling 3.0 direct (`kling-v3`) for video** | Veo 3.1 via Krea doesn't support image-to-video. Kling 2.6 via Krea is unstable. |
| **Always upload NB2 images to catbox before Kling** | Krea CDN URLs fail unpredictably for Kling's fetcher |
| **Never use `--mode multishot` for action ads** | Multi-shot can't render dramatic single-frame hooks (burning money etc.) |
| **Never use the lip sync endpoint** | Adds black artifacts in the mouth area |
| **Never describe characters as "talking to camera"** | No audio API = looks fake. Always action-based shots. |
| **Auto-select voice from client.market (de/hr/en)** | See "Voice by language table" — overridable with `--voice` flag |
| **Always show script approval gate before image generation** | Krea credits cost money. 30-second confirmation saves rework |
| **Max 12 scenes / 60s total** | Beyond this, viewers drop off. 30s is the sweet spot for Meta. |

---

## Why This Skill Exists

Built after testing every alternative:

| Approach | Problem |
|----------|---------|
| HeyGen avatar | Black-box, no creative control, character can't do dramatic actions |
| Kling multi-shot | Can't render dramatic single-frame visuals like burning money |
| Kling direct UGC + lip sync | Lip sync adds artifacts (black spots in mouth) |
| Veo 3.1 via Krea | Doesn't support image-to-video — text-only |
| Per-scene NB2 → Kling 3.0 + ElevenLabs VO | **WORKS** — what this skill does |

This skill is the answer to the question: "How do I generate scroll-stopping cinematic Facebook ads with hooks that actually work?"