[ CREATIVE ]
/cinematic-ad-generator
Generates **cinematic action-based video ads** with strong hooks, pattern interrupts, and character consistency.
ACME Agency, <id> and you@example.com mark per-agency values: your install fills them with YOUR clients and accounts. If a section references a helper script you don't have yet, it ships with that workflow's install.
Skill: /<id>
Overview
Generates cinematic action-based video ads with strong hooks, pattern interrupts, and character consistency. Characters NEVER talk to camera — they perform actions, react, and the ElevenLabs voiceover narrates over the visuals.
This is the proven flow for generating scroll-stopping Facebook/Meta video ads. It was built after extensive testing of HeyGen, multi-shot, lip sync, and Veo, none of which produced results as reliable as this per-scene cinematic approach.
What this skill produces:
- 5-6 cinematic video clips (5s each) generated via Kling 3.0 image-to-video
- Each clip starts from a precision-crafted Nano Banana 2 image
- ElevenLabs voiceover in the client's language, German by default (per-scene MP3s + full track)
- Character consistency across scenes via NB2 reference image
- All files in Drive ready for CapCut assembly (5 min editing)
What this skill does NOT do:
- Talking heads / lip sync (intentionally — looks fake without audio API)
- Multi-shot mode (loses control over dramatic visuals)
- Final video stitching (CapCut step gives you full creative control)
The Golden Rules
- NO talking-to-camera shots. Characters do actions: throw papers, burn money, walk, gesture, react. The voiceover is the narrator, the visuals are the metaphor.
- Hook in the first 3 seconds. Pattern interrupt — something visually unexpected. Burning money. Throwing papers. Slamming a laptop. Action that stops the scroll.
- Character consistency via NB2 reference. Generate one strong character portrait first, then reuse it as the `imageUrls` reference for all scenes featuring that character.
- Per-scene NB2 → Kling. Each scene starts from its OWN precision-crafted NB2 image. This is the only way to get dramatic visuals (burning money, etc.) — multi-shot won't follow text-only prompts for action shots.
- Catbox for image URLs. Always upload NB2 images to catbox.moe before passing to Kling — Krea CDN URLs are unreliable for Kling's image fetcher.
- ElevenLabs as separate track. Kling direct API has no audio. ElevenLabs generates the German VO. CapCut overlays both.
Preflight (run BEFORE any expensive Krea/Kling/ElevenLabs call)
A cinematic-ad batch is the most expensive video pipeline in this workspace (6 NB2 images + 6 Kling 3.0 videos + ElevenLabs VO). Validate everything BEFORE spending credits.
- Client exists in `clients.json` (3-step cascade). Resolve the canonical key.
- Required env vars `KREA_API_KEY`, `KLING_ACCESS_KEY`, `KLING_SECRET_KEY`, `ELEVENLABS_API_KEY` are all set in `.env`. If any are missing → abort and name the missing ones.
- Catbox.moe is reachable (quick GET request). If it is down → abort with "catbox.moe upload host is down, retry later".
- `drive_folder_id` is present and reachable via `listFiles()`.
- Slack channel resolves if reporting is enabled.
- Kling duration cap: scene durations must be `5` or `10` for Kling 2.6, OR `3`-`15` for Kling 3.0. Reject anything else.
- Scene count is sane: 4-8 scenes. More than 8 burns through credits with no quality return.
- Language is supported by ElevenLabs for `<id>` (HR, BS, DE, EN, ES, FR, IT, etc.). If unsupported → abort.
- Disk space ≥ 1 GB free in the client folder (videos are big).
- CLIENT.md exists (or proceed with brief-only and explicitly note the gap).
If all checks pass, log "preflight: OK (n scenes × {duration}s each, est cost: ~Y compute units + ~Z s of VO)" and proceed.
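The checks that need no network or disk access can be sketched as one pure function. This is a sketch under stated assumptions, not the skill's actual preflight:

- `env` is a plain object such as `process.env`.
- Each scene is a `{ name, duration }` object.
- `'kling-v2-6'` is a hypothetical model ID, since the source only names `kling-v3`.

The catbox, Drive, Slack, disk-space and CLIENT.md checks need I/O and are left out.

```javascript
// Preflight sketch: only the checks that need no network or disk access.
// Assumptions: `env` is a plain object such as process.env, each scene is
// { name, duration }, and 'kling-v2-6' is a HYPOTHETICAL model ID for Kling 2.6.
const REQUIRED_ENV = ['KREA_API_KEY', 'KLING_ACCESS_KEY', 'KLING_SECRET_KEY', 'ELEVENLABS_API_KEY'];

function isValidDuration(model, seconds) {
  if (model === 'kling-v2-6') return seconds === 5 || seconds === 10;
  if (model === 'kling-v3') return Number.isInteger(seconds) && seconds >= 3 && seconds <= 15;
  return false; // unknown model: reject rather than guess
}

function preflight({ env, scenes, model, minScenes = 4, maxScenes = 8 }) {
  const errors = [];
  const missing = REQUIRED_ENV.filter((key) => !env[key]);
  if (missing.length > 0) errors.push(`missing env vars: ${missing.join(', ')}`);
  if (scenes.length < minScenes || scenes.length > maxScenes) {
    errors.push(`scene count ${scenes.length} outside ${minScenes}-${maxScenes}`);
  }
  for (const scene of scenes) {
    if (!isValidDuration(model, scene.duration)) {
      errors.push(`${scene.name}: ${scene.duration}s not allowed for ${model}`);
    }
  }
  return { ok: errors.length === 0, errors };
}
```

Call it before any paid request, e.g. `preflight({ env: process.env, scenes, model: 'kling-v3' })`, and abort with the listed `errors` when `ok` is false. The scene bounds default to the 4-8 range above. Pass `maxScenes` explicitly if you follow the 45s/60s rows of the scene-count table.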
Step-by-Step Workflow
Step 0 — Identify Client
Look up the client in ACME Agency/clients/clients.json (or use the 3-step cascade per Paradox CLAUDE.md). Extract:
- `drive_folder_id` (for upload)
- `slack_channel` (for the report)
- `market` and language
Step 1 — Brief Collection
Ask up to 5 questions (skip any the user already answered in their initial message):
- Target audience — who exactly? (e.g. "Selbstständige earning >75K", "Angestellte 30-50 with chronic back pain")
- The offer / transformation — what they get from acting
- Existing script? — paste it, or skip (default: write one for them)
- Hook concept? — specific visual idea (e.g. "burning money") or skip (default: propose 2-3)
- Target duration? — 15 / 20 / 30 / 45 / 60 seconds (default: 30)
Detect language from client.market → Germany = de, Croatia/Bosnia = hr, etc.
Step 2 — Write the Script (the most important step)
This step has TWO paths — either the user gave you a script, or you write one.
Path A: User provided a script
- Parse it for: hook, problem, mechanism, solution, result, CTA
- Validate word count vs target duration:
  - German/Croatian: `seconds × 2.4` words max
  - English: `seconds × 2.6` words max
- If too long, trim — cut filler, not core message
- Split into N scenes (see "Scene Count by Duration" table below)
- Skip to Step 2.5 (Approval Gate)
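The word-count validation above can be sketched as a small helper. This is an illustrative sketch: it counts whitespace-separated tokens, which is an assumption about how "words" are counted, and `checkWordBudget` is a hypothetical name.

```javascript
// Word-budget check for Path A, using the rates above:
// seconds x 2.4 words for German/Croatian, seconds x 2.6 for English.
const WORDS_PER_SECOND = { de: 2.4, hr: 2.4, en: 2.6 };

function countWords(text) {
  // Whitespace-separated tokens; punctuation stays attached to its word.
  return text.trim().split(/\s+/).filter(Boolean).length;
}

function checkWordBudget(script, seconds, lang) {
  const rate = WORDS_PER_SECOND[lang];
  if (rate === undefined) throw new Error(`no word rate for language "${lang}"`);
  // Small epsilon guards against float results like 71.99999999999999.
  const max = Math.floor(seconds * rate + 1e-9);
  const words = countWords(script);
  return { words, max, withinBudget: words <= max };
}
```

For a 30s German ad, `checkWordBudget(script, 30, 'de')` reports a maximum of 72 words. If `withinBudget` is false, trim filler before splitting into scenes.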
Path B: No script provided — WRITE ONE using direct-response principles
FIRST: Read .claude/skills/copywrite/PRINCIPLES.md to load the 12 direct-response copywriting principles. This is non-negotiable — that file has the frameworks you need.
Then apply this 6-beat video ad framework (maps directly to copywriting principles):
BEAT 1 — HOOK (Pattern Interrupt — Principle 2 hook type #4)
→ 3-7 word VO question that names the audience by their identity
→ Must stop the scroll in first 3 seconds
→ Examples: "Selbstständig in Deutschland?", "Verdienst du über 77.000 brutto?"
BEAT 2 — PROBLEM (specificity + emotion — Principles 3 + 4)
→ 8-12 word VO with concrete number, not vague pain
→ Fear of loss > desire for gain
→ Example: "Die GKV kostet dich jeden Monat über 900 Euro."
BEAT 3 — MECHANISM REVEAL (Principle 5 — your unique explanation)
→ 8-12 word VO that names WHY it works / what most people don't know
→ This is the differentiator — gives the viewer a reason to believe
→ Example: "Was die meisten nicht wissen: Dein Arbeitgeber zahlt die Hälfte mit."
BEAT 4 — PROOF / SOLUTION (specificity — Principle 4)
→ 8-12 word VO with the product/service in action + a number
→ Example: "ACME Agency prüft in 60 Sekunden, ob du sparen kannst."
BEAT 5 — RESULT (identity payoff — Principle 3)
→ 5-10 word VO showing the new state, the transformation
→ Example: "Bis zu 400 Euro weniger. Jeden Monat."
BEAT 6 — CTA (risk removal — Principle 8)
→ 5-8 word VO with the action + risk removal
→ Example: "Jetzt kostenlos prüfen — dauert 60 Sekunden."
The total VO word count must not exceed <id> × 2.4 (de/hr) or <id> × 2.6 (en):
- 15s ad → ~36 words / ~6 words per beat
- 30s ad → ~72 words / ~12 words per beat
- 60s ad → ~144 words / ~24 words per beat (use longer beats or add scenes)
For longer ads (45s+): Add an extra PROOF beat or a TESTIMONIAL beat between MECHANISM and RESULT.
For shorter ads (15-20s): Compress to 4 beats: HOOK + PROBLEM + SOLUTION + CTA.
Anti-AI sweep (Principle 12) — REQUIRED before approval gate
Before showing the script to the user, scan it for and remove:
- Em-dashes mid-sentence (—) — replace with comma or period
- "It's not X, it's Y" patterns
- Emoji clusters
- Formulaic openings ("In a world where...", "Imagine...", "Discover...")
- Corporate filler ("iskoristite prednosti", "leverage", "unlock")
- Two-word sentence fragments used as paragraphs
- Anything that sounds translated rather than natively written
Read it aloud. If it sounds like an AI wrote it, rewrite it.
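A regex pass can pre-flag some of the sweep items before the read-aloud check. This is a heuristic sketch with assumed patterns. It covers em-dashes, the "it's not X, it's Y" construction, a few English formulaic openings, the listed filler words and emoji clusters. It will not catch two-word fragments or text that merely sounds translated.

```javascript
// Heuristic pre-check for some anti-AI sweep items. Patterns are assumptions;
// a clean result does NOT replace reading the script aloud.
const CHECKS = [
  { label: 'em-dash', pattern: /\u2014/ },
  { label: '"not X, it\'s Y" pattern', pattern: /\b(it's|it is) not\b[^.]*,\s*(it's|it is)\b/i },
  { label: 'formulaic opening', pattern: /^\s*(in a world where|imagine|discover)\b/im },
  { label: 'corporate filler', pattern: /\b(leverage|unlock)\b|iskoristite prednosti/i },
  { label: 'emoji cluster', pattern: /\p{Extended_Pictographic}{2,}/u },
];

// Returns the labels of every check that matched (empty array = nothing flagged).
function antiAiSweep(script) {
  return CHECKS.filter(({ pattern }) => pattern.test(script)).map(({ label }) => label);
}
```

None of the patterns uses the `g` flag, so `test()` keeps no state between calls.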
Scene Count by Duration
| Target Duration | Scenes | Per-Scene Duration | Word Budget (de/hr) |
|---|---|---|---|
| 15s | 3 | 5s each | ~36 words |
| 20s | 4 | 5s each | ~48 words |
| 25s | 5 | 5s each | ~60 words |
| 30s (default) | 6 | 5s each | ~72 words |
| 45s | 9 | 5s each | ~108 words |
| 60s | 12 | 5s each | ~144 words |
Kling 3.0 image-to-video produces 5s clips reliably. Keep all scenes uniform at 5s.
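Because every clip is a uniform 5s, the table can be derived from the target duration. A minimal sketch, assuming the de/hr rate of 2.4 words per second as the default (`scenePlan` is a hypothetical helper name):

```javascript
// Derives the scene-count table from a target duration: uniform 5s Kling clips,
// so scenes = duration / 5 and the word budget = duration x words-per-second.
const CLIP_SECONDS = 5;

function scenePlan(targetSeconds, wordsPerSecond = 2.4) {
  if (targetSeconds % CLIP_SECONDS !== 0) {
    throw new Error(`${targetSeconds}s is not a multiple of ${CLIP_SECONDS}s clips`);
  }
  const scenes = targetSeconds / CLIP_SECONDS;
  const wordBudget = Math.floor(targetSeconds * wordsPerSecond + 1e-9); // epsilon for float error
  return {
    scenes,
    perSceneSeconds: CLIP_SECONDS,
    wordBudget,
    wordsPerScene: Math.floor(wordBudget / scenes),
  };
}
```

`scenePlan(30)` matches the default row: 6 scenes, a 72-word budget and about 12 words per scene. Pass `2.6` as the second argument for English.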
Step 2.5 — Script Approval Gate (REQUIRED)
After writing the script (Path A or B), ALWAYS show the user this breakdown BEFORE generating any images:
═══════════════════════════════════════════════════════════════
SCRIPT — [Client Name] | [Audience] | [Duration]s | [N] scenes
═══════════════════════════════════════════════════════════════
Scene | Visual Hook | Voiceover ([Lang])
------|------------------------------|--------------------------------
1 | Burning euro bills | "Selbstständig in Deutschland?"
2 | Frustrated at desk | "Die GKV kostet dich..."
3 | Phone shows comparison | "ACME Agency prüft in 60 Sek..."
4 | Walking into clinic | "Sofort zum Facharzt..."
5 | Relaxed, smiling | "Bis zu 400 Euro weniger..."
6 | Branded CTA card | "Jetzt kostenlos prüfen."
Voice: Chris Norddeutscher (German)
VO word count: 68 / 72 max (within target)
═══════════════════════════════════════════════════════════════
Then ask:
"Approve this script and proceed to image generation? Or any scene you want to revise?"
Wait for explicit approval. Do not spend Krea credits without it.
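The breakdown can be rendered from the scene list so the word count is computed rather than typed by hand. This is a sketch that assumes each scene is `{ visual, vo }`. Column widths only approximate the example above, and `renderApprovalGate` is a hypothetical name.

```javascript
// Renders the approval-gate breakdown as plain text for the console.
// Assumed input: scenes = [{ visual, vo }] plus header metadata.
function renderApprovalGate({ client, audience, seconds, lang, voiceName, maxWords, scenes }) {
  const rule = '\u2550'.repeat(63); // box-drawing double rule
  const words = scenes.reduce(
    (sum, s) => sum + s.vo.trim().split(/\s+/).filter(Boolean).length, 0);
  const rows = scenes.map((s, i) =>
    `${String(i + 1).padEnd(6)}| ${s.visual.padEnd(29)}| "${s.vo}"`);
  return [
    rule,
    `SCRIPT \u2014 ${client} | ${audience} | ${seconds}s | ${scenes.length} scenes`,
    rule,
    `Scene | ${'Visual Hook'.padEnd(29)}| Voiceover (${lang})`,
    `------|${'-'.repeat(30)}|${'-'.repeat(32)}`,
    ...rows,
    `Voice: ${voiceName}`,
    `VO word count: ${words} / ${maxWords} max (${words <= maxWords ? 'within target' : 'OVER target'})`,
    rule,
  ].join('\n');
}
```

Print the result with `console.log` and stop until the user approves. Nothing here spends Krea credits.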
Step 3 — Generate Character Reference Image (if needed)
If the ad features a recurring person, generate ONE strong NB2 portrait first:
```javascript
import { <id> } from './ACME Agency/scripts/lib/krea.mjs';

// Generate the recurring character's portrait once; every character scene reuses it.
const result = await <id>({
  prompt: 'Portrait of a [age] [nationality] [gender], [hair], [outfit], [setting], confident eye contact, photorealistic, editorial portrait, 1:1',
  aspectRatio: '1:1',
  batchSize: 2, // generate 2 candidates to pick the best
  resolution: '2K',
});
// Show both candidates to the user and get confirmation before continuing
```
Save the chosen CDN URL and pass it in `imageUrls` as the character reference in later NB2 calls to keep the character consistent.
Step 4 — Generate 6 Scene Images via NB2
For EACH scene, write a precision NB2 prompt. Use these templates:
Hook scene (action close-up):
Extreme close-up macro shot of [DRAMATIC OBJECT/ACTION]. [SPECIFIC DETAILS].
Held in [hand/context]. Dark moody background, dramatic low-key lighting.
Cinematic, intense, attention-grabbing. Shallow depth of field. 9:16 vertical
Character scene (use character reference):
The person in the reference image [SPECIFIC ACTION/EMOTION], [POSITION/POSE],
[ENVIRONMENT DETAILS]. [LIGHTING]. [MOOD]. Photorealistic, 9:16 vertical
→ Pass character ref via imageUrls
Product/UI scene (no character):
[Subject — phone/equipment/product] showing [SPECIFIC UI/DETAIL].
[BRAND COLORS]. [LIGHTING]. Shallow depth of field. Photorealistic, 9:16 vertical
CTA card (no character):
Bold [brand color] solid background. Large [accent color] text reading
[CTA TEXT] centered. Clean geometric heavy sans-serif. [Brand logo] below.
Minimalist premium. 9:16 vertical
Generate all scene images (6 for the default 30s ad) sequentially via <id>(). Save them locally to ACME Agency/clients/<Client>/video-ads/<campaign>/reference-images/.
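Filling the templates from structured fields keeps the prompts consistent across scenes. In this sketch the field names (`object`, `details`, `context`, `action`, ...) are hypothetical stand-ins for the bracketed slots. Only the hook and character templates are shown.

```javascript
// Fills the hook and character templates above and builds the local save path.
// Field names are HYPOTHETICAL placeholders for the bracketed template slots.
function hookPrompt({ object, details, context }) {
  return `Extreme close-up macro shot of ${object}. ${details}. Held in ${context}. ` +
    'Dark moody background, dramatic low-key lighting. Cinematic, intense, attention-grabbing. ' +
    'Shallow depth of field. 9:16 vertical';
}

function characterPrompt({ action, pose, environment, lighting, mood }) {
  // Pass the character reference URL via imageUrls alongside this prompt.
  return `The person in the reference image ${action}, ${pose}, ${environment}. ` +
    `${lighting}. ${mood}. Photorealistic, 9:16 vertical`;
}

function referenceImagePath(client, campaign, sceneName) {
  return `ACME Agency/clients/${client}/video-ads/${campaign}/reference-images/${sceneName}.png`;
}
```

Pass `details`, `lighting` and `mood` without a trailing period, because the template adds one.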
Step 5 — Upload Images to catbox.moe (CRITICAL)
Krea CDN URLs are unreliable for Kling's image fetcher. Always upload NB2 images to catbox first:
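```javascript
import { writeFile } from 'node:fs/promises';
import { uploadToCatbox } from './ACME Agency/scripts/lib/kling.mjs';

// `images` maps scene name -> local PNG path (from Step 4);
// `refDir` is the campaign's reference-images/ folder.
const catboxUrls = {};
for (const [name, localPath] of Object.entries(images)) {
  catboxUrls[name] = await uploadToCatbox(localPath, 'image/png');
}
// Save catbox URLs to a JSON file in the reference-images folder for re-use
await writeFile(`${refDir}/catbox-urls.json`, JSON.stringify(catboxUrls, null, 2));
```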
Step 6 — Animate Each Scene with Kling 3.0 Direct
For each scene, call <id>() from ACME Agency/scripts/lib/kling.mjs:
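```javascript
import { copyFile, mkdir } from 'node:fs/promises';
import { <id> } from './ACME Agency/scripts/lib/kling.mjs';

// `scenes` comes from the approved script, `catboxUrls` from Step 5.
// Name scenes like "scene-01-hook" so clips match the clips/scene-NN-*.mp4 check.
await mkdir('clips', { recursive: true });
for (const scene of scenes) {
  const videoPath = await <id>({
    prompt: scene.animationPrompt, // describes MOTION, not the static frame
    model: 'kling-v3',
    aspectRatio: '9:16',
    duration: 5,
    startImage: catboxUrls[scene.name], // catbox URL, never the Krea CDN URL
  });
  await copyFile(videoPath, `clips/${scene.name}.mp4`);
}
```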
Animation prompt rules:
- Describe MOTION, not the scene (the scene is already in the start image)
- Examples: "The flames intensify and crawl up the bills", "The man leans forward with sudden interest", "Slow cinematic dolly through the equipment"
- Do NOT re-describe what's in the image — Kling already sees it
Step 7 — Generate ElevenLabs Voice (auto-selected by language)
Voice selection priority (highest first):
1. `--voice <id|name>` CLI flag (user override)
2. `client.elevenlabs_voice_id` from `clients.json`
3. Auto-selection by client language
Voice by language table (verified IDs from current ElevenLabs account):
| Language | Voice ID | Name | Notes |
|---|---|---|---|
| de (German) | j46AY0iVY3oHcnZbgEJg | Chris Norddeutscher | North German pro, authoritative |
| de (alt) | TUKJhQmz3RPYBNAgC5A1 | Clark Clear | German pro, alternative |
| de (alt) | DtAQqD4yK3kXSVPx7wFc | Pascal R | German narrator/storyteller |
| hr (Croatian) | ZLYZToA7aDsMbHwM9AOr | Luka | Croatian male, calm |
| hr (alt) | FXFcxnjikw0naYO1PPrU | Adnan | Croatian male, casual |
| en (English) | JBFqnCBsd6RMkjVDRZzb | George | British storyteller, premade (free) |
| en (alt) | EXAVITQu4vr4xnSDxMaL | Sarah | American female, mature, premade |
Note: German + Croatian voices are "professional" and require a paid ElevenLabs plan. English premade voices work on free plan.
```javascript
import path from 'node:path';
import { <id>, generateSpeech } from './ACME Agency/scripts/lib/elevenlabs.mjs';

// Default voice per language (see the table above)
const VOICE_BY_LANG = {
  de: 'j46AY0iVY3oHcnZbgEJg', // Chris Norddeutscher
  hr: 'ZLYZToA7aDsMbHwM9AOr', // Luka
  en: 'JBFqnCBsd6RMkjVDRZzb', // George
};

// Detect language from client.market: Germany -> de, Croatia/Bosnia -> hr, else en
const lang = client.market === 'Germany' ? 'de'
  : client.market === 'Croatia' || client.market === 'Bosnia' ? 'hr'
  : 'en';

// Priority: --voice flag > client.elevenlabs_voice_id > language default.
// `cliVoiceId` is a hypothetical variable from your CLI parsing (undefined if no flag).
const voiceId = cliVoiceId || client.elevenlabs_voice_id || VOICE_BY_LANG[lang];

// Conversational settings (works for all 3 languages)
const voiceSettings = { stability: 0.4, similarityBoost: 0.85, style: 0.1 };
await <id>({ scenes, voiceId, outputDir, voiceSettings });

// Also generate the full track, next to the per-scene MP3s
const fullScript = scenes.map(s => s.text).join(' ');
await generateSpeech({ text: fullScript, voiceId, destPath: path.join(outputDir, 'voiceover-full.mp3'), ...voiceSettings });
```
Step 8 — Upload Everything to Drive
Folder structure: Klijenti/<Client>/Video Ads/<Year>/<Month>/<campaign>/
Upload: clips, voiceover MP3s, reference-image PNGs, `capcut-guide.md`, and `manifest.json`.
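The dated folder path can be computed rather than typed. This sketch assumes `<Month>` is a zero-padded month number such as "04". Adjust it if your Drive uses month names. `driveFolderSegments` is a hypothetical helper, and creating the folders is left to `google_drive.mjs`.

```javascript
// Builds the folder segments for Klijenti/<Client>/Video Ads/<Year>/<Month>/<campaign>/.
// ASSUMPTION: <Month> is a zero-padded month number ("04"), in local time.
function driveFolderSegments(client, campaign, date = new Date()) {
  const year = String(date.getFullYear());
  const month = String(date.getMonth() + 1).padStart(2, '0'); // getMonth() is 0-based
  return ['Klijenti', client, 'Video Ads', year, month, campaign];
}
```

Each segment is one folder level. Join with `/` for display, or resolve them one at a time when creating folders.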
Step 9 — Post Slack Report
This is the ONLY Slack message for the entire execution. Do NOT post scene-by-scene status, script breakdowns, retry notifications, or any intermediate updates to Slack during Steps 1-8. All progress goes to stdout (console.log) only. The team reads this one final report, not a play-by-play.
Incident reference: 2026-04-10 ACME Agency — the subprocess posted 8+ separate messages to the client channel during execution. Do not repeat.
Use the standard format:
- Campaign name + duration + scene count
- Brief script overview
- Drive folder link
- "Next step: import to CapCut, overlay VO, add music + SFX, export"
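The single end-of-run message can be assembled from the run's results. This is a sketch of the text only. Posting it through `ACME Agency/scripts/lib/slack.mjs` is not shown, because that module's API is not documented here. The field names are hypothetical.

```javascript
// Assembles the ONE Slack report for the run, in the standard format above.
// Failed scenes are listed explicitly so a partial batch is never reported as complete.
function buildSlackReport({ campaign, seconds, sceneCount, scriptSummary, driveUrl, failedScenes = [] }) {
  const lines = [
    `*${campaign}* | ${seconds}s | ${sceneCount} scenes`,
    scriptSummary,
    `Drive: ${driveUrl}`,
  ];
  if (failedScenes.length > 0) {
    lines.push(`Failed scenes: ${failedScenes.join(', ')}`);
  }
  lines.push('Next step: import to CapCut, overlay VO, add music + SFX, export');
  return lines.join('\n');
}
```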
Verification (run AFTER Step 9 — confirm all assets actually shipped)
Cinematic ads have many moving parts that can silently fail. Check ALL of these before declaring done:
- [ ] Each scene has a `clips/scene-NN-*.mp4` file locally AND in Drive (count = scene count from the script, no missing scenes)
- [ ] Each MP4 file size > 100 KB (Kling sometimes returns broken stubs)
- [ ] Voiceover MP3 exists locally AND in Drive AND duration ≈ target (within ±20%)
- [ ] Reference image PNGs uploaded if generated (so the user can see the source frames)
- [ ] `capcut-guide.md` written and uploaded
- [ ] `manifest.json` records: scene count, durations, prompts used, Kling task IDs, VO voice ID, language, Drive URLs
- [ ] Slack post via slack-reporter returned a non-null `ts`
- [ ] No silent Krea→Kling fallback unless explicitly logged (the fallback is fine, but it must be reported, not hidden)
- [ ] Kling tasks all reported `succeed` status (not `failed` or `unknown` swallowed)
If any scene failed mid-batch, list the failed scene number(s) explicitly in the report. Never claim a 6-scene success when only 5 actually rendered.
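Several checklist items can be checked mechanically once the run data is collected. This sketch assumes a hypothetical manifest shape: `{ expectedScenes, targetSeconds, clips: [{ scene, bytes, klingStatus }], voiceoverSeconds, slackTs }`. It also reads "100 KB" as 100 × 1024 bytes. The Drive and fallback checks still need manual or API confirmation.

```javascript
// Checks a run manifest against the mechanical verification items.
// HYPOTHETICAL manifest shape; "100 KB" is taken as 100 x 1024 bytes.
const MIN_CLIP_BYTES = 100 * 1024;

function verifyRun(manifest) {
  const problems = [];
  const rendered = new Set(manifest.clips.map((c) => c.scene));
  for (let n = 1; n <= manifest.expectedScenes; n++) {
    if (!rendered.has(n)) problems.push(`scene ${n}: clip missing`);
  }
  for (const clip of manifest.clips) {
    if (clip.bytes <= MIN_CLIP_BYTES) problems.push(`scene ${clip.scene}: clip only ${clip.bytes} bytes`);
    if (clip.klingStatus !== 'succeed') problems.push(`scene ${clip.scene}: Kling status ${clip.klingStatus}`);
  }
  const drift = Math.abs(manifest.voiceoverSeconds - manifest.targetSeconds) / manifest.targetSeconds;
  if (drift > 0.2) {
    problems.push(`voiceover ${manifest.voiceoverSeconds}s is outside ±20% of ${manifest.targetSeconds}s`);
  }
  if (manifest.slackTs == null) problems.push('Slack post returned no ts');
  return problems; // empty array = these checks passed
}
```

A non-empty result should go into the Slack report verbatim, including the failed scene numbers.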
CapCut Assembly Guide (delivered to user)
Include this in the Drive folder as capcut-guide.md:
1. Import all 6 clips from clips/ folder in order
2. Drop voiceover-full.mp3 on audio track 2
3. Adjust clip timing if VO doesn't perfectly align
4. Audio track 3: search CapCut music library for [mood] background music, set to -18dB
5. Audio track 4: SFX from CapCut library:
- Scene 1: [relevant SFX — fire, paper rustling, etc.]
- Scene N: [...]
6. Captions → Auto Captions → German → ACME Agencyw
7. Export 1080×1920, MP4, H.264
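The guide can be generated per campaign so the clip count and SFX hints match the actual scenes. This sketch uses hypothetical input fields `mood` and `sfx`, one hint per scene. It leaves out the agency-specific caption preset that follows the language in step 6; append your own.

```javascript
// Renders capcut-guide.md from the template above.
// `mood` and `sfx` (one SFX hint per scene) are HYPOTHETICAL per-campaign inputs.
function renderCapcutGuide({ sceneCount, mood, sfx, captionLanguage = 'German' }) {
  const sfxLines = sfx.map((hint, i) => `   - Scene ${i + 1}: ${hint}`);
  return [
    `1. Import all ${sceneCount} clips from clips/ folder in order`,
    '2. Drop voiceover-full.mp3 on audio track 2',
    "3. Adjust clip timing if VO doesn't perfectly align",
    `4. Audio track 3: search CapCut music library for ${mood} background music, set to -18dB`,
    '5. Audio track 4: SFX from CapCut library:',
    ...sfxLines,
    `6. Captions \u2192 Auto Captions \u2192 ${captionLanguage}`,
    '7. Export 1080×1920, MP4, H.264',
  ].join('\n');
}
```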
Critical Files
- `ACME Agency/scripts/lib/krea.mjs`: Krea.ai NB2 image generation
- `ACME Agency/scripts/lib/kling.mjs`: Kling 3.0 direct API + catbox upload
- `ACME Agency/scripts/lib/elevenlabs.mjs`: TTS voiceover
- `ACME Agency/scripts/lib/google_drive.mjs`: Drive upload
- `ACME Agency/scripts/lib/slack.mjs`: Slack reporting
- `shared/kling_camera_toolkit.md`: vocabulary of 30 cinematic camera movements
Reference Examples (proven campaigns)
- `ACME Agency/clients/ACME Agency/video-ads/<id>/`: burning money hook (Selbstständige)
- `ACME Agency/clients/ACME Agency/video-ads/<id>/`: throwing papers hook (Angestellte 77K+)
Both campaigns: 6 scenes, 30s, character consistency, German VO, ready for CapCut.
Hard Constraints
| Rule | Why |
|---|---|
| Always use Kling 3.0 direct (`kling-v3`) for video | Veo 3.1 doesn't support image-to-video. Kling 2.6 via Krea is unstable. |
| Always upload NB2 images to catbox before Kling | Krea CDN URLs fail unpredictably for Kling's fetcher |
| Never use `--mode multishot` for action ads | Multi-shot can't render dramatic single-frame hooks (burning money etc.) |
| Never use lip sync endpoint | Adds black artifacts in mouth area |
| Never describe characters as "talking to camera" | No audio API = looks fake. Always action-based shots. |
| Auto-select voice from client.market (de/hr/en) | See "Voice by language table" — overridable with --voice flag |
| Always show script approval gate before image generation | Krea credits cost money. 30-second confirmation saves rework |
| Max 12 scenes / 60s total | Beyond this, viewers drop off. 30s is the sweet spot for Meta. |
Why This Skill Exists
Built after testing every alternative:
| Approach | Problem |
|---|---|
| HeyGen avatar | Black-box, no creative control, character can't do dramatic actions |
| Kling multi-shot | Can't render dramatic single-frame visuals like burning money |
| Kling direct UGC + lip sync | Lip sync adds artifacts (black spots in mouth) |
| Veo 3.1 via Krea | Doesn't support image-to-video — text-only |
| Per-scene NB2 → Kling 3.0 + ElevenLabs VO | WORKS — what this skill does |
This skill is the answer to the question: "How do I generate scroll-stopping cinematic Facebook ads with hooks that actually work?"