[ CREATIVE ]

`/cinematic-ad-generator`

Generates **cinematic action-based video ads** with strong hooks, pattern interrupts, and character consistency.

Placeholders like ACME Agency, <id> and you@example.com mark values that are per-agency — your install fills them with YOUR clients and accounts. If a section references a helper script you don't have yet, it ships with that workflow's install.

Skill: `/<id>`

Overview

Characters NEVER talk to camera. They perform actions and react, while the ElevenLabs voiceover narrates over the visuals.

This is the proven flow for generating scroll-stopping Facebook/Meta video ads. Built after extensive testing of HeyGen, multi-shot, lip sync, and Veo, none of which produced results as reliable as this per-scene cinematic approach.

What this skill produces:

5-6 cinematic video clips (5s each) generated via Kling 3.0 image-to-video
Each clip starts from a precision-crafted Nano Banana 2 image
ElevenLabs voiceover in the client's language, German in the reference campaigns (per-scene MP3s + full track)
Character consistency across scenes via NB2 reference image
All files in Drive ready for CapCut assembly (5 min editing)

What this skill does NOT do:

Talking heads / lip sync (intentionally — looks fake without audio API)
Multi-shot mode (loses control over dramatic visuals)
Final video stitching (CapCut step gives you full creative control)

The Golden Rules

NO talking-to-camera shots. Characters do actions: throw papers, burn money, walk, gesture, react. The voiceover is the narrator, the visuals are the metaphor.

Hook in the first 3 seconds. Pattern interrupt — something visually unexpected. Burning money. Throwing papers. Slamming a laptop. Action that stops the scroll.

Character consistency via NB2 reference. Generate one strong character portrait first, reuse as imageUrls reference for all scenes featuring that character.

Per-scene NB2 → Kling. Each scene starts from its OWN precision-crafted NB2 image. This is the only way to get dramatic visuals (burning money, etc.) — multi-shot won't follow text-only prompts for action shots.

Catbox for image URLs. Always upload NB2 images to catbox.moe before passing to Kling — Krea CDN URLs are unreliable for Kling's image fetcher.

ElevenLabs as separate track. Kling direct API has no audio. ElevenLabs generates the German VO. CapCut overlays both.

Preflight (run BEFORE any expensive Krea/Kling/ElevenLabs call)

A cinematic-ad batch is the most expensive video pipeline in this workspace (6 NB2 images + 6 Kling 3.0 videos + ElevenLabs VO). Validate everything BEFORE spending credits.

Client exists in clients.json (3-step cascade). Resolve canonical key.
Required env vars: KREA_API_KEY, KLING_ACCESS_KEY, KLING_SECRET_KEY, ELEVENLABS_API_KEY all set in .env. If any missing → abort, name the missing ones.
Catbox.moe is reachable — quick GET request. If down → abort with "catbox.moe upload host is down, retry later".
drive_folder_id is present and reachable via listFiles().
Slack channel resolves if reporting enabled.
Kling duration cap: scene durations must be 5 or 10 for Kling 2.6, OR 3-15 for Kling 3.0. Reject anything else.
Scene count is sane: 4-8 scenes. More than 8 = burns through credits with no quality return.
Language is supported by ElevenLabs for <id> (HR, BS, DE, EN, ES, FR, IT, etc.). If unsupported → abort.
Disk space ≥ 1 GB free in client folder (videos are big).
CLIENT.md exists (or proceed with brief-only and explicitly note the gap).

If all checks pass, log "preflight: OK (n scenes × {duration}s each, est cost: ~Y compute units + ~Z s of VO)" and proceed.
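
The pure-data checks in this list (env vars, scene count, duration caps) can be sketched as one function that collects every failure before aborting. This is a minimal sketch, not the workflow's actual preflight script: `preflightErrors` and its argument shape are hypothetical, any model other than `kling-v3` is assumed to follow the Kling 2.6 rule, and whole-second durations are assumed for Kling 3.0. The network checks (catbox, Drive, Slack) and the disk check are left out.

```javascript
// Hypothetical helper: collects preflight failures instead of stopping at the first one.
const REQUIRED_ENV = ['KREA_API_KEY', 'KLING_ACCESS_KEY', 'KLING_SECRET_KEY', 'ELEVENLABS_API_KEY'];

function preflightErrors({ env, model, scenes }) {
  const errors = [];

  // Name every missing key so the user can fix .env in one pass.
  const missing = REQUIRED_ENV.filter((key) => !env[key]);
  if (missing.length > 0) errors.push(`missing env vars: ${missing.join(', ')}`);

  // Scene count bounds from the checklist above.
  if (scenes.length < 4 || scenes.length > 8) {
    errors.push(`scene count ${scenes.length} is outside 4-8`);
  }

  // Duration caps: 3-15 (assumed whole seconds) for Kling 3.0, otherwise 5 or 10.
  const durationOk = (d) =>
    model === 'kling-v3' ? Number.isInteger(d) && d >= 3 && d <= 15 : d === 5 || d === 10;
  scenes.forEach((scene, i) => {
    if (!durationOk(scene.duration)) {
      errors.push(`scene ${i + 1}: duration ${scene.duration}s not allowed for ${model}`);
    }
  });

  return errors;
}

// Usage: abort on any error, otherwise log the OK line.
const demoScenes = Array.from({ length: 6 }, () => ({ duration: 5 }));
const demoErrors = preflightErrors({ env: process.env, model: 'kling-v3', scenes: demoScenes });
if (demoErrors.length === 0) console.log(`preflight: OK (${demoScenes.length} scenes × 5s each)`);
```

Returning a list rather than throwing keeps the abort message complete, which matches the "name the missing ones" requirement.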

Step-by-Step Workflow

Step 0 — Identify Client

Look up in ACME Agency/clients/clients.json (or 3-step cascade per Paradox CLAUDE.md). Extract:

drive_folder_id (for upload)
slack_channel (for report)
market and language

Step 1 — Brief Collection

Ask up to 5 questions (skip any the user already answered in their initial message):

Target audience — who exactly? (e.g. "Selbstständige earning >75K", "Angestellte 30-50 with chronic back pain")
The offer / transformation — what they get from acting
Existing script? — paste it, or skip (default: write one for them)
Hook concept? — specific visual idea (e.g. "burning money") or skip (default: propose 2-3)
Target duration? — 15 / 20 / 30 / 45 / 60 seconds (default: 30)

Detect language from client.market → Germany = de, Croatia/Bosnia = hr, etc.

Step 2 — Write the Script (the most important step)

This step has TWO paths — either the user gave you a script, or you write one.

Path A: User provided a script

Parse it for: hook, problem, mechanism, solution, result, CTA
Validate word count vs target duration:

German/Croatian: seconds × 2.4 words max
English: seconds × 2.6 words max

If too long, trim — cut filler, not core message
Split into N scenes (see "Scene Count by Duration" table below)
Skip to Step 2.5 (Approval Gate)
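
The word-count validation above is simple arithmetic and can be checked mechanically. A minimal sketch, assuming words are counted by splitting on whitespace; the function names are illustrative and not part of any shipped helper. Languages without a stated rate are rejected rather than guessed.

```javascript
// Words allowed per 10 seconds of voiceover (2.4/s for de and hr, 2.6/s for en).
const WORDS_PER_10S = { de: 24, hr: 24, en: 26 };

function maxWords(seconds, lang) {
  const rate = WORDS_PER_10S[lang];
  if (rate === undefined) throw new Error(`no word rate defined for language "${lang}"`);
  // Integer arithmetic avoids floating-point rounding on values like 30 * 2.4.
  return Math.floor((seconds * rate) / 10);
}

function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// Returns the count, the budget, and whether the script fits.
function checkScriptLength(script, seconds, lang) {
  const words = countWords(script);
  const max = maxWords(seconds, lang);
  return { words, max, ok: words <= max };
}

console.log(checkScriptLength('Selbstständig in Deutschland?', 30, 'de'));
// { words: 3, max: 72, ok: true }
```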

Path B: No script provided — WRITE ONE using direct-response principles

FIRST: Read .claude/skills/copywrite/PRINCIPLES.md to load the 12 direct-response copywriting principles. This is non-negotiable — that file has the frameworks you need.

Then apply this 6-beat video ad framework (maps directly to copywriting principles):

BEAT 1 — HOOK (Pattern Interrupt — Principle 2 hook type #4)
  → 3-7 word VO question that names the audience by their identity
  → Must stop the scroll in first 3 seconds
  → Examples: "Selbstständig in Deutschland?", "Verdienst du über 77.000 brutto?"

BEAT 2 — PROBLEM (specificity + emotion — Principles 3 + 4)
  → 8-12 word VO with concrete number, not vague pain
  → Fear of loss > desire for gain
  → Example: "Die GKV kostet dich jeden Monat über 900 Euro."

BEAT 3 — MECHANISM REVEAL (Principle 5 — your unique explanation)
  → 8-12 word VO that names WHY it works / what most people don't know
  → This is the differentiator — gives the viewer a reason to believe
  → Example: "Was die meisten nicht wissen: Dein Arbeitgeber zahlt die Hälfte mit."

BEAT 4 — PROOF / SOLUTION (specificity — Principle 4)
  → 8-12 word VO with the product/service in action + a number
  → Example: "ACME Agency prüft in 60 Sekunden, ob du sparen kannst."

BEAT 5 — RESULT (identity payoff — Principle 3)
  → 5-10 word VO showing the new state, the transformation
  → Example: "Bis zu 400 Euro weniger. Jeden Monat."

BEAT 6 — CTA (risk removal — Principle 8)
  → 5-8 word VO with the action + risk removal
  → Example: "Jetzt kostenlos prüfen. Dauert 60 Sekunden."

Total VO word count must not exceed: <id> × 2.4 (de/hr) or × 2.6 (en).

15s ad → ~36 words / ~6 words per beat
30s ad → ~72 words / ~12 words per beat
60s ad → ~144 words / ~24 words per beat (use longer beats or add scenes)

For longer ads (45s+): Add an extra PROOF beat or a TESTIMONIAL beat between MECHANISM and RESULT.

For shorter ads (15-20s): Compress to 4 beats: HOOK + PROBLEM + SOLUTION + CTA.

Anti-AI sweep (Principle 12) — REQUIRED before approval gate

Before showing the script to the user, scan it for and remove:

Em-dashes mid-sentence (—) — replace with comma or period
"It's not X, it's Y" patterns
Emoji clusters
Formulaic openings ("In a world where...", "Imagine...", "Discover...")
Corporate filler ("iskoristite prednosti" [Croatian for "take advantage of the benefits"], "leverage", "unlock")
Two-word sentence fragments used as paragraphs
Anything that sounds translated rather than natively written

Read it aloud. If it sounds like an AI wrote it, rewrite it.
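
Part of this sweep can be automated as a first pass before the read-aloud check. The sketch below is a rough heuristic, assuming simple regular expressions are good enough to catch the obvious cases; `findAiTells` and the rule IDs are hypothetical. It does not cover two-word fragments or translated-sounding copy, which still need a human read.

```javascript
// Hypothetical lint rules; each regex is a rough heuristic, not a complete detector.
const AI_TELLS = [
  { id: 'em-dash', pattern: /\u2014/ },
  { id: 'not-x-its-y', pattern: /\bit['\u2019]s not\b[^.!?]*,\s*it['\u2019]s\b/i },
  { id: 'emoji-cluster', pattern: /(?:\p{Extended_Pictographic}\uFE0F?\s*){2,}/u },
  { id: 'formulaic-opening', pattern: /(?:^|[.!?]\s+)(?:In a world where|Imagine|Discover)\b/i },
  { id: 'corporate-filler', pattern: /\b(?:iskoristite prednosti|leverage|unlock)\b/i },
];

// Returns the IDs of every rule the script trips, in rule order.
function findAiTells(script) {
  return AI_TELLS.filter((rule) => rule.pattern.test(script)).map((rule) => rule.id);
}

console.log(findAiTells('Die GKV kostet dich jeden Monat über 900 Euro.'));
// []
```

An empty result only means the obvious patterns are absent; it is not a substitute for reading the script aloud.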

Scene Count by Duration

| Target Duration | Scenes | Per-Scene Duration | Word Budget (de/hr) |
|---|---|---|---|
| 15s | 3 | 5s each | ~36 words |
| 20s | 4 | 5s each | ~48 words |
| 25s | 5 | 5s each | ~60 words |
| 30s (default) | 6 | 5s each | ~72 words |
| 45s | 9 | 5s each | ~108 words |
| 60s | 12 | 5s each | ~144 words |

Kling 3.0 image-to-video produces 5s clips reliably. Keep all scenes uniform at 5s.
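
Every row of the table follows the same rule: scenes = duration / 5, word budget = duration × 2.4. A minimal sketch of that derivation, assuming the target is always a multiple of 5 seconds; `planScenes` is an illustrative name.

```javascript
const SCENE_SECONDS = 5; // uniform clip length, per the table above

function planScenes(targetSeconds) {
  if (targetSeconds <= 0 || targetSeconds % SCENE_SECONDS !== 0) {
    throw new Error(`target duration must be a positive multiple of ${SCENE_SECONDS}s`);
  }
  return {
    scenes: targetSeconds / SCENE_SECONDS,
    perSceneSeconds: SCENE_SECONDS,
    // 2.4 words per second for de/hr, computed with integers (24 words per 10 s).
    wordBudget: Math.floor((targetSeconds * 24) / 10),
  };
}

console.log(planScenes(30));
// { scenes: 6, perSceneSeconds: 5, wordBudget: 72 }
```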

Step 2.5 — Script Approval Gate (REQUIRED)

After writing the script (Path A or B), ALWAYS show the user this breakdown BEFORE generating any images:

═══════════════════════════════════════════════════════════════
SCRIPT — [Client Name] | [Audience] | [Duration]s | [N] scenes
═══════════════════════════════════════════════════════════════

Scene | Visual Hook                  | Voiceover ([Lang])
------|------------------------------|--------------------------------
  1   | Burning euro bills           | "Selbstständig in Deutschland?"
  2   | Frustrated at desk           | "Die GKV kostet dich..."
  3   | Phone shows comparison       | "ACME Agency prüft in 60 Sek..."
  4   | Walking into clinic          | "Sofort zum Facharzt..."
  5   | Relaxed, smiling             | "Bis zu 400 Euro weniger..."
  6   | Branded CTA card             | "Jetzt kostenlos prüfen."

Voice: Chris Norddeutscher (German)
VO word count: 68 / 72 max (within target)
═══════════════════════════════════════════════════════════════

Then ask:

"Approve this script and proceed to image generation? Or any scene you want to revise?"

Wait for explicit approval. Do not spend Krea credits without it.

Step 3 — Generate Character Reference Image (if needed)

If the ad features a recurring person, generate ONE strong NB2 portrait first:

import { <id> } from './ACME Agency/scripts/lib/krea.mjs';

const result = await <id>({
  prompt: 'Portrait of a [age] [nationality] [gender], [hair], [outfit], [setting], confident eye contact, photorealistic, editorial portrait, 1:1',
  aspectRatio: '1:1',
  batchSize: 2,  // generate 2 to pick the best
  resolution: '2K',
});
// Pick the best one — show user, get confirmation

Save the chosen CDN URL — use it as imageUrls[characterRef] in later NB2 calls for character consistency.

Step 4 — Generate 6 Scene Images via NB2

For EACH scene, write a precision NB2 prompt. Use these templates:

Hook scene (action close-up):

Extreme close-up macro shot of [DRAMATIC OBJECT/ACTION]. [SPECIFIC DETAILS]. 
Held in [hand/context]. Dark moody background, dramatic low-key lighting. 
Cinematic, intense, attention-grabbing. Shallow depth of field. 9:16 vertical

Character scene (use character reference):

The person in the reference image [SPECIFIC ACTION/EMOTION], [POSITION/POSE], 
[ENVIRONMENT DETAILS]. [LIGHTING]. [MOOD]. Photorealistic, 9:16 vertical
→ Pass character ref via imageUrls

Product/UI scene (no character):

[Subject — phone/equipment/product] showing [SPECIFIC UI/DETAIL]. 
[BRAND COLORS]. [LIGHTING]. Shallow depth of field. Photorealistic, 9:16 vertical

CTA card (no character):

Bold [brand color] solid background. Large [accent color] text reading 
[CTA TEXT] centered. Clean geometric heavy sans-serif. [Brand logo] below. 
Minimalist premium. 9:16 vertical

Generate all 6 sequentially via <id>(). Save locally to ACME Agency/clients/<Client>/video-ads/<campaign>/reference-images/.
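
Filling the templates by hand invites drift between scenes, so a small builder can help. The sketch below covers only the hook and character templates and builds the argument object for one NB2 call. The function names and the `scene` shape are hypothetical, and it assumes `imageUrls` takes an array and that `aspectRatio` accepts `'9:16'`; check both against your krea.mjs before relying on them.

```javascript
// Hypothetical template fillers for two of the four scene types above.
function hookPrompt({ subject, details, context }) {
  return [
    `Extreme close-up macro shot of ${subject}. ${details}.`,
    `Held in ${context}. Dark moody background, dramatic low-key lighting.`,
    'Cinematic, intense, attention-grabbing. Shallow depth of field. 9:16 vertical',
  ].join(' ');
}

function characterPrompt({ action, pose, environment, lighting, mood }) {
  return `The person in the reference image ${action}, ${pose}, ${environment}. ${lighting}. ${mood}. Photorealistic, 9:16 vertical`;
}

// Builds the request for one scene. Only character scenes carry the reference image.
function buildSceneRequest(scene, characterRefUrl) {
  if (scene.type === 'character') {
    if (!characterRefUrl) throw new Error(`scene "${scene.name}" needs a character reference URL`);
    return { prompt: characterPrompt(scene.fields), aspectRatio: '9:16', imageUrls: [characterRefUrl] };
  }
  if (scene.type === 'hook') {
    return { prompt: hookPrompt(scene.fields), aspectRatio: '9:16' };
  }
  throw new Error(`no template for scene type "${scene.type}"`);
}

const demoHook = buildSceneRequest({
  name: 'scene-01-hook',
  type: 'hook',
  fields: { subject: 'burning euro bills', details: 'Flames curl the edges', context: 'a bare hand' },
});
console.log(demoHook.prompt);
```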

Step 5 — Upload Images to catbox.moe (CRITICAL)

Krea CDN URLs are unreliable for Kling's image fetcher. Always upload NB2 images to catbox first:

import { uploadToCatbox } from './ACME Agency/scripts/lib/kling.mjs';

const catboxUrls = {};
for (const [name, localPath] of Object.entries(images)) {
  catboxUrls[name] = await uploadToCatbox(localPath, 'image/png');
}
// Save catbox URLs to a JSON file in the reference-images folder for re-use

Step 6 — Animate Each Scene with Kling 3.0 Direct

For each scene, call <id>() from ACME Agency/scripts/lib/kling.mjs:

import { copyFile } from 'node:fs/promises';  // needed for the copyFile() call below
import { <id> } from './ACME Agency/scripts/lib/kling.mjs';

for (const scene of scenes) {
  const videoPath = await <id>({
    prompt: scene.animationPrompt,  // describes MOTION, not the static frame
    model: 'kling-v3',
    aspectRatio: '9:16',
    duration: 5,
    startImage: catboxUrls[scene.name],
  });
  await copyFile(videoPath, `clips/${scene.name}.mp4`);
}

Animation prompt rules:

Describe MOTION, not the scene (the scene is already in the start image)
Examples: "The flames intensify and crawl up the bills", "The man leans forward with sudden interest", "Slow cinematic dolly through the equipment"
Do NOT re-describe what's in the image — Kling already sees it

Step 7 — Generate ElevenLabs Voice (auto-selected by language)

Voice selection priority (highest first):

--voice <id|name> CLI flag (user override)
client.elevenlabs_voice_id from clients.json
Auto-selection by client language

Voice by language table (verified IDs from current ElevenLabs account):

| Language | Voice ID | Name | Notes |
|---|---|---|---|
| `de` (German) | `j46AY0iVY3oHcnZbgEJg` | Chris Norddeutscher | North German pro, authoritative |
| `de` (alt) | `TUKJhQmz3RPYBNAgC5A1` | Clark Clear | German pro, alternative |
| `de` (alt) | `DtAQqD4yK3kXSVPx7wFc` | Pascal R | German narrator/storyteller |
| `hr` (Croatian) | `ZLYZToA7aDsMbHwM9AOr` | Luka | Croatian male, calm |
| `hr` (alt) | `FXFcxnjikw0naYO1PPrU` | Adnan | Croatian male, casual |
| `en` (English) | `JBFqnCBsd6RMkjVDRZzb` | George | British storyteller, premade (free) |
| `en` (alt) | `EXAVITQu4vr4xnSDxMaL` | Sarah | American female, mature, premade |

Note: German and Croatian voices are "professional" voices and require a paid ElevenLabs plan. English premade voices work on the free plan.

import { <id>, generateSpeech } from './ACME Agency/scripts/lib/elevenlabs.mjs';

// Auto-select voice from language (or override)
const VOICE_BY_LANG = {
  de: 'j46AY0iVY3oHcnZbgEJg',  // Chris Norddeutscher
  hr: 'ZLYZToA7aDsMbHwM9AOr',  // Luka
  en: 'JBFqnCBsd6RMkjVDRZzb',  // George
};

// Detect language from client.market or script.language
const lang = client.market === 'Germany' ? 'de'
           : client.market === 'Croatia' || client.market === 'Bosnia' ? 'hr'
           : 'en';

const voiceId = client.elevenlabs_voice_id || VOICE_BY_LANG[lang];

// Conversational settings (works for all 3 languages)
const voiceSettings = { stability: 0.4, similarityBoost: 0.85, style: 0.1 };

await <id>({ scenes, voiceId, outputDir, voiceSettings });

// Also generate full track
const fullScript = scenes.map(s => s.text).join(' ');
await generateSpeech({ text: fullScript, voiceId, destPath: 'voiceover-full.mp3', ...voiceSettings });
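
The snippet above covers priorities 2 and 3 only; the `--voice` override is not shown. A minimal sketch of all three levels follows, assuming the flag value arrives as a plain string and that a value matching no known name should be treated as a raw voice ID. `resolveVoiceId` is a hypothetical name; the IDs are copied from the table in this step.

```javascript
// Voice IDs and names copied from the voice table in this step.
const VOICES = [
  { id: 'j46AY0iVY3oHcnZbgEJg', name: 'Chris Norddeutscher', lang: 'de' },
  { id: 'TUKJhQmz3RPYBNAgC5A1', name: 'Clark Clear', lang: 'de' },
  { id: 'DtAQqD4yK3kXSVPx7wFc', name: 'Pascal R', lang: 'de' },
  { id: 'ZLYZToA7aDsMbHwM9AOr', name: 'Luka', lang: 'hr' },
  { id: 'FXFcxnjikw0naYO1PPrU', name: 'Adnan', lang: 'hr' },
  { id: 'JBFqnCBsd6RMkjVDRZzb', name: 'George', lang: 'en' },
  { id: 'EXAVITQu4vr4xnSDxMaL', name: 'Sarah', lang: 'en' },
];

function resolveVoiceId({ cliVoice, client, lang }) {
  // 1. --voice flag: accept a known name (case-insensitive), otherwise assume a raw ID.
  if (cliVoice) {
    const byName = VOICES.find((v) => v.name.toLowerCase() === cliVoice.toLowerCase());
    return byName ? byName.id : cliVoice;
  }
  // 2. Per-client override from clients.json.
  if (client && client.elevenlabs_voice_id) return client.elevenlabs_voice_id;
  // 3. The first table entry for the language is the default.
  const fallback = VOICES.find((v) => v.lang === lang);
  if (!fallback) throw new Error(`no default voice for language "${lang}"`);
  return fallback.id;
}

console.log(resolveVoiceId({ cliVoice: 'Luka', client: {}, lang: 'de' }));
// ZLYZToA7aDsMbHwM9AOr
```

Throwing on an unknown language, instead of silently falling back to English, keeps a misconfigured client from shipping a voiceover in the wrong language.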

Step 8 — Upload Everything to Drive

Folder structure: Klijenti/<Client>/Video Ads/<Year>/<Month>/<campaign>/

Upload: clips, voiceover MP3s, reference-images PNGs.
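
Building the folder path in one place avoids typos across the three upload batches. A minimal sketch, assuming `<Month>` is a zero-padded month number; the source does not say whether it is a number or a name, so adjust to match your existing Drive folders. `driveFolderPath` is a hypothetical helper, not part of google_drive.mjs.

```javascript
function driveFolderPath(clientName, campaign, date = new Date()) {
  const year = String(date.getFullYear());
  // Assumption: months are stored as 01-12. Swap in month names if your Drive uses them.
  const month = String(date.getMonth() + 1).padStart(2, '0');
  return `Klijenti/${clientName}/Video Ads/${year}/${month}/${campaign}/`;
}

console.log(driveFolderPath('Demo Client', 'burning-money', new Date(2026, 3, 10)));
// Klijenti/Demo Client/Video Ads/2026/04/burning-money/
```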

Step 9 — Post Slack Report

This is the ONLY Slack message for the entire execution. Do NOT post scene-by-scene status, script breakdowns, retry notifications, or any intermediate updates to Slack during Steps 1-8. All progress goes to stdout (console.log) only. The team reads this one final report, not a play-by-play.

Incident reference: 2026-04-10 ACME Agency — the subprocess posted 8+ separate messages to the client channel during execution. Do not repeat.

Use the standard format:

Campaign name + duration + scene count
Brief script overview
Drive folder link
"Next step: import to CapCut, overlay VO, add music + SFX, export"

Verification (run AFTER Step 9 — confirm all assets actually shipped)

Cinematic ads have many moving parts that can silently fail. Check ALL of these before declaring done:

[ ] Each scene has a clips/scene-NN-*.mp4 file locally AND in Drive (count = scene count from script, no missing scenes)
[ ] Each MP4 file size > 100 KB (Kling sometimes returns broken stubs)
[ ] Voiceover MP3 exists locally AND in Drive AND duration ≈ target (within ±20%)
[ ] Reference image PNGs uploaded if generated (so user can see source frames)
[ ] capcut-guide.md written and uploaded
[ ] manifest.json records: scene count, durations, prompts used, Kling task IDs, VO voice ID, language, Drive URLs
[ ] Slack post via slack-reporter returned a non-null ts
[ ] No silent Krea→Kling fallback unless explicitly logged (the fallback is fine, but it must be reported, not hidden)
[ ] All Kling tasks reported succeed status (no failed or unknown status swallowed)

If any scene failed mid-batch, list the failed scene number(s) explicitly in the report. Never claim a 6-scene success when only 5 actually rendered.
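
Several items on this checklist are plain comparisons against the manifest, so they can run as code before anyone declares the batch done. The sketch below assumes a manifest shape that the source does not define; the field names (`expectedScenes`, `voDurationSeconds`, `clips[].bytes`, and so on) are hypothetical and should be mapped onto whatever your manifest.json actually records.

```javascript
const MIN_CLIP_BYTES = 100 * 1024; // clips at or below 100 KB are treated as broken stubs

// Hypothetical manifest shape:
// { expectedScenes, targetSeconds, voDurationSeconds, clips: [{ scene, bytes, inDrive, klingStatus }] }
function verifyBatch(manifest) {
  const problems = [];
  const { expectedScenes, targetSeconds, voDurationSeconds, clips } = manifest;

  if (clips.length !== expectedScenes) {
    problems.push(`expected ${expectedScenes} clips, found ${clips.length}`);
  }
  for (const clip of clips) {
    if (clip.bytes <= MIN_CLIP_BYTES) problems.push(`scene ${clip.scene}: file too small (${clip.bytes} bytes)`);
    if (!clip.inDrive) problems.push(`scene ${clip.scene}: missing in Drive`);
    if (clip.klingStatus !== 'succeed') problems.push(`scene ${clip.scene}: Kling status "${clip.klingStatus}"`);
  }
  // Voiceover must land within ±20% of the target: |diff| > target / 5, in integer form.
  if (Math.abs(voDurationSeconds - targetSeconds) * 5 > targetSeconds) {
    problems.push(`voiceover is ${voDurationSeconds}s, target ${targetSeconds}s`);
  }
  return problems;
}
```

An empty array means these particular checks passed. The guide, Slack, and fallback-logging items still need their own confirmation.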

CapCut Assembly Guide (delivered to user)

Include this in the Drive folder as capcut-guide.md:

1. Import all 6 clips from clips/ folder in order
2. Drop voiceover-full.mp3 on audio track 2
3. Adjust clip timing if VO doesn't perfectly align
4. Audio track 3: search CapCut music library for [mood] background music, set to -18dB
5. Audio track 4: SFX from CapCut library:
   - Scene 1: [relevant SFX — fire, paper rustling, etc.]
   - Scene N: [...]
6. Captions → Auto Captions → German
7. Export 1080×1920, MP4, H.264

Critical Files

ACME Agency/scripts/lib/krea.mjs — Krea.ai NB2 image generation
ACME Agency/scripts/lib/kling.mjs — Kling 3.0 direct API + catbox upload
ACME Agency/scripts/lib/elevenlabs.mjs — TTS voiceover
ACME Agency/scripts/lib/google_drive.mjs — Drive upload
ACME Agency/scripts/lib/slack.mjs — Slack reporting
shared/kling_camera_toolkit.md — 30 cinematic camera movements vocabulary

Reference Examples (proven campaigns)

ACME Agency/clients/ACME Agency/video-ads/<id>/ — burning money hook (Selbstständige)
ACME Agency/clients/ACME Agency/video-ads/<id>/ — throwing papers hook (Angestellte 77K+)

Both campaigns: 6 scenes, 30s, character consistency, German VO, ready for CapCut.

Hard Constraints

| Rule | Why |
|---|---|
| Always use Kling 3.0 direct (`kling-v3`) for video | Veo 3.1 via Krea doesn't support image-to-video. Kling 2.6 via Krea is unstable. |
| Always upload NB2 images to catbox before Kling | Krea CDN URLs fail unpredictably for Kling's fetcher |
| Never use `--mode multishot` for action ads | Multi-shot can't render dramatic single-frame hooks (burning money etc.) |
| Never use lip sync endpoint | Adds black artifacts in mouth area |
| Never describe characters as "talking to camera" | No audio API = looks fake. Always action-based shots. |
| Auto-select voice from client.market (de/hr/en) | See "Voice by language table"; overridable with `--voice` flag |
| Always show script approval gate before image generation | Krea credits cost money. 30-second confirmation saves rework |
| Max 12 scenes / 60s total | Beyond this, viewers drop off. 30s is the sweet spot for Meta. |

Why This Skill Exists

Built after testing every alternative:

| Approach | Problem |
|---|---|
| HeyGen avatar | Black-box, no creative control, character can't do dramatic actions |
| Kling multi-shot | Can't render dramatic single-frame visuals like burning money |
| Kling direct UGC + lip sync | Lip sync adds artifacts (black spots in mouth) |
| Veo 3.1 via Krea | Doesn't support image-to-video (text-only) |
| Per-scene NB2 → Kling 3.0 + ElevenLabs VO | WORKS: this is what the skill does |

This skill is the answer to the question: "How do I generate scroll-stopping cinematic Facebook ads with hooks that actually work?"
