[ CREATIVE ]
/video-edit
Edit an existing source video (from Drive or local) into a finished ad.
ACME Agency, <id> and you@example.com mark values that are per-agency; your install fills them with YOUR clients and accounts. If a section references a helper script you don't have yet, it ships with that workflow's install.
Skill: /video-edit
Overview
Takes existing footage (typically a raw 1–5 minute shoot from the client) and produces a finished ad — cut down, voiced over, reframed, delivered.
This is the only skill in the workspace that edits real-world video. All other video skills (/<id>, /heygen-ad-generator, /motion-ad-generator, /video-ad-generator, /static-ad-generator) generate new footage via Krea / Kling / Veo / Creatomate / HeyGen. If the user already has footage, this is the skill.
Built after one-off editing requests started piling up (see ACME Agency/clients/ACME Agency/video-ads/ for the prototype that motivated this skill).
What it produces:
- A cut-down MP4 per requested aspect ratio (16:9, 9:16, 1:1, 4:5)
- ElevenLabs voiceover in the target language (auto-selected from client.market)
- Drive upload under Klijenti/<Client>/Video Ads/<Year>/<Month>/<slug>/
- Slack report with variant links
- A plan.json that documents every cut, VO line, and decision, so the edit is rerunnable
What it does NOT do:
- Generate new footage (use the generators)
- Motion graphics beyond simple text + logo overlays (use /motion-ad-generator)
- Auto-transcribe the source (planned for later; Whisper is in the FFmpeg build, add a flag when needed)
- Beat-matched cuts (no audio analysis — pacing is driven by the shot plan)
The Golden Rules
- Vision-based shot selection is YOUR job. The prepare phase extracts preview frames to disk; you read them with the Read tool and pick which timestamps make the cut. The Node script does not guess shot boundaries.
- Pacing is enforced. No shot may run longer than shotMaxDuration (default 3.0s). The script warns and reports any shot that exceeds it.
- ElevenLabs only, no silent fallback. If ElevenLabs fails, the script errors loudly. If you must fall back, pre-generate an MP3 with another TTS, drop it in vo/voiceover.mp3, and rerun with --no-voice. The Slack report must mention any fallback explicitly.
- plan.json is the contract. Every decision (shot timestamps, VO script, target ratios, audio mode) goes into plan.json. The script never asks the user a question; it reads the plan and executes.
- One source of truth for the final duration. plan.target.duration is the contract. The script warns if the sum of shot durations deviates more than 20% from target.
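The pacing rule can be sketched as a small pure function. This is a minimal sketch, assuming the plan shape shown later in this document (shots[].duration, target.shotMaxDuration); findOverlongShots is a hypothetical helper name, not a function taken from video_edit.mjs.

```javascript
// Hypothetical helper (not part of video_edit.mjs): lists shots that break the pacing rule.
function findOverlongShots(plan) {
  // Fall back to the documented default of 3.0s when the plan omits the field.
  const max = plan.target.shotMaxDuration ?? 3.0;
  return plan.shots
    .filter((shot) => shot.duration > max)
    .map((shot) => ({
      id: shot.id,
      duration: shot.duration,
      over: Number((shot.duration - max).toFixed(2)), // seconds above the limit
    }));
}

// Illustrative data only; the second shot is deliberately too long.
const samplePlan = {
  target: { duration: 25, shotMaxDuration: 3.0 },
  shots: [
    { id: "h1", start: 12.5, duration: 2.5 },
    { id: "m1", start: 64.0, duration: 3.4 },
  ],
};

for (const s of findOverlongShots(samplePlan)) {
  console.warn(`WARN shot ${s.id}: ${s.duration}s exceeds the max by ${s.over}s`);
}
```

Returning a list rather than throwing mirrors the "warns and reports" behaviour described above: the caller decides whether a long shot is acceptable.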
Preflight (before touching Node)
- Client exists in clients.json (3-step cascade: clients.json → registry sheet → API discovery). Resolve the canonical key.
- Source video provided. Either a Drive URL (https://drive.google.com/file/d/<ID>/view) or a local path. If the user pasted only a filename without a folder, ask which Drive folder it's in.
- Required env vars: ELEVENLABS_API_KEY, <id>. Abort if missing.
- Disk space ≥ 2 GB free (source videos can be 500 MB – 1 GB for a 3-min shoot).
- client.drive_folder_id is set. If not, auto-discover via Drive search (don't just skip).
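Two of these checks are easy to express in code. The sketch below is illustrative only: extractDriveFileId and missingEnvVars are hypothetical names, and the regex assumes the exact URL shape quoted above. It checks only ELEVENLABS_API_KEY because the second required variable is install-specific.

```javascript
// Hypothetical preflight helpers; names are illustrative, not from video_edit.mjs.

// Returns the Drive file ID for URLs shaped like https://drive.google.com/file/d/<ID>/view,
// or null when the input is not such a URL (for example a local path).
function extractDriveFileId(source) {
  const match = /^https:\/\/drive\.google\.com\/file\/d\/([A-Za-z0-9_-]+)/.exec(source);
  return match ? match[1] : null;
}

// Returns the names of required env vars that are unset or empty.
function missingEnvVars(required, env = process.env) {
  return required.filter((name) => !env[name]);
}

// The file ID below is made up for the example.
console.log(extractDriveFileId("https://drive.google.com/file/d/abc123_XYZ/view")); // abc123_XYZ
console.log(extractDriveFileId("./footage/shoot.mp4")); // null

const missing = missingEnvVars(["ELEVENLABS_API_KEY"]);
if (missing.length > 0) {
  console.error(`Missing env vars: ${missing.join(", ")}`);
  // A real preflight would abort here.
}
```

A null file ID is not an error by itself: it simply means the source should be treated as a local path or a bare filename that needs a folder hint.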
Step-by-step workflow
Step 0 — Identify client + parse source
Extract from the user's message:
- Client name → match to clients.json key
- Source → Drive URL, Drive file name + folder hint, or local path
- Instructions → structure, language, duration, tone, CTA text, ratios, any specific shots to include/exclude
If any of those are missing, ask before running anything. Examples of ambiguous briefs that need clarification:
- No explicit duration → ask ("15s, 25s, or 30s?")
- No explicit target ratio → default to the source's ratio, mention you'll add others if asked
- No explicit language → infer from client.market, mention the inference
Step 1 — Run prepare phase
node "ACME Agency/scripts/video_edit.mjs" prepare "<ClientKey>" "<drive-url-or-path>" --slug "YYYY-MM-DD-<short-descriptor>"
Example:
node "ACME Agency/scripts/video_edit.mjs" prepare "ACME Agency" \
  "https://drive.google.com/file/d/<id>/view" \
  --slug "<id>"
What this does:
- Downloads the source to clients/<Client>/video-ads/<slug>/source/source.mp4
- Probes the source (duration, dimensions, fps, audio)
- Extracts preview frames every 2s to frames/ (cap 90; adjust with --max-frames)
- Writes a stub plan.json with empty shots, empty voiceover.script, default target
Step 2 — Read the frames (vision-based shot selection)
This is the judgment-heavy step. Use the Read tool to read the frames from <campaignDir>/frames/ in order. You're looking for:
- Hero/beauty shots — wide establishing shots of the finished product, cinematic angles, good lighting
- Process shots — the interesting parts of the work (assembly, drilling, lifting, installing)
- Detail shots — close-ups that show craftsmanship (materials, joins, mechanisms)
- Closing shots — the transformation, the result, people enjoying the finished thing
- Anti-picks — bland wides, safety-gear-only frames, empty tripod shots, motion-blur clips
Map frames to timestamps from the filename (frame_012_t22.0.jpg → t=22s) and build a candidate list in your head.
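If you prefer to keep the candidate list outside your head, the filename-to-timestamp mapping can be sketched as below. It assumes the frame_<index>_t<seconds>.jpg pattern from the example filename; frameTimestamp is a hypothetical helper, and the file list is made up.

```javascript
// Parses the frame index and source timestamp (seconds) out of a preview-frame filename.
// Assumes the frame_<index>_t<seconds>.jpg pattern; returns null for anything else.
function frameTimestamp(filename) {
  const match = /^frame_(\d+)_t(\d+(?:\.\d+)?)\.jpg$/.exec(filename);
  if (!match) return null;
  return { index: Number(match[1]), seconds: Number(match[2]) };
}

// Illustrative directory listing, deliberately unsorted and with one stray file.
const listing = ["frame_012_t22.0.jpg", "frame_001_t0.0.jpg", "notes.txt"];

const candidates = listing
  .map(frameTimestamp)
  .filter((frame) => frame !== null)
  .sort((a, b) => a.seconds - b.seconds);

console.log(candidates);
```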
Pacing math. Given target.duration and shotMaxDuration, you need at minimum ceil(target / shotMaxDuration) shots. For a 25s ad at 3s max, that's ≥9 shots. Aim higher — 11–13 shots for a 25s ad gives you room for varied pacing (a 1.5s punch cut followed by a 2.5s beauty hold).
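The minimum shot count is a one-line calculation. A minimal sketch, with minShots as a hypothetical helper name:

```javascript
// Minimum number of shots needed so that no single shot exceeds shotMaxDuration.
function minShots(targetDuration, shotMaxDuration = 3.0) {
  return Math.ceil(targetDuration / shotMaxDuration);
}

console.log(minShots(25, 3.0)); // 9
console.log(minShots(15, 3.0)); // 5

// Average shot length when aiming higher than the minimum, as suggested above.
for (const count of [11, 13]) {
  console.log(`${count} shots -> ${(25 / count).toFixed(2)}s average`);
}
```

At 11 to 13 shots the average drops to roughly 1.9s to 2.3s, which is what leaves headroom for mixing short punch cuts with longer holds.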
Step 3 — Write the voiceover script
Apply direct-response principles from .claude/skills/copywrite/PRINCIPLES.md if you haven't already loaded them. Beat structure for a 25s editing ad:
HOOK (3-5s) — benefit-led promise OR intrigue question
BODY (10-15s) — what the product is, why it's different, one concrete proof
CLOSE (5-7s) — the transformation + CTA
Word budget:
- German / Croatian / Bosnian: duration × 2.4 words max
- English: duration × 2.6 words max
25s German → ~60 words. 15s → ~36 words.
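The budget can be checked mechanically before the script goes to approval. This is a sketch under stated assumptions: wordBudget and countWords are hypothetical helpers, the language codes follow the two-letter codes used in plan.json, and words are counted by splitting on whitespace.

```javascript
// Words-per-second rates from the budget above, stored per 10 seconds
// so the arithmetic stays in integers (2.4 -> 24, 2.6 -> 26).
const WORDS_PER_10S = { de: 24, hr: 24, bs: 24, en: 26 };

function wordBudget(durationSeconds, language) {
  const rate = WORDS_PER_10S[language];
  if (rate === undefined) throw new Error(`No word rate for language: ${language}`);
  return Math.floor((durationSeconds * rate) / 10);
}

// Naive whitespace word count; good enough for a budget check.
function countWords(script) {
  return script.trim().split(/\s+/).filter(Boolean).length;
}

console.log(wordBudget(25, "de")); // 60
console.log(wordBudget(15, "de")); // 36
console.log(wordBudget(25, "en")); // 65

const draft = "Von der leeren Fläche zu Ihrem persönlichen Rückzugsort.";
console.log(countWords(draft) <= wordBudget(25, "de")); // true
```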
Anti-AI sweep (required — do NOT skip):
- No em-dashes mid-sentence (—)
- No 2-word fragments ("Premium. Schnell.")
- No formulaic openings ("Stellen Sie sich vor…", "Entdecken Sie…")
- No literal translation tells
- Read it aloud — if it sounds translated, rewrite
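Part of the sweep can be automated as a first pass. The sketch below is a heuristic only: sweepScript is a hypothetical helper, the opening list contains just the two examples named above, and treating any sentence of two words or fewer as a fragment is an assumption. Translation tells and the read-aloud test still need a human ear.

```javascript
// Partial, heuristic lint for the sweep above. The opening list and the
// "two words or fewer" fragment threshold are assumptions for illustration.
const FORMULAIC_OPENINGS = ["Stellen Sie sich vor", "Entdecken Sie"];

function sweepScript(script) {
  const issues = [];
  if (script.includes("\u2014")) issues.push("contains an em-dash");

  // Split after sentence-ending punctuation, keeping the punctuation attached.
  const sentences = script
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter(Boolean);

  for (const sentence of sentences) {
    const wordCount = sentence.split(/\s+/).length;
    if (wordCount <= 2) issues.push(`fragment: "${sentence}"`);
    if (FORMULAIC_OPENINGS.some((opening) => sentence.startsWith(opening))) {
      issues.push(`formulaic opening: "${sentence}"`);
    }
  }
  return issues;
}

console.log(sweepScript("Premium. Schnell. Entdecken Sie Ihre neue Pergola."));
console.log(sweepScript("In nur zwei Tagen baut unser Team Ihre Lamellenpergola auf."));
```

An empty result does not mean the script passes; it only means the mechanical checks found nothing.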
If the user gave you 2-3 script options to choose from (like a teammate did), present them first, wait for pick, then proceed.
Step 4 — Fill plan.json
Edit the plan file the script wrote. Example for ACME Agency's Lamellenpergola ad:
{
  "version": 1,
  "client": "ACME Agency",
  "clientKey": "ACME Agency",
  "slug": "<id>",
  "source": {
    "path": "...",
    "duration": 183.5,
    "ratio": "16:9",
    "width": 1920,
    "height": 1080,
    "fps": 30
  },
  "target": {
    "duration": 25,
    "ratios": ["16:9", "9:16"],
    "language": "de",
    "shotMaxDuration": 3.0
  },
  "shots": [
    { "id": "h1", "start": 12.5, "duration": 2.5, "tag": "hook", "note": "wide beauty shot, finished pergola w/ furniture" },
    { "id": "h2", "start": 48.0, "duration": 1.8, "tag": "hook", "note": "close angle lamel — sun through slats" },
    { "id": "m1", "start": 64.0, "duration": 2.2, "tag": "montage", "note": "aluminum profile pickup" },
    { "id": "m2", "start": 72.0, "duration": 2.0, "tag": "montage", "note": "drill into mounting bracket" },
    { "id": "m3", "start": 88.0, "duration": 2.5, "tag": "montage", "note": "lifting crossbeam into place" },
    { "id": "m4", "start": 104.0, "duration": 1.8, "tag": "montage", "note": "snap-fit assembly wide" },
    { "id": "m5", "start": 118.0, "duration": 2.0, "tag": "montage", "note": "scaffold overhead shot" },
    { "id": "c1", "start": 152.0, "duration": 2.2, "tag": "closing", "note": "lamels rotating — slow motion" },
    { "id": "c2", "start": 165.0, "duration": 2.5, "tag": "closing", "note": "view through olive trees to finished pergola" },
    { "id": "c3", "start": 172.0, "duration": 2.5, "tag": "closing", "note": "aerial top-down of closed lamels" },
    { "id": "cta", "start": 178.0, "duration": 3.0, "tag": "cta", "note": "final beauty hold — VO delivers CTA over this" }
  ],
  "voiceover": {
    "enabled": true,
    "language": "de",
    "voiceId": "j46AY0iVY3oHcnZbgEJg",
    "script": "Von der leeren Fläche zu Ihrem persönlichen Rückzugsort. In nur zwei Tagen baut unser Team Ihre Lamellenpergola auf. Präzise, wetterfest, maßgefertigt. Kein Bausatz, ein Premiumprodukt. Das Ergebnis: Ihr neuer Lieblingsplatz, bei jedem Wetter. Jetzt Angebot anfordern und zehn Prozent Rabatt sichern.",
    "stability": 0.5,
    "similarityBoost": 0.75
  },
  "audio": { "mode": "replace", "videoVolume": 0.3, "audioVolume": 1.0 },
  "logo": null,
  "logoPosition": "br"
}
Key fields to fill:
- target.duration: integer seconds
- target.ratios: array of "16:9", "9:16", "1:1", "4:5"
- target.language: two-letter code (de / hr / bs / en)
- target.shotMaxDuration: keep at 3.0 unless the brief explicitly allows slower pacing
- shots[]: each with id, start (source timestamp), duration, tag, and a short note
- voiceover.script: the full narration text, single string
- voiceover.voiceId: use the auto-selected one from the stub unless the user asked for a specific voice
- logo: absolute path to a PNG if the client has a transparent logo (e.g. .../<Client>/logo-ref.png). Leave null if unsure.
Audio mode choice:
- replace: VO only, source audio dropped. Default for ad edits.
- mix: VO + source audio both audible. Use when the source has ambient sound (machinery, footsteps) that adds flavour.
- duck: source audio plays but ducks under the VO. Use when there's music or dialogue in the source that's worth preserving.
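Before moving on to the approval gate, it can help to check the filled plan against the field rules above. The validator below is a sketch, not the script's own validation: validatePlanFields is a hypothetical name, and treating note as optional is an assumption.

```javascript
// Hypothetical validator for the fields described above; returns a list of problems.
const ALLOWED_RATIOS = ["16:9", "9:16", "1:1", "4:5"];
const ALLOWED_LANGUAGES = ["de", "hr", "bs", "en"];
const ALLOWED_AUDIO_MODES = ["replace", "mix", "duck"];

function validatePlanFields(plan) {
  const problems = [];
  const { target = {}, shots = [], voiceover = {}, audio = {} } = plan;
  const ratios = target.ratios ?? [];

  if (!Number.isInteger(target.duration) || target.duration <= 0) {
    problems.push("target.duration must be a positive integer (seconds)");
  }
  if (ratios.length === 0) problems.push("target.ratios is empty");
  for (const ratio of ratios) {
    if (!ALLOWED_RATIOS.includes(ratio)) problems.push(`unsupported ratio: ${ratio}`);
  }
  if (!ALLOWED_LANGUAGES.includes(target.language)) {
    problems.push(`unsupported language: ${target.language}`);
  }
  if (shots.length === 0) problems.push("shots is empty");
  shots.forEach((shot, i) => {
    // note is treated as optional here (assumption).
    for (const key of ["id", "start", "duration", "tag"]) {
      if (shot[key] === undefined) problems.push(`shots[${i}] is missing ${key}`);
    }
  });
  if (voiceover.enabled && !String(voiceover.script ?? "").trim()) {
    problems.push("voiceover.script is empty");
  }
  if (!ALLOWED_AUDIO_MODES.includes(audio.mode)) {
    problems.push(`unsupported audio.mode: ${audio.mode}`);
  }
  return problems;
}

// A freshly written stub (empty shots, empty script) should fail on both counts.
console.log(validatePlanFields({
  target: { duration: 25, ratios: ["16:9"], language: "de" },
  shots: [],
  voiceover: { enabled: true, script: "" },
  audio: { mode: "replace" },
}));
```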
Step 5 — Script approval gate (REQUIRED)
Before calling execute, post a summary to the user:
═══════════════════════════════════════════════════════════════
VIDEO EDIT PLAN — ACME Agency | Lamellenpergola | 25s | 11 shots | DE
═══════════════════════════════════════════════════════════════
Shots:                          Duration: 25.0s (target 25s, within tolerance)
h1 12.5s +2.5s hook
h2 48.0s +1.8s hook
m1 64.0s +2.2s montage
... (condensed)
cta 178.0s +3.0s cta
Voiceover (DE, Chris Norddeutscher):
"Von der leeren Fläche zu Ihrem persönlichen Rückzugsort. In nur zwei Tagen..."
Variants: 16:9 (primary), 9:16 (vertical crop)
Audio mode: replace (source audio dropped)
═══════════════════════════════════════════════════════════════
Ask: "Approve and execute?" Wait for explicit go-ahead before spending ElevenLabs credits.
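The shot rows and the total in that summary can be derived straight from the plan, which avoids arithmetic slips when posting it. A minimal sketch, with summarizeShots as a hypothetical formatter and column widths chosen only for illustration:

```javascript
// Hypothetical formatter for the approval summary's shot rows and total duration.
function summarizeShots(plan) {
  const total = plan.shots.reduce((sum, shot) => sum + shot.duration, 0);
  const lines = plan.shots.map((shot) => {
    const id = shot.id.padEnd(4);
    const start = shot.start.toFixed(1).padStart(6);
    return `${id} ${start}s  +${shot.duration.toFixed(1)}s  ${shot.tag}`;
  });
  // Round once at the end so float noise from the sum does not leak into the report.
  return { total: Number(total.toFixed(1)), lines };
}

const summary = summarizeShots({
  shots: [
    { id: "h1", start: 12.5, duration: 2.5, tag: "hook" },
    { id: "h2", start: 48.0, duration: 1.8, tag: "hook" },
  ],
});
console.log(summary.lines.join("\n"));
console.log(`Duration: ${summary.total}s`);
```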
Step 6 — Run execute phase
node "ACME Agency/scripts/video_edit.mjs" execute "<ClientKey>" --plan "<path-to-plan.json>"
The script:
- Cuts each shot from source (re-encode, for uniform params)
- Concats into master_silent.mp4
- Generates ElevenLabs VO → vo/voiceover.mp3
- Muxes VO onto silent master → master_vo.mp4
- Applies logo overlay if plan.logo is set → master_branded.mp4
- Exports one final MP4 per requested ratio → final/<ClientKey>_<slug>_<ratio>.mp4
- Uploads finals + plan.json to Drive Klijenti/<Client>/Video Ads/<YYYY>/<MM>/<slug>/
- Posts Slack report to the client's channel
- Appends to video_edit_manifest.json in the client's video-ads folder
- Writes a knowledge inbox entry (future, not wired yet)
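To get a feel for what the per-ratio export does to a 16:9 source, the crop geometry can be worked out by hand. The sketch below assumes a plain centered crop, which may not be what the reframe helper in lib/ffmpeg.mjs actually does; centerCrop is a hypothetical function, and the printed string follows FFmpeg's crop=w:h:x:y filter argument order.

```javascript
// Largest centered crop of the source that matches a target aspect ratio.
// Assumption: plain center crop; the real reframe step may choose a different window.
function centerCrop(srcWidth, srcHeight, ratio) {
  const [rw, rh] = ratio.split(":").map(Number);
  const toEven = (n) => Math.floor(n / 2) * 2; // yuv420p output needs even dimensions

  let width = srcWidth;
  let height = (srcWidth * rh) / rw;
  if (height > srcHeight) {
    // Source is too wide for the target: keep full height and trim the sides.
    height = srcHeight;
    width = (srcHeight * rw) / rh;
  }
  width = toEven(width);
  height = toEven(height);
  return {
    width,
    height,
    x: Math.floor((srcWidth - width) / 2),
    y: Math.floor((srcHeight - height) / 2),
  };
}

for (const ratio of ["16:9", "9:16", "1:1", "4:5"]) {
  const c = centerCrop(1920, 1080, ratio);
  console.log(ratio, `crop=${c.width}:${c.height}:${c.x}:${c.y}`);
}
```

For a 1920×1080 source, the 9:16 variant keeps only a 606-pixel-wide column, which is why hero shots with the subject off-center deserve a second look before requesting vertical variants.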
Step 7 — Verification
After execute completes, confirm:
- [ ] Each final/*.mp4 exists and has non-zero size
- [ ] getMeta() duration on each final is within ±10% of plan.target.duration
- [ ] Drive upload returned URLs for every ratio (check Slack report has all links)
- [ ] No silent ElevenLabs fallback (if VO failed, the script would have errored out loudly)
- [ ] Slack report posted to the client channel (not #your-channel default)
- [ ] plan.json uploaded to Drive alongside the finals (for rerunnability)
If any step failed, re-run just the failed piece — the plan is the single source of truth and intermediate files stay on disk.
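The first two checklist items reduce to a size check and a tolerance check. The sketch below works on plain records so it stays independent of the real helpers: verifyFinals is a hypothetical name, and the sizeBytes and durationSec values are assumed to have been collected separately (for example from a file stat and from getMeta()). The paths and numbers are made up.

```javascript
// Hypothetical post-run check over records like { path, sizeBytes, durationSec }.
function verifyFinals(finals, targetDuration, tolerance = 0.1) {
  const failures = [];
  const percent = Math.round(tolerance * 100);
  for (const file of finals) {
    if (!(file.sizeBytes > 0)) {
      failures.push(`${file.path}: missing or empty`);
      continue; // no point checking the duration of an empty file
    }
    const deviation = Math.abs(file.durationSec - targetDuration) / targetDuration;
    if (deviation > tolerance) {
      failures.push(`${file.path}: ${file.durationSec}s is outside ±${percent}% of ${targetDuration}s`);
    }
  }
  return failures;
}

const failures = verifyFinals(
  [
    { path: "final/a.mp4", sizeBytes: 18200000, durationSec: 25.6 },
    { path: "final/b.mp4", sizeBytes: 0, durationSec: 0 },
    { path: "final/c.mp4", sizeBytes: 9100000, durationSec: 31.0 },
  ],
  25
);
console.log(failures);
```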
Iteration patterns
The plan-based design makes iterations cheap. Common ones:
| Change | What to edit | Rerun |
|---|---|---|
| Swap a shot | Edit shots[i].start or duration in plan.json | execute |
| Rewrite VO | Edit voiceover.script | execute |
| Different voice | Change voiceover.voiceId | execute |
| Add 9:16 variant | Add "9:16" to target.ratios | execute |
| Different duration | Change target.duration, re-pick shots | execute |
| New source video | Run prepare again with new slug | prepare → execute |
For small script tweaks, you can also skip prepare entirely and just rerun execute — the cut clips get regenerated each time, but the source is already downloaded.
Hard constraints
| Rule | Why |
|---|---|
| Vision-based shot selection happens in Claude, never in the script | The script doesn't know what "good footage" looks like — it's deterministic, not judgmental |
| ElevenLabs failures must be loud, never silent | The OpenAI TTS fallback on a teammate's first run produced robotic audio that shipped before the user caught it. Never again. |
| shotMaxDuration default is 3.0s | Meta ads with >3s cuts have visibly lower retention. Override only for explicit creative direction. |
| target.duration ± 20% is a warning, not a hard stop | Sometimes you want 23s or 28s, but if it's ±50%, the shot list is broken and the skill should refuse to execute |
| Plan version must match PLAN_VERSION in the script | Schema breaks will silently destroy edits otherwise |
| Script approval gate before execute | ElevenLabs credits cost money, and Drive uploads clutter client folders |
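The version and duration rules in the table combine into a simple three-way gate. This is a sketch under stated assumptions: gatePlan is a hypothetical function, PLAN_VERSION is set to 1 only because the example plan uses "version": 1, and the 50% refusal threshold is read from the table rather than from the script.

```javascript
// Assumed value for illustration; the real constant lives in the script.
const PLAN_VERSION = 1;

// Classifies a plan before execute: "ok", "warn" (more than 20% off target), or "refuse".
function gatePlan(plan) {
  if (plan.version !== PLAN_VERSION) {
    return { status: "refuse", reason: `plan version ${plan.version} does not match ${PLAN_VERSION}` };
  }
  const total = plan.shots.reduce((sum, shot) => sum + shot.duration, 0);
  const target = plan.target.duration;
  const deviation = Math.abs(total - target) / target;
  const detail = `shot total ${total.toFixed(1)}s is ${Math.round(deviation * 100)}% off the ${target}s target`;

  if (deviation >= 0.5) return { status: "refuse", reason: detail };
  if (deviation > 0.2) return { status: "warn", reason: detail };
  return { status: "ok", reason: "" };
}

// Ten 1.0s shots against a 25s target: 60% short, so the gate refuses.
const shortPlan = {
  version: 1,
  target: { duration: 25 },
  shots: Array.from({ length: 10 }, (_, i) => ({ id: `s${i}`, duration: 1.0 })),
};
console.log(gatePlan(shortPlan));
```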
When to use this skill vs generators
| User says | Skill |
|---|---|
| "Edit this video" + Drive link | /video-edit |
| "Cut this raw footage" | /video-edit |
| "Make a 15s ad from this 3 min shoot" | /video-edit |
| "Make us a video ad for X" (no source) | /<id> |
| "Make a video like this reference" (reference but no editable footage) | /<id> with reference |
| "Explainer-style video of someone talking" | /heygen-ad-generator |
| "Branded motion graphic" | /motion-ad-generator |
The creative director agent routes on presence of editable source footage — if the user handed you an MP4 they already own, it's /video-edit.
Critical files
- [ACME Agency/scripts/video_edit.mjs](ACME Agency/scripts/video_edit.mjs) — two-phase entrypoint (prepare → execute)
- [ACME Agency/scripts/lib/ffmpeg.mjs](ACME Agency/scripts/lib/ffmpeg.mjs) — core FFmpeg helpers (getMeta, cutClip, concatClips, muxAudio, reframe, logoOverlay, textOverlay, exportFinal)
- [ACME Agency/scripts/lib/elevenlabs.mjs](ACME Agency/scripts/lib/elevenlabs.mjs) — voiceover generation
- [ACME Agency/scripts/lib/google_drive.mjs](ACME Agency/scripts/lib/google_drive.mjs) — download source, upload finals
- [ACME Agency/scripts/lib/slack.mjs](ACME Agency/scripts/lib/slack.mjs) — report posting
Reference example
ACME Agency/clients/ACME Agency/video-ads/ — the ad-hoc prototype that motivated this skill:
- pergola-montaz-source.mp4 (751 MB raw 3-min shoot)
- clips/h1,h2,m1-m5,c1-c3,cta.mp4: the cut segments
- vo/hook,middle,closing,cta.mp3: per-segment voiceovers
- <id>.mp4: the 25.6s final
Rebuilding this with the skill: create a plan.json pointing at pergola-montaz-source.mp4, fill in the shots with the same timestamps, run execute. Result should match the prototype.