# /video-edit

> Edit an existing source video (from Drive or local) into a finished ad.


# Skill: `/video-edit`

## Overview

Takes **existing footage** (typically a raw 1–5 minute shoot from the client) and produces a finished ad — cut down, voiced over, reframed, delivered.

This is the only skill in the workspace that **edits** real-world video. All other video skills (`/<id>`, `/heygen-ad-generator`, `/motion-ad-generator`, `/video-ad-generator`, `/static-ad-generator`) **generate** new footage via Krea / Kling / Veo / Creatomate / HeyGen. If the user already has footage, this is the skill.

Built after one-off editing requests started piling up (see `ACME Agency/clients/ACME Agency/video-ads/` for the prototype that motivated this skill).

**What it produces:**
- A cut-down MP4 per requested aspect ratio (16:9, 9:16, 1:1, 4:5)
- ElevenLabs voiceover in the target language (auto-selected from `client.market`)
- Drive upload under `Klijenti/<Client>/Video Ads/<Year>/<Month>/<slug>/`
- Slack report with variant links
- A `plan.json` that documents every cut, VO line, and decision — rerunnable

**What it does NOT do:**
- Generate new footage (use the generators)
- Motion graphics beyond simple text + logo overlays (use `/motion-ad-generator`)
- Auto-transcribe the source (later — Whisper is in the FFmpeg build, add a flag when needed)
- Beat-matched cuts (no audio analysis — pacing is driven by the shot plan)

---

## The Golden Rules

1. **Vision-based shot selection is YOUR job.** The `prepare` phase extracts preview frames to disk — you read them with the Read tool and pick which timestamps make the cut. The Node script does not guess shot boundaries.

2. **Pacing is enforced.** No shot longer than `shotMaxDuration` (default 3.0s). The script warns and reports any shot that exceeds it.

3. **ElevenLabs only — no silent fallback.** If ElevenLabs fails, the script errors loudly. If you must fall back, pre-generate an MP3 with another TTS, drop it in `vo/voiceover.mp3`, and rerun with `--no-voice`. The Slack report must mention any fallback explicitly.

4. **Plan.json is the contract.** Every decision — shot timestamps, VO script, target ratios, audio mode — goes into `plan.json`. The script never asks the user a question; it reads the plan and executes.

5. **One source of truth for the final duration.** `plan.target.duration` is the contract. The script warns if the sum of shot durations deviates more than 20% from target.
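
The pacing checks in rules 2 and 5 can be sketched as a pure function. This is a hypothetical illustration, not the code in `video_edit.mjs`: the name `checkPacing` and the return shape are assumptions, and the 50% refusal threshold is taken from the hard-constraints table later in this document.

```javascript
// Hypothetical sketch of the pacing checks; not the actual video_edit.mjs code.
function checkPacing(plan) {
  const { duration: target, shotMaxDuration = 3.0 } = plan.target;
  // Rule 2: collect every shot longer than shotMaxDuration.
  const longShots = plan.shots
    .filter((s) => s.duration > shotMaxDuration)
    .map((s) => s.id);
  // Rule 5: compare the summed shot durations against target.duration.
  const total = plan.shots.reduce((sum, s) => sum + s.duration, 0);
  const deviation = Math.abs(total - target) / target;
  let status = "ok";
  if (deviation >= 0.5) status = "refuse"; // shot list is broken
  else if (deviation > 0.2 || longShots.length > 0) status = "warn";
  return { total, deviation, longShots, status };
}
```

A caller would print the `longShots` ids and the deviation on `warn`, and stop before any ElevenLabs call on `refuse`.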

---

## Preflight (before touching Node)

1. **Client exists** in `clients.json` (3-step cascade — clients.json → registry sheet → API discovery). Resolve the canonical key.
2. **Source video provided.** Either a Drive URL (`https://drive.google.com/file/d/<ID>/view`) or a local path. If the user pasted only a filename without a folder, ask which Drive folder it's in.
3. **Required env vars:** `ELEVENLABS_API_KEY`, `<id>`. Abort if missing.
4. **Disk space** ≥ 2 GB free (source videos can be 500 MB – 1 GB for a 3-min shoot).
5. **`client.drive_folder_id` is set.** If not, auto-discover via Drive search (don't just skip).

---

## Step-by-step workflow

### Step 0 — Identify client + parse source

Extract from the user's message:
- **Client name** → match to `clients.json` key
- **Source** → Drive URL, Drive file name + folder hint, or local path
- **Instructions** → structure, language, duration, tone, CTA text, ratios, any specific shots to include/exclude

If any of those are missing, ask before running anything. Examples of ambiguous briefs that need clarification:
- No explicit duration → ask ("15s, 25s, or 30s?")
- No explicit target ratio → default to the source's ratio, mention you'll add others if asked
- No explicit language → infer from `client.market`, mention the inference
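
As a rough illustration of the source parsing in this step, the sketch below classifies the user's input as a Drive file, a local path, or a bare filename that needs a follow-up question. The function name `parseSource` and the result shape are hypothetical; only the Drive URL pattern comes from the Preflight section.

```javascript
// Hypothetical helper: classify the user's source as a Drive file or a local path.
function parseSource(input) {
  const trimmed = input.trim();
  // Matches https://drive.google.com/file/d/<ID>/view and captures <ID>.
  const m = trimmed.match(/^https:\/\/drive\.google\.com\/file\/d\/([\w-]+)/);
  if (m) return { kind: "drive", fileId: m[1] };
  // Anything containing a path separator is treated as a local path.
  if (trimmed.includes("/") || trimmed.includes("\\")) {
    return { kind: "local", path: trimmed };
  }
  // A bare filename is ambiguous: ask which Drive folder it is in.
  return { kind: "ambiguous", name: trimmed };
}
```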

### Step 1 — Run `prepare` phase

```bash
node "ACME Agency/scripts/video_edit.mjs" prepare "<ClientKey>" "<drive-url-or-path>" --slug "YYYY-MM-DD-<short-descriptor>"
```

Example:
```bash
node "ACME Agency/scripts/video_edit.mjs" prepare "ACME Agency" \
  "https://drive.google.com/file/d/<id>/view" \
  --slug "<id>"
```

What this does:
- Downloads the source to `clients/<Client>/video-ads/<slug>/source/source.mp4`
- Probes the source (duration, dimensions, fps, audio)
- Extracts preview frames every 2s to `frames/` (cap 90 — adjust with `--max-frames`)
- Writes a **stub** `plan.json` with empty `shots`, empty `voiceover.script`, default target

### Step 2 — Read the frames (vision-based shot selection)

This is the judgment-heavy step. Use the Read tool to read the frames from `<campaignDir>/frames/` in order. You're looking for:

- **Hero/beauty shots** — wide establishing shots of the finished product, cinematic angles, good lighting
- **Process shots** — the interesting parts of the work (assembly, drilling, lifting, installing)
- **Detail shots** — close-ups that show craftsmanship (materials, joins, mechanisms)
- **Closing shots** — the transformation, the result, people enjoying the finished thing
- **Anti-picks** — bland wides, safety-gear-only frames, empty tripod shots, motion-blur clips

Map frames to timestamps from the filename (`frame_012_t22.0.jpg` → t=22s) and build a candidate list in your head.
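
The filename-to-timestamp mapping can be sketched as follows, assuming every frame follows the `frame_<index>_t<seconds>.jpg` pattern shown in the example. The helper name `parseFrameName` is hypothetical.

```javascript
// Assumes the frame_<index>_t<seconds>.jpg naming shown above.
function parseFrameName(name) {
  const m = name.match(/^frame_(\d+)_t(\d+(?:\.\d+)?)\.jpg$/);
  if (!m) return null; // not a frame file
  return { index: Number(m[1]), seconds: Number(m[2]) };
}

const frame = parseFrameName("frame_012_t22.0.jpg");
console.log(frame); // { index: 12, seconds: 22 }
```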

**Pacing math.** Given `target.duration` and `shotMaxDuration`, you need at minimum `ceil(target / shotMaxDuration)` shots. For a 25s ad at 3s max, that's ≥9 shots. Aim higher — 11–13 shots for a 25s ad gives you room for varied pacing (a 1.5s punch cut followed by a 2.5s beauty hold).
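
The same arithmetic as a small sketch (both function names are illustrative, not part of the script):

```javascript
// Minimum shot count implied by the pacing rule: ceil(target / shotMaxDuration).
function minShots(targetDuration, shotMaxDuration = 3.0) {
  return Math.ceil(targetDuration / shotMaxDuration);
}

// Average shot length for a chosen shot count, useful when aiming above the minimum.
function avgShotLength(targetDuration, shotCount) {
  return targetDuration / shotCount;
}

console.log(minShots(25));                     // 9
console.log(avgShotLength(25, 12).toFixed(2)); // 2.08
```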

### Step 3 — Write the voiceover script

Apply direct-response principles from `.claude/skills/copywrite/PRINCIPLES.md` if you haven't already loaded them. Beat structure for a 25s editing ad:

```
HOOK (3-5s)    — benefit-led promise OR intrigue question
BODY (10-15s)  — what the product is, why it's different, one concrete proof
CLOSE (5-7s)   — the transformation + CTA
```

**Word budget:**
- German / Croatian / Bosnian: `duration × 2.4` words max
- English: `duration × 2.6` words max

25s German → ~60 words. 15s → ~36 words.
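
A minimal sketch of the budget check, using the two rates above. The helper names are hypothetical, and the rates are stored as words per 10 seconds so the calculation stays in integers.

```javascript
// Rates from the word budget above, stored as words per 10 seconds.
const WORDS_PER_10S = { de: 24, hr: 24, bs: 24, en: 26 };

function wordBudget(durationSeconds, language) {
  const rate = WORDS_PER_10S[language];
  if (rate === undefined) throw new Error(`No word rate for language: ${language}`);
  return Math.floor((durationSeconds * rate) / 10);
}

// Whitespace-separated word count of a voiceover script.
function countWords(script) {
  return script.trim().split(/\s+/).filter(Boolean).length;
}

console.log(wordBudget(25, "de")); // 60
console.log(wordBudget(15, "de")); // 36
```

Comparing `countWords(script)` against `wordBudget(duration, language)` before the approval gate catches an overlong script before any credits are spent.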

**Anti-AI sweep** (required — do NOT skip):
- No em-dashes mid-sentence (—)
- No 2-word fragments ("Premium. Schnell.")
- No formulaic openings ("Stellen Sie sich vor…", "Entdecken Sie…")
- No literal translation tells
- Read it aloud — if it sounds translated, rewrite
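
The mechanical parts of this sweep can be approximated with a small lint pass. This is a hypothetical sketch: it only covers em-dashes, one- or two-word fragments, and the two German openings quoted above. Translation tells and the read-aloud check still need human judgment.

```javascript
// Hypothetical lint pass for the mechanical parts of the anti-AI sweep.
const FORMULAIC_OPENINGS = ["Stellen Sie sich vor", "Entdecken Sie"];

function sweepScript(script) {
  const issues = [];
  if (script.includes("\u2014")) issues.push("em-dash");
  // Split into sentences on terminal punctuation, then flag 1-2 word fragments.
  const sentences = script.split(/[.!?]+/).map((s) => s.trim()).filter(Boolean);
  for (const s of sentences) {
    if (s.split(/\s+/).length <= 2) issues.push(`fragment: "${s}"`);
  }
  for (const opening of FORMULAIC_OPENINGS) {
    if (script.trim().startsWith(opening)) {
      issues.push(`formulaic opening: "${opening}"`);
    }
  }
  return issues;
}
```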

If the user asked for 2-3 script options to choose from (as a teammate did), present them first, wait for their pick, then proceed.

### Step 4 — Fill `plan.json`

Edit the plan file the script wrote. Example for ACME Agency's Lamellenpergola ad:

```json
{
  "version": 1,
  "client": "ACME Agency",
  "clientKey": "ACME Agency",
  "slug": "<id>",
  "source": {
    "path": "...",
    "duration": 183.5,
    "ratio": "16:9",
    "width": 1920,
    "height": 1080,
    "fps": 30
  },
  "target": {
    "duration": 25,
    "ratios": ["16:9", "9:16"],
    "language": "de",
    "shotMaxDuration": 3.0
  },
  "shots": [
    { "id": "h1", "start": 12.5, "duration": 2.5, "tag": "hook",    "note": "wide beauty shot, finished pergola w/ furniture" },
    { "id": "h2", "start": 48.0, "duration": 1.8, "tag": "hook",    "note": "close angle lamel — sun through slats" },
    { "id": "m1", "start": 64.0, "duration": 2.2, "tag": "montage", "note": "aluminum profile pickup" },
    { "id": "m2", "start": 72.0, "duration": 2.0, "tag": "montage", "note": "drill into mounting bracket" },
    { "id": "m3", "start": 88.0, "duration": 2.5, "tag": "montage", "note": "lifting crossbeam into place" },
    { "id": "m4", "start": 104.0,"duration": 1.8, "tag": "montage", "note": "snap-fit assembly wide" },
    { "id": "m5", "start": 118.0,"duration": 2.0, "tag": "montage", "note": "scaffold overhead shot" },
    { "id": "c1", "start": 152.0,"duration": 2.2, "tag": "closing", "note": "lamels rotating — slow motion" },
    { "id": "c2", "start": 165.0,"duration": 2.5, "tag": "closing", "note": "view through olive trees to finished pergola" },
    { "id": "c3", "start": 172.0,"duration": 2.5, "tag": "closing", "note": "aerial top-down of closed lamels" },
    { "id": "cta","start": 178.0,"duration": 3.0, "tag": "cta",     "note": "final beauty hold — VO delivers CTA over this" }
  ],
  "voiceover": {
    "enabled": true,
    "language": "de",
    "voiceId": "j46AY0iVY3oHcnZbgEJg",
    "script": "Von der leeren Fläche zu Ihrem persönlichen Rückzugsort. In nur zwei Tagen baut unser Team Ihre Lamellenpergola auf. Präzise, wetterfest, maßgefertigt. Kein Bausatz, ein Premiumprodukt. Das Ergebnis: Ihr neuer Lieblingsplatz, bei jedem Wetter. Jetzt Angebot anfordern und zehn Prozent Rabatt sichern.",
    "stability": 0.5,
    "similarityBoost": 0.75
  },
  "audio": { "mode": "replace", "videoVolume": 0.3, "audioVolume": 1.0 },
  "logo": null,
  "logoPosition": "br"
}
```

**Key fields to fill:**
- `target.duration` — integer seconds
- `target.ratios` — array of `"16:9"`, `"9:16"`, `"1:1"`, `"4:5"`
- `target.language` — two-letter code (de/hr/bs/en)
- `target.shotMaxDuration` — keep at 3.0 unless the brief explicitly allows slower pacing
- `shots[]` — each with `id`, `start` (source timestamp), `duration`, `tag`, and a short `note`
- `voiceover.script` — the full narration text, single string
- `voiceover.voiceId` — use the auto-selected one from the stub unless the user asked for a specific voice
- `logo` — absolute path to a PNG if the client has a transparent logo (e.g. `.../<Client>/logo-ref.png`). Leave null if unsure.
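
Before the approval gate, the key fields can be sanity-checked with something like the sketch below. The function `checkPlanFields` is hypothetical, and the allowed ratio and language lists are copied from the bullets above on the assumption that they are exhaustive.

```javascript
// Hypothetical field check for plan.json; allowed values are taken from the list above.
const ALLOWED_RATIOS = ["16:9", "9:16", "1:1", "4:5"];
const ALLOWED_LANGUAGES = ["de", "hr", "bs", "en"];

function checkPlanFields(plan) {
  const errors = [];
  const { target, shots, voiceover } = plan;
  if (!Number.isInteger(target.duration) || target.duration <= 0) {
    errors.push("target.duration must be a positive integer (seconds)");
  }
  for (const r of target.ratios) {
    if (!ALLOWED_RATIOS.includes(r)) errors.push(`unknown ratio: ${r}`);
  }
  if (!ALLOWED_LANGUAGES.includes(target.language)) {
    errors.push(`unknown language: ${target.language}`);
  }
  if (shots.length === 0) errors.push("shots[] is empty");
  shots.forEach((s, i) => {
    for (const key of ["id", "start", "duration", "tag", "note"]) {
      if (s[key] === undefined) errors.push(`shots[${i}] is missing ${key}`);
    }
  });
  if (voiceover.enabled && !voiceover.script.trim()) {
    errors.push("voiceover.script is empty");
  }
  return errors;
}
```

An unfilled stub from `prepare` should fail this check on at least the empty `shots` and the empty script.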

**Audio mode choice:**
- `replace` — VO only, source audio dropped. Default for ad edits.
- `mix` — VO + source audio both audible. Use when the source has ambient (machinery, footsteps) that adds flavour.
- `duck` — source audio plays but ducks under the VO. Use when there's music or dialogue in the source that's worth preserving.
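
One way the three modes could map onto FFmpeg audio filters is sketched below. This is an assumption for illustration, not how `ffmpeg.mjs` is known to work: it assumes input 0 carries the source audio and input 1 is the voiceover, that `videoVolume` scales the source and `audioVolume` scales the VO, and the `sidechaincompress` threshold and ratio values are placeholders. The filters themselves (`volume`, `amix`, `asplit`, `sidechaincompress`) are standard FFmpeg filters.

```javascript
// Hypothetical mapping from plan.audio to an FFmpeg -filter_complex string.
function audioFilterGraph(audio) {
  const { mode, videoVolume = 0.3, audioVolume = 1.0 } = audio;
  switch (mode) {
    case "replace":
      // VO only: no filter graph needed, the caller maps just the VO stream.
      return null;
    case "mix":
      // Scale both tracks, then sum them.
      return (
        `[0:a]volume=${videoVolume}[src];` +
        `[1:a]volume=${audioVolume}[vo];` +
        `[src][vo]amix=inputs=2:duration=first[aout]`
      );
    case "duck":
      // Use a copy of the VO as the sidechain key that compresses the source audio.
      return (
        `[1:a]volume=${audioVolume},asplit=2[vo][key];` +
        `[0:a][key]sidechaincompress=threshold=0.05:ratio=8[ducked];` +
        `[ducked][vo]amix=inputs=2:duration=first[aout]`
      );
    default:
      throw new Error(`Unknown audio mode: ${mode}`);
  }
}
```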

### Step 5 — Script approval gate (REQUIRED)

Before calling `execute`, post a summary to the user:

```
═══════════════════════════════════════════════════════════════
VIDEO EDIT PLAN — ACME Agency | Lamellenpergola | 25s | 11 shots | DE
═══════════════════════════════════════════════════════════════

Shots:                      Duration: 25.0s  (target 25s, within tolerance)
  h1  12.5s  +2.5s  hook
  h2  48.0s  +1.8s  hook
  m1  64.0s  +2.2s  montage
  ... (condensed)
  cta 178.0s +3.0s  cta

Voiceover (DE, Chris Norddeutscher):
  "Von der leeren Fläche zu Ihrem persönlichen Rückzugsort. In nur zwei Tagen..."

Variants: 16:9 (primary), 9:16 (vertical crop)
Audio mode: replace (source audio dropped)
═══════════════════════════════════════════════════════════════
```

Ask: "Approve and execute?" Wait for explicit go-ahead before spending ElevenLabs credits.
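
A summary like the example above can be generated straight from `plan.json`. The formatter below is a hypothetical sketch that approximates the layout without the box-drawing rules; it is not a function the script is known to export.

```javascript
// Hypothetical formatter for the approval summary; layout approximates the example above.
function formatPlanSummary(plan) {
  const total = plan.shots.reduce((sum, s) => sum + s.duration, 0);
  const header =
    `VIDEO EDIT PLAN | ${plan.client} | ${plan.target.duration}s | ` +
    `${plan.shots.length} shots | ${plan.target.language.toUpperCase()}`;
  // One line per shot: id, source timestamp, duration, tag.
  const shotLines = plan.shots.map((s) => {
    const start = (s.start.toFixed(1) + "s").padEnd(8);
    return `  ${s.id.padEnd(4)}${start}+${s.duration.toFixed(1)}s  ${s.tag}`;
  });
  return [
    header,
    `Shots: ${total.toFixed(1)}s total (target ${plan.target.duration}s)`,
    ...shotLines,
    `Voiceover (${plan.voiceover.language.toUpperCase()}): "${plan.voiceover.script}"`,
    `Variants: ${plan.target.ratios.join(", ")}`,
    `Audio mode: ${plan.audio.mode}`,
  ].join("\n");
}
```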

### Step 6 — Run `execute` phase

```bash
node "ACME Agency/scripts/video_edit.mjs" execute "<ClientKey>" --plan "<path-to-plan.json>"
```

The script:
1. Cuts each shot from source (re-encode, for uniform params)
2. Concats into `master_silent.mp4`
3. Generates ElevenLabs VO → `vo/voiceover.mp3`
4. Muxes VO onto silent master → `master_vo.mp4`
5. Applies logo overlay if `plan.logo` is set → `master_branded.mp4`
6. Exports one final MP4 per requested ratio → `final/<ClientKey>_<slug>_<ratio>.mp4`
7. Uploads finals + plan.json to Drive `Klijenti/<Client>/Video Ads/<YYYY>/<MM>/<slug>/`
8. Posts Slack report to the client's channel
9. Appends to `video_edit_manifest.json` in the client's video-ads folder
10. Writes a knowledge inbox entry (future — not wired yet)
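
For the per-ratio export in step 6, the crop geometry for a centered reframe can be sketched as below. This is an illustrative assumption about how a vertical crop could be computed, not the actual `reframe` helper. The returned string uses FFmpeg's `crop=w:h:x:y` filter syntax.

```javascript
// Hypothetical center-crop math for one target ratio; not the actual reframe() helper.
function centerCrop(srcWidth, srcHeight, ratio) {
  const [rw, rh] = ratio.split(":").map(Number);
  let w, h;
  if (rw * srcHeight < rh * srcWidth) {
    // Target is narrower than the source: keep full height, trim the sides.
    h = srcHeight;
    w = (srcHeight * rw) / rh;
  } else {
    // Target is as wide or wider: keep full width, trim top and bottom.
    w = srcWidth;
    h = (srcWidth * rh) / rw;
  }
  // Round down to even dimensions, which yuv420p encoding needs.
  w = Math.floor(w / 2) * 2;
  h = Math.floor(h / 2) * 2;
  const x = Math.floor((srcWidth - w) / 2);
  const y = Math.floor((srcHeight - h) / 2);
  return { w, h, x, y, filter: `crop=${w}:${h}:${x}:${y}` };
}

console.log(centerCrop(1920, 1080, "9:16").filter); // crop=606:1080:657:0
```

A 9:16 crop from a 1920×1080 source keeps under a third of the frame width, so shots whose subject sits off-center may need a per-shot offset rather than a pure center crop.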

### Step 7 — Verification

After execute completes, confirm:

- [ ] Each `final/*.mp4` exists and has non-zero size
- [ ] `getMeta()` duration on each final is within ±10% of `plan.target.duration`
- [ ] Drive upload returned URLs for every ratio (check Slack report has all links)
- [ ] No silent ElevenLabs fallback (if VO failed, the script would have errored out loudly)
- [ ] Slack report posted to the client channel (not the default `#your-channel`)
- [ ] `plan.json` uploaded to Drive alongside the finals (for rerunnability)

If any step failed, re-run just the failed piece — the plan is the single source of truth and intermediate files stay on disk.
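
The first two checklist items lend themselves to a scripted check. The sketch below is hypothetical: it assumes the caller has already collected each final's size and its `getMeta()` duration into a list of `{ path, sizeBytes, duration }` objects.

```javascript
// Hypothetical check for the file-size and duration items in the checklist.
// `finals` is assumed to be a list of { path, sizeBytes, duration } gathered by the caller.
function verifyFinals(finals, targetDuration, tolerance = 0.1) {
  return finals.map((f) => {
    const problems = [];
    if (!(f.sizeBytes > 0)) problems.push("missing or empty file");
    const deviation = Math.abs(f.duration - targetDuration) / targetDuration;
    if (deviation > tolerance) {
      const pct = Math.round(tolerance * 100);
      problems.push(`duration ${f.duration}s is outside ±${pct}% of ${targetDuration}s`);
    }
    return { path: f.path, ok: problems.length === 0, problems };
  });
}
```

For example, a 25.6s final against a 25s target deviates by 2.4% and passes, while a 30s final deviates by 20% and fails.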

---

## Iteration patterns

The plan-based design makes iterations cheap. Common ones:

| Change | What to edit | Rerun |
|--------|-------------|-------|
| Swap a shot | Edit `shots[i].start` or `duration` in plan.json | `execute` |
| Rewrite VO | Edit `voiceover.script` | `execute` |
| Different voice | Change `voiceover.voiceId` | `execute` |
| Add 9:16 variant | Add `"9:16"` to `target.ratios` | `execute` |
| Different duration | Change `target.duration`, re-pick shots | `execute` |
| New source video | Run `prepare` again with new slug | `prepare`, then `execute` |

For small script tweaks, you can also skip `prepare` entirely and just rerun `execute` — the cut clips get regenerated each time, but the source is already downloaded.

---

## Hard constraints

| Rule | Why |
|------|-----|
| Vision-based shot selection happens in Claude, never in the script | The script doesn't know what "good footage" looks like — it's deterministic, not judgmental |
| ElevenLabs failures must be loud, never silent | The OpenAI TTS fallback on a teammate's first run produced robotic audio that shipped before the user caught it. Never again. |
| `shotMaxDuration` default is 3.0s | Meta ads with >3s cuts have visibly lower retention. Override only for explicit creative direction. |
| `target.duration` ± 20% is a warning, not a hard stop | Sometimes you want 23s or 28s, but once the deviation reaches 50% the shot list is broken and the skill should refuse to execute |
| Plan version must match `PLAN_VERSION` in the script | Schema breaks will silently destroy edits otherwise |
| Script approval gate before `execute` | ElevenLabs credits cost money, and Drive uploads clutter client folders |

---

## When to use this skill vs generators

| User says | Skill |
|-----------|-------|
| "Edit this video" + Drive link | `/video-edit` |
| "Cut this raw footage" | `/video-edit` |
| "Make a 15s ad from this 3 min shoot" | `/video-edit` |
| "Make us a video ad for X" (no source) | `/<id>` |
| "Make a video like this reference" (reference but no editable footage) | `/<id>` with reference |
| "Explainer-style video of someone talking" | `/heygen-ad-generator` |
| "Branded motion graphic" | `/motion-ad-generator` |

The creative director agent routes on presence of **editable source footage** — if the user handed you an MP4 they already own, it's `/video-edit`.

---

## Critical files

- [ACME Agency/scripts/video_edit.mjs](<ACME Agency/scripts/video_edit.mjs>) — two-phase entrypoint (prepare → execute)
- [ACME Agency/scripts/lib/ffmpeg.mjs](<ACME Agency/scripts/lib/ffmpeg.mjs>) — core FFmpeg helpers (getMeta, cutClip, concatClips, muxAudio, reframe, logoOverlay, textOverlay, exportFinal)
- [ACME Agency/scripts/lib/elevenlabs.mjs](<ACME Agency/scripts/lib/elevenlabs.mjs>) — voiceover generation
- [ACME Agency/scripts/lib/google_drive.mjs](<ACME Agency/scripts/lib/google_drive.mjs>) — download source, upload finals
- [ACME Agency/scripts/lib/slack.mjs](<ACME Agency/scripts/lib/slack.mjs>) — report posting

## Reference example

`ACME Agency/clients/ACME Agency/video-ads/` — the ad-hoc prototype that motivated this skill:
- `pergola-montaz-source.mp4` (751 MB raw 3-min shoot)
- `clips/h1,h2,m1-m5,c1-c3,cta.mp4` — the cut segments
- `vo/hook,middle,closing,cta.mp3` — per-segment voiceovers
- `<id>.mp4` — the 25.6s final

Rebuilding this with the skill: create a `plan.json` pointing at `pergola-montaz-source.mp4`, fill in the shots with the same timestamps, run `execute`. Result should match the prototype.