JM BudoAcademy

CONTENT-PIPELINE

From a video recording, a single run produces a publication-ready, bilingual CMS entry — including polished transcripts, translated metadata, and matching preview images.
Pattern: Bilingual content pipelines
  • Python 3.12
  • Google Gemini
  • OpenAI Whisper

THE STARTING POINT

A martial arts academy regularly publishes videos on its website in German and English, each with a readable transcript, a custom preview image, clean metadata, and a fully maintained entry in a headless CMS. Before automation, every video meant the same manual chain: pull the YouTube transcript or transcribe the MP4, smooth it by hand, translate it, design a preview image, enter everything in the CMS, and set categories and tags.

That is more than "a bit of post-processing": it ties up a noticeable share of weekly editorial time. And as the video archive grows, it stops being a task that can be handled on the side.

WHAT WE BUILT

A standalone Python tool that automates the full workflow from source video to CMS entry — in clearly separated phases, with human review before final approval.

The six phases

  1. Pull the transcript — for YouTube via the official transcript API, for local MP4 files via Whisper (running locally on your own machine, no cloud upload of audio).
  2. Polish the transcript — Gemini smooths filler sounds, fixes recognition errors, and structures the raw text into readable paragraphs.
  3. Translate bilingually — DE/EN translation including title, description, slug, and tags.
  4. Extract a scene image — the highest-resolution YouTube thumbnail or an ffmpeg frame from the MP4 as a visual reference for the next step.
  5. Generate preview images — Gemini Nano Banana Pro produces two language-specific preview images with title overlay and a consistent brand look, based on the scene image.
  6. Upload to Directus — images go into the file repository, the video entity with all metadata into the right collection.
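
The six phases above can be sketched as an ordered list of functions that share one context object. This is a minimal sketch under stated assumptions, not the tool's actual code: `PipelineContext`, `run_pipeline`, and every phase function are hypothetical names, and the phase bodies are stand-ins for the real Whisper, Gemini, ffmpeg, and Directus calls.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PipelineContext:
    """State handed from phase to phase (all field names are hypothetical)."""
    source: str
    raw_transcript: str = ""
    polished_transcript: str = ""
    translations: dict[str, dict[str, str]] = field(default_factory=dict)
    scene_image: str = ""
    preview_images: dict[str, str] = field(default_factory=dict)
    cms_entry_id: str = ""
    completed: list[str] = field(default_factory=list)


Phase = Callable[[PipelineContext], None]


# Stand-in phases: each one only fills its own fields with placeholder data.
def pull_transcript(ctx: PipelineContext) -> None:
    ctx.raw_transcript = f"um, welcome to {ctx.source}"


def polish_transcript(ctx: PipelineContext) -> None:
    ctx.polished_transcript = ctx.raw_transcript.replace("um, ", "")


def translate(ctx: PipelineContext) -> None:
    ctx.translations = {
        "de": {"title": "Platzhalter-Titel"},
        "en": {"title": "Placeholder title"},
    }


def extract_scene_image(ctx: PipelineContext) -> None:
    ctx.scene_image = "scene.jpg"


def generate_previews(ctx: PipelineContext) -> None:
    # One preview image per language produced by the translation phase.
    ctx.preview_images = {lang: f"preview_{lang}.png" for lang in ctx.translations}


def upload(ctx: PipelineContext) -> None:
    ctx.cms_entry_id = "draft-1"


PHASES: list[tuple[str, Phase]] = [
    ("transcript", pull_transcript),
    ("polish", polish_transcript),
    ("translate", translate),
    ("scene_image", extract_scene_image),
    ("preview_images", generate_previews),
    ("upload", upload),
]


def run_pipeline(ctx: PipelineContext, phases: list[tuple[str, Phase]] = PHASES) -> PipelineContext:
    """Run the phases in order and record which ones have finished."""
    for name, phase in phases:
        phase(ctx)
        ctx.completed.append(name)
    return ctx
```

Keeping each phase as a plain function over a shared context makes it straightforward to rerun a single phase or to swap the transcript source without touching the rest.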

Two interfaces, one pipeline core

The CLI for batch runs and the web interface share the same pipeline code. The web UI guides the editor through a three-step wizard:

  • Input — source URL or file path, topic, category, difficulty level.
  • Processing — automated pipeline with live progress per phase.
  • Review — side-by-side DE/EN view for title, description, slug, tags, and preview images, with a per-language "regenerate image" button and expandable transcript editors.
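
One way to let two front ends share a single core is to have the core report progress through a callback and leave the presentation to each interface. The sketch below assumes this design; `run_core`, `cli_progress`, and `WebProgress` are hypothetical names, and the loop body stands in for the real phase work.

```python
from typing import Callable

PHASE_NAMES = ["transcript", "polish", "translate", "scene_image", "preview_images", "upload"]

# Called after each phase with (phase name, 1-based index, total number of phases).
ProgressCallback = Callable[[str, int, int], None]


def run_core(source: str, on_progress: ProgressCallback) -> dict[str, str]:
    """Shared pipeline core: knows nothing about the CLI or the web UI."""
    total = len(PHASE_NAMES)
    for index, phase in enumerate(PHASE_NAMES, start=1):
        # The real phase work would happen here.
        on_progress(phase, index, total)
    return {"source": source, "status": "ready_for_review"}


def cli_progress(phase: str, index: int, total: int) -> None:
    """CLI front end: print one line per finished phase."""
    print(f"[{index}/{total}] {phase} done")


class WebProgress:
    """Web front end: collect events that a UI could poll or stream."""

    def __init__(self) -> None:
        self.events: list[dict[str, object]] = []

    def __call__(self, phase: str, index: int, total: int) -> None:
        self.events.append({"phase": phase, "percent": round(100 * index / total)})


if __name__ == "__main__":
    run_core("demo.mp4", cli_progress)
```

The core returns the same result either way; only the callback differs between a batch run and the wizard.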

What lands in the CMS is the version the editor approved during review, not the raw AI output. Human control isn't theatre here; it is built into the structure of the workflow.
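
A minimal sketch of how review edits could take precedence over the generated draft, assuming the draft is held as one dictionary per language. The function name `apply_review`, the field list, and the sample values are all hypothetical illustrations.

```python
from copy import deepcopy

# Fields the reviewer may change (hypothetical list, taken from the review step).
EDITABLE_FIELDS = ("title", "description", "slug", "tags", "transcript")


def apply_review(draft: dict[str, dict], edits: dict[str, dict]) -> dict[str, dict]:
    """Return the final per-language entry: AI draft overlaid with editor changes."""
    final = deepcopy(draft)  # never mutate the original AI output
    for lang, changes in edits.items():
        if lang not in final:
            raise KeyError(f"unknown language: {lang}")
        for key, value in changes.items():
            if key not in EDITABLE_FIELDS:
                raise KeyError(f"field not editable: {key}")
            final[lang][key] = value
    return final


if __name__ == "__main__":
    draft = {
        "de": {"title": "Grundtechniken", "slug": "grundtechniken"},
        "en": {"title": "Basic technique", "slug": "basic-technique"},
    }
    edits = {"en": {"title": "Basic Techniques", "slug": "basic-techniques"}}
    print(apply_review(draft, edits)["en"])
```

Only the merged result would be passed on to the upload phase, so an unreviewed draft never reaches the CMS by accident.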

WHAT IT GIVES THEM

  • Hours of detail work become a short review. The repeating steps — clean transcript, translate, build images, upload — happen automatically in the background.
  • Both language versions ship in parallel. No more "German first, English maybe later".
  • Visual style stays consistent. A prompt template with brand guidelines keeps the preview images recognisable as a series — regardless of who's running the tool.
  • Data stays where it belongs. Local videos are transcribed with Whisper on your own machine; only the transcript text, not the audio, goes to the language API.
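
The prompt template mentioned above could look roughly like the following sketch, which uses `string.Template` from the Python standard library. The brand values, the prompt wording, and the name `build_image_prompt` are placeholders invented for illustration; the academy's real guidelines are not part of this write-up.

```python
from string import Template

# Hypothetical brand guidelines; real values would come from the style guide.
BRAND_GUIDELINES = {
    "palette": "black, white, and one red accent",
    "typeface": "bold sans-serif",
    "layout": "title in the lower third, subject on the right",
}

PROMPT_TEMPLATE = Template(
    'Create a preview image for a video titled "$title". '
    "Render the title text in $language. "
    "Palette: $palette. Typeface: $typeface. Layout: $layout. "
    "Use the attached scene image as the visual reference."
)


def build_image_prompt(title: str, language: str) -> str:
    """Fill the shared template so every image follows the same brand rules."""
    return PROMPT_TEMPLATE.substitute(title=title, language=language, **BRAND_GUIDELINES)


if __name__ == "__main__":
    print(build_image_prompt("Grundtechniken", "German"))
    print(build_image_prompt("Basic Techniques", "English"))
```

Because the brand rules live in one place, the two language versions differ only in title and language, whoever runs the tool.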

WHAT WE DELIBERATELY DID NOT AUTOMATE

  • The final approval. The tool doesn't replace review — it prepares for it.
  • The creative input decisions about topic, category, and difficulty level. These still come from a person.
  • Fact-checking of people mentioned. Anyone named or shown in the videos is verified by a person who knows the subject area.

WHY THIS PATTERN TRANSFERS

The setup works wherever bilingual content is produced repeatedly from audio or video sources: training videos, interview series, recorded talks, podcast companion pieces, webinar recordings.

The pattern: source → AI processing in phases → human review step → CMS integration.

AI takes on the tedious 80% of the routine. The creative and quality-assurance 20% stays with the person — where it belongs.
