Pipeline

Run the transcript and metadata pipeline.

The pipeline is a Python workflow that runs per video: it extracts audio, generates a transcript, creates AI metadata (title, description, hashtags) for YouTube, Instagram, and TikTok, and writes platform exports. The app invokes it when you click Generate (for the selected row) or Publish → Metadata only → Generate (for selected rows).

Who gets processed: Only rows that do not yet have metadata for the requested platforms are sent to the pipeline; the app checks outputs on disk first. If all selected rows already have metadata, you see a message and no pipeline runs. Rows already being processed are skipped (single-flight).
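The "check outputs on disk first" step can be sketched roughly as below. The function name `needs_metadata` and the JSON layout (one top-level key per platform) are illustrative assumptions, not the app's actual identifiers:

```python
import json
from pathlib import Path

def needs_metadata(metadata_dir: Path, stem: str, platforms: list) -> bool:
    """True if any requested platform lacks metadata in {stem}.json on disk.

    Hypothetical sketch of the pre-flight check described above; the real
    app's file layout and function names may differ.
    """
    path = metadata_dir / f"{stem}.json"
    if not path.is_file() or path.stat().st_size == 0:
        return True  # no metadata written yet for this video
    existing = json.loads(path.read_text(encoding="utf-8"))
    # Any requested platform missing from the JSON means the row still
    # needs a pipeline run.
    return any(p not in existing for p in platforms)
```

Rows for which this returns False are the ones skipped with a message instead of being sent to the pipeline.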

Optional platforms: You can generate metadata only for specific platforms (e.g. only YouTube). In the Details panel, if some platforms already have metadata, Generate can be run for the missing platform(s). The app passes the chosen platforms to the pipeline so only those are (re)generated and merged with existing metadata.
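The merge behavior, where only the requested platforms are regenerated and everything else is kept, can be sketched like this (a minimal illustration; the actual merge code in the pipeline may differ):

```python
def merge_metadata(existing: dict, generated: dict, requested: list) -> dict:
    """Overwrite only the requested platforms; keep all other platforms as-is.

    Hypothetical sketch of the per-platform merge described above.
    """
    merged = dict(existing)
    for platform in requested:
        if platform in generated:
            merged[platform] = generated[platform]
    return merged
```

For example, requesting only YouTube replaces the YouTube entry while leaving previously generated Instagram and TikTok metadata untouched.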

Outputs go to yt_pipeline/outputs (relative to the app). Logs stream in the Pipeline Log section in the Details panel and are saved to yt_pipeline/outputs/Reports/last_run.log.

What the pipeline does per video

  1. Audio — Extracts audio with ffmpeg → outputs/Audio/{stem}.mp3 (mono 16 kHz). Skipped if the file already exists and is non-empty.
  2. Transcript — Transcribes with faster-whisper → outputs/Transcripts/{stem}.txt. Skipped if a transcript file already exists and is non-empty; in that case the existing file is reused. If the transcript is too short (e.g. under 40 characters), the pipeline stops with an error for that video.
  3. Metadata — Generates title, description, and hashtags with OpenAI using your Custom AI settings (loaded from app user data). Writes outputs/Metadata/{stem}.json. If metadata for the requested platform(s) already exists, OpenAI is skipped and existing metadata is reused (or merged when only some platforms were requested).
  4. Exports — Writes platform-specific files under outputs/Exports/YouTube, Exports/Instagram, Exports/TikTok: {stem}.title.txt, {stem}.description.txt, {stem}.hashtags.txt, {stem}.json. Only the platforms that were requested (or all three if none specified) are written.
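The export step (4) might look roughly like the sketch below. The per-platform dict shape (`title`, `description`, `hashtags` keys) is an assumption inferred from the export filenames; the actual pipeline code may differ:

```python
import json
from pathlib import Path

PLATFORMS = ("YouTube", "Instagram", "TikTok")

def write_exports(exports_root: Path, stem: str, metadata: dict,
                  requested=None):
    """Write per-platform export files for the requested platforms
    (or all three when none are specified). Hypothetical sketch."""
    written = []
    for platform in requested or PLATFORMS:
        data = metadata.get(platform)
        if data is None:
            continue  # no metadata generated for this platform
        out_dir = exports_root / platform
        out_dir.mkdir(parents=True, exist_ok=True)
        (out_dir / f"{stem}.title.txt").write_text(data["title"], encoding="utf-8")
        (out_dir / f"{stem}.description.txt").write_text(data["description"], encoding="utf-8")
        (out_dir / f"{stem}.hashtags.txt").write_text(" ".join(data["hashtags"]), encoding="utf-8")
        (out_dir / f"{stem}.json").write_text(json.dumps(data, indent=2), encoding="utf-8")
        written.append(out_dir)
    return written
```

Note how a platform that was requested but has no generated metadata is simply skipped, matching the "only the platforms that were requested are written" behavior above.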

The {stem} is derived from the video filename (normalized) plus a short path hash to avoid collisions between videos with the same name in different folders.
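One plausible way to derive such a stem is sketched below. The exact normalization rules, hash algorithm, and hash length the app uses are assumptions:

```python
import hashlib
import re
from pathlib import Path

def make_stem(video_path: str, hash_len: int = 8) -> str:
    """Normalized filename plus a short hash of the full path.

    Illustrative sketch only; the app's real normalization may differ.
    """
    p = Path(video_path)
    # Lowercase the filename and replace runs of non-alphanumeric
    # characters with a single "-".
    name = re.sub(r"[^a-z0-9]+", "-", p.stem.lower()).strip("-")
    # A short hash of the full path distinguishes same-named videos
    # that live in different folders.
    digest = hashlib.sha1(str(p).encode("utf-8")).hexdigest()[:hash_len]
    return f"{name}-{digest}"
```

Two files both named My Video.mp4 in different folders would then get distinct stems such as my-video-1a2b3c4d and my-video-9e8f7a6b.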

How to run the pipeline

Step 1 — Add rows first

Ensure the row(s) you want to process are in the Jobs table (see Add Files & Jobs). Add videos via Add (Files or Folder) if needed.

Pipeline Log section in Details panel

Screenshot 01 — Pipeline Log in Details panel

Step 2 — Start the pipeline

Select the row(s). Then either click Generate in the Metadata section of the Details panel (processes the selected row; you can generate for all platforms or only for the missing ones), or click Publish, choose Metadata only, then Generate (processes all selected rows). Only rows that need metadata for the chosen platforms are sent to the pipeline; the rest are skipped and a message tells you so.

Generate button

Screenshot 02 — Select row(s), then Generate in Details or Publish → Metadata only → Generate

Step 3 — Watch the log

The app runs the Python script; progress and logs appear in the Pipeline Log section in the Details panel. Each processed file is reported (e.g. OK, SKIP_OPENAI, ERROR). Row status updates to Processing, then to Done (or Error) when the pipeline finishes.

Pipeline Log area

Screenshot 03 — Pipeline Log shows progress and output

Step 4 — Find outputs

When the run finishes, outputs are in yt_pipeline/outputs: Audio, Transcripts, Metadata, and Exports/YouTube, Exports/Instagram, Exports/TikTok. Run reports and logs are written to yt_pipeline/outputs/Reports/ (e.g. last_run.log, report_*.txt, report_*.csv). Use Open outputs or Open exports in the Details panel to open these folders.

Pipeline outputs

Screenshot 04 — Outputs in yt_pipeline/outputs

Common issues

  • Pipeline fails or hangs — Check that Python and required dependencies (ffmpeg, faster-whisper, openai) are installed and on your PATH. You can set PYTHON_PATH to the full path of your Python executable. The app tries to use a conda environment named yt-gpu if present. Review yt_pipeline/outputs/Reports/last_run.log and the in-app Pipeline Log for the exact error.
  • Output location — Outputs are written to yt_pipeline/outputs next to the app (or your configured pipeline root).
  • All selected already have metadata — The app only runs the pipeline for rows that lack metadata for the platforms you requested. To regenerate, delete metadata for that platform (or file) first, or use the per-platform Generate in Details for the missing platform.
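The interpreter lookup described in the first bullet might look roughly like this. PYTHON_PATH and the yt-gpu env name come from the note above; the candidate order, the conda directory layout, and the function name are assumptions:

```python
import os
import shutil
from pathlib import Path

def python_candidates(conda_root=None):
    """Ordered interpreter candidates: PYTHON_PATH first, then a conda env
    named yt-gpu (if a conda root is known), then whatever is on PATH.

    Hypothetical sketch of the resolution order, not the app's actual code.
    """
    candidates = []
    if os.environ.get("PYTHON_PATH"):
        candidates.append(os.environ["PYTHON_PATH"])
    if conda_root:
        env_py = Path(conda_root) / "envs" / "yt-gpu" / "bin" / "python"
        if env_py.exists():
            candidates.append(str(env_py))
    on_path = shutil.which("python3") or shutil.which("python")
    if on_path:
        candidates.append(on_path)
    return candidates
```

If the pipeline fails to start at all, checking which of these candidates actually resolves on your machine is a quick first diagnostic.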