CLI subtitle workflow: generate, convert, and burn

Managing subtitles quickly and reproducibly is key to accessible, multilingual video. Command-line interface (CLI) tools excel here because they script well, integrate cleanly with CI/CD, and avoid slow, click-heavy GUIs. This guide walks through an end-to-end workflow (generate, convert, embed, or burn subtitles) using open-source utilities that you can install from the terminal.
Why use CLI tools for subtitle management?
CLI programs let you:
- batch-process hundreds of videos without lifting a mouse,
- automate tasks in GitHub Actions, Jenkins, or cron jobs,
- mix and match best-in-class tools instead of relying on one monolithic app, and
- reproduce results exactly by committing your commands to version control.
If your team already ships code from the terminal, adding subtitles the same way keeps the pipeline consistent.
Generate subtitles with Whisper or subsai
Modern speech-to-text models finally make auto-captioning accurate enough for production. OpenAI’s Whisper is the workhorse, and subsai adds a friendly wrapper plus some extra editing tricks.
Whisper
# Create a virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate
# Install whisper
pip install git+https://github.com/openai/whisper.git
# Generate English subtitles in SRT format
whisper video.mp4 --model base --output_format srt --language en
Tips
- Choose tiny or small for speed, medium or large for higher accuracy.
- Use --device cuda when an NVIDIA GPU is available; it cuts runtime dramatically.
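The device choice can also be scripted. The sketch below assumes that a successful `nvidia-smi` call is a reasonable proxy for a usable CUDA GPU, which may not hold if PyTorch was installed without CUDA support. `pick_whisper_device` is a hypothetical helper name, not part of Whisper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "cuda" when an NVIDIA GPU looks usable, else "cpu".
# Assumption: a working `nvidia-smi` means CUDA is also available to PyTorch.
pick_whisper_device() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    echo cuda
  else
    echo cpu
  fi
}

# Usage:
#   whisper video.mp4 --model small --output_format srt --language en \
#     --device "$(pick_whisper_device)"
```

Falling back to `cpu` keeps the same command usable on laptops and on GPU build agents.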
Subsai
pip install git+https://github.com/absadiki/subsai.git
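# Same Whisper model, run through subsai; the SRT file is written next to the input
subsai video.mp4 --model openai/whisper --model-configs '{"model_type": "base"}' --format srt
Both tools output time-stamped subtitles that are often 95 %–98 % accurate out of the box on clear speech, though accuracy drops with noisy audio, heavy accents, or smaller models. A quick manual review with a subtitle editor such as Aegisub finishes the job.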
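Before opening an editor, a quick sanity check can catch empty or truncated output. The following is a minimal sketch that assumes standard SRT timing lines of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Pattern for the start of an SRT timing line, e.g. "00:00:01,000 --> ".
SRT_TIMING='^[0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} --> '

# Print the number of cues (one timing line per cue).
srt_cue_count() {
  # grep -c exits non-zero when the count is 0, so ignore its exit status
  tr -d '\r' < "$1" | grep -cE "$SRT_TIMING" || true
}

# Print the end timestamp of the last cue (empty if there are no cues).
srt_last_end() {
  tr -d '\r' < "$1" | grep -E "$SRT_TIMING" | tail -n 1 |
    sed -E 's/^.* --> ([0-9:,]+).*$/\1/'
}

# Usage:
#   srt_cue_count video.srt   # a count of 0 suggests transcription failed
#   srt_last_end video.srt    # compare against the video duration
```

If the last end timestamp is far shorter than the video, the transcript is probably incomplete and worth regenerating before manual review.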
Convert between formats with Subtitle Edit CLI
Need WebVTT for the web or ASS for advanced styling? Subtitle Edit—famous for its GUI—also offers a cross-platform CLI.
# On Ubuntu/Debian, install the .NET SDK first
sudo apt-get install -y dotnet-sdk-8.0
# Install the global tool
dotnet tool install --global SubtitleEdit.CLI
# Convert srt ➜ vtt
SubtitleEdit.CLI convert input.srt output.vtt
# Convert srt ➜ ass
SubtitleEdit.CLI convert input.srt output.ass
Subtitle Edit supports more than 200 formats, so odds are good it speaks the one you need.
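To see what an SRT to WebVTT conversion involves for simple files, here is a minimal sketch in plain shell. It only adds the `WEBVTT` header and switches the millisecond separator from a comma to a dot; it assumes well-formed two-digit-hour timestamps and no styling tags, so it is an illustration rather than a substitute for a real converter. `srt_to_vtt` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a WebVTT version of a simple SRT file to stdout.
srt_to_vtt() {
  # WebVTT files must start with this header followed by a blank line
  printf 'WEBVTT\n\n'
  # Drop carriage returns, then turn "00:00:01,000" into "00:00:01.000"
  # on timing lines only, leaving commas in the subtitle text untouched
  tr -d '\r' < "$1" |
    sed -E 's/^([0-9]{2}:[0-9]{2}:[0-9]{2}),([0-9]{3}) --> ([0-9]{2}:[0-9]{2}:[0-9]{2}),([0-9]{3})/\1.\2 --> \3.\4/'
}

# Usage:
#   srt_to_vtt input.srt > output.vtt
```

The numeric cue counters from the SRT file are kept, since WebVTT allows optional cue identifiers.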
Embed soft subtitles with MKVToolNix
Soft subtitles live in the container as a separate track. They can be toggled, styled, or replaced without touching the video stream.
# Install MKVToolNix (Ubuntu/Debian)
sudo apt-get install mkvtoolnix
# Embed subtitles
mkvmerge -o output.mkv video.mp4 subtitles.srt
Want multiple languages?
mkvmerge -o output.mkv \
  video.mp4 \
  --language 0:eng subs_en.srt \
  --language 0:spa subs_es.srt
Players such as VLC, mpv, and most smart-TV apps will present a language picker automatically.
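With more than a couple of languages, the argument list can be built in a loop. This sketch assumes a hypothetical naming scheme, `subs_<code>.srt` (for example `subs_eng.srt`), where `<code>` is a language code that mkvmerge accepts; `build_mkvmerge_args` and `MKVMERGE_ARGS` are illustrative names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fill the global array MKVMERGE_ARGS with the arguments
# for one video plus any number of subtitle files named subs_<code>.srt.
build_mkvmerge_args() {
  local out="$1" video="$2"
  shift 2
  MKVMERGE_ARGS=(-o "$out" "$video")
  local sub lang
  for sub in "$@"; do
    lang="${sub##*subs_}"   # strip everything up to and including "subs_"
    lang="${lang%.srt}"     # strip the extension, leaving the language code
    # Track 0 of each subtitle file gets its own language tag
    MKVMERGE_ARGS+=(--language "0:${lang}" "$sub")
  done
}

# Usage:
#   build_mkvmerge_args output.mkv video.mp4 subs_eng.srt subs_spa.srt
#   mkvmerge "${MKVMERGE_ARGS[@]}"
```

Keeping the arguments in an array avoids quoting problems when file names contain spaces.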
Burn (hard-code) subtitles with FFmpeg
Hard subtitles are painted onto the video pixels, guaranteeing 100 % compatibility at the cost of flexibility.
# Plain SRT burn-in
ffmpeg -i video.mp4 -vf "subtitles=subtitles.srt" -c:v libx264 -crf 20 -preset veryfast output.mp4
# Custom styling: white text, font size 24
style='FontSize=24,PrimaryColour=&HFFFFFF&'
ffmpeg -i video.mp4 -vf "subtitles=subtitles.srt:force_style='${style}'" \
  -c:v libx264 -crf 20 -preset veryfast output-styled.mp4
# ASS/SSA subtitles retain their own styling
ass_file=subtitles.ass
ffmpeg -i video.mp4 -vf "ass=${ass_file}" -c:v libx264 -crf 20 -preset veryfast output-ass.mp4
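Use -c:v h264_nvenc or -c:v hevc_nvenc to leverage an NVIDIA GPU and cut encoding time. The NVENC encoders do not use -crf or x264 preset names such as veryfast, so replace those options (for example, with -cq for quality) when you switch encoders.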
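A script that runs on mixed hardware can pick the encoder at runtime. The sketch below assumes that an FFmpeg build listing `h264_nvenc` also has a usable GPU and driver behind it, which is not guaranteed; `pick_h264_encoder` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "h264_nvenc" if this FFmpeg build lists it,
# otherwise fall back to the software encoder "libx264".
pick_h264_encoder() {
  local encoders=""
  if command -v ffmpeg >/dev/null 2>&1; then
    # Capture the encoder list first so the check also works under pipefail
    encoders="$(ffmpeg -hide_banner -encoders 2>/dev/null || true)"
  fi
  case "$encoders" in
    *h264_nvenc*) echo h264_nvenc ;;
    *)            echo libx264 ;;
  esac
}

# Usage (encoder-specific quality options omitted so the command is valid for both):
#   ffmpeg -i video.mp4 -vf "subtitles=subtitles.srt" \
#     -c:v "$(pick_h264_encoder)" output.mp4
```

Note that the subtitles filter itself still runs on the CPU; only the encode step moves to the GPU.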
Automate everything with a shell script
Below is a Bash script that wires some of the previous commands together. Drop it into subtitles.sh, make it executable, and point it at a directory of .mp4 files.
#!/usr/bin/env bash
set -euo pipefail
# Without nullglob, an empty directory would loop once over the literal "*.mp4"
shopt -s nullglob
# Directory to process (defaults to the current directory)
cd "${1:-.}"
# Create a function for processing
process_video() {
  local video="$1"
  local base="${video%.mp4}"
  echo "Processing ${video}..."
  # Generate subtitles
  # Whisper writes e.g. video.srt for video.mp4 into the current directory
  whisper "$video" --model base --output_format srt --language en
  # Convert to VTT if needed (assumes Whisper created ${base}.srt)
  SubtitleEdit.CLI convert "${base}.srt" "${base}.vtt"
  # Create soft-subbed version (also assumes ${base}.srt exists)
  mkvmerge -o "${base}_subbed.mkv" "$video" "${base}.srt"
}
# Process all MP4 files
for video in *.mp4; do
  process_video "$video"
done
Feel free to stash this in a CI job so every new video that lands in the repo gets subtitles automatically.
Performance and scaling tips
- Pick the right Whisper model. The tiny model processes roughly real-time on a CPU; large can be 10× slower but catches tricky words.
- Use GPUs whenever possible. Both Whisper (--device cuda) and FFmpeg (-hwaccel cuda for decoding, an NVENC encoder for encoding) can see 5×–10× gains on modern hardware.
- Parallelize cautiously. Whisper is CPU/GPU-bound, so one instance per GPU is usually optimal. Launching too many processes may slow everything down.
- Cache results. Store generated .srt files so reruns only transcode.
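The caching tip can be sketched as a small freshness check. It assumes the .srt file sits next to the video with the same base name (for Whisper, that means passing `--output_dir` explicitly when the video is not in the current directory); `needs_transcription` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when the video has no .srt yet,
# or when the video file is newer than its existing .srt.
needs_transcription() {
  local video="$1"
  local srt="${video%.*}.srt"
  [ ! -f "$srt" ] || [ "$video" -nt "$srt" ]
}

# Usage:
#   if needs_transcription "$video"; then
#     whisper "$video" --model base --output_format srt --language en \
#       --output_dir "$(dirname "$video")"
#   fi
```

Comparing modification times means a re-uploaded video is transcribed again, while untouched files skip straight to the conversion and muxing steps.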
Soft vs. hard subtitles
Feature | Soft | Hard
---|---|---
Toggleable? | Yes | No
Multiple languages | Yes | One per file
Player support | Needs subtitle-aware player | Any player
File size impact | Negligible | Re-encode increases size
Editable later | Easy | Impossible
Choose soft subtitles for streaming platforms and editing flexibility. Use hard subtitles when you cannot control the playback environment or need burnt-in branding.
Wrap-up
With Whisper, subsai, Subtitle Edit CLI, MKVToolNix, and FFmpeg you can stitch together a zero-click pipeline that adds high-quality captions to every video your team ships. If you would rather skip server maintenance altogether, our 🤖 /video/subtitle Robot handles generation, conversion, and burn-in through a single API call. A minimal Assembly looks like:
{
  "steps": {
    "subtitle_video": {
      "robot": "/video/subtitle",
      "use": ":original",
      "subtitles_type": "burned",
      "font_size": 24,
      "position": "bottom",
      "font_color": "FFFFFF",
      "border_style": "outline"
    }
  }
}
Give it a try—your viewers, especially those watching on muted autoplay, will thank you.