How to transcribe video with ffmpeg and Whisper

Apr 9, 2025 - 22:37

Requirements

ffmpeg
whisper
Python 3.10+ (for Whisper)

Installation

macOS

# Install Homebrew if you don't have it
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install ffmpeg
brew install ffmpeg

# Install Python (if needed)
brew install python

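# Install Whisper
# If pip reports an "externally-managed-environment" error, install inside a
# virtual environment first: python3 -m venv .venv && source .venv/bin/activate
pip3 install --upgrade pip
pip3 install git+https://github.com/openai/whisper.git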

Windows

# Install Chocolatey if you don't have it
# Run in PowerShell as administrator:
Set-ExecutionPolicy Bypass -Scope Process -Force
[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072
iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))

# Install ffmpeg
choco install ffmpeg

# Install Python (from python.org)
# Make sure to check "Add Python to PATH" during installation

# Install Whisper
pip install -U openai-whisper

Linux

# Install ffmpeg
sudo apt update && sudo apt install ffmpeg

# Install Python and pip
sudo apt install python3 python3-pip

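# Install Whisper
# On recent Debian/Ubuntu releases pip may refuse to install system-wide
# ("externally-managed-environment"); in that case use a virtual environment:
# python3 -m venv .venv && source .venv/bin/activate
pip3 install git+https://github.com/openai/whisper.git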
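Once everything is installed, it can help to confirm that both tools are reachable from the shell before moving on. The following is a minimal sketch for a POSIX shell (macOS, Linux, or WSL); `check_tool` is a hypothetical helper name introduced here, not part of ffmpeg or Whisper.

```shell
# check_tool is a hypothetical helper: it reports whether a command is on PATH
# and returns a non-zero status when the command is missing.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
    return 1
  fi
}
```

Running `check_tool ffmpeg && check_tool whisper` should print a "found" line for each tool. In PowerShell on Windows, `Get-Command ffmpeg` and `Get-Command whisper` serve the same purpose.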

Transcription Steps

1. Extract the audio track from the video using ffmpeg

   ffmpeg -i input_video.mp4 -vn -acodec mp3 output.mp3

2. Transcribe the audio with Whisper

   whisper output.mp3 --language English --model small --output_format txt
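The two steps above can be wrapped in small shell functions so they are easy to repeat for other files. This is a sketch under a few assumptions, not the only way to do it: the helper names (`audio_name`, `extract_audio`, `transcribe`), the `transcripts` output directory, and the `WHISPER_BIN` variable are all made up for this example. It extracts 16 kHz mono WAV instead of MP3, since Whisper resamples its input to 16 kHz mono anyway, and it assumes the video filename has an extension.

```shell
# WHISPER_BIN is a hypothetical override, handy for dry runs (WHISPER_BIN=echo);
# it defaults to the real whisper command.
WHISPER_BIN=${WHISPER_BIN:-whisper}

# Derive the audio filename from the video filename: clip.mp4 -> clip.wav
# (assumes the input name has an extension).
audio_name() {
  printf '%s.wav\n' "${1%.*}"
}

# Step 1: drop the video stream (-vn) and write 16 kHz mono 16-bit PCM WAV.
extract_audio() {
  ffmpeg -i "$1" -vn -ac 1 -ar 16000 -c:a pcm_s16le "$(audio_name "$1")"
}

# Step 2: transcribe one audio file with the same settings as the command above.
# Any extra arguments are passed through to whisper unchanged.
transcribe() {
  input=$1
  shift
  "$WHISPER_BIN" "$input" --language English --model small \
    --output_format txt --output_dir transcripts "$@"
}
```

A typical run would be `extract_audio input_video.mp4` followed by `transcribe input_video.wav`. On a CPU-only machine, `transcribe input_video.wav --fp16 False` avoids the warning that FP16 is not supported on CPU. Because Whisper decodes its input through ffmpeg, passing the video file to `whisper` directly also works; extracting the audio first mainly helps when the same file is transcribed more than once.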

Model Options

tiny: Fastest, lowest accuracy (~1 GB VRAM)
base: Fast, decent accuracy (~1 GB VRAM)
small: Balanced speed/accuracy (~2 GB VRAM)
medium: Good accuracy (~5 GB VRAM)
large: Best accuracy, slowest (~10 GB VRAM)
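As a rough illustration of how the list above might guide model choice, the sketch below maps an amount of available memory to the largest model that fits. `pick_model` is a hypothetical helper; the thresholds are simply the approximate figures listed above, and the input is assumed to be a whole number of gigabytes.

```shell
# pick_model is a hypothetical helper: given available memory in whole GB,
# print the largest model whose approximate requirement (from the list above) fits.
pick_model() {
  if [ "$1" -ge 10 ]; then
    echo large
  elif [ "$1" -ge 5 ]; then
    echo medium
  elif [ "$1" -ge 2 ]; then
    echo small
  elif [ "$1" -ge 1 ]; then
    echo base
  else
    echo tiny
  fi
}
```

For example, `whisper output.mp3 --model "$(pick_model 8)"` would select the medium model. Larger models are also slower, so a smaller model can still be the better choice for long recordings.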

Output Formats

txt: Plain text transcript
srt: Standard subtitle format
vtt: Web Video Text Tracks format
json: Detailed JSON with timestamps
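Whisper names each output file after the input file's base name and writes it to the directory given by `--output_dir` (the current directory by default). The sketch below predicts where a transcript will land; `transcript_path` and the `subs` directory are hypothetical names used only for illustration.

```shell
# transcript_path is a hypothetical helper.
# Usage: transcript_path <input file> <format> <output dir>
# Prints the path Whisper is expected to write, e.g. subs/output.srt
transcript_path() {
  base=$(basename "$1")                       # strip any leading directories
  printf '%s/%s.%s\n' "$3" "${base%.*}" "$2"  # swap the extension for the format
}
```

For instance, after `whisper output.mp3 --model small --output_format srt --output_dir subs`, the subtitles should be at the path printed by `transcript_path output.mp3 srt subs`. Passing `--output_format all` writes every supported format in one run.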

Additional Options

--task translate: Translates non-English audio to English
--language en: Specifies the source language and skips automatic language detection (slightly faster, and avoids misdetection)
--model: Selects the model size (tiny/base/small/medium/large)

Source: macos.gadgethacks.com
Source: dev.to