How to create a video transcription with ffmpeg and Whisper
Sources: macos.gadgethacks.com, dev.to

Requirements
- ffmpeg
- whisper
- Python 3.10+ (for Whisper)
Installation
macOS
# Install Homebrew if you don't have it
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install ffmpeg
brew install ffmpeg
# Install Python (if needed)
brew install python
# Install Whisper
pip3 install --upgrade pip
pip3 install git+https://github.com/openai/whisper.git
Windows
# Install Chocolatey if you don't have it
# Run in PowerShell as administrator:
Set-ExecutionPolicy Bypass -Scope Process -Force
[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072
iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))
# Install ffmpeg
choco install ffmpeg
# Install Python (from python.org)
# Make sure to check "Add Python to PATH" during installation
# Install Whisper
pip install -U openai-whisper
Linux
# Install ffmpeg
sudo apt update && sudo apt install ffmpeg
# Install Python and pip
sudo apt install python3 python3-pip
# Install Whisper
pip3 install git+https://github.com/openai/whisper.git
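After installing on any of the three platforms, it can help to confirm that both command-line tools are reachable before moving on. The following is a minimal sketch, assuming the executables are named ffmpeg and whisper as in the commands above; the helper name missing_tools is made up for this example.

```python
import shutil


def missing_tools(tools=("ffmpeg", "whisper")):
    """Return the names in `tools` that cannot be found on PATH."""
    # shutil.which returns None when no matching executable is found.
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("ffmpeg and whisper are both available.")
```

If a tool is reported as missing right after installation, opening a new terminal so that PATH is reloaded is usually the first thing to try.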
Transcription Steps
- Extract audio from video using ffmpeg
ffmpeg -i input_video.mp4 -vn -acodec mp3 output.mp3
- Transcribe audio with Whisper
whisper output.mp3 --language English --model small --output_format txt
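The two steps can also be chained from a small script. This is only a sketch: it wraps the same two commands shown above with Python's subprocess module, and the function names (ffmpeg_extract_cmd, whisper_cmd, transcribe_video) are hypothetical helpers, not part of ffmpeg or Whisper.

```python
import subprocess
from pathlib import Path


def ffmpeg_extract_cmd(video, audio):
    """Build the ffmpeg command that extracts the audio track as MP3."""
    # -vn drops the video stream; -acodec mp3 encodes the audio as MP3.
    return ["ffmpeg", "-i", str(video), "-vn", "-acodec", "mp3", str(audio)]


def whisper_cmd(audio, model="small", language="English", output_format="txt"):
    """Build the whisper command with the options used in this guide."""
    return [
        "whisper", str(audio),
        "--language", language,
        "--model", model,
        "--output_format", output_format,
    ]


def transcribe_video(video, model="small"):
    """Extract audio next to the video file, then transcribe it."""
    audio = Path(video).with_suffix(".mp3")
    # check=True raises CalledProcessError if either tool exits with an error.
    subprocess.run(ffmpeg_extract_cmd(video, audio), check=True)
    subprocess.run(whisper_cmd(audio, model=model), check=True)
    return audio


if __name__ == "__main__":
    transcribe_video("input_video.mp4")
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when file names contain spaces.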
Model Options
- tiny: Fastest, lowest accuracy (~1 GB VRAM)
- base: Fast, decent accuracy (~1 GB VRAM)
- small: Balanced speed/accuracy (~2 GB VRAM)
- medium: Good accuracy (~5 GB VRAM)
- large: Best accuracy (~10 GB VRAM)
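One way to use these figures is to pick the largest model that fits the memory you have. The sketch below hard-codes the approximate numbers from the list above; pick_model is a hypothetical helper, and the figures are rough guides rather than guarantees.

```python
# Approximate memory needs in GB, copied from the model list in this guide.
# The dict is ordered from smallest to largest model.
MODEL_MEMORY_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def pick_model(available_gb):
    """Return the largest model whose approximate need fits `available_gb`."""
    fitting = [name for name, need in MODEL_MEMORY_GB.items() if need <= available_gb]
    if not fitting:
        raise ValueError(f"No model fits in {available_gb} GB")
    # The last fitting entry is the largest, because the dict is size-ordered.
    return fitting[-1]


if __name__ == "__main__":
    print(pick_model(4))
```

The returned name can be passed straight to the --model option.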
Output Formats
- txt: Plain text transcript
- srt: Standard subtitle format
- vtt: Web Video Text Tracks format
- json: Detailed JSON with timestamps
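To give a feel for what the srt format contains, the sketch below formats a single subtitle cue by hand. It illustrates the SubRip layout (a cue number, a time range written as HH:MM:SS,mmm, then the text); it is not Whisper's own writer, and the function names are made up for this example.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_cue(index, start, end, text):
    """Build one SRT cue: number, time range, and subtitle text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"


if __name__ == "__main__":
    print(srt_cue(1, 0, 2.5, "Hello"))
```

The vtt format is similar in spirit but uses a period instead of a comma before the milliseconds and starts with a WEBVTT header line.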
Additional Options
- --task translate: Translates non-English speech into English
- --language en: Specifies the source language, which skips automatic language detection (faster and more accurate)
- --model: Selects the model size (tiny/base/small/medium/large)
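The same options can be used from Python instead of the command line. The following is a rough sketch, assuming the openai-whisper package installed earlier: whisper.load_model and model.transcribe come from that package, while transcribe_options and transcribe_file are hypothetical helpers written for this example.

```python
def transcribe_options(language=None, translate=False):
    """Map the CLI-style choices onto keyword arguments for transcribe()."""
    options = {"task": "translate" if translate else "transcribe"}
    if language is not None:
        # Setting the language skips automatic language detection.
        options["language"] = language
    return options


def transcribe_file(audio_path, model_name="small", language=None, translate=False):
    """Load a Whisper model and return the transcribed (or translated) text."""
    import whisper  # imported here so the helper above works without the package

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **transcribe_options(language, translate))
    return result["text"]


if __name__ == "__main__":
    print(transcribe_file("output.mp3", language="en"))
```

Loading the model is the slow part, so when transcribing several files it is worth loading it once and reusing it.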