AI Video Transcription with OpenAI Whisper


A Python script that transcribes the speech in video files to text using OpenAI Whisper. Place video files in the videos folder, run the script, and the transcripts are written to the transcripts folder.

Key Features

  • Video to text conversion - Accurate speech-to-text transcription
  • Multiple audio formats - Supports mp3, wav, m4a and more
  • Translation support - Translates non-English speech to English
  • Subtitle generation - Creates .srt and .vtt subtitle files
  • Multilingual - Handles content in various languages
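
As an illustration of the subtitle feature, here is a minimal sketch of turning timed segments into .srt text. It assumes segments shaped like the ones Whisper's transcribe result carries under "segments" (dicts with "start", "end" and "text"); the helper names format_timestamp and segments_to_srt are hypothetical and not taken from app.py.

```python
def format_timestamp(seconds: float) -> str:
    """Convert a time in seconds to the SRT form HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Render segments (dicts with start, end, text) as SRT subtitle text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Each SRT cue is: a counter, a time range, the text, then a blank line.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

With a real model, something like segments_to_srt(model.transcribe(path)["segments"]) would produce the subtitle text; the .vtt format differs mainly in its header line and in using a period instead of a comma before the milliseconds.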

Use Cases

  • Transcribing voice recordings
  • Converting lectures to text
  • Creating subtitles for videos
  • Translating foreign language content
  • Accessibility improvements
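
For the translation use case, a rough sketch of how it could look in Python. It relies on Whisper's transcribe method accepting task="translate" and returning a dict with "text" and "language" keys; the function name translate_to_english and the shape of its return value are assumptions made for this example.

```python
def translate_to_english(model, media_path: str) -> dict:
    """Return the detected source language and the English text for one file.

    `model` is expected to be a loaded Whisper model, e.g. the object
    returned by whisper.load_model("base").
    """
    # task="translate" asks Whisper for English output instead of a
    # transcript in the spoken language.
    result = model.transcribe(media_path, task="translate")
    return {
        "source_language": result["language"],
        "english_text": result["text"].strip(),
    }
```

Whisper only translates into English, so this is not a general translation tool between arbitrary language pairs.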

Project Structure

├── app.py          # Main transcription script
├── videos/         # Input folder for video files
├── transcripts/    # Output folder for transcriptions
└── requirements.txt
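
The actual contents of app.py are not shown here, but given this layout the main script could plausibly look like the sketch below. The extension list, the "base" model size, the one-.txt-per-input naming, and the function names are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of accepted inputs; FFmpeg can decode many more formats.
MEDIA_EXTENSIONS = {".mp4", ".mov", ".mkv", ".mp3", ".wav", ".m4a"}


def transcribe_folder(transcribe, input_dir="videos", output_dir="transcripts"):
    """Transcribe every media file in input_dir, writing one .txt per file.

    `transcribe` is any callable that takes a file path (str) and returns a
    dict with a "text" key, which is what Whisper's model.transcribe returns.
    """
    in_path = Path(input_dir)
    out_path = Path(output_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for media in sorted(in_path.iterdir()):
        if media.suffix.lower() not in MEDIA_EXTENSIONS:
            continue  # skip anything that is not a known media file
        result = transcribe(str(media))
        target = out_path / f"{media.stem}.txt"
        target.write_text(result["text"].strip() + "\n", encoding="utf-8")
        written.append(target)
    return written


def main():
    # Imported here so transcribe_folder can be used without Whisper installed.
    import whisper

    model = whisper.load_model("base")  # model size is an assumption
    for path in transcribe_folder(model.transcribe):
        print(f"Wrote {path}")
```

Passing the transcriber in as a callable keeps the folder handling separate from the model, so the loop can be tried out with a stub before downloading any model weights. Calling main() at the bottom of the script would run it against the real folders.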

Quick Start

# Create and activate a virtual environment (macOS/Linux)
python -m venv venv
source venv/bin/activate
# On Windows, activate with: venv\Scripts\activate

# Install dependencies
pip install openai-whisper

# Install FFmpeg (macOS, using Homebrew)
brew install ffmpeg
# On Debian/Ubuntu: sudo apt install ffmpeg

# Place videos in the videos folder, then run
python app.py

Technologies

  • Python 3
  • OpenAI Whisper (speech-to-text engine)
  • FFmpeg (audio extraction)
  • Virtual environment for dependency management

The tool wraps Whisper's speech recognition in a single script: FFmpeg extracts the audio from each video, Whisper transcribes it, and the text is written to the transcripts folder.