Building vid2text: A Privacy-First CLI Tool for Video Transcription

I had hours of podcast episodes and lecture recordings to search through and no time to listen to all of them. I needed to find specific quotes and concepts buried in that content, but cloud transcription services were expensive and required uploading private material to external servers.
So I built a local solution that turns any video into searchable text while keeping everything on my machine.
What is vid2text?
vid2text is a CLI tool that extracts transcriptions from videos and stores them in a searchable database. It processes YouTube videos, local files, and M3U8 streams using local AI models, ensuring complete privacy and zero ongoing costs.
Simple installation:
pip install vid2text
Basic usage:
# Process individual videos
vid2text youtube "https://youtu.be/VIDEO_ID"
vid2text local "/path/to/video.mp4"
vid2text m3u8 "https://example.com/stream.m3u8"
# View your searchable database
vid2text view # Launches web interface
Key Features
Privacy-First: Everything runs locally using Whisper AI models. No data leaves your machine.
Cost-Effective: Cloud services charge roughly $0.006 per minute, so 100 hours (6,000 minutes) of video comes to about $36 every time you process that much. Transcribing the same 100 hours locally costs nothing beyond the one-time setup.
Smart Processing: Automatically uses existing YouTube transcripts when available, falling back to local AI transcription only when needed.
Batch Processing: Handle multiple videos with YAML configuration:
videos:
  youtube:
    - url: "https://youtu.be/VIDEO1"
    - url: "https://youtu.be/VIDEO2"
  local:
    - path: "/path/to/meetings/"
Powerful Search: Uses Datasette to provide a web interface with full-text search, filtering, SQL queries, and export capabilities.
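To give a feel for what full-text search over stored transcripts looks like underneath a web interface, here is a minimal sketch using Python's built-in sqlite3 module and SQLite's FTS5 extension. The table name, column names, and helper functions are hypothetical and chosen for illustration; they are not vid2text's actual schema, and the sketch assumes your SQLite build includes FTS5.

```python
import sqlite3


def build_index(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one FTS5 virtual table holding title and transcript text.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS transcripts USING fts5(title, body)"
    )


def add_transcript(conn: sqlite3.Connection, title: str, body: str) -> None:
    conn.execute(
        "INSERT INTO transcripts (title, body) VALUES (?, ?)", (title, body)
    )


def search(conn: sqlite3.Connection, query: str, limit: int = 10):
    # MATCH runs the full-text query; snippet() returns a short excerpt of the
    # body column (index 1) with the matching terms wrapped in brackets.
    rows = conn.execute(
        "SELECT title, snippet(transcripts, 1, '[', ']', '...', 8) "
        "FROM transcripts WHERE transcripts MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_index(conn)
    add_transcript(conn, "Lecture 12", "The final exam topics include graph search.")
    for title, excerpt in search(conn, "exam"):
        print(title, "-", excerpt)
```

Queries are passed to FTS5 as-is, so input containing FTS operators or punctuation may need quoting before it reaches MATCH.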
Key Technical Decisions
Local AI Processing: Uses OpenAI Whisper and MLX Whisper instead of cloud APIs. MLX Whisper provides 3x faster transcription on Apple Silicon.
SQLite + Datasette: SQLite for storage (lightweight, portable) paired with Datasette for the web interface. Provides powerful search without complex infrastructure.
Smart YouTube Handling: Always attempts to fetch existing YouTube transcripts first before downloading audio, saving bandwidth and processing time.
Modern Python Packaging: Uses pyproject.toml and GitHub Actions with trusted publishing for streamlined development.
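The transcript-first YouTube handling described above boils down to a simple fallback: try the cheap path, and only run local transcription if it yields nothing. The sketch below shows that control flow under stated assumptions. The function name and both callables are hypothetical placeholders, not vid2text's real internals; in practice they would wrap a caption fetcher and a Whisper call.

```python
from typing import Callable, Optional, Tuple


def get_transcript(
    url: str,
    fetch_existing: Callable[[str], Optional[str]],
    transcribe_locally: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (text, source), where source is "existing" or "local"."""
    try:
        # Cheap path: reuse a transcript that already exists for the video.
        text = fetch_existing(url)
    except Exception:
        # Treat any fetch failure as "no transcript available".
        text = None

    if text:
        return text, "existing"

    # Expensive path: download the audio and transcribe it on this machine.
    return transcribe_locally(url), "local"


if __name__ == "__main__":
    # Stand-in callables for demonstration only.
    text, source = get_transcript(
        "https://youtu.be/VIDEO_ID",
        fetch_existing=lambda u: None,
        transcribe_locally=lambda u: "transcribed locally",
    )
    print(source, "->", text)
```

Passing the two steps in as callables keeps the fallback logic easy to test without network access or a model download.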
Real-World Use Cases
Students: Process course videos without cloud uploads
vid2text local "/Downloads/lecture-recordings/"
# Search: "final exam topics" - all stays local
Professionals: Make confidential meetings searchable
vid2text process team-meetings.yaml
# Find decisions and action items across meetings privately
Content Creators: Build searchable archives of your own content
vid2text youtube "https://youtu.be/my-video"
# Research your own content for new ideas
Current Limitations & Possible Improvements
Current Limitations:
- No YouTube playlist support - process videos individually
- Video files only - no direct audio file support yet
- Requires Datasette for web interface
Possible Future Improvements:
- YouTube playlist support - process entire channels or playlists at once
- Direct audio file support - handle MP3, WAV, and other audio formats
- Built-in search capabilities - search transcripts without launching Datasette
- Timestamp extraction - link transcript text to specific video timestamps
- Better content discovery - find similar videos or extract main themes
The Bottom Line
After months of use, vid2text has transformed how I interact with video content. Instead of losing information in hours of recordings, I now have a private, searchable knowledge base that costs nothing to maintain and keeps my data completely under my control.
In effect, the tool gives you a personal search engine for your video content that runs entirely on your own machine.
Open Source & Community
vid2text is open source at github.com/kashw1n/vid2text.
Want to contribute? Areas where help would be valuable:
- YouTube playlist support
- Audio file processing
- Built-in search capabilities
- Timestamp extraction
- Documentation and examples
Try It Yourself
pip install vid2text
# Start with a simple video
vid2text youtube "https://youtu.be/your-favorite-tutorial"
vid2text view # Launch the web interface
# Process a folder of videos
vid2text local "/path/to/videos/"
vid2text view # Search across everything
Sometimes the best tools come from personal frustrations. I just wanted to find quotes in podcasts without sending my data to the cloud. The result: a private, searchable video knowledge base that costs nothing to run.