Building vid2text: A Privacy-First CLI Tool for Video Transcription


I had hours of podcast episodes and lecture recordings that I needed to search through, but no time to listen to everything. I needed to find specific quotes, concepts, and information buried in video content. Cloud transcription services were expensive and required uploading private content to external servers.

So I built a local solution that turns any video into searchable text while keeping everything on my machine.

What is vid2text?

vid2text is a CLI tool that extracts transcriptions from videos and stores them in a searchable database. It processes YouTube videos, local files, and M3U8 streams using local AI models, ensuring complete privacy and zero ongoing costs.

Simple installation:

pip install vid2text

Basic usage:

# Process individual videos
vid2text youtube "https://youtu.be/VIDEO_ID"
vid2text local "/path/to/video.mp4"
vid2text m3u8 "https://example.com/stream.m3u8"

# View your searchable database
vid2text view  # Launches web interface

Key Features

Privacy-First: Everything runs locally using Whisper AI models. No data leaves your machine.

Cost-Effective: A one-time setup instead of per-minute fees. Cloud services charge roughly $0.006 per minute, so transcribing 100 hours (6,000 minutes) costs about $36 in the cloud versus $0 in fees locally.
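For anyone who wants to plug in their own numbers, here is the arithmetic behind the $36 figure as a tiny sketch. The rate is the approximate per-minute price quoted above, not an official price list, and the function name is my own.

```python
# Worked arithmetic for the cloud-cost comparison.
# RATE_PER_MINUTE is the approximate figure from the text, not a price list.
RATE_PER_MINUTE = 0.006  # USD per audio minute (approximate)


def cloud_cost(hours: float, rate_per_minute: float = RATE_PER_MINUTE) -> float:
    """Return the pay-per-minute cost in USD for transcribing `hours` of audio."""
    minutes = hours * 60
    return round(minutes * rate_per_minute, 2)


if __name__ == "__main__":
    # 100 hours is 6,000 minutes of audio.
    print(f"100 hours in the cloud: ${cloud_cost(100):.2f}")
```

The local side of the comparison has no per-minute term at all; what you pay is electricity and the time your machine spends transcribing.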

Smart Processing: Automatically uses existing YouTube transcripts when available, falling back to local AI transcription only when needed.

Batch Processing: Handle multiple videos with YAML configuration:

videos:
  youtube:
    - url: "https://youtu.be/VIDEO1"
    - url: "https://youtu.be/VIDEO2"
  local:
    - path: "/path/to/meetings/"
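
To make the structure of that file concrete, here is a rough sketch of how such a config could be flattened into individual jobs. This is not vid2text's actual loader. The `config` dict mirrors what PyYAML's `yaml.safe_load()` would return for the file above, and `expand_jobs` is a hypothetical helper.

```python
# Sketch only: the real vid2text loader may work differently.
# `config` mirrors what yaml.safe_load() would return for the YAML above.
config = {
    "videos": {
        "youtube": [
            {"url": "https://youtu.be/VIDEO1"},
            {"url": "https://youtu.be/VIDEO2"},
        ],
        "local": [{"path": "/path/to/meetings/"}],
    }
}


def expand_jobs(config: dict) -> list[tuple[str, str]]:
    """Flatten the config into (source_type, location) pairs, in file order."""
    jobs = []
    for source_type, entries in config.get("videos", {}).items():
        for entry in entries or []:
            # Each entry carries its location under "url" or "path".
            location = entry.get("url") or entry.get("path")
            if location:
                jobs.append((source_type, location))
    return jobs


if __name__ == "__main__":
    for source_type, location in expand_jobs(config):
        print(source_type, location)
```

Each pair corresponds to one of the single-video commands shown earlier, which is why a batch file is just a convenient way to list them.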

Powerful Search: Uses Datasette to provide a web interface with full-text search, filtering, SQL queries, and export capabilities.

Key Technical Decisions

Local AI Processing: Uses OpenAI Whisper and MLX Whisper instead of cloud APIs. MLX Whisper provides 3x faster transcription on Apple Silicon.
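
A minimal sketch of how choosing between the two backends could look. The selection rule (MLX on Apple Silicon macOS, OpenAI Whisper elsewhere) and the `base` model size are my assumptions for illustration, not necessarily what vid2text does internally. The `transcribe` calls are the documented entry points of the `openai-whisper` and `mlx-whisper` packages, and both expect ffmpeg to be installed.

```python
import platform


def pick_backend(system: str, machine: str) -> str:
    """Return "mlx" on Apple Silicon macOS, otherwise "openai"."""
    if system == "Darwin" and machine == "arm64":
        return "mlx"
    return "openai"


def transcribe(path: str) -> str:
    """Transcribe a media file with whichever backend suits this machine."""
    backend = pick_backend(platform.system(), platform.machine())
    if backend == "mlx":
        # Imported lazily so the package is only needed on Apple Silicon.
        import mlx_whisper  # pip install mlx-whisper

        return mlx_whisper.transcribe(path)["text"]

    import whisper  # pip install openai-whisper

    model = whisper.load_model("base")  # model size is an arbitrary choice here
    return model.transcribe(path)["text"]


if __name__ == "__main__":
    print(pick_backend(platform.system(), platform.machine()))
```

Keeping the imports inside the function means neither package has to be installed on a machine that never uses it.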

SQLite + Datasette: SQLite for storage (lightweight, portable) paired with Datasette for the web interface. Provides powerful search without complex infrastructure.
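
To show why this pairing needs so little infrastructure, here is a self-contained sketch of full-text search in plain SQLite using the FTS5 extension from Python's standard library. The table and column names are invented for the example, since vid2text's real schema is not shown here, and it assumes your SQLite build includes FTS5 (most do).

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 table and load (title, text) rows into it."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one full-text-indexed row per transcript.
    conn.execute("CREATE VIRTUAL TABLE transcripts USING fts5(title, text)")
    conn.executemany(
        "INSERT INTO transcripts (title, text) VALUES (?, ?)", rows
    )
    return conn


def search(conn, query):
    """Return titles of transcripts matching the query, best match first."""
    cursor = conn.execute(
        "SELECT title FROM transcripts WHERE transcripts MATCH ? ORDER BY rank",
        (query,),
    )
    return [title for (title,) in cursor]


if __name__ == "__main__":
    conn = build_index(
        [
            ("Lecture 1", "The final exam covers chapters one through four."),
            ("Team sync", "Action item: ship the release on Friday."),
        ]
    )
    print(search(conn, "final exam"))
```

Datasette can expose a search box for tables that have a full-text index like this, so the same SQLite file serves both the CLI and the web interface.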

Smart YouTube Handling: Always tries to fetch an existing YouTube transcript before downloading any audio, saving bandwidth and processing time.
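
The transcript-first logic can be sketched as a simple fallback. The two callables are placeholders that I made up to keep the example self-contained; in the real tool they would be a YouTube transcript lookup and a local Whisper run.

```python
from typing import Callable, Optional


def get_transcript(
    url: str,
    fetch_existing: Callable[[str], Optional[str]],
    transcribe_locally: Callable[[str], str],
) -> tuple[str, str]:
    """Return (text, source), trying the cheap path first."""
    try:
        text = fetch_existing(url)
    except Exception:
        # Treat any lookup failure as "no transcript available".
        text = None
    if text:
        return text, "youtube"
    # Only now pay for the audio download and local transcription.
    return transcribe_locally(url), "whisper"


if __name__ == "__main__":
    # Stand-in callables for demonstration only.
    text, source = get_transcript(
        "https://youtu.be/VIDEO_ID",
        fetch_existing=lambda url: None,
        transcribe_locally=lambda url: "locally transcribed text",
    )
    print(source, text)
```

Returning the source alongside the text makes it easy to record which transcripts came from YouTube and which from the local model.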

Modern Python Packaging: Uses pyproject.toml for packaging and GitHub Actions with PyPI trusted publishing for releases.

Real-World Use Cases

Students: Process course videos without cloud uploads

vid2text local "$HOME/Downloads/lecture-recordings/"
# Then search for "final exam topics"; everything stays local

Professionals: Make confidential meetings searchable

vid2text process team-meetings.yaml
# Find decisions and action items across meetings privately

Content Creators: Build searchable archives of your own content

vid2text youtube "https://youtu.be/my-video"
# Research your own content for new ideas

Current Limitations & Possible Improvements

Current Limitations:

  • No YouTube playlist support - process videos individually
  • Video files only - no direct audio file support yet
  • Requires Datasette for web interface

Possible Future Improvements:

  • YouTube playlist support - process entire channels or playlists at once
  • Direct audio file support - handle MP3, WAV, and other audio formats
  • Built-in search capabilities - search transcripts without launching Datasette
  • Timestamp extraction - link transcript text to specific video timestamps
  • Better content discovery - find similar videos or extract main themes

The Bottom Line

After months of use, vid2text has transformed how I interact with video content. Instead of losing information in hours of recordings, I now have a private, searchable knowledge base that costs nothing to maintain and keeps my data completely under my control.

In effect, the tool gives you a personal search engine for your video content that runs entirely on your own machine.

Open Source & Community

vid2text is open source at github.com/kashw1n/vid2text.

Want to contribute? Areas where help would be valuable:

  • YouTube playlist support
  • Audio file processing
  • Built-in search capabilities
  • Timestamp extraction
  • Documentation and examples

Try It Yourself

pip install vid2text

# Start with a simple video
vid2text youtube "https://youtu.be/your-favorite-tutorial"
vid2text view  # Launch the web interface

# Process a folder of videos
vid2text local "/path/to/videos/"
vid2text view  # Search across everything

Sometimes the best tools come from personal frustrations. I just wanted to find quotes in podcasts without sending my data to the cloud. The result: a private, searchable video knowledge base that costs nothing to run.