TheWhisper provides optimized Whisper models for high-performance speech-to-text conversion, focusing on streaming and on-device use. It offers low-latency transcription for real-time applications on NVIDIA GPUs and Apple Silicon.
TheWhisper is an open-source project dedicated to efficient speech-to-text and text-to-speech inference, with a strong emphasis on self-hosting, cloud hosting, and on-device inference across platforms. It provides optimized Whisper models with streaming inference support and a choice of chunk sizes (10s, 15s, 20s, 30s), where the original Whisper uses a fixed 30s window. It includes high-performance inference engines for NVIDIA GPUs and CoreML engines for macOS/Apple Silicon, the latter designed for low power consumption. It is suited to real-time captioning, live meetings, voice interfaces, and edge deployments, and ships with a local REST API, frontend examples, and a demo Electron app for macOS.
Best used for
Ideal for developers and organizations that need real-time speech-to-text transcription, low-latency voice interfaces, or efficient on-device model deployment. It is especially valuable for applications that require high performance and low power consumption on NVIDIA GPUs or Apple Silicon.
What are the key optimizations in TheWhisper compared to original Whisper models?
TheWhisper offers optimized Whisper models that support streaming inference with a choice of chunk sizes (10s, 15s, 20s, 30s), where the original Whisper uses a fixed 30s window. It also provides high-performance inference engines for NVIDIA GPUs and CoreML engines for Apple Silicon, both aimed at low latency and low power consumption.
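To illustrate what a smaller chunk size means in practice, here is a minimal sketch of splitting an audio buffer into fixed-length chunks. It is not TheWhisper's API: the function name `split_into_chunks`, the 16 kHz sample rate, and the plain-list audio representation are assumptions made for this example only.

```python
# Illustrative sketch only; not part of TheWhisper's API.
# Chunk sizes (in seconds) taken from the answer above.
SUPPORTED_CHUNK_SECONDS = (10, 15, 20, 30)


def split_into_chunks(samples, sample_rate=16_000, chunk_seconds=10):
    """Split a sequence of audio samples into consecutive fixed-length chunks.

    The last chunk may be shorter than the others if the audio length is not
    an exact multiple of the chunk length.
    """
    if chunk_seconds not in SUPPORTED_CHUNK_SECONDS:
        raise ValueError(
            f"chunk_seconds must be one of {SUPPORTED_CHUNK_SECONDS}, "
            f"got {chunk_seconds}"
        )
    chunk_len = sample_rate * chunk_seconds  # samples per chunk
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


# 25 seconds of silence at the assumed 16 kHz sample rate.
audio = [0.0] * (16_000 * 25)
chunks = split_into_chunks(audio, sample_rate=16_000, chunk_seconds=10)
print([len(c) / 16_000 for c in chunks])  # duration of each chunk in seconds
```

With a 10s chunk, the first complete chunk is available after 10 seconds of audio instead of 30, which is why smaller chunks suit live captioning and voice interfaces.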
What are the system requirements for using TheWhisper with NVIDIA GPUs?
On NVIDIA hardware, TheWhisper supports GPUs such as the RTX 4090/5090, L40S, H100, A100, and Jetson Thor. It requires Ubuntu 20.04 or later, at least 2.5 GB of RAM (5 GB recommended), CUDA 11.8 or later, NVIDIA driver 520.0 or later, and Python 3.10 to 3.12.
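The version requirements above can be checked before installation. The sketch below is a hypothetical helper, not something shipped with TheWhisper: `check_requirements` and `parse_version` are names invented here, and the CUDA and driver versions are passed in as strings (for example, as reported by `nvidia-smi`) rather than detected automatically.

```python
import sys

# Minimum versions taken from the answer above.
MIN_CUDA = (11, 8)
MIN_DRIVER = (520, 0)
PYTHON_RANGE = ((3, 10), (3, 12))  # inclusive range of (major, minor)


def parse_version(text):
    """Turn a dotted version string such as '520.61.05' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def check_requirements(python_version, cuda_version, driver_version):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    py = tuple(python_version[:2])
    if not (PYTHON_RANGE[0] <= py <= PYTHON_RANGE[1]):
        problems.append(f"Python {py[0]}.{py[1]} is outside 3.10 to 3.12")
    if parse_version(cuda_version)[:2] < MIN_CUDA:
        problems.append(f"CUDA {cuda_version} is older than 11.8")
    if parse_version(driver_version)[:2] < MIN_DRIVER:
        problems.append(f"Driver {driver_version} is older than 520.0")
    return problems


# Example: the running interpreter plus manually supplied CUDA/driver versions.
for problem in check_requirements(sys.version_info, "12.4", "550.54.14"):
    print(problem)
```

The check covers only the Python, CUDA, and driver versions; GPU model, operating system, and available RAM still need to be verified separately.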
Can I use TheWhisper for commercial applications?
Yes. TheWhisper offers a free license for small organizations that use TheStage AI's optimized engines on up to 4 GPUs per year. Larger commercial deployments, or deployments on more GPUs, require an enterprise license, which can be obtained by contacting TheStage AI directly.