Sopro
Sopro is a lightweight text-to-speech model that offers zero-shot voice cloning. It is designed for rapid audio generation, achieving 0.05 RTF on CPU.
About
Sopro is a lightweight English text-to-speech model developed as a side project, with a focus on efficiency and speed. Instead of the common Transformer architecture, it uses dilated convolutions and lightweight cross-attention layers. The model has 135 million parameters, supports streaming, and performs zero-shot voice cloning from 3-12 seconds of reference audio. On CPU it reaches a Real-Time Factor (RTF, synthesis time divided by audio duration) of about 0.05: on a base-model M3, it generates 32 seconds of audio in 1.77 seconds. It was trained on a single GPU for $100, which makes it a fast, low-cost TTS option for developers and researchers.
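The RTF figure can be checked from the two timings quoted above. The following is a minimal sketch of that arithmetic only; the function name real_time_factor is an illustrative helper and is not part of Sopro.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by the duration of the generated audio.

    Values below 1.0 mean the audio is produced faster than it plays back.
    Hypothetical helper for illustration; not a Sopro API.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Timings quoted in the description: 32 s of audio generated in 1.77 s.
rtf = real_time_factor(1.77, 32.0)
speedup = 1.0 / rtf  # how many times faster than real time

print(f"RTF = {rtf:.3f}")
print(f"Speed-up over real time = {speedup:.1f}x")
```

With these timings the ratio comes out slightly above 0.05, so the headline figure is best read as a rounded value.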
Pricing & Plans
Open Source
Free