Zyphra Zonos
Zyphra Zonos is a text-to-speech (TTS) tool that offers expressive speech generation and real-time voice cloning. It features both transformer and hybrid models for high-fidelity audio generation.
About
Zyphra Zonos is a text-to-speech (TTS) platform that provides expressive speech generation and real-time voice cloning. It offers two 1.6B-parameter models: a transformer and an SSM hybrid, the latter being the first open-source SSM model for TTS. Both models are trained on approximately 200,000 hours of speech data, primarily English but also including Chinese, Japanese, French, Spanish, and German. Zonos generates highly expressive, natural-sounding speech from text prompts, speaker embeddings, or audio prefixes. It also supports high-fidelity voice cloning from short audio clips (5-30 seconds). Generation can be conditioned on speaking rate, pitch, audio quality, and emotions such as sadness, fear, anger, happiness, and surprise. Speech is output natively at 44 kHz. The models are released under an Apache 2.0 license, with weights available on Hugging Face and inference code on GitHub. Users can access Zonos via a model playground and an API.
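As a rough illustration of the constraints described above, the sketch below checks a voice-cloning request before it would be sent to a model. It is not the Zonos API: the `CloneRequest` class, its field names, and the 0-to-1 emotion weight range are all assumptions made for this example. Only the 5-30 second clip range and the emotion names come from the description.

```python
from dataclasses import dataclass, field

# Emotions named in the description above.
EMOTIONS = ("sadness", "fear", "anger", "happiness", "surprise")
# Reference clip length range quoted in the description (seconds).
MIN_CLIP_S, MAX_CLIP_S = 5.0, 30.0


@dataclass
class CloneRequest:
    """Hypothetical request container; not part of the Zonos codebase."""
    text: str
    clip_seconds: float
    emotions: dict = field(default_factory=dict)  # emotion name -> weight

    def validate(self) -> list:
        """Return a list of human-readable problems (empty if none)."""
        problems = []
        if not self.text.strip():
            problems.append("text is empty")
        if not MIN_CLIP_S <= self.clip_seconds <= MAX_CLIP_S:
            problems.append(
                f"clip length {self.clip_seconds}s is outside "
                f"{MIN_CLIP_S}-{MAX_CLIP_S}s"
            )
        for name, weight in self.emotions.items():
            if name not in EMOTIONS:
                problems.append(f"unknown emotion: {name}")
            elif not 0.0 <= weight <= 1.0:  # assumed weight range
                problems.append(f"weight for {name} must be in [0, 1]")
        return problems
```

A request with a 12-second reference clip and `{"happiness": 0.8}` passes, while a 2-second clip or an emotion outside the listed set is reported as a problem.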
Pricing & Plans
Freemium · Paid · Usage-based · Enterprise · Open Source
Not Disclosed