MelGAN-NeurIPS
MelGAN-NeurIPS is an open-source, GAN-based mel-spectrogram inversion network for text-to-speech synthesis. It generates high-quality, coherent audio waveforms significantly faster than real time.
About
MelGAN-NeurIPS is an open-source project that provides a GAN-based mel-spectrogram inversion network for text-to-speech synthesis. It addresses the difficulty of generating coherent raw audio waveforms with generative adversarial networks by introducing architectural changes and simple training techniques. Subjective evaluation using Mean Opinion Score (MOS) shows that the model reliably produces high-quality audio for mel-spectrogram inversion. The model is non-autoregressive and fully convolutional, and it has significantly fewer parameters than competing models. A key differentiator is speed: without hardware-specific optimizations, it runs over 100x faster than real time on a GTX 1080Ti GPU and more than 2x faster than real time on a CPU. It also generalizes well to unseen speakers.
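To make the speed figures concrete, here is a small arithmetic sketch of what "100x faster than real time" and "2x faster than real time" imply for synthesis latency. It is illustrative only: the function names and the 10-second clip length are assumptions for the example, not part of the MelGAN-NeurIPS code, and the quoted speedups are treated as lower bounds.

```python
# Illustrative arithmetic only. The helper names and clip length are
# hypothetical and do not come from the MelGAN-NeurIPS repository.

def speedup_over_real_time(audio_seconds: float, compute_seconds: float) -> float:
    """Return how many seconds of audio are produced per second of compute."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


def compute_time_for(audio_seconds: float, speedup: float) -> float:
    """Return the compute time needed to synthesize audio at a given speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


clip_seconds = 10.0  # assumed clip length for the example

# The listing quotes "over 100x" (GPU) and "more than 2x" (CPU), so these
# compute times are upper bounds on how long synthesis should take.
gpu_time = compute_time_for(clip_seconds, 100.0)
cpu_time = compute_time_for(clip_seconds, 2.0)

print(f"GPU: at most {gpu_time:.2f} s, CPU: at most {cpu_time:.2f} s")
```

Under these assumptions, a 10-second clip would take at most about 0.1 seconds on the GPU and at most 5 seconds on the CPU.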
Pricing & Plans
Open Source
Free