Mini-Omni
Mini-Omni is an open-source multimodal large language model that can hear and talk while thinking. It supports real-time, end-to-end speech input and streaming audio output for conversation.
About
Mini-Omni is an open-source multimodal large language model built for real-time, end-to-end spoken conversation: it takes speech as input and streams audio as output. The model "talks while thinking," generating text and audio simultaneously without separate ASR or TTS models. The project supports real-time speech-to-speech conversation, streaming audio output, and batch inference for "Audio-to-Text" and "Audio-to-Audio" tasks. It uses Qwen2 as the LLM backbone, LitGPT for training and inference, Whisper for audio encoding, and SNAC for audio decoding. Mini-Omni is aimed at developers and researchers who want to experiment with and build on conversational AI models.
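To make the "talk while thinking" idea concrete, here is a minimal toy sketch in Python of the data flow described above: encode audio, emit a text token and audio codes together at each step, and decode the codes into a playable chunk right away. It does not use Mini-Omni's actual code or API. The functions `encode_audio`, `generate_parallel`, `decode_audio`, and `converse`, the `Step` type, and the 4096-entry code range are all hypothetical stand-ins for the Whisper encoder, the Qwen2 backbone, and the SNAC decoder.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Step:
    """One decoding step: a text token plus the audio codes emitted with it."""
    text_token: str
    audio_codes: List[int]


def encode_audio(samples: List[float], frame_size: int = 4) -> List[float]:
    """Hypothetical stand-in for an audio encoder: one mean value per frame."""
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    return [sum(frame) / len(frame) for frame in frames]


def generate_parallel(
    features: List[float], reply: List[str], codes_per_step: int = 2
) -> Iterator[Step]:
    """Hypothetical stand-in for the LLM: yields text and audio codes together.

    A real model would predict both streams; here the reply is fixed and the
    codes are derived arithmetically so the sketch stays deterministic.
    """
    seed = int(sum(features) * 1000)
    for i, token in enumerate(reply):
        codes = [(seed + i * codes_per_step + k) % 4096 for k in range(codes_per_step)]
        yield Step(token, codes)


def decode_audio(codes: List[int]) -> List[float]:
    """Hypothetical stand-in for a codec decoder: one sample in [-1, 1) per code."""
    return [code / 2048.0 - 1.0 for code in codes]


def converse(samples: List[float], reply: List[str]) -> Iterator[Tuple[str, List[float]]]:
    """Stream (text token, audio chunk) pairs as soon as each step is produced."""
    features = encode_audio(samples)
    for step in generate_parallel(features, reply):
        # Audio for this step is decoded immediately instead of waiting for
        # the full text, which is the point of streaming output.
        yield step.text_token, decode_audio(step.audio_codes)


if __name__ == "__main__":
    for text, audio in converse([0.1] * 8, ["Hello", "there"]):
        print(text, len(audio))
```

Because `converse` is a generator, a caller can begin playing the first audio chunk while later steps are still being produced. A pipeline with separate ASR and TTS stages would typically finish the text before synthesizing any speech.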
Pricing & Plans
Open Source
Free