Mini-Omni

Mini-Omni is an open-source multimodal large language model that can hear and talk while thinking. It supports real-time, end-to-end speech input with streaming audio output for conversation.

At a glance

Pricing: Open Source
Free tier: Yes
API: Yes
Skill level: Technical

About

What is Mini-Omni?

Mini-Omni is an open-source multimodal large language model designed for real-time, end-to-end spoken conversation: it takes speech as input and streams audio as output. The model can "talk while thinking," generating text and audio simultaneously without requiring separate ASR (automatic speech recognition) or TTS (text-to-speech) models. The project provides real-time speech-to-speech conversation, streaming audio output, and batch inference for "Audio-to-Text" and "Audio-to-Audio" tasks. It is built on Qwen2 as the LLM backbone, litGPT for training and inference, Whisper for audio encoding, and SNAC for audio decoding. Mini-Omni is aimed at developers and researchers who want to experiment with and build on conversational AI models.
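To illustrate why "talking while thinking" matters for latency, here is a toy sketch in Python. It is not Mini-Omni's code or API: the functions `fake_audio_for`, `cascaded`, and `streaming` are hypothetical names invented for this example, and the "audio" is just the token's bytes. It only contrasts a pipeline that finishes all text before producing audio with one that emits an audio chunk alongside each text token.

```python
from typing import Iterator, List, Tuple


def fake_audio_for(token: str) -> bytes:
    """Placeholder for audio synthesis: returns the token's bytes, not real audio."""
    return token.encode("utf-8")


def cascaded(tokens: List[str]) -> Iterator[Tuple[int, bytes]]:
    """Cascade-style toy: all text is generated first, then audio is emitted.

    Yields (generation step at which the chunk became available, audio chunk).
    """
    text_done_at = len(tokens)  # audio can only start after the last text token
    for tok in tokens:
        yield text_done_at, fake_audio_for(tok)


def streaming(tokens: List[str]) -> Iterator[Tuple[int, bytes]]:
    """Streaming-style toy: each step emits a text token and its audio chunk together."""
    for step, tok in enumerate(tokens, start=1):
        yield step, fake_audio_for(tok)


reply = ["Hello", ",", " world"]

# Step at which the first audio chunk is available in each toy pipeline.
first_audio_cascaded = next(cascaded(reply))[0]
first_audio_streaming = next(streaming(reply))[0]

# Both pipelines produce the same audio in the end; only the timing differs.
streamed_audio = b"".join(chunk for _, chunk in streaming(reply))
```

In this toy model the streaming variant has its first chunk ready at step 1, while the cascade has to wait for the full three-token reply. The real model's token layout and decoding are more involved and are not represented here.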

Best used for

Ideal for developers and researchers who need to build real-time conversational AI systems, implement speech-to-speech capabilities, or experiment with multimodal large language models. Especially useful for prototyping interactive voice assistants.

Common actions

develop conversational AI

implement speech-to-speech

experiment with multimodal LLM

build voice assistants


Capabilities

Key features

Real-time speech-to-speech
Talking while thinking
Streaming audio output
Audio-to-Text batch inference
Audio-to-Audio batch inference
Qwen2 LLM backbone
Whisper audio encoding

Target Audience

developers, researchers, machine learning engineers

Integrations

Not yet documented

Pricing & Plans

Open Source

Free

FAQs

Does Mini-Omni support languages other than English?

The model is primarily trained on English. It can understand other languages supported by Whisper (its audio encoder), but its output is English only. In other words, input can be multilingual, while responses are always in English.

Is the tts-adapter supported in the open-source version?

The post_adapter in the code is the tts-adapter. However, the current open-source version of Mini-Omni does not support tts-adapter functionality, so it is not available in the public release.

How can I resolve "ModuleNotFoundError: No module named 'utils.xxxx'"?

If you encounter a ModuleNotFoundError related to 'utils.xxxx', run `export PYTHONPATH=./` in your terminal from the project's root directory. This adds the current directory to Python's module search path so that the utility modules inside the project can be found.
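The mechanism behind this fix can be demonstrated in isolation. The following is a minimal sketch, assuming a throwaway project created in a temporary directory. The package name `demo_utils_pkg` and the helper functions are hypothetical and are not part of Mini-Omni. It shows that an import fails while the project root is missing from the module search path and succeeds once the root is added. `sys.path.insert` is used here as an in-process approximation of setting `PYTHONPATH`.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_demo_project(root: Path) -> None:
    """Create a tiny package named `demo_utils_pkg` (hypothetical) under `root`."""
    pkg = root / "demo_utils_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "helpers.py").write_text("def greet():\n    return 'hello'\n")


def can_import(module_name: str) -> bool:
    """Return True if `module_name` is importable with the current sys.path."""
    importlib.invalidate_caches()  # make sure newly created files are seen
    try:
        importlib.import_module(module_name)
        return True
    except ModuleNotFoundError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    project_root = Path(tmp)
    make_demo_project(project_root)

    # The project root is not on the search path yet, so the import fails.
    before = can_import("demo_utils_pkg.helpers")

    # Roughly what PYTHONPATH does at interpreter start-up: add the root to sys.path.
    sys.path.insert(0, str(project_root))
    after = can_import("demo_utils_pkg.helpers")

    # Restore the search path so the demo leaves no trace.
    sys.path.remove(str(project_root))
```

Setting the environment variable in the shell is usually preferable to editing `sys.path` in code, because it leaves the project's source files untouched.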

Also listed in

Content & Design › Audio & Music
AI Agents & Automation › Chatbots & Conversational AI
AI Agents & Automation › Voice Agents
