Qwen2-Audio
Visit ToolQwen2-Audio is an open-source large audio language model developed by Alibaba Cloud. It accepts various audio signals and performs audio analysis or direct textual responses to speech instructions.
At a glance
Trending
Qwen2-Audio is an open-source large audio language model developed by Alibaba Cloud. It accepts various audio signals and performs audio analysis or direct textual responses to speech instructions.
Trending
About
Qwen2-Audio is an official large audio language model proposed by Alibaba Cloud, designed to accept diverse audio signal inputs and perform audio analysis or generate direct textual responses based on speech instructions. It supports two distinct interaction modes: voice chat, allowing users to engage in free voice interactions without text input, and audio analysis, where users can provide both audio and text instructions for detailed analysis. The project has released two models, Qwen2-Audio-7B and Qwen2-Audio-7B-Instruct, and provides evaluation scripts to reproduce its performance across 13 standard benchmarks including ASR, S2TT, SER, and VSC. It is built on Hugging Face Transformers, making it accessible for developers and researchers.
Capabilities
Pricing & Plans
Open Source
Free
FAQs
Trending