Fun-ASR

Fun-ASR is a large end-to-end speech recognition model from Tongyi Lab. It offers low-latency real-time transcription across 31 languages and excels at recognizing professional terminology.

At a glance

Pricing

Open Source

Free tier

Yes

API

Skill level

Technical

About

What is Fun-ASR?

Fun-ASR is a large end-to-end speech recognition model developed by Tongyi Lab and trained on tens of millions of hours of real speech data. It offers strong contextual understanding and adapts well to specific industries, with low-latency real-time transcription across 31 languages. The model is particularly good at recognizing professional terminology and industry-specific expressions in vertical domains such as education and finance, and it addresses problems such as hallucinated output and confusion between languages. It also performs robustly in far-field and high-noise environments, supports a range of Chinese dialects and regional accents, and offers improved lyric recognition when vocals are mixed with music. The surrounding toolkit covers ASR, voice activity detection (VAD), punctuation restoration, language models, speaker verification, speaker diarization, and multi-talker ASR.

Best used for

Ideal for content creators who need to accurately transcribe speech from a variety of audio sources, including recordings with background music or noise, across multiple languages. It is especially valuable for professionals who work with specialized terminology in fields such as education and finance and need high-precision recognition.

Common actions

transcribe audio

recognize speech

process audio

analyze audio

Capabilities

Key features

End-to-end speech recognition
Low-latency real-time transcription
31-language support
Chinese dialect/accent recognition
Far-field and high-noise recognition
Lyric recognition over background music
Speaker diarization
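
Speaker diarization answers "who spoke when", and its output is usually combined with the transcript afterwards. The sketch below shows one common way to do that, by giving each transcribed segment the speaker whose turn overlaps it the most. It is only an illustration: the function names and the tuple layouts are assumptions made for this example, not Fun-ASR's actual output format or API.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time spans (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(transcript_segments, speaker_turns):
    """Label each transcript segment with the best-overlapping speaker.

    transcript_segments: list of (start_sec, end_sec, text)   # hypothetical layout
    speaker_turns:       list of (start_sec, end_sec, speaker) # hypothetical layout
    Returns a list of (speaker_or_None, text).
    """
    labeled = []
    for start, end, text in transcript_segments:
        best_speaker, best_overlap = None, 0.0
        for turn_start, turn_end, speaker in speaker_turns:
            shared = overlap(start, end, turn_start, turn_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        # Segments that overlap no turn keep the label None.
        labeled.append((best_speaker, text))
    return labeled
```

A segment that straddles two turns goes to the speaker with the larger share of it, which is a simple heuristic; a real pipeline might split the segment at the turn boundary instead.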

Target Audience

content creator

Integrations

Not yet documented

Pricing & Plans

Open Source

Free

FAQs

What languages does Fun-ASR support?

Fun-ASR supports 31 languages, including Chinese (with 7 dialects and 26 regional accents), English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, and many European languages. It can also switch freely between languages and recognize mixed-language speech.

Can Fun-ASR handle speech in noisy environments or with music?

Yes. Fun-ASR is optimized for far-field audio pickup and high-noise settings such as conference rooms and industrial sites, with a reported accuracy of up to 93%. It also offers improved recognition of lyrics in songs where background music interferes with the vocals.
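
The listing does not say how the accuracy figure above is measured. A common convention in speech recognition is to report accuracy as 1 minus the word error rate (WER), where WER is the word-level edit distance between a reference transcript and the model output, divided by the reference length. The sketch below computes that under this assumption; the function names are chosen for this example and are not part of Fun-ASR.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def word_accuracy(reference, hypothesis):
    """Accuracy as 1 - WER, floored at 0 because WER can exceed 1."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))
```

For languages written without spaces, such as Chinese, the same calculation is typically done per character rather than per word, by passing the strings themselves to `edit_distance` instead of splitting them.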

What are the core features of the Fun-ASR toolkit?

Beyond high-precision speech recognition, the Fun-ASR toolkit offers voice activity detection (VAD), punctuation restoration, language models, speaker verification, speaker diarization, and multi-talker ASR. It emphasizes multi-language support and industry customization.
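
To make the VAD component concrete: voice activity detection finds the spans of a recording that contain speech, so that only those spans are passed to the recognizer. The sketch below uses a simple energy threshold, which is a classic textbook approach and not the method used by the Fun-ASR toolkit (the listing does not describe it). The function names, the 20 ms frame size, and the threshold value are all assumptions for illustration.

```python
import math


def frame_energies(samples, frame_len):
    """RMS energy of consecutive non-overlapping frames; a trailing partial frame is dropped."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return energies


def detect_speech(samples, sample_rate, frame_ms=20, threshold=0.1):
    """Return (start_sec, end_sec) spans whose frame RMS is at or above the threshold."""
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame_ms is too short for this sample rate")
    energies = frame_energies(samples, frame_len)
    segments = []
    seg_start = None  # index of the first frame of the currently open segment
    for i, energy in enumerate(energies):
        if energy >= threshold and seg_start is None:
            seg_start = i
        elif energy < threshold and seg_start is not None:
            segments.append((seg_start * frame_len / sample_rate,
                             i * frame_len / sample_rate))
            seg_start = None
    if seg_start is not None:  # audio ended while still in speech
        segments.append((seg_start * frame_len / sample_rate,
                         len(energies) * frame_len / sample_rate))
    return segments
```

An energy threshold like this breaks down in exactly the far-field and high-noise conditions the listing highlights, which is one reason production systems tend to rely on trained VAD models instead.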
