About
What is AssemblyAI?
AssemblyAI provides industry-leading Speech AI models for transcribing speech to text and extracting insights from voice data. The platform offers various products including Speech-to-Text, Streaming Speech-to-Text, Speech Understanding, LLM Gateway, Guardrails, and Speech-to-Speech. It supports use cases like conversation intelligence, medical transcription, contact centers, voice agents, and AI notetakers. AssemblyAI emphasizes high accuracy, low latency, and scalability, processing over 40 terabytes of audio daily. Key features include prompting, disfluency control, code-switching, real-time diarization, and support for over 99 languages, making it suitable for building advanced voice AI applications.
Best used for
Ideal for developers and businesses who need to accurately transcribe audio, extract insights from voice data, and build advanced voice AI applications. Especially valuable for creating conversation intelligence platforms, medical transcription services, and real-time voice agents.
Common actions
conformer-2, AI Model, latency, noise resistance, word error rate, automatic speech recognition, performance improvement, proper nouns, alphanumerics, English audio
Capabilities
Key features
- Speech-to-Text transcription
- Streaming Speech-to-Text
- Speech Understanding models
- LLM Gateway
- Guardrails
- Real-time diarization
- Multilingual support
Target Audience
developers, data scientists, product managers, AI engineers
Integrations
Not yet documented
Pricing & Plans
Freemium · Usage-based · Enterprise
FAQs
What is the pricing structure for AssemblyAI's Speech-to-Text API?
AssemblyAI offers a pay-as-you-go model for its Speech-to-Text API. After a free tier, the Universal-3 Pro model costs $0.21/hr and Universal-2 costs $0.15/hr. Add-on features like Keyterms Prompting and Medical Mode have additional hourly costs.
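As a rough illustration of the pay-as-you-go arithmetic, the sketch below multiplies the quoted hourly rates by the number of audio hours. It is only an estimate under stated assumptions: the dictionary keys are hypothetical labels rather than official model identifiers, the free tier is ignored, and `addon_rate_per_hour` is a placeholder because the add-on prices are not given here.

```python
# Base rates in USD per audio hour, taken from the FAQ answer above.
# The keys are hypothetical labels, not official API model identifiers.
BASE_RATE_PER_HOUR = {
    "universal-3-pro": 0.21,
    "universal-2": 0.15,
}


def estimate_cost(model: str, audio_hours: float,
                  addon_rate_per_hour: float = 0.0) -> float:
    """Estimate the cost in USD of transcribing `audio_hours` of audio.

    `addon_rate_per_hour` is a placeholder (assumption): add-on prices are
    not stated in this document, so the caller must supply the real figure.
    The free tier is not modeled.
    """
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    rate = BASE_RATE_PER_HOUR[model] + addon_rate_per_hour
    return round(rate * audio_hours, 2)


if __name__ == "__main__":
    # 100 hours on each model, with no add-ons.
    print(estimate_cost("universal-3-pro", 100))  # 21.0
    print(estimate_cost("universal-2", 100))      # 15.0
```

Under these assumptions, 100 hours of audio would cost about $21 on Universal-3 Pro and about $15 on Universal-2 before any add-ons.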
Does AssemblyAI support real-time transcription for live audio?
Yes, AssemblyAI provides a Streaming Speech-to-Text API designed for transcribing live audio and video in real time. It offers ultra-low latency, high accuracy, and features like auto punctuation, casing, and next-gen end-of-turn detection.
Can AssemblyAI optimize transcription for medical terminology?
Yes, AssemblyAI offers a 'Medical Mode' add-on feature. This mode is specifically designed to optimize transcription for medical terminology and healthcare conversations, significantly improving accuracy in these specialized contexts for both Universal-3 Pro and Universal-2 models.
What languages does AssemblyAI's Universal-3 Pro model support?
The Universal-3 Pro model currently supports English, Spanish, German, French, Italian, and Portuguese. AssemblyAI states that more languages are coming soon to enhance its multilingual capabilities.
Does AssemblyAI offer speaker diarization?
Yes, speaker diarization is an add-on feature available for both Universal-3 Pro and Universal-2 models. It detects multiple speakers in audio files and segments the transcript into utterances, indicating what each speaker said.
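To make the idea of "segmenting the transcript into utterances" concrete, here is a minimal sketch that groups consecutive words with the same speaker label into one utterance. The `Word` type and its fields are hypothetical and do not reflect AssemblyAI's actual response format, and the grouping shown is a simple illustration, not the service's diarization method.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Word:
    """Hypothetical word-level record: the text plus a speaker label."""
    text: str
    speaker: str


def to_utterances(words: list[Word]) -> list[tuple[str, str]]:
    """Merge consecutive words from the same speaker into (speaker, text) pairs."""
    return [
        (speaker, " ".join(word.text for word in run))
        for speaker, run in groupby(words, key=lambda word: word.speaker)
    ]


if __name__ == "__main__":
    words = [
        Word("Hi", "A"), Word("there", "A"),
        Word("Hello", "B"),
        Word("Bye", "A"),
    ]
    for speaker, text in to_utterances(words):
        print(f"Speaker {speaker}: {text}")
    # Speaker A: Hi there
    # Speaker B: Hello
    # Speaker A: Bye
```

Because `groupby` only merges adjacent items, a speaker who talks again later starts a new utterance, which matches the turn-by-turn reading of a diarized transcript.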