GigaSpeech

GigaSpeech is a large English speech recognition dataset with over 10,000 hours of transcribed audio. It is an evolving, multi-domain corpus for training and evaluating ASR models.

At a glance

Pricing: Open Source
Free tier: Yes
API: not specified
Skill level: Technical

About

What is GigaSpeech?

GigaSpeech is a comprehensive, open-source dataset designed to advance speech recognition research and development. It contains over 10,000 hours of high-quality, human-transcribed audio, plus an additional 33,000+ hours suitable for unsupervised or semi-supervised learning. The audio spans diverse acoustic conditions and domains, including audiobooks, podcasts, and YouTube, with speakers of various ages and accents. Pre-processed versions are available on HuggingFace, and detailed metadata is kept in a version-controlled JSON file from which users can extract the information they need for tasks such as speech recognition. GigaSpeech also provides data preparation scripts for popular toolkits such as Kaldi, ESPnet, and Icefall, which simplifies integrating the dataset into existing research pipelines.
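As an illustration of extracting information from the metadata file, the sketch below selects the segments that belong to one training subset. It assumes the JSON layout described in the GigaSpeech repository README as we understand it: a top-level `audios` list, per-audio `aid`, `path` and `segments` fields, and per-segment `sid`, `begin_time`, `end_time`, `text_tn` and a `subsets` list holding tags such as `{XS}`. The `Segment` type, the `iter_segments` helper and the sample data are hypothetical; check the field names against the release you download.

```python
import json
from typing import Iterator, NamedTuple


class Segment(NamedTuple):
    sid: str      # segment (utterance) id
    aid: str      # id of the audio file the segment comes from
    path: str     # relative path of that audio file
    begin: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # normalized transcript ("text_tn")


def iter_segments(meta: dict, subset: str = "{XS}") -> Iterator[Segment]:
    """Yield every segment whose "subsets" list contains `subset`.

    Field names are assumed from the GigaSpeech README; verify them
    against the metadata version you are using.
    """
    for audio in meta["audios"]:
        for seg in audio["segments"]:
            if subset in seg.get("subsets", []):
                yield Segment(
                    sid=seg["sid"],
                    aid=audio["aid"],
                    path=audio["path"],
                    begin=float(seg["begin_time"]),
                    end=float(seg["end_time"]),
                    text=seg["text_tn"],
                )


# Hypothetical two-segment excerpt for illustration only. The real file
# would be read with: with open("GigaSpeech.json") as f: meta = json.load(f)
SAMPLE = json.loads("""
{"audios": [{"aid": "POD0000000001",
  "path": "audio/podcast/P0000/POD0000000001.opus",
  "segments": [
    {"sid": "POD0000000001_S0000000", "begin_time": 0.0, "end_time": 3.5,
     "text_tn": "HELLO AND WELCOME <COMMA> EVERYONE <PERIOD>",
     "subsets": ["{XL}", "{XS}"]},
    {"sid": "POD0000000001_S0000001", "begin_time": 3.5, "end_time": 6.0,
     "text_tn": "TODAY WE TALK ABOUT SPEECH <PERIOD>",
     "subsets": ["{XL}"]}]}]}
""")

if __name__ == "__main__":
    for seg in iter_segments(SAMPLE, "{XS}"):
        print(seg.sid, seg.end - seg.begin, seg.text)
```

Note that `json.load` reads the entire metadata file into memory, which matters for a file describing thousands of hours of audio.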

Best used for

Ideal for developers and data scientists who need to train and evaluate Automatic Speech Recognition (ASR) models, benchmark speech algorithms, and develop new speech technologies. Especially valuable for researchers requiring a large, diverse, and evolving dataset with high-quality human transcriptions.
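Evaluating an ASR model on a corpus like GigaSpeech usually means reporting word error rate (WER). Below is a minimal, generic sketch of corpus-level WER computed from word-level Levenshtein distance. It is not GigaSpeech's official scoring procedure: references and hypotheses should be normalized the same way before scoring (for example, by removing punctuation tags), following whatever conventions the repository recommends. The function names are illustrative.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference word r
                curr[j - 1] + 1,         # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total word errors divided by total reference words over (reference, hypothesis) pairs."""
    errors = words = 0
    for reference, hypothesis in pairs:
        ref, hyp = reference.split(), hypothesis.split()
        errors += edit_distance(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words


if __name__ == "__main__":
    pairs = [("the cat sat", "the bat sat"), ("a b c d", "a c d")]
    print(corpus_wer(pairs))
```

Corpus-level WER divides the total error count by the total number of reference words, so long utterances weigh more than they would if per-utterance WERs were simply averaged.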

Common actions

train speech models

evaluate speech algorithms

access audio datasets

prepare speech data

Capabilities

Key features

10,000+ hours transcribed audio
Multi-domain audio sources
Pre-processed data on HuggingFace
Detailed JSON metadata
Toolkit preparation scripts
Evolving dataset

Target Audience

developer, data scientist

Integrations

Not yet documented

Pricing & Plans

Open Source

Free

FAQs

How can I access the GigaSpeech dataset?

You can access the GigaSpeech dataset by first filling out a Google Form. After that, either follow the instructions in the reply email to obtain the raw release, or use GigaSpeech on HuggingFace for a pre-processed version.

What kind of audio sources are included in GigaSpeech?

GigaSpeech includes audio from diverse sources such as audiobooks, podcasts, and YouTube. These sources cover a variety of acoustic conditions, including clean, noisy, indoor, outdoor, near-field, and far-field recordings, with speakers of various ages and accents.
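To see how much each domain contributes to a given subset, one could total the transcribed segment durations per source. This is a sketch under assumptions: it supposes each audio entry in the metadata carries a `source` field (for example `audiobook`, `podcast` or `youtube`) alongside segments with `begin_time`, `end_time` and `subsets`, as we understand the repository README to describe. The `hours_by_source` helper and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def hours_by_source(meta: dict, subset: str = "{XL}") -> Dict[str, float]:
    """Total segment duration in hours per audio source for one subset tag.

    Assumes each entry of meta["audios"] has a "source" field and segments
    with "begin_time" and "end_time" in seconds plus a "subsets" list.
    """
    seconds: Dict[str, float] = defaultdict(float)
    for audio in meta["audios"]:
        for seg in audio["segments"]:
            if subset in seg.get("subsets", []):
                seconds[audio["source"]] += seg["end_time"] - seg["begin_time"]
    return {source: total / 3600.0 for source, total in seconds.items()}


# Hypothetical metadata excerpt for illustration only.
SAMPLE = {
    "audios": [
        {"source": "youtube", "segments": [
            {"begin_time": 0.0, "end_time": 1800.0, "subsets": ["{XL}"]},
            {"begin_time": 1800.0, "end_time": 3600.0, "subsets": ["{XL}"]},
        ]},
        {"source": "podcast", "segments": [
            {"begin_time": 10.0, "end_time": 910.0, "subsets": ["{XL}"]},
            {"begin_time": 910.0, "end_time": 920.0, "subsets": ["{DEV}"]},
        ]},
    ]
}

if __name__ == "__main__":
    print(hours_by_source(SAMPLE))  # {'youtube': 1.0, 'podcast': 0.25}
```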

Does GigaSpeech support different speech recognition toolkits?

Yes. GigaSpeech maintains data preparation scripts for several speech recognition toolkits in its repository, including Kaldi, ESPnet, and Icefall, which makes it easier to integrate the dataset into different research pipelines.
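The repository's toolkit scripts are the supported path. For orientation only, here is a minimal sketch of the kind of conversion a Kaldi preparation step performs: writing the standard Kaldi data-directory files (`wav.scp`, `segments`, `text`, `utt2spk`) from parsed segments. Several details are assumptions and are labeled in the code: the punctuation and garbage tag names (taken from the GigaSpeech README as we understand it), an `ffmpeg` pipe for decoding compressed audio such as Opus that Kaldi cannot read directly, dropping any utterance that contains a garbage tag, and treating each utterance as its own speaker, a common Kaldi fallback when speaker labels are not used. `normalize` and `write_kaldi_dir` are hypothetical helpers, not the repository's code.

```python
import os
import tempfile
from typing import Iterable, NamedTuple, Optional

# Tag names as listed in the GigaSpeech README (assumption: verify per release).
PUNCTUATION_TAGS = {"<COMMA>", "<PERIOD>", "<QUESTIONMARK>", "<EXCLAMATIONPOINT>"}
GARBAGE_TAGS = {"<SIL>", "<MUSIC>", "<NOISE>", "<OTHER>"}


class Segment(NamedTuple):
    sid: str      # utterance id
    aid: str      # recording id
    path: str     # audio file path
    begin: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # "text_tn" transcript from the metadata


def normalize(text: str) -> Optional[str]:
    """Remove punctuation tags; return None if the utterance should be skipped.

    Skipping every utterance that contains a garbage tag is a simplifying
    choice made for this sketch.
    """
    words = text.split()
    if any(w in GARBAGE_TAGS for w in words):
        return None
    kept = [w for w in words if w not in PUNCTUATION_TAGS]
    return " ".join(kept) or None


def write_kaldi_dir(segments: Iterable[Segment], out_dir: str, sample_rate: int = 16000) -> int:
    """Write wav.scp, segments, text and utt2spk; return the number of utterances kept."""
    os.makedirs(out_dir, exist_ok=True)
    recordings = {}
    rows = []
    for seg in segments:
        text = normalize(seg.text)
        if text is None:
            continue
        recordings[seg.aid] = seg.path
        rows.append((seg.sid, seg.aid, seg.begin, seg.end, text))
    rows.sort()  # Kaldi expects each file sorted by its first field

    def out_path(name: str) -> str:
        return os.path.join(out_dir, name)

    with open(out_path("wav.scp"), "w") as f:
        for aid in sorted(recordings):
            # Assumed decoding pipe: Kaldi reads the command's stdout as WAV.
            f.write(f"{aid} ffmpeg -nostdin -loglevel error -i {recordings[aid]} "
                    f"-ac 1 -ar {sample_rate} -f wav - |\n")
    with open(out_path("segments"), "w") as f:
        for sid, aid, begin, end, _ in rows:
            f.write(f"{sid} {aid} {begin} {end}\n")
    with open(out_path("text"), "w") as f:
        for sid, _, _, _, text in rows:
            f.write(f"{sid} {text}\n")
    with open(out_path("utt2spk"), "w") as f:
        for sid, *_ in rows:
            # One pseudo-speaker per utterance (speaker labels not used).
            f.write(f"{sid} {sid}\n")
    return len(rows)


if __name__ == "__main__":
    demo = [  # hypothetical segments
        Segment("POD0000000001_S0000000", "POD0000000001", "audio/POD0000000001.opus",
                0.0, 3.5, "HELLO <COMMA> WORLD <PERIOD>"),
        Segment("POD0000000001_S0000001", "POD0000000001", "audio/POD0000000001.opus",
                3.5, 5.0, "<MUSIC>"),
    ]
    out = tempfile.mkdtemp()
    print(write_kaldi_dir(demo, out), "utterance(s) written to", out)
```

In a Kaldi recipe, `utils/utt2spk_to_spk2utt.pl` can then derive `spk2utt`, and `utils/validate_data_dir.sh` can check the resulting directory.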
