MagicData

Total Size: 282M

Dataset Overview

Dataset Type: ASR Corpus
Language: English
Speech Style: N/A
Content: N/A
Audio Parameters: 16 kHz, 16 bits
File Format: WAV (PCM)
Recording Equipment: mobile
Recording Environment: mobile

Open Source
ASR Corpus
5 hours

Multi-stream Spontaneous Conversation Training Datasets_English

The multi-stream conversation dataset developed by MagicData records each speaker on a separate audio track and labels each speaker separately, which preserves interruptions, interactions, and other natural dynamics of conversation. Isolating each speaker's audio yields clearer and more accurate training data, so models can better understand and respond to natural conversational exchanges. To make the data more widely accessible, we have released a 5-hour sample as part of our open-source initiative: "Multi-stream Spontaneous Conversation Training Datasets_English".
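
After downloading the sample, it can be useful to confirm that each track matches the audio parameters listed on this page (16 kHz, 16 bits, WAV PCM). The following is a minimal sketch using only Python's standard `wave` module. The function names are illustrative and not part of any MagicData tooling, and the archive's directory layout is not described here, so the code simply scans a folder for `.wav` files without assuming how speaker tracks are named or paired.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return basic header fields of a PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "sample_rate_hz": rate,
            "bits_per_sample": wf.getsampwidth() * 8,
            "channels": wf.getnchannels(),
            "duration_s": frames / rate if rate else 0.0,
        }


def matches_listed_format(info, sample_rate_hz=16000, bits_per_sample=16):
    """Check header info against the listed parameters (16 kHz, 16 bits)."""
    return (
        info["sample_rate_hz"] == sample_rate_hz
        and info["bits_per_sample"] == bits_per_sample
    )


def scan_corpus(root):
    """Map every .wav file under root (hypothetical local path) to its header info."""
    return {p: describe_wav(p) for p in sorted(Path(root).rglob("*.wav"))}
```

For example, `scan_corpus("path/to/extracted/sample")` (a placeholder path) returns one entry per track; summing the `duration_s` values gives the total audio length, and any file for which `matches_listed_format` is false can be flagged for inspection.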

For more commercial datasets, please contact business@magicdatatech.com.
