MagicData

Dataset Overview

Dataset Type

ASR speech corpus

Language

zh-CN

Speech Style

conversational

Content

spontaneous conversation

Audio Parameters

16 kHz, 16 bits, mono
8 kHz, 16 bits, mono

File Format

WAV (PCM)
TXT (UTF-8)

Recording Equipment

mobile & telephony

Recording Environment

indoor environment

License

Proprietary

ASR-BigCCSC: A Chinese Conversational Speech Corpus

MDT-ASR-E037 | MDT-ASR-E043 | MDT-ASR-E056 | MDT-ASR-F002
5,873 hours of transcribed Mandarin Chinese conversational speech

This dataset collection consists of 5,873 hours of transcribed Mandarin Chinese spontaneous conversational speech contributed by 10,436 speakers.

Contact business@magicdatatech.com to learn more.
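The audio parameters and file formats listed in the overview (PCM WAV at 16 kHz or 8 kHz, 16 bits, mono; UTF-8 TXT transcripts) can be checked on local copies of the files. The following is a minimal sketch using only Python's standard-library `wave` and `pathlib` modules. The helpers `check_wav` and `read_transcript` are hypothetical and are not part of any MagicData tooling. The sketch makes no assumption about how audio and transcript files are named or paired in the delivered corpus.

```python
import wave
from pathlib import Path

# Parameters as listed in the dataset overview: PCM WAV, 16 kHz or 8 kHz, 16 bits, mono.
EXPECTED_RATES = {16000, 8000}   # sample rates in Hz
EXPECTED_SAMPLE_WIDTH = 2        # bytes per sample, i.e. 16 bits
EXPECTED_CHANNELS = 1            # mono


def check_wav(path):
    """Return a list of mismatches between a WAV header and the listed parameters.

    An empty list means the header matches.
    """
    try:
        # The standard-library wave module only reads PCM WAV files;
        # other encodings (or non-WAV data) raise wave.Error, and
        # truncated files can raise EOFError.
        with wave.open(str(path), "rb") as wf:
            rate = wf.getframerate()
            width = wf.getsampwidth()
            channels = wf.getnchannels()
    except (wave.Error, EOFError) as exc:
        return [f"not a readable PCM WAV file: {exc}"]

    problems = []
    if rate not in EXPECTED_RATES:
        problems.append(f"sample rate {rate} Hz, expected one of {sorted(EXPECTED_RATES)}")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"sample width {width * 8} bits, expected 16")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    return problems


def read_transcript(path):
    """Read a transcript as UTF-8; raises UnicodeDecodeError if it is not valid UTF-8."""
    return Path(path).read_text(encoding="utf-8")
```

For example, `check_wav("some_file.wav")` (a placeholder file name) returns an empty list when the header matches. Otherwise, it returns one message per mismatched parameter. Checking the header is cheap and does not decode the audio samples.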

