MagicData

Total Size: 59 GB

Dataset Overview

Dataset Type: ASR speech corpus
Language: zh-CN, Mandarin Chinese (China)
Speech Style: scripted monologue
Content: daily use sentences, commands and queries, SMS
Audio Parameters: 16 kHz, 16 bits, mono
File Format: WAV (PCM), TXT (UTF-8)
Recording Equipment: mobile (mostly)
Recording Environment: indoor environment
License: Magic Data open-source license

Open Source
ASR Corpus
755 hours

ASR-BigSCDuComSmsSC: A Scripted Chinese Daily-use, Commands & SMS Speech Corpus

755 hours of transcribed Mandarin Chinese scripted speech

This open-source dataset consists of 755 hours of transcribed Mandarin Chinese scripted speech contributed by 1,080 speakers.

Sample:

"提醒他明天早上差五分九点聚会" ("Remind him about the get-together at five to nine tomorrow morning")

This dataset is released on OpenSLR. Visit http://openslr.org/68/ to download.
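
After downloading, it can be worth confirming that a recording matches the audio parameters listed for this corpus (16 kHz, 16 bits, mono, PCM WAV). The following is a minimal sketch using only Python's standard `wave` module. The helper names `read_wav_params` and `matches_spec` and the file name in the usage note are illustrative assumptions, not part of the dataset or of any Magic Data tooling.

```python
import wave

# Expected values taken from the "Audio Parameters" entry: 16 kHz, 16 bits, mono.
EXPECTED = {"sample_rate_hz": 16000, "sample_width_bytes": 2, "channels": 1}


def read_wav_params(path):
    """Read basic header parameters and the duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),  # 2 bytes = 16 bits
            "channels": wav.getnchannels(),
            "duration_s": frames / rate if rate else 0.0,
        }


def matches_spec(params, expected=EXPECTED):
    """Return True when every expected parameter equals the value read from the file."""
    return all(params[key] == value for key, value in expected.items())
```

For example, `matches_spec(read_wav_params("example.wav"))` would return `True` for a file recorded at 16 kHz, 16 bits, mono; `example.wav` is a placeholder, since the archive's directory layout is not described here.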

