Total Size: 59 GB

Dataset Overview

Dataset Type

ASR speech corpus

Language

zh-CN, Mandarin Chinese (China)

Speech Style

scripted monologue

Content

daily use sentences,
commands and queries,
SMS

Audio Parameters

16 kHz, 16 bits, mono

File Format

WAV (PCM)
TXT (UTF-8)

Recording Equipment

mobile (mostly)

Recording Environment

indoor environment

Mandarin Chinese Scripted Speech Corpus - Daily Use Sentence / Command and Query / SMS

755 hours of transcribed Mandarin Chinese scripted speech

This open-source dataset consists of 755 hours of transcribed Mandarin Chinese scripted speech contributed by 1,080 speakers.
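As a rough sanity check on these figures, the sketch below estimates the uncompressed PCM payload implied by 755 hours at the listed audio parameters (16 kHz, 16 bits, mono) and the average recording time per speaker. The function name `raw_pcm_bytes` is hypothetical, and WAV headers and transcript files are ignored. The estimate comes out larger than the 59 GB total size quoted on this page; one plausible explanation, which is an assumption and not stated in the listing, is that the download is distributed as compressed archives.

```python
# Figures taken from this page.
HOURS = 755
SPEAKERS = 1080
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2   # 16 bits
CHANNELS = 1           # mono


def raw_pcm_bytes(hours, rate=SAMPLE_RATE_HZ, width=BYTES_PER_SAMPLE,
                  channels=CHANNELS):
    """Estimate raw PCM size in bytes (hypothetical helper, headers ignored)."""
    return int(hours * 3600 * rate * width * channels)


total_bytes = raw_pcm_bytes(HOURS)
minutes_per_speaker = HOURS * 60 / SPEAKERS

print(f"Raw PCM estimate: {total_bytes / 1e9:.1f} GB")            # 87.0 GB
print(f"Average per speaker: {minutes_per_speaker:.1f} minutes")  # 41.9 minutes
```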

Sample:

"提醒他明天早上差五分九点聚会" ("Remind him about the get-together at five to nine tomorrow morning.")

This dataset is released on OpenSLR. Visit http://openslr.org/68/ to download.
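After downloading, it can be useful to confirm that an audio file matches the parameters listed in the overview (16 kHz, 16 bits, mono, PCM WAV). The following is a minimal sketch using Python's standard `wave` module. The function name `check_wav` and the example file path are hypothetical; the listing does not describe the archive's directory layout.

```python
import wave

# Expected values taken from the "Audio Parameters" entry of the overview.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits
EXPECTED_CHANNELS = 1            # mono


def check_wav(path):
    """Return (matches_spec, duration_seconds) for a PCM WAV file.

    Hypothetical helper: it only inspects the WAV header fields,
    not the audio content itself.
    """
    with wave.open(path, "rb") as wav:
        matches_spec = (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
        )
        duration_seconds = wav.getnframes() / wav.getframerate()
    return matches_spec, duration_seconds
```

A call such as `check_wav("some_utterance.wav")` (placeholder path) returns a boolean and the clip length in seconds, so summing the durations over all files gives a figure to compare against the stated 755 hours.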

The dataset is provided on an "As Is" basis, and no warranty, either express or implied, is given. Your use of the dataset is at your sole risk. You expressly understand and agree that MagicHub and/or Beijing Magic Data Technology Co., Ltd. shall not be liable for any direct, indirect, incidental, special, or consequential damages, including but not limited to damages for loss of profits, goodwill, use, data, or other intangible losses related to the datasets.

Copyright © 2021 Beijing Magic Data Technology Co., Ltd. All rights reserved.
