MagicData

Dataset Overview

Dataset Type

ASR speech corpus

Language

zh-CN

Speech Style

Scripted

Content

Command and Query, Keyword Spotting, SMS

Audio Parameters

16 kHz, 16 bits, mono
44.1 kHz, 16 bits, mono
44.1 kHz, 16 bits, dual
48 kHz, 16 bits, mono

File Format

WAV (PCM)
TXT (UTF-8)

Recording Equipment

microphone & mobile

Recording Environment

indoor, in-vehicle, far/near-field
Proprietary
ASR Corpus
5866 hours

ASR-BigSCKwsptComSmsSC: A Scripted Chinese Keyword-spotting, Command & SMS Speech Corpus

MDT-ASR-A003 | MDT-ASR-A010 | MDT-ASR-A011 | MDT-ASR-B001 | MDT-ASR-B002 | MDT-ASR-B016 | MDT-ASR-C001 | MDT-ASR-C009 | MDT-ASR-D024 | MDT-ASR-F055 | MDT-ASR-F063
5,866 hours of transcribed Mandarin Chinese scripted speech on command and query, keyword spotting, and SMS

This dataset collection consists of 5,866 hours of transcribed Mandarin Chinese scripted speech, contributed by 18,954 speakers and focused on command and query, keyword spotting, and SMS content.
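As a rough illustration of how the listed audio parameters might be checked on delivered files, the sketch below reads a WAV header with Python's standard `wave` module and compares it against the four combinations in the overview. The helper names (`wav_format`, `matches_listed_format`) are hypothetical, and treating "dual" as two channels is an assumption; the listing does not describe the corpus directory layout or any official tooling.

```python
import wave

# Parameter combinations from the dataset overview, as
# (sample rate in Hz, sample width in bytes, channel count).
# Assumption: "16 bits" means 2-byte PCM samples and "dual" means 2 channels.
LISTED_FORMATS = {
    (16000, 2, 1),
    (44100, 2, 1),
    (44100, 2, 2),
    (48000, 2, 1),
}


def wav_format(path):
    """Return (sample rate, sample width in bytes, channels) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate(), wav.getsampwidth(), wav.getnchannels())


def matches_listed_format(path):
    """True if the file's header matches one of the listed combinations."""
    return wav_format(path) in LISTED_FORMATS
```

Calling `matches_listed_format("example.wav")` on a file (the file name here is a placeholder) returns `True` only when its header matches one of the listed combinations, which can serve as a quick sanity check before feeding audio into a training pipeline.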

Contact business@magicdatatech.com to learn more.
