MagicData

Dataset Overview

Dataset Type: ASR speech corpus
Language: zh-CN, Mandarin Chinese (China)
Speech Style: scripted monologue
Content: scripted speech of keyword spotting
Audio Parameters: 48 kHz, 16 bits, mono
File Format: WAV (PCM), TXT (UTF-8)
Recording Equipment: microphone
Recording Environment: indoor environment
License: Magic Data open-source license

Open Source
ASR Corpus
6 hours

ASR-SCKwsptSC: A Scripted Chinese Keyword-spotting Speech Corpus

This open-source dataset contains 6 hours of transcribed, scripted Mandarin Chinese speech for keyword spotting, recorded at fast, normal, and slow speaking rates. It comprises 11,030 utterances contributed by 37 speakers.

Sample:

小艺小艺
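
After downloading, it can be useful to confirm that each audio file matches the parameters stated in the overview (48 kHz, 16 bits, mono, PCM WAV). The following is a minimal sketch using only Python's standard `wave` module. The function names `check_wav` and `duration_seconds` are illustrative and not part of any Magic Data tooling, and the corpus's directory layout and file naming are not described here, so any path you pass in is your own assumption.

```python
import wave

# Parameters as stated in the dataset overview: 48 kHz, 16 bits (2 bytes), mono.
EXPECTED = {"sample_rate": 48000, "sample_width_bytes": 2, "channels": 1}


def check_wav(path):
    """Return a list of mismatches between a PCM WAV file and EXPECTED.

    An empty list means the file matches the stated audio parameters.
    """
    with wave.open(path, "rb") as wf:
        actual = {
            "sample_rate": wf.getframerate(),
            "sample_width_bytes": wf.getsampwidth(),
            "channels": wf.getnchannels(),
        }
    return [
        f"{key}: expected {want}, got {actual[key]}"
        for key, want in EXPECTED.items()
        if actual[key] != want
    ]


def duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()
```

Summing `duration_seconds` over all files gives a quick check against the stated total of about 6 hours. Spread over 11,030 utterances, that total implies an average of roughly 2 seconds per utterance, which is consistent with short keyword phrases such as the sample above.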
