MagicData

Total Size: 3.09 GB

Dataset Overview

Dataset Type

ASR speech corpus

Language

zh-CN, Mandarin Chinese (China)

Speech Style

scripted monologue

Content

commands and queries
in vehicle-related scenes

Audio Parameters

44.1 kHz, 16 bits, dual channel

File Format

WAV (PCM)
TXT (UTF-8)

Recording Equipment

microphone

Recording Environment

in-vehicle environment

License

Magic Data
open-source license

Open Source
ASR Corpus
6.13 hours

ASR-SCCabSC: A Scripted Chinese Cabin Speech Corpus

6.13 hours of transcribed Mandarin Chinese scripted speech
on commands and queries in vehicle-related scenes

This open-source dataset consists of 6.13 hours of transcribed, scripted Mandarin Chinese speech focusing on commands and queries in vehicle-related scenes. It contains 5,948 utterances contributed by ten speakers.

A noteworthy feature is that two microphones were used during recording: one at the sun visor and another near the speaker's mouth, at a front passenger seat. As a result, each utterance was captured as two synchronized recordings.
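
As a rough illustration of working with such recordings, the Python sketch below checks a WAV file against the listed audio parameters (44.1 kHz, 16 bits) and separates its two channels. It assumes that both microphone signals are stored as the two channels of a single WAV file, which this page does not state explicitly. The function name `read_dual_channel` is hypothetical, and which channel corresponds to which microphone is not documented here.

```python
import struct
import wave

EXPECTED_RATE = 44100      # Hz, from the dataset overview
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample (16 bits), from the overview
EXPECTED_CHANNELS = 2      # assumption: "dual" means two channels in one file


def read_dual_channel(path):
    """Return the two channels of a 16-bit PCM WAV file as lists of ints.

    Hypothetical helper: raises ValueError if the file does not match
    the parameters listed in the dataset overview.
    """
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE:
            raise ValueError("unexpected sample rate: %d" % wav.getframerate())
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            raise ValueError("unexpected sample width: %d" % wav.getsampwidth())
        if wav.getnchannels() != EXPECTED_CHANNELS:
            raise ValueError("unexpected channel count: %d" % wav.getnchannels())
        raw = wav.readframes(wav.getnframes())

    # 16-bit PCM WAV samples are little-endian signed integers,
    # interleaved frame by frame: ch0, ch1, ch0, ch1, ...
    count = len(raw) // EXPECTED_SAMPLE_WIDTH
    samples = struct.unpack("<%dh" % count, raw)
    return list(samples[0::2]), list(samples[1::2])
```

Calling `read_dual_channel("some_utterance.wav")` (a placeholder file name) would return two equally long sample lists, one per channel, which can then be compared or processed separately.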

Sample:

"去珠江发展中心的最快路线" ("The fastest route to Zhujiang Development Center")
