MagicData

Total Size: 1.13 GB

Dataset Overview

Dataset Type

ASR speech corpus

Language

yue-Guangdong, Yue Chinese (Guangdong, China)

Speech Style

scripted monologue

Content

digits, commands and queries

Audio Parameters

16 kHz, 16 bits, dual

File Format

WAV (PCM)
TXT (UTF-8)

Recording Equipment

microphone

Recording Environment

in the vehicle

License

Magic Data
open-source license
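
The audio parameters listed above can be checked against a downloaded file. The following is a minimal sketch using Python's standard `wave` module. It assumes that "dual" means two channels; the function name `check_wav` and any file path passed to it are hypothetical, since the listing does not describe the archive layout.

```python
import wave

# Values taken from the listing; "dual" is ASSUMED to mean two channels.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits
EXPECTED_CHANNELS = 2


def check_wav(path):
    """Return (duration in seconds, list of mismatches) for one PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        frames = wav.getnframes()

    problems = []
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz")
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width is {width * 8} bits")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"file has {channels} channel(s)")

    duration = frames / rate if rate else 0.0
    return duration, problems
```

An empty mismatch list means the file header agrees with the listed parameters. The returned duration can be summed over all files to compare against the stated 5 hours.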

Open Source
ASR Corpus
5 hours

ASR-SCCantCabSC: A Scripted Chinese Cantonese (Canton) Cabin Speech Corpus

5 hours of transcribed, scripted Guangzhou Cantonese speech recorded in a vehicle

This open-source dataset contains 5 hours of transcribed, scripted Guangzhou Cantonese speech recorded in a vehicle. It covers digits, commands and queries, and comprises 6,219 utterances contributed by ten speakers.
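
As a rough sanity check, the figures in this listing can be related to each other with simple arithmetic. The sketch below assumes that "dual" means two channels and that "5 hours" is a rounded figure, so only approximate agreement with the listed 1.13 GB total size should be expected (the download also includes transcripts and file headers).

```python
# Figures taken from the listing.
HOURS = 5
UTTERANCES = 6_219
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16 bits
CHANNELS = 2          # ASSUMPTION: "dual" means two channels

total_seconds = HOURS * 3600

# Average utterance length implied by the listing.
avg_utterance_seconds = total_seconds / UTTERANCES

# Raw PCM payload for the stated duration, ignoring WAV headers and transcripts.
pcm_bytes = total_seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS

print(f"average utterance: {avg_utterance_seconds:.2f} s")
print(f"raw PCM size: {pcm_bytes / 1e9:.3f} GB")
```

Under these assumptions, the average utterance is just under 3 seconds, which is plausible for short digit strings, commands and queries.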

Sample:

" 世纪大道塞唔塞车啊 "

