MagicData

Total Size: 355 MB

Dataset Overview

Dataset Type

ASR speech corpus

Language

yue-Guangdong, Yue Chinese (Guangdong, China)

Speech Style

scripted speech

Content

daily use sentences

Audio Parameters

16 kHz, 16 bits, mono

File Format

WAV (PCM)
TXT (UTF-8)

Recording Equipment

mobile

Recording Environment

indoor environment

License

Magic Data open-source license

Open Source
ASR Corpus

ASR-SCCantDuSC: A Scripted Chinese Cantonese (Canton) Daily-use Speech Corpus

4.06 hours of transcribed, scripted Guangzhou Cantonese speech on daily-use sentences

This open-source dataset contains 4.06 hours of transcribed, scripted Guangzhou Cantonese speech focused on daily-use sentences. It comprises 4,060 utterances contributed by ten speakers.

Sample:

"我请你食饭两个人几好早啲瞓。"
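
The figures above can be sanity-checked after download. The sketch below is illustrative only and is not part of the dataset's tooling: it uses Python's standard `wave` module to confirm that a file matches the stated audio parameters (16 kHz, 16 bits, mono), and it derives the average utterance length implied by the corpus totals. The function names `describe_wav` and `mean_utterance_seconds` are hypothetical, and the archive's directory layout is not documented here, so any file path passed in is an assumption.

```python
import wave

# Parameters stated in the dataset overview.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits
EXPECTED_CHANNELS = 1  # mono


def describe_wav(path):
    """Return (matches_spec, duration_seconds) for one PCM WAV file.

    Hypothetical helper: compares the WAV header against the
    documented 16 kHz / 16-bit / mono parameters.
    """
    with wave.open(path, "rb") as wav:
        matches = (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
        )
        duration = wav.getnframes() / wav.getframerate()
    return matches, duration


def mean_utterance_seconds(total_hours, utterance_count):
    """Average utterance length implied by corpus-level totals."""
    return total_hours * 3600 / utterance_count
```

With the stated totals, `mean_utterance_seconds(4.06, 4060)` works out to about 3.6 seconds per utterance, which is consistent with short daily-use sentences.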
