MagicData
Total Size: 158.12 MB

Dataset Overview

Dataset Type: ASR speech corpus
Language: zh-CN, Wu-accented Mandarin (Wu areas, China)
Speech Style: spontaneous conversation
Content: themed conversations
Audio Parameters: 16 kHz, 16 bits, mono
File Format: WAV (PCM), TXT (UTF-8)
Recording Equipment: mobile
Recording Environment: indoor environment
License: Magic Data open-source license

ASR-CWAcstCSC: A Chinese Wu-Accent Conversational Speech Corpus

3 hours of transcribed Wu-accented Mandarin conversational speech

This open-source dataset consists of 3 hours of transcribed Wu-accented Mandarin conversational speech on selected topics. It contains eight conversations involving four speakers in total.

Note: To ensure that the conversations are in accented Mandarin rather than in a dialect, one participant in each conversation speaks relatively standard Mandarin Chinese.
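
The audio parameters listed in the overview (16 kHz, 16 bits, mono, PCM WAV) can be checked on a downloaded file before it goes into a training pipeline. The following is a minimal sketch using Python's standard `wave` module; the function name `inspect_wav` and the constant names are illustrative assumptions, not part of the dataset or any Magic Data tooling.

```python
import wave

# Expected values taken from the dataset overview: 16 kHz, 16 bits, mono.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
EXPECTED_CHANNELS = 1            # mono


def inspect_wav(path):
    """Return (matches_spec, duration_seconds) for a PCM WAV file.

    Hypothetical helper: it only reads the WAV header fields and
    does not look at the transcripts or the audio content.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        frames = wav.getnframes()

    matches = (
        rate == EXPECTED_RATE_HZ
        and width == EXPECTED_SAMPLE_WIDTH_BYTES
        and channels == EXPECTED_CHANNELS
    )
    # Duration is the frame count divided by the sampling rate.
    duration = frames / rate if rate else 0.0
    return matches, duration
```

Calling `inspect_wav` on each WAV file in the unpacked archive (the directory layout is not described here, so the paths are up to the reader) and summing the returned durations gives a quick check against the stated 3 hours.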
