MagicData

Total Size: 3.2GB

Dataset Overview

Dataset Type: N/A
Language: Chinese Dialect
Speech Style: Scripted
Content: N/A
Audio Parameters: 16 kHz, 16 bits
File Format: WAV
Recording Equipment: microphone
Recording Environment: Diverse Recording Environments

Open Source
ASR Corpus
33 hours

Chuan-Yu 12-City Sub-dialect Speech Dataset

The Chuan-Yu 12-City Sub-dialect Speech Dataset is an open-source Chinese dialect speech dataset focusing on city-level sub-dialect varieties in the Sichuan-Chongqing region. “Chuan-Yu” refers to Sichuan and Chongqing, a region where local dialects are widely used in daily communication and carry distinct pronunciation, intonation, and regional expression patterns.

The dataset is designed to help speech AI systems better understand fine-grained dialect differences within the Chuan-Yu region. Instead of treating Sichuanese or Chongqing dialects as broad categories, this dataset provides city-level coverage across 12 representative cities, making it suitable for research on sub-dialect variation, accent classification, dialect speech recognition, and localized speech technology.

Dataset Overview

Dialect Area      Representative City   Duration (h)   Utterances
Cheng-Yu Area     Chengdu               5.18           1,993
Cheng-Yu Area     Chongqing             4.99           2,034
Minjiang Area     Leshan                3.52           1,308
Minjiang Area     Yibin                 3.05           1,190
Minjiang Area     Luzhou                3.26           1,330
Renfu Sub-area    Zigong                2.27           885
Renfu Sub-area    Neijiang              2.68           889
Yagan Sub-area    Ya’an                 1.69           727
Yagan Sub-area    Xichang               3.28           1,222
Others            Nanchong              1.19           476
Others            Dazhou                1.3            478
Others            Guang’an              1.38           536
Total: 33 hours / 13,068 utterances / 38 native speakers
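
As a quick sanity check on these figures, the short Python sketch below re-adds the per-city rows and derives a mean utterance length. The numbers are copied from the table; the Dazhou row is read as 1.3 hours and 478 utterances, since that reading makes the utterance counts add up to the stated 13,068. The names CITY_STATS and mean_utterance_seconds are illustrative only and are not part of the dataset.

```python
# Per-city figures copied from the table: (duration in hours, utterance count).
CITY_STATS = {
    "Chengdu": (5.18, 1993),
    "Chongqing": (4.99, 2034),
    "Leshan": (3.52, 1308),
    "Yibin": (3.05, 1190),
    "Luzhou": (3.26, 1330),
    "Zigong": (2.27, 885),
    "Neijiang": (2.68, 889),
    "Ya'an": (1.69, 727),
    "Xichang": (3.28, 1222),
    "Nanchong": (1.19, 476),
    "Dazhou": (1.3, 478),
    "Guang'an": (1.38, 536),
}


def mean_utterance_seconds(hours, utterances):
    """Average utterance length in seconds for a given duration and count."""
    return hours * 3600 / utterances


total_hours = sum(hours for hours, _ in CITY_STATS.values())
total_utterances = sum(count for _, count in CITY_STATS.values())

if __name__ == "__main__":
    print(f"cities: {len(CITY_STATS)}")
    print(f"total hours: {total_hours:.2f}")
    print(f"total utterances: {total_utterances}")
    print(f"mean utterance: {mean_utterance_seconds(total_hours, total_utterances):.1f} s")
    for city, (hours, count) in CITY_STATS.items():
        print(f"{city}: {mean_utterance_seconds(hours, count):.1f} s per utterance")
```

The utterance counts sum exactly to 13,068. The per-city durations sum to roughly 33.8 hours, which the page reports as 33 hours.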

City-level Sub-dialect Coverage

The dataset covers 12 cities in the Sichuan-Chongqing region, including Chengdu, Chongqing, Leshan, Yibin, Luzhou, Zigong, Neijiang, Ya’an, Xichang, Nanchong, Dazhou, and Guang’an.

Each city is organized as an independent subset. This structure makes it easier to study the pronunciation, rhythm, tone, and accent differences between local varieties. For example, Chengdu, Chongqing, Zigong, and Mianyang-style speech may all be broadly associated with the Chuan-Yu dialect region, but their local pronunciation features and speaking styles can vary significantly.

Native Speaker Recording and Review

All speech data was recorded by local native dialect speakers. Speakers were selected from the corresponding cities and cover different age groups, genders, and occupational backgrounds, helping improve the diversity and representativeness of the dataset.

The annotation and quality review process was also conducted with the support of native speakers familiar with local accents. This “local speaker recording + local speaker review” process helps ensure the authenticity and accuracy of the speech data, transcription, and dialect-related features.

Annotation Information

Each utterance is annotated with the following information:

  • Standard Mandarin transcription
  • Speaker gender
  • Speaker age group
  • Recording city
  • Audio duration

Each utterance is approximately 5 to 45 seconds long, with an average duration of around 10 seconds. The utterances are naturally segmented with punctuation-based sentence boundaries, avoiding unnatural forced cuts.
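
One way to work with these annotations is to load them into a small record type and flag utterances that fall outside the stated 5 to 45 second range. The sketch below is an assumption-laden illustration: the page does not describe the metadata file layout, so the Utterance class, its field names, and the sample values are all hypothetical placeholders rather than the dataset's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """Hypothetical record mirroring the five annotation fields listed above."""
    transcript: str    # Standard Mandarin transcription
    gender: str        # speaker gender
    age_group: str     # speaker age group
    city: str          # recording city
    duration_s: float  # audio duration in seconds


def out_of_range(utterances, low=5.0, high=45.0):
    """Return the utterances whose duration lies outside [low, high] seconds."""
    return [u for u in utterances if not (low <= u.duration_s <= high)]


if __name__ == "__main__":
    # Placeholder records for illustration only; these are not real dataset entries.
    sample = [
        Utterance("<transcript>", "<gender>", "<age group>", "Chengdu", 9.3),
        Utterance("<transcript>", "<gender>", "<age group>", "Zigong", 3.0),
    ]
    for u in out_of_range(sample):
        print(f"{u.city}: {u.duration_s} s is outside the expected range")
```

Because the page describes the range as approximate, a check like this is better treated as a warning than as a hard filter.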

Speech Content

The recording content covers daily conversations, real-life communication scenarios, and local cultural topics. The dataset is designed to capture practical spoken language rather than isolated dictionary-style dialect words, making it more suitable for real-world speech AI research and application development.

Data Format

  • Audio format: WAV
  • Sampling rate: 16 kHz
  • Bit depth: 16-bit
  • Transcription: Standard Mandarin text
  • Metadata: speaker gender, age group, recording city, and other related information
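
The audio parameters above can be verified with Python's standard wave module. The following is a minimal sketch, not an official loader: the function names are illustrative, and the synthetic test file is mono only as an assumption, since the page does not state the channel count.

```python
import io
import wave

EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples


def check_wav(source):
    """Return (matches_spec, duration_seconds) for a WAV path or binary file object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        matches = (
            rate == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
        )
        duration = wav.getnframes() / rate
    return matches, duration


def make_silent_wav(seconds, rate=EXPECTED_RATE_HZ, width=EXPECTED_SAMPLE_WIDTH_BYTES):
    """Build an in-memory silent WAV (mono, by assumption) for trying out check_wav."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (int(seconds * rate) * width))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    matches, duration = check_wav(make_silent_wav(2.0))
    print(f"matches 16 kHz / 16-bit: {matches}, duration: {duration:.2f} s")
```

In practice, check_wav would be called with the path of each downloaded file instead of the synthetic buffer.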

Potential Applications

This dataset can be used for:

  • Dialect speech recognition model training and fine-tuning
  • Dialect-aware speech synthesis research
  • Dialect-to-Standard Mandarin speech or text conversion
  • Regional speech technology development
  • Dialect culture preservation and digital archiving

By providing city-level sub-dialect speech data from the Chuan-Yu region, this dataset supports the development of speech AI systems that can better understand real-world regional language variation and provide more localized speech interaction experiences.
