Chuan-Yu 12-City Sub-dialect Speech Dataset

The Chuan-Yu 12-City Sub-dialect Speech Dataset is an open-source Chinese dialect speech dataset focusing on city-level sub-dialect varieties in the Sichuan-Chongqing region. “Chuan-Yu” refers to Sichuan and Chongqing, a region where local dialects are widely used in daily communication and carry distinct pronunciation, intonation, and regional expression patterns.

The dataset is designed to help speech AI systems better understand fine-grained dialect differences within the Chuan-Yu region. Instead of treating Sichuanese or Chongqing dialects as broad categories, this dataset provides city-level coverage across 12 representative cities, making it suitable for research on sub-dialect variation, accent classification, dialect speech recognition, and localized speech technology.

Dataset Overview

Dialect Area	Representative City	Duration (h)	Utterances
Cheng-Yu Area	Chengdu	5.18	1,993
Cheng-Yu Area	Chongqing	4.99	2,034
Minjiang Area	Leshan	3.52	1,308
Minjiang Area	Yibin	3.05	1,190
Minjiang Area	Luzhou	3.26	1,330
Renfu Sub-area	Zigong	2.27	885
Renfu Sub-area	Neijiang	2.68	889
Yagan Sub-area	Ya’an	1.69	727
Yagan Sub-area	Xichang	3.28	1,222
Others	Nanchong	1.19	476
Others	Dazhou	1.30	478
Others	Guang’an	1.38	536

Total: about 33 hours (the per-city durations above sum to 33.79 h) / 13,068 utterances / 38 native speakers
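The totals can be cross-checked against the per-city rows. The sketch below is only an illustration: it copies the table values by hand into a list (the names `ROWS`, `summarize`, and `hours_by_area` are made up for this example and are not part of the dataset) and recomputes the overall duration, the utterance count, the implied mean utterance length, and the hours per dialect area.

```python
from collections import defaultdict

# Rows copied by hand from the overview table:
# (dialect area, representative city, duration in hours, utterances).
ROWS = [
    ("Cheng-Yu Area", "Chengdu", 5.18, 1993),
    ("Cheng-Yu Area", "Chongqing", 4.99, 2034),
    ("Minjiang Area", "Leshan", 3.52, 1308),
    ("Minjiang Area", "Yibin", 3.05, 1190),
    ("Minjiang Area", "Luzhou", 3.26, 1330),
    ("Renfu Sub-area", "Zigong", 2.27, 885),
    ("Renfu Sub-area", "Neijiang", 2.68, 889),
    ("Yagan Sub-area", "Ya'an", 1.69, 727),
    ("Yagan Sub-area", "Xichang", 3.28, 1222),
    ("Others", "Nanchong", 1.19, 476),
    ("Others", "Dazhou", 1.30, 478),
    ("Others", "Guang'an", 1.38, 536),
]


def summarize(rows):
    """Return (total hours, total utterances, mean utterance length in seconds)."""
    total_hours = sum(hours for _, _, hours, _ in rows)
    total_utts = sum(utts for _, _, _, utts in rows)
    mean_seconds = total_hours * 3600 / total_utts
    return round(total_hours, 2), total_utts, round(mean_seconds, 2)


def hours_by_area(rows):
    """Sum the recorded hours for each dialect area."""
    totals = defaultdict(float)
    for area, _, hours, _ in rows:
        totals[area] += hours
    return {area: round(hours, 2) for area, hours in totals.items()}


if __name__ == "__main__":
    print(summarize(ROWS))
    print(hours_by_area(ROWS))
```

With the table values as listed, the rows add up to 33.79 hours and 13,068 utterances, which implies a mean utterance length of a little over 9 seconds.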

City-level Sub-dialect Coverage

The dataset covers 12 cities in the Sichuan-Chongqing region: Chengdu, Chongqing, Leshan, Yibin, Luzhou, Zigong, Neijiang, Ya’an, Xichang, Nanchong, Dazhou, and Guang’an.

Each city is organized as an independent subset, which makes it easier to study differences in pronunciation, rhythm, tone, and accent between local varieties. For example, Chengdu-, Chongqing-, and Zigong-style speech may all be broadly associated with the Chuan-Yu dialect region, but their local pronunciation features and speaking styles can vary significantly.

Native Speaker Recording and Review

All speech data was recorded by local native dialect speakers. Speakers were selected from the corresponding cities and cover different age groups, genders, and occupational backgrounds, helping improve the diversity and representativeness of the dataset.

The annotation and quality review process was also conducted with the support of native speakers familiar with local accents. This “local speaker recording + local speaker review” process helps ensure the authenticity and accuracy of the speech data, transcription, and dialect-related features.

Annotation Information

Each speech segment is annotated with the following fields:

Standard Mandarin transcription
Speaker gender
Speaker age group
Recording city
Audio duration

Each utterance is approximately 5 to 45 seconds long, with an average duration of around 10 seconds. The utterances are naturally segmented with punctuation-based sentence boundaries, avoiding unnatural forced cuts.
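As an illustration of how these annotation fields might be handled in code, the sketch below models one utterance as a small record and checks it against the stated 5 to 45 second range. The class name, field names, and sample values are all hypothetical placeholders; the actual metadata file layout and label vocabulary are not described here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """One annotated speech segment (field names are assumptions)."""
    transcription: str   # Standard Mandarin transcription
    gender: str          # speaker gender
    age_group: str       # speaker age group
    city: str            # recording city
    duration_s: float    # audio duration in seconds


def in_expected_range(utt, low=5.0, high=45.0):
    """True if the duration falls inside the approximate 5-45 s range."""
    return low <= utt.duration_s <= high


def utterances_per_city(utts):
    """Count utterances per recording city."""
    return Counter(u.city for u in utts)


if __name__ == "__main__":
    # Placeholder records for demonstration only, not real dataset entries.
    samples = [
        Utterance("(placeholder text)", "female", "(age group)", "Chengdu", 9.8),
        Utterance("(placeholder text)", "male", "(age group)", "Zigong", 3.2),
        Utterance("(placeholder text)", "male", "(age group)", "Chengdu", 41.0),
    ]
    print(utterances_per_city(samples))
    print([in_expected_range(u) for u in samples])
```

Because the range is described as approximate, a check like `in_expected_range` is better suited to flagging records for inspection than to discarding them automatically.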

Speech Content

The recording content covers daily conversations, real-life communication scenarios, and local cultural topics. The dataset is designed to capture practical spoken language rather than isolated dictionary-style dialect words, making it more suitable for real-world speech AI research and application development.

Data Format

Audio format: WAV
Sampling rate: 16 kHz
Bit depth: 16-bit
Transcription: Standard Mandarin text
Metadata: speaker gender, age group, recording city, and other related information
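A downloaded file can be checked against the stated audio parameters with Python's standard `wave` module. This is a minimal sketch: the function names and the returned dictionary keys are chosen for this example, and the channel count is only reported, not checked, since it is not stated above.

```python
import wave

EXPECTED_RATE_HZ = 16_000   # stated sampling rate
EXPECTED_BIT_DEPTH = 16     # stated bit depth


def inspect_wav(path):
    """Read basic audio parameters from a PCM WAV file header."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        width_bytes = wf.getsampwidth()
        frames = wf.getnframes()
        channels = wf.getnchannels()
    return {
        "sample_rate_hz": rate,
        "bit_depth": width_bytes * 8,
        "channels": channels,  # reported only; not specified in the text
        "duration_s": frames / rate,
    }


def matches_stated_format(info):
    """True if the file is 16 kHz, 16-bit as described in the data format."""
    return (
        info["sample_rate_hz"] == EXPECTED_RATE_HZ
        and info["bit_depth"] == EXPECTED_BIT_DEPTH
    )
```

For example, `matches_stated_format(inspect_wav("example.wav"))` would return `True` for a conforming file, where `example.wav` stands in for any local audio file from the dataset.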

Potential Applications

This dataset can be used for:

Dialect speech recognition model training and fine-tuning
Dialect-aware speech synthesis research
Dialect-to-Standard Mandarin speech or text conversion
Regional speech technology development
Dialect culture preservation and digital archiving

By providing city-level sub-dialect speech data from the Chuan-Yu region, this dataset supports the development of speech AI systems that can better understand real-world regional language variation and provide more localized speech interaction experiences.

Total Size: 3.2GB

License: MAGIC DATA OPEN-SOURCE LICENSE