Dataset Introduction
MagicData-Dialect-TTS-Lite is an open-source Chinese dialect TTS dataset collection released by Magic Data. It includes five Chinese dialect varieties: Northeastern Chinese, Henan Dialect, Sichuanese, Wu Chinese, and Cantonese.
The full collection contains approximately 50 minutes of speech data, recorded by five native dialect speakers aged between 30 and 60. Each dialect subset contains around 10 minutes of audio and is released as an independent open-source dataset.
If you are interested in a specific dialect, please click the corresponding dataset link below for more details.
Overview
| Dialect Region | City | Code | Duration | Sentences | Speaker |
| --- | --- | --- | --- | --- | --- |
| Northeastern Chinese | Siping | NED | 10 minutes | 75 sentences | 1 female, 30 years old |
| Henan Dialect | Zhengzhou | HEN | 10 minutes | 74 sentences | 1 male, 34 years old |
| Sichuanese | Chengdu | SIC | 10 minutes | 77 sentences | 1 female, 40 years old |
| Wu Chinese | Suzhou | JSU | 10 minutes | 102 sentences | 1 female, 50 years old |
| Cantonese | Guangzhou | GUD | 10 minutes | 54 sentences | 1 female, 55 years old |
Total: approximately 50 minutes / 5 native dialect speakers
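As a quick sanity check on the table above, the following minimal sketch sums the sentence counts and estimates the average sentence length per subset. It assumes each subset is exactly 10 minutes long, whereas the description only says "around 10 minutes", so the per-sentence figures are rough estimates. The names `SUBSETS` and `average_seconds_per_sentence` are illustrative and not part of the dataset.

```python
# Figures copied from the overview table.
# Assumption: each subset is treated as exactly 10 minutes (nominal duration).
SUBSETS = {
    "NED": {"dialect": "Northeastern Chinese", "minutes": 10, "sentences": 75},
    "HEN": {"dialect": "Henan Dialect", "minutes": 10, "sentences": 74},
    "SIC": {"dialect": "Sichuanese", "minutes": 10, "sentences": 77},
    "JSU": {"dialect": "Wu Chinese", "minutes": 10, "sentences": 102},
    "GUD": {"dialect": "Cantonese", "minutes": 10, "sentences": 54},
}


def average_seconds_per_sentence(minutes: float, sentences: int) -> float:
    """Return the mean sentence length in seconds for one subset."""
    return minutes * 60 / sentences


total_sentences = sum(s["sentences"] for s in SUBSETS.values())
total_minutes = sum(s["minutes"] for s in SUBSETS.values())
averages = {
    code: average_seconds_per_sentence(s["minutes"], s["sentences"])
    for code, s in SUBSETS.items()
}

for code, seconds in averages.items():
    print(f"{code}: ~{seconds:.1f} s per sentence")
print(f"Total: {total_sentences} sentences, ~{total_minutes} minutes")
```

Under this assumption the Cantonese subset has the longest sentences on average and the Wu Chinese subset the shortest, which may matter when choosing prompts for zero-shot or few-shot evaluation.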
Dataset Links
- Northeastern Chinese: https://magichub.com/datasets/magicdata-dialect-northeastern-chinese-tts-lite
- Henan Dialect: https://magichub.com/datasets/magicdata-dialect-henan-dialect-tts-lite
- Sichuanese: https://magichub.com/datasets/magicdata-dialect-sichuanese-tts-lite
- Wu Chinese: https://magichub.com/datasets/magicdata-dialect-wu-chinese-tts-lite
- Cantonese: https://magichub.com/datasets/magicdata-dialect-cantonese-tts-lite
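For scripts that work with several subsets, it can be convenient to look up a dataset page by its dialect code. The sketch below pairs the codes from the overview table with the links listed above. The helper `dataset_url` is hypothetical; it only returns the page URL and does not download anything.

```python
# Dialect codes (from the overview table) mapped to the dataset pages listed above.
DATASET_URLS = {
    "NED": "https://magichub.com/datasets/magicdata-dialect-northeastern-chinese-tts-lite",
    "HEN": "https://magichub.com/datasets/magicdata-dialect-henan-dialect-tts-lite",
    "SIC": "https://magichub.com/datasets/magicdata-dialect-sichuanese-tts-lite",
    "JSU": "https://magichub.com/datasets/magicdata-dialect-wu-chinese-tts-lite",
    "GUD": "https://magichub.com/datasets/magicdata-dialect-cantonese-tts-lite",
}


def dataset_url(code: str) -> str:
    """Return the dataset page URL for a dialect code (case-insensitive)."""
    try:
        return DATASET_URLS[code.upper()]
    except KeyError:
        raise ValueError(f"Unknown dialect code: {code!r}") from None


print(dataset_url("sic"))
```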
Recommended Use
This dataset collection is suitable for:
- Multi-dialect TTS research
- Zero-shot / few-shot TTS baseline testing
- Dialect acoustic analysis
- Academic research and model evaluation
For detailed dataset features, annotation guidelines, and file structure, please refer to each individual dialect dataset page.
Open-source License
This dataset collection is released under the CC BY-NC-ND 4.0 license, which permits non-commercial use only. It is suitable for academic research, personal development, and model evaluation.
📧 For the full commercial version, please contact: business@magicdatatech.com
