MagicData

Magic Data Spontaneous Conversational Datasets: Helping You Take a Leading Position in AI

Posted 1 year ago

Off-the-shelf datasets are widely seen as the quickest and most efficient way to help AI developers build and improve their models. Together with our business partners, we dig deeper into each industry and craft comprehensive conversational datasets for a variety of scenarios. We are proud of this work. The quality and effectiveness of our data have been proven by our clients.

Here is an overview of our Chinese-language datasets. We have more than 90,000 hours of Mandarin Chinese in total. For accented Chinese, such as Cantonese, Sichuan accent, and Shanghai accent, we have up to 5,000 hours.

For other languages, we have 20,000 hours of English, 8,000 hours of Korean, 7,000 hours of Japanese, and more.

Datasets Download Rank

ASR-RAMC-BigCCSC: A Chinese Conversational Speech Corpus
Multi-Modal Driver Behaviors Dataset for DMS
ASR-SCKwsptSC: A Scripted Chinese Keyword Spotting Speech Corpus
ASR-SCCantDuSC: A Scripted Chinese Cantonese (Canton) Daily-use Speech Corpus
ASR-SCCantCabSC: A Scripted Chinese Cantonese (Canton) Cabin Speech Corpus
ASR-EgArbCSC: An Egyptian Arabic Conversational Speech Corpus
ASR-CCantCSC: A Chinese Cantonese (Canton) Conversational Speech Corpus
ASR-SpCSC: A Spanish Conversational Speech Corpus
ASR-CabNois: A Cabin Noise Dataset