Published on December 15, 2021
Open Source
ASR Corpus
6 hours
This open-source dataset consists of 6 hours of transcribed Mandarin Chinese scripted speech for keyword spotting, recorded at fast, normal, and slow speeds. It contains 11,030 utterances from 37 speakers.
Published on December 15, 2021
Open Source
NLP Corpus
100 sentences
This dataset consists of 100 daily-use sentences in Guangzhou Cantonese.
Published on December 15, 2021
Open Source
NLP Corpus
100 sentences
100 paragraphs
This dataset contains 100 news items.
Published on November 24, 2021
Proprietary
NLP Corpus
12600 sentences
MDT-NLP-F027 | 12,600 financial customer service related sentences in Mandarin Chinese
Published on November 24, 2021
Proprietary
NLP Corpus
330000 sentences
MDT-NLP-F026 | 330,000 Chinese sentences with prosody labels
Published on November 24, 2021
Proprietary
NLP Corpus
244630 sentences
MDT-NLP-F025 | 244,630 Chinese sentences covering 138 polyphonic characters
Published on November 24, 2021
Proprietary
NLP Corpus
100736 sentences
MDT-NLP-F024 | Chinese text normalization corpus of 100,736 sentences
Published on November 23, 2021
Proprietary
NLP Corpus
2480 paragraphs
MDT-NLP-F023 | 2,480 sets of Mandarin Chinese human–computer interaction text
Published on November 23, 2021
Proprietary
NLP Corpus
828114 sentences
MDT-NLP-F017 | 828,114 daily-use sentences in Guangzhou Cantonese
Published on November 23, 2021
Proprietary
NLP Corpus
2095686 sentences
MDT-NLP-F016 | 2,095,686 sentences of Mandarin Chinese chat text
Published on November 23, 2021
Proprietary
NLP Corpus
750194 sentences
MDT-NLP-F015 | 750,194 sentences of POI data on Chinese addresses
Published on November 23, 2021
Proprietary
NLP Corpus
613482 sentences
MDT-NLP-F014 | 613,482 Chinese sentences from phone-call and text-messaging scenarios
Published on
Proprietary
NLP Corpus
127035 sentences
MDT-NLP-F013 | 127,035 Chinese sentences from onboard navigation scenarios
Published on November 23, 2021
Proprietary
NLP Corpus
15264 sentences
MDT-NLP-F012 | 15,264 Mandarin Chinese commands and queries from smart home scenarios
Published on November 23, 2021
Proprietary
NLP Corpus
357486 sentences
MDT-NLP-F011 | 357,486 sentences related to music playback
Published on November 23, 2021
Proprietary
NLP Corpus
10488 sentences
MDT-NLP-F010 | 10,488 sentences of Mandarin Chinese human–computer interaction text
Published on November 12, 2021
Open Source
ASR Corpus
3.23 hours
3.23 hours of transcribed Mandarin Chinese scripted speech for keyword spotting, recorded at fast, normal, and slow speeds
Published on November 12, 2021
Open Source
NLP Corpus
600 Mandarin Chinese sentences from finance-related customer service scenarios
Published on November 4, 2021
Proprietary
TTS Corpus
40 hours
MDT-TTS-D003 | 21,343 annotated utterances of female speech in Mandarin Chinese, suitable for text-to-speech synthesis
Published on November 4, 2021
Proprietary
TTS Corpus
1 hour
MDT-TTS-D007 | 697 annotated utterances of female speech in Mandarin Chinese, suitable for text-to-speech synthesis