MagicData

Dataset Overview

Dataset Type

text corpus for NLP

Language

zh-CN

Speech Style

N/A

Content

Chatting

Audio Parameters

N/A

File Format

TXT (UTF-8)

Recording Equipment

N/A

Recording Environment

N/A
Proprietary
NLP Corpus
2095686 sentences

NLP-CCC: A Chinese Chitchat Corpus

MDT-NLP-F016 | 2,095,686 Mandarin Chinese chitchat sentences

This dataset consists of 2,095,686 sentences of conversational (chitchat) text in Mandarin Chinese.

Contact business@magicdatatech.com to learn more.

Sample:

咱今天唠点儿啥呀芬芬 休闲娱乐
哎你知道钻石叫啥不 休闲娱乐
也是世界文明的一个旅游景点 衣食住行
都江堰啊青城山这些就比较适合夏天去 衣食住行
文苑是你们学校吗? 人际关系
那天那天天还能一块吃饭呢? 人际关系
他就不畏艰险哦,不不是鉴真,是中国人好像鉴真是中国人 人文科学
就郑和下西洋对 人文科学
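
In the sample above, each line appears to hold one sentence followed by a topic label (for example 休闲娱乐 or 人文科学). The sketch below shows one way such lines could be parsed in Python. It assumes the label is the last whitespace-separated token on the line, which is inferred from the sample only and not from a published format specification; the function names and the file path argument are hypothetical.

```python
from collections import Counter


def parse_line(line):
    """Split one corpus line into (sentence, topic_label).

    Assumption: the topic label is the last whitespace-separated token.
    Returns None for lines that do not match that shape.
    """
    parts = line.strip().rsplit(None, 1)
    if len(parts) != 2:
        return None
    return parts[0], parts[1]


def load_corpus(path):
    """Read a UTF-8 text file and return the parsed (sentence, label) pairs.

    `path` is a hypothetical location of the delivered TXT file.
    """
    with open(path, encoding="utf-8") as handle:
        parsed = (parse_line(line) for line in handle)
        return [pair for pair in parsed if pair is not None]


# A few lines copied from the sample on this page.
sample = [
    "咱今天唠点儿啥呀芬芬 休闲娱乐",
    "哎你知道钻石叫啥不 休闲娱乐",
    "都江堰啊青城山这些就比较适合夏天去 衣食住行",
    "就郑和下西洋对 人文科学",
]

pairs = [pair for pair in map(parse_line, sample) if pair is not None]

# Count how many sentences fall under each topic label.
label_counts = Counter(label for _, label in pairs)
```

Splitting from the right keeps any spaces inside the sentence itself intact. If the delivered files use a different delimiter, such as a tab, only `parse_line` would need to change.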

