MagicData

Dataset Overview

Dataset Type

ASR speech corpus

Language

zh & en-CN

Speech Style

Scripted

Content

Daily-Use Sentence (Chinese-English Code-Mixing)

Audio Parameters

16 kHz, 16 bits, mono

File Format

WAV (PCM)
TXT (UTF-8)

Recording Equipment

mobile

Recording Environment

indoor environment
Proprietary
ASR Corpus
1,650 hours

ASR-SCECoMiDuSC: A Scripted Chinese-English Code-Mixing Daily-use Speech Corpus

MDT-ASR-D028 | 1,650 hours of transcribed Chinese-English Code-Mixing scripted speech on daily use sentences

This dataset consists of 1,650 hours of transcribed, scripted Chinese-English code-mixing speech. It covers daily-use sentences and was contributed by 2,134 speakers.
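
As a rough illustration of how the stated audio parameters (16 kHz, 16 bits, mono, WAV PCM) could be checked on delivered files, here is a minimal Python sketch using only the standard library. The function names `wav_matches_spec` and `estimated_pcm_bytes` are hypothetical and are not part of any MagicData tooling. The size estimate covers raw PCM only and ignores WAV headers and transcript files.

```python
import wave

# Expected values are taken from the "Audio Parameters" entry of this page.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
EXPECTED_CHANNELS = 1  # mono


def wav_matches_spec(path):
    """Return True if the WAV file at `path` is 16 kHz, 16-bit, mono.

    Hypothetical helper for illustration. The standard `wave` module
    raises wave.Error for encodings it cannot read, so a compressed
    file would fail at open time rather than return False.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
        )


def estimated_pcm_bytes(hours):
    """Estimate the raw PCM payload size in bytes for `hours` of audio."""
    bytes_per_second = (
        EXPECTED_RATE_HZ * EXPECTED_SAMPLE_WIDTH_BYTES * EXPECTED_CHANNELS
    )
    return int(hours * 3600 * bytes_per_second)
```

Under these assumptions, `estimated_pcm_bytes(1650)` gives 190,080,000,000 bytes, so the audio alone would occupy roughly 190 GB uncompressed. The actual delivery size may differ.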

Contact business@magicdatatech.com to learn more.

License
