MagicData
Total Size: 420 MB

Dataset Overview

Dataset Type: ASR speech corpus
Language: th-TH, Thai (Thailand)
Speech Style: scripted monologue
Content: daily use sentences
Audio Parameters: 16 kHz, 16 bits, mono
File Format: WAV (PCM), TXT (UTF-8)
Recording Equipment: mobile
Recording Environment: indoor environment
License: Magic Data open-source license
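
The audio parameters listed above (16 kHz, 16 bits, mono, PCM WAV) can be checked on a downloaded file before training. The following is a minimal sketch using Python's standard `wave` module. The function names `matches_corpus_format` and `make_silent_wav` are illustrative and not part of any Magic Data tooling, and the silent test clip is generated in memory only to exercise the check.

```python
import io
import wave

# Values taken from the "Audio Parameters" entry of the overview.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
EXPECTED_CHANNELS = 1            # mono


def matches_corpus_format(source) -> bool:
    """Return True if a PCM WAV file is 16 kHz, 16-bit, mono.

    `source` may be a path string or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
        )


def make_silent_wav(rate_hz: int, channels: int = 1, seconds: float = 0.1) -> io.BytesIO:
    """Build an in-memory 16-bit silent WAV (hypothetical test input only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate_hz)
        # One frame is 2 bytes per channel; write `seconds` worth of zeros.
        wav.writeframes(b"\x00\x00" * int(rate_hz * seconds) * channels)
    buffer.seek(0)
    return buffer


print(matches_corpus_format(make_silent_wav(16_000)))  # True
print(matches_corpus_format(make_silent_wav(44_100)))  # False
```

For real use, pass the path of a corpus WAV file instead of the generated buffer. A file that fails the check would need resampling or channel mixing before it can be combined with this corpus.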

Open Source
ASR Corpus
4.56 hours

ASR-STiDuSC: A Scripted Thai Daily-use Speech Corpus

4.56 hours of transcribed Thai scripted speech on daily use sentences

This open-source dataset contains 4.56 hours of transcribed, scripted Thai speech on daily-use sentences, comprising 5,431 utterances contributed by ten speakers.
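
The stated totals imply some rough per-utterance and per-speaker averages, which can help when sizing batches or planning a speaker-wise split. The sketch below only divides the published figures. It assumes an even spread across speakers, which the listing does not state, so the per-speaker numbers are estimates rather than documented facts.

```python
# Figures published in the dataset description.
TOTAL_HOURS = 4.56
UTTERANCES = 5_431
SPEAKERS = 10

# Average clip length in seconds.
avg_utterance_seconds = TOTAL_HOURS * 3600 / UTTERANCES

# Per-speaker averages (assumption: speakers contributed evenly).
avg_utterances_per_speaker = UTTERANCES / SPEAKERS
avg_minutes_per_speaker = TOTAL_HOURS * 60 / SPEAKERS

print(round(avg_utterance_seconds, 2))    # 3.02
print(avg_utterances_per_speaker)         # 543.1
print(round(avg_minutes_per_speaker, 1))  # 27.4
```

An average of about three seconds per utterance is consistent with short, scripted daily-use sentences.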

Sample:

"เที่ยง ของ วันนั้น ลูกเรือ ก็ เตรียม ทุ่นระเบิด"
