MagicData

Total Size: 4.2G

Dataset Overview

Dataset Type

ASR speech corpus

Language

English and Czech

Speech Style

spontaneous conversation

Content

themed conversations

Audio Parameters

File Format

WAV (PCM), TXT (UTF-8)

Recording Equipment

Recording Environment

/

License

Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)

Third Party
ASR Corpus

ASR-Vystadial: An English and Czech Telephone Conversational Corpus from the Vystadial Project

The data comprise over 41 hours of speech in English and over 15 hours in Czech, plus orthographic transcriptions.

About this resource:

The corpus consists of transcribed telephone conversations in English and Czech.
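
Because the corpus is distributed as PCM WAV audio with UTF-8 text transcriptions, one recording and its transcript could be read roughly as follows. This is a minimal sketch using only the Python standard library. The function name `load_utterance` and the one-to-one pairing of a `.wav` file with a `.txt` file are assumptions made for illustration, not the corpus's documented layout.

```python
import wave
from pathlib import Path


def load_utterance(wav_path, txt_path):
    """Return (duration_seconds, sample_rate, transcript) for one recording.

    Assumes a PCM WAV file and a UTF-8 text file holding its transcription.
    The pairing of the two files is hypothetical, not taken from the corpus.
    """
    # The wave module reads uncompressed PCM WAV headers and frames.
    with wave.open(str(wav_path), "rb") as wav:
        sample_rate = wav.getframerate()
        duration = wav.getnframes() / sample_rate

    # Transcriptions are stated to be UTF-8, which matters for Czech diacritics.
    transcript = Path(txt_path).read_text(encoding="utf-8").strip()
    return duration, sample_rate, transcript
```

Summing the returned durations over all files in a language subset would give a quick check against the stated hours of speech.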

The data collection process and development of these training scripts were partly funded by the Ministry of Education, Youth and Sports of the Czech Republic under the grant agreement LK11221 and core research funding from Charles University in Prague.

You can cite the data using the following BibTeX entry:

@inproceedings{korvas_2014,
  title={{Free English and Czech telephone speech corpus shared under the CC-BY-SA 3.0 license}},
  author={Korvas, Mat\v{e}j and Pl\'{a}tek, Ond\v{r}ej and Du\v{s}ek, Ond\v{r}ej and \v{Z}ilka, Luk\'{a}\v{s} and Jur\v{c}\'{i}\v{c}ek, Filip},
  booktitle={Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)},
  pages={To Appear},
  year={2014},
}

