Total Size: 4.2G

Dataset Overview

Dataset Type: ASR speech corpus
Language: English and Czech
Speech Style: spontaneous conversation
Content: themed conversations
Audio Parameters:
File Format: WAV (PCM), TXT (UTF-8)
Recording Equipment:
Recording Environment: /

English and Czech telephone conversation data from Vystadial

The data comprise over 41 hours of speech in English and over 15 hours in Czech, plus orthographic transcriptions.

This open-source dataset consists of telephone conversations in English and Czech, developed for training acoustic models for automatic speech recognition in spoken dialogue systems. It ships in three parts: Czech data, English data, and scripts.
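
Since the listing gives the formats as PCM WAV audio and UTF-8 text transcriptions, a first sanity check after download might be to total the audio duration and read a transcription. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the directory layout and the way audio files are paired with transcription files are not stated here, so the paths you pass in are assumptions to adjust against the actual archive.

```python
import wave
from pathlib import Path


def wav_duration_seconds(wav_path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(wav_path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def read_transcript(txt_path):
    """Return a UTF-8 transcription with surrounding whitespace removed."""
    return Path(txt_path).read_text(encoding="utf-8").strip()


def total_hours(root, pattern="*.wav"):
    """Sum the duration of every WAV file found under root, in hours.

    Assumption: audio files end in .wav; change `pattern` if the
    archive uses a different naming scheme.
    """
    seconds = sum(wav_duration_seconds(p) for p in Path(root).rglob(pattern))
    return seconds / 3600.0
```

Calling `total_hours` on the unpacked English and Czech directories should give figures comparable to the stated totals of over 41 and over 15 hours, provided the directory passed in contains only that language's recordings.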
