MagicData

Third-party
Pronunciation lexicon

LEX-MSUSwibTrans: Transcriptions and Lexicon for the Switchboard Dataset

About this resource:

This resource mirrors the transcriptions of Switchboard data generated at Mississippi State and the associated lexicon. These were released without any license restrictions.

The Switchboard (SWB) corpus is one of the most important historical benchmarks for large-vocabulary conversational speech recognition (LVCSR). It contains 2430 conversations averaging 6 minutes in length, which amounts to over 240 hours of recorded speech and about 3 million words of text, spoken by over 500 speakers of both sexes from every major dialect of American English.
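
As a quick sanity check on the corpus size quoted above, the arithmetic can be sketched as follows. The constant and function names are illustrative only, and the word count is the rounded "about 3 million" figure from the description, so the per-conversation average is approximate.

```python
# Figures quoted in the corpus description; names here are illustrative.
NUM_CONVERSATIONS = 2430
AVG_MINUTES_PER_CONVERSATION = 6
APPROX_TOTAL_WORDS = 3_000_000  # "about 3 million words" (rounded figure)


def total_hours(num_conversations: int, avg_minutes: float) -> float:
    """Total recorded speech in hours, given a count and an average length."""
    return num_conversations * avg_minutes / 60


hours = total_hours(NUM_CONVERSATIONS, AVG_MINUTES_PER_CONVERSATION)
words_per_conversation = APPROX_TOTAL_WORDS / NUM_CONVERSATIONS

print(f"{hours:.0f} hours")                             # 243 hours
print(f"{words_per_conversation:.0f} words per conversation")  # 1235 words per conversation
```

The result, 243 hours, agrees with the "over 240 hours" stated in the description.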

The initial SWB transcriptions had error rates above 10%, which led to poor recognition performance, particularly on hard-to-recognize words such as monosyllables. This release of the SWB transcriptions, developed by the Institute for Signal and Information Processing at Mississippi State University in the late 1990s, contains transcriptions that were manually corrected to bring error rates below 1%. The release also includes manually adjusted segmentations and word alignments.
