MagicData

Total Size: 913MB

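Overview

Dataset type

Speech recognition (ASR) audio dataset

Language

Iban

Speech type

Content

News

Audio parameters

File format

Recording device

Recording environment

License

Attribution-ShareAlike 2.0 Generic (CC BY-SA 2.0)

Third party
ASR dataset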

ASR-IbSC: An Iban Speech Corpora

About this resource:

This package contains Iban language text and speech suitable for Automatic Speech Recognition (ASR) experiments. In addition to the transcribed speech, it provides a 2M-token text corpus crawled from an online newspaper site. The news speech data was provided by a local radio station in Sarawak, Malaysia.

PUBLICATIONS ON IBAN DATA AND ASR

Details on the corpora and on our Iban ASR experiments can be found in the publications listed below. We would appreciate it if you cite them in any publication that uses this data.

@inproceedings{Juan14,
	Author = {Sarah Samson Juan and Laurent Besacier and Solange Rossato},
	Booktitle = {Proceedings of the Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU)},
	Month = {May},
	Title = {Semi-supervised G2P bootstrapping and its application to ASR for a very under-resourced language: Iban},
	Year = {2014}}

@inproceedings{Juan2015,
	Author = {Sarah Samson Juan and Laurent Besacier and Benjamin Lecouteux and Mohamed Dyab},
	Booktitle = {Proceedings of INTERSPEECH},
	Address = {Dresden, Germany},
	Month = {September},
	Title = {Using Resources from a Closely-Related Language to Develop ASR for a Very Under-Resourced Language: A Case Study for Iban},
	Year = {2015}}

ORIGINAL SOURCE OF THE CORPUS

This OpenSLR release was created from data originally provided by Sarah Juan, but the format was changed to better fit Kaldi conventions. Some of the original files were removed because they are now generated automatically by the Kaldi Iban recipe.
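Since the release was reformatted for Kaldi, its transcriptions are presumably stored in Kaldi's usual data-directory layout, where a `text` file holds one utterance per line: an utterance ID, whitespace, then the transcription. The sketch below assumes that layout (the exact file names shipped in this package are not listed on this page) and parses such lines into a dictionary. The function name `parse_kaldi_text` and the sample IDs are hypothetical.

```python
from typing import Dict, Iterable


def parse_kaldi_text(lines: Iterable[str]) -> Dict[str, str]:
    """Map utterance IDs to transcriptions from Kaldi-style `text` lines.

    Assumption: each non-empty line is "<utterance-id> <transcription>",
    with the ID separated from the words by the first run of whitespace.
    """
    transcripts: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(maxsplit=1)
        utt_id = parts[0]
        # A line containing only an ID yields an empty transcription.
        transcripts[utt_id] = parts[1] if len(parts) > 1 else ""
    return transcripts


if __name__ == "__main__":
    # Hypothetical sample lines; real IDs and words come from the corpus.
    sample = ["utt_0001 aaa bbb ccc", "utt_0002 ddd", ""]
    print(parse_kaldi_text(sample))
```

In practice one would pass an open file handle, e.g. `parse_kaldi_text(open(path, encoding="utf-8"))`, where `path` points at a hypothetical `data/train/text` file produced by the recipe.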

The original source of the corpus is

https://github.com/sarahjuan/iban

See the README there for more details, most of it still applies.

ACKNOWLEDGEMENT

The Iban data were collected and prepared by Sarah Samson Juan and Laurent Besacier at GETALP, Grenoble, France.

We would like to thank the Ministry of Higher Education Malaysia for providing financial support to conduct this study. We also thank The Borneo Post news agency for providing online materials for building the text corpus, and Radio Televisyen Malaysia (RTM), Sarawak, Malaysia, for providing the news data.
