About this resource:

This package contains Iban language text and speech suitable for Automatic Speech Recognition (ASR) experiments. In addition to transcribed speech, a 2M-token text corpus crawled from an online newspaper site is provided. News data was provided by a local radio station in Sarawak, Malaysia.

PUBLICATION ON IBAN DATA AND ASR

Details on the corpora and our experiments on Iban ASR can be found in the following publications. We would appreciate it if you cite them when publishing work that uses this data.

@inproceedings{Juan14,
	Author = {Sarah Samson Juan and Laurent Besacier and Solange Rossato},
	Booktitle = {Proceedings of Workshop for Spoken Language Technology for Under-resourced Languages (SLTU)},
	Month = {May},
	Title = {Semi-supervised G2P bootstrapping and its application to ASR for a very under-resourced language: Iban},
	Year = {2014}}


@inproceedings{Juan2015,
  	Title = {Using Resources from a closely-Related language to develop ASR for a very under-resourced Language: A case study for Iban},
  	Author = {Sarah Samson Juan and Laurent Besacier and Benjamin Lecouteux and Mohamed Dyab},
  	Booktitle = {Proceedings of INTERSPEECH},
  	Year = {2015},
  	Address = {Dresden, Germany},
  	Month = {September}}

Original source of the corpus

This OpenSLR release was created from data originally provided by Sarah Juan, but the format was changed to better fit Kaldi practices. Some of the files were removed because the Kaldi Iban recipe now generates them automatically.

The original source of the corpus is

https://github.com/sarahjuan/iban

See the README there for more details; most of it still applies.

ACKNOWLEDGEMENT

Iban data collected and prepared by Sarah Samson Juan and Laurent Besacier. Created at GETALP, Grenoble, France.

We would like to thank the Ministry of Higher Education Malaysia for providing financial support to conduct this study. We also thank The Borneo Post news agency for providing online materials for building the text corpus, and Radio Televisyen Malaysia (RTM), Sarawak, Malaysia, for providing the news data.

Total Size: 913MB

License

Attribution-ShareAlike 2.0 Generic (CC BY-SA 2.0)

ASR-IbSC: An Iban Speech Corpora
