About this resource:
This package contains Iban-language text and speech suitable for Automatic Speech Recognition (ASR) experiments. In addition to the transcribed speech, a 2M-token text corpus crawled from an online newspaper site is provided. The news data was provided by a local radio station in Sarawak, Malaysia.
PUBLICATION ON IBAN DATA AND ASR
Details on the corpora and our experiments on Iban ASR can be found in the publications listed below. We would appreciate it if you cite them in any work you publish using this data.
@inproceedings{Juan14,
  Author = {Sarah Samson Juan and Laurent Besacier and Solange Rossato},
  Booktitle = {Proceedings of Workshop for Spoken Language Technology for Under-resourced (SLTU)},
  Month = {May},
  Title = {Semi-supervised G2P bootstrapping and its application to ASR for a very under-resourced language: Iban},
  Year = {2014}}

@inproceedings{Juan2015,
  Title = {Using Resources from a closely-Related language to develop ASR for a very under-resourced Language: A case study for Iban},
  Author = {Sarah Samson Juan and Laurent Besacier and Benjamin Lecouteux and Mohamed Dyab},
  Booktitle = {Proceedings of INTERSPEECH},
  Year = {2015},
  Address = {Dresden, Germany},
  Month = {September}}
Original source of the corpus
This OpenSLR release was created from data originally provided by Sarah Juan, but the format was changed to better fit Kaldi conventions. Some of the files were removed, as they are now generated automatically by the Kaldi Iban recipe.
The original source of the corpus is
https://github.com/sarahjuan/iban
See the README there for more details; most of it still applies.
ACKNOWLEDGEMENT
Iban data collected by Sarah Samson Juan and Laurent Besacier. Prepared by Sarah Samson Juan and Laurent Besacier. Created in GETALP, Grenoble, France.
We would like to thank the Ministry of Higher Education Malaysia for providing financial support to conduct this study. We also thank The Borneo Post news agency for providing online materials for building the text corpus, and Radio Televisyen Malaysia (RTM), Sarawak, Malaysia, for providing the news data.