This dataset targets Japanese conversational speech in real-world settings. Built from conversations, it captures the interactive and complex nature of everyday communication and is intended to improve model performance in authentic conversational environments. All recordings were made on mobile devices, which closely mirrors actual usage scenarios and underlines the dataset's practical relevance. In total, the dataset contains 10 hours of diverse, realistic conversational speech.
Sample: a two-speaker conversation, with each speaker on a separate track.
The dataset is not licensed for commercial use. This open-source dataset may be used for academic research, provided it is properly cited with its source.
Citation format: Japanese Duplex Conversation Training Dataset. 2025. https://magichub.com/datasets/japanese-duplex-conversation-training-dataset/. Beijing Magic Data Technology Co., Ltd.
For more commercial datasets, please contact business@magicdatatech.com.