On July 6, 2022, the ISCSLP 2022 Conversational Short-phrase Speaker Diarization Challenge (CSSD), jointly sponsored by the Institute of Acoustics, CAS, Northwestern Polytechnical University, the Singapore A*STAR Institute of Information and Communication, Shanghai Jiaotong University, and Magic Data (Beijing Aishu Smart Technology Co., Ltd.), officially opened for registration. Groups and individuals from academia and industry are welcome to register for the competition.
Dialogue scenarios are among the most essential and challenging scenarios for speech processing technology. In daily conversations, people respond to each other casually and continue the conversation with coherent questions and comments rather than bluntly answering each other's questions. Accurately detecting each person's speech activity in a conversation is critical for many downstream tasks such as natural language processing and machine translation. The diarization error rate (DER) has long been used as the standard evaluation metric for speaker diarization systems. However, it does not pay enough attention to short conversational phrases, which are brief but play an essential role at the semantic level. The speech community also lacks a metric that effectively assesses how accurately short phrases in conversations are attributed to speakers.
To address this problem, we open-sourced the MagicData-RAMC Chinese conversational speech dataset, which contains 180 hours of manually annotated conversational speech data. For the CSSD evaluation, we also prepared 20 hours of dialogue data for testing purposes and manually annotated the speaker timestamps. For the CSSD challenge, we further designed a new evaluation metric that measures the accuracy of sentence-level speaker diarization. By advancing research on segmentation and clustering techniques for dialogue data, we aim to further promote reproducible research in this field.
2022-07-04, Open Registration.
2022-07-22 12:00 am, Registration Deadline.
2022-07-24 12:00 am, Open Training Set and Evaluation Metrics.
2022-09-13 12:00 am, Open Evaluation Set.
2022-09-15 12:00 am, Final Submission Deadline.
2022-09-16 12:00 am, Announcement of Results and Rankings.
2022-09-24, Paper Submission Deadline.
Registration website: www.magichub.com/join-competition
Number of participants: No more than 5 participants per team (5 included)
More details: www.magichub.com
Participants submit inference results, and the competition committee will calculate the scores. The file format and evaluation metric will be announced when the training stage of the competition opens.
All participants should adhere to the following rules:
Each result submission must be accompanied by the source code and the corresponding model so that the test result can be reproduced if its ranking is questioned.
Questions related to the challenge can be emailed to iscslp.cssd@gmail.com or open@magicdatatech.com with the subject line "Question about the Conversational Short-phrase Speaker Diarization Challenge".
The competition offers first, second, and third prizes, awarded to three winning teams/individuals. The first-prize team/individual will receive an OPPO Find series phone (worth about 5,000 RMB), and the winners will also have the opportunity to present their work at the ISCSLP 2022 conference.
File: <customized-name>.rttm
Encoding: UTF-8 recommended.
Single-line content structure (each '_' denotes one SPACE):
speaker_'key'_'No.'_'seg-start'_'seg-end'_<NA>_<NA>_'id'_<NA>_<NA>
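For illustration, the following minimal Python sketch writes hypothesis segments in the single-line structure above; the recording key, channel number, segment times, and speaker labels are hypothetical placeholders, not values taken from the challenge data.

# Minimal sketch: write diarization hypotheses in the structure given above,
# i.e. "speaker 'key' 'No.' 'seg-start' 'seg-end' <NA> <NA> 'id' <NA> <NA>".
# All values below (recording key, channel, times, speaker ids) are hypothetical.
segments = [
    # (recording_key, channel_no, seg_start_sec, seg_end_sec, speaker_id)
    ("DIALOG_0001", 1, 0.00, 2.35, "SPK_A"),
    ("DIALOG_0001", 1, 2.35, 4.10, "SPK_B"),
]

with open("my-submission.rttm", "w", encoding="utf-8") as f:
    for key, chan, start, end, spk in segments:
        # Fields are separated by single spaces, as specified above.
        f.write(f"speaker {key} {chan} {start:.2f} {end:.2f} <NA> <NA> {spk} <NA> <NA>\n")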
Notice: You can submit your hypothesis results up to 10 times before the submission deadline and receive real-time scoring feedback.
If you submit results frequently within a short period of time, remember to refresh the page to see the latest results, as we use a caching mechanism to improve the site's responsiveness.
The MagicData-RAMC corpus contains 180 hours of conversational speech data recorded from native speakers of Mandarin Chinese over mobile phones with a sampling rate of 16 kHz. The dialogs in MagicData-RAMC are classified into 15 diversified domains and tagged with topic labels, ranging from science and technology to ordinary life. Accurate transcription and precise speaker voice activity timestamps are manually labeled for each sample. Speakers' detailed information is also provided. As a Mandarin speech dataset designed for dialog scenarios with high quality and rich annotations, MagicData-RAMC enriches the data diversity in the Mandarin speech community and allows extensive research on a series of speech-related tasks, including automatic speech recognition, speaker diarization, topic detection, keyword search, text-to-speech, etc. Please refer to MagicData RAMC
The organizers have released the following training data for the track "Conversational Short-phrase Speaker Diarization (SD) Accuracy":
1. MagicData-RAMC: 351 multi-turn Mandarin conversations totaling 180 hours. The annotations for each conversation include the transcript, voice activity timestamps, speaker information, recording information, and topic labels. Speaker information covers gender, age, and region; recording information covers the environment and device. Participants should check their registration email to download the dataset.
2. Evaluation (test) set: to be released on September 8.
All participants should adhere to the following rules:
1. DATA: Only MagicData RAMC (openslr 123), VoxCeleb Data (openslr 49), and the CN-Celeb Corpus (openslr 82) are allowed. For data augmentation, two noise datasets may be used: MUSAN (openslr 17) and RIRNoise (openslr 28).
2. Using the test set in any form is strictly prohibited, including but not limited to fine-tuning or training models on the test data.
3. Fusion of multiple systems is allowed; however, fusing systems with the same structure is discouraged.
4. All models must be trained on the allowed datasets. In particular, pre-trained models built on other datasets (including unlabeled data) are not allowed.
5. The organizers reserve the right of final interpretation.
We use VBHMM x-vectors (aka VBx), trained on VoxCeleb Data (openslr-49) and the CN-Celeb Corpus (openslr-82), as the baseline system. X-vector embeddings are extracted with a ResNet, and agglomerative hierarchical clustering followed by variational Bayes HMM resegmentation is applied to obtain the final result. Please refer to MagicData RAMC.
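Purely as an illustration of the clustering stage (this is not the official VBx recipe, and the VB-HMM resegmentation step is omitted), the sketch below applies agglomerative hierarchical clustering with cosine distance to a set of x-vector embeddings; the random embeddings and the distance threshold are hypothetical.

import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical x-vector embeddings, one 256-dimensional vector per speech segment.
# In the actual baseline these would come from the ResNet extractor.
rng = np.random.default_rng(0)
xvectors = rng.standard_normal((20, 256))

# Length-normalize so that cosine distance behaves sensibly.
xvectors /= np.linalg.norm(xvectors, axis=1, keepdims=True)

# Agglomerative hierarchical clustering with a distance threshold instead of a
# fixed speaker count; the threshold of 0.7 is an illustrative guess.
# (Older scikit-learn versions name the "metric" parameter "affinity".)
ahc = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=0.7,
    metric="cosine",
    linkage="average",
)
labels = ahc.fit_predict(xvectors)
print("Speaker label per segment:", labels)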
We adopt the Conversational-DER (CDER) to evaluate speaker diarization systems. In real conversations, segments of short duration can carry vital information, and a purely duration-based evaluation hardly reflects recognition performance on short segments. Our basic idea is that, for each speaker, all types of mistakes should be reflected equally in the final metric, regardless of the length of the spoken sentence. We therefore evaluate the performance of speaker diarization systems at the sentence (utterance) level in conversational scenarios. Please refer to CDER Metric.
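The authoritative scoring code is the CDER Metric repository referenced above; the sketch below only illustrates the utterance-level idea and is not the official CDER implementation. It assumes hypothesis speaker labels have already been mapped onto the reference labels, and it counts a reference utterance as an error when less than half of it is attributed to the expected speaker, so that short utterances weigh as much as long ones.

# Illustrative sketch only; use the official CDER_Metric scorer for evaluation.
# Reference utterances and hypothesis segments: (start_sec, end_sec, speaker).
# Assumes hypothesis speaker labels are already mapped onto reference labels.

def overlap(a_start, a_end, b_start, b_end):
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def utterance_error_rate(reference, hypothesis):
    errors = 0
    for ref_start, ref_end, ref_spk in reference:
        # Time within this utterance credited to the expected speaker.
        correct = sum(
            overlap(ref_start, ref_end, h_start, h_end)
            for h_start, h_end, h_spk in hypothesis
            if h_spk == ref_spk
        )
        # Count the utterance as an error if less than half of it is
        # attributed to the expected speaker, regardless of its length.
        if correct < 0.5 * (ref_end - ref_start):
            errors += 1
    return errors / len(reference)

reference = [(0.0, 1.2, "A"), (1.2, 1.6, "B"), (1.6, 4.0, "A")]  # hypothetical
hypothesis = [(0.0, 1.5, "A"), (1.5, 4.0, "A")]                  # hypothetical
print(utterance_error_rate(reference, hypothesis))  # the short "B" utterance counts fully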
If you have any questions, please contact us: you can open an issue on GitHub or email us.
To help participants develop and train their models quickly and with high quality, the organizers provide the VBx baseline system described above, which uses a ResNet for speaker embedding extraction and AHC with VB-HMM for clustering the embeddings. For a detailed tutorial, see:
https://github.com/MagicHub-io/MagicData-RAMC
Traditional DER evaluates the overall performance of a diarization system on the time scale, whereas CDER evaluates it at the sentence level, as described above. For details, see:
https://github.com/MagicHub-io/CDER_Metric
If you have any questions about the baseline system, please visit the links above for help; a team of experts will provide answers.