MagicData

ISCSLP 2022 Conversational Short-phrase Speaker Diarization Challenge (CSSD)


Leaderboard

Submission Instructions

File: <customized-name>.rttm

Encoding: UTF-8 recommended.

Structure of each line (a '_' stands for one SPACE):

speaker_'key'_'No.'_'seg-start'_'seg-end'_<NA>_<NA>_'id'_<NA>_<NA>
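As a rough illustration of the line structure above, the following Python sketch builds one line and writes a file in UTF-8. The function and parameter names, the two-decimal time formatting, and the sample values are assumptions made for this example; only the field order and the `<NA>` placeholders come from the structure shown above. Check the exact casing of the first field and the meaning of 'No.' against the scoring tool before submitting.

```python
from typing import Iterable, Tuple


def format_rttm_line(key: str, number: int, seg_start: float, seg_end: float,
                     speaker_id: str, record_type: str = "speaker") -> str:
    """Join the ten fields with single spaces, in the order given above.

    The field names mirror the page ('key', 'No.', 'seg-start', 'seg-end', 'id').
    Two decimal places for the times is an assumption of this sketch.
    """
    fields = [
        record_type,
        key,
        str(number),
        f"{seg_start:.2f}",
        f"{seg_end:.2f}",
        "<NA>",
        "<NA>",
        speaker_id,
        "<NA>",
        "<NA>",
    ]
    return " ".join(fields)


def write_rttm(path: str, rows: Iterable[Tuple[str, int, float, float, str]]) -> None:
    """Write one line per segment, using the recommended UTF-8 encoding."""
    with open(path, "w", encoding="utf-8") as handle:
        for key, number, seg_start, seg_end, speaker_id in rows:
            handle.write(format_rttm_line(key, number, seg_start, seg_end, speaker_id) + "\n")


# Hypothetical values, for illustration only.
example_line = format_rttm_line("rec_001", 1, 0.11, 4.29, "spk_A")
print(example_line)
```

Keeping the line builder separate from the file writer makes it easy to check a single line by eye before generating the whole file.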

Notice

You can submit your hypothesis results up to 10 times before submission closes, and you will receive real-time scoring feedback.

If you submit several results within a short period, refresh the page to see the latest scores, because the site uses caching to improve the access experience.

Submission Closed

| RANK | Team | Organization | Team Leader | CDER |
| - | Mazen's Team | Dataspecialists | Mazen | No score yet |
| - | Rp solution tips | Rp solution tips | Rajesh | No score yet |
| - | onepeace | bupt | shiqianglang | No score yet |
| - | Jop for all | Jop for all | Amr | No score yet |
| - | Stupird | 合肥闻欣尔悦 | Charlie | No score yet |
| - | Freelancers Team | Group | Hadeel Dawoud | No score yet |
| - | fp_team | sichuan university | fanpeng | No score yet |
| - | PhoneScriber | composio.dev | Sawradip Saha | No score yet |
| - | havufun | none | none | No score yet |
| - | A.MA | A.M.A | Akram Al-Rabasi | No score yet |

Datasets

Dataset

The MagicData-RAMC corpus contains 180 hours of conversational speech recorded from native speakers of Mandarin Chinese over mobile phones at a sampling rate of 16 kHz. The dialogs in MagicData-RAMC are classified into 15 diverse domains and tagged with topic labels, ranging from science and technology to everyday life. Accurate transcriptions and precise speaker voice activity timestamps are manually labeled for each sample, and detailed speaker information is also provided. As a high-quality, richly annotated Mandarin speech dataset designed for dialog scenarios, MagicData-RAMC adds to the data diversity available to the Mandarin speech community and supports research on a range of speech-related tasks, including automatic speech recognition, speaker diarization, topic detection, keyword search, and text-to-speech. For details, please refer to MagicData RAMC.

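Development and Training Sets

For the track "Conversational Short-phrase Speaker Diarization (SD) Accuracy", the organizers have released the following datasets:
1. MagicData-RAMC contains 351 multi-turn Mandarin conversations totaling 180 hours. The annotations for each conversation include the transcript, voice activity timestamps, speaker information, recording information, and topic information. Speaker information covers gender, age, and region; recording information covers the environment and the device. Participants should check their email for dataset download instructions.

2. The evaluation set (Test) will be released on September 8.

All participants must follow these rules:

1. DATA: Only MagicData RAMC (openslr 123), VoxCeleb Data (openslr 49), and CN-Celeb Corpus (openslr 82) may be used. Two noise datasets, MUSAN (openslr 17) and RIRNoise (openslr 28), may be used for data augmentation.

2. Using the test set in any form is strictly prohibited, including but not limited to fine-tuning or training models on the test data.

3. Multi-system fusion is allowed. However, fusing systems that share the same structure is discouraged.

4. All models must be trained on the allowed datasets. In particular, pretrained models must not use any other datasets (including unlabeled data).

5. The organizers reserve the right of final interpretation.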

Evaluation

Baseline

We use VBHMM x-vectors (also known as VBx), trained on VoxCeleb Data (openslr-49) and CN-Celeb Corpus (openslr-82), as the baseline system. X-vector embeddings are extracted with a ResNet; agglomerative hierarchical clustering followed by variational Bayes HMM resegmentation then produces the final result. For details, please refer to MagicData RAMC.
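To make the clustering stage of such a pipeline more concrete, here is a minimal, dependency-free sketch of agglomerative hierarchical clustering over segment embeddings. It is not the baseline's code: cosine similarity, average linkage, the fixed stopping threshold, and the toy two-dimensional vectors are all assumptions of this example, and the variational Bayes HMM resegmentation step is omitted entirely.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ahc(embeddings: List[Sequence[float]], threshold: float) -> List[int]:
    """Average-linkage AHC.

    Repeatedly merge the two clusters with the highest average pairwise
    cosine similarity, and stop once that similarity falls below `threshold`.
    Returns one cluster label per embedding.
    """
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        total = sum(cosine(embeddings[i], embeddings[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, (a, b) = max(
            (linkage(clusters[a], clusters[b]), (a, b))
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for index in members:
            labels[index] = label
    return labels


# Toy embeddings standing in for x-vectors (hypothetical values).
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_labels = ahc(toy, threshold=0.5)
print(toy_labels)
```

In a full system the labels from this stage would only serve as an initialization that a resegmentation step refines; the sketch stops at the initialization.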

Scoring tool

We adopt Conversational-DER (CDER) to evaluate speaker diarization systems. In real conversations, a short segment can carry vital information, yet an evaluation based on time duration poorly reflects recognition performance on such short segments. Our basic idea is that, for each speaker, all types of mistakes should be reflected equally in the final metric regardless of the length of the spoken sentence. We therefore evaluate speaker diarization at the sentence (utterance) level in conversational scenarios. For details, please refer to CDER Metric.
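The difference between duration-based and utterance-level scoring can be seen in a small sketch. This is not the official CDER implementation; it is a simplified stand-in that assumes hypothesis speaker labels have already been mapped to reference labels, that hypothesis segments do not overlap each other, and that a reference utterance counts as correct when the hypothesis speaker covering most of its duration matches. All names and values are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class Segment(NamedTuple):
    start: float
    end: float
    speaker: str


def overlap(a: Segment, b: Segment) -> float:
    """Length in seconds of the time overlap between two segments."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def dominant_speaker(ref_seg: Segment, hyp: List[Segment]) -> Optional[str]:
    """Hypothesis speaker covering the largest share of a reference utterance."""
    totals: Dict[str, float] = {}
    for h in hyp:
        shared = overlap(ref_seg, h)
        if shared > 0:
            totals[h.speaker] = totals.get(h.speaker, 0.0) + shared
    return max(totals, key=totals.get) if totals else None


def utterance_error_rate(ref: List[Segment], hyp: List[Segment]) -> float:
    """Fraction of reference utterances attributed to the wrong speaker."""
    wrong = sum(1 for r in ref if dominant_speaker(r, hyp) != r.speaker)
    return wrong / len(ref)


def duration_error_rate(ref: List[Segment], hyp: List[Segment]) -> float:
    """Fraction of reference speech time attributed to the wrong speaker."""
    total = sum(r.end - r.start for r in ref)
    correct = sum(overlap(r, h) for r in ref for h in hyp if h.speaker == r.speaker)
    return (total - correct) / total


# A half-second reply from speaker B is missed by the hypothesis.
reference = [Segment(0.0, 10.0, "A"), Segment(10.0, 10.5, "B"), Segment(10.5, 20.0, "A")]
hypothesis = [Segment(0.0, 20.0, "A")]

utt_rate = utterance_error_rate(reference, hypothesis)  # 1 of 3 utterances wrong
dur_rate = duration_error_rate(reference, hypothesis)   # 0.5 s of 20 s wrong
print(utt_rate, dur_rate)
```

In this toy case the missed short reply costs about 33% at the utterance level but only 2.5% by duration, which is the effect the paragraph above describes.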

Contact

If you have any questions, please contact us. You can open an issue on GitHub or email us.

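Baseline System Overview

To help participants develop and train models quickly and to a high standard, the organizers provide a baseline system for participants to use. We use the VBx system as the baseline: it extracts speaker embeddings with a ResNet and clusters the embedding vectors with AHC and VB-HMM.

For a detailed tutorial, see: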

https://github.com/MagicHub-io/MagicData-RAMC

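Scoring Tool Overview

To evaluate speaker diarization systems, we propose the Conversational-DER (CDER) metric. Conventional DER evaluates the overall performance of a speaker diarization system on the time scale. In real conversations, however, short segments sometimes carry important information, and a time-based evaluation struggles to reflect recognition performance on short segments. We therefore propose CDER, which evaluates speaker diarization systems at the sentence level.

For details, see: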

https://github.com/MagicHub-io/CDER_Metric

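Baseline System Q&A

If you have any questions about the baseline system, visit the following link for help; a team of experts will answer them.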

https://github.com/MagicHub-io/MagicData-RAMC#contents