
[Not for merge] Diarization workflow with SpeechBrain #1031

Open. Wants to merge 6 commits into master.
Conversation

desh2608 (Collaborator) commented:

This workflow shows how we can use SpeechBrain x-vectors + sklearn agglomerative clustering to perform crude speaker diarization. This can be used on top of the Whisper workflow to obtain speaker-attributed transcripts.
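A minimal sketch of the idea, not the PR's actual code: embed each speech segment with SpeechBrain's pretrained ECAPA-TDNN speaker encoder and group the embeddings with sklearn's AgglomerativeClustering. The model source is SpeechBrain's published `speechbrain/spkrec-ecapa-voxceleb`; the distance threshold and segment handling are illustrative assumptions.

```python
# Sketch only: cluster SpeechBrain speaker embeddings with sklearn.
# Threshold and parameters are hypothetical starting points, not the PR's code.
import torch
from sklearn.cluster import AgglomerativeClustering
from speechbrain.pretrained import EncoderClassifier  # speechbrain.inference in newer releases

encoder = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")

def cluster_segments(segments: list[torch.Tensor]) -> list[int]:
    """Assign a speaker label to each mono waveform segment (1-D tensors)."""
    # encode_batch returns (batch, 1, emb_dim); collect one embedding per segment.
    embeddings = torch.cat(
        [encoder.encode_batch(seg.unsqueeze(0)).squeeze(1) for seg in segments]
    ).cpu().numpy()
    # Cluster without a known speaker count by cutting the dendrogram at a
    # cosine-distance threshold (0.7 here is an assumption; it needs tuning).
    clustering = AgglomerativeClustering(
        n_clusters=None,
        distance_threshold=0.7,
        metric="cosine",  # named `affinity` in sklearn < 1.2
        linkage="average",
    )
    return clustering.fit_predict(embeddings).tolist()
```

Each resulting label could then be attached to the corresponding cut's supervision, which is what yields speaker-attributed transcripts when combined with Whisper output.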

@pzelasko (Collaborator) commented:

This is cool, what is the reason you don't want to merge it?

@desh2608 (Collaborator, Author) commented:

> This is cool, what is the reason you don't want to merge it?

Mainly because this approach isn't really benchmarked on anything, and I am not sure how well the ECAPA-TDNN embeddings would work with agglomerative clustering.

@flyingleafe (Contributor) commented:

@desh2608 pyannote.audio (https://github.com/pyannote/pyannote-audio) is basically ECAPA-TDNN + agglomerative clustering, and it is benchmarked quite well. Why not use it directly?

@desh2608 (Collaborator, Author) commented:

> @desh2608 pyannote.audio (https://github.com/pyannote/pyannote-audio) is basically ECAPA-TDNN + agglomerative clustering, and it is benchmarked quite well. Why not use it directly?

I think that was the older Pyannote, if I'm not mistaken? Pyannote 2.0 uses end-to-end segmentation, which performs much better. In any case, this was just a quick DIY workflow. It should be relatively easy for folks to use Pyannote to create RTTMs and then call SupervisionSet.from_rttm() to create Lhotse manifests.
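A hedged sketch of that suggested route, assuming Pyannote's pretrained "pyannote/speaker-diarization" pipeline (the exact name and any authentication requirements vary across releases) together with Lhotse's SupervisionSet.from_rttm():

```python
# Sketch: diarize with Pyannote, dump RTTM, load it as a Lhotse manifest.
# The pipeline name is an assumption; newer releases may need an HF auth token.
from pyannote.audio import Pipeline
from lhotse import SupervisionSet

pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")
diarization = pipeline("audio.wav")

# pyannote annotations can serialize themselves in RTTM format.
with open("audio.rttm", "w") as f:
    diarization.write_rttm(f)

# One SupervisionSegment per RTTM line, with speaker labels filled in.
supervisions = SupervisionSet.from_rttm("audio.rttm")
```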

@flyingleafe (Contributor) commented:

> I think that was the older Pyannote, if I'm not mistaken? Pyannote 2.0 uses end-to-end segmentation, which performs much better.

Well, not quite: the segmentation model in Pyannote 2.0 is only the first step; assigning speakers to the segments is still done with ECAPA-TDNN embeddings + clustering. But whatever.
