Resemblyzer

After diarization, the timestamps I got do not match the original file

Open teoh79 opened this issue 3 years ago • 3 comments

Hello everybody, and first of all, thanks to this community for supporting developers.

I tried the Resemblyzer diarization, and the timestamps I got for each speaker do not match the original file:

For example: 1/ The last timestamp does not correspond to the end time of the wav file, even when someone speaks right up to the end.

2/ Does removing the silences shift every timestamp relative to the original wav file?

3/ Is the original wav trimmed during the VAD process, or during any other step (segmentation, clustering...)?

Thanks in advance!

teoh79 • Apr 30 '22 09:04

@teoh79 I have noticed the same issue. The audio length in the output is shorter than the actual audio length.

theashishbhatt • May 25 '22 06:05

That is probably because the silences in the input audio are trimmed when preprocess_wav is used. There are similar issues: #45 and #63. I am considering trimming the silences in the original audio as well before preprocessing, so that it matches the Resemblyzer output. This is also mentioned as the solution in #63, which says that wav is actually the trimmed audio. Other than that, I hope there are other solutions.
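If the original file has to stay untrimmed, another option is to map each diarization timestamp back onto the original timeline. Below is a minimal sketch of that idea, not part of Resemblyzer: as far as I can tell, preprocess_wav only returns the trimmed waveform, so the list of kept intervals is an assumption here and would have to be recomputed separately (for example with your own VAD pass). The function name `trimmed_to_original` and the interval values are hypothetical.

```python
from bisect import bisect_right

def trimmed_to_original(t, kept):
    """Map a time t (seconds) on the trimmed audio back to the original file.

    kept: sorted, non-overlapping (start, end) intervals, in seconds of the
    original file, that survived silence trimming (hypothetical input).
    """
    if not kept:
        raise ValueError("kept must contain at least one interval")
    # Trimmed-time position at which each kept interval begins.
    offsets = [0.0]
    for start, end in kept:
        offsets.append(offsets[-1] + (end - start))
    if not 0.0 <= t <= offsets[-1]:
        raise ValueError("t lies outside the trimmed audio")
    # Index of the kept interval containing t; the clamp makes t equal to
    # the total trimmed duration map to the end of the last interval.
    i = min(bisect_right(offsets, t) - 1, len(kept) - 1)
    return kept[i][0] + (t - offsets[i])

# Hypothetical example: only 1.0-3.0 s and 5.0-6.0 s of the original were kept.
kept = [(1.0, 3.0), (5.0, 6.0)]
print(trimmed_to_original(0.5, kept))  # 1.5
print(trimmed_to_original(2.5, kept))  # 5.5
```

A timestamp that falls exactly on a cut is mapped to the start of the following kept interval, which is an arbitrary choice in this sketch.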

ConnieZi • Jun 17 '22 18:06

How can the timestamps be extracted?

Nirannoel • Nov 16 '22 12:11