
Does mdeval compute DER taking overlapped speech into account?

Open AntoineBlanot opened this issue 2 years ago • 9 comments

Thank you very much for sharing this repository. It is very useful to have a single repo with many audio metrics :)

Tools like pyannote allow us to choose the collar and whether to compute DER on overlapped speech regions or not.

With mdeval, we can specify the collar but it seems like there is no option for including overlapped speech in the metric or not. Does that mean that by default it computes over overlapped regions? Or are they excluded for the calculations?

Thank you for your answer !

AntoineBlanot avatar Feb 06 '24 06:02 AntoineBlanot

Hi @AntoineBlanot! I had to check that first. md-eval-22.pl has an option -o

-o to include overlapping speech in MD evaluation. With this option, separate recognition passes are made for each reference speaker.

This option is, however, currently not set by our wrapper.

Even if the option is not set, I found the following things:

  • md_eval ignores self-overlap and treats these regions as if the speaker was active continuously. It does warn about self-overlap though.
meeteval.der.md_eval_22(
    meeteval.io.asseglst([
        {'speaker': 'A', 'start_time': 0, 'end_time': 7, 'session_id': 'X', 'words': ''},
        {'speaker': 'A', 'start_time': 4, 'end_time': 10, 'session_id': 'X', 'words': ''},
    ]),
    meeteval.io.asseglst([{'speaker': 'A', 'start_time': 0, 'end_time': 10, 'session_id': 'X', 'words': ''}]),
)
# WARNING:  speaker A speaking more than once at time 4
# WARNING:  speaker A speaking more than once at time 4
# WARNING:  speaker A speaking more than once at time 4
# WARNING:  speaker A speaking more than once at time 4
# {'X': DiaErrorRate(error_rate=Decimal('0.00'), scored_speaker_time=Decimal('10.000000'), missed_speaker_time=Decimal('0.000000'), falarm_speaker_time=Decimal('0.000000'), speaker_error_time=Decimal('0.000000'))}
  • md_eval seems to compute DER for overlapping regions, even if -o is not set (scored_speaker_time is 20).
meeteval.der.md_eval_22(
    meeteval.io.asseglst([
        {'speaker': 'A', 'start_time': 0, 'end_time': 10, 'session_id': 'X', 'words': ''},
        {'speaker': 'B', 'start_time': 0, 'end_time': 10, 'session_id': 'X', 'words': ''},
    ]),
    meeteval.io.asseglst([
        {'speaker': 'A', 'start_time': 0, 'end_time': 10, 'session_id': 'X', 'words': ''},
        {'speaker': 'B', 'start_time': 0, 'end_time': 10, 'session_id': 'X', 'words': ''},
    ]),
)
# {'X': DiaErrorRate(error_rate=Decimal('0.00'), scored_speaker_time=Decimal('20.000000'), missed_speaker_time=Decimal('0.000000'), falarm_speaker_time=Decimal('0.000000'), speaker_error_time=Decimal('0.000000'))}

@boeddeker Do you know anything about the -o option?

thequilo avatar Feb 06 '24 06:02 thequilo

We looked through md-eval-22.pl and found that -o is ignored unless -w (word-mediated alignment) is set, which we currently do not support. md-eval-22.pl evaluates overlap by default. This can be deactivated with -1, ignoring overlapping regions.
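
To illustrate what the two modes mean for the scored time, here is a minimal sketch in plain Python. It is not md-eval's implementation: the function name `scored_speaker_time`, the `(speaker, start, end)` tuple format, and the `ignore_overlap` parameter are assumptions made for this example, and collars are left out entirely.

```python
def scored_speaker_time(segments, ignore_overlap=False):
    """Sum reference speaker time over a list of (speaker, start, end) tuples.

    Hypothetical helper for illustration only; it does not call md-eval.
    With ignore_overlap=True, regions with more than one active speaker
    are left out of the sum.
    """
    # Between two consecutive boundaries the set of active speakers is constant.
    boundaries = sorted({t for _, start, end in segments for t in (start, end)})
    total = 0.0
    for left, right in zip(boundaries, boundaries[1:]):
        # Using a set collapses self-overlap of one speaker into one activity.
        active = {spk for spk, start, end in segments
                  if start <= left and right <= end}
        if ignore_overlap and len(active) > 1:
            continue  # skip regions where several speakers talk at once
        total += len(active) * (right - left)
    return total


both = [('A', 0, 10), ('B', 0, 10)]
print(scored_speaker_time(both))                       # 20.0
print(scored_speaker_time(both, ignore_overlap=True))  # 0.0
```

With overlap evaluated, two fully overlapping 10 s speakers give 20 s of scored time, as in the md_eval output above. With overlap ignored, nothing is left to score in this example.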

MeetEval currently does not set -1, so overlap is always evaluated. Do you need an option to deactivate scoring in overlapped regions?

thequilo avatar Feb 06 '24 09:02 thequilo

@thequilo Thank you very much for your responses !

Your comments were very clear, thank you for your insights!

Being able to deactivate scoring would help yes, as it can indicate if a model is good on non-overlapped regions or not. If this is something that can be implemented, I think that it would be very nice! :)

AntoineBlanot avatar Feb 07 '24 06:02 AntoineBlanot

Sure, it's just an option that has to be passed to md-eval. The naming of such an option is not that easy though. md-eval uses -1, dscore uses ignore_overlap, pyannote skip_overlap, and spyder uses -r, --regions [all|single|overlap|nonoverlap].

I currently prefer --skip-overlap or --ignore-overlap
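
Forwarding the flag could look roughly like the sketch below. This is not MeetEval's actual wrapper code: the function `build_md_eval_command`, its parameters, and the default script path are hypothetical. The sketch assumes the usual md-eval invocation with -r (reference RTTM), -s (system RTTM), and -c (collar).

```python
def build_md_eval_command(ref_rttm, hyp_rttm, collar=0.0,
                          ignore_overlap=False, script='md-eval-22.pl'):
    """Build the argument list for an md-eval call (hypothetical helper)."""
    cmd = ['perl', script, '-c', str(collar), '-r', ref_rttm, '-s', hyp_rttm]
    if ignore_overlap:
        # md-eval's own switch for limiting scoring to single-speaker regions
        cmd.append('-1')
    return cmd


print(build_md_eval_command('ref.rttm', 'hyp.rttm', collar=0.25,
                            ignore_overlap=True))
```

The returned list can be handed to `subprocess.run`. The verbose keyword on the Python side is independent of the flag name md-eval expects.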

thequilo avatar Feb 07 '24 09:02 thequilo

I think we have two options (In the future, we may add more DER backends, e.g. pyannote and/or spyder):

  • Use the native options
  • Use an option that will most likely work with all backends that we introduce in the future

I am against --skip-overlap and --ignore-overlap because they don't work with spyder, and they aren't in the md-eval help. Given that md-eval has plenty of options and it is not clear which of them should be supported in the future, I have a small preference for native names, i.e. keep the name from the tool and don't rename anything.

boeddeker avatar Feb 07 '24 10:02 boeddeker

I would agree with you if the native name wasn't -1, which is pretty unclear if you don't already know what the option is doing (The first time I saw it, I thought: Why does the script take a negative integer number as input and what does it do? I then looked into the signature, which only mentions files as arguments, and was even more confused). I prefer a more verbose name that people understand without reading the docs first.

If you are against --skip-overlap and --ignore-overlap, I'd prefer spyder's approach with -r [all|nooverlap] where we can add options when we wrap other backends.

thequilo avatar Feb 07 '24 11:02 thequilo

There is always a trade-off between keeping the original name and introducing a new name. While a new name could be cleaner, it can introduce confusion (e.g. see all those ideological changes in pytorch where it differs from numpy, so that in the end you have to know both).

While -1 is not obvious at first sight, it is clear once you know it. When we provide a wrapper, we have to think about which options we want to simply forward to the user and which we want to sync across the different implementations.

IMHO, -1 is a special option that is not worth making a standard option that we sync across backends (@TCord mentioned some cases where you have to be careful with that option). Even so, it is useful for a user to be able to set this flag. That is why I would forward this option unmodified.

If we rename that option, yes, the long form of spyder is probably the best (short form -r is already blocked).

boeddeker avatar Feb 09 '24 09:02 boeddeker

I also think that keeping the -1 option from md-eval is not necessary. With md-eval, the dscore python wrapper, pyannote and spyder, there are already at least four different option names out there. Simply using a verbose option name (e.g. ignore_overlap) should be the best solution here. nooverlap could be misleading, however, since it could also mean that it enforces that there is no overlap.

TCord avatar Feb 09 '24 10:02 TCord

We could also define a long and short form for the option, like --ignore-overlap, -1, so that we have the more verbose variant and the option from md_eval. I think I could live with that.

Some tools also have the following interface (not for this particular option), but I feel like they pollute the interface namespace:

  • --evaluate-overlap evaluate overlap regions (default)
  • --no-evaluate-overlap don't evaluate overlap regions (mutually exclusive with --evaluate-overlap)

Or one of these?

  • --evaluate-overlap [true|false] with the default true.
  • --exclude-overlap

thequilo avatar Feb 09 '24 11:02 thequilo
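
For comparison, the interface variants discussed in this thread can be sketched with Python's `argparse`. This is only an illustration and not MeetEval's CLI: the program name `der-sketch`, the helper `make_parser`, and the style names are hypothetical. `argparse.BooleanOptionalAction` needs Python 3.9 or newer.

```python
import argparse


def make_parser(style):
    """Return a parser for one of the proposed interfaces (hypothetical)."""
    parser = argparse.ArgumentParser(prog='der-sketch')
    if style == 'flag':
        # Verbose long form plus md-eval's native short form.
        parser.add_argument('-1', '--ignore-overlap', action='store_true',
                            help='do not score regions with overlapping speakers')
    elif style == 'pair':
        # Generates both --evaluate-overlap and --no-evaluate-overlap.
        parser.add_argument('--evaluate-overlap', default=True,
                            action=argparse.BooleanOptionalAction)
    elif style == 'regions':
        # spyder-like selector; more choices can be added per backend.
        parser.add_argument('--regions', choices=['all', 'nonoverlap'],
                            default='all')
    else:
        raise ValueError(f'unknown style: {style}')
    return parser


print(make_parser('flag').parse_args(['-1']))
print(make_parser('pair').parse_args(['--no-evaluate-overlap']))
print(make_parser('regions').parse_args(['--regions', 'nonoverlap']))
```

In the `flag` variant, `argparse` accepts `-1` as an option because it is registered explicitly, and the parsed value ends up in `ignore_overlap`. The `regions` variant is the only one that extends without adding new option names.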