
NeMo/tutorials/speaker_tasks/ASR_with_SpeakerDiarization needs confidence estimation

Open jugal-sheth opened this issue 1 year ago • 0 comments

While performing offline ASR_with_SpeakerDiarization, I noticed that the following function in nemo/collections/asr/parts/utils/diarization_utils.py always writes a confidence of zero:

`def convert_word_dict_seq_to_ctm(
    word_dict_seq_list: List[Dict[str, float]], uniq_id: str = 'null', decimals: int = 3
) -> Tuple[List[str], str]:
    """
    Convert word_dict_seq_list into a list containing transcription in CTM format.

    Args:
        word_dict_seq_list (list):
            List containing words and corresponding word timestamps in dictionary format.

            Example:
            >>> word_dict_seq_list = \
            >>> [{'word': 'right', 'start_time': 0.0, 'end_time': 0.34, 'speaker': 'speaker_0'},
                 {'word': 'and', 'start_time': 0.64, 'end_time': 0.81, 'speaker': 'speaker_1'},
                   ...],

    Returns:
        ctm_lines_list (list):
            List containing the hypothesis transcript in CTM format.

            Example:
            >>> ctm_lines_list= ["my_audio_01 speaker_0 0.0 0.34 right 0",
                                  "my_audio_01 speaker_0 0.64 0.81 and 0",
                                  ...]
    """
    ctm_lines = []
    confidence = 0
    for word_dict in word_dict_seq_list:
        spk = word_dict['speaker']
        stt = word_dict['start_time']
        dur = round(word_dict['end_time'] - word_dict['start_time'], decimals)
        word = word_dict['word']
        ctm_line_str = f"{uniq_id} {spk} {stt} {dur} {word} {confidence}"
        ctm_lines.append(ctm_line_str)
    return ctm_lines`

Is it possible to use confidence estimation to return a real confidence value for each word?
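To make the request concrete, here is a minimal sketch of what the CTM conversion could look like if a per-word score were available. It is not the NeMo implementation: the function name `convert_word_dict_seq_to_ctm_with_conf`, the `'confidence'` key in each word dictionary, and the `default_confidence` parameter are all hypothetical, and how the scores would be obtained from the ASR model is left open.

```python
from typing import Dict, List, Union


def convert_word_dict_seq_to_ctm_with_conf(
    word_dict_seq_list: List[Dict[str, Union[str, float]]],
    uniq_id: str = 'null',
    decimals: int = 3,
    default_confidence: float = 0.0,
) -> List[str]:
    """Hypothetical variant of convert_word_dict_seq_to_ctm that writes a
    per-word confidence into the last CTM field instead of a constant 0."""
    ctm_lines = []
    for word_dict in word_dict_seq_list:
        spk = word_dict['speaker']
        stt = word_dict['start_time']
        dur = round(word_dict['end_time'] - word_dict['start_time'], decimals)
        word = word_dict['word']
        # ASSUMPTION: 'confidence' is a hypothetical key that an upstream step
        # would have to fill in; fall back to the default when it is missing.
        conf = round(float(word_dict.get('confidence', default_confidence)), decimals)
        ctm_lines.append(f"{uniq_id} {spk} {stt} {dur} {word} {conf}")
    return ctm_lines


# Usage: the first word carries a (made-up) score, the second does not.
words = [
    {'word': 'right', 'start_time': 0.0, 'end_time': 0.34, 'speaker': 'speaker_0', 'confidence': 0.9174},
    {'word': 'and', 'start_time': 0.64, 'end_time': 0.81, 'speaker': 'speaker_1'},
]
for line in convert_word_dict_seq_to_ctm_with_conf(words, uniq_id='my_audio_01'):
    print(line)
```

Falling back to a default keeps word dictionaries without a score working, although the fallback is written as `0.0` rather than the current `0`.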

Thank you, kind regards

jugal-sheth • Aug 28 '24 10:08