Missing spaces after punctuation when merging speaker sentences
Caution! This issue report was written by AI on behalf of a developer who does not know Python. Still, I hope it's useful ;-)
Description
When speaker identification is enabled and "Merge speaker sentences" is checked, words are concatenated without spaces after punctuation marks (periods, commas, etc.). This causes text like "ok.Yes" instead of "ok. Yes", and is especially noticeable when the first word of a new speaker appears at the end of the previous speaker's line.
Steps to Reproduce
- Transcribe an audio file with multiple speakers
- Enable speaker identification
- Check "Merge speaker sentences"
- Save the speaker labels
- Observe the merged transcript
Expected Behavior
Text should have proper spacing after punctuation marks:
- "ok. Yes" (with space after period)
- "ok, but" (with space after comma)
- "example. That" (with space after period)
Actual Behavior
Text is concatenated without spaces after punctuation:
- "ok.Yes" (no space after period)
- "ok,but" (no space after comma)
- "example.That" (no space after period)
This is especially visible when:
- The first word of a new speaker appears at the end of the previous speaker's line
- Multiple sentences are merged within the same speaker
Root Cause
The issue occurs in buzz/widgets/transcription_viewer/speaker_identification_widget.py at line 285, where punctuation is added to words without ensuring a space follows:
word += labeled_tuple[1] # Punctuation is added without ensuring a space follows
When get_sentences_speaker_mapping() (line 299) concatenates these words, it joins them directly and adds no space after a punctuation mark, which produces "ok.Yes" instead of "ok. Yes".
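The failure mode described above can be reproduced in isolation. The sketch below is not the Buzz code: the helper names and the sample data are hypothetical, and it assumes the word tokens carry no whitespace of their own. It only illustrates why appending punctuation and then joining without a separator yields "ok.Yes".

```python
def attach_punctuation(words, marks):
    """Append each predicted mark to its word, similar to `word += labeled_tuple[1]`."""
    return [word + mark for word, mark in zip(words, marks)]


def join_directly(tokens):
    """Concatenate tokens with no separator (the behavior the report describes)."""
    return "".join(tokens)


def join_with_spaces(tokens):
    """Join non-empty tokens with exactly one space between them."""
    cleaned = [token.strip() for token in tokens]
    return " ".join(token for token in cleaned if token)


# Hypothetical sample data: two words, the first followed by a period.
tokens = attach_punctuation(["ok", "Yes"], [".", ""])
print(join_directly(tokens))     # ok.Yes
print(join_with_spaces(tokens))  # ok. Yes
```

If the real tokens do carry leading or trailing whitespace, the stripping step in join_with_spaces() keeps the result from picking up double spaces.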
The merge code at line 602 (previous_segment['text'] += " " + entry['text']) correctly adds a space between segments, so it is not the source of the problem. The missing spaces are already present inside the individual entry['text'] values produced by get_sentences_speaker_mapping() before the merge runs.
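As a possible workaround, the text could be normalized after it has been assembled. The following is a minimal sketch, assuming each entry['text'] is a plain string; add_space_after_punctuation() is a hypothetical helper, not an existing Buzz function. It inserts a space after a punctuation mark only when a letter follows immediately, so decimals such as "3.14" and already correct text are left alone.

```python
import re

# A punctuation mark followed directly by a letter. [^\W\d_] matches any
# Unicode letter, so accented characters are covered as well.
_PUNCT_THEN_LETTER = re.compile(r"([.,!?;:])(?=[^\W\d_])")


def add_space_after_punctuation(text: str) -> str:
    """Insert one space after punctuation that is glued to the next word."""
    return _PUNCT_THEN_LETTER.sub(r"\1 ", text)


print(add_space_after_punctuation("ok.Yes"))        # ok. Yes
print(add_space_after_punctuation("ok,but"))        # ok, but
print(add_space_after_punctuation("example.That"))  # example. That
```

This approach has known limits: it would also split abbreviations such as "e.g." and domain names such as "example.com", so fixing the spacing where the words are joined is probably the cleaner solution.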
Environment
- Buzz version: [latest version]
- OS: [Windows 11]
- Language: All languages (tested with Dutch, but likely affects all languages)
Additional Context
This issue was reported after testing speaker identification functionality. The problem occurs consistently, not just at speaker boundaries, but becomes more visible there because incorrect words appear on the wrong speaker's line in the UI.