[bug] Audio stream named as WAV but encoded as WEBM.

Open Kamal-Eldin opened this issue 7 months ago • 0 comments

Description

The current audio encoding process labels streamed audio as WAV, but no WAV encoding is actually applied. The produced audio segments are Matroska (WebM) encoded.

This causes breaking errors in the backend and in downstream external applications, which expect WAV input and receive WebM data instead.

Expected Behavior

The frontend should capture the raw PCM audio data from the input stream into a buffer, then encode that buffer client-side into WAV format, using a high sampling rate to preserve input quality.

The data exported to Hugging Face should be compatible with the Hugging Face datasets[audio] API for encoding and decoding.

The audio file should begin with the header bytes 'RIFF', which identify a WAV file.

Actual Behavior

The current code saves the audio stream into a basic Blob declared with the MIME type audio/wav. However, declaring the MIME type does NOT apply the necessary WAV encoding. Instead, the client browser controls the encoding process; in the case of Chrome, it produces WebM files.

The header bytes '\x1aE\xdf\xa3' identify Matroska (MKV) or WebM encoded audio.
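
The header check described above can be sketched as a small helper that classifies a payload by its leading bytes. This is a minimal illustration only: the function name `sniffContainer` and the `Container` type are hypothetical and not part of the project's code.

```typescript
// Hypothetical helper (not from the project): classify an audio payload
// by its leading magic bytes.
type Container = "wav" | "matroska/webm" | "unknown";

function sniffContainer(bytes: Uint8Array): Container {
  // True when `sig` appears in `bytes` starting at `offset`.
  const startsWith = (sig: number[], offset = 0): boolean =>
    bytes.length >= offset + sig.length &&
    sig.every((b, i) => bytes[offset + i] === b);

  // WAV: ASCII "RIFF" at offset 0 and ASCII "WAVE" at offset 8.
  const isRiff = startsWith([0x52, 0x49, 0x46, 0x46]);
  const isWave = startsWith([0x57, 0x41, 0x56, 0x45], 8);
  if (isRiff && isWave) return "wav";

  // Matroska / WebM: EBML header ID 0x1A 0x45 0xDF 0xA3,
  // i.e. the '\x1aE\xdf\xa3' bytes quoted in this issue.
  if (startsWith([0x1a, 0x45, 0xdf, 0xa3])) return "matroska/webm";

  return "unknown";
}
```

Running a check like this on the first bytes of a recorded segment, before upload, would reveal the mismatch regardless of the MIME type declared on the Blob.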

Impact

This may cause tools and scripts that expect WAV audio to fail or produce errors, and may mislead users about the actual format of their data
WebM audio is typically encoded with lossy codecs (Opus or Vorbis), reducing the quality of the training data
This breaks the Hugging Face datasets audio API

Proposed solution

Remove the mediaRecorder
Capture input stream into raw PCM data
Encode PCM data into wav
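
The last step above could look roughly like the following sketch. It assumes the captured PCM arrives as Float32 samples in the range [-1, 1] (the form the Web Audio API delivers) and writes 16-bit PCM with a standard 44-byte header. The name `encodeWav`, the 16-bit depth, and the mono default are assumptions made for illustration, not decisions taken in this issue.

```typescript
// Hypothetical encoder (not from the project): wrap Float32 PCM samples
// in a 16-bit PCM WAV container. Multi-channel input must be interleaved.
function encodeWav(
  samples: Float32Array,
  sampleRate: number,
  numChannels = 1
): Uint8Array {
  const bytesPerSample = 2; // assumption: 16-bit output
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * numChannels * bytesPerSample, true); // byte rate
  view.setUint16(32, numChannels * bytesPerSample, true); // block align
  view.setUint16(34, 8 * bytesPerSample, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each sample to [-1, 1] and scale to signed 16-bit little-endian.
  samples.forEach((sample, i) => {
    const s = Math.max(-1, Math.min(1, sample));
    const scaled = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
    view.setInt16(44 + i * bytesPerSample, scaled, true);
  });

  return new Uint8Array(buffer);
}
```

The returned bytes start with 'RIFF' and can be wrapped in a Blob with type audio/wav, so the declared MIME type matches the actual content.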

Sep 23 '25 08:09 Kamal-Eldin