[bug] Audio stream named as WAV but encoded as WEBM.
Description
The current audio encoding process labels streamed audio as WAV without actually applying WAV encoding. In reality, the produced audio segments are Matroska (WebM) encoded.
This causes breaking errors in the backend and in downstream external applications, because the declared format does not match the actual data.
Expected Behavior
The frontend should capture the raw PCM audio data from the input stream into a buffer, which is then encoded client-side into WAV format, using a high sampling rate to preserve input quality.
The data exported to Hugging Face should be compatible with the Hugging Face datasets[audio] API for encoding and decoding.
The audio file header bytes should begin with 'RIFF', signifying a WAV file.
Actual Behavior
The current code saves the audio stream into a plain Blob declared with the MIME type audio/wav. However, this does NOT apply the necessary WAV encoding.
Instead, the client browser owns the encoding process. In the case of Chrome, it produces WebM files.
The header bytes '\x1aE\xdf\xa3' signify Matroska (mkv or webm) encoded audio.
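The mismatch can be confirmed by inspecting the leading bytes of a recorded segment. The following is a minimal sketch, not code from this repository: the helper name sniffAudioContainer and the Container type are hypothetical, and the extra check for 'WAVE' at offset 8 comes from the RIFF/WAVE layout rather than from this report.

```typescript
// Hypothetical helper: classify an audio payload by its leading magic bytes.
type Container = "wav" | "matroska/webm" | "unknown";

function sniffAudioContainer(bytes: Uint8Array): Container {
  // True when `expected` appears in `bytes` starting at `offset`.
  const startsWith = (offset: number, expected: number[]): boolean =>
    bytes.length >= offset + expected.length &&
    expected.every((b, i) => bytes[offset + i] === b);

  const RIFF = [0x52, 0x49, 0x46, 0x46]; // ASCII "RIFF"
  const WAVE = [0x57, 0x41, 0x56, 0x45]; // ASCII "WAVE"
  const EBML = [0x1a, 0x45, 0xdf, 0xa3]; // '\x1aE\xdf\xa3'

  // WAV: "RIFF" at offset 0 and "WAVE" at offset 8.
  if (startsWith(0, RIFF) && startsWith(8, WAVE)) {
    return "wav";
  }
  // Matroska / WebM: EBML magic at offset 0.
  if (startsWith(0, EBML)) {
    return "matroska/webm";
  }
  return "unknown";
}
```

Running this on the first bytes of a segment produced by the current code should report "matroska/webm" in Chrome, regardless of the MIME type declared on the Blob.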
Impact
- This may cause tools and scripts that expect WAV audio to fail or produce errors, and may mislead users about the actual format of their data
- WebM/Matroska audio produced by the browser is typically encoded with a lossy codec (e.g. Opus), which reduces the quality of the training data
- This breaks the Hugging Face datasets audio API
Proposed Solution
- Remove the MediaRecorder
- Capture input stream into raw PCM data
- Encode PCM data into wav
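
The encoding step above could look roughly like the sketch below. It assumes the PCM samples have already been captured as mono Float32 values in [-1, 1] (for example via the Web Audio API; the capture itself is not shown), and it writes 16-bit PCM. The function name encodeWav, the mono layout, and the 16-bit depth are illustrative assumptions, not decisions made in this issue.

```typescript
// Sketch: encode mono Float32 PCM samples in [-1, 1] as a 16-bit PCM WAV file.
// encodeWav is a hypothetical name; mono and 16-bit depth are assumptions.
function encodeWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const numChannels = 1;
  const bytesPerSample = 2;
  const blockAlign = numChannels * bytesPerSample;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte canonical header
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);            // RIFF chunk size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);                      // fmt chunk size
  view.setUint16(20, 1, true);                       // audio format 1 = PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, 8 * bytesPerSample, true);      // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each sample and scale it to a signed 16-bit little-endian integer.
  let offset = 44;
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7fff, true);
    offset += bytesPerSample;
  }
  return buffer;
}
```

The returned buffer starts with 'RIFF' and can be wrapped in a Blob with type audio/wav, so the declared MIME type then matches the actual bytes. The sample rate should be the one the audio was actually captured at.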