Support audio in multimodal messages
Is your feature request related to a problem? Please describe.
LLM APIs have started supporting audio input, so it would be beneficial for RAIMultimodalMessages to support audio as well.
Describe the solution you'd like
The MultimodalMessage class (https://github.com/RobotecAI/rai/blob/5d3a8f33f20e6ccfebf4fecceb3ef7d2bc70d0d1/src/rai/rai/messages/multimodal.py#L38) should support audio input.
Describe alternatives you've considered
This is the only suitable solution within the current architecture.
Additional context
From the issue, I understand that the changes need to be made in messages/multimodal.py, and that they are:
- Delete the if self.audios not in [None, []]: check that was blocking audio support.
- Add support for base64-encoded audio files in the __init__ method.
- Create audio content entries similar to how images are handled, using an appropriate MIME type for audio (e.g. "audio/wav").
Should I create a pull request with these changes?
Please assign this issue to me. I'll work on it and create a PR. If I'm missing something, please let me know.
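For illustration, the three proposed changes could look roughly like the sketch below. It is not the code in messages/multimodal.py: build_content is a hypothetical standalone helper, the image entry assumes PNG data URLs, and the audio entry assumes the input_audio shape that OpenAI documents for audio-capable chat models. Other vendors may expect a different shape.

```python
from typing import Any, Dict, List, Optional


def build_content(
    text: str,
    images: Optional[List[str]] = None,
    audios: Optional[List[str]] = None,
    audio_format: str = "wav",
) -> List[Dict[str, Any]]:
    """Hypothetical helper: turn text plus base64 strings into content entries."""
    content: List[Dict[str, Any]] = [{"type": "text", "text": text}]

    # Images: one data-URL entry per base64 string (PNG is an assumption).
    for image in images or []:
        content.append(
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image}"},
            }
        )

    # Audio: instead of raising on non-empty audios, emit one entry per clip.
    # The "input_audio" shape is an assumption based on OpenAI's format.
    for audio in audios or []:
        content.append(
            {
                "type": "input_audio",
                "input_audio": {"data": audio, "format": audio_format},
            }
        )
    return content
```

Calling build_content("Describe this", audios=[encoded_wav]) would then yield a text entry followed by one audio entry, mirroring how images are appended.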
Hi @mdimado, yes, please feel free to create a PR for this task! A fully completed implementation should include:
- A preprocess_audio function, similar to preprocess_image, to handle conversion of various audio formats (e.g., .mp3, .wav, np.array with sampling rate) into a standard format accepted by multimodal vendors.
- Validation to ensure the model can process and understand the provided audio content (e.g., compatibility with gpt-4o-audio-preview).
Let me know if you need any further clarification or assistance (here and/or on Discord).
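As a rough sketch of what those two requirements might involve, the example below normalizes a few input types to base64-encoded WAV and checks the model name against an allowlist. Everything here is an assumption for illustration: the preprocess_audio signature, the supports_audio helper, and the AUDIO_CAPABLE_MODELS set are hypothetical, only the standard library is used, samples are plain floats in [-1, 1] rather than an np.array, and file inputs are assumed to already be WAV (decoding .mp3 would need an additional library and is not covered).

```python
import base64
import io
import struct
import wave
from pathlib import Path
from typing import Sequence, Tuple, Union

# Hypothetical input type: a WAV file path, raw WAV bytes,
# or (mono float samples in [-1, 1], sampling rate in Hz).
AudioInput = Union[str, Path, bytes, Tuple[Sequence[float], int]]

# Hypothetical allowlist; a real check would depend on the vendor.
AUDIO_CAPABLE_MODELS = {"gpt-4o-audio-preview"}


def preprocess_audio(audio: AudioInput) -> str:
    """Return base64-encoded WAV data for the given input."""
    if isinstance(audio, (str, Path)):
        raw = Path(audio).read_bytes()  # assumes the file is already WAV
    elif isinstance(audio, bytes):
        raw = audio  # assumes the bytes are already WAV
    else:
        samples, rate = audio
        # Clip to [-1, 1] and scale to 16-bit signed integers.
        values = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
        frames = struct.pack(f"<{len(values)}h", *values)  # little-endian PCM
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as wav:
            wav.setnchannels(1)  # mono
            wav.setsampwidth(2)  # 2 bytes per sample (16-bit)
            wav.setframerate(rate)
            wav.writeframes(frames)
        raw = buffer.getvalue()
    return base64.b64encode(raw).decode("utf-8")


def supports_audio(model_name: str) -> bool:
    """Return True if the model is on the (hypothetical) audio allowlist."""
    return model_name in AUDIO_CAPABLE_MODELS
```

A caller could check supports_audio(model_name) before attaching audio and raise a clear error otherwise, which keeps the failure close to the message construction instead of surfacing as a vendor API error.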
Thanks for the clarification and additional details. After reviewing the task, I realize that implementing the preprocess_audio function and handling the validation might require more learning on my part. To ensure timely, high-quality work, I think someone with more expertise could handle this better. Apologies for the inconvenience; I kindly request to be unassigned for now.
Hey @mdimado, no worries at all! We're all here to learn and grow together—that's what makes this such a great environment. 😊 Feel free to tackle any part of the work you're comfortable with, and don't hesitate to ask for guidance along the way. We’re always happy to help and support you through the process. Looking forward to it! 🚀
@mdimado I have created sub-issues based on your task description: https://github.com/RobotecAI/rai/issues/373. Feel free to comment under it so I can assign you.
Due to widespread lack of support for audio, we are postponing this feature.