Request for a demo on using emotion2vec with Speech + Text modalities
Hello there! I'm currently trying to use emotion2vec for sentiment analysis tasks and appreciate your work. After reading the related papers and documentation, I noticed that you provide instructions on how to predict using speech or text modality data separately.
However, I am also interested in how to combine speech and text data (i.e., Speech + Text) for multimodal emotion prediction. From my reading of the literature, this seems like an important application scenario.
Therefore, could you please provide a simple example demonstrating how to integrate these two modalities of data and run the model? I believe this would be highly beneficial for other users as well.
Thank you!
I was wondering the same thing. Any results yet, please?
You can refer to the papers by Shi et al. (2020, 2023). We reproduced their methods, and our results align with their reported numbers.
Is there a plan to open source the speech+text model?
Sorry, we don't have plans to do so at the moment. You can reproduce it yourself from the papers above.
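For anyone else looking to reproduce this, below is a minimal late-fusion sketch of what a speech + text setup could look like. It assumes you have already extracted utterance-level speech embeddings with emotion2vec and sentence-level embeddings with some pretrained text encoder; the class name `LateFusionClassifier`, the embedding dimensions, and the class count are placeholders, not the authors' released code or the exact configuration used in Shi et al.

```python
# Minimal late-fusion sketch (not the authors' released code).
# Assumes speech embeddings were extracted beforehand with emotion2vec
# (utterance-level) and text embeddings with any pretrained text encoder.
import torch
import torch.nn as nn

SPEECH_DIM = 768   # assumed emotion2vec utterance embedding size
TEXT_DIM = 768     # assumed text-encoder sentence embedding size
NUM_CLASSES = 4    # e.g., a four-class emotion setup

class LateFusionClassifier(nn.Module):
    """Concatenate speech and text embeddings, then classify with an MLP."""
    def __init__(self, speech_dim, text_dim, num_classes, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(speech_dim + text_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, speech_emb, text_emb):
        fused = torch.cat([speech_emb, text_emb], dim=-1)
        return self.net(fused)

# Example with random tensors standing in for real embeddings.
model = LateFusionClassifier(SPEECH_DIM, TEXT_DIM, NUM_CLASSES)
speech_emb = torch.randn(8, SPEECH_DIM)  # batch of emotion2vec utterance embeddings
text_emb = torch.randn(8, TEXT_DIM)      # batch of text-encoder embeddings
logits = model(speech_emb, text_emb)
print(logits.shape)  # torch.Size([8, 4])
```

Concatenation followed by a small MLP is only the simplest fusion baseline; the cited papers may use a more elaborate fusion scheme, so treat this as a starting point rather than a faithful reproduction.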