Adding New Features to LLMUnity
Describe the feature
I made a few features I am going to open source during the week for LLMUnity under the name Project Replicant, this includes
- A system which automatically saves and load conversations in ChatML format
- A way to easily turn a folder of chatml info into the AI
- A shogtounge encoder to compress text (Still WIP)
- RVC based voice output (WIP)
- Whisper based voice input
I also wanted to know if you might add support for multimodal LMs like
https://huggingface.co/NousResearch/Obsidian-3B-V0.5
I'm willing to actively help with development to push LLMUnity further, regardless of if this is a near, far or not plan at all!
Sounds amazing, looking forward to seeing your work! Feel free to also create a PR at any point to add anything missing from LLMUnity. How would you envision the support of multimodal LLMs? For instance having functionality to input/output images?
Thank you! As we speak i'm working on finishing the Shogtounge encoder!
I'm hoping that something like (https://huggingface.co/nisten/obsidian-3b-multimodal-q6-gguf) to run locally!. For now it would start with text + image outputs (With the image having the options as being sent as a path or bytes) with the output being text and a texture. From there it would move to audio and video as well with better texture!
Personally, i'm interested in also seeing if pose animation is possible too (although at a later date)!
I'm closing this issue, because we have another one (#134 ) for image input. Let me know if you have other feature requests :)