Adding New Features to LLMUnity

Describe the feature

I made a few features for LLMUnity that I am going to open-source during the week under the name Project Replicant. These include:

A system which automatically saves and loads conversations in ChatML format
A way to easily load a folder of ChatML data into the AI
A Shogtongue encoder to compress text (still WIP)
RVC-based voice output (WIP)
Whisper-based voice input

I also wanted to know if you might add support for multimodal LMs like

https://huggingface.co/NousResearch/Obsidian-3B-V0.5

I'm willing to actively help with development to push LLMUnity further, regardless of whether this is planned for the near term, the far future, or not at all!

May 12 '24 18:05 TKTSWalker

Sounds amazing, looking forward to seeing your work! Feel free to also create a PR at any point to add anything missing from LLMUnity. How would you envision support for multimodal LLMs? For instance, having functionality to input/output images?

May 13 '24 07:05 amakropoulos

Thank you! As we speak, I'm working on finishing the Shogtongue encoder!

I'm hoping to get something like https://huggingface.co/nisten/obsidian-3b-multimodal-q6-gguf to run locally! For now it would start with text + image inputs (with the image sent either as a path or as bytes), with the output being text and a texture. From there it would move to audio and video as well, with better textures!

Personally, I'm also interested in seeing whether pose animation is possible (although at a later date)!

May 13 '24 21:05 TKTSWalker

I'm closing this issue because we have another one (#134) for image input. Let me know if you have other feature requests :)

Jul 11 '24 06:07 amakropoulos