MindSearch
Multi-modal Support
Support for video and image understanding in search. I'm looking to use this with VILA. I'd love to use lmdeploy, but since VILA is not supported, I'm wondering how feasible it would be to swap out the inference engine for something like TinyChat.
Never mind, I found https://huggingface.co/OpenGVLab/InternVL2-40B. Still wondering whether this works with MLLMs, though.