Feature Request: How to support inference for large vision models

Open youxiudeshouyeren opened this issue 6 months ago • 1 comment

[x] I am running the latest code. Mention the version if possible as well.
[x] I carefully followed the README.md.
[x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
[x] I reviewed the Discussions, and have a new and useful enhancement to share.

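I have a couple of questions:
1) Is scheduling large vision models on the NPU currently supported?
2) I could not find any performance benchmark data in the documentation or elsewhere, for example figures for running Qwen 1B or 2B models with prompt=64 or 128.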

I would like VLMs to be supported and accelerated.

No response

Jul 24 '25 10:07 youxiudeshouyeren

Is scheduling large vision models on the NPU currently supported?

This backend implements a subset of ops on the Hexagon NPU, so any op that is supported will run on the NPU.
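To make the per-op behavior concrete, here is a minimal toy sketch of that kind of dispatch. It is not the ggml API: the op names, the supported-op set, and the helper names are all hypothetical, and the fallback to CPU for unsupported ops is an assumption for illustration, not something stated above.

```python
from dataclasses import dataclass

# Hypothetical set of op types the NPU backend implements (illustrative only,
# not the real list for the Hexagon backend).
NPU_SUPPORTED_OPS = {"MUL_MAT", "ADD", "RMS_NORM"}


@dataclass
class Op:
    name: str  # name of the graph node
    kind: str  # op type of the node


def assign_backend(op: Op) -> str:
    """Pick 'npu' when the op type is supported there; otherwise assume a CPU fallback."""
    return "npu" if op.kind in NPU_SUPPORTED_OPS else "cpu"


def schedule(graph: list[Op]) -> dict[str, str]:
    """Map each node name to the backend it would be placed on."""
    return {op.name: assign_backend(op) for op in graph}


# A made-up fragment of a vision model graph.
graph = [
    Op("patch_embed", "CONV_2D"),
    Op("attn_qkv", "MUL_MAT"),
    Op("residual", "ADD"),
]
placement = schedule(graph)
for name, backend in placement.items():
    print(f"{name} -> {backend}")
```

Under this sketch, whether a VLM benefits from the NPU depends on how many of its ops fall inside the supported set rather than on the model type itself.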

I could not find any performance benchmark data in the documentation or elsewhere

At this stage there are op-level tests. You can refer to the comment I posted in this discussion: https://github.com/ggml-org/llama.cpp/discussions/8273#discussioncomment-13274821

Jul 30 '25 08:07 chraac