oreomaker

Results 35 comments of oreomaker

The same problem has bothered me for some time. I just found that when you change SystemUsesLightTheme in the registry directly to false, the taskbar stays the same while...

The prefilling stage takes the prompt as input, initializes the KV Cache, and generates the first token, which is compute-bound. The decoding stage, on the other hand, takes each token...
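To make the prefill/decode split concrete, here is a minimal sketch (hypothetical names and stand-in values, not mllm's actual API): prefill processes the whole prompt in one pass and fills the KV cache, which is why it is compute-bound; each decode step then feeds a single token and appends one entry to the growing cache.

```python
# Hypothetical sketch of the two inference stages described above.

def prefill(prompt_tokens, kv_cache):
    # Compute-bound: attention/matmuls run over the full prompt length at once.
    for tok in prompt_tokens:
        kv_cache.append(("kv", tok))  # stand-in for real key/value tensors
    return len(prompt_tokens)         # stand-in for the sampled first token

def decode_step(token, kv_cache):
    # Sequence length is 1, but the whole cache is read each step,
    # which makes decoding memory-bound rather than compute-bound.
    kv_cache.append(("kv", token))
    return token + 1                  # stand-in for the next sampled token

kv_cache = []
tok = prefill([1, 2, 3], kv_cache)    # one pass over the prompt
for _ in range(4):                    # one token per decode step
    tok = decode_step(tok, kv_cache)
print(len(kv_cache))                  # 3 prompt entries + 4 decode entries = 7
```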

As I mentioned in the [reply](https://github.com/UbiquitousLearning/mllm/issues/118#issuecomment-2291412695), the decoding graph takes input with a sequence length of 1, so another QNN graph has to be built, which is costly. As for the NPU,...
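Since a compiled NPU graph is specialized to a fixed input shape, the prefill graph (sequence length = prompt length) cannot serve decoding (sequence length = 1). A hypothetical sketch of keeping one graph per sequence length, so each expensive build happens only once:

```python
# Hypothetical sketch: shape-specialized graphs cached by sequence length.

class GraphCache:
    def __init__(self):
        self._graphs = {}
        self.builds = 0   # count the expensive build/finalize calls

    def get(self, seq_len):
        if seq_len not in self._graphs:
            self.builds += 1  # stand-in for the costly graph compilation
            self._graphs[seq_len] = f"graph(seq_len={seq_len})"
        return self._graphs[seq_len]

cache = GraphCache()
cache.get(64)            # prefill graph: built once for the prompt length
for _ in range(10):
    cache.get(1)         # decode graph: built once, then reused every step
print(cache.builds)      # 2: one build per distinct sequence length
```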

In the npuExe.run function, the QNN graph is built and then executed, and the graph-building stage takes most of the time. The loading and constructing time, as well...
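A small hypothetical sketch of how one might confirm that claim: wrap the build and execute phases in separate timers and compare. The `build_graph`/`execute_graph` bodies are stand-ins (sleeps), not real QNN calls.

```python
# Hypothetical timing sketch: build vs. execute, with sleeps as stand-ins.
import time

def timed(fn):
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

def build_graph():
    time.sleep(0.05)    # stand-in for the expensive compose/finalize step
    return "graph"

def execute_graph(graph):
    time.sleep(0.005)   # stand-in for a single inference run
    return "output"

graph, build_s = timed(build_graph)
_, exec_s = timed(lambda: execute_graph(graph))
print(build_s > exec_s)  # building dominates in this sketch
```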

> > In the npuExe.run function, the QNN graph is built and then executed, and the graph-building stage takes most of the time. The loading and constructing time,...

It seems that the problem occurs when constructing QNN computing graphs. The program might crash due to a lack of memory. It's hard to tell what the reason is from your...

We have preliminary support for CPU-NPU computation; the modeling code is in examples/main_qwen_npu.hpp. We currently assign different nets to specific backends manually, in a direct way. The automatic backend assignment...
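The "direct" manual assignment described above can be pictured as a simple lookup from sub-net name to backend. This is a hypothetical sketch (the names and the default are assumptions, not mllm's real configuration):

```python
# Hypothetical sketch of manual backend assignment: each sub-net is
# mapped to a backend by hand instead of being chosen automatically.

ASSIGNMENT = {
    "embedding": "CPU",
    "attention_blocks": "NPU",  # quantized, shape-specialized graphs
    "lm_head": "CPU",           # final projection/sampling kept on CPU
}

def backend_for(net_name):
    # Anything not explicitly assigned falls back to CPU.
    return ASSIGNMENT.get(net_name, "CPU")

print(backend_for("attention_blocks"))  # NPU
print(backend_for("some_other_net"))    # CPU
```

An automatic assignment pass would replace this table with a policy (e.g. based on op support and quantization), which is what the comment says is still future work.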

The model files are [here](https://huggingface.co/mllmTeam/qwen-1.5-1.8b-chat-mllm/tree/main). The link in the README needs to be fixed.

In the log, it seems that QNN tried to create q::SlicePad_shape_inplace for chunk_size=1 but failed. I tried running with `chunk_size=1` on QNN 2.31. It can still output...

How do you handle the QNN graph build-execute-free cycle during inference? As we are also integrating QNN into our framework, graph building is time-consuming and the memory increases...