oreomaker

Results 35 comments of oreomaker

The same problem has bothered me for some time. I just found that when you change SystemUsesLightTheme in the registry directly to false, the taskbar stays the same while...

The prefilling stage takes the prompt as input, initializes the KV Cache, and generates the first token, which is compute-bound. The decoding stage, on the other hand, takes each token...
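To make the prefill/decode split concrete, here is a minimal sketch (hypothetical names and stand-in values, not mllm's actual API): prefill processes the whole prompt in one pass and fills the KV cache, which is why it is compute-bound; each decode step then feeds a single token and appends one entry to the growing cache.

```python
# Hypothetical sketch of the two inference stages described above.

def prefill(prompt_tokens, kv_cache):
    # Compute-bound: attention/matmuls run over the full prompt length at once.
    for tok in prompt_tokens:
        kv_cache.append(("kv", tok))  # stand-in for real key/value tensors
    return len(prompt_tokens)         # stand-in for the sampled first token

def decode_step(token, kv_cache):
    # Sequence length is 1, but the whole cache is read each step,
    # which makes decoding memory-bound rather than compute-bound.
    kv_cache.append(("kv", token))
    return token + 1                  # stand-in for the next sampled token

kv_cache = []
tok = prefill([1, 2, 3], kv_cache)    # one pass over the prompt
for _ in range(4):                    # one token per decode step
    tok = decode_step(tok, kv_cache)
print(len(kv_cache))                  # 3 prompt entries + 4 decode entries = 7
```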

As I mentioned in the [reply](https://github.com/UbiquitousLearning/mllm/issues/118#issuecomment-2291412695), the decoding graph takes input with a sequence length of 1, so another QNN graph has to be built, which is costly. As for the NPU,...
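Since a compiled NPU graph is specialized to a fixed input shape, the prefill graph (sequence length = prompt length) cannot serve decoding (sequence length = 1). A hypothetical sketch of keeping one graph per sequence length, so each expensive build happens only once:

```python
# Hypothetical sketch: shape-specialized graphs cached by sequence length.

class GraphCache:
    def __init__(self):
        self._graphs = {}
        self.builds = 0   # count the expensive build/finalize calls

    def get(self, seq_len):
        if seq_len not in self._graphs:
            self.builds += 1  # stand-in for the costly graph compilation
            self._graphs[seq_len] = f"graph(seq_len={seq_len})"
        return self._graphs[seq_len]

cache = GraphCache()
cache.get(64)            # prefill graph: built once for the prompt length
for _ in range(10):
    cache.get(1)         # decode graph: built once, then reused every step
print(cache.builds)      # 2: one build per distinct sequence length
```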

In the npuExe.run function, the QNN graph is built and then executed, and the graph-building stage takes most of the time. The loading and constructing time, as well...
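A small hypothetical sketch of how one might confirm that claim: wrap the build and execute phases in separate timers and compare. The `build_graph`/`execute_graph` bodies are stand-ins (sleeps), not real QNN calls.

```python
# Hypothetical timing sketch: build vs. execute, with sleeps as stand-ins.
import time

def timed(fn):
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

def build_graph():
    time.sleep(0.05)    # stand-in for the expensive compose/finalize step
    return "graph"

def execute_graph(graph):
    time.sleep(0.005)   # stand-in for a single inference run
    return "output"

graph, build_s = timed(build_graph)
_, exec_s = timed(lambda: execute_graph(graph))
print(build_s > exec_s)  # building dominates in this sketch
```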

> > In the npuExe.run function, the QNN graph is built and then executed, and the graph-building stage takes most of the time. The loading and constructing time,...

It seems that the problem occurs when constructing QNN computing graphs. The program might crash due to a lack of memory. It's hard to tell what the reason is from your...

We have preliminary support for CPU-NPU computation; the modeling code is in examples/main_qwen_npu.hpp. We currently assign different nets to specific backends manually, in a direct way. The automatic backend assignment...
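The "direct" manual assignment described above can be pictured as a simple lookup from sub-net name to backend. This is a hypothetical sketch (the names and the default are assumptions, not mllm's real configuration):

```python
# Hypothetical sketch of manual backend assignment: each sub-net is
# mapped to a backend by hand instead of being chosen automatically.

ASSIGNMENT = {
    "embedding": "CPU",
    "attention_blocks": "NPU",  # quantized, shape-specialized graphs
    "lm_head": "CPU",           # final projection/sampling kept on CPU
}

def backend_for(net_name):
    # Anything not explicitly assigned falls back to CPU.
    return ASSIGNMENT.get(net_name, "CPU")

print(backend_for("attention_blocks"))  # NPU
print(backend_for("some_other_net"))    # CPU
```

An automatic assignment pass would replace this table with a policy (e.g. based on op support and quantization), which is what the comment says is still future work.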

The model files are [here](https://huggingface.co/mllmTeam/qwen-1.5-1.8b-chat-mllm/tree/main). The link in the README needs to be fixed.

In the log, it seems that QNN tried to create q::SlicePad_shape_inplace for chunk_size=1 but failed. I tried running with `chunk_size=1` on QNN 2.31. It can still output...

How do you handle the QNN graph build-execute-free cycle during inference? As we are also integrating QNN into our framework, graph building is time-consuming and the memory increases...