shifeiwen

16 comments of shifeiwen

@quic-mangal In CNNs we usually quantize convolution kernels at per-channel or per-layer granularity, but the main operation in an LLM is matrix multiplication. When performing matrix multiplication, we can...
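For reference, a minimal numpy sketch of the two granularities being contrasted; the shapes and names are illustrative, not tied to any particular framework:

```python
import numpy as np

# Symmetric int8 quantization of a conv kernel at two granularities.
# Shape (out_ch, in_ch, kH, kW) is illustrative.
w = np.random.randn(16, 3, 3, 3).astype(np.float32)

# Per-layer (per-tensor): a single scale for the whole kernel.
s_layer = np.abs(w).max() / 127.0
q_layer = np.round(w / s_layer).astype(np.int8)

# Per-channel: one scale per output channel, so each filter is
# quantized against its own dynamic range.
s_chan = np.abs(w).reshape(16, -1).max(axis=1) / 127.0  # shape (16,)
q_chan = np.round(w / s_chan[:, None, None, None]).astype(np.int8)

# Per-channel usually reconstructs more accurately.
print("per-layer err  :", np.abs(q_layer * s_layer - w).mean())
print("per-channel err:", np.abs(q_chan * s_chan[:, None, None, None] - w).mean())
```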

Are there any new updates on this discussion?

Update: ![image](https://github.com/mlc-ai/mlc-llm/assets/147359299/e4fc42ee-3fa3-4c83-ab9c-2084a8817b2e) I found that there are some small operations in the middle of each operator, and these operations take a lot of time. I don't know if these...
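As a host-side analogy (plain numpy, not MLC code) for why many tiny operations around the main kernels can dominate wall-clock time: fixed per-call overhead swamps the actual work when each op is small.

```python
import time
import numpy as np

# The same data processed as 10,000 small dispatches vs. one batched dispatch.
small = [np.random.randn(32).astype(np.float32) for _ in range(10_000)]
big = np.stack(small)

t0 = time.perf_counter()
out_small = [np.exp(a) for a in small]   # many tiny ops
t_small = time.perf_counter() - t0

t0 = time.perf_counter()
out_big = np.exp(big)                    # one op over the same data
t_big = time.perf_counter() - t0

print(f"10k tiny ops: {t_small*1e3:.1f} ms, one batched op: {t_big*1e3:.1f} ms")
```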

@FdyCN The problem seems to be that the HTP backend has many limitations, including how much memory can be requested and how fast that memory is. However, Qualcomm has promoted in some videos...

I have tried implementing a 1.1B LLaMA on the Hexagon backend before, and it was very slow because I did not use CPU scheduling and only added HVX compilation instructions when...
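For context, a hypothetical TVM sketch of what CPU-style scheduling plus HVX vectorization could look like; the target string and build flow here are assumptions, and a real run additionally needs the Hexagon SDK toolchain and TVM's Hexagon RPC setup:

```python
import tvm
from tvm import te

# Toy elementwise kernel scheduled for Hexagon.
n = 1024
A = te.placeholder((n,), dtype="float32", name="A")
B = te.compute((n,), lambda i: A[i] * A[i], name="B")

sch = tvm.tir.Schedule(te.create_prim_func([A, B]))
(i,) = sch.get_loops(sch.get_block("B"))
io, ii = sch.split(i, factors=[None, 32])
sch.vectorize(ii)  # inner lanes can map to HVX vector instructions
sch.parallel(io)   # the CPU-side thread scheduling the comment says it lacked

target = tvm.target.hexagon("v68")  # assumed Hexagon arch version
mod = tvm.build(sch.mod, target=tvm.target.Target(target, host=target))
```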

@FdyCN Yes, there are currently some ways to get MLC running on the Hexagon backend, but in my tests it was very slow: each token of a 1.1B LLaMA takes more than 60 s (there...

+1. Some NPUs cannot bind threads the way CPUs can, so PagedKVCache cannot be used. Is there any recent progress?
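To make "binding threads" concrete, a minimal Linux-only Python sketch of pinning the current process to a core; CPU runtimes rely on this kind of affinity control, and the point above is that many NPU runtimes expose no equivalent:

```python
import os

# Pin the current process to core 0, then read back the affinity mask.
# CPU thread pools (e.g. TVM's) use this kind of control to keep one
# worker per core.
os.sched_setaffinity(0, {0})
print(os.sched_getaffinity(0))  # -> {0}
```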

Same error. Is there any progress on this issue so far? @0x1997

@ChenMnZ Do you have any progress or tips on this, so that I can successfully load and run the quantized weights in MLC?

Hi @MasterJH5574, I followed the instructions in gemv and set the loop-unrolling value to 8. Running the OpenCL kernel on the 8 Gen 2 does not cause errors. There is no significant...
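Not the actual dlight gemv schedule, but a toy TVM sketch of where a loop-unroll factor of 8 enters a TIR schedule before OpenCL codegen (assumes an OpenCL-enabled TVM build):

```python
import tvm
from tvm.script import tir as T

# A naive 128x128 gemv: Y = A @ X.
@T.prim_func
def gemv(a: T.handle, x: T.handle, y: T.handle):
    A = T.match_buffer(a, (128, 128), "float32")
    X = T.match_buffer(x, (128,), "float32")
    Y = T.match_buffer(y, (128,), "float32")
    for i, k in T.grid(128, 128):
        with T.block("gemv"):
            vi, vk = T.axis.remap("SR", [i, k])
            with T.init():
                Y[vi] = T.float32(0)
            Y[vi] = Y[vi] + A[vi, vk] * X[vk]

sch = tvm.tir.Schedule(gemv)
i, k = sch.get_loops(sch.get_block("gemv"))
ko, ki = sch.split(k, factors=[None, 8])
sch.unroll(ki)                    # the unroll value set to 8
sch.bind(i, "threadIdx.x")        # thread binding required for OpenCL codegen
mod = tvm.build(sch.mod, target="opencl")
```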