Holden X

Results: 16 issues by Holden X

Even though some neurons in the FFN have been split onto the GPU, their weights are still retained on the CPU in current implementations, which is unnecessary and occupies an...

enhancement

1. For FFN networks, an unnecessary synchronization is introduced between the CPU-GPU hybrid computation of FC1 and FC2.
2. In the open-sourced code, the selective synchronization is not...

enhancement

As of now, PowerInfer uses CUDA cores for sparse operator computation, which is not efficient for prompt-phase computation. In order to further support multi-batch services, PowerInfer plans to...

enhancement

Related issues/proposals:
- [ ] #95
- [ ] #96
- [ ] #97

tracker

PowerInfer currently optimizes for LLMs (Large Language Models) that utilize the ReLU activation function, leveraging their internal activation locality. However, many of the trending models do not use ReLU activation,...

tracker

As we embark on the initial phase of PowerInfer's development, our primary goal is to introduce the hybrid inference feature across all major desktop hardware and software platforms. Our current...

tracker

To fully harness the power of Macs, especially those with M-series chips, integrating a Metal backend is key. The core task ahead is adapting our key sparse operators, including `mul_mat_sparse` and `axpy`,...

tracker

PowerInfer encountered unexpected errors in WSL, mostly due to CUDA APIs. Related issues: #42, #46, #63. Since our WSL test bed has been set up, we can try to reproduce...

tracker

After releasing online FFN offloading, we have found new issues in:
- [x] Decoding bug: #77.
- [x] Python module issue: #55, #78.
- [ ] Inaccuracy when offloading under...

tracker

Building PowerInfer on Windows in CPU inference and hybrid inference modes requires modifying a series of POSIX API calls so that they are supported by MSVC, including:
* Atomic operations...

tracker