Wang, Zhe
> Hi @zhewang1-intc, that's great! The first option seem more reasonable. There could be an extras_require like `auto_gptq[itrex]`. Maybe your lowbit GEMM kernel could be default on CPU (instead of...
Hi @Qubitium , We greatly appreciate your interest in QBits. For a comprehensive introduction to QBits, please refer to the [RFC](https://github.com/AutoGPTQ/AutoGPTQ/issues/597). It's worth noting that QBits is still under active...
> I think the current Qbits can replace all parts except 2 and 3 bits of Qigen. Qigen code:https://github.com/IST-DASLab/QIGen/tree/master Hi, ITREX will release its next version in late May, which supports...
Hi, I guess this issue may be caused by the shape of the compressed weight not matching the raw weight shape. e.g. model.layers.0.self_attn.q_proj.weight may need 5120\*5120\*sizeof(float) bytes of data, but after WOQ compression, we...
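To make the mismatch concrete, here is a rough sketch of the arithmetic (the 4-bit packing scheme and group size of 128 are illustrative assumptions, not the exact ITREX/QBits layout):

```python
# Size mismatch between a raw fp32 weight and its weight-only-quantized form.
# Shape taken from the example above: model.layers.0.self_attn.q_proj.weight.
rows, cols = 5120, 5120

fp32_bytes = rows * cols * 4  # sizeof(float) == 4

# Hypothetical INT4 packing: two 4-bit values per byte, plus one fp32
# scale per group of 128 weights (group size is an assumption here).
group_size = 128
packed_bytes = rows * cols // 2
scale_bytes = (rows * cols // group_size) * 4
woq_bytes = packed_bytes + scale_bytes

print(fp32_bytes)  # raw weight:       104857600 bytes (~100 MiB)
print(woq_bytes)   # compressed weight: 13926400 bytes (~13 MiB)
```

So a loader that expects the raw 100 MiB tensor will see a buffer roughly 7x smaller after compression, which is consistent with a shape/size mismatch crash.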
Hi, you can use gdb to find the crash position in this repo; please refer to this issue: https://github.com/intel/intel-extension-for-transformers/issues/944. Once you get the crash position, e.g. intel-extension-for-transformers/intel_extension_for_transformers/llm/library/jblas/jblas/jit_blas_wrapper.h:152, you can...
Echoing this, I think this is a valuable feature.
> Hi @PenghuiCheng and @zhewang1-intc. This is incredibly exciting work. I will attempt to find time soon to properly review this PR and to see how well it works on...
@casper-hansen Hi, we are not sure if we have done everything appropriately, and we look forward to your review. Please let us know if there's anything we can do to improve it...
Note: we ran this benchmark on an INTEL(R) XEON(R) PLATINUM 8592+ with 8-channel 4800 MT/s memory.
> Benchmarks are looking good for CPU! Thanks for providing them. Can you address the comments I have left? But I don’t see any new comments 🤔