Wang, Zhe
> Hi @zhewang1-intc, that's great! The first option seem more reasonable. There could be an extras_require like `auto_gptq[itrex]`. Maybe your lowbit GEMM kernel could be default on CPU (instead of...
Hi @Qubitium , We greatly appreciate your interest in QBits. For a comprehensive introduction to QBits, please refer to the [RFC](https://github.com/AutoGPTQ/AutoGPTQ/issues/597). It's worth noting that QBits is still under active...
> I think the current Qbits can replace all parts except 2 and 3 bits of Qigen. Qigen code:https://github.com/IST-DASLab/QIGen/tree/master Hi, ITREX will release its next version in late May, which supports...
Hi, I guess this issue may be caused by the shape of the compressed weight not matching the raw weight shape. e.g. model.layers.0.self_attn.q_proj.weight may need 5120\*5120\*sizeof(float) bytes of data, but after WOQ compression, we...
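To make the mismatch concrete, here is a rough sketch of the arithmetic (the 4-bit packing scheme and group size of 128 are illustrative assumptions, not the exact ITREX/QBits layout):

```python
# Size mismatch between a raw fp32 weight and its weight-only-quantized form.
# Shape taken from the example above: model.layers.0.self_attn.q_proj.weight.
rows, cols = 5120, 5120

fp32_bytes = rows * cols * 4  # sizeof(float) == 4

# Hypothetical INT4 packing: two 4-bit values per byte, plus one fp32
# scale per group of 128 weights (group size is an assumption here).
group_size = 128
packed_bytes = rows * cols // 2
scale_bytes = (rows * cols // group_size) * 4
woq_bytes = packed_bytes + scale_bytes

print(fp32_bytes)  # raw weight:       104857600 bytes (~100 MiB)
print(woq_bytes)   # compressed weight: 13926400 bytes (~13 MiB)
```

So a loader that expects the raw 100 MiB tensor will see a buffer roughly 7x smaller after compression, which is consistent with a shape/size mismatch crash.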
Hi, you can use gdb to find the crash position in this repo; please refer to this issue: https://github.com/intel/intel-extension-for-transformers/issues/944. Once you get the crash position, e.g. intel-extension-for-transformers/intel_extension_for_transformers/llm/library/jblas/jblas/jit_blas_wrapper.h:152, you can...
Echoing this, I think this is a valuable feature.
> Hi @PenghuiCheng and @zhewang1-intc. This is incredibly exciting work. I will attempt to find time soon to properly review this PR and to see how well it works on...
@casper-hansen Hi, we are not sure if we have done everything appropriately, and we look forward to your review. Please let us know if there's anything we can do to improve it...
Note: we ran this benchmark on an INTEL(R) XEON(R) PLATINUM 8592+ with 8-channel 4800 MT/s memory.
> Benchmarks are looking good for CPU! Thanks for providing them. Can you address the comments I have left? But I don’t see any new comments 🤔