Heng Guo
retest
[retest](https://inteltf-jenk.sh.intel.com/job/intel-lpot-validation-top-mr-extension/3950/)
skx-8180, batch_size 100:

| model | ori | qat |
|---|---|---|
| resnet50 | 0.63272 | 0.61474 |
| resnet101 | 0.63438 | 0.54072 |
[retest](https://inteltf-jenk.sh.intel.com/job/intel-lpot-validation-top-mr-extension/4007/artifact/report.html)
pre-commit.ci autofix
> Could we add a document introducing what H2O is? Add it in the example README.
> Hi [@wenhuach21](https://github.com/wenhuach21) [@n1ck-guo](https://github.com/n1ck-guo), does export for q4_k work right now? I tried to adapt [that](https://github.com/intel/auto-round/blob/9a6f325adf3724537271891ae9240f18e5612382/auto_round/export/export_to_gguf/convert.py#L1145-L1188) for torchao, and tried to serve with vllm `vllm serve ./phi4-mini-torchao-ar-gguf-q4_k-3.8B-Q4_K_S.gguf --tokenizer microsoft/Phi-4-mini-instruct --device...
We have tested the q4_k_s export code. It works well for several other models, but for microsoft/Phi-4-mini-instruct the export fails and raises an error. This is because our...
@jerryzh168 Thank you for your patience. This issue seems to be caused by the llama.cpp version. Could you please try with this PR https://github.com/intel/auto-round/pull/524 and the latest gguf-py...
https://github.com/intel/auto-round/pull/1043 https://github.com/intel/auto-round/pull/1031