Yufeng Li issues

Results: 5 issues of Yufeng Li

What is variable-length and comparison with onnxruntime.

Could you explain a bit more about the variable-length support? Does it mean the runtime can support inputs with different sequence lengths in a single session, like [batch, 8],...

documentation

do not quantize Relu/Clip if their inputs are not quantized

Fix bug: #12556.

turn on neural_speed by default

### Description
The crash caused by neural_speed turns out to be a rare corner case. Turn it on by default.

### Motivation and Context

heap-buffer-overflow while packing weight

N * blks can be odd. There is no need to iterate by 2, and when the count is odd the access to scales[i + 1] / 16 on the last iteration causes a heap-buffer-overflow: https://github.com/intel/neural-speed/blob/bc5ee16f73d941afe80914bdf9c9c9523c39c576/bestla/bestla/bestla_prologue_b.h#L460

bug
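
To make the failure mode in the report above concrete, here is a minimal sketch in Python rather than the library's C++. It only models which indices a loop would read; the function names are hypothetical and nothing here is taken from the bestla source. It assumes the loop advances by 2 and reads both `i` and `i + 1` on each pass, as the report describes.

```python
def indices_read_pairwise(count):
    """Indices read by a step-2 loop that touches i and i + 1 on each pass.

    Hypothetical model of the pattern described in the report, not the real code.
    """
    touched = []
    for i in range(0, count, 2):
        # When count is odd, the last pass has i == count - 1,
        # so i + 1 == count, which is one element past the end.
        touched.extend([i, i + 1])
    return touched


def indices_read_single(count):
    """Indices read by a step-1 loop: always within [0, count)."""
    return list(range(count))


def out_of_bounds(touched, count):
    """Return the indices that fall outside a buffer of `count` elements."""
    return [i for i in touched if i >= count]
```

With an even count the pairwise loop stays in bounds; with an odd count it reads exactly one index past the end, which is what a sanitizer would flag in native code. Iterating one element at a time avoids the problem for any count.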

support of fake backend

The original AWQ (https://github.com/mit-han-lab/llm-awq) supports a fake backend. Could we add support for it? It is very useful to save the model as fp16/fp32 and convert it to other...
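
The request does not spell out what the fake backend does. A common reading, assumed here, is "fake quantization": weights are rounded to the low-bit grid and immediately dequantized, so the saved model stays in floating point but carries the quantization error. The sketch below illustrates that idea in plain Python for a single group of weights; the function name, the asymmetric min/max scheme, and the 4-bit default are illustrative assumptions, not the AWQ implementation.

```python
def fake_quantize(weights, n_bits=4):
    """Quantize one group of floats to n_bits, then dequantize back to floats.

    Hypothetical helper: asymmetric min/max quantization with a single
    scale and zero offset for the whole group.
    """
    w_min, w_max = min(weights), max(weights)
    q_max = 2 ** n_bits - 1
    scale = (w_max - w_min) / q_max
    if scale == 0.0:
        # All weights are identical; nothing to quantize.
        return list(weights)
    result = []
    for w in weights:
        q = round((w - w_min) / scale)   # integer code for this weight
        q = max(0, min(q_max, q))        # clamp to the representable range
        result.append(q * scale + w_min) # map back to a float value
    return result
```

The output has the same length and type as the input, takes at most `2 ** n_bits` distinct values, and each value differs from the original by at most half a quantization step, so it can be stored as fp16/fp32 and handed to other tools.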