Mengzhao Chen
I solved the problem with the following code:
```python
temp = new_mask.flatten()
i = 0
for index, m in enumerate(idx):
    if not temp[m]:
        i += 1
        if i == total_regrowth:
            break
...
```
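For context, here is a self-contained sketch of what the fix above appears to do, assuming `idx` holds flattened weight indices sorted by growth score and `total_regrowth` is the number of connections to regrow (the names come from the truncated snippet; the mask-setting step is my guess at the elided part):

```python
import torch

def regrow(new_mask: torch.Tensor, idx: torch.Tensor, total_regrowth: int) -> torch.Tensor:
    """Enable up to `total_regrowth` currently-pruned positions, visited in `idx` order."""
    temp = new_mask.flatten()
    grown = 0
    for m in idx.tolist():
        if not temp[m]:        # position is currently pruned
            temp[m] = True     # regrow it (assumed: this is the elided step)
            grown += 1
            if grown == total_regrowth:
                break
    return temp.view_as(new_mask)
```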
I also encountered this problem.
I also encountered this problem. Did you ever solve it?
Sorry for the omission. I have added `get_post_process` to `deit.datasets`.
I am also curious about this.
Hi, sorry for the late response; I have been very busy recently. To integrate OmniQuant into MLC, you just need to save the fake quantization model. Then, you can quantize the fake...
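A hedged sketch of that flow, assuming the fake-quantized weights have been exported as a standard HuggingFace checkpoint and using MLC-LLM's build entry point as it existed at the time (the model path is a placeholder, and the exact flags and quantization code, e.g. `q4f16_1`, vary across MLC-LLM versions, so check your installed version's help output):

```bash
# 1. Export the fake-quantized model as a regular HF checkpoint.
# 2. Run it through MLC-LLM's quantization/compilation flow.
python -m mlc_llm.build \
    --model ./omniquant-fake-quant-llama \
    --quantization q4f16_1 \
    --target cuda
```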
@hsb1995 LLaMA-3-8B uses GQA (Grouped-Query Attention), which is not supported by the current `let` (learnable equivalent transformation).
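To illustrate the constraint (a sketch of the general idea, not OmniQuant code): the equivalent transformation relies on a per-channel scale canceling between Q and K, and with GQA a single shared K/V head would have to cancel the scales of several different Q heads at once.

```python
import torch

d = 8                      # head dimension
q = torch.randn(5, d)      # query vectors for one head
k = torch.randn(5, d)      # key vectors for the matching head
s = torch.rand(d) + 0.5    # learnable per-channel scale

# With one K head per Q head (MHA), scaling Q by s and K by 1/s leaves
# the attention logits unchanged, so the scale can be learned freely:
assert torch.allclose(q @ k.T, (q * s) @ (k / s).T, atol=1e-5)

# With GQA, several Q heads share one K head; each Q head would need its
# own scale, but only one 1/s can be applied to the shared K head, so the
# transformation no longer cancels for all of them.
```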
During training, we only save the `let` and `lwc` parameters. To save the full quantized model, set the `--save_dir` argument.
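For example (a sketch assuming OmniQuant's `main.py` flags; the model path and the resumed parameter file are placeholders, so verify the flag list with `python main.py --help`), reloading previously trained `let`/`lwc` parameters and writing out the quantized model:

```bash
CUDA_VISIBLE_DEVICES=0 python main.py \
    --model /PATH/TO/Llama-2-7b \
    --wbits 4 --abits 4 --lwc --let \
    --epochs 0 \
    --resume ./omni_parameters.pth \
    --save_dir ./llama-2-7b-w4a4-quantized
```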
@linloong @FelixMessi @NewDriverLee Sorry for the late response. The Llama-2-13B W4A4 checkpoint was corrupted due to some training instability. We have retrained Llama-2-13B with the latest code and updated the...
AutoGPTQ-bugfix is fine to use. Sorry for the previous confusion; the official AutoGPTQ repo had merged the "zeros +- 1" fix before. However, the fix was later reverted due to some incompatibility, please refer...
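For background (a minimal sketch of the convention at issue, assuming the historical GPTQ-style packing where stored zero-points are offset by one; the function names are illustrative, not AutoGPTQ's actual API):

```python
import numpy as np

def pack_zero(zero: np.ndarray) -> np.ndarray:
    # Historical GPTQ packing stores (zero - 1) in qzeros.
    return zero - 1

def dequant(qweight, scale, qzeros, offset_by_one=True):
    # The kernel must add the 1 back; if the packer and the kernel disagree
    # on the convention ("zeros +- 1"), every dequantized weight is shifted
    # by one scale step and the model output is corrupted.
    zero = qzeros + 1 if offset_by_one else qzeros
    return scale * (qweight.astype(np.float32) - zero)
```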