Wenhua Cheng
> Found intel's results of quantization test of lm-head. There was minimal accuracy loss:
>
> @wenhuach21 Do you know how much ram/vram the intel llama3-8B lm_head quantized test saved...
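On the memory side, a rough back-of-envelope (assuming the standard Llama-3-8B lm_head shape of a 128256-token vocabulary × 4096 hidden size, an fp16 baseline, and ignoring the small overhead of scales/zero-points):

```python
# Rough estimate of lm_head memory for Llama-3-8B (assumed shapes).
vocab_size, hidden_size = 128256, 4096
params = vocab_size * hidden_size                 # ~525M weights in lm_head

fp16_bytes = params * 2                           # 16-bit baseline
int4_bytes = params * 0.5                         # 4-bit weights (scales/zeros ignored)

print(f"fp16 lm_head : {fp16_bytes / 1024**3:.2f} GiB")                 # ~0.98 GiB
print(f"int4 lm_head : {int4_bytes / 1024**3:.2f} GiB")                 # ~0.24 GiB
print(f"saved        : {(fp16_bytes - int4_bytes) / 1024**3:.2f} GiB")  # ~0.73 GiB
```

The exact figure depends on the group size and the kernel's packing format, but the saving is on the order of 0.7 GiB for this single layer.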
In my opinion, using perplexity (ppl) as a metric might not be optimal due to its sensitivity to outliers, and a high perplexity score doesn't necessarily indicate poor model performance....
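To illustrate the outlier sensitivity: perplexity is the exponential of the mean token NLL, so a handful of very badly predicted tokens can noticeably move the score even when most tokens are predicted well. A toy sketch with made-up numbers:

```python
import math

# Toy per-token negative log-likelihoods (nats); values are made up.
nlls = [1.8] * 990                                 # 990 "typical" tokens
ppl_clean = math.exp(sum(nlls) / len(nlls))
print(f"ppl without outliers: {ppl_clean:.2f}")    # ~6.05

nlls += [15.0] * 10                                # 10 outlier tokens with very high loss
ppl_outliers = math.exp(sum(nlls) / len(nlls))
print(f"ppl with 1% outliers: {ppl_outliers:.2f}") # ~6.90
```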
> I changed the calibration dataset to WikiText2, and the ppl is 6.62, whereas using c4 as the calibration dataset gives 17.5. Now the ppl basically meets expectations. I didn't realize before...
> > Using the same dataset for both calibration and evaluation could lead to overfitting, resulting in poor generalization when applied to other tasks. We have already observed this issue...
Since different algorithms may not share the same best hyperparameters, we have tried others. The main difference is that we only use the second-to-last layer for distillation and...
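Roughly, the distillation term on the second-to-last layer looks like the sketch below. This is only a minimal PyTorch illustration of the idea, not our actual code; the MSE objective and the function name are placeholders, and `teacher`/`student` are assumed to be HF-style models that accept `output_hidden_states=True`.

```python
import torch
import torch.nn.functional as F

def hidden_state_distill_loss(teacher, student, input_ids):
    """Match the student's second-to-last hidden state to the teacher's."""
    with torch.no_grad():
        t_out = teacher(input_ids, output_hidden_states=True)
    s_out = student(input_ids, output_hidden_states=True)

    # hidden_states[-1] is the final layer; [-2] is the second-to-last layer.
    t_hidden = t_out.hidden_states[-2]
    s_hidden = s_out.hidden_states[-2]
    return F.mse_loss(s_hidden, t_hidden)
```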
When evaluating my local llama3 quantized checkpoint, I found several issues.

1. When using AutoGPTQForCausalLM to load the model without a quantized LM-head, some weights remain unloaded, unlike when using AutoModelForCausalLM...
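For reference, this is roughly how I am comparing the two loading paths. The checkpoint path is a placeholder, and the `.model` attribute on the auto-gptq wrapper reflects my understanding of its API; treat this as a sketch rather than a confirmed reproduction recipe.

```python
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoModelForCausalLM

ckpt = "/path/to/llama3-8b-4bit"  # placeholder path

# Load through auto-gptq; watch the log for missing/unloaded weight warnings.
gptq_model = AutoGPTQForCausalLM.from_quantized(ckpt, device="cuda:0")

# Load the same checkpoint through plain transformers for comparison.
hf_model = AutoModelForCausalLM.from_pretrained(ckpt, device_map="cuda:0")

# Compare which parameters ended up in each model.
gptq_keys = set(gptq_model.model.state_dict().keys())
hf_keys = set(hf_model.state_dict().keys())
print("only in transformers load:", sorted(hf_keys - gptq_keys)[:10])
print("only in auto-gptq load   :", sorted(gptq_keys - hf_keys)[:10])
```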
Hello, may I know when this PR will be merged? We are preparing to release the new version of AutoRound, and we would like autogptq to support lm-head quantization.
> @wenhuach21 I don't have the authority to merge this. Waiting for @fxmarty or @PanQiWei
>
> @fxmarty This is a 3-line PR (excluding comments/tests) which adds `lm_head` config...
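For context, if I read that PR correctly, enabling it would look something like the sketch below. The `lm_head` field on `BaseQuantizeConfig` is taken from the PR description, not something I have run, and may differ in whatever finally gets merged.

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

# Assumed: the PR adds an `lm_head` flag to the quantization config.
quantize_config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    lm_head=True,   # also quantize the lm_head layer (the flag this PR adds)
)

model = AutoGPTQForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # example base model
    quantize_config,
)
```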
As discussed, bias shift will be a separate module, so this PR will not be merged.
I don't have any questions about the PR, but would you mind if I asked several questions about the algorithm? I've only had a quick look at several papers in...