wchen61 issues

Results: 4 issues of wchen61

[BERT/PyTorch] Unable to reproduce BERT benchmark on A100

Hi, I have noticed that on an A100 80G, BERT Phase 1 and Phase 2 reach throughputs of 853 and 289 sequences/sec, respectively. ![image](https://user-images.githubusercontent.com/24752948/175768130-19f74769-30e6-441c-a87b-b4a01dddcd9d.png) I want to reproduce this result...

bug

[PyTorch/DLRM] __restrict issue

Is this code a typo? Should it be `half *__restrict grad`? https://github.com/NVIDIA/DeepLearningExamples/blob/b1fc3c46f508201405c76e4811c036b05e9773d7/PyTorch/Recommendation/DLRM/dlrm/cuda_src/dot_based_interact/dot_based_interact_tf32_bwd.cu#L32

[BUG] Unable to quantize Falcon-7b

**Describe the bug** Unable to quantize the Falcon-7b model (https://huggingface.co/tiiuae/falcon-7b); quantization throws an assertion error at auto_gptq/nn_modules/qlinear/qlinear_exllama.py, line 69: `assert infeatures % self.group_size == 0`. Versions: auto_gptq v0.7.1, transformers 4.40.0.

bug
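The assertion quoted in the Falcon-7b report requires a layer's input width to be a multiple of the quantization group size. A minimal Python sketch of that divisibility check follows. The helper name `compatible_group_sizes`, the candidate list, and the width 4544 are assumptions for illustration (4544 is assumed here to be the model's hidden size; the issue does not state it), not part of auto_gptq.

```python
def compatible_group_sizes(in_features, candidates=(32, 64, 128, 256)):
    """Return the candidate group sizes that divide in_features evenly.

    Mirrors the condition `infeatures % group_size == 0` from the
    assertion quoted in the issue. Hypothetical helper, not an
    auto_gptq API.
    """
    return [g for g in candidates if in_features % g == 0]


# 4544 is an assumed layer width used only for illustration.
width = 4544
print(compatible_group_sizes(width))  # [32, 64]

# A group size of 128 leaves a remainder, so the assertion would fail.
print(width % 128)  # 64
```

Under these assumptions, a group size of 128 does not divide the width, while 32 or 64 would; whether a smaller group size is acceptable for a given model is a separate question the issue does not address.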

Why is the evict_first hint not needed in the vLLM Marlin kernel?

Hi, I want to know why the Marlin kernel in vLLM does not need the evict_first hint when copying B from global to shared memory. https://github.com/vllm-project/vllm/blob/7abba39ee64c1e2c84f48d7c38b2cd1c24bb0ebb/csrc/quantization/gptq_marlin/marlin.cuh#L71 As this optimization is specifically...

stale