Soumendu Kumar Ghosh

Results 3 issues of Soumendu Kumar Ghosh

**Describe the bug** I am working on quantizing a few timm models using Torch FX Graph Mode Quantization. Specifically, I am looking into post-training static quantization. For static models...

bug

Testing the impact of KV cache quantization on the performance of the llama2 model shows a decrease in tokens/sec as the number of cache bits is reduced. However, the reduction in cache memory is...
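For context on the memory side of this trade-off, here is a minimal sketch of how KV cache size scales with bit width. The helper `kv_cache_bytes` is hypothetical, and the shape values (32 layers, 32 KV heads, head dimension 128, 4096 tokens) are assumptions chosen to resemble a 7B-class llama2 configuration; they are not taken from the issue. The estimate ignores the storage for quantization scales and zero-points, so real savings would be somewhat smaller.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch_size, bits):
    """Approximate KV cache size in bytes for a given element bit width.

    Ignores per-group scales and zero-points, which add overhead in practice.
    """
    # Each layer stores two tensors (keys and values) of identical shape.
    n_elements = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size
    return n_elements * bits // 8


# Assumed, illustrative shape (not from the issue).
for bits in (16, 8, 4):
    size = kv_cache_bytes(
        n_layers=32, n_kv_heads=32, head_dim=128,
        seq_len=4096, batch_size=1, bits=bits,
    )
    print(f"{bits:>2}-bit cache: {size / 2**30:.2f} GiB")
```

Under these assumptions the cache size is linear in the bit width, so halving the bits halves the memory. This says nothing about throughput, which depends on the cost of the quantize and dequantize steps.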

The following two package versions, as listed in requirements.txt, cannot be found when running `pip install`:

```
torch==2.5.0.dev20240723+cu121
pytorch-triton==3.0.0+dedb7bdf33
```