jpyo0803 issues

Results: 3 issues
How to quantize llama3?

Hi, I am wondering how to quantize Llama3-8B with SmoothQuant. Which dataset did you use to generate the activation scales? Or do you plan to upload the act_scales and model weights (to Hugging Face)...

Is matrix multiplication with 32-bit integer inputs/outputs possible with Triton?

I am wondering whether matrix multiplication with 32-bit integer inputs and outputs is possible with Triton.

What transformers version is "int_llama_layer.py" based on?

Hi, I am wondering which "transformers" version you referred to when writing "int_llama_layer.py". Thanks in advance!