Yoga Sri Varshan V

Results 6 comments of Yoga Sri Varshan V

I think you should use model_save_quantized_weights(model) to save the quantized weights.

By scaling factor I guess you mean alpha? I tried setting up quantized_bits() with integer=bits-1 and it worked with alpha=1, but when I set it to None, shouldn't it automatically take...

So any idea what the integer parameter really specifies? It is said to be the number of bits to the left of the decimal point. I get the scaling-factor point, but...

So I am not going to get a 7-bit number (the 8th bit being the sign) as a weight when I use quantized_bits(bits=8, integer=7), right? Instead I'll be getting a scaled-down version...
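For context, here is a minimal numpy sketch of the fixed-point grid that bits/integer describe, as I read the QKeras docs. The helper below is hypothetical and not QKeras code; it only mimics quantized_bits with alpha=1:

```python
import numpy as np

def fixed_point_quantize(x, bits=8, integer=7):
    """Quantize x onto a signed fixed-point grid with `bits` total bits
    and `integer` bits left of the binary point (sign bit excluded).
    Hypothetical sketch of quantized_bits(alpha=1) semantics."""
    step = 2.0 ** (integer - bits + 1)          # LSB / grid spacing
    lo, hi = -2.0 ** integer, 2.0 ** integer - step
    return np.clip(np.round(x / step) * step, lo, hi)

w = np.array([0.7, -1.3, 100.4])
# bits=8, integer=7 -> step 1.0: values land on whole integers in [-128, 127]
print(fixed_point_quantize(w, bits=8, integer=7))   # [  1.  -1. 100.]
# bits=8, integer=0 -> step 2**-7: a scaled-down grid inside [-1, 1)
print(fixed_point_quantize(w, bits=8, integer=0))
```

So with integer=7 the step is 1.0 and the weights come out as plain integers, while with a smaller integer the same 8-bit codes represent a scaled-down range.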

Thanks for the replies @jurevreca12! So we now have the quantized weights, which have to be scaled up to integer weights for INT8 or INT4 GPU inferencing. In NVIDIA...
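If I understand the scaling point correctly, the stored quantized weights are floats that are exact multiples of the grid step, so recovering the true INT8 codes is just a division. A sketch under that assumption (variable names are mine, not a QKeras or NVIDIA API):

```python
import numpy as np

bits, integer = 8, 0
step = 2.0 ** (integer - bits + 1)      # 2**-7, the grid spacing

# QKeras-style quantized weights: floats that are multiples of `step`
w_q = np.array([0.5, -0.25, 0.9921875])

# divide out the step (and any extra alpha scale) to get INT8 codes
w_int8 = np.round(w_q / step).astype(np.int8)
print(w_int8)                            # [ 64 -32 127]
```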

By the way @jurevreca12, is there any way to see what scale is being used for, let's say, some kernel quantizer or bias quantizer?
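I'm not sure about an official inspection API, but my understanding is that with alpha="auto_po2" the scale is a power of two fit to the weight range, so it can be estimated by hand. A hypothetical numpy sketch of that idea (my reading, not the QKeras implementation):

```python
import numpy as np

def auto_po2_scale(w, integer=0):
    """Rough sketch of a power-of-two scale in the spirit of
    alpha='auto_po2': pick 2**k so the base grid [-2**integer, 2**integer)
    stretched by the scale covers max|w|. Hypothetical helper."""
    max_abs = np.max(np.abs(w))
    k = np.ceil(np.log2(max_abs / 2.0 ** integer))   # smallest covering exponent
    return 2.0 ** k

w = np.array([0.1, -3.7, 2.2])
print(auto_po2_scale(w))   # 4.0: the [-1, 1) grid scaled by 4 covers |w| <= 3.7
```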