Fix `cnt` overflow issue
Dear AIMET team,
In the C++/CUDA quantization kernels, many functions store element counts in an `int cnt`. For some large models (e.g., LLaMA, Stable Diffusion), `cnt` can exceed the range of a 32-bit signed int. Since `cnt` is always non-negative, I changed its type to `uint64_t`. I did consider `uint32_t`, but it still carries a risk of overflow.
I hit this while quantizing a large ONNX model: the parameter count of one layer exceeded the 32-bit integer limit, and the calibration process entered an infinite loop.
PR: https://github.com/quic/aimet/pull/4003

Resolved by #4003 (commit https://github.com/quic/aimet/commit/70029c596cff1d188fcfbc308cc06f99bdff1fdf).