Fix `cnt` overflow issue
Dear AIMET team,
In the C++/CUDA quantization kernels, many functions store element counts in an `int cnt`. For some large models (e.g., LLaMA, Stable Diffusion), `cnt` can exceed the range of a 32-bit signed int. Since `cnt` is always non-negative, I changed its type to `uint64_t`. I did consider `uint32_t`, but it still carries a risk of overflow.
I hit this while quantizing a large ONNX model: the parameter count of one layer exceeded the 32-bit integer limit, and the calibration process entered an infinite loop.
PR: https://github.com/quic/aimet/pull/4003

Resolved by #4003 (commit https://github.com/quic/aimet/commit/70029c596cff1d188fcfbc308cc06f99bdff1fdf).