TensorRT-LLM
Why is the fp8_e4m3 min_scaling_factor divided by 512?
In https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/tensorrt_llm/common/cudaFp8Utils.cu#L219 the minimum scaling factor is defined as:
constexpr float min_scaling_factor = 1.0f / (FP8_E4M3_MAX * 512.f);
Why is the extra factor 512, and what does it represent?
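To make the question concrete, here is a minimal host-side sketch of the arithmetic behind that constant. It assumes FP8_E4M3_MAX is 448 (the largest finite E4M3 value) and that the constant is used as a lower bound on a scale computed as amax / FP8_E4M3_MAX. The helpers `clamped_scale` and `implied_amax_floor` are hypothetical names for illustration and are not taken from the TensorRT-LLM source.

```cpp
#include <algorithm>
#include <cassert>

// Assumption: 448 is the largest finite value representable in E4M3.
constexpr float FP8_E4M3_MAX = 448.0f;

// The definition quoted in the question.
constexpr float min_scaling_factor = 1.0f / (FP8_E4M3_MAX * 512.f);

// Hypothetical helper: a quantization scale derived from the absolute
// maximum (amax) of a tensor, clamped from below by min_scaling_factor.
inline float clamped_scale(float amax)
{
    assert(amax >= 0.0f); // amax is an absolute maximum, so never negative
    return std::max(amax / FP8_E4M3_MAX, min_scaling_factor);
}

// Hypothetical helper: the amax value at which the clamp starts to apply.
// Algebraically this is FP8_E4M3_MAX / (FP8_E4M3_MAX * 512) = 1 / 512.
inline float implied_amax_floor()
{
    return min_scaling_factor * FP8_E4M3_MAX;
}
```

Under these assumptions, the clamp is equivalent to treating any amax below 1/512 = 2^-9 as if it were 2^-9, which also keeps the scale away from zero for all-zero inputs. As a numeric observation, 2^-9 is the smallest positive subnormal value in E4M3 (minimum exponent 2^-6 with 3 mantissa bits). Whether that is the authors' reason for choosing 512 is not stated in this thread.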
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.
This issue was closed because it has been stalled for 15 days with no activity.