Hudayday
Thank you, but are there any available TP+PP examples?
Falcon and other models also suffer from the same issue when the TP level is greater than 4 with int4 quantization.
The bug occurs with various models and different types of quantization (including float16) when using tp = 4 or tp = 8. Occasionally, SM Utilization spikes to 100% and the...
> Could you try adding `--use_custom_all_reduce disable` during building engine?

The issue still happens when disabling the use_custom_all_reduce. It happens randomly after running hundreds of batch = 1 requests. Each...