Hudayday

4 comments by Hudayday

Thank you, but are there any available TP+PP examples?

Falcon and other models also suffer from the same issue when the TP level is greater than 4 with int4 quantization.

The bug occurs with various models and different types of quantization (including float16) when using tp = 4 or tp = 8. Occasionally, SM Utilization spikes to 100% and the...

> Could you try adding `--use_custom_all_reduce disable` during building engine?

The issue still happens with `use_custom_all_reduce` disabled. It occurs randomly after running hundreds of batch = 1 requests. Each...