QZH-eng
## Description

I tried to convert the Flux DiT model on an L40S with TensorRT 10.5 and found that peak GPU memory exceeded 46068 MiB, but 23597 MiB of GPU memory was occupied during...
## Description

When I used TensorRT 10.5 to run Flux DiT inference on an A800 with the BF16 data type, I found a significant decrease in accuracy, while there was no...
### Description

Hi, when I called `single_prefill_with_kv_cache`, a large number of zeros appeared. At the same time, I compared it with the eager implementation and found that the same position...