
flux-demo failure on TensorRT 10.5 when running on a single L40 GPU; how to run with two L40 GPUs

Open algorithmconquer opened this issue 1 year ago • 3 comments

Currently, when I run Flux on a device with a single L40 GPU, I encounter an OutOfMemory error. I have another device with two L40 GPUs. How can I implement multi-GPU usage to run Flux?

algorithmconquer avatar Oct 17 '24 02:10 algorithmconquer

You can split your model.

lix19937 avatar Oct 18 '24 14:10 lix19937

also cc: @asfiyab-nvidia

yuanyao-nv avatar Oct 18 '24 22:10 yuanyao-nv

@lix19937 How do I split the model for this issue? Could you provide relevant code and resources?

algorithmconquer avatar Oct 21 '24 02:10 algorithmconquer

Like the following:

Assume `model = cnn_backbone + cnn_neck + transformer_with_cnn_head`. Then you can export `cnn_backbone + cnn_neck` as onnx_a and `transformer_with_cnn_head` as onnx_b, and use trtexec to build `onnx_a -> plan_a` and `onnx_b -> plan_b`; see the sketch below.

plan_a runs on device 0, plan_b runs on device 1.
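For illustration, a pair of trtexec invocations along these lines would build one engine per GPU (the file names are placeholders; `--device` selects the GPU the engine is built and profiled on):

```shell
trtexec --onnx=onnx_a.onnx --saveEngine=plan_a.plan --device=0
trtexec --onnx=onnx_b.onnx --saveEngine=plan_b.plan --device=1
```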

In more detail:
Each ICudaEngine object is bound to a specific GPU when it is instantiated, either by the builder or on deserialization. To select the GPU, use cudaSetDevice() before calling the builder or deserializing the engine. Each IExecutionContext is bound to the same GPU as the engine from which it was created. When calling execute() or enqueue(), ensure that the thread is associated with the correct device by calling cudaSetDevice() if necessary.
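As a rough sketch of the loading side (my own illustration, not from the flux demo; it assumes the two plan files built above and the cuda-python bindings for `cudaSetDevice`):

```python
import tensorrt as trt
from cuda import cudart  # cuda-python bindings

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def load_engine(plan_path, device):
    # The engine is bound to whichever GPU is current at deserialization,
    # so select the device first.
    cudart.cudaSetDevice(device)
    with open(plan_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

engine_a = load_engine("plan_a.plan", 0)
engine_b = load_engine("plan_b.plan", 1)

# Each execution context inherits its engine's GPU.
cudart.cudaSetDevice(0)
ctx_a = engine_a.create_execution_context()
cudart.cudaSetDevice(1)
ctx_b = engine_b.create_execution_context()

# At inference time, make the matching device current in this thread
# before enqueueing, and move the intermediate tensor across GPUs
# (e.g. with cudaMemcpyPeer) between the two stages:
cudart.cudaSetDevice(0)
# ... bind GPU-0 buffers, then ctx_a.execute_v2(bindings_a)
cudart.cudaSetDevice(1)
# ... copy stage-A output to GPU 1, then ctx_b.execute_v2(bindings_b)
```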

lix19937 avatar Oct 22 '24 09:10 lix19937