Haohang Huang

Results: 55 comments by Haohang Huang

> Tried `python3 run.py compare BART --variant facebook/bart-base --working-dir temp` also get error: Collecting Data for onnxrt Traceback (most recent call last): File "run.py", line 297, in main() File "run.py",...

> Thank you for the info! It seems a large batch size is also not supported yet. Could you please confirm? Could you add more information on this? Running with...

@Luckick I see. TensorRT inference includes two phases: (1) engine building and (2) execution. The error above shows the failure happens at phase (2), and it's because your built engines still have fixed...
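For context on why execution fails here, a minimal pure-Python sketch of the idea (helper names are hypothetical, not the TensorRT API): at runtime TensorRT validates each input shape against the `[min, max]` range of the engine's optimization profile, so an engine built with a fixed batch dimension rejects any larger batch.

```python
# Illustrative sketch only: mimics how TensorRT checks runtime input
# shapes against an engine's optimization profile. The helper below is
# hypothetical, not part of the TensorRT API.

def shape_in_profile(shape, min_shape, max_shape):
    """Return True if every dim of `shape` lies within [min, max]."""
    return all(lo <= s <= hi for s, lo, hi in zip(shape, min_shape, max_shape))

# An engine built with a fixed batch size has min == max on the batch dim.
fixed_min = (1, 1)      # (batch, seq_len) lower bound
fixed_max = (1, 512)    # batch is pinned to 1

print(shape_in_profile((1, 128), fixed_min, fixed_max))  # True: within profile
print(shape_in_profile((4, 128), fixed_min, fixed_max))  # False: batch 4 > max 1
```

To accept batch 4, the engine would have to be rebuilt with a profile whose batch dimension spans at least 1 to 4.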

Yes, the `as_trt_engine` lines are where the engines are actually built. Did you see a log in the notebook indicating that TRT was building the engine (and usually this engine building...

> I see. I was using the old engine built with batch_size=1. > > Another modification: for decode profile, we should have **hidden_dim** = BARTModelTRTConfig.ENCODER_HIDDEN_SIZE[model_name] > > decoder_profile.add( "encoder_hidden_states", min=(batch_size,...
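A minimal sketch of the profile shapes being discussed, with the config lookup mocked (the real values live in `BARTModelTRTConfig` in the demo): the `encoder_hidden_states` input is 3-D, `(batch, seq_len, hidden_dim)`, so the decoder's dynamic-shape profile must pin `hidden_dim` to the model's encoder hidden size while batch and sequence length stay dynamic.

```python
# Sketch only: the hidden-size table is mocked here; the real values come
# from BARTModelTRTConfig.ENCODER_HIDDEN_SIZE in the TensorRT demo.
ENCODER_HIDDEN_SIZE = {"facebook/bart-base": 768, "facebook/bart-large": 1024}

def encoder_hidden_states_profile(model_name, batch_size, max_seq_len):
    """Build (min, opt, max) shapes for the decoder's encoder_hidden_states
    input. Only seq_len varies here; hidden_dim is fixed by the model."""
    hidden_dim = ENCODER_HIDDEN_SIZE[model_name]
    return (
        (batch_size, 1, hidden_dim),                 # min
        (batch_size, max_seq_len // 2, hidden_dim),  # opt
        (batch_size, max_seq_len, hidden_dim),       # max
    )

mn, opt, mx = encoder_hidden_states_profile("facebook/bart-base", 4, 128)
print(mn, opt, mx)  # (4, 1, 768) (4, 64, 768) (4, 128, 768)
```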

> Is there a command or convenient way to set up the engine for a local checkpoint of a fine-tuned BART model, or a customized BART model? The easiest way I...

@eycheung thanks for the contribution, and thanks @Eddie-Wang1120 for the advice (I lost track of what the m=0 error was, since I think the author has modified the PR description)...

Hi @shannonphu, yes, we're working on it. Right now it's at the stage of adding the C++ runtime. The tentative date for Triton enc-dec support is around mid to late...

> Does it also include continuous batching? Our current plan is to get there in steps: (1) C++ runtime, (2) regular Triton support, (3) continuous batching. Eventually we want to...
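For readers unfamiliar with the term, continuous (in-flight) batching admits new requests into the running batch as soon as earlier ones finish, instead of waiting for the whole batch to drain. A toy scheduler sketch, purely illustrative and unrelated to the actual TensorRT-LLM/Triton implementation:

```python
from collections import deque

def continuous_batching(requests, max_batch=2):
    """Toy scheduler: each request is (name, steps_needed). Returns, per
    decoding iteration, which requests were active. New requests join as
    soon as a slot frees, rather than waiting for the batch to drain."""
    pending = deque(requests)
    active = {}          # name -> remaining decoding steps
    timeline = []
    while pending or active:
        # admit new requests into free slots (the "continuous" part)
        while pending and len(active) < max_batch:
            name, steps = pending.popleft()
            active[name] = steps
        timeline.append(sorted(active))
        # run one decoding step for every active request
        for name in list(active):
            active[name] -= 1
            if active[name] == 0:
                del active[name]  # slot freed mid-batch
    return timeline

# "b" finishes after 1 step, so "c" joins "a" without waiting for "a" to end.
print(continuous_batching([("a", 3), ("b", 1), ("c", 2)]))
# [['a', 'b'], ['a', 'c'], ['a', 'c']]
```

With static batching the same workload would take an extra iteration, since "c" could only start after both "a" and "b" completed.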