AttributeError: 'PluginConfig' object has no attribute '_remove_input_padding'. Did you mean: 'remove_input_padding'?
System Info
- GPU: A10
- tensorrt-cu12 10.2.0.post1
- tensorrt-cu12-bindings 10.2.0.post1
- tensorrt-cu12-libs 10.2.0.post1
- tensorrt_llm 0.12.0.dev2024072300
- Python 3.10
Who can help?
@Tracin
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
(distil-asr) root@ip-172-31-17-245:~/TensorRT-LLM/examples/whisper# trtllm-build --checkpoint_dir distil_whisper_large_v3_weights_int8/encoder --output_dir distil_whisper_large_v3_int8/encoder --paged_kv_cache disable --moe_plugin disable --enable_xqa disable --max_batch_size 8 --gemm_plugin disable --bert_attention_plugin float16 --remove_input_padding disable --max_input_len 1500
Expected behavior
I'm using distil-large-v3 and I get an error when running trtllm-build.
Actual behavior
[TensorRT-LLM] TensorRT-LLM version: 0.12.0.dev2024072300
[07/29/2024-15:10:41] [TRT-LLM] [W] Implicitly setting PretrainedConfig.n_mels = 128
[07/29/2024-15:10:41] [TRT-LLM] [W] Implicitly setting PretrainedConfig.n_audio_ctx = 1500
[07/29/2024-15:10:41] [TRT-LLM] [W] Implicitly setting PretrainedConfig.num_languages = 100
[07/29/2024-15:10:41] [TRT-LLM] [I] max_seq_len is not specified, using value 2048
Traceback (most recent call last):
File "/opt/conda/envs/distil-asr/bin/trtllm-build", line 8, in
Additional notes
I suspect a versioning problem.
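For what it's worth, the shape of the error fits that theory: the public property exists on the class, but the private field it reads was never initialized by the installed version. Here is a minimal pure-Python sketch (not the real tensorrt_llm code) that reproduces the same message on Python 3.10+:

```python
# Pure-Python stand-in (NOT the real tensorrt_llm class) showing how a
# version mismatch can produce exactly this message: the public property
# exists, but the private field it reads was never set.
class PluginConfig:
    @property
    def remove_input_padding(self):
        return self._remove_input_padding  # never initialized by this "version"

cfg = PluginConfig()
cfg.remove_input_padding
# AttributeError: 'PluginConfig' object has no attribute
# '_remove_input_padding'. Did you mean: 'remove_input_padding'?
```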
I've got exactly the same problem after running the whisper example with large-v3 on Windows 10, tensorrt-llm 11.0, in a fresh virtualenv.
Why do we need to set `--remove_input_padding disable`?
> Why do we need to set `--remove_input_padding disable`?
I'm guessing we don't; I'm just following the example. I've tried, for instance, manually setting input_padding and removing it from the PluginConfig, but then you get errors for every other parameter in PluginConfig. It's a game of whack-a-mole: every time you fix one attribute, the next one breaks. I haven't found an overall fix, and I suspect that even if I did, something else would break, because it does look like a versioning issue.
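To illustrate the whack-a-mole (again with hypothetical names, not the real tensorrt_llm internals): every public property reads its own private field, so patching in one missing field just moves the AttributeError to the next one the builder touches.

```python
# Hypothetical sketch of the whack-a-mole: each property reads a private
# field, and none of the private fields exist on this instance.
class PluginConfig:
    @property
    def remove_input_padding(self):
        return self._remove_input_padding

    @property
    def paged_kv_cache(self):
        return self._paged_kv_cache

cfg = PluginConfig()
cfg._remove_input_padding = False  # patch the first missing field...
print(cfg.remove_input_padding)    # ...which now works
print(cfg.paged_kv_cache)          # ...but this raises the same
                                   # AttributeError for '_paged_kv_cache'
```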
@xinliu9451 @tjongsma @Kefeng-Duan, for distil-whisper, would you mind adding `model = model.half()` here https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/whisper/distil_whisper/convert_from_distil_whisper.py#L60 for now?
The code fix will be synced to github later.
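For anyone applying this locally before the sync, a sketch of the change (the loading call below is an assumption for illustration, not a quote of the actual script; only the `model.half()` line is the suggested edit):

```python
# Sketch of the suggested workaround in convert_from_distil_whisper.py.
from transformers import AutoModelForSpeechSeq2Seq

model = AutoModelForSpeechSeq2Seq.from_pretrained("distil-whisper/distil-large-v3")
model = model.half()  # cast weights to fp16 before conversion, per the suggestion
```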
> Why do we need to set `--remove_input_padding disable`?
Sorry, the remove_input_padding option for distil-whisper will be supported in the future.
See also https://github.com/NVIDIA/TensorRT-LLM/issues/2118#issuecomment-2292413603.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.