Kyungmin Lee

11 comments by Kyungmin Lee

Hi @molly-smith! `replace_with_kernel_inject=True` on OPT (galactica-6.7b) still produces garbage output on `deepspeed==0.8.3`. Is a fix for this in progress?
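
For context, this is roughly how the injection path is invoked — a minimal sketch, assuming the Hugging Face `facebook/galactica-6.7b` checkpoint, a single GPU, and `deepspeed==0.8.3`; not the exact script I ran:

```python
# Minimal repro sketch (assumed model/checkpoint and single-GPU setup).
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/galactica-6.7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Kernel injection path that produces garbage output for me on OPT-family models.
ds_engine = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
model = ds_engine.module

inputs = tokenizer("The Transformer architecture", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```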

Hi, regarding question 1: there seems to be a constraint on the softmax. See https://github.com/NVIDIA/FasterTransformer/issues/663

@byshiue I have the same problem. Following https://github.com/triton-inference-server/fastertransformer_backend#rebuilding-fastertransformer-backend-optional:

```
cmake \
  -D CMAKE_EXPORT_COMPILE_COMMANDS=1 \
  -D CMAKE_BUILD_TYPE=Release \
  -D ENABLE_FP8=OFF \
  -D BUILD_MULTI_GPU=ON \
  -D BUILD_PYT=ON \
  -D SM=80 \
  -D CMAKE_INSTALL_PREFIX=/opt/tritonserver \
  ...
```

Related to https://github.com/NVIDIA/TensorRT-LLM/issues/1440

Hi @MrD005 @ARomoH @jellysnack, can you try setting `use_custom_all_reduce` to `disable`? Related to https://github.com/triton-inference-server/tensorrtllm_backend/issues/390

@PerkzZheng I worked around it for now: my solution is to disable `use_custom_all_reduce` in `trtllm-build`.
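
A minimal sketch of the workaround invocation. The checkpoint and output paths are placeholders, `trtllm-build` is assumed to be on `PATH`, and the `--use_custom_all_reduce disable` flag matches the TensorRT-LLM version I was on; newer releases may have changed or removed this option:

```python
# Sketch of the workaround (hypothetical paths; flag availability depends on
# the installed TensorRT-LLM version).
import subprocess

subprocess.run(
    [
        "trtllm-build",
        "--checkpoint_dir", "./tllm_checkpoint",  # placeholder checkpoint dir
        "--output_dir", "./engine_output",        # placeholder engine dir
        "--use_custom_all_reduce", "disable",     # disable the custom all-reduce plugin
    ],
    check=True,
)
```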

Hi @byshiue, can you check this MR? If "exaone" is in `model_name.lower()`, then `--model_dir` should point to a path that includes "exaone", for example: [export HF_MODEL_DIR=hf_models/exaone](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/exaone#exaone-deep) On the other hand,...

The same condition is being used here too. https://github.com/NVIDIA/TensorRT-LLM/blob/eb2d51a42990b8d0b30bc6c29fad4fd491da749f/tensorrt_llm/models/llama/convert.py#L419-L420
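
To make the constraint concrete, here is a paraphrase of the kind of check being discussed; this is not the exact source from `convert.py`, and the directory value and name derivation are only illustrative:

```python
# Paraphrase of the check (not the exact TensorRT-LLM source): the EXAONE
# code path is selected from the model name, which is taken from the
# directory passed via --model_dir, so that path must contain "exaone".
import os

model_dir = "hf_models/exaone"            # example --model_dir value from the EXAONE README
model_name = os.path.basename(model_dir)  # rough illustration of how the name is derived

if "exaone" in model_name.lower():
    print("detected as an EXAONE checkpoint")
else:
    print("EXAONE-specific conversion logic would be skipped")
```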