Kelei Jiang
```
python build.py --model_dir /workspace/qllama-7b-chat \
    --dtype float16 \
    --remove_input_padding \
    --use_gpt_attention_plugin float16 \
    --enable_context_fmha \
    --use_gemm_plugin float16 \
    --output_dir "/tmp/new_lora_7b/trt_engines/fp16/2-gpu/" \
    --max_batch_size 1 \
    --max_input_len 512 \
    --max_output_len 50...
```
Thank you, but when is loading multiple LoRA weights expected to be supported?
Do you have plans to support Ascend 910B in the future?
Thank you for supporting domestically produced hardware!
https://github.com/om-ai-lab/OmDet/blob/main/omdet/omdet_v2_turbo/infer_model.py#L66

Thank you very much for your issue.
1. The exported ONNX model does not include image preprocessing or NMS post-processing.
2. And because ONNX requires the input to be...
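Since the reply above says the exported ONNX graph does not include NMS post-processing, the caller has to apply it to the raw detections. Below is a minimal NumPy sketch of standard hard-NMS; the `(N, 4)` xyxy box layout and score vector are assumptions for illustration, not the exact output format of the OmDet ONNX model:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy hard-NMS.

    boxes:  (N, 4) array in (x1, y1, x2, y2) format -- assumed layout
    scores: (N,) confidence per box
    Returns indices of the boxes to keep, highest score first.
    """
    order = scores.argsort()[::-1]  # descending by confidence
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        # Intersection of the top box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # Drop boxes that overlap the kept box too much
        order = order[1:][iou <= iou_threshold]
    return keep
```

For example, two heavily overlapping boxes collapse to the higher-scoring one, while a distant box survives: `nms(np.array([[0,0,10,10],[1,1,10,10],[50,50,60,60]], float), np.array([0.9,0.8,0.7]))` returns `[0, 2]`.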