Alireza

5 comments by Alireza

Hi @QiJune This problem was solved with tritonserver 24.07-trtllm-python-py3. I also have another question: tensorrt_llm supports ['attn_q', 'attn_v', 'attn_k', 'attn_qkv', ...] layers in LoRA, but does not support "lm_head". Why?
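One practical consequence of the missing "lm_head" slot is that an adapter trained with lm_head in its target modules cannot be carried over. Below is a minimal sketch that checks a PEFT-style adapter config against the supported list from the comment; the HF-to-tensorrt_llm name mapping here is an illustrative assumption, not an official table.

```python
# Module names the comment lists as accepted by tensorrt_llm for LoRA
# (abbreviated as in the comment); note "lm_head" is absent.
SUPPORTED_LORA_MODULES = {"attn_q", "attn_v", "attn_k", "attn_qkv"}

# Rough mapping from HF/PEFT module names to the names above; these
# entries are assumptions for illustration, not the converter's table.
HF_TO_TRTLLM = {
    "q_proj": "attn_q",
    "k_proj": "attn_k",
    "v_proj": "attn_v",
    "lm_head": "lm_head",  # no tensorrt_llm LoRA equivalent
}

def unsupported_targets(adapter_config: dict) -> list:
    """Return HF target modules that have no supported tensorrt_llm LoRA slot."""
    bad = []
    for name in adapter_config.get("target_modules", []):
        mapped = HF_TO_TRTLLM.get(name, name)
        if mapped not in SUPPORTED_LORA_MODULES:
            bad.append(name)
    return bad

# Example: an adapter trained on q/v projections plus lm_head
cfg = {"target_modules": ["q_proj", "v_proj", "lm_head"]}
print(unsupported_targets(cfg))  # -> ['lm_head']
```

Running a check like this before conversion surfaces the unsupported module up front instead of failing mid-build.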

@hello-11 For example, I send this request:

```
curl -X POST my_ip:8000/v2/models/ensemble/generate_stream -d '{"text_input": "hello!, How are you?", "max_tokens":1024, "temperature":0.1, "top_p":0.9, "top_k":1, "repetition_penalty":1.15, "stream":true}'
```

The answer is:

```
data: {"context_logits":0.0,"cum_log_probs":0.0,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[0.0,0.0,0.0,0.0,0.0,0.0,0.0],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":""}...
```
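The stream comes back as server-sent-event style `data: {...}` lines, so debugging usually starts with parsing those events and looking at `text_output`. A minimal sketch (the endpoint URL and payload are taken from the curl command above; the `requests`-based streaming part is shown as a commented example since it needs a running server):

```python
import json

def parse_sse_line(line: str):
    """Parse one 'data: {...}' line from the generate_stream response."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())

# Streaming the endpoint itself (requires `requests` and a live server):
#
#   import requests
#   payload = {"text_input": "hello!, How are you?", "max_tokens": 1024,
#              "temperature": 0.1, "top_p": 0.9, "top_k": 1,
#              "repetition_penalty": 1.15, "stream": True}
#   with requests.post("http://my_ip:8000/v2/models/ensemble/generate_stream",
#                      json=payload, stream=True) as r:
#       for raw in r.iter_lines():
#           event = parse_sse_line(raw.decode())
#           if event is not None:
#               print(event["text_output"], end="", flush=True)

sample = 'data: {"text_output": "", "model_name": "ensemble"}'
print(parse_sse_line(sample)["model_name"])  # -> ensemble
```

An empty `text_output` in every event, as in the response above, means the model produced no visible tokens for that step rather than the stream being broken.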

> Can you just use your base model to be model_1? Or are you needing to call model_0 still? > > Might be worth going through the matrix operations to...

When I convert Gemma 2, I get this error:

```
Don't know how to rename transformer.model.layers.0.pre_feedforward_layernorm.weight
```
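That error is typical of a checkpoint converter whose rename table predates Gemma 2: the architecture adds per-layer `pre_feedforward_layernorm` / `post_feedforward_layernorm` weights that older tables have no rule for. A minimal sketch of such a rename step with rules for the new keys added; the rule patterns and target names are illustrative assumptions, not the actual converter's table:

```python
import re

# Illustrative HF-name -> converted-name rules; entries are assumptions.
RENAME_RULES = [
    (r"transformer\.model\.layers\.(\d+)\.input_layernorm\.weight",
     r"layers.\1.input_layernorm.weight"),
    # Gemma 2's extra per-layer norms, missing from older tables:
    (r"transformer\.model\.layers\.(\d+)\.pre_feedforward_layernorm\.weight",
     r"layers.\1.pre_feedforward_layernorm.weight"),
    (r"transformer\.model\.layers\.(\d+)\.post_feedforward_layernorm\.weight",
     r"layers.\1.post_feedforward_layernorm.weight"),
]

def rename(key: str) -> str:
    """Apply the first matching rule; fail loudly like the converter does."""
    for pattern, repl in RENAME_RULES:
        new_key, n = re.subn(pattern, repl, key)
        if n:
            return new_key
    raise KeyError(f"Don't know how to rename {key}")

print(rename("transformer.model.layers.0.pre_feedforward_layernorm.weight"))
# -> layers.0.pre_feedforward_layernorm.weight
```

Extending the real converter's table with rules for the two new norm keys is the analogous fix.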

I built Gemma 2, but I use a sliding window, which is a little different from the Gemma 2 architecture in transformers. You can build Gemma 2 with the Gemma 1 files with...