Alireza

5 comments by Alireza

Hi @QiJune This problem was solved with tritonserver 24.07-trtllm-python-py3. I also have another question: tensorrt_llm supports ['attn_q', 'attn_v', 'attn_k', 'attn_qkv', ...] layers in LoRA, but does not support "lm_head". Why?
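One practical consequence of the missing "lm_head" slot is that an adapter trained with lm_head in its target modules cannot be carried over. Below is a minimal sketch that checks a PEFT-style adapter config against the supported list from the comment; the HF-to-tensorrt_llm name mapping here is an illustrative assumption, not an official table.

```python
# Module names the comment lists as accepted by tensorrt_llm for LoRA
# (abbreviated as in the comment); note "lm_head" is absent.
SUPPORTED_LORA_MODULES = {"attn_q", "attn_v", "attn_k", "attn_qkv"}

# Rough mapping from HF/PEFT module names to the names above; these
# entries are assumptions for illustration, not the converter's table.
HF_TO_TRTLLM = {
    "q_proj": "attn_q",
    "k_proj": "attn_k",
    "v_proj": "attn_v",
    "lm_head": "lm_head",  # no tensorrt_llm LoRA equivalent
}

def unsupported_targets(adapter_config: dict) -> list:
    """Return HF target modules that have no supported tensorrt_llm LoRA slot."""
    bad = []
    for name in adapter_config.get("target_modules", []):
        mapped = HF_TO_TRTLLM.get(name, name)
        if mapped not in SUPPORTED_LORA_MODULES:
            bad.append(name)
    return bad

# Example: an adapter trained on q/v projections plus lm_head
cfg = {"target_modules": ["q_proj", "v_proj", "lm_head"]}
print(unsupported_targets(cfg))  # -> ['lm_head']
```

Running a check like this before conversion surfaces the unsupported module up front instead of failing mid-build.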

@hello-11 For example, I send this request:

```
curl -X POST my_ip:8000/v2/models/ensemble/generate_stream -d '{"text_input": "hello!, How are you?", "max_tokens":1024, "temperature":0.1, "top_p":0.9, "top_k":1, "repetition_penalty":1.15, "stream":true}'
```

The answer is:

```
data: {"context_logits":0.0,"cum_log_probs":0.0,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[0.0,0.0,0.0,0.0,0.0,0.0,0.0],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":""}...
```
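The stream comes back as server-sent-event style `data: {...}` lines, so debugging usually starts with parsing those events and looking at `text_output`. A minimal sketch (the endpoint URL and payload are taken from the curl command above; the `requests`-based streaming part is shown as a commented example since it needs a running server):

```python
import json

def parse_sse_line(line: str):
    """Parse one 'data: {...}' line from the generate_stream response."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())

# Streaming the endpoint itself (requires `requests` and a live server):
#
#   import requests
#   payload = {"text_input": "hello!, How are you?", "max_tokens": 1024,
#              "temperature": 0.1, "top_p": 0.9, "top_k": 1,
#              "repetition_penalty": 1.15, "stream": True}
#   with requests.post("http://my_ip:8000/v2/models/ensemble/generate_stream",
#                      json=payload, stream=True) as r:
#       for raw in r.iter_lines():
#           event = parse_sse_line(raw.decode())
#           if event is not None:
#               print(event["text_output"], end="", flush=True)

sample = 'data: {"text_output": "", "model_name": "ensemble"}'
print(parse_sse_line(sample)["model_name"])  # -> ensemble
```

An empty `text_output` in every event, as in the response above, means the model produced no visible tokens for that step rather than the stream being broken.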

> Can you just use your base model to be model_1? Or are you needing to call model_0 still? > > Might be worth going through the matrix operations to...

When I convert Gemma 2, I get this error:

```
Don't know how to rename transformer.model.layers.0.pre_feedforward_layernorm.weight
```
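That error is typical of a checkpoint converter whose rename table predates Gemma 2: the architecture adds per-layer `pre_feedforward_layernorm` / `post_feedforward_layernorm` weights that older tables have no rule for. A minimal sketch of such a rename step with rules for the new keys added; the rule patterns and target names are illustrative assumptions, not the actual converter's table:

```python
import re

# Illustrative HF-name -> converted-name rules; entries are assumptions.
RENAME_RULES = [
    (r"transformer\.model\.layers\.(\d+)\.input_layernorm\.weight",
     r"layers.\1.input_layernorm.weight"),
    # Gemma 2's extra per-layer norms, missing from older tables:
    (r"transformer\.model\.layers\.(\d+)\.pre_feedforward_layernorm\.weight",
     r"layers.\1.pre_feedforward_layernorm.weight"),
    (r"transformer\.model\.layers\.(\d+)\.post_feedforward_layernorm\.weight",
     r"layers.\1.post_feedforward_layernorm.weight"),
]

def rename(key: str) -> str:
    """Apply the first matching rule; fail loudly like the converter does."""
    for pattern, repl in RENAME_RULES:
        new_key, n = re.subn(pattern, repl, key)
        if n:
            return new_key
    raise KeyError(f"Don't know how to rename {key}")

print(rename("transformer.model.layers.0.pre_feedforward_layernorm.weight"))
# -> layers.0.pre_feedforward_layernorm.weight
```

Extending the real converter's table with rules for the two new norm keys is the analogous fix.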

I built Gemma 2, but I use a sliding window, which is a little different from the Gemma 2 architecture in transformers. You can build Gemma 2 with the Gemma 1 files with...