Lei Zhang

Results: 11 comments by Lei Zhang

We really need this feature. 🔥 Does anyone know of any alternatives that could replace this project?

Can we use `transformers` directly to enable 128k-context inference, without deploying with vLLM?
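For what it's worth, here is a minimal sketch of running inference with plain Hugging Face `transformers` and no vLLM; the model name is just a placeholder, and whether a 128k prompt actually fits depends on the model's configured context length and your GPU memory.

```python
# Rough sketch (not from the original thread) of long-context inference with
# plain Hugging Face transformers, no vLLM. The model name is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-org/your-128k-context-model"  # placeholder, substitute a real checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # halves memory vs. fp32
    device_map="auto",           # spread layers across available GPUs
)

# In practice the prompt would be a very long document read from disk.
prompt = "Summarize the following document:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```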

I also want to know the difference between the two of them. Have you figured it out?

@ytxmobile98 I think you need to set `--max-model-len` to a larger value, e.g. 8192. BTW, you can check the log file to locate the issue.
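The same limit can also be set when constructing the engine from Python; a minimal sketch (the model name below is only an example, not taken from the original issue):

```python
# Sketch: raising the context window when creating the vLLM engine from Python,
# equivalent to passing --max-model-len 8192 on the CLI.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model, substitute your own
    max_model_len=8192,                        # same effect as --max-model-len 8192
)

params = SamplingParams(max_tokens=512)
outputs = llm.generate(["Hello, how are you?"], params)
print(outputs[0].outputs[0].text)
```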

> It can solve the bug: `export LD_LIBRARY_PATH=/data/home/user/anaconda3/envs/vllm/lib/python3.10/site-packages/nvidia/nvjitlink/lib:$LD_LIBRARY_PATH`

Very helpful.

> Hi [@Hambaobao](https://github.com/Hambaobao), could you check if this works when using `litellm` directly?
>
> ```
> messages = [
>     {"role": "system", "content": "Respond in pirate speak."},
> ...
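The quoted snippet is truncated, but a minimal sketch of calling `litellm` directly with a similar messages list might look like the following; the model name is an assumption, not something from the original thread.

```python
# Rough sketch of calling litellm directly; the model name is an assumption.
from litellm import completion

messages = [
    {"role": "system", "content": "Respond in pirate speak."},
    {"role": "user", "content": "Hello, who are you?"},
]

response = completion(
    model="openai/gpt-4o-mini",  # any provider/model litellm supports
    messages=messages,
)
print(response.choices[0].message.content)
```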

Hi, I have the same need. I'd like to store the `hidden_states` during model inference so that I can do some interpretability research.
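As a stopgap, here is a sketch of capturing hidden states with plain Hugging Face `transformers` (vLLM does not expose them out of the box); the model name is just a small example.

```python
# Sketch: collecting per-layer hidden states with Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small example model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)

inputs = tokenizer("Interpretability is fun.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# Tuple of (num_layers + 1) tensors, each of shape [batch, seq_len, hidden_dim]
hidden_states = out.hidden_states
print(len(hidden_states), hidden_states[-1].shape)
```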

No one has opened a PR for this yet? Then I'll open one.

Hi @enyst, thank you very much for your response. I'll see what I can do.