ChristineSeven

Results: 21 comments of ChristineSeven

I'm having the same issue, too.

I have the same issue and don't know what's wrong.

@mrwyattii yes, that solved it! But when I send requests, another issue comes up. Would you help check this? Exception in thread Thread-1: Traceback (most recent call last): File "/usr/lib/python3.8/threading.py", line...

> > @mrwyattii yes, this solved! But when I do requests, another issue cames. Would you help to check this? Exception in thread Thread-1: Traceback (most recent call last): File...

The server code is like this:

```python
import mii

client = mii.serve(
    "mistralai/Mistral-7B-v0.1",
    deployment_name="mistral-deployment",
    enable_restful_api=True,
    restful_api_port=28080,
)
```
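For reference, a minimal request sketch against that RESTful endpoint. The path format `http://localhost:<restful_api_port>/mii/<deployment_name>` and the `{"prompts": ..., "max_length": ...}` payload follow the DeepSpeed-MII README; verify them against your MII version before use.

```python
import json

# Assumption: MII exposes the deployment created above at
#   http://localhost:<restful_api_port>/mii/<deployment_name>
# (endpoint shape taken from the DeepSpeed-MII README, not verified here).
DEPLOYMENT = "mistral-deployment"
PORT = 28080
URL = f"http://localhost:{PORT}/mii/{DEPLOYMENT}"

# Request body: a list of prompts plus a generation-length cap.
payload = {"prompts": ["DeepSpeed is"], "max_length": 64}

if __name__ == "__main__":
    import requests  # third-party: pip install requests

    # POST the JSON payload to the running MII REST server.
    resp = requests.post(
        URL,
        data=json.dumps(payload),
        headers={"Content-Type": "application/json"},
    )
    print(resp.json())
```

The same request can be issued with `curl --request POST -d '<json>' <url>` if you only want to smoke-test the server.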

Whether I use CUDA graphs or not, there is a memory issue.

> > I also have problems with a memory leak with vllm 0.2.7. For me it's not limited to Ray but also concerns the API server itself, no matter whether...

It's not only version 0.2.7; I tested 0.2.6 and 0.2.3, and they have this issue as well. To reproduce: start the server and leave it running for several hours, and the issue appears. @zhuohan123...

Using your code, I got this error: `module 'lightseq.inference' has no attribute 'Llama'`. Could you tell me how you bypassed this? @HandH1998