JT
In chapter 9, I am getting the right loss and accuracy values, but when I print out `loss_activation.output[:5]` and the dweights and dbiases, I get the wrong values. Here...
> v0.2.7 is fairly old; can you try with the current v0.4.1?
>
> Also, output from 'collect_env.py' from within the container would be helpful, e.g.
>
> $ kubectl...
Okay so I followed your example with a few modifications

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm
  labels:
    app: vllm
spec:
  replicas: 1
  revisionHistoryLimit: 1
  strategy:
    type: Recreate
  selector:...
```
> I'm seeing the same issue
>
> ```
> python3 -m vllm.entrypoints.openai.api_server --model /model/model.file --port 8001 --trust-remote-code --gpu-memory-utilization 0.95: no such file or directory
> ```
>
> any...
> @jayteaftw I'm seeing RH has a ubi vllm image, and it does work for me, you might want to try this out as well. `quay.io/rh-aiservices-bu/vllm-openai-ubi9:0.4.2`
>
> it will...
Hey @tjbck, I just wanted to follow up, as I have added more functionality to the embedding API to allow for prefixing queries and documents before sending them to an...
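Roughly, the prefixing happens on the client side before the request ever reaches the embedding engine. A minimal sketch, assuming an E5-style model that wants `query:` / `passage:` prefixes and a generic OpenAI-compatible `/v1/embeddings` endpoint (the URL, model name, and prefix strings below are placeholders, not the actual defaults):

```python
import requests

# Hypothetical prefixes; E5-style models expect these, other models use different ones (or none).
QUERY_PREFIX = "query: "
DOCUMENT_PREFIX = "passage: "

def embed(texts, prefix, url="http://localhost:8001/v1/embeddings", model="my-embedding-model"):
    """Prefix each text, then send the batch to an OpenAI-compatible embeddings endpoint."""
    payload = {"model": model, "input": [prefix + t for t in texts]}
    resp = requests.post(url, json=payload, timeout=30)
    resp.raise_for_status()
    # OpenAI-compatible servers return one embedding per input, in order.
    return [item["embedding"] for item in resp.json()["data"]]

query_vecs = embed(["how do I deploy vllm on kubernetes?"], QUERY_PREFIX)
doc_vecs = embed(["vLLM is an inference and serving engine for LLMs."], DOCUMENT_PREFIX)
```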
Hi, has anyone gotten it to work successfully with vLLM?
@tjbck I have tested this locally with an OpenAI-compatible embedding engine that I created. Is there anything else I need to test? The only thing that hasn't been added is a...
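For anyone else wiring up their own engine, this is roughly the smoke test I ran against it; it only checks that the response has the OpenAI embeddings shape, and the endpoint URL and model name are placeholders:

```python
import requests

# Placeholder endpoint/model for a locally running OpenAI-compatible embedding engine.
URL = "http://localhost:8001/v1/embeddings"
resp = requests.post(URL, json={"model": "my-embedding-model", "input": ["hello", "world"]}, timeout=30)
resp.raise_for_status()
body = resp.json()

# Minimal shape check against the OpenAI embeddings response format.
assert body["object"] == "list"
assert len(body["data"]) == 2
for i, item in enumerate(body["data"]):
    assert item["object"] == "embedding"
    assert item["index"] == i
    assert isinstance(item["embedding"], list) and len(item["embedding"]) > 0
print("looks OpenAI-compatible:", len(body["data"][0]["embedding"]), "dims")
```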
I might have come across a solution, referenced in https://github.com/vllm-project/vllm/issues/5484. Setting `NCCL_P2P_DISABLE=1` seems to fix the issue. However, as mentioned in that issue, it might cause performance degradation. Follow-up: `NCCL_P2P_LEVEL=NVL` also...
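For a quick local reproduction outside Kubernetes, this is the kind of thing I mean (in the Deployment above, the same variables would instead go under the container's `env:` section). The model path and `tensor_parallel_size` below are just assumptions for a multi-GPU setup:

```python
import os

# NCCL reads these at communicator init, so set them before the engine
# (and its worker processes) starts.
os.environ["NCCL_P2P_DISABLE"] = "1"       # blunt fix: no P2P at all
# os.environ["NCCL_P2P_LEVEL"] = "NVL"     # gentler: keep P2P, but only over NVLink

from vllm import LLM

# Placeholder model path and parallelism; the P2P issue only shows up with multiple GPUs.
llm = LLM(model="/model/model.file", tensor_parallel_size=2, gpu_memory_utilization=0.95)
print(llm.generate("Hello")[0].outputs[0].text)
```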
@mdobbali Hmm, I am on 0.7.3. Maybe try setting NCCL_P2P_LEVEL instead of disabling P2P entirely? Also, what hardware are you using?
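If PyTorch is available on the node, one quick way to gather that hardware context is to dump the pairwise peer-to-peer capability of the visible GPUs, since that is exactly what these NCCL knobs are working around:

```python
import torch

# Report whether each pair of visible GPUs supports CUDA peer-to-peer access.
# This is useful context when deciding whether to disable or restrict NCCL P2P.
count = torch.cuda.device_count()
for src in range(count):
    for dst in range(count):
        if src != dst:
            ok = torch.cuda.can_device_access_peer(src, dst)
            print(f"GPU {src} -> GPU {dst}: P2P {'available' if ok else 'NOT available'}")
```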