JT
In chapter 9, I am getting the right loss and accuracy values, but when I print out `loss_activation.output[:5]` and the dweights and dbiases, I get the wrong values. Here...
> v0.2.7 is fairly old; can you try with the current v0.4.1?
>
> Also, output from 'collect_env.py' from within the container would be helpful, e.g.
>
> $ kubectl...
Okay so I followed your example with a few modifications

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm
  labels:
    app: vllm
spec:
  replicas: 1
  revisionHistoryLimit: 1
  strategy:
    type: Recreate
  selector:...
```
> I'm seeing the same issue
>
> ```
> python3 -m vllm.entrypoints.openai.api_server --model /model/model.file --port 8001 --trust-remote-code --gpu-memory-utilization 0.95: no such file or directory
> ```
>
> any...
> @jayteaftw I'm seeing RH has a ubi vllm image, and it does work for me, you might want to try this out as well. `quay.io/rh-aiservices-bu/vllm-openai-ubi9:0.4.2`
>
> it will...
Hey @tjbck, I just wanted to follow up, as I have added more functionality to the embedding API to allow for prefixing queries and documents before sending them to an...
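Roughly, the prefixing happens on the client side before the request ever reaches the embedding engine. A minimal sketch, assuming an E5-style model that wants `query:` / `passage:` prefixes and a generic OpenAI-compatible `/v1/embeddings` endpoint (the URL, model name, and prefix strings below are placeholders, not the actual defaults):

```python
import requests

# Hypothetical prefixes; E5-style models expect these, other models use different ones (or none).
QUERY_PREFIX = "query: "
DOCUMENT_PREFIX = "passage: "

def embed(texts, prefix, url="http://localhost:8001/v1/embeddings", model="my-embedding-model"):
    """Prefix each text, then send the batch to an OpenAI-compatible embeddings endpoint."""
    payload = {"model": model, "input": [prefix + t for t in texts]}
    resp = requests.post(url, json=payload, timeout=30)
    resp.raise_for_status()
    # OpenAI-compatible servers return one embedding per input, in order.
    return [item["embedding"] for item in resp.json()["data"]]

query_vecs = embed(["how do I deploy vllm on kubernetes?"], QUERY_PREFIX)
doc_vecs = embed(["vLLM is an inference and serving engine for LLMs."], DOCUMENT_PREFIX)
```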
Hi, has anyone gotten it to work successfully with vLLM?
@tjbck I have tested this locally with an OpenAI-compatible embedding engine that I created. Is there anything else I need to test? The only thing that hasn't been added is a...
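For anyone else wiring up their own engine, this is roughly the smoke test I ran against it; it only checks that the response has the OpenAI embeddings shape, and the endpoint URL and model name are placeholders:

```python
import requests

# Placeholder endpoint/model for a locally running OpenAI-compatible embedding engine.
URL = "http://localhost:8001/v1/embeddings"
resp = requests.post(URL, json={"model": "my-embedding-model", "input": ["hello", "world"]}, timeout=30)
resp.raise_for_status()
body = resp.json()

# Minimal shape check against the OpenAI embeddings response format.
assert body["object"] == "list"
assert len(body["data"]) == 2
for i, item in enumerate(body["data"]):
    assert item["object"] == "embedding"
    assert item["index"] == i
    assert isinstance(item["embedding"], list) and len(item["embedding"]) > 0
print("looks OpenAI-compatible:", len(body["data"][0]["embedding"]), "dims")
```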
I might have come across a solution, referenced in https://github.com/vllm-project/vllm/issues/5484. Setting `NCCL_P2P_DISABLE=1` seems to fix the issue. However, as mentioned in that issue, it might cause performance degradation. Follow-up: `NCCL_P2P_LEVEL=NVL` also...
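For a quick local reproduction outside Kubernetes, this is the kind of thing I mean (in the Deployment above, the same variables would instead go under the container's `env:` section). The model path and `tensor_parallel_size` below are just assumptions for a multi-GPU setup:

```python
import os

# NCCL reads these at communicator init, so set them before the engine
# (and its worker processes) starts.
os.environ["NCCL_P2P_DISABLE"] = "1"       # blunt fix: no P2P at all
# os.environ["NCCL_P2P_LEVEL"] = "NVL"     # gentler: keep P2P, but only over NVLink

from vllm import LLM

# Placeholder model path and parallelism; the P2P issue only shows up with multiple GPUs.
llm = LLM(model="/model/model.file", tensor_parallel_size=2, gpu_memory_utilization=0.95)
print(llm.generate("Hello")[0].outputs[0].text)
```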
@mdobbali Hmm, I am on 0.7.3. Maybe try setting NCCL_P2P_LEVEL instead of disabling P2P entirely? Also, what hardware are you using?
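If PyTorch is available on the node, one quick way to gather that hardware context is to dump the pairwise peer-to-peer capability of the visible GPUs, since that is exactly what these NCCL knobs are working around:

```python
import torch

# Report whether each pair of visible GPUs supports CUDA peer-to-peer access.
# This is useful context when deciding whether to disable or restrict NCCL P2P.
count = torch.cuda.device_count()
for src in range(count):
    for dst in range(count):
        if src != dst:
            ok = torch.cuda.can_device_access_peer(src, dst)
            print(f"GPU {src} -> GPU {dst}: P2P {'available' if ok else 'NOT available'}")
```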