JT issues

Results: 6 issues of JT

Failing to start a TGI pod with 2 or more GPUs. Sharding fails.

### System Info Platform: OpenShift Nvidia GPU Operator already installed Image: ghcr.io/huggingface/text-generation-inference:1.4 Device: L40s ``` oc exec -n nvidia-gpu-operator daemonset.apps/nvidia-driver-daemonset-415.92.202403270524-0 -- nvidia-smi Wed May 1 00:12:25 2024 +-----------------------------------------------------------------------------------------+ | NVIDIA-SMI...

Podman error on Red Hat 9?

I am trying to deploy rayLLM locally, and these are the commands I am running ``` cache_dir=${XDG_CACHE_HOME:-$HOME/.cache} podman run -it --device nvidia.com/gpu=0 --security-opt=label=disable --shm-size 20g -p 8000:8000 -e HF_HOME=/home/ray/data -v...

[Usage]: How do you set up vLLM to work in a k8s/OpenShift cluster?

### Your current environment Edit 1 ```text Collecting environment information... PyTorch version: 2.2.1+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A OS:...

usage

feat: Support for instruct/prefixing embeddings

# Changelog Entry ### Description - Adds support for [Sentence Transformer Instructor Models](https://sbert.net/docs/sentence_transformer/pretrained_models.html#instructor-models) and Custom Ollama and OpenAI Compatible APIs that host an instruction embedding model. This will allow for...

[Bug]: Waiting for output from MQLLMEngine. Hangs and then crashes after about 1 hour

### Your current environment The output of `python collect_env.py` ```text Collecting environment information... PyTorch version: 2.5.1+cu124 Is debug build: False CUDA used to build PyTorch: 12.4 ROCM used to build...

bug

feat: Confirming inputs passed to MCP/tools

### Check Existing Issues - [x] I have searched the existing issues and discussions. ### Problem Description I want to be able to determine what data the LLM wants to...