Carsten Owerfeldt

Results 1 comments of Carsten Owerfeldt

Here is a working version of call_api that uses the OpenAI client to connect to a model hosted by vLLM using: ``` python -m vllm.entrypoints.openai.api_server \ --model hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 \ --quantization...