Carsten Owerfeldt
Results
1
comments of
Carsten Owerfeldt
Here is a working version of call_api that uses the OpenAI client to connect to a model hosted by vLLM using: ``` python -m vllm.entrypoints.openai.api_server \ --model hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 \ --quantization...