Rane2021

8 comments by Rane2021

Hi @JoeLoser, I followed your steps, but I still get this error: 06:06:24.438 INFO: 3209999 MainThread: max.serve: Settings: api_types=[, ] offline_inference=False host='0.0.0.0' port=8000 metrics_port=8001 allowed_image_roots=[] max_local_image_bytes=20000000 logs_console_level='INFO' logs_otlp_level=None...

From the error message, I can see that my modified source code wasn't being used; the server was still running your precompiled max. I built with `./bazelw build //...`. After the compilation, how can I...

@zRzRzRzRzRzRzR Can this vLLM tool-calling issue be resolved? We want to integrate with many agent applications but are blocked by it, which severely limits how the model can be used.

> You can visit https://huggingface.co/models?search=gptq to download our DeepSeek R1 distilled 7B model, but we currently do not provide the full R1 model. You can use our toolkit to quantize...

One more question: have you tested whether there are any issues with DeepSeek R1 GPTQ inference? Can it be served with the `vllm serve --quantization gptq` method?
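For reference, a minimal sketch of the invocation being asked about might look like the following. The model ID is a placeholder assumption, not the specific GPTQ checkpoint discussed here; substitute the actual quantized model path or Hugging Face ID.

```shell
# Sketch: serve a GPTQ-quantized model with vLLM's OpenAI-compatible server.
# "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B" is a placeholder model ID;
# replace it with the actual GPTQ checkpoint you downloaded.
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
  --quantization gptq \
  --port 8000

# Once the server is up, query the OpenAI-compatible endpoint:
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
       "prompt": "Hello",
       "max_tokens": 16}'
```

Note that recent vLLM versions can usually auto-detect GPTQ quantization from the checkpoint config, so `--quantization gptq` mainly forces that path explicitly.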