Results 9 comments of O.T

> If using vLLM for inference (PyTorch model, FP16), I believe...

@wilson1yan Can you share the shell/bash script for setting up the inference server via vLLM for the PyTorch model in FP16?
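
For anyone else following this, here is a minimal sketch of the kind of launch script being asked about, assuming the standard vLLM OpenAI-compatible server entrypoint; the model path and port are placeholders, not the setup @wilson1yan actually used:

```bash
#!/usr/bin/env bash
# Sketch: serve a PyTorch (Hugging Face format) checkpoint with vLLM in FP16.
# MODEL_PATH is hypothetical; point it at the actual checkpoint directory or HF repo id.
MODEL_PATH="/path/to/hf_model"

python -m vllm.entrypoints.openai.api_server \
    --model "$MODEL_PATH" \
    --dtype float16 \
    --tensor-parallel-size 1 \
    --port 8000
```

Once it is up, requests go to the OpenAI-style endpoints (e.g. `/v1/completions`) on the chosen port.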

Also just commenting to prevent closure of the issue since it is one that I am also tracking!

Restarted, still don't see anything ... (on Windows)

I don't think I've ever gone over the 120k token limit, and it also works once I restart my VS Code for some reason...

So does vLLM support it now or not?