Fireblade2534

Results: 62 comments by Fireblade2534

When I use this command: `vllm serve Qwen/Qwen3-30B-A3B-FP8 --tensor-parallel-size 2 --enable-reasoning --reasoning-parser deepseek_r1 --host 0.0.0.0 --port 6060` I get this error: `[multiproc_executor.py:470] ValueError("type fp8e4nv not supported in this architecture....`
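
For context, that Triton error usually means the GPU's compute capability is too old for the FP8 kernels; as far as I know, `fp8e4nv` needs compute capability 8.9+ (Ada/Hopper), though that exact threshold is an assumption on my part, not something from the traceback. A quick sketch to check what the cards report:

```python
# Sketch: print each visible GPU's compute capability.
# Assumption: Triton's fp8e4nv dtype needs compute capability >= (8, 9)
# (Ada Lovelace / Hopper); older architectures raise "not supported in this architecture".
import torch

for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    name = torch.cuda.get_device_name(i)
    ok = (major, minor) >= (8, 9)
    print(f"GPU {i}: {name} (sm_{major}{minor}) -> fp8e4nv {'likely supported' if ok else 'likely unsupported'}")
```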

@wangjia184 Why not use the api_key param so it is compatible with the OpenAI spec and API design in general?

@wangjia184 https://platform.openai.com/docs/api-reference/authentication @RBEmerson970 I think that having the option for authentication is a good idea as long as it can be disabled. Also, the implementation in this PR is not...

Agreed, authentication should be optional.
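
A minimal sketch of what optional, OpenAI-compatible auth could look like with a FastAPI dependency; the `API_KEY` env var and the route are my own illustration, not the implementation in the PR:

```python
# Sketch: optional Bearer-token auth, compatible with OpenAI clients that send
# "Authorization: Bearer <api_key>". If API_KEY is unset, auth is disabled.
# The env var name and route are illustrative, not taken from the PR.
import os
from fastapi import Depends, FastAPI, HTTPException, Request

app = FastAPI()
API_KEY = os.environ.get("API_KEY")  # leave unset to disable authentication

async def require_api_key(request: Request) -> None:
    # Auth is optional: if no key is configured, every request is allowed.
    if API_KEY is None:
        return
    auth = request.headers.get("Authorization", "")
    if auth != f"Bearer {API_KEY}":
        raise HTTPException(status_code=401, detail="Invalid or missing API key")

@app.get("/v1/models", dependencies=[Depends(require_api_key)])
async def list_models():
    return {"object": "list", "data": []}
```

Standard OpenAI clients already send the `api_key` they are constructed with as an `Authorization: Bearer ...` header, so nothing would need to change on the client side.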

How is this an issue? It achieves concurrency because every request is run in a different thread (this is default FastAPI behaviour). As far as I know this...
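
To illustrate the default behaviour I mean: FastAPI runs plain `def` endpoints in a threadpool, so a slow request does not block the others. A self-contained sketch (the route and sleep time are just for demonstration):

```python
# Sketch: FastAPI's default handling of sync endpoints.
# A plain "def" route is executed in a worker thread from a threadpool,
# so two simultaneous requests to /slow overlap instead of running one after the other.
import time
from fastapi import FastAPI

app = FastAPI()

@app.get("/slow")
def slow_endpoint():
    # Blocking work; because this is a sync def, FastAPI offloads it to a thread.
    time.sleep(5)
    return {"status": "done"}
```

Run it with uvicorn and fire two requests at the same time: both finish in roughly 5 seconds rather than 10.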

I also do not even have CUDA 12.4 installed.

@shivarajd Does #350 fix your issue? (You will have to clone the branch to test it, btw.)

@shivarajd The demo here: https://huggingface.co/spaces/Remsky/Kokoro-TTS-Zero is quite old. Do the voices sound muffled when you run the API locally? Also, as far as I know, Kokoro was not really trained...

@shivarajd Can you please give me some examples and test sentences?