Andy Dai

Results: 12 comments of Andy Dai

Previously, I encountered an issue: when running vLLM (an older version) on a machine with multiple GPUs (a CI environment) without specifying `CUDA_VISIBLE_DEVICES`, it kept allocating memory on one single GPU, which then...
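For context, the usual workaround is to pin the process to a device before CUDA initializes. A minimal sketch, assuming vLLM's standard Python API; the GPU index and model tag here are placeholders, not from the original comment:

```python
import os

# Pin this process to one GPU *before* torch/vLLM initialize CUDA.
# "0" is a placeholder; in CI you would use the GPU assigned to the job.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")

from vllm import LLM  # import only after the env var is set

# Example model tag for illustration.
llm = LLM(model="facebook/opt-125m")
```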

That's true... 😥 I tried several ways to deal with the order of args; unfortunately, there could always be some corner cases that break it, and it is not graceful at...

Cool, let me work on this. Also, looking ahead, do you think there is a need to clean up the argument-handling logic a bit? 😥

That's understandable... We need to keep supporting what users are already using. I don't have good ideas right now, though. Let me just work on this issue first 😄

I tried the method you mentioned, but it seems like a model_tag value has to be passed on every parse, and previously parsed results are not stored. Let me know...

> > I tried the method you mentioned, but it seems like a model_tag value has to be passed on every parse, and previously parsed results are not stored. >...

> > I believe that's the intended behavior, as earlier args take precedence over later ones. How about we create a new namespace object to store the args in each...
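If I read the suggestion right, it maps onto argparse's ability to parse into an existing `Namespace`: attributes already set on the namespace are not reset to their defaults on a later parse, so earlier explicit args survive. A minimal sketch of that behavior; the parser and arguments are illustrative, not vLLM's actual ones:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("model_tag")  # positional, so it must be supplied on every parse
parser.add_argument("--port", type=int, default=8000)

# Reuse one Namespace across repeated parses to accumulate results.
ns = argparse.Namespace()
parser.parse_args(["my-model", "--port", "9000"], namespace=ns)
parser.parse_args(["my-model"], namespace=ns)  # --port not given this time

print(ns.port)  # 9000: argparse only applies the default if the attribute is absent
```

This also shows the friction mentioned above: because `model_tag` is positional, it has to be passed on every call to parse.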

I see. 😥 But I just wonder: if the first issue is there anyway, why bother parsing multiple times, which introduces the second issue? I feel like the current PR...

I see. Then let me look into `vllm chat/complete` as well. Aren't there unit tests for these cases? `pytest tests/test_utils.py` did not fail because of my change locally; it...

My solution deals with `serve` separately; a command format for `serve` like the one below is required. I think it was already required like this before my fix, so my solution is somehow...
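The comment is cut off before the command format it references. Based on vLLM's documented CLI, it presumably has the model tag as a positional argument before any flags, something like the following (an assumption, since the original snippet is truncated):

```
vllm serve <model_tag> [options...]
```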