shyringo
I've also run into this issue. Wondering if anyone is interested in solving it; it should only take a few checks and a few lines of code.
> I find that we need to explicitly run `del llm.llm_engine.driver_worker` to release it when using a single worker. Can anybody explain why this is the case?

I tried the...
> I tried the above code block and also this line `del llm.llm_engine.driver_worker`. Both failed for me.
>
> But I managed, with the following code, to terminate the vllm.LLM(),...
> Tried this including `ray.shutdown()` but the memory is not released on my end, any other suggestion?

Could try the `del llm.llm_engine.model_executor` in the following code instead:

> update: the...
> did that as well, still no change in GPU memory allocation. Not sure how to go further

Then I do not have a clue either. Meanwhile, I should add...
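For anyone skimming this thread, the delete-and-collect approach discussed in the exchange above boils down to something like the sketch below. This is only an illustration, not code from any comment here: the engine attributes (`llm_engine.driver_worker` / `llm_engine.model_executor`) are simply the ones quoted above and may differ between vLLM versions, and the model name is a placeholder.

```python
import gc

import ray
import torch
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model, not from the thread
print(llm.generate(["Hello"], SamplingParams(max_tokens=8)))

# Drop the engine's reference to the GPU worker(s) so the weights and
# KV cache become unreachable (attribute names as quoted in this thread).
del llm.llm_engine.model_executor  # or: del llm.llm_engine.driver_worker
del llm

# Collect the now-unreferenced objects and return cached CUDA blocks.
gc.collect()
torch.cuda.empty_cache()

# Tear down distributed / Ray state if it was initialized.
if torch.distributed.is_initialized():
    torch.distributed.destroy_process_group()
ray.shutdown()
```

As several replies above note, this sequence does not free the memory for everyone, so treat it as a starting point rather than a guaranteed fix.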
> > this issue makes vllm impossible for production use

At present, we have found a workaround and set the swap space directly to 0. This way, we...
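As a concrete illustration of that workaround (again a sketch, not code from the comment): vLLM's engine arguments include a `swap_space` setting for the CPU swap size in GiB, which can be passed through the `LLM` constructor.

```python
from vllm import LLM

# Hypothetical example of the swap-space workaround described above:
# disable the CPU swap space entirely when building the engine.
llm = LLM(
    model="facebook/opt-125m",  # placeholder model, not from the thread
    swap_space=0,               # CPU swap size in GiB; 0 disables it
)
```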
Met the same issue in Offline Batched Inference: execution got stuck at the `LLM()` line and would not continue. GPU memory was occupied, but GPU utilization stayed at 0%.
#1908 might be related, but in 'Offline Batched Inference' mode.
Same error while slime was using Megatron to train a model. Detailed logs: (MegatronTrainRayActor pid=57143) rollout 0: {'rollout/raw_reward': 0.46875, 'rollout/total_lengths': 6880.0625, 'rollout/response_lengths': 6724.4375, 'rollout/rewards': 3.725290298461914e-09, 'rollout/truncated': 0.515625, 'rollout/rollout_log_probs': -0.3078222069889307, 'rollout/ref_log_probs': -0.30862119793891907,...