shyringo
I've also run into this issue. Wondering if anyone is interested in solving it; it should only take a few checks and a few lines of code.
> I find that we need to explicitly run `del llm.llm_engine.driver_worker` to release it when using a single worker. Can anybody explain why this is the case?

I tried the...
> I tried the above code block and also this line `del llm.llm_engine.driver_worker`. Both failed for me.
>
> But I managed, with the following code, to terminate the vllm.LLM(),...
> Tried this including `ray.shutdown()` but the memory is not released on my end, any other suggestion?

Could try the `del llm.llm_engine.model_executor` in the following code instead:

> update: the...
> did that as well, still no change in GPU memory allocation. Not sure how to go further

Then I do not have a clue either. Meanwhile, I should add...
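For anyone skimming this thread, the delete-and-collect approach discussed in the exchange above boils down to something like the sketch below. This is only an illustration, not code from any comment here: the engine attributes (`llm_engine.driver_worker` / `llm_engine.model_executor`) are simply the ones quoted above and may differ between vLLM versions, and the model name is a placeholder.

```python
import gc

import ray
import torch
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model, not from the thread
print(llm.generate(["Hello"], SamplingParams(max_tokens=8)))

# Drop the engine's reference to the GPU worker(s) so the weights and
# KV cache become unreachable (attribute names as quoted in this thread).
del llm.llm_engine.model_executor  # or: del llm.llm_engine.driver_worker
del llm

# Collect the now-unreferenced objects and return cached CUDA blocks.
gc.collect()
torch.cuda.empty_cache()

# Tear down distributed / Ray state if it was initialized.
if torch.distributed.is_initialized():
    torch.distributed.destroy_process_group()
ray.shutdown()
```

As several replies above note, this sequence does not free the memory for everyone, so treat it as a starting point rather than a guaranteed fix.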
> > this issue makes vllm impossible for production use

At present, we have found a workaround and set the swap space directly to 0. This way, we...
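As a concrete illustration of that workaround (again a sketch, not code from the comment): vLLM's engine arguments include a `swap_space` setting for the CPU swap size in GiB, which can be passed through the `LLM` constructor.

```python
from vllm import LLM

# Hypothetical example of the swap-space workaround described above:
# disable the CPU swap space entirely when building the engine.
llm = LLM(
    model="facebook/opt-125m",  # placeholder model, not from the thread
    swap_space=0,               # CPU swap size in GiB; 0 disables it
)
```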
Met the same issue in Offline Batched Inference: execution got stuck at the `LLM()` line and would not continue. GPU memory was occupied, but GPU utilization stayed at 0%.
#1908 might be related, but in 'Offline Batched Inference' mode.
Same error while slime was using Megatron to train a model. Detailed logs: (MegatronTrainRayActor pid=57143) rollout 0: {'rollout/raw_reward': 0.46875, 'rollout/total_lengths': 6880.0625, 'rollout/response_lengths': 6724.4375, 'rollout/rewards': 3.725290298461914e-09, 'rollout/truncated': 0.515625, 'rollout/rollout_log_probs': -0.3078222069889307, 'rollout/ref_log_probs': -0.30862119793891907,...