[env] fix: Move uvloop==0.21.0 dependency to requirement files for avoiding task errors

Open duesdues opened this issue 3 months ago • 6 comments

What does this PR do?

Installing an incompatible newer version of uvloop causes coroutine and asynchronous task failures. To resolve this, uvloop==0.21.0 should be moved to requirements-npu.txt and requirements.txt. Related issue: https://github.com/volcengine/verl/issues/3806.

Checklist Before Starting

  • [x] Search for similar PRs. Paste at least one query link here: ...
  • [x] Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

Not related.

API and Usage Example

Not related.

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Not related.

[!IMPORTANT] Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

duesdues avatar Nov 21 '25 10:11 duesdues

There should be test cases in the CI. Since there were no strict version constraints before, why did those cases pass? Can you figure it out? @duesdues

tardis-key avatar Nov 25 '25 07:11 tardis-key

> There should be test cases in the CI. Since there were no strict version constraints before, why did those cases pass? Can you figure it out? @duesdues

The earliest report in the community is from October 18 (see https://github.com/volcengine/verl/issues/3806). The uvloop release history on PyPI shows that the problematic version, 0.22.1, was released on October 17.

See https://pypi.org/project/uvloop/#history

Perhaps the community's CI should reinstall everything from requirements.txt on each run, so that breakage triggered by new releases of third-party packages can be detected.
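A minimal sketch of the kind of drift check suggested above, assuming pins are written as `name==version` lines. The helper names `parse_pins`, `find_drift`, and `installed_versions` are hypothetical and not part of verl or its CI; only `importlib.metadata` from the standard library is used.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package: version} for every 'name==version' line."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0]   # drop trailing comments
        line = line.split(";", 1)[0]  # drop environment markers
        line = line.strip()
        if "==" not in line:
            continue  # unpinned or range-constrained entries are skipped
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def installed_versions(names):
    """Look up installed versions; packages that are missing map to None."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_drift(pins, installed):
    """Return {package: (pinned, installed)} for every mismatch."""
    return {
        name: (pinned, installed.get(name))
        for name, pinned in pins.items()
        if installed.get(name) != pinned
    }
```

In a CI job this could be called as `find_drift(pins, installed_versions(pins))` after parsing the requirement file, failing the job when the result is non-empty. It only catches exact pins; range constraints would need a real specifier parser.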

@tardis-key @duesdues @wuxibin89 @FightingZhen

wlf-darkmatter avatar Nov 25 '25 08:11 wlf-darkmatter

I have checked that the latest sglang CI image has uvloop==0.22.1.

docker run --rm -it --entrypoint /bin/bash verlai/verl:vllm011.dev77
pip list | grep uvloop
uvloop                             0.21.0
docker run --rm -it verlai/verl:sgl055.dev2 -- bash bash
pip list | grep uvloop
uvloop                             0.22.1
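The same check can be sketched in Python, for example inside either container. This is only an illustration: `KNOWN_GOOD_UVLOOP`, `uvloop_status`, and `installed_uvloop_version` are hypothetical names, and treating 0.21.0 as the expected version is an assumption taken from this PR's title.

```python
from importlib import metadata

# Assumption: 0.21.0 is the version this PR pins and the vllm image reports.
KNOWN_GOOD_UVLOOP = "0.21.0"


def installed_uvloop_version():
    """Return the installed uvloop version string, or None if it is absent."""
    try:
        return metadata.version("uvloop")
    except metadata.PackageNotFoundError:
        return None


def uvloop_status(installed, expected=KNOWN_GOOD_UVLOOP):
    """Describe how the installed version relates to the expected one."""
    if installed is None:
        return "uvloop is not installed"
    if installed == expected:
        return f"uvloop {installed} matches the expected version"
    return f"uvloop {installed} differs from the expected {expected}"


print(uvloop_status(installed_uvloop_version()))
```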

wuxibin89 avatar Nov 26 '25 02:11 wuxibin89

verl's requirements are set in setup.py (https://github.com/volcengine/verl/blob/main/setup.py#L26-L45), so we should pin uvloop in setup.py; requirements.txt is no longer used.
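As a rough sketch of what such a pin looks like in a setup.py-style dependency list: the list below is a hypothetical one-entry excerpt, not the actual contents of verl's setup.py, and `pinned_version` is an illustrative helper for confirming that a package carries an exact `==` pin.

```python
# Hypothetical excerpt in the style of a setup.py dependency list;
# the real entries in verl's setup.py are not reproduced here.
install_requires = [
    "uvloop==0.21.0",  # exact pin discussed in this PR
]


def pinned_version(requirements, package):
    """Return the version a package is pinned to with '==', else None."""
    for spec in requirements:
        name, sep, version = spec.partition("==")
        if sep and name.strip().lower() == package.lower():
            return version.strip()
    return None
```

A list like this would normally be passed to `setup(install_requires=...)`, so the pin takes effect whenever the package is installed with pip.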

wuxibin89 avatar Nov 26 '25 02:11 wuxibin89

> I have checked that the latest sglang CI image has uvloop==0.22.1.
>
> docker run --rm -it --entrypoint /bin/bash verlai/verl:vllm011.dev77
> pip list | grep uvloop
> uvloop                             0.21.0
> docker run --rm -it verlai/verl:sgl055.dev2 -- bash bash
> pip list | grep uvloop
> uvloop                             0.22.1

Issues such as https://github.com/volcengine/verl/issues/3822 and https://github.com/volcengine/verl/issues/3806 use vllm as the rollout backend; no issue associated with sglang has been found so far. There may be a difference between sglang and vllm that still needs to be figured out.

wlf-darkmatter avatar Nov 26 '25 04:11 wlf-darkmatter

> verl's requirements are set in setup.py (https://github.com/volcengine/verl/blob/main/setup.py#L26-L45), so we should pin uvloop in setup.py; requirements.txt is no longer used.

Okay, I have moved the uvloop pin to setup.py.

duesdues avatar Nov 27 '25 06:11 duesdues