torchx icon indicating copy to clipboard operation
torchx copied to clipboard

Support Ray 2.0 by getting rid of ray.train.util dependency

Open ntlm1686 opened this issue 3 years ago • 9 comments

Ray 2.0 made get_address_and_port no longer public + added a bunch of pyre changes. We need to find a replacement for get_address_and_port, fix the types and then upgrade to Ray 2.0.

#584

Test plan:

CI

ntlm1686 avatar Aug 23 '22 18:08 ntlm1686

OK, duplicated #586

ntlm1686 avatar Aug 23 '22 18:08 ntlm1686

you'll have to unpin ray version as well https://github.com/pytorch/torchx/commit/b70811e11ed4ed5d3b9dca62f177177135b7990f

d4l3k avatar Aug 23 '22 19:08 d4l3k

@d4l3k I can. But I think there is something else needs to be fixed as well. For example, ray.node.Node is moved to ray._private.node.Node.

Should we find a replacement or just use ray._private?

ntlm1686 avatar Aug 23 '22 19:08 ntlm1686

@ljjsalt you can just change that type to "object" since we don't actually interact with it (just pass it back to Ray) and it seems like those ray methods are untyped

d4l3k avatar Aug 23 '22 19:08 d4l3k

Codecov Report

Merging #587 (7eee11a) into main (d3f38a2) will increase coverage by 0.17%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #587      +/-   ##
==========================================
+ Coverage   94.77%   94.94%   +0.17%     
==========================================
  Files          67       67              
  Lines        4055     4137      +82     
==========================================
+ Hits         3843     3928      +85     
+ Misses        212      209       -3     
Impacted Files Coverage Δ
torchx/pipelines/kfp/adapter.py 98.52% <ø> (ø)
torchx/runner/events/__init__.py 100.00% <ø> (ø)
torchx/schedulers/kubernetes_scheduler.py 93.80% <ø> (ø)
torchx/schedulers/ray_scheduler.py 95.23% <ø> (ø)
torchx/specs/finder.py 96.98% <ø> (ø)
torchx/components/dist.py 96.42% <100.00%> (+7.06%) :arrow_up:
torchx/runner/api.py 96.91% <100.00%> (+0.01%) :arrow_up:
torchx/runner/events/api.py 67.64% <100.00%> (+0.98%) :arrow_up:
torchx/schedulers/ray/ray_common.py 100.00% <100.00%> (ø)
torchx/schedulers/ray/ray_driver.py 98.29% <100.00%> (+2.45%) :arrow_up:
... and 1 more

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov[bot] avatar Aug 23 '22 19:08 codecov[bot]

@ljjsalt can you update the pyre and fix the unit tests?

pyre --output=json | pyre-upgrade fixme

@d4l3k It seems this command generated many pyre errors.

ntlm1686 avatar Aug 28 '22 19:08 ntlm1686

ah yeah -- that probably wasn't quite the right thing to do -- looks like you were missing some local dependencies. Did you install all of dev-requirements.txt? Are you running in a virtualenv?

d4l3k avatar Aug 30 '22 17:08 d4l3k

@d4l3k Dependencies are installed with pip install -e '.[dev]'. Except from pyre-extensions==0.0.21, only 0.0.29 works for me. Then I keep getting different pyre outputs from CI.

ntlm1686 avatar Sep 02 '22 01:09 ntlm1686

I didn't even see the code which causes the pyre error image image

ntlm1686 avatar Sep 02 '22 01:09 ntlm1686

Closing since we support ray 2.x now

d4l3k avatar Mar 17 '23 21:03 d4l3k