Jiajie Li
> @ljjsalt @d4l3k > > I have been thinking more about this and I am wondering if ray.util.queue is a better way of implementing this. > > you basically create...
> If we did want to separate the concerns here with scheduling vs retries we could use the builtin task max_retries config though that has some other implications. > >...
> ``` > TimeoutError: Placement group creation timed out. Make sure your cluster either has enough resources or use an autoscaling cluster. Current resources available: {'memory': 18038862642.0, 'CPU': 8.0, 'node:10.130.6.66':...
@d4l3k May I ask whether you plan to support elastic training on Ray? I would like to contribute to this feature.
@d4l3k Yes, I am interested. Let's find a time to sync up. Does Thursday next week work for you?
OK, this is a duplicate of #586.
@d4l3k I can. But I think there is something else that needs to be fixed as well. For example, `ray.node.Node` has been moved to `ray._private.node.Node`. Should we find a replacement or just...
> @ljjsalt can you update the pyre and fix the unit tests? > > `pyre --output=json | pyre-upgrade fixme` @d4l3k It seems this command generated many pyre errors.
@d4l3k Dependencies are installed with `pip install -e '.[dev]'`, except for `pyre-extensions==0.0.21`; only `0.0.29` works for me. After that, I keep getting pyre output that differs from CI.