
How to add nodes during training?

Open laochonlam opened this issue 1 year ago • 3 comments

Hi @insujang ,

Thanks for open-sourcing Oobleck, great work!

From the paper, the experiments seem to show that Oobleck supports both adding and removing nodes during training.

I successfully ran Oobleck with node failures (removing nodes), but I couldn't find a way to add nodes dynamically during training. Could you let me know how to make it work?

Thank you!
Lam

laochonlam, Oct 09 '24 17:10

Hi @laochonlam ,

All experiments in the paper were done with a Bamboo simulator: we measured the throughput and reconfiguration overhead of every configuration and then combined those measurements. The current code does not include an implementation for adding nodes. This is future work. I think simply running reconfiguration would be enough, but I still need to try it.
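To make "simply running reconfiguration" concrete, here is a purely illustrative sketch. It is not Oobleck's code or API. It assumes, loosely following the pipeline-template idea from the Oobleck paper, that each pipeline must have one of a fixed set of precomputed node counts and that at least `min_pipelines` pipelines must exist. The names `plan_pipelines` and `on_membership_change` are hypothetical. Under these assumptions, a node joining is handled exactly like a node leaving: re-plan for the new node count and reassign nodes.

```python
from __future__ import annotations

from functools import lru_cache
from typing import Optional


def plan_pipelines(num_nodes: int, template_sizes: list[int],
                   min_pipelines: int = 1) -> list[int]:
    """Hypothetical planner: pick pipeline sizes (each taken from template_sizes)
    that use exactly num_nodes nodes, with at least min_pipelines pipelines,
    using as few pipelines as possible under that constraint."""
    sizes = sorted(set(template_sizes), reverse=True)

    @lru_cache(maxsize=None)
    def best(remaining: int, needed: int) -> Optional[tuple[int, ...]]:
        # Shortest tuple of sizes summing to `remaining` with at least
        # `needed` entries, or None if no such combination exists.
        if remaining == 0:
            return () if needed <= 0 else None
        result = None
        for s in sizes:
            if s <= remaining:
                rest = best(remaining - s, max(needed - 1, 0))
                if rest is not None and (result is None or len(rest) + 1 < len(result)):
                    result = (s,) + rest
        return result

    plan = best(num_nodes, min_pipelines)
    if plan is None:
        raise ValueError(f"cannot cover {num_nodes} nodes with sizes {sizes}")
    return list(plan)


def on_membership_change(nodes: list[str], template_sizes: list[int],
                         min_pipelines: int = 1) -> list[list[str]]:
    """Hypothetical hook: the same re-planning runs whether nodes were added
    or removed; nodes are then split into contiguous groups, one per pipeline."""
    sizes = plan_pipelines(len(nodes), template_sizes, min_pipelines)
    groups, start = [], 0
    for s in sizes:
        groups.append(nodes[start:start + s])
        start += s
    return groups
```

For example, with template sizes `[2, 3, 4]` and `min_pipelines=2`, 8 nodes are planned as two 4-node pipelines; after a ninth node joins, re-planning yields pipelines of 4, 3, and 2 nodes. A real implementation would also have to redistribute model state and the global batch across the new pipelines, which this sketch ignores.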

insujang, Oct 09 '24 17:10

Got it, I'll give that a try. Thank you for your prompt response!

Lam

laochonlam, Oct 09 '24 20:10

Let me leave this issue open so I can work on it later :) You are also welcome to open a PR that adds support for node addition.

insujang, Oct 09 '24 20:10