
[Feature]: Shut off instances when they have finished a task and they don't hear from the server for some time

Open spott opened this issue 1 year ago • 9 comments

Problem

Currently, only the server shuts down instances. If the server isn't running for whatever reason, an instance will sit idle, running up an expensive compute bill.

Solution

If an instance hasn't heard from the server for some set period of time (5 min? 10?), and is idle, then have it shut itself down.
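A minimal sketch of the check I have in mind, running on the instance. All names here (`InstanceState`, `should_self_terminate`) are hypothetical and not part of dstack; the 5 minute timeout is just the value floated above. It only covers the decision, not how the instance would actually terminate itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class InstanceState:
    """Hypothetical snapshot of what the instance knows about itself."""
    job_running: bool               # True while a job is still executing
    last_server_contact: datetime   # last time the server reached the instance


def should_self_terminate(state, now, timeout=timedelta(minutes=5)):
    """Return True only if the instance is idle AND the server has been silent
    for at least `timeout`. A running job is never interrupted."""
    if state.job_running:
        return False
    return now - state.last_server_contact >= timeout
```

For example, an idle instance that last heard from the server 6 minutes ago would shut down, while an instance with a running job would keep going no matter how long the server has been silent.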

Benefit

Saves money. It benefits laptop users who only need the server to submit jobs during the day: at night they can shut their laptop, let the jobs finish, and have the instances shut down on their own. It is also a nice failsafe against any potential bugs in the server.

Alternatives

Find some way to keep the server running?

Would you like to help contributing this feature?

Yes

spott avatar Mar 26 '24 15:03 spott

@spott What if the job is very important, and the user expects it to keep on running even though the server is temporarily down?

peterschmidt85 avatar Mar 27 '24 17:03 peterschmidt85

Sorry, the instance shouldn't stop the job, but after the job is done, if the instance hasn't heard from the server for 5 minutes and is idling, it should shut down.

spott avatar Mar 28 '24 00:03 spott

This issue is stale because it has been open for 30 days with no activity.

peterschmidt85 avatar Apr 27 '24 01:04 peterschmidt85

This issue was closed because it has been inactive for 14 days since being marked as stale.

peterschmidt85 avatar May 14 '24 01:05 peterschmidt85

I'm still interested in this feature.

spott avatar May 24 '24 15:05 spott

@spott, dstack used to have this functionality: the instance would eventually shut itself down if the server failed to do so for some reason. We only recently finished removing it. There were several reasons for the decision, mainly:

  • A new backend integration and its maintenance were more complicated since the instance (the runner) had backend-specific logic. We dropped it to be able to support more clouds faster.
  • It required passing credentials to the instance or granting permissions to create and assume instance roles. Both are unacceptable or undesirable for many dstack users.

So, realistically speaking, we're unlikely to resurrect this feature any time soon. That said, I totally see the need to mitigate the possibility of forgotten idle instances when you start the dstack server locally via CLI. For this case, I'd suggest a different feature that we could implement:

When you interrupt dstack server, it could check for idle instances, and if there are any, it would tell you that and ask you if you're willing to exit anyway. It would solve most of the problems with forgetting idle instances. What do you say?
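A rough sketch of what that prompt could look like. The function name `confirm_exit` and the way idle instances are passed in are assumptions for illustration, not existing dstack code:

```python
def confirm_exit(idle_instances, ask=input):
    """Return True if the server may exit.

    `idle_instances` is a list of instance names (hypothetical input; the real
    server would look these up itself). `ask` is injectable so the prompt can
    be tested without a terminal.
    """
    if not idle_instances:
        return True  # nothing would be left running unattended
    print(f"{len(idle_instances)} idle instance(s) still running:")
    for name in idle_instances:
        print(f"  - {name}")
    answer = ask("Exit anyway? [y/N] ")
    # Default to staying up: only an explicit yes lets the server exit.
    return answer.strip().lower() in ("y", "yes")
```

The default answer is "no", so pressing Enter by accident keeps the server up.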

r4victor avatar May 27 '24 05:05 r4victor

> When you interrupt dstack server, it could check for idle instances

Unfortunately, this doesn't really do much. An instance is only idle for 5 minutes by default after a job has finished, so unless you shut down the server within those 5 minutes, you would always see no idle instances.

The issue is really that having a server that must always be online kills usability for all the laptop users out there, as they now have to keep their laptop awake in order to kill jobs when they are done (and if they aren't aware of that, it costs them money!). A lot of hobbyists who are using something like this don't want to provision a dstack server in the cloud. Thankfully, dstack Sky mitigates some of these issues for hobbyists, but that means any credits or money you have already put into GPU clouds is now forfeit.

I fully understand the issues that this causes, but I still think this would be valuable in reducing onboarding friction. I'm not alone either, as SkyPilot has deemed this important enough to add the option (it's called auto-stop).

spott avatar May 29 '24 21:05 spott

Thank you @spott for valuable feedback. Let us discuss this internally and come up with options.

peterschmidt85 avatar May 29 '24 21:05 peterschmidt85

@spott, we've just released a new version of https://sky.dstack.ai that allows you to configure backends with your own cloud credentials just as you would do in dstack! It seems like it could be a solution to your problem.

r4victor avatar Jun 11 '24 05:06 r4victor

This issue is stale because it has been open for 30 days with no activity.

peterschmidt85 avatar Jul 12 '24 01:07 peterschmidt85

Because dstack Sky supports this now, I'm closing this issue. In case the issue becomes relevant again, we can re-open it!

peterschmidt85 avatar Jul 16 '24 13:07 peterschmidt85