Egor Sklyarov
### Problem Running a service with multiple replicas starts them sequentially, not in parallel. This limitation has no benefit. ### Solution Process all available jobs in parallel, similar to `process_runs` ###...
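The difference between sequential and parallel job processing can be illustrated with a minimal `asyncio` sketch. This is not dstack's actual code: `process_job`, `process_jobs_sequentially`, and `process_jobs_in_parallel` are hypothetical names, and the `sleep` call only stands in for the time it takes to start one replica.

```python
import asyncio
import time


async def process_job(job_id: int) -> int:
    """Hypothetical stand-in for starting one replica's job."""
    await asyncio.sleep(0.1)  # simulates provisioning latency
    return job_id


async def process_jobs_sequentially(job_ids: list[int]) -> list[int]:
    # Each job waits for the previous one to finish.
    return [await process_job(job_id) for job_id in job_ids]


async def process_jobs_in_parallel(job_ids: list[int]) -> list[int]:
    # All jobs are scheduled at once; gather preserves input order.
    return list(await asyncio.gather(*(process_job(j) for j in job_ids)))


def timed(coro) -> tuple[list[int], float]:
    """Run a coroutine and return its result with the elapsed wall time."""
    started = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - started


if __name__ == "__main__":
    jobs = [1, 2, 3]
    _, sequential_time = timed(process_jobs_sequentially(jobs))
    _, parallel_time = timed(process_jobs_in_parallel(jobs))
    print(f"sequential: {sequential_time:.2f}s, parallel: {parallel_time:.2f}s")
```

With three replicas, the sequential version takes roughly three times the per-job latency, while the parallel version takes roughly one.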
### Problem Interrupted instances (spot instances or instances removed outside of dstack) are shown as `idle` for a long time ### Solution If the shim doesn't respond, ask the backend about...
### Problem Resubmitted jobs cause 409 responses from GCP if an interrupted instance has not been fully deleted. This leads to provisioning in more expensive regions ### Solution Use truly unique...
### dstack version master ### Python version - ### Host OS - ### Host Arch - ### What happened? Resubmitting an interrupted job leads to duplicate instance names (while the...
Since SSH keys are added to the instance only during creation, other users cannot add their keys later. The dstack server must update SSH keys on job submission (docker image submission,...
### Problem When the dstack server has a problem, it is hard to investigate without debug logs, which are not always turned on. Re-running the server with a debug...