[Bug]: Cannot run local backend on Linux
Steps to reproduce
https://github.com/dstackai/dstack/blob/087b090582d60b933c00112cd4f4307220bf0c70/runner/README.md
Actual behaviour
The run stays in the provisioning status for a few minutes and then fails.
CLI:
Cannot create container. Error: Failed to run docker pull or docker create: Check CLI and server logs
for more details.
shim:
2024/04/08 18:22:47 Running container, name=selfish-jellyfish-1-0-0, id=995341e472a8fbc9f748bc546f5e4747a6bbd156f17ce07f8d238a1ad7db7b5d
2024/04/08 18:25:13 Container finished successfully, name=selfish-jellyfish-1-0-0, id=995341e472a8fbc9f748bc546f5e4747a6bbd156f17ce07f8d238a1ad7db7b5d
2024/04/08 18:25:13 Cannot open file /tmp/dstack-runner/runners/20240408-182247/runner.log: open /tmp/dstack-runner/runners/20240408-182247/runner.log: no such file or directory
runner:
time=2024-04-08T16:23:12.953532Z level=error msg=Server failed err=listen tcp :10999: bind: address already in use
Expected behaviour
The configuration runs successfully.
dstack version
master
Server logs
No response
Additional information
What happens:
- When submitting a run, the dstack client reserves local port 10999 to then receive logs through it.
- dstack-shim starts dstack-runner in a container in host network mode, so dstack-runner and the client share network interfaces.
- dstack-runner cannot bind to port 10999, as it is reserved by the client.
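The conflict described above can be reproduced outside dstack with two plain sockets in one network namespace. This is a minimal sketch, not dstack code: the helper `try_bind` is a hypothetical name, and the demo lets the OS pick the port rather than hard-coding 10999.

```python
import errno
import socket
from typing import Optional


def try_bind(port: int, host: str = "127.0.0.1") -> Optional[socket.socket]:
    """Return a listening socket on (host, port), or None if the port is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock
    except OSError as e:
        sock.close()
        if e.errno == errno.EADDRINUSE:
            # Same failure the runner logs: "bind: address already in use".
            return None
        raise


# "Client" reserves a port first (port 0 lets the OS choose a free one).
client = try_bind(0)
port = client.getsockname()[1]

# "Runner" shares the same network interfaces, so its bind fails.
runner = try_bind(port)
print("runner bound:", runner is not None)

# Once the client releases the port, the runner can bind to it.
client.close()
runner = try_bind(port)
print("runner bound after release:", runner is not None)
runner.close()
```

With host network mode, the container and the host behave like the two sockets here: whichever process binds first keeps the port.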
Workaround: pass --detach to dstack run. In that case, the client does not reserve the port.
Possible solution: reserve the port on the client only when it is needed, that is, once dstack-runner is up. The client will then choose another port, because 10999 will already be taken by dstack-runner.
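One way the client-side fallback could look is sketched below. It assumes the client is free to use any local port, and it is not taken from the dstack codebase: `reserve_local_port` and its parameters are hypothetical names used only for illustration.

```python
import errno
import socket


def reserve_local_port(preferred: int, attempts: int = 100,
                       host: str = "127.0.0.1") -> socket.socket:
    """Bind to `preferred` if it is free, otherwise to the next free port.

    Intended to be called late, after the runner has started, so the
    runner gets first claim on its own port.
    """
    last_port = min(preferred + attempts, 65536)
    for port in range(preferred, last_port):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError as e:
            sock.close()
            if e.errno != errno.EADDRINUSE:
                raise
            continue  # port is taken, try the next one
        sock.listen(1)
        return sock
    raise RuntimeError(f"no free port in range {preferred}..{last_port - 1}")


# Simulate the runner already holding a port (chosen by the OS here).
runner = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
runner.bind(("127.0.0.1", 0))
runner.listen(1)
runner_port = runner.getsockname()[1]

# The client asks for the same port and falls back to a different one.
client = reserve_local_port(runner_port)
print("runner:", runner_port, "client:", client.getsockname()[1])

client.close()
runner.close()
```

The key point is ordering rather than the scan itself: the reservation happens only after the runner is listening, so the client never holds the port the runner needs.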
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale. Please reopen the issue if it is still relevant.