[Bug]: Cannot run local backend on Linux

Open jvstme opened this issue 1 year ago • 1 comments

Steps to reproduce

https://github.com/dstackai/dstack/blob/087b090582d60b933c00112cd4f4307220bf0c70/runner/README.md

Actual behaviour

The run stays in the provisioning status for a few minutes and then errors out. CLI:

Cannot create container. Error: Failed to run docker pull or docker create:  Check CLI and server logs
for more details.

shim:

2024/04/08 18:22:47 Running container, name=selfish-jellyfish-1-0-0, id=995341e472a8fbc9f748bc546f5e4747a6bbd156f17ce07f8d238a1ad7db7b5d
2024/04/08 18:25:13 Container finished successfully, name=selfish-jellyfish-1-0-0, id=995341e472a8fbc9f748bc546f5e4747a6bbd156f17ce07f8d238a1ad7db7b5d
2024/04/08 18:25:13 Cannot open file /tmp/dstack-runner/runners/20240408-182247/runner.log: open /tmp/dstack-runner/runners/20240408-182247/runner.log: no such file or directory

runner:

time=2024-04-08T16:23:12.953532Z level=error msg=Server failed err=listen tcp :10999: bind: address already in use

Expected behaviour

The configuration runs successfully.

dstack version

master

Server logs

No response

Additional information

What happens:

  • When submitting a run, the dstack client reserves local port 10999 to then receive logs through it.
  • dstack-shim starts dstack-runner in a container in host network mode, so dstack-runner and the client share network interfaces.
  • dstack-runner cannot bind to port 10999, as it is reserved by the client.
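
The conflict described above can be reproduced in isolation. The following is a minimal sketch, not dstack code: `try_listen` and `reproduce_conflict` are hypothetical helpers, and the demo lets the OS pick a free port instead of hard-coding 10999 so that it does not depend on the state of the machine. It assumes both sides bind on the same host network stack, as they do when the container runs in host network mode.

```python
import errno
import socket
from typing import Optional


def try_listen(port: int, host: str = "127.0.0.1") -> Optional[socket.socket]:
    """Return a listening socket on (host, port), or None if the port is in use."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            # Same condition as "bind: address already in use" in the runner log.
            return None
        raise
    return sock


def reproduce_conflict(port: int = 0) -> bool:
    """Return True if a second bind to an already reserved port is refused."""
    # Stand-in for the client reserving its local port (10999 in the issue).
    # port=0 asks the OS for any free port.
    client = try_listen(port)
    if client is None:
        raise RuntimeError("port was already taken before the demo started")
    reserved_port = client.getsockname()[1]
    try:
        # Stand-in for dstack-runner, which shares the host's network interfaces.
        runner = try_listen(reserved_port)
        if runner is not None:
            runner.close()
            return False
        return True
    finally:
        client.close()
```

Calling `reproduce_conflict()` on a Linux host is expected to return `True`: the second bind is refused while the first socket still holds the port.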

Workaround: pass --detach to dstack run. In that case the client will not reserve the port.

Possible solution: have the client reserve the port only when it is needed, after dstack-runner is up. The client will then choose another port, because 10999 will already be taken by dstack-runner.
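
A sketch of the fallback behaviour this relies on, under the assumption that the client is free to use any local port when its preferred one is taken. `reserve_local_port` is a hypothetical helper written for illustration, not a function from the dstack code base.

```python
import errno
import socket


def reserve_local_port(preferred: int, host: str = "127.0.0.1") -> socket.socket:
    """Listen on the preferred port, or on an OS-chosen port if it is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, preferred))
    except OSError as exc:
        sock.close()
        if exc.errno != errno.EADDRINUSE:
            raise
        # The preferred port is already held (for example by dstack-runner),
        # so ask the OS for any free port instead.
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind((host, 0))
    sock.listen()
    return sock
```

The caller reads the port that was actually reserved from `sock.getsockname()[1]`, so the rest of the client does not need to assume a fixed port number.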

jvstme • Apr 09 '24 13:04

This issue is stale because it has been open for 30 days with no activity.

peterschmidt85 • May 10 '24 01:05

This issue was closed because it has been inactive for 14 days since being marked as stale. Please reopen the issue if it is still relevant.

peterschmidt85 • May 24 '24 01:05