
Support committing and pushing a container image from a running session

Open achimnol opened this issue 3 years ago • 1 comments

There are several customer requests for supporting "freezing" their current compute sessions.

Historically, we intentionally have not added this feature due to inherent volatility and security restrictions of Docker containers. The filesystem that users see in the sessions is a composition of multiple local/remote filesystems, such as the container image, vfolders, and scratch directories. For security, we don't expose the root user inside the container by default, so there is usually no way to modify the container filesystem provided by the image because all system packages and directories are owned by root.

To allow installing additional packages into Python user-site paths (pip install --user) and to make them persistent across compute sessions, we have added the following features:

  • #98
  • #99
  • auto-mounting dot-prefixed vfolders (such as ~/.config, ~/.local)

Nevertheless, many HPC/AI customers want to use containers like VMs, and it is not easy to bridge the conceptual gap between the volatile, hermetic nature of containers and the full ownership of volume data that VMs provide.

So, despite the additional caution required when committing a Backend.AI compute session, let's make it technically available.

achimnol avatar Jul 18 '22 07:07 achimnol

Requirements for this feature:

  • [x] (Web-UI) Add a "commit" button in the "Controls" column per compute session.
    • The commit button should be disabled or displayed as a spinning icon while the commit task for the session is ongoing.
  • [x] (Manager) API handler that accepts the request from the client and then relays it to the Agent that runs the target container.
    • Only the owner of the compute session can issue the commit request.
    • A new config parameter that specifies the save location of the .tar.gz file. The location should usually be on a network filesystem.
    • API handler that returns the status of the commit task.
  • [x] (Agent) RPC handler that executes the actual commit operation for the target container.
    • The resulting .tar.gz should be saved to the specified location.
    • RPC handler that returns the status of the commit task.
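
The Manager-side requirements above (owner-only requests, a configurable save location, and a status query) can be sketched roughly as follows. This is only an illustrative sketch, not the actual Backend.AI code: `CommitService`, `CommitStatus`, `PermissionDenied`, `export_root`, and `do_commit` are hypothetical names, and `do_commit` stands in for whatever Agent RPC call performs the real commit and writes the archive.

```python
import asyncio
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path
from typing import Awaitable, Callable, Dict


class CommitStatus(str, Enum):
    READY = "ready"      # no commit task is running for the session
    ONGOING = "ongoing"  # a commit task is still in progress


class PermissionDenied(Exception):
    """Raised when someone other than the session owner requests a commit."""


@dataclass
class CommitService:
    # Hypothetical config parameter: directory (usually on a network
    # filesystem) where the resulting .tar.gz files are saved.
    export_root: Path
    # Hypothetical stand-in for the Agent RPC that commits the container
    # of the given session and writes the archive to the given path.
    do_commit: Callable[[str, Path], Awaitable[None]]
    _tasks: Dict[str, "asyncio.Task[None]"] = field(default_factory=dict, init=False)

    def status(self, session_id: str) -> CommitStatus:
        """Report whether a commit task is currently running for the session."""
        task = self._tasks.get(session_id)
        if task is not None and not task.done():
            return CommitStatus.ONGOING
        return CommitStatus.READY

    def request_commit(self, session_id: str, owner_id: str, requester_id: str) -> Path:
        """Start a background commit task; must be called inside a running event loop."""
        # Only the owner of the compute session may issue the commit request.
        if requester_id != owner_id:
            raise PermissionDenied(f"{requester_id} does not own session {session_id}")
        # Refuse to start a second commit while one is still running.
        if self.status(session_id) is CommitStatus.ONGOING:
            raise RuntimeError(f"commit already in progress for session {session_id}")
        target = self.export_root / f"{session_id}.tar.gz"
        self._tasks[session_id] = asyncio.create_task(self.do_commit(session_id, target))
        return target
```

With this shape, the Web-UI could poll `status()` to decide whether to disable the commit button or show the spinner, while the actual container commit stays behind the `do_commit` callable on the Agent side.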

adrysn avatar Jul 25 '22 04:07 adrysn