
Supporting `SystemUserSpawner` and using the `--user $UID:$GID` flags

Open jmuchovej opened this issue 6 years ago • 3 comments

Feature description: Broadly, support for `PAMAuthenticator`, `SystemUserSpawner`, and the `--user $UID:$GID` flags.

Tying these together, this would allow ml-hub to take advantage of local system users. The primary benefit of this is that in a setting where each user can log in and spin up their own ml-workspace, they now have a way to tie into their home directory on the host file-system. This allows for a single-location, transportable configuration across multiple workspaces, in the cases where a workspace is used as a "project sandbox" (if you will).

Problem and motivation:

  • Why is this change important to you? I've been using ml-hub for a bit and it's great, but I (and other users on my system) find that we're setting up our shell configurations (and cloning projects) quite a bit.
  • What is the problem this feature would solve?
    1. Transporting user configurations and credentials (e.g. ECDSA keys) between workspaces.
    2. Allowing ml-hub to work from local datasets (e.g. someone working on YouTube-8M, where it's difficult to re-download the entire dataset in a reasonable timeframe).
    3. Users with a file-sync service running on the host can have their changes reflected from ml-hub.
    4. Work within ml-workspaces is more transparently accessible.
  • How would you use it? Personally, all the problems this solves are exactly what I'm looking for. While it's challenging to do things like mount datasets directly, I could solve that with some hard-linking. Though this brings to mind another possible feature for admins of ml-hub – specify dataset repositories.
  • How can it benefit other users? I'm not too sure how it would benefit other users, explicitly, but I have a general feeling that once ml-hub supports local user mappings, and if there's a way to port this to singularity, HPCs could be interested in using this along with some smaller teams of ML researchers/developers.

Is this something you're interested in working on? Yea! I was planning to do some digging later this week to figure out how challenging an implementation would be.

jmuchovej, Jan 19 '20 16:01

Hey @ionlights, thanks for the detailed feature request! We really appreciate your effort to make MLHub adaptable to more scenarios. One remark with regard to the --user flag: in case you are referring to the user inside the started workspace container, here is a related issue: https://github.com/ml-tooling/ml-workspace/issues/11. Currently, all processes (tools, scripts, etc.) within the workspace container are executed as the root user. We have not looked into this yet, and I am not sure whether we can do so soon. But perhaps this note helps you.

raethlein, Feb 05 '20 16:02

Hmm... as far as I understand, --user just maps root inside the container to "my" system-wide $UID/$GID. I'm not sure it makes a big difference that ml-workspace currently runs as the root user by default. (Definitely not ideal, but that's a "fundamental limitation" of Docker, [at least] last I checked.)

Just to be clear, I was referring to mimicking PAMAuthenticator and SystemUserSpawner, with the possible addition of the --user flag when spinning up the containers, mostly so multi-user systems don't fall into any kind of "permissions hell."
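
To make the `--user` idea concrete, here is a minimal sketch of the `docker run` arguments a spawner might assemble for one host user. It is only an illustration under stated assumptions, not how ml-hub builds its containers today: the helper name `docker_run_args`, the image name, and the `/workspace` mount target are all placeholders chosen for the example. Whether the workspace image behaves correctly when it is not started as root is exactly the open question in this thread.

```python
import shlex


def docker_run_args(image, uid, gid, host_home, container_home="/workspace"):
    """Build a `docker run` argument list for one host user.

    Hypothetical helper for illustration only. `container_home` is an
    assumed mount target, not a path confirmed by ml-hub or ml-workspace.
    """
    return [
        "docker", "run", "--rm",
        # Run the container's main process with the host user's numeric IDs,
        # so files written to the mounted directory are owned by that user.
        "--user", f"{uid}:{gid}",
        # Bind-mount the user's host home directory into the container.
        "--volume", f"{host_home}:{container_home}",
        image,
    ]


if __name__ == "__main__":
    # Example values are assumptions; on a real host they would come from
    # the system user database (e.g. the `pwd` module) for the logged-in user.
    args = docker_run_args("mltooling/ml-workspace", 1000, 1000, "/home/alice")
    print(shlex.join(args))
```

With matching IDs, anything the workspace writes under the mounted directory shows up on the host with the user's own ownership, which is the "permissions hell" the flag is meant to avoid.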

jmuchovej, Feb 05 '20 16:02

I thought that this sets the user used within the container. Hence, if something inside the container needs root permissions, it might not work. But I have no experience with the --user flag and could be wrong here.

Besides that, making those functionalities (like PAMAuthenticator and SystemUserSpawner) compatible with ml-hub and ml-workspace would be great!

raethlein, Feb 05 '20 17:02