ml-hub: Supporting `SystemUserSpawner` and the `--user $UID:$GID` flag

Feature description: Broadly, support for `PAMAuthenticator`, `SystemUserSpawner`, and the `--user $UID:$GID` flag.

Together, these would let ml-hub take advantage of local system users. The primary benefit is that, in a setting where each user logs in and spins up their own ml-workspace, that workspace could access the user's home directory on the host file system. This gives a single, portable location for configuration that is shared across multiple workspaces, which helps when each workspace is used as a "project sandbox" (if you will).
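A minimal sketch of the home-directory mapping described above, assuming a Linux or macOS host and that each ml-hub username matches a local system account. It looks up the account with Python's standard `pwd` module and builds a Docker bind-mount string. The function name `home_bind_mount` and the choice of mounting at the same path inside the container are hypothetical, not ml-hub's API.

```python
import pwd
from typing import Optional


def home_bind_mount(username: str, container_home: Optional[str] = None) -> str:
    """Return a Docker `-v` value that binds a system user's host home into a container.

    Hypothetical helper for illustration; not part of ml-hub or DockerSpawner.
    """
    # Look up the local account (raises KeyError if no such user exists).
    host_home = pwd.getpwnam(username).pw_dir
    # Assumption: mount at the same path inside the container unless told otherwise.
    target = container_home if container_home is not None else host_home
    return f"{host_home}:{target}:rw"
```

For a typical account, `home_bind_mount("alice")` would yield `/home/alice:/home/alice:rw`, which can be passed to `docker run -v`.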

Problem and motivation:

Why is this change important to you? I've been using ml-hub for a while and it's great, but I (and other users on my system) find that we set up our shell configurations (and clone projects) over and over again.
What problems would this feature solve?
1. Transporting user configurations and credentials (e.g., ECDSA keys) between workspaces.
2. Allowing ml-hub to work with local datasets (e.g., for someone working on YouTube-8M, re-downloading the entire dataset in a reasonable timeframe is difficult).
3. Letting users who run a file-sync service on the host have changes made inside ml-hub picked up by that service.
4. Making work done inside ml-workspaces more transparently accessible from the host.
How would you use it? Personally, this solves exactly the problems I'm facing. Mounting datasets directly is still challenging, but I could work around that with some hard-linking. That brings to mind another possible feature for ml-hub admins: specifying dataset repositories.
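One possible shape for admin-specified dataset repositories, as an alternative to hard-linking, is a fixed table of host directories that every workspace mounts read-only. A minimal sketch under that assumption; the table `DATASET_REPOSITORIES`, the host path, the `/workspace/datasets` mount root, and the function name are all hypothetical.

```python
from typing import Dict, List

# Hypothetical admin-maintained table: dataset name -> directory on the host.
DATASET_REPOSITORIES: Dict[str, str] = {
    "youtube-8m": "/srv/datasets/youtube-8m",
}


def dataset_mount_args(repos: Dict[str, str], mount_root: str = "/workspace/datasets") -> List[str]:
    """Build `docker run` arguments that mount each dataset read-only.

    `mount_root` is an assumed location inside the workspace container.
    """
    args: List[str] = []
    # Sort by name so the generated command line is deterministic.
    for name, host_path in sorted(repos.items()):
        # ":ro" makes the bind mount read-only, so workspaces share one copy without modifying it.
        args += ["-v", f"{host_path}:{mount_root}/{name}:ro"]
    return args
```

Keeping a single read-only copy on the host means a large dataset is downloaded once rather than once per workspace.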
How can it benefit other users? I'm not sure exactly how it would benefit others, but I suspect that once ml-hub supports local user mappings (and if there's a way to port this to Singularity), HPC centers could be interested in using it, as could smaller teams of ML researchers and developers.

Is this something you're interested in working on? Yea! I was planning to do some digging later this week to figure out how challenging an implementation would be.

Jan 19 '20 16:01 jmuchovej

Hey @ionlights, thanks for the detailed feature request! We really appreciate your effort to make MLHub adaptable to more scenarios. One remark regarding the `--user` flag: if you are referring to the user that runs inside the started workspace container, there is a related issue (https://github.com/ml-tooling/ml-workspace/issues/11). Currently, all processes (tools, scripts, etc.) within the workspace container are executed as the root user. We have not looked into this yet, and I am not sure whether we can do so soon, but perhaps this note helps.

Feb 05 '20 16:02 raethlein

Hmm... as far as I understand, `--user` just maps root inside the container to "my" system-wide $UID/$GID. I'm not sure it makes a big difference that ml-workspace currently runs as the root user by default. (Definitely not ideal, but that's a "fundamental limitation" of Docker, at least last I checked.)
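Two details about the flag are worth having at hand. First, bash sets `$UID` but not `$GID`, so shell scripts commonly write `--user "$(id -u):$(id -g)"` instead. Second, Docker's `--user` sets the UID and GID that the container's process runs as; mapping container root onto an unprivileged host ID is a separate daemon feature (user namespace remapping, `userns-remap`). A minimal Python sketch of building the flag for the invoking host user; `docker_user_args` is a hypothetical helper name.

```python
import os
from typing import List, Optional


def docker_user_args(uid: Optional[int] = None, gid: Optional[int] = None) -> List[str]:
    """Return `docker run` arguments equivalent to `--user "$(id -u):$(id -g)"`.

    Hypothetical helper; defaults to the IDs of the process calling it.
    """
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    # Numeric IDs are accepted even if the image has no /etc/passwd entry for them.
    return ["--user", f"{uid}:{gid}"]
```

Files the container writes into a bind-mounted home directory are then owned by that UID:GID on the host, which is what avoids "permissions hell" on shared machines.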

Just to be clear, I was referring to mimicking `PAMAuthenticator` and `SystemUserSpawner`, possibly adding the `--user` flag when spinning up the containers, mostly so that multi-user systems don't fall into "permissions hell."
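For comparison, in a stock JupyterHub deployment these two components are enabled with a few lines of `jupyterhub_config.py`. This is a minimal configuration sketch for plain JupyterHub plus DockerSpawner; it does not show how ml-hub's own spawner would integrate them, and the image tag is an assumption.

```python
# jupyterhub_config.py (sketch for stock JupyterHub + DockerSpawner, not ml-hub's config)
c = get_config()  # noqa: F821 (get_config is injected by JupyterHub when it loads this file)

# Log in with local system accounts via PAM (JupyterHub's default authenticator).
c.JupyterHub.authenticator_class = "jupyterhub.auth.PAMAuthenticator"

# Start one container per system user, bind-mounting that user's host home directory.
c.JupyterHub.spawner_class = "dockerspawner.SystemUserSpawner"
c.SystemUserSpawner.host_homedir_format_string = "/home/{username}"

# Assumed image and tag; settings on DockerSpawner also apply to its SystemUserSpawner subclass.
c.DockerSpawner.image = "mltooling/ml-workspace:latest"
```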

Feb 05 '20 16:02 jmuchovej

I thought this sets the user that runs inside the container. Hence, if something inside the container needs root permissions, it might not work. But I have no experience with the `--user` flag and could be wrong here.

Besides that, making components like `PAMAuthenticator` and `SystemUserSpawner` compatible with ml-hub and ml-workspace would be great!

Feb 05 '20 17:02 raethlein