Supporting `SystemUserSpawner` and using the `--user $UID:$GID` flags
Feature description:
Broadly: Support for PAMAuthenticator, SystemUserSpawner, and --user $UID:$GID flags.
Tying these together, this would allow ml-hub to take advantage of local system users. The primary benefit of this is that in a setting where each user can log in and spin up their own ml-workspace, they now have a way to tie into their home directory on the host file-system. This allows for a single-location, transportable configuration across multiple workspaces, in the cases where a workspace is used as a "project sandbox" (if you will).
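To make the idea concrete, here is a rough sketch of the container invocation I have in mind. The helper `build_workspace_command` is hypothetical and not part of ml-hub, and the image name `mltooling/ml-workspace` is an assumption on my part. It only builds the `docker run` arguments from the host's account database; it does not start anything, and I have not checked whether the workspace image behaves correctly under a non-root UID.

```python
import pwd


def build_workspace_command(username, image="mltooling/ml-workspace"):
    """Return a `docker run` argv that starts `image` as the host user `username`.

    Hypothetical helper for illustration only; not part of ml-hub.
    The default image name is an assumption.
    """
    # Look up the host account; raises KeyError if the user does not exist.
    account = pwd.getpwnam(username)
    return [
        "docker", "run", "--rm",
        # Run container processes with the host user's numeric UID and GID.
        "--user", f"{account.pw_uid}:{account.pw_gid}",
        # Bind-mount the host home directory at the same path in the container.
        "--volume", f"{account.pw_dir}:{account.pw_dir}",
        # Point HOME at the mounted directory so shell configuration is found.
        "--env", f"HOME={account.pw_dir}",
        image,
    ]
```

Because the files in the mounted home directory are read and written with the same numeric UID and GID as on the host, ownership stays consistent across workspaces. A hub-side spawner would presumably do the equivalent through the Docker API rather than the CLI.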
Problem and motivation:
- Why is this change important to you? I've been using `ml-hub` for a bit and it's great, but I (and other users on my system) find that we're setting up our shell configurations (and cloning projects) quite a bit.
- What is the problem this feature would solve?
  - Transporting user configurations and credentials (e.g. ECDSA keys) between workspaces.
  - Allow `ml-hub` to work from local datasets (e.g. someone working on YouTube-8M – it's difficult to redownload the entire dataset in a reasonable timeframe.)
  - Users with a file-sync service running on the host can have their changes reflected from `ml-hub`.
  - Work within `ml-workspaces` is more transparently accessible.
- How would you use it? Personally, all the problems this solves are exactly what I'm looking for. While it's challenging to do things like mount datasets directly, I could solve that with some hard-linking. Though this brings to mind another possible feature for admins of `ml-hub` – specify dataset repositories.
- How can it benefit other users? I'm not too sure how it would benefit other users, explicitly, but I have a general feeling that once `ml-hub` supports local user mappings, and if there's a way to port this to `singularity`, HPCs could be interested in using this along with some smaller teams of ML researchers/developers.
Is this something you're interested in working on? Yea! I was planning to do some digging later this week to figure out how challenging an implementation would be.
Hey @ionlights, thanks for the detailed feature request!
We really appreciate your effort to make MLHub adaptable for more scenarios.
One remark with regards to the --user flag: in case you refer to the user who is used within the started workspace container, here is a related issue: https://github.com/ml-tooling/ml-workspace/issues/11
Currently, all processes (tools, scripts etc.) within the workspace container are executed as the root user. We have not looked into this yet and I am not sure whether we can do so soon. But perhaps this note helps you.
Hmm... as far as I understand --user just maps root inside the container to "my" system-wide $UID/$GID. I'm not sure it makes a big difference that ml-workspace currently runs the root user by default. (Definitely not ideal, but that's a "fundamental limitation" of Docker, [at least] last I checked.)
Just to be clear, I was referring to mimicking PAMAuthenticator and SystemUserSpawner. With a possible addition of the --user flag when spinning up the containers – mostly so multi-user systems don't fall into any kind of "permissions hell."
I thought that this is the user used within the container. Hence, if something inside the container needs root permissions, it might not work, but I have no experience with the --user flag and could be wrong here.
Besides that, making those functionalities (like PAMAuthenticator and SystemUserSpawner) compatible with ml-hub and ml-workspace would be great!