SageMaker inference should be able to run as non-root user.
Describe the bug
When running as a non-root user within a container, sagemaker-inference fails to start the multi-model-server. This works when all packages are installed as root, and the entrypoint script is run as root. The entrypoint script starts the model server using:
sagemaker_inference.model_server.start_model_server(......)
To reproduce
- Install the libraries as in the Dockerfile snippet:
RUN ["useradd", "-ms", "/bin/bash", "-d", "/home/<user>", "<user>" ]
ENV CUSTOM_INFERENCE_DIR=/home/<user>/custom_inference
RUN mkdir -p ${CUSTOM_INFERENCE_DIR}
COPY code/* ${CUSTOM_INFERENCE_DIR}/
RUN chown -R <user>:root ${CUSTOM_INFERENCE_DIR}
RUN chmod -R +rwx ${CUSTOM_INFERENCE_DIR}
USER <user>
RUN pip install mxnet-model-server multi-model-server sagemaker-inference
RUN pip install retrying
NOTE: Running a CLI
Expected behavior
SageMaker MMS should start without any issues.
Screenshots or logs
File "/home/<user>/.local/lib/python3.6/site-packages/retrying.py", line 49, in wrapped_f
return Retrying(*dargs, **dkw).call(f, *args, **kw)
File "/home/<user>/.local/lib/python3.6/site-packages/retrying.py", line 206, in call
return attempt.get(self._wrap_exception)
File "/home/<user>/.local/lib/python3.6/site-packages/retrying.py", line 247, in get
six.reraise(self.value[0], self.value[1], self.value[2])
File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
raise value
File "/home/<user>/.local/lib/python3.6/site-packages/retrying.py", line 200, in call
attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
File "/home/<user>/custom_inference/entrypoint.py", line 21, in _start_mms
model_server.start_model_server(handler_service=HANDLER_SERVICE)
File "/home/<user>/.local/lib/python3.6/site-packages/sagemaker_inference/model_server.py", line 77, in start_model_server
_create_model_server_config_file()
File "/home/<user>/.local/lib/python3.6/site-packages/sagemaker_inference/model_server.py", line 143, in _create_model_server_config_file
utils.write_file(MMS_CONFIG_FILE, configuration_properties)
File "/home/<user>/.local/lib/python3.6/site-packages/sagemaker_inference/utils.py", line 47, in write_file
with open(path, mode) as f:
PermissionError: [Errno 13] Permission denied: '/etc/sagemaker-mms.properties'
Checking on my development machine as well, it doesn't seem like non-root user has access to /etc.
Can this library be updated so as to run as non-root user?
System information
sagemaker-inference==1.5.2
-
Custom Docker image, ubuntu based.
-
framework name: tensorflow
-
framework version: 2.3.0
-
Python version: 3.6
-
processing unit type: CPU
-
Additional context
I worked-around this initial problem by granting write access to the /etc folder but it would be ideal if the configuration were stored in a user-writeable directory.
If I may, could I suggest alternative paths to /etc
join( base_dir, "etc") join( base_dir, "conf") join( base_dir, "config")
Rationale
sagemaker-inference already has a concept of base_dir.
base_dir defaults to /opt/ml, which is already pretty reasonable.
It is also configurable through the SAGEMAKER_BASE_DIR environment variable, which is great.
sagemaker-inference already uses join(base_dir, "models") to store models, and join(base_dir, "models/code") to store python code. With that, adding the configs into the base_dir is a reasonable extension.
I am not super sure about the name to choose for the leaf directory to store the config files. All etc, conf and config sound reasonable to me. Any suggestions from the maintainers ?
Context I would actually recommend to mark this ticket as a bug rather than an enchancement. I am more than happy to add some more context. This may be stating the obvious rather than adding much new information.
sagemaker-inference library may be designed to be run in multi-model-endpoints and writing to the /etc path may be OK for the runtime in sagemaker endpoints. However, manipulating the /etc directory has several other problems in enterprise software packaging and delivery.
In the bring your own container approach, we would like to build a container ourselves using the best practices and toolchains from our enterprise CICD stack. Due to myriad of security and compliance reasons, container build systems drop the root user and restrict many capabilities on the only allowed non-privileged user in a container. After building our container in our CICD stack, we may also like to run some integration tests on that container, which may require spawning the multi-model-server via the sagemaker-inference. The non-privileged user in those integration tests also does not have permission to modify the /etc directory.
Any updates here? Thanks
Coming up against this as well.
To an extent you can hack around it in the container, by setting locations to ones owned by a non-root user/chowning/chmodding things. This will get the model server/this toolkit running nicely locally. (chmod a+rw /tmp etc.)
However, the more fundamental issue is that sagemaker itself isn't compatible with running inference code as a non-root user, so as soon as you try to use this container on an inference endpoint, you'll hit permissions issues again, when it attempts to copy across your model etc.
Worse, if you try to work around this, the solution will always be brittle. Knowing the currently required permissions isn't enough - AWS might change their implementation and deployments/autoscaling will just start failing :/
The only options I can come up with are all not great:
- Give up on security best-practice, beg for security scan exemptions.
- Violate the "SageMaker needs root" requirement stated in the docs, patch it to work, pray AWS doesn't change anything important.
- Don't use SageMaker inference endpoints :(