sagemaker-inference-toolkit SageMaker inference should be able to run as non-root user.

Describe the bug

When running as a non-root user within a container, sagemaker-inference fails to start the multi-model-server. This works when all packages are installed as root, and the entrypoint script is run as root. The entrypoint script starts the model server using:

sagemaker_inference.model_server.start_model_server(......)

To reproduce

Install the libraries as in the Dockerfile snippet:

RUN ["useradd", "-ms", "/bin/bash", "-d", "/home/<user>", "<user>"]
ENV CUSTOM_INFERENCE_DIR=/home/<user>/custom_inference
RUN mkdir -p ${CUSTOM_INFERENCE_DIR}
COPY code/* ${CUSTOM_INFERENCE_DIR}/
RUN chown -R <user>:root ${CUSTOM_INFERENCE_DIR}
RUN chmod -R +rwx ${CUSTOM_INFERENCE_DIR}
USER <user>
RUN pip install mxnet-model-server multi-model-server sagemaker-inference
RUN pip install retrying

NOTE: Running a CLI

Expected behavior

SageMaker MMS should start without any issues.

Screenshots or logs

  File "/home/<user>/.local/lib/python3.6/site-packages/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/home/<user>/.local/lib/python3.6/site-packages/retrying.py", line 206, in call
    return attempt.get(self._wrap_exception)
  File "/home/<user>/.local/lib/python3.6/site-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
    raise value
  File "/home/<user>/.local/lib/python3.6/site-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/home/<user>/custom_inference/entrypoint.py", line 21, in _start_mms
    model_server.start_model_server(handler_service=HANDLER_SERVICE)
  File "/home/<user>/.local/lib/python3.6/site-packages/sagemaker_inference/model_server.py", line 77, in start_model_server
    _create_model_server_config_file()
  File "/home/<user>/.local/lib/python3.6/site-packages/sagemaker_inference/model_server.py", line 143, in _create_model_server_config_file
    utils.write_file(MMS_CONFIG_FILE, configuration_properties)
  File "/home/<user>/.local/lib/python3.6/site-packages/sagemaker_inference/utils.py", line 47, in write_file
    with open(path, mode) as f:
PermissionError: [Errno 13] Permission denied: '/etc/sagemaker-mms.properties'

Checking on my development machine as well, a non-root user does not appear to have write access to /etc.

Can this library be updated so that it can run as a non-root user?

System information

sagemaker-inference==1.5.2

Custom Docker image, ubuntu based.
- framework name: tensorflow
- framework version: 2.3.0
- Python version: 3.6
- processing unit type: CPU

Additional context

I worked around this initial problem by granting write access to the /etc folder, but it would be ideal if the configuration were stored in a user-writable directory.
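To illustrate what "a user-writable directory" could mean in practice, here is a minimal Python sketch that picks the first directory the current user can write to. The helper names (first_writable_dir, choose_config_path) and the candidate list are hypothetical and are not part of sagemaker-inference; only the file name sagemaker-mms.properties is taken from the traceback above.

```python
import os
import tempfile


def first_writable_dir(candidates):
    """Return the first existing directory the current user can write to, or None."""
    for directory in candidates:
        # W_OK | X_OK: creating a file requires write and search permission on the directory.
        if os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK):
            return directory
    return None


def choose_config_path(filename="sagemaker-mms.properties", candidates=None):
    """Build a config file path inside the first writable candidate directory."""
    if candidates is None:
        # Assumed search order for illustration: /etc first, then the system temp dir.
        candidates = ["/etc", tempfile.gettempdir()]
    directory = first_writable_dir(candidates)
    if directory is None:
        raise PermissionError("no writable directory among: {}".format(candidates))
    return os.path.join(directory, filename)
```

As root this would still resolve to /etc, while a non-root user would typically fall through to the temp directory instead of hitting the PermissionError.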

Oct 13 '20 14:10 aykulkarni

If I may, I would like to suggest some alternatives to the /etc path:

- join(base_dir, "etc")
- join(base_dir, "conf")
- join(base_dir, "config")

Rationale

sagemaker-inference already has a concept of base_dir.

base_dir defaults to /opt/ml, which is already pretty reasonable. It is also configurable through the SAGEMAKER_BASE_DIR environment variable, which is great.

sagemaker-inference already uses join(base_dir, "models") to store models, and join(base_dir, "models/code") to store python code. With that, adding the configs into the base_dir is a reasonable extension.

I am not sure which name to choose for the leaf directory that stores the config files. All of etc, conf, and config sound reasonable to me. Any suggestions from the maintainers?
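As a rough sketch of this proposal, the config path could be derived from base_dir as shown below. The function name config_file_path and its parameters are hypothetical, not the library's actual API. The /opt/ml default and the SAGEMAKER_BASE_DIR variable are as described above, and the file name is taken from the traceback in the original report.

```python
import os

DEFAULT_BASE_DIR = "/opt/ml"  # default base_dir as described in this thread


def config_file_path(leaf_dir="etc", filename="sagemaker-mms.properties", environ=None):
    """Return <base_dir>/<leaf_dir>/<filename>, honouring SAGEMAKER_BASE_DIR if set."""
    environ = os.environ if environ is None else environ
    base_dir = environ.get("SAGEMAKER_BASE_DIR", DEFAULT_BASE_DIR)
    # leaf_dir is the open question: "etc", "conf" or "config".
    return os.path.join(base_dir, leaf_dir, filename)
```

With this layout, a non-root user would only need write access to base_dir, which they can already point to a directory they own through SAGEMAKER_BASE_DIR.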

Context

I would actually recommend marking this ticket as a bug rather than an enhancement. I am more than happy to add some more context, though this may be stating the obvious rather than adding much new information.

The sagemaker-inference library may be designed to run in multi-model endpoints, and writing to the /etc path may be fine for the runtime in SageMaker endpoints. However, manipulating the /etc directory causes several other problems in enterprise software packaging and delivery.

In the bring-your-own-container approach, we would like to build a container ourselves using the best practices and toolchains from our enterprise CICD stack. For a myriad of security and compliance reasons, container build systems drop the root user and restrict many capabilities of the only allowed non-privileged user in a container. After building our container in our CICD stack, we may also want to run some integration tests on that container, which may require spawning the multi-model-server via sagemaker-inference. The non-privileged user in those integration tests does not have permission to modify the /etc directory either.

Oct 29 '20 20:10 ahakanbaba

Any updates here? Thanks

May 23 '24 18:05 ArtemioPadilla

Coming up against this as well. To an extent you can hack around it in the container, by setting locations to ones owned by a non-root user/chowning/chmodding things. This will get the model server/this toolkit running nicely locally. (chmod a+rw /tmp etc.)

However, the more fundamental issue is that SageMaker itself isn't compatible with running inference code as a non-root user, so as soon as you try to use this container on an inference endpoint, you'll hit permissions issues again when it attempts to copy across your model etc.

Worse, if you try to work around this, the solution will always be brittle. Knowing the currently required permissions isn't enough - AWS might change their implementation and deployments/autoscaling will just start failing :/

The only options I can come up with are all not great:

Give up on security best-practice, beg for security scan exemptions.
Violate the "SageMaker needs root" requirement stated in the docs, patch it to work, pray AWS doesn't change anything important.
Don't use SageMaker inference endpoints :(

Jun 27 '24 23:06 fahran-wallace