Local mode does not work on EC2 instances

Open MatthewCaseres opened this issue 3 years ago • 0 comments

Describe the bug

This is the AMI that I am using: torch-ubuntu

I installed docker-compose by setting up Docker's repository as described here: https://docs.docker.com/engine/install/ubuntu/

When I run a processing job in local mode, the SDK still raises ImportError: 'docker-compose' is not installed.

To reproduce

Use the same EC2 AMI, install docker-compose, and attempt to run PyTorchProcessor in local mode.

Expected behavior

docker-compose is installed, so local mode should run without raising this ImportError.

Screenshots or logs

Job Name:  local_processor_constructor-2022-07-11-18-52-12-390
Inputs:  [{'InputName': 'code', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-507925425112/local_processor_constructor-2022-07-11-18-52-12-390/source/sourcedir.tar.gz', 'LocalPath': '/opt/ml/processing/input/code/', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}, {'InputName': 'entrypoint', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-507925425122/local_processor_constructor-2022-07-11-18-52-12-390/source/runproc.sh', 'LocalPath': '/opt/ml/processing/input/entrypoint', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}]
Outputs:  []
Traceback (most recent call last):
  File "/home/ubuntu/torch/runner.py", line 14, in <module>
    torch_procesor.run(
  File "/home/ubuntu/.local/lib/python3.9/site-packages/sagemaker/processing.py", line 1608, in run
    return super().run(
  File "/home/ubuntu/.local/lib/python3.9/site-packages/sagemaker/workflow/pipeline_context.py", line 209, in wrapper
    return run_func(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/sagemaker/processing.py", line 554, in run
    self.latest_job = ProcessingJob.start_new(
  File "/home/ubuntu/.local/lib/python3.9/site-packages/sagemaker/processing.py", line 778, in start_new
    processor.sagemaker_session.process(**process_args)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/sagemaker/session.py", line 943, in process
    self._intercept_create_request(process_request, submit, self.process.__name__)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/sagemaker/session.py", line 4230, in _intercept_create_request
    return create(request)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/sagemaker/session.py", line 941, in submit
    self.sagemaker_client.create_processing_job(**request)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/sagemaker/local/local_session.py", line 115, in create_processing_job
    container = _SageMakerContainer(
  File "/home/ubuntu/.local/lib/python3.9/site-packages/sagemaker/local/image.py", line 91, in __init__
    raise ImportError(
ImportError: 'docker-compose' is not installed. Local Mode features will not work without docker-compose. For more information on how to install 'docker-compose', please, see https://docs.docker.com/compose/install/
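
The error message names an executable called 'docker-compose', so a useful first check is whether that exact name is on the PATH of the user running the script, as opposed to the 'docker compose' CLI plugin. The sketch below is a diagnostic only; it assumes (not confirmed by this report) that the SDK's check looks for a standalone 'docker-compose' executable, and the function name compose_status is made up for illustration.

```python
import shutil
import subprocess


def compose_status():
    """Report which Docker Compose entry points are visible to this Python process.

    Hypothetical helper for diagnosis; it is not part of the SageMaker SDK.
    """
    status = {
        # Standalone executable, the name quoted in the ImportError message.
        "docker-compose": shutil.which("docker-compose"),
        # The Docker CLI itself, required for the `docker compose` plugin form.
        "docker": shutil.which("docker"),
        # Set to True below only if `docker compose version` exits with code 0.
        "docker compose plugin": False,
    }
    if status["docker"]:
        try:
            result = subprocess.run(
                ["docker", "compose", "version"],
                capture_output=True,
                text=True,
                timeout=10,
            )
            status["docker compose plugin"] = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            # Leave the plugin flag as False if the command cannot be run.
            pass
    return status


if __name__ == "__main__":
    for name, value in compose_status().items():
        print(f"{name}: {value}")
```

If the plugin line reports True while 'docker-compose' reports None, Compose is present only as a plugin, which would be consistent with the error above under the stated assumption.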

System information

This is the code -

from sagemaker.pytorch import PyTorchProcessor

torch_procesor = PyTorchProcessor(
    framework_version="1.9.0",
    role="arn:aws:iam::507925425112:role/sagemaker-studio-execution-role",
    instance_count=1,
    instance_type="local",
    py_version="py38",
    base_job_name="local_processor_constructor",
)

torch_procesor.run(
    code="sleeper.py",
    source_dir=".",
)

This is my environment

PyTorch version: 1.12.0
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.22.3
Libc version: glibc-2.31

Python version: 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:56:21)  [GCC 10.3.0] (64-bit runtime)
Python platform: Linux-5.13.0-1031-aws-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.6.124
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 510.73.08
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.22.4
[pip3] torch==1.12.0
[pip3] torch-model-archiver==0.5.3b20220226
[pip3] torch-workflow-archiver==0.2.4b20220513
[pip3] torchaudio==0.12.0
[pip3] torchdata==0.4.0
[pip3] torchserve==0.6.0b20220513
[pip3] torchtext==0.13.0
[pip3] torchvision==0.13.0
[conda] blas                      2.115                       mkl    conda-forge
[conda] blas-devel                3.9.0            15_linux64_mkl    conda-forge
[conda] captum                    0.5.0                         0    pytorch
[conda] cudatoolkit               11.6.0              hecad31d_10    conda-forge
[conda] libblas                   3.9.0            15_linux64_mkl    conda-forge
[conda] libcblas                  3.9.0            15_linux64_mkl    conda-forge
[conda] liblapack                 3.9.0            15_linux64_mkl    conda-forge
[conda] liblapacke                3.9.0            15_linux64_mkl    conda-forge
[conda] magma-cuda116             2.6.1                         0    pytorch
[conda] mkl                       2022.1.0           h84fe81f_915    conda-forge
[conda] mkl-devel                 2022.1.0           ha770c72_916    conda-forge
[conda] mkl-include               2022.1.0           h84fe81f_915    conda-forge
[conda] numpy                     1.22.4                   pypi_0    pypi
[conda] pytorch                   1.12.0          py3.9_cuda11.6_cudnn8.3.2_0    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torch-model-archiver      0.5.3                    py39_0    pytorch
[conda] torch-workflow-archiver   0.2.4                    py39_0    pytorch
[conda] torchaudio                0.12.0               py39_cu116    pytorch
[conda] torchserve                0.6.0                    py39_0    pytorch
[conda] torchtext                 0.13.0                     py39    pytorch
[conda] torchvision               0.13.0               py39_cu116    pytorch

MatthewCaseres, Jul 11 '22 19:07