Please help me slim a GPU Docker image

Open xs818818 opened this issue 3 years ago • 7 comments

For example, nvcr.io/nvidia/pytorch:21.10-py3

xs818818 avatar Sep 20 '22 08:09 xs818818

Do you have an example PyTorch application, and how do you use it?

kcq avatar Sep 29 '22 06:09 kcq

I have been trying this too. @kcq

This is my Dockerfile

FROM nvcr.io/nvidia/pytorch:22.08-py3
WORKDIR /
ADD run.py /
CMD [ "python", "run.py" ]

and this is run.py

import torch
from torchvision.models.resnet import resnet18

model = resnet18()
model.eval().to("cuda:0").half()
x = torch.rand(1, 3, 224, 224).to("cuda:0").half()
_ = model(x)

I ran these commands to make the slim image

docker build -t pytorch_fat:1.0 .
docker-slim build --http-probe=false --cro-runtime=nvidia pytorch_fat:1.0

I can see the two images

pytorch_fat.slim    latest         e29f67494610   2 minutes ago    2.9GB
pytorch_fat            1.0         8f7954c308c3   10 minutes ago   14.6GB

But if I run the slim image, I get this error because .so files were removed:

nvidia-docker run --rm -it pytorch_fat.slim:latest
Error processing line 1 of /opt/conda/lib/python3.8/site-packages/google_auth-2.9.1-py3.10-nspkg.pth:

  Traceback (most recent call last):
    File "/opt/conda/lib/python3.8/site.py", line 169, in addpackage
      exec(line)
    File "<string>", line 1, in <module>
    File "<frozen importlib._bootstrap>", line 553, in module_from_spec
  AttributeError: 'NoneType' object has no attribute 'loader'

Remainder of file ignored
Error processing line 1 of /opt/conda/lib/python3.8/site-packages/matplotlib-3.5.2-py3.8-nspkg.pth:

  Traceback (most recent call last):
    File "/opt/conda/lib/python3.8/site.py", line 169, in addpackage
      exec(line)
    File "<string>", line 1, in <module>
    File "<frozen importlib._bootstrap>", line 553, in module_from_spec
  AttributeError: 'NoneType' object has no attribute 'loader'

Remainder of file ignored
Error processing line 1 of /opt/conda/lib/python3.8/site-packages/protobuf-3.20.1-py3.8-nspkg.pth:

  Traceback (most recent call last):
    File "/opt/conda/lib/python3.8/site.py", line 169, in addpackage
      exec(line)
    File "<string>", line 1, in <module>
    File "<frozen importlib._bootstrap>", line 553, in module_from_spec
  AttributeError: 'NoneType' object has no attribute 'loader'

Remainder of file ignored
Error processing line 1 of /opt/conda/lib/python3.8/site-packages/sphinxcontrib_applehelp-1.0.2-py3.8-nspkg.pth:

  Traceback (most recent call last):
    File "/opt/conda/lib/python3.8/site.py", line 169, in addpackage
      exec(line)
    File "<string>", line 1, in <module>
    File "<frozen importlib._bootstrap>", line 553, in module_from_spec
  AttributeError: 'NoneType' object has no attribute 'loader'

Remainder of file ignored
Error processing line 1 of /opt/conda/lib/python3.8/site-packages/sphinxcontrib_devhelp-1.0.2-py3.8-nspkg.pth:

  Traceback (most recent call last):
    File "/opt/conda/lib/python3.8/site.py", line 169, in addpackage
      exec(line)
    File "<string>", line 1, in <module>
    File "<frozen importlib._bootstrap>", line 553, in module_from_spec
  AttributeError: 'NoneType' object has no attribute 'loader'

Remainder of file ignored
Error processing line 1 of /opt/conda/lib/python3.8/site-packages/sphinxcontrib_htmlhelp-2.0.0-py3.9-nspkg.pth:

  Traceback (most recent call last):
    File "/opt/conda/lib/python3.8/site.py", line 169, in addpackage
      exec(line)
    File "<string>", line 1, in <module>
    File "<frozen importlib._bootstrap>", line 553, in module_from_spec
  AttributeError: 'NoneType' object has no attribute 'loader'

Remainder of file ignored
Error processing line 1 of /opt/conda/lib/python3.8/site-packages/sphinxcontrib_jsmath-1.0.1-py3.7-nspkg.pth:

  Traceback (most recent call last):
    File "/opt/conda/lib/python3.8/site.py", line 169, in addpackage
      exec(line)
    File "<string>", line 1, in <module>
    File "<frozen importlib._bootstrap>", line 553, in module_from_spec
  AttributeError: 'NoneType' object has no attribute 'loader'

Remainder of file ignored
Error processing line 1 of /opt/conda/lib/python3.8/site-packages/sphinxcontrib_qthelp-1.0.3-py3.8-nspkg.pth:

  Traceback (most recent call last):
    File "/opt/conda/lib/python3.8/site.py", line 169, in addpackage
      exec(line)
    File "<string>", line 1, in <module>
    File "<frozen importlib._bootstrap>", line 553, in module_from_spec
  AttributeError: 'NoneType' object has no attribute 'loader'

Remainder of file ignored
Error processing line 1 of /opt/conda/lib/python3.8/site-packages/sphinxcontrib_serializinghtml-1.1.5-py3.9-nspkg.pth:

  Traceback (most recent call last):
    File "/opt/conda/lib/python3.8/site.py", line 169, in addpackage
      exec(line)
    File "<string>", line 1, in <module>
    File "<frozen importlib._bootstrap>", line 553, in module_from_spec
  AttributeError: 'NoneType' object has no attribute 'loader'

Remainder of file ignored
Traceback (most recent call last):
  File "run.py", line 1, in <module>
    import torch
  File "/opt/conda/lib/python3.8/site-packages/torch/__init__.py", line 201, in <module>
    _load_global_deps()
  File "/opt/conda/lib/python3.8/site-packages/torch/__init__.py", line 154, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/opt/conda/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libopen-rte.so.40: cannot open shared object file: No such file or directory

Can slim be used for applications that use PyTorch?
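
One way to narrow down which shared objects go missing is to compare `ldd` output for the library that fails to load in the fat image and in the slim image. Below is a minimal sketch, not part of slim itself: `parse_ldd` and `inspect` are hypothetical helper names, and it assumes `ldd` is available in whichever image it runs in (it may not be in the slim one). Paths that resolve in the fat image are candidates to keep explicitly, for example through slim's include options if your version provides them.

```python
import re
import subprocess
import sys
from typing import Dict, List, Tuple

# Matches ldd lines such as:
#   libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f...)
#   libopen-rte.so.40 => not found
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(not found|\S+)")


def parse_ldd(output: str) -> Tuple[Dict[str, str], List[str]]:
    """Split ldd output into resolved {name: path} and a list of missing names."""
    resolved: Dict[str, str] = {}
    missing: List[str] = []
    for line in output.splitlines():
        match = _LDD_LINE.match(line)
        if not match:
            continue  # e.g. the vdso line, which has no "=>" target path
        name, target = match.group(1), match.group(2)
        if target == "not found":
            if name not in missing:
                missing.append(name)
        elif target.startswith("/"):
            resolved[name] = target
    return resolved, missing


def inspect(path: str) -> Tuple[Dict[str, str], List[str]]:
    """Run ldd on one shared object and parse the result."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return parse_ldd(result.stdout)


if __name__ == "__main__":
    # Pass one or more .so paths on the command line.
    for so_path in sys.argv[1:]:
        resolved, missing = inspect(so_path)
        for name, target in sorted(resolved.items()):
            print(f"{so_path}: {name} -> {target}")
        for name in missing:
            print(f"{so_path}: MISSING {name}")
```

Running it against the shared objects under the `torch/lib` directory in the fat image should list where `libopen-rte.so.40` lives there, which is the file the traceback above reports as missing.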

ganessh22 avatar Apr 12 '23 16:04 ganessh22

Thank you for sharing your Dockerfile and app info @ganessh22 It's super helpful for the repro!

kcq avatar Apr 12 '23 17:04 kcq

So, do we need a GPU on the CI runner servers? haha

maxpain avatar Sep 09 '23 02:09 maxpain

I haven't had enough cycles to investigate. I don't have a local machine with an NVIDIA GPU. I will try to repro it on AWS.

kcq avatar Sep 11 '23 04:09 kcq