Ability to run without Docker?
I tried to convert the replicator into a Singularity image to be able to use it on a Docker-less cluster:
```
singularity pull docker://deepops/replicator:201015
```
This worked just fine and generated `replicator_201015.sif`. Then off to replicating (note: I needed `PYTHONNOUSERSITE=1`, otherwise packages from `~/.local/lib/python3.6` were getting in the way; I'd suggest setting this variable in the container proactively):
```
singularity run --env=PYTHONNOUSERSITE=1 -B /tmp:/output \
    replicator_201015.sif --project=nvidia --min-version=17.12 \
    --image=tensorflow --image=pytorch --image=tensorrt \
    --singularity \
    --dry-run \
    --api-key=`cat ~/.ngc_api_key.txt`
```
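The `PYTHONNOUSERSITE` suggestion above would be a one-line change when the image is built. As a sketch (I haven't seen the replicator's actual Dockerfile, so treat this as a hypothetical addition to it):

```dockerfile
# Disable user site-packages so ~/.local/lib/python3.6 on the host can
# never shadow the container's own dependencies when run under Singularity.
ENV PYTHONNOUSERSITE=1
```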
Unfortunately, the run crashes, citing the lack of a Docker daemon:
```
2021-03-24 11:35:39,056 - ngc_replicator.ngc_replicator - 30 - INFO - Initializing Replicator
2021-03-24 11:35:40,501 - nvidia_deepops.docker.registry.ngcregistry - 126 - INFO - GET https://api.ngc.nvidia.com/v2/orgs - took 0.5812202040106058 sec
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
Warning: failed to get default registry endpoint from daemon (Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?). Using system default: https://index.docker.io/v1/
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Traceback (most recent call last):
  File "/usr/local/bin/ngc_replicator", line 33, in <module>
    sys.exit(load_entry_point('ngc-replicator==0.4.0', 'console_scripts', 'ngc_replicator')())
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/ngc_replicator-0.4.0-py3.6.egg/ngc_replicator/ngc_replicator.py", line 344, in main
    replicator = Replicator(**config)
  File "/usr/local/lib/python3.6/site-packages/ngc_replicator-0.4.0-py3.6.egg/ngc_replicator/ngc_replicator.py", line 39, in __init__
    self.nvcr_client.login(username="$oauthtoken", password=api_key, registry="nvcr.io/v2")
  File "/usr/local/lib/python3.6/site-packages/nvidia_deepops-0.4.2-py3.6.egg/nvidia_deepops/docker/client/dockercli.py", line 62, in login
    "docker login -u {} -p {} {}".format(username, password, registry))
  File "/usr/local/lib/python3.6/site-packages/nvidia_deepops-0.4.2-py3.6.egg/nvidia_deepops/docker/client/dockercli.py", line 58, in call
    stderr=stderr)
  File "/usr/local/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['docker', 'login', '-u', '$oauthtoken', '-p', '<_my_API_key_here_', 'nvcr.io/v2']' returned non-zero exit status 1.
```
From a naive user's perspective, if I run under Singularity (i.e., outside the Docker ecosystem) and all I want is to dump a bunch of image files, I shouldn't need a functional Docker daemon on the host, right? Would it be possible for the replicator to detect this condition?
Our current implementation for downloading containers from NGC uses the Docker SDK, which relies on a connection to the host's Docker daemon, so this workflow does require a functional daemon.
We'd be open to removing this dependency, but we don't have any plans to work on this right now.
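For what it's worth, the step that actually fails in the traceback above is the login, which shells out to `docker login`. The registry itself speaks the standard Docker Registry v2 token-auth protocol, which can be driven without any daemon. A rough sketch of that flow in plain Python (the `$oauthtoken` username comes from the traceback; the exact `nvcr.io` endpoint behavior is an assumption on my part, not something I've taken from the replicator's code):

```python
# Sketch: daemon-less authentication against a Docker Registry v2 endpoint.
# The registry answers /v2/ with HTTP 401 and a WWW-Authenticate challenge
# naming the token realm and service; we exchange credentials there for a
# pull token. Endpoint details for nvcr.io are assumptions.
import base64
import re
import urllib.error
import urllib.request

def parse_challenge(header):
    """Extract realm and service from a 'Bearer realm="...",service="..."' header."""
    params = dict(re.findall(r'(\w+)="([^"]*)"', header))
    return params.get("realm"), params.get("service")

def fetch_pull_token(registry, repository, api_key):
    # 1. Probe /v2/ to obtain the auth challenge (a 401 is expected here).
    try:
        urllib.request.urlopen(f"https://{registry}/v2/")
    except urllib.error.HTTPError as err:
        realm, service = parse_challenge(err.headers["WWW-Authenticate"])
    # 2. Exchange the NGC API key for a token; '$oauthtoken' is the literal
    #    username the replicator itself passes to `docker login`.
    creds = base64.b64encode(f"$oauthtoken:{api_key}".encode()).decode()
    req = urllib.request.Request(
        f"{realm}?service={service}&scope=repository:{repository}:pull",
        headers={"Authorization": f"Basic {creds}"},
    )
    return urllib.request.urlopen(req).read()
```

With a token in hand, manifests and layers can be fetched over plain HTTPS, which is essentially what tools like skopeo do, so the Docker SDK dependency is an implementation choice rather than a protocol requirement.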
Thank you Adam, looking forward to this! I mean, even a `--no-docker-daemon` flag would work for my use case :)
This is a showstopper for us at my company. It looks like we are going to have to create our own tool. It's sad that no one is addressing this.
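In case it helps anyone else hitting this: Singularity can pull from an authenticated registry on its own, with no Docker daemon, via its documented `SINGULARITY_DOCKER_USERNAME`/`SINGULARITY_DOCKER_PASSWORD` environment variables. This doesn't replace the replicator's versioning/diff logic, it's just a per-image workaround sketch (the tag shown is an example, not verified):

```shell
# Pull one NGC image directly as a .sif file; no Docker daemon involved.
# '$oauthtoken' is the literal username NGC expects; the password is the API key.
export SINGULARITY_DOCKER_USERNAME='$oauthtoken'
export SINGULARITY_DOCKER_PASSWORD="$(cat ~/.ngc_api_key.txt)"
singularity pull docker://nvcr.io/nvidia/tensorflow:20.10-tf2-py3
```

Looping over a list of image:tag pairs with this gets you most of the "dump a bunch of image files" use case described above.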