Add sample RQD Dockerfile with CUDA base image

Open n-jay opened this issue 2 years ago • 8 comments

Link the Issue(s) this Pull Request is related to.
Related to PR https://github.com/AcademySoftwareFoundation/OpenCue/pull/1309. As discussed in the mail thread "Blender plugin development".

Summarize your change.
Adds a sample RQD Dockerfile with a CUDA-supported base image for GPU rendering. Tested with the new derived RQD Blender image.

Works in combination with Nvidia Container Toolkit.

n-jay avatar Nov 07 '23 17:11 n-jay

General thought here -- the Dockerfile here seems to duplicate much of the standard RQD image, though I understand it's using a different base image.

It's possible to have a Dockerfile which selectively copies files from another image using COPY --from. I wonder if this CUDA Dockerfile could do that -- start with the CUDA base image, copy the RQD tarball from the base RQD image, then do anything else that's CUDA specific.

This would help keep this image in sync with the base RQD image. Otherwise we will make changes to the base RQD image and forget to update this one.

Could you give that a try?
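
A minimal sketch of the layout described above, not the Dockerfile from the PR. It assumes the standard opencue/rqd image keeps its tarball under /opt/opencue with a name ending in -all.tar.gz; the CUDA base tag is only a placeholder.

```dockerfile
# Name the standard RQD image as a stage so files can be copied out of it.
FROM opencue/rqd AS rqd

# Placeholder CUDA base tag (assumption); choose one that matches the host driver.
FROM nvidia/cuda:12.2.0-base-ubuntu22.04

WORKDIR /opt/opencue

# Reuse the prebuilt RQD tarball instead of rebuilding it in this image.
# The path and file name pattern are assumptions about the base image layout.
COPY --from=rqd /opt/opencue/rqd-*-all.tar.gz /opt/opencue/

# Steps to install and run RQD, plus anything CUDA specific, would follow here.
```

Because the tarball comes from the base RQD image at build time, changes to that image are picked up on the next rebuild without editing this file.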

bcipriano avatar Nov 08 '23 15:11 bcipriano

I'm with Brian on the idea that it would be better if we could copy the rqd tarball from the base image

DiegoTavares avatar Nov 08 '23 18:11 DiegoTavares

Noted @bcipriano @DiegoTavares, that's a lot more efficient. Just to clarify, would this be the tarball generated in the /opt/opencue directory of the image?

...then do anything else that's CUDA specific.

Actually, there are no other changes done on the Dockerfile other than the inclusion of the base CUDA image. Everything else is dependent on the Nvidia driver and Container Toolkit installation on the host machine beforehand. But now that you mentioned it, I'll include the nvidia-smi command at end of this Dockerfile to verify correct installation of all CUDA components.

Additionally, should add some documentation about this but not quite sure where. Perhaps as an amendment to the Deploying RQD page?

n-jay avatar Nov 09 '23 16:11 n-jay

Just to clarify, would this be the tarball generated in the /opt/opencue directory of the image?

Yeah, that's probably the best way as you'll just need to copy that single file, everything is self-contained. Will be stored as /opt/opencue/rqd-{version}-all.tar.gz I believe.

Once the file is copied you'll still need to do any steps needed to install and run RQD.

Additionally, should add some documentation about this but not quite sure where. Perhaps as an amendment to the Deploying RQD page?

Hmm, how about this -- we have Customizing RQD, you could add a section there called like "Sample Dockerfiles" which links to the samples/rqd/ directory in the repo.

You could also add the Customizing RQD page to the Deploying RQD "What's Next?" section. That seems like it would flow nicely.

bcipriano avatar Nov 09 '23 22:11 bcipriano

@bcipriano, as suggested I implemented the COPY --from command for the tarball as well as RQD config file and the proto directory which were not included with the tarball extraction. Tested with CueBot and seems to be connecting and working as expected.

I'll include the nvidia-smi command at end of this Dockerfile

Turns out that the command only works when mounted to a GPU as in the docker run command, which seemed unnecessary just for this Dockerfile, and also doesn't seem to be possible.

we have Customizing RQD, you could add a section there called like "Sample Dockerfiles" which links to the samples/rqd/ directory in the repo. You could also add the Customizing RQD page to the Deploying RQD "What's Next?" section. That seems like it would flow nicely.

Noted. That sounds good, will get on it.

Also, a couple of things to clarify:

  1. Would there be a graceful way to resolve the version number used for the tarball name via a variable? Would help dynamically get it for use in the tarball filename. https://github.com/AcademySoftwareFoundation/OpenCue/blob/e05eb27059ef36406442d32d790fc7b3cd6db135/samples/rqd/cuda/Dockerfile#L19

    I tried this out but seem to be having some trouble with my current test implementation extracting the value from the VERSION file and assigning it to an ARG or ENV variable.

    COPY --from=opencue/rqd /opt/opencue/VERSION /opt/opencue/VERSION
    ENV VERSION=""
    RUN cat /opt/opencue/VERSION > $VERSION
    
    COPY --from=opencue/rqd /opt/opencue/rqd-${VERSION}-custom-all.tar.gz /opt/opencue/rqd-${VERSION}-custom-all.tar.gz
    
  2. Are the gRPC related instructions like the one below required for the installation since I'm importing the proto directory also from the opencue/rqd image? If it's redundant, will remove. https://github.com/AcademySoftwareFoundation/OpenCue/blob/e05eb27059ef36406442d32d790fc7b3cd6db135/samples/rqd/cuda/Dockerfile#L29-L33

n-jay avatar Nov 21 '23 04:11 n-jay

Sorry for taking ages to reply to this:

  1. Unfortunately COPY runs on the build environment, so it doesn't have access to variables set by RUN. I don't see a simple solution here, maybe try copying with a regex:
    COPY --from=opencue/rqd /opt/opencue/rqd-*-all.tar.gz /opt/opencue/
  2. If you're copying the tarball from the rqd image, you don't need to run the build steps.
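
Some background on the first point. Each RUN executes in its own shell inside the build container, so a value it computes is not visible to later instructions. In the earlier attempt, `> $VERSION` also treats the variable as a redirect target file name rather than assigning to it. COPY does expand ARG and ENV values, and its source patterns are wildcards matched with Go's filepath.Match rules rather than regular expressions. If a pinned version is preferred over the wildcard, one option is to pass it in from outside the build. The sketch below assumes a hypothetical build argument named RQD_VERSION, a placeholder CUDA base tag, and the tarball naming mentioned earlier in the thread.

```dockerfile
# Placeholder CUDA base tag (assumption).
FROM nvidia/cuda:12.2.0-base-ubuntu22.04

# Hypothetical build argument, supplied on the command line:
#   docker build --build-arg RQD_VERSION=<version> .
ARG RQD_VERSION

# ARG values are expanded in COPY, unlike values computed inside RUN.
# The file name pattern is an assumption about the opencue/rqd image.
COPY --from=opencue/rqd /opt/opencue/rqd-${RQD_VERSION}-all.tar.gz /opt/opencue/
```

If RQD_VERSION is not supplied it expands to an empty string and the COPY fails because no file matches, which at least makes the mistake visible at build time.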

DiegoTavares avatar Feb 14 '24 19:02 DiegoTavares

No worries @DiegoTavares 😄. Took me a while to get back to this also.

Unfortunately COPY runs on the build environment, so it doesn't have access to variables set by RUN. I don't see a simple solution here, maybe try copying with a regex

Noted. This worked, thanks! Resolved in https://github.com/AcademySoftwareFoundation/OpenCue/pull/1327/commits/85dae74749da0bde9ecf135b860a0974b8ec4b64. However, there seems to be a linting issue in an unrelated service.py wrapper.

n-jay avatar Mar 08 '24 09:03 n-jay

Ping @DiegoTavares

n-jay avatar Apr 14 '24 12:04 n-jay