
sandbox quick start guide fails to submit jobs

Open · IdrisMiles opened this issue 4 years ago · 4 comments

Describe the bug

Following the Linux quickstart guide to spin up a deployment with docker-compose fails.

To Reproduce

Steps to reproduce the behavior:

  1. Follow the guide and spin up the sandbox deployment with docker-compose
  2. Follow the guide and install the Python modules into a virtual env
  3. Submit a job with the API using the following script:
#!/usr/bin/env python
from outline import Outline
from outline.cuerun import OutlineLauncher
from outline.modules.shell import Shell
from outline.depend import DependType

# Two shell layers over frames 1001-1100; layer2 depends on layer1 frame by frame.
layer1 = Shell('layer1', command=['sleep 1'], range='1001-1100', threads=1, threadable=True)
layer2 = Shell('layer2', command=['echo $CUE_IFRAME'], range='1001-1100', threads=1, threadable=True)
layer2.depend_on(on_layer=layer1, depend_type=DependType.FrameByFrame)

ol = Outline(name='testing', name_unique=True)
ol.add_layer(layer1)
ol.add_layer(layer2)

launcher = OutlineLauncher(ol)

# launch(False) passes use_pycuerun=False; the spec is serialized and sent to Cuebot via the OpenCue API.
jobs = launcher.launch(False)
print(jobs)

I get the following error:

    jobs = launcher.launch(False)
  File "/home/idris/projects/opensource/ASWF/OpenCue/venv_py2/local/lib/python2.7/site-packages/outline/cuerun.py", line 219, in launch
    return self.__get_backend_module().launch(self, use_pycuerun=use_pycuerun)
  File "/home/idris/projects/opensource/ASWF/OpenCue/venv_py2/local/lib/python2.7/site-packages/outline/backend/cue.py", line 124, in launch
    jobs = opencue.api.launchSpecAndWait(launcher.serialize(use_pycuerun=use_pycuerun))
  File "/home/idris/projects/opensource/ASWF/OpenCue/venv_py2/local/lib/python2.7/site-packages/opencue/util.py", line 57, in _decorator
    exception(exception.failMsg.format(details=details)))
  File "/home/idris/projects/opensource/ASWF/OpenCue/venv_py2/local/lib/python2.7/site-packages/opencue/util.py", line 44, in _decorator
    return grpcFunc(*args, **kwargs)
  File "/home/idris/projects/opensource/ASWF/OpenCue/venv_py2/local/lib/python2.7/site-packages/opencue/api.py", line 378, in launchSpecAndWait
    job_pb2.JobLaunchSpecAndWaitRequest(spec=spec), timeout=Cuebot.Timeout).jobs
  File "/home/idris/projects/opensource/ASWF/OpenCue/venv_py2/local/lib/python2.7/site-packages/grpc/_channel.py", line 533, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/idris/projects/opensource/ASWF/OpenCue/venv_py2/local/lib/python2.7/site-packages/grpc/_channel.py", line 467, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
opencue.exception.CueInternalErrorException: Server caught an internal exception. Failed to launch and add job: Failed to parse job spec XML, java.io.FileNotFoundException: http://localhost:8080/spcue/dtd/cjsl-1.11.dtd

Expected behavior

Job should submit without error

Additional context

@larsbijl's comment here highlights the cause of the issue. The sandbox docker-compose pulls an outdated cuebot image from opencue/cuebot:

cuebot:
    image: opencue/cuebot
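
As a quick way to confirm the image is stale (just a suggested check, assuming the image has already been pulled by docker-compose), the creation date of the locally cached image can be compared against recent activity on master:

# When was the locally cached cuebot image built?
docker image inspect --format '{{ .Created }}' opencue/cuebot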

Possible Solutions

One solution is to build the docker image from source by modifying the docker-compose.yml:

cuebot:
    build:
      context: ./
      dockerfile: ./cuebot/Dockerfile

And then doing a build before running:

docker-compose --project-directory . -f sandbox/docker-compose.yml build
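
Once the build completes, the stack can be brought up with the same compose file (mirroring the flags used for the build step above):

docker-compose --project-directory . -f sandbox/docker-compose.yml up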

This has the drawback that the build process is quite slow, and this still requires updating the docs to add the build step.

The other, more desirable, solution is to push updated docker images.

IdrisMiles avatar Feb 08 '21 22:02 IdrisMiles

@bcipriano should we cut a new release, since we have merged a few DB changes?

larsbijl avatar Feb 08 '21 22:02 larsbijl

Yeah, the issue here is that Docker images are only pushed to Docker Hub on release, while Docker compose runs from master.

Quick fix is as you said, we can do a new release to push new images. I'll work on this ASAP.

But ultimately this will keep happening, as master will always lead the release, so we will need a better long term solution. A couple of possibilities:

  1. Build the images directly from master as Idris mentioned, though this will be slow.
  2. Change our GitHub pipelines to publish Docker images on every commit to master (a rough sketch of what this could look like follows the list). This should be fine for Docker compose, but it could cause issues elsewhere, since the latest tag on Docker Hub would no longer point to the latest release but rather to the latest commit to master. Maybe this is ok? I would have to think this through some more.
  3. Change the Docker compose setup or instructions to use a specific tag from the repo -- basically you would need to check out a specific release locally to ensure it matches the released version on GitHub.
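
For reference, a rough sketch of what option (2) could look like as a GitHub Actions workflow. The workflow name, secret names, and tag scheme are assumptions, not the project's actual pipeline configuration:

# .github/workflows/publish-master-images.yml (hypothetical)
name: Publish Docker images from master

on:
  push:
    branches: [master]

jobs:
  cuebot:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      # Placeholder secret names; the real pipeline would define its own
      - name: Log in to Docker Hub
        run: echo "${{ secrets.DOCKER_PASS }}" | docker login --username "${{ secrets.DOCKER_USER }}" --password-stdin

      - name: Build cuebot image
        run: docker build -t opencue/cuebot:latest -f cuebot/Dockerfile .

      - name: Push cuebot image
        run: docker push opencue/cuebot:latest

Pushing to a dedicated master tag instead of latest, and pointing the sandbox compose file at that tag, would be one way to keep latest tracking releases, at the cost of one more tag to document.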

(2) sounds like the best option to me, any thoughts?

bcipriano avatar Feb 09 '21 16:02 bcipriano

I think the main time we will run into issues with this is when there have been non-backwards-compatible changes between Cuebot and the client packages. In the case above:

  • pyoutline in master using the cjsl-1.11.dtd job spec
  • but the published cuebot image not containing that spec yet

When these sorts of changes are merged we tend to increment the minor version. So perhaps we could configure the GitHub pipeline to publish Docker images when that changes?
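
Purely as an illustration of that idea (the tag pattern is an assumption about how releases are tagged), the publish workflow's trigger could be limited to version tags rather than every commit:

on:
  push:
    tags:
      - 'v*'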

IdrisMiles avatar Feb 09 '21 19:02 IdrisMiles

I have a similar issue. The difference is that I run the Docker Hub images (I simply run docker-compose up -d at the repo root, which spawns the db, cuebot and rqd services based on the Docker Hub images), and I use the tools from my cloned repo (so I install them with python setup.py install, etc.).

I can create jobs using cuesubmit and I can see that they are inserted in the DB, but they are pending forever:

(screenshot: submitted jobs sitting in a pending state)

Would you know where this bug could come from? Why doesn't the rqd service see the pending jobs?
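
In case it helps with debugging, these are the checks I can think of running next (assuming the default sandbox service names, and cueadmin installed from the cloned repo):

docker-compose logs rqd    # does rqd start cleanly and report in to Cuebot?
cueadmin -lh               # does Cuebot list the rqd host at all?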

romainf-ubi avatar Nov 18 '22 21:11 romainf-ubi