sandbox quick start guide fails to submit jobs
**Describe the bug**
Following the Linux quick start guide for spinning up a deployment with docker-compose fails.
**To Reproduce**
Steps to reproduce the behavior:
- Follow the guide and spin up the sandbox deployment with docker-compose
- Follow the guide and install the Python modules into a virtual env
- Submit a job via the API using the following script:
```python
#!/usr/bin/env python
from outline import Outline
from outline.cuerun import OutlineLauncher
from outline.modules.shell import Shell
from outline.depend import DependType

layer1 = Shell('layer1', command=['sleep 1'], range='1001-1100', threads=1, threadable=True)
layer2 = Shell('layer2', command=['echo $CUE_IFRAME'], range='1001-1100', threads=1, threadable=True)
layer2.depend_on(on_layer=layer1, depend_type=DependType.FrameByFrame)

ol = Outline(name='testing', name_unique=True)
ol.add_layer(layer1)
ol.add_layer(layer2)

launcher = OutlineLauncher(ol)
jobs = launcher.launch(False)
print(jobs)
```
This raises the following error:
```
    jobs = launcher.launch(False)
  File "/home/idris/projects/opensource/ASWF/OpenCue/venv_py2/local/lib/python2.7/site-packages/outline/cuerun.py", line 219, in launch
    return self.__get_backend_module().launch(self, use_pycuerun=use_pycuerun)
  File "/home/idris/projects/opensource/ASWF/OpenCue/venv_py2/local/lib/python2.7/site-packages/outline/backend/cue.py", line 124, in launch
    jobs = opencue.api.launchSpecAndWait(launcher.serialize(use_pycuerun=use_pycuerun))
  File "/home/idris/projects/opensource/ASWF/OpenCue/venv_py2/local/lib/python2.7/site-packages/opencue/util.py", line 57, in _decorator
    exception(exception.failMsg.format(details=details)))
  File "/home/idris/projects/opensource/ASWF/OpenCue/venv_py2/local/lib/python2.7/site-packages/opencue/util.py", line 44, in _decorator
    return grpcFunc(*args, **kwargs)
  File "/home/idris/projects/opensource/ASWF/OpenCue/venv_py2/local/lib/python2.7/site-packages/opencue/api.py", line 378, in launchSpecAndWait
    job_pb2.JobLaunchSpecAndWaitRequest(spec=spec), timeout=Cuebot.Timeout).jobs
  File "/home/idris/projects/opensource/ASWF/OpenCue/venv_py2/local/lib/python2.7/site-packages/grpc/_channel.py", line 533, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/idris/projects/opensource/ASWF/OpenCue/venv_py2/local/lib/python2.7/site-packages/grpc/_channel.py", line 467, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
opencue.exception.CueInternalErrorException: Server caught an internal exception. Failed to launch and add job: Failed to parse job spec XML, java.io.FileNotFoundException: http://localhost:8080/spcue/dtd/cjsl-1.11.dtd
```
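The `FileNotFoundException` above means Cuebot tried to resolve the job spec's DTD from its own web server and got nothing back. A quick way to confirm the mismatch, assuming Cuebot serves its DTDs under `/spcue/dtd/` on port 8080 as the error suggests (function names here are illustrative, not part of pycue):

```python
# Diagnostic sketch: check whether the running Cuebot serves the DTD version
# that the installed pyoutline references. A failed fetch reproduces the
# FileNotFoundException in the traceback above.
import urllib.error
import urllib.request


def dtd_url(host="localhost", port=8080, version="1.11"):
    """Build the URL Cuebot is expected to serve the job spec DTD from."""
    return "http://%s:%d/spcue/dtd/cjsl-%s.dtd" % (host, port, version)


def cuebot_has_dtd(host="localhost", port=8080, version="1.11"):
    """Return True if Cuebot responds with the requested DTD version."""
    try:
        with urllib.request.urlopen(dtd_url(host, port, version), timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    print("cjsl-1.11.dtd served:", cuebot_has_dtd())
```

If this prints `False` while Cuebot is otherwise reachable, the container image predates the `cjsl-1.11` spec that the client in master is emitting.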
**Expected behavior**
The job should submit without error.
**Additional context**
@larsbijl's comment here highlights the cause of the issue: the sandbox docker-compose is pulling an outdated cuebot image from opencue/cuebot:

```yaml
cuebot:
  image: opencue/cuebot
```
**Possible Solutions**
One solution is to build the Docker image from source by modifying docker-compose.yml:

```yaml
cuebot:
  build:
    context: ./
    dockerfile: ./cuebot/Dockerfile
```

And then running a build before starting the services:

```shell
docker-compose --project-directory . -f sandbox/docker-compose.yml build
```
This has the drawback that the build process is quite slow, and it still requires updating the docs to add the build step.
The other, more desirable solution is to push updated Docker images.
@bcipriano should we cut a new release, since we have merged a few DB changes?
Yeah, the issue here is that Docker images are only pushed to Docker Hub on release, while Docker compose runs from master.

The quick fix is as you said: we can do a new release to push new images. I'll work on this ASAP.

But ultimately this will keep happening, as master will always lead the release, so we will need a better long-term solution. A couple of possibilities:

- Build the images directly from master as Idris mentioned, though this will be slow.
- Change our GitHub pipelines to publish Docker images on every commit to master. This should be fine for Docker compose but could cause issues in other places, as the `latest` tag on Docker Hub will no longer point to the latest release, but rather to the latest commit to master. Maybe this is ok? I would have to think through this some more.
- Change the Docker compose setup or instructions to use a specific tag from the repo -- basically you will need to check out a specific release locally to ensure it matches the released version on GitHub.

(2) sounds like the best option to me, any thoughts?
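For illustration, the pinned-tag option above would only need a one-line change in sandbox/docker-compose.yml; the tag shown here is a hypothetical example, not a published release:

```yaml
cuebot:
  # Hypothetical: pin to the release tag matching the checked-out source,
  # instead of the implicit "latest" tag.
  image: opencue/cuebot:v0.4.14
```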
I think the main time we will run into issues with this is when there have been non-backwards-compatible changes between cuebot and the client packages. In the case above:

- pyoutline in master uses the `cjsl-1.11.dtd` job spec
- but the published cuebot image does not contain that spec yet

When these sorts of changes are merged we tend to increment the minor version. So perhaps we can configure the GitHub pipeline to publish Docker images when that changes?
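That trigger could be sketched as a GitHub Actions workflow keyed on version tags; everything below (workflow name, tag pattern, secret names) is an assumption for illustration, not the project's actual pipeline:

```yaml
# Hypothetical workflow: publish the Cuebot image when a version tag is pushed.
name: publish-docker-images
on:
  push:
    tags:
      - 'v*.*.0'  # assumption: minor-version bumps are tagged like v0.5.0
jobs:
  cuebot:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Build image
        run: docker build -t opencue/cuebot:${GITHUB_REF#refs/tags/v} -f cuebot/Dockerfile .
      - name: Push image
        run: |
          echo "${{ secrets.DOCKER_HUB_TOKEN }}" | docker login -u "${{ secrets.DOCKER_HUB_USER }}" --password-stdin
          docker push opencue/cuebot:${GITHUB_REF#refs/tags/v}
```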
I have a similar issue. The difference is that I run the Docker Hub images (I simply run `docker-compose up -d` at the repo root, which spawns the db, cuebot and rqd services based on Docker Hub images), and I use the tools from my cloned repo (so I install them with `python setup.py install` etc.).
I can create jobs using cuesubmit and I can see that they are inserted in the DB, but they are pending forever:

Would you know where this bug could come from? Why doesn't the rqd service see the pending jobs?