Generalise submission and cancellation arguments
Closes #640
I found that the HTCondor class already had something similar. I've added this to the base Job class in core. The downside is that this adds some extra boilerplate.
I've reworked HTCondorJob and SLURMJob to make use of the new functionality.
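For anyone following along, here is a rough sketch of the pattern, not the actual diff. It assumes a hypothetical `BaseJob` stand-in for the core Job class and uses illustrative keyword names (`submit_command_extra`, `cancel_command_extra`); the real signatures in this PR may differ. The idea is that the base class stores the extra arguments once and each scheduler subclass only declares its commands.

```python
import shlex


class BaseJob:
    """Hypothetical stand-in for the scheduler-agnostic base class."""

    submit_command = None  # each subclass sets this, e.g. "sbatch"
    cancel_command = None  # each subclass sets this, e.g. "scancel"

    def __init__(self, submit_command_extra=None, cancel_command_extra=None):
        # Extra arguments are kept as lists so they can be quoted one by one.
        self.submit_command_extra = list(submit_command_extra or [])
        self.cancel_command_extra = list(cancel_command_extra or [])

    @staticmethod
    def _build(command, extra):
        # Quote each extra argument so spaces or shell characters are safe.
        return " ".join([command] + [shlex.quote(arg) for arg in extra])

    def full_submit_command(self):
        return self._build(self.submit_command, self.submit_command_extra)

    def full_cancel_command(self):
        return self._build(self.cancel_command, self.cancel_command_extra)


class SlurmLikeJob(BaseJob):
    """Illustrative subclass: it only needs to name its commands."""

    submit_command = "sbatch"
    cancel_command = "scancel"


job = SlurmLikeJob(submit_command_extra=["--hold"])
print(job.full_submit_command())  # sbatch --hold
```

Keeping the extras on the base class means a subclass that overrides the command-building methods has to remember to include them, which is the kind of base/subclass mix-up that is easy to hit.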
Also, the test failures in slurm (and maybe others) look related to this change.
CI is now mostly happy on main, so I've just merged main in so we can see up-to-date CI failures.
Sorry for the long lead time on this, everyone. I got myself tripped up between the methods on the base and inheriting classes - as noted by @guillaumeeb's comment. I believe I've got this all sorted now. Hopefully the CI will catch any lingering issues.
Thanks @AlecThomson! Looks like there are some linting issues (make sure you run pre-commit install) and some slurm issues.
Hmm - looks like some kind of timing error on the test. I don't quite understand why it's failing... 🤔
> assert time() < start + QUEUE_WAIT
E assert 1723021595.3378592 < (1723021535.2718754 + 60)
E + where 1723021595.3378592 = time()
https://github.com/dask/dask-jobqueue/actions/runs/10278490598/job/28450471395#step:7:425
The test calls cluster.scale(n) and then waits for the cluster to reach that size. The time assertion is just a timeout, so the failure means the cluster did not reach the requested number of workers within the 60 seconds allowed.
Note: We don't use client.wait_for_workers(n) because it checks for "at least n workers", so it doesn't wait when scaling down (xref https://github.com/dask/distributed/pull/6374).
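A minimal sketch of that waiting pattern, to make the failure mode concrete. It is not the test helper from the repository: `wait_for_exact_workers` and the `count_workers` callable are hypothetical names, and the callable stands in for whatever query the real test uses to count workers. The loop waits for exactly n workers (so it also waits on scale-down) and the assertion only fires once the timeout has elapsed.

```python
from time import sleep, time

QUEUE_WAIT = 60  # seconds, matching the "+ 60" in the assertion above


def wait_for_exact_workers(count_workers, n, timeout=QUEUE_WAIT, poll=0.1):
    """Poll until count_workers() == n; fail once the timeout has elapsed."""
    start = time()
    while True:
        current = count_workers()
        if current == n:
            return
        # Same shape as the failing assertion: it is purely a timeout check.
        assert time() < start + timeout, f"expected {n} workers, have {current}"
        sleep(poll)


# Usage with a fake worker count that scales down from 4 to 2.
counts = iter([4, 3, 2])
wait_for_exact_workers(lambda: next(counts), 2, timeout=5, poll=0.01)
```

An "at least n" check would have returned immediately in the usage example, since 4 >= 2, which is why the exact comparison is needed when scaling down.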