Update to `run_sorter_jobs()` and SLURM
I am trying to run spikeinterface sorters on a HPC cluster and needed to pass specific parameters to slurm/sbatch.
I added the option to pass arbitrary arguments to sbatch in the engine_kwargs.
Let me know if that's causing any issues
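For context, a minimal sketch of the `engine_kwargs` shape this change introduces (the keys `tmp_script_folder` and `sbatch_args` follow this PR; the folder path and option values below are illustrative placeholders):

```python
# Sketch of the engine_kwargs accepted by the slurm engine after this PR.
# Option names in `sbatch_args` mirror sbatch's own long options and are
# automatically prefixed with "--" when the command is built.
engine_kwargs = {
    "tmp_script_folder": "/tmp/si_slurm_scripts",
    "sbatch_args": {
        "cpus-per-task": 4,
        "mem": "16G",
        "gres": "gpu:1",
    },
}

# The call would then look like (not executed here, requires a cluster):
# run_sorter_jobs(job_list, engine="slurm", engine_kwargs=engine_kwargs)
```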
@MarinManuel thank you! This is great!
The SLURM deployment has been unexplored on our side, so we are very glad that it works for you.
Would you mind sharing the script that you are using? I'd love to add a How To page on deploying multiple spike sorting jobs on SLURM.
Hey @MarinManuel, sorry for the delay in testing this locally. I tried it on our cluster and it seems to be working well! I had a couple of other comments below; it might also be worth adding a test for this. I have pasted one in the dropdown below. I don't think I can push to your fork, but feel free to add it to src/spikeinterface/sorters/tests/test_launcher.py underneath test_run_sorter_jobs_slurm.
If you think of any functionality this is missing, please let me know / feel free to expand the test. This will also require adding pytest-mock to the dev dependencies in pyproject.toml in the main repo folder. It would be cool to also test the generated .py files sometime, but that's outside the scope of this PR.
The test
```python
def test_run_sorter_jobs_slurm_kwargs(mocker, tmp_path, job_list):
    """
    Mock `subprocess.run()` to check that engine_kwargs are
    propagated to the call as expected.
    """
    # First, mock `subprocess.run()`, set up a call to `run_sorter_jobs`,
    # then check the mocked `subprocess.run()` was called with the
    # expected signature. Two jobs are passed in `job_list`; first
    # check the most recent call.
    mock_subprocess_run = mocker.patch(
        "spikeinterface.sorters.launcher.subprocess.run"
    )

    tmp_script_folder = tmp_path / "slurm_scripts"
    engine_kwargs = dict(
        tmp_script_folder=tmp_script_folder,
        sbatch_args={
            "cpus-per-task": 32,
            "mem": "32G",
            "gres": "gpu:1",
            "any_random_kwarg": 12322,
        },
    )
    run_sorter_jobs(
        job_list,
        engine="slurm",
        engine_kwargs=engine_kwargs,
    )

    script_0_path = f"{tmp_script_folder}/si_script_0.py"
    script_1_path = f"{tmp_script_folder}/si_script_1.py"

    expected_command = [
        "sbatch", "--cpus-per-task", "32", "--mem", "32G",
        "--gres", "gpu:1", "--any_random_kwarg", "12322", script_1_path,
    ]
    mock_subprocess_run.assert_called_with(expected_command, capture_output=True, text=True)

    # Next, check the first call (which sets up `si_script_0.py`)
    # also has the expected arguments.
    expected_command[9] = script_0_path
    assert mock_subprocess_run.call_args_list[0].args[0] == expected_command

    # Next, check that defaults are used properly when no kwargs are
    # passed. This will default to `_default_engine_kwargs` as
    # set in `launcher.py`.
    run_sorter_jobs(
        job_list,
        engine="slurm",
        engine_kwargs={"tmp_script_folder": tmp_script_folder},
    )
    expected_command = [
        "sbatch", "--cpus-per-task", "1", "--mem", "1G", script_1_path,
    ]
    mock_subprocess_run.assert_called_with(expected_command, capture_output=True, text=True)

    # Finally, check that the `tmp_script_folder` is generated on the
    # fly as expected. A random folder name is generated; just check that
    # the folder to which the scripts are saved follows the `tempfile` format.
    run_sorter_jobs(
        job_list,
        engine="slurm",
        engine_kwargs=None,
    )
    tmp_script_folder = "_".join(tempfile.mkdtemp(prefix="spikeinterface_slurm_").split("_")[:-1])
    assert tmp_script_folder in mock_subprocess_run.call_args_list[-1].args[0][5]
```
Some other small things:
- I think that the other, currently existing test (`test_run_sorter_jobs_slurm`) might need to be updated to the new way of passing the `sbatch_args`.
- I think it is worth raising an error if any key-value pair other than `tmp_script_folder` or `sbatch_args` exists in `engine_kwargs`. The reason is that this PR is backward-incompatible with previous versions but will not error, only silently fail to pass the kwargs on to SLURM (e.g. if `cpus_per_task` is passed to `engine_kwargs` directly, as it was previously, it will just be ignored). So it is good to raise an error here to let the user know about the new API.
- Also, might as well also assert / raise an error that `engine_kwargs` is only not `None` when `engine="slurm"`.
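A minimal sketch of the validation suggested above, assuming the only accepted keys for the slurm engine are `tmp_script_folder` and `sbatch_args` (the helper name `check_slurm_engine_kwargs` is hypothetical, not the PR's actual implementation):

```python
def check_slurm_engine_kwargs(engine_kwargs):
    # Only these keys are meaningful under the new API; anything else
    # (e.g. a bare `cpus_per_task` from the old API) would be silently
    # ignored, so raise to alert the user to the new API instead.
    allowed = {"tmp_script_folder", "sbatch_args"}
    unknown = set(engine_kwargs) - allowed
    if unknown:
        raise ValueError(
            f"Unknown engine_kwargs for the slurm engine: {sorted(unknown)}. "
            "Pass sbatch options in the `sbatch_args` dict, e.g. "
            "engine_kwargs={'sbatch_args': {'cpus-per-task': 4}}."
        )
```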
Hi @JoeZiminski I tried to integrate your comments to my PR. Hopefully I did it right. The only thing I did not do was
> Also, might as well also assert / raise an error that `engine_kwargs` is only not `None` when `engine="slurm"`.
I wasn't entirely sure what you meant, but I think sbatch would launch with default parameters if `engine_kwargs=None`, so that shouldn't be a problem?
Also, I didn't know how to write a test to check that the exception was raised correctly.
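For reference, a `pytest.raises`-based sketch of such a test. The validation helper and error message here are hypothetical placeholders for whatever the PR ends up raising:

```python
import pytest

def validate_engine_kwargs(engine_kwargs):
    # Stand-in for the real validation logic discussed above:
    # raise on any key the slurm engine does not understand.
    allowed = {"tmp_script_folder", "sbatch_args"}
    unknown = set(engine_kwargs) - allowed
    if unknown:
        raise ValueError(f"Unknown keys in engine_kwargs: {sorted(unknown)}")

def test_engine_kwargs_raises():
    # pytest.raises passes only if the expected exception is raised;
    # `match` additionally checks the message against a regex.
    with pytest.raises(ValueError, match="Unknown keys"):
        validate_engine_kwargs({"cpus_per_task": 4})
```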
Hey @MarinManuel, so sorry for the delay with this! I'm around again now, please feel free to ping me and we can merge this ASAP.
Just a couple of small things I think:
- please see a couple of small comments on the tests
- there are a few github suggestions (e.g. from Zach) about [edit: above]. These can be committed by selecting 'commit suggestion', changing the commit message (e.g. just 'docstring fix' or something is okay) and then 'commit changes'. It's a cool GitHub feature!
- Any other comments that are outdated or have been addressed can be resolved.
> Also, might as well also assert / raise an error that `engine_kwargs` is only not `None` when `engine="slurm"`.
Sorry, this was the other way round: the concern is a user who is unfamiliar with SLURM (say, using 'dask') passing `engine_kwargs` under the false impression that they do something for all engines. Alternatively we could rename `engine_kwargs` to `slurm_engine_kwargs`, as they only relate to SLURM. Actually this is better; maybe they can be called `slurm_engine_kwargs` or just `slurm_kwargs`?
After this I can do one last pass and test locally again then I think it's good to go! Apologies the review took so long, this new feature is a super-useful addition.
Hi @JoeZiminski Please let me know how that looks
Hi @MarinManuel, I'm incredibly sorry, I misunderstood from the docstring how `engine_kwargs` was intended to work; I thought it was just for SLURM, based on the below:
old docstring
```
engine_kwargs : dict
    In the case of engine="slurm", possible kwargs are:
        - tmp_script_folder : str, default None
            The folder in which the job scripts are created. Default: directory
            created by the `tempfile` library.
        - sbatch_args : dict
            Arguments to be passed to sbatch. They will be automatically prefixed
            with --. Arguments must be in the format SLURM specifies; see the
            [documentation for `sbatch`](https://slurm.schedmd.com/sbatch.html)
            for a list of possible arguments.
```
It might be worth reverting that change so that all kwargs come under `engine_kwargs` as before. The new docstring could look as below. I'm happy to do this if you give me access to your fork, as I appreciate it will be annoying to undo something you just did due to my mistake!
New docstring
```
engine_kwargs : dict
    Parameters to be passed to the underlying engine.
    Defaults are:
        * loop : None
        * joblib : n_jobs=1, backend="loky"
        * multiprocessing : max_workers=2, mp_context=None
        * dask : client=None
        * slurm : tmp_script_folder=None
    For the `slurm` entry, the `sbatch_args` dictionary contains arguments to be
    passed to sbatch. They will be automatically prefixed with --.
    Arguments must be in the format SLURM specifies; see the
    [documentation for `sbatch`](https://slurm.schedmd.com/sbatch.html) for a
    list of possible arguments.
```
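To illustrate the "automatically prefixed with `--`" behaviour described in the docstring, here is a rough sketch of how the `sbatch_args` dict could be flattened into an sbatch command line. The helper name `build_sbatch_command` is hypothetical and not the PR's actual implementation:

```python
def build_sbatch_command(sbatch_args, script_path):
    # Flatten the sbatch_args dict into CLI arguments: every key is
    # automatically prefixed with "--" and followed by its value,
    # then the generated job script path is appended last.
    command = ["sbatch"]
    for key, value in sbatch_args.items():
        command.extend([f"--{key}", str(value)])
    command.append(script_path)
    return command

cmd = build_sbatch_command({"cpus-per-task": 32, "mem": "32G"}, "si_script_0.py")
# -> ["sbatch", "--cpus-per-task", "32", "--mem", "32G", "si_script_0.py"]
```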
Some other very minor things came up when manually testing, then I think it's good to go!
- Need to add `pytest-mock` here and here in the `pyproject.toml`, as it is a new dependency added by the test.
- The test is failing on Windows, as some `os` commands used in `run_sorter_jobs` assume we are on Linux. In the end we should skip unless running on Linux: add `from platform import system` at the top of `test_launcher.py` and decorate the test with `@pytest.mark.skipif(system() != "Linux", reason="Assumes we are on Linux to run SLURM")`.
- When manually testing, it was hard to figure out what was being passed to SLURM by the CLI. I think we can add a quick print before the subprocess call to make this clear, e.g. adding `print(f"subprocess called with command {progr}\n")` here.
Then all good from my end, sorry about this last-minute back and forth and thanks again for this super useful addition!
Hi @MarinManuel, thanks for this! Apologies, please resend the invite; I will fix the tests and conflicts, thanks!
Hi @MarinManuel. Thanks a lot for this improvement. We are really sorry for the long delay. This is OK for us.
Thank you @JoeZiminski for taking care of this.