missing isofrags.tar.gz in parallel mode (v0.2.3b)
Not sure if this is a bug or specific to my use case.
When running rail in parallel mode using ipcluster with Slurm, I get a RuntimeError saying that isofrags.tar.gz does not exist. If I restart from that point, everything finishes cleanly.
If I run rail in parallel on a single node with ipcluster (i.e. local instead of Slurm), everything runs cleanly.
I am guessing it has something to do with using Slurm. Probably not your problem; I only bring it up because there is a mention of this in a commit log on the parallel branch. Please let me know if you have a known fix or a suggestion as to what might be going on.
Thanks, Justin
Thanks for the bug report! So the error output is exactly "The file isofrags.tar.gz does not exist and thus cannot be cached."?
Sounds like a race condition. Still somewhat mysterious to me, but in dooplicity/emr_simulator.py try replacing
    if not os.path.isfile(file_or_archive):
        iface.fail(('The file %s does not exist and thus cannot '
                    'be cached.') % file_or_archive,
                   steps=(job_flow[step_number:]
                          if step_number != 0 else None))
        failed = True
        raise RuntimeError
(lines 1422-1427) with something like
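    # Give the file a few seconds to appear before giving up
    # (uses the time module; add "import time" if it is not already imported)
    retries = 0
    while not os.path.isfile(file_or_archive):
        time.sleep(1)
        retries += 1
        if retries > 5:
            break
    # Still missing after the wait: fail exactly as before
    if not os.path.isfile(file_or_archive):
        iface.fail(('The file %s does not exist and thus cannot '
                    'be cached.') % file_or_archive,
                   steps=(job_flow[step_number:]
                          if step_number != 0 else None))
        failed = True
        raise RuntimeError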
and let me know what happens.
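If it helps to try the idea outside of emr_simulator.py first, here is a minimal standalone sketch of the same wait-and-recheck approach. The helper name wait_for_file and its max_retries and delay parameters are made up for illustration and are not part of dooplicity; the sketch only assumes the file may show up a little late, as it would in a race condition.

```python
import os
import time


def wait_for_file(path, max_retries=5, delay=1.0):
    """Poll until `path` exists as a regular file.

    Hypothetical helper for illustration only (not part of dooplicity).
    Checks up to `max_retries` times, sleeping `delay` seconds after each
    failed check, then does one final check. Returns True if the file
    was found, False otherwise.
    """
    for _ in range(max_retries):
        if os.path.isfile(path):
            return True
        time.sleep(delay)
    # Final check after the last sleep
    return os.path.isfile(path)


if __name__ == '__main__':
    # Example: wait briefly for a file that may be written by another process
    found = wait_for_file('isofrags.tar.gz', max_retries=5, delay=1.0)
    print('found' if found else 'still missing')
```

The caller would still raise the original error when the helper returns False, so a file that is genuinely missing fails the same way as before, just a few seconds later.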
This fixes the problem #37