[Espresso] support for in place `outdir`
Summary of Changes
As we discussed previously, here are the first step to support custom espresso outdir i.e without the need to copy files.
This is done here for the post_processing_job by adding an additional kwarg prev_outdir as you suggested previously.
Although this works fine locally, the test in Dask does not pass because multiple instances are trying to decompress the same directory, some files end up being already decompressed as a result other instances crash because "the file does not exist"
Checklist
- [ ] I have read the "Guidelines" section of the contributing guide. Don't lie! 😉
- [ ] My PR is on a custom branch and is not named
main. - [ ] I have added relevant, comprehensive unit tests.
Notes
- Your PR will likely not be merged without proper and thorough tests.
- If you are an external contributor, you will see a comment from @buildbot-princeton. This is solely for the maintainers.
- When your code is ready for review, ping one of the active maintainers.
Can one of the admins verify this patch?
Ooooh, this is a tricky one! I'm glad that this error was caught in CI though! This has never been an issue before because we have always been copying from a directory to a fresh one and decompressing there. Here, we have concurrent processes making changes to the same directory at the same time.
The decompress_dir function looks like the following:
path = Path(path)
for parent, _, files in os.walk(path):
for f in files:
decompress_file(Path(parent, f))
What if we made a custom decompress_dir function wraps the decompress_file call in a try/except block where the exception to catch is whatever you're seeing about the file not existing anymore (likely a FileNotFoundError)? Would that resolve the issues? This function could go in quacc.utils.files.
Sure! that seems like a reasonable idea.
If you have time could you explain why deleting the resolve fix the issue in the test? I can't get a feeling about it, just by curiosity
If you have time could you explain why deleting the resolve fix the issue in the test? I can't get a feeling about it, just by curiosity
There are two possible reasons. The first is simply one of chance: the issue you're running into is essentially a race condition between concurrent processes. Any change to the source code that even partially modifies the runtime of the calculation may avoid the race condition simply by chance. The second possible reason is that the line you highlighted above, static_results = client.compute(static_results).result(), is what runs the static calculation. If you remove the line, you are no longer running that static calculation anymore. It runs when it is dispatched to the Dask Client.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 99.13%. Comparing base (
47ec39e) to head (74757ce).
Additional details and impacted files
@@ Coverage Diff @@
## main #1996 +/- ##
==========================================
- Coverage 99.21% 99.13% -0.09%
==========================================
Files 81 81
Lines 3323 3336 +13
==========================================
+ Hits 3297 3307 +10
- Misses 26 29 +3
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Thanks for the explanation!
Alright, the "safe" decompress does work. However, it seems that QE internally write quite bit of temporary files in each outdir, as a results not all binaries will be compatible with this mode (concurrently).
The thing is that the mode is still interesting in a lot of scenarios, the only one that causes a problem is when you want to run the same binary multiple times from the same previous outdir (examples below).
At that point my proposal is that, we add a column in QE's recipes list such as prev_outdir safety and mark which binaries are compatible? That way users know which binaries are safe to use concurrently.
# Not working because multiple `projwfc.x` calculations will run in the same dir
for x in y:
projwfc_job(prev_outdir=results["dir_name"], input_data = {"weird_param": x})
# Not a problem , each outdir only run one projwfc calc, without the need
# to copy every files twice.
for atoms in atoms_list:
static_job_results = static_job(atoms)
projwfc_job(prev_outdir=static_job_results["dir_name"])
Arguably, the second case is much more common
Thanks for the explanation, @tomdemeyere. I will ultimately defer to your best judgment on this one since you know the inner workings of file I/O in Espresso best. I'd be okay with a separate column in the docs if you think that's the best option in this scenario. (Note: there are still some failed Actions to address).
Unfortunately not many binaries support concurrent writing to the same directory, I added a column in the recipe list to indicate which ones are compatible. For this reason, this new behavior is not default for jobs: copy_files is still the main mechanism. However, for premade flows, we know for sure that using the new mechanism is 100% safe so I switched to that.
This will be very helpful if users are restricted on scratch space, or if for some reason their file system is very slow... If you could indicate a place where I can write some doc about prev_outdir vs copy_files I would be very happy to do so, here or in another PR.
Quacc has the "Rolls-Royce" of Espresso interfaces at this point, I don't see any other (open-source) project having a better interface (not that it is a competition). Thanks for your patience with the whole thing, I am aware that QE required quite a special treatment but ultimately I am very proud of what we achieved in a few months, I hope that the interface will be used to perform very nice work. The whole thing should be pretty much done now (I know it is the second time I say that, but actually using the interface made me realize more work was needed)
Thank you very much your wonderful contributions, @tomdemeyere. I am very glad to see this come together. I will be happy to get this merged ASAP. No further work on this is needed from you at this time.
As for documentation, that is a good question. You have a few options:
- You could put it in the pre-existing documentation about file transfers https://github.com/Quantum-Accelerators/quacc/blob/main/docs/user/misc/file_transfers.md
- You could make a new document in the miscellaneous folder. Naturally, this is simple but not easily discoverable in the built docs.
- You could start a new docs section for recipe-specific information and have one for espresso, including the proposed
prev_outdirdocument. However, this might be a bit too much work to impose on oneself
Quacc has the "Rolls-Royce" of Espresso interfaces at this point, I don't see any other (open-source) project having a better interface (not that it is a competition). Thanks for your patience with the whole thing, I am aware that QE required quite a special treatment but ultimately I am very proud of what we achieved in a few months, I hope that the interface will be used to perform very nice work. The whole thing should be pretty much done now (I know it is the second time I say that, but actually using the interface made me realize more work was needed)
I greatly appreciate all you have done here! I think this has turned out splendidly, and I'm grateful for the many improvements to quacc that this effort has spawned.