mdanalysis icon indicating copy to clipboard operation
mdanalysis copied to clipboard

Enabling of `MDAnalysis.analysis.align.AverageStructure` parallelization

Open talagayev opened this issue 1 year ago • 7 comments

Fixes #4659 attempt

Changes made in this Pull Request:

  • added backends and aggregators to AlignTraj and AverageStructure in analysis.align.
  • added the client_AlignTraj and client_AverageStructure in conftest.py
  • added client_AlignTraj and client_AverageStructure in run() in test_align.py

Currently for AlignTraj it only accepts serial and dask with multiprocessing leading to the pytests taking forever. An additional error that appears is the following:

OSError: File opened in mode: self.mode. Reading only allow in mode "r"

For AverageStructure the Failure that appears is the following:

AttributeError: 'numpy.ndarray' object has no attribute 'load_new'

Which leads me to believe that AverageStructure can not be parallelized, but I would need additional opinions on it and on AlignTraj :)

PR Checklist

  • [x] Tests?
  • [x] Docs?
  • [x] CHANGELOG updated?
  • [x] Issue raised/referenced?

Developers certificate of origin


📚 Documentation preview 📚: https://mdanalysis--4738.org.readthedocs.build/en/4738/

talagayev avatar Oct 18 '24 18:10 talagayev

Hello @talagayev! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 310:80: E501 line too long (86 > 79 characters) Line 327:80: E501 line too long (87 > 79 characters) Line 333:80: E501 line too long (97 > 79 characters) Line 357:80: E501 line too long (85 > 79 characters) Line 377:80: E501 line too long (91 > 79 characters)

Comment last updated at 2025-01-11 21:40:18 UTC

pep8speaks avatar Oct 18 '24 18:10 pep8speaks

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 92.73%. Comparing base (b62e2f7) to head (fbb85ac). :warning: Report is 2 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4738      +/-   ##
===========================================
+ Coverage    92.72%   92.73%   +0.01%     
===========================================
  Files          180      180              
  Lines        22472    22491      +19     
  Branches      3188     3191       +3     
===========================================
+ Hits         20837    20858      +21     
+ Misses        1177     1176       -1     
+ Partials       458      457       -1     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Jan 11 '25 20:01 codecov[bot]

@marinegor I would ping you in this PR as well.

Here basically I tried different ways to see if it is possible to parallelize the AverageStructure and AlignTraj classes. I was able to implement the parallelization for AverageStructure with the first attempt not working due to being able to read a universe so that error appeared

AttributeError: 'numpy.ndarray' object has no attribute 'load_new'

I added _first to make the class parallelizable. This works well with the test, expect the case, when in_memory=True, then the process takes very long, which I assume is connected to memory issues, so in the current case I revert back to serial for cases when in_memory=True.

As for AlignTraj due to it transforming coordinates and writing out structures it would be necessary to rewrite more parts of the code to make it parallelizable, so I didn't find an easy solution for that, which wouldn't require bigger modifications of the class. So there the question would be if we keep it as non parallelizable or should I try to modify the code to make it parallelizable, which would require bigger modifications?

talagayev avatar Nov 26 '25 22:11 talagayev

@talagayev

expect the case, when in_memory=True, then the process takes very long, which I assume is connected to memory issues

I think parallelization should not actually work with in_memory cases (@yuxuanzhuang please correct me if I'm wrong, afaik you've been working on this). Hence I'd explicitly raise an exception if uses asks for parallel execution and provides in_memory as well.

So there the question would be if we keep it as non parallelizable or should I try to modify the code to make it parallelizable, which would require bigger modifications?

If you're running out of ideas, I'd suggest making this PR for AverageStructure only, and create appropriate issue for AlignTraj, describing your attempts so far.

Also, I imagine there are problems with serialization of self._writer, no? Perhaps we can chat on discord about it (I'm @marinegor there)?

marinegor avatar Nov 26 '25 23:11 marinegor

@talagayev

expect the case, when in_memory=True, then the process takes very long, which I assume is connected to memory issues

I think parallelization should not actually work with in_memory cases (@yuxuanzhuang please correct me if I'm wrong, afaik you've been working on this). Hence I'd explicitly raise an exception if uses asks for parallel execution and provides in_memory as well.

So there the question would be if we keep it as non parallelizable or should I try to modify the code to make it parallelizable, which would require bigger modifications?

If you're running out of ideas, I'd suggest making this PR for AverageStructure only, and create appropriate issue for AlignTraj, describing your attempts so far.

Also, I imagine there are problems with serialization of self._writer, no? Perhaps we can chat on discord about it (I'm @marinegor there)?

Yes makes sense. I think the current two ones that use in_memory and are analysis related are AverageStructure and AlignTraj.

Yes that would be good, I can then rename the PR to cover only AverageStructure for now, add the missing parts for the PR (Documentation + Changelog), create an Issue and write you on Discord, so that we can brainstorm how to adjust the code to make it parallelizable. Yes self._writer is one of the difficulties. I guess for the aligntraj you can adjust the code to give it the reference, but yes the writing during parallelization is the tricky part, maybe with tmp information that is then merged or maybe just doing the calculations and the writing is then in conclude, basically keeping that part serial and only making the calculations parallel.

talagayev avatar Nov 27 '25 01:11 talagayev

@talagayev ok, will be waiting for your message. I also assigned myself a reviewer here, so just re-request review when you think you're done!

marinegor avatar Nov 27 '25 21:11 marinegor

@talagayev ok, will be waiting for your message. I also assigned myself a reviewer here, so just re-request review when you think you're done!

Added the Documentation, CHANGELOG and adjust to raise and ValueError. The PR would be ready to be re-reviewed :)

talagayev avatar Nov 28 '25 13:11 talagayev

@talagayev all looking good! I initially commented on one extra line but just realized I can remove it myself.

Also, may I ask you to create a separate issue for AlignTraj parallelization? Perhaps you could describe the issues you encountered there, and suggest the direction in which one should move to actually enable it.

Perfect :)

Yes, I can create an Issue and write some Ideas in there.

talagayev avatar Dec 15 '25 17:12 talagayev

/azp run

marinegor avatar Dec 15 '25 18:12 marinegor

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Dec 15 '25 18:12 azure-pipelines[bot]

@talagayev and regarding AlignTraj -- I'd just be bold and say that it's impossible to parallelize that with current split-and-combine technique, since that would require an ability to write an arbitrary frame with self._writer, and not only sequential writing. I don't think any writers allow that.

So even in this PR, set available_backends to only serial explicitly, and note that it's impossible to parallelize. And as for the next issue, I'd target adding an option to save aligned trajectory to self.results, and writing it in _conclude.

marinegor avatar Dec 16 '25 11:12 marinegor