mdanalysis icon indicating copy to clipboard operation
mdanalysis copied to clipboard

[HOLD] Re-implement distance_array using a batched algorithm and add compiled iterator layer [Core, WIP]

Open hmacdope opened this issue 3 years ago • 4 comments

Fixes #3708

UPDATE

This PR is currently on hold pending decisions around batched distance arrays.

Aim:

  • Add a batched distance array algorithm for efficient memory use.
  • Allow distance routines to accept NumPy arrays or AtomGroups usign a compiled iterator.

Second large set of changes of CZIEOSS4 performance track. I have so far only changed _distance_array without a box but probably at a good stage to discuss now before moving on to change the rest of calc_distances.h

Changes made in this pull request:

  • Introduces the Iterator layer to homogenise behaviour between ndarrays and AtomGroups at the C++ level with a shared interface.
  • Allows distance routines to accept either ndarray or AtomGroup in any position using templates. This stops us having to write a huge number of overloads or use Cython polymorphism which is pretty annoying. For example _calc_dihedrals(a,b,c,d) would require 24 overloads for flexibility at every position.
  • Introduces a batched distance array algorithm that uses stack allocated arrays. Testing of the C++ with googlebenchdemonstrated this to be very fast with a modest memory footprint. This concept also links in well with the structure in distopia and possible some CUDA code.

Status:

  • [x] Finalise overhang part of batched distance array calculation (currently not working for ncoords>batchsize)
  • [ ] Change remainder of _calc_distances.h
  • [x] Finalise docs for iterator API
  • [ ] Make sure types are consistent (int vs uint64_t etc)
  • [ ] Add tests for AtomGroup distance calculations
  • [ ] Check performance with benchmarks.

Design Ideas:

We need a AtomGroup to behave like a NumPy array at the C++ level. We achieve this using C++ classes that provide a shared interface for iteration over float* coordinate arrays with different implementations for their specific datastructure.

These are then wrapped in a thin Cython class wrapper. AtomGroups or NumPy arrays are then turned into their respective iterators before being passed to the C++ level.

If people like this pattern, other groups can have iterators written very easily. They just need to share an interface with the others.

PR Checklist

  • [ ] Tests?
  • [X] Docs?
  • [ ] CHANGELOG updated?
  • [X] Issue raised/referenced?

hmacdope avatar Jun 15 '22 07:06 hmacdope

Hello @hmacdope! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 179:80: E501 line too long (151 > 79 characters) Line 236:80: E501 line too long (81 > 79 characters)

Line 2063:80: E501 line too long (82 > 79 characters) Line 2072:80: E501 line too long (81 > 79 characters) Line 2073:80: E501 line too long (81 > 79 characters) Line 2079:80: E501 line too long (87 > 79 characters) Line 2089:33: E128 continuation line under-indented for visual indent Line 2122:80: E501 line too long (81 > 79 characters) Line 2126:80: E501 line too long (81 > 79 characters)

Line 372:80: E501 line too long (91 > 79 characters) Line 453:26: E128 continuation line under-indented for visual indent Line 453:80: E501 line too long (89 > 79 characters) Line 455:80: E501 line too long (80 > 79 characters) Line 458:42: E251 unexpected spaces around keyword / parameter equals

Line 380:1: W293 blank line contains whitespace Line 382:80: E501 line too long (82 > 79 characters) Line 392:80: E501 line too long (86 > 79 characters) Line 397:1: W293 blank line contains whitespace Line 398:62: E231 missing whitespace after ',' Line 399:29: E128 continuation line under-indented for visual indent Line 399:56: E231 missing whitespace after ',' Line 400:80: E501 line too long (89 > 79 characters) Line 401:80: E501 line too long (82 > 79 characters) Line 416:1: W293 blank line contains whitespace Line 417:1: W293 blank line contains whitespace Line 418:5: E303 too many blank lines (2) Line 425:80: E501 line too long (86 > 79 characters)

Comment last updated at 2022-06-20 06:34:19 UTC

pep8speaks avatar Jun 15 '22 07:06 pep8speaks

Codecov Report

Merging #3718 (53c4b4e) into develop (a2f9e94) will decrease coverage by 0.76%. The diff coverage is 79.72%.

@@             Coverage Diff             @@
##           develop    #3718      +/-   ##
===========================================
- Coverage    94.27%   93.50%   -0.77%     
===========================================
  Files          192      188       -4     
  Lines        24695    24258     -437     
  Branches      3661     3753      +92     
===========================================
- Hits         23280    22682     -598     
+ Misses        1366     1061     -305     
- Partials        49      515     +466     
Impacted Files Coverage Δ
package/MDAnalysis/core/group_iterators.pyx 52.38% <52.38%> (ø)
package/MDAnalysis/lib/distances.py 96.51% <78.94%> (-1.97%) :arrow_down:
package/MDAnalysis/lib/util.py 89.43% <96.15%> (-1.41%) :arrow_down:
package/MDAnalysis/lib/c_distances.pyx 98.11% <100.00%> (-1.89%) :arrow_down:
package/MDAnalysis/lib/c_distances_openmp.pyx 95.38% <100.00%> (-4.62%) :arrow_down:
...ckage/MDAnalysis/analysis/encore/confdistmatrix.py 67.41% <0.00%> (-10.12%) :arrow_down:
package/MDAnalysis/analysis/encore/covariance.py 84.61% <0.00%> (-9.24%) :arrow_down:
package/MDAnalysis/coordinates/DMS.py 83.63% <0.00%> (-9.10%) :arrow_down:
package/MDAnalysis/coordinates/chemfiles.py 88.46% <0.00%> (-8.80%) :arrow_down:
package/MDAnalysis/converters/OpenMM.py 88.88% <0.00%> (-8.34%) :arrow_down:
... and 97 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update a2f9e94...53c4b4e. Read the comment docs.

codecov[bot] avatar Jun 20 '22 01:06 codecov[bot]

Some further benchmarking has shown that there doesn't appear a large speedup to be gained using a compiled iterator vs using a Python NumPy index call.

Given the large changes proposed in this PR for apparently limited gain I think I will switch tack and use a call to .positions. It's a shame to invest a fair bit into this design and then change tack but I think we needed to explore it fully to make sure it wasn't the right fit. Regardless lesson learnt about the speed of NumPy indexing.

The batching of coordinates using stack allocated memory as is done batched_distances.h does however hold some promise with a 2-8% speedup compared to current develop. It also fits more into the style we had in distopia but would be a large rewrite of calc_distance.h so perhaps best shelved for now.

With that in mind would people prefer I leave this here and open another PR to address the original issue (#3708 )? @richardjgowers or should I just continue here.

hmacdope avatar Jun 21 '22 12:06 hmacdope

If there are no objections I will move the solution to the original issue (#3708) to a new PR and rename this PR accordingly? Will do tomorrow if no objections.

hmacdope avatar Jun 22 '22 07:06 hmacdope

@richardjgowers and I moved away from this idea, I am going to close, feel free to re-open if we want to revive some aspect of this PR.

hmacdope avatar Jun 26 '23 21:06 hmacdope