
Accessible Surface Area calculations

Open JureCerar opened this issue 2 years ago • 13 comments

Fixes #2439

Changes made in this Pull Request:

  • Added calculation of the accessible surface area using Shrake-Rupley algorithm (modified #4025).
  • Added calculation of relative accessible surface area.
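For readers unfamiliar with the method, here is a minimal NumPy sketch of the Shrake-Rupley idea: place test points on each atom's probe-expanded sphere, discard points that fall inside any neighbouring expanded sphere, and scale the sphere area by the surviving fraction. This is not the code from this PR; the function names `fibonacci_sphere` and `shrake_rupley`, the default probe radius of 1.4, and the brute-force neighbour loop are all assumptions made for illustration.

```python
import numpy as np


def fibonacci_sphere(n_points):
    """Return n_points roughly evenly spaced points on the unit sphere."""
    i = np.arange(n_points) + 0.5
    polar = np.arccos(1.0 - 2.0 * i / n_points)
    azimuth = np.pi * (1.0 + 5.0 ** 0.5) * i
    return np.column_stack((
        np.cos(azimuth) * np.sin(polar),
        np.sin(azimuth) * np.sin(polar),
        np.cos(polar),
    ))


def shrake_rupley(coords, radii, probe_radius=1.4, n_points=100):
    """Per-atom accessible surface area, in squared coordinate units.

    Illustrative only: loops over all atom pairs (O(N^2)) instead of
    using a neighbour search.
    """
    coords = np.asarray(coords, dtype=float)
    expanded = np.asarray(radii, dtype=float) + probe_radius
    unit_sphere = fibonacci_sphere(n_points)
    areas = np.empty(len(coords))
    for i in range(len(coords)):
        # Test points on the probe-expanded sphere of atom i.
        points = coords[i] + expanded[i] * unit_sphere
        accessible = np.ones(n_points, dtype=bool)
        for j in range(len(coords)):
            if j == i:
                continue
            # A point is buried if it lies inside another expanded sphere.
            dist2 = np.sum((points - coords[j]) ** 2, axis=1)
            accessible &= dist2 >= expanded[j] ** 2
        # Scale the full sphere area by the fraction of exposed points.
        areas[i] = 4.0 * np.pi * expanded[i] ** 2 * accessible.mean()
    return areas
```

An isolated atom recovers the full area of its expanded sphere, and an atom completely enclosed by a larger neighbour gets zero area. A practical implementation would restrict the inner loop to nearby atoms.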

PR Checklist

  • [x] Tests?
  • [x] Docs?
  • [x] CHANGELOG updated?
  • [x] Issue raised/referenced?

Developers certificate of origin


📚 Documentation preview 📚: https://mdanalysis--4417.org.readthedocs.build/en/4417/

JureCerar avatar Jan 08 '24 20:01 JureCerar

Hello @JureCerar! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2024-07-09 22:52:39 UTC

pep8speaks avatar Jan 08 '24 20:01 pep8speaks

Linter Bot Results:

Hi @JureCerar! Thanks for making this PR. We linted your code and found the following:

Some issues were found with the formatting of your code.

| Code Location | Outcome |
| --- | --- |
| main package | ⚠️ Possible failure |
| testsuite | ⚠️ Possible failure |

Please have a look at the darker-main-code and darker-test-code steps here for more details: https://github.com/MDAnalysis/mdanalysis/actions/runs/9865281726/job/27241908471


Please note: The black linter is purely informational, you can safely ignore these outcomes if there are no flake8 failures!

github-actions[bot] avatar Jan 08 '24 21:01 github-actions[bot]

Codecov Report

Attention: Patch coverage is 96.68874% with 5 lines in your changes missing coverage. Please review.

Project coverage is 93.63%. Comparing base (cfda8b7) to head (66b3fb7).

| Files | Patch % | Lines |
| --- | --- | --- |
| package/MDAnalysis/analysis/sasa.py | 96.68% | 0 Missing and 5 partials :warning: |

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4417      +/-   ##
===========================================
+ Coverage    93.61%   93.63%   +0.02%     
===========================================
  Files          171      172       +1     
  Lines        21243    21394     +151     
  Branches      3934     3970      +36     
===========================================
+ Hits         19886    20032     +146     
  Misses         898      898              
- Partials       459      464       +5     


codecov[bot] avatar Jan 08 '24 21:01 codecov[bot]

It is unfortunate that the MDAKit was not more widely publicized, leading to duplication of effort. @JureCerar, would it make sense for you to contribute the calculation of relative accessible surface area to the MDAKit (assuming it has not yet been implemented)?

RMeli avatar Jan 20 '24 21:01 RMeli

@orbeckst I used and modified the main surface calculation code. As far as I know, the code from #4025 is copied from BioPython/SASA, which is under the BSD 3 license. Only _get_sphere and _single_frame are based on BioPython’s implementation. I changed the code to fit the MDAnalysis AnalysisBase class and added some tweaks, input value checks, and comments. Everything else is my own code: relative SASA, tests, documentation, etc.

JureCerar avatar Feb 17 '24 21:02 JureCerar

Thank you for the details. BSD 3 would be ok.

Have you compared output and performance of the code here to mdakit-sasa?

What do your code and mdakit-sasa have in common, and where do they differ?

orbeckst avatar Feb 19 '24 06:02 orbeckst

I checked the code. The main difference is that mdakit-sasa is a wrapper for the FreeSASA package, so the underlying algorithm is different. FreeSASA uses the Lee-Richards algorithm, whereas this code uses the Shrake-Rupley algorithm.

Performance-wise, I did not test it, but I figure FreeSASA (mdakit-sasa) is faster, as it's implemented in C. It's hard to make a head-to-head comparison as the algorithms are different. This implementation finishes a 10-frame trajectory of a ~400-residue protein in about a minute or two, which I think is a reasonable speed. In any case, precision can be lowered if speed is needed.

Output-wise, the result (i.e. area) is the same regardless of the method or package used.

This PR also implements the relative surface area calculation, which is very useful to have when calculating protein surface properties. I guess it could also be implemented in mdakit-sasa?
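To make the term concrete: relative accessible surface area is commonly defined as a residue's summed atomic area divided by a reference (maximum) area for that residue type. The sketch below assumes that definition; it is not taken from this PR, and the function name `relative_sasa`, its arguments, and the reference numbers in the usage example are hypothetical placeholders rather than literature values.

```python
import numpy as np


def relative_sasa(atom_areas, residue_index, resnames, reference_area):
    """Per-residue SASA divided by a per-residue-type reference area.

    atom_areas     : per-atom accessible areas
    residue_index  : for each atom, the 0-based index of its residue
    resnames       : residue name for each residue
    reference_area : dict mapping residue name to its reference area
    """
    atom_areas = np.asarray(atom_areas, dtype=float)
    residue_index = np.asarray(residue_index)
    # Sum the atomic areas that belong to each residue.
    residue_areas = np.bincount(
        residue_index, weights=atom_areas, minlength=len(resnames)
    )
    reference = np.array([reference_area[name] for name in resnames], dtype=float)
    return residue_areas / reference


# Placeholder numbers, for illustration only (not real reference values).
example = relative_sasa(
    atom_areas=[10.0, 20.0, 30.0, 40.0],
    residue_index=[0, 0, 1, 1],
    resnames=["ALA", "GLY"],
    reference_area={"ALA": 120.0, "GLY": 100.0},
)
```

Values near 0 indicate a buried residue and values near 1 an exposed one; the result depends directly on which reference table is chosen.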

Just as a side note, I similarly tried writing a wrapper for BioPython/SASA, but it was very messy and I could not get it to work properly without writing a lot of temporary files.

JureCerar avatar Feb 19 '24 19:02 JureCerar

Hi All.

As mentioned by @JureCerar, mdakit-sasa wraps the FreeSASA implementation in the AnalysisBase class, and this kit is very simple as all the heavy lifting is done by FreeSASA:

Regarding performance, it is heavily driven by parametrisation. In the case of Shrake-Rupley, the number of points per sphere is a main parameter; for example, the default parameters of the Gromacs SASA calculation use very few points. FreeSASA has a built-in multithreaded (nThread) implementation, but the kit does not implement parallelisation over multiple frames at the moment.
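A small experiment illustrates how the point count trades accuracy for speed. The sketch below considers two equal, overlapping probe-expanded spheres, for which the exposed fraction of each sphere is known analytically from the spherical-cap area, and compares it with the point-counting estimate. The function names, the Fibonacci point layout, and the chosen distance and radius are assumptions for illustration, not values from any of the packages discussed here.

```python
import numpy as np


def exposed_fraction(n_points, distance, radius):
    """Estimate the exposed fraction of sphere A next to an equal sphere B.

    Sphere A sits at the origin and sphere B at (distance, 0, 0); both
    have the same probe-expanded radius.
    """
    i = np.arange(n_points) + 0.5
    polar = np.arccos(1.0 - 2.0 * i / n_points)
    azimuth = np.pi * (1.0 + 5.0 ** 0.5) * i
    points = radius * np.column_stack((
        np.cos(azimuth) * np.sin(polar),
        np.sin(azimuth) * np.sin(polar),
        np.cos(polar),
    ))
    centre_b = np.array([distance, 0.0, 0.0])
    # Points of A that lie inside B are buried; count the rest.
    dist2 = np.sum((points - centre_b) ** 2, axis=1)
    return float(np.mean(dist2 >= radius ** 2))


def exact_exposed_fraction(distance, radius):
    """Analytic cap result, valid for 0 <= distance <= 2 * radius."""
    return 0.5 + distance / (4.0 * radius)


for n in (10, 100, 1000, 10000):
    estimate = exposed_fraction(n, distance=3.0, radius=2.9)
    error = abs(estimate - exact_exposed_fraction(3.0, 2.9))
    print(f"{n:>6d} points: fraction = {estimate:.4f}, abs. error = {error:.4f}")
```

The cost per atom grows linearly with the number of points, so any timing comparison between tools is only meaningful when their point counts (or equivalent resolution settings) are matched.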

The reason for switching the PR to a kit initially was to separate the FreeSASA dependency from the core. Let me know if there is something I can help with.

Regards.

pegerto avatar Feb 19 '24 20:02 pegerto

Thanks @pegerto and @JureCerar! Some of the developers are currently discussing how best to move forward. We'll keep you updated. Thank you for your patience!

orbeckst avatar Feb 21 '24 18:02 orbeckst

> kit do not implement parallelisation over multiple frames at the moment.

This might be very easy once we merge PR #4162.
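As a generic illustration of what parallelising over frames can look like, here is a split-apply-combine sketch: split the frame indices into blocks, analyse each block independently, and concatenate the per-frame results in order. This is not the interface proposed in PR #4162; the function names are hypothetical, the trajectory is assumed to be a plain NumPy array of shape (frames, atoms, 3), and `per_frame_result` is a stand-in for an expensive per-frame calculation such as SASA.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def per_frame_result(frame_coords):
    """Placeholder for an expensive per-frame calculation (e.g. SASA)."""
    return float(np.linalg.norm(frame_coords, axis=1).sum())


def analyse_block(trajectory, frame_indices):
    """Apply step: process one contiguous block of frames serially."""
    return [per_frame_result(trajectory[i]) for i in frame_indices]


def run_parallel(trajectory, n_workers=4):
    """Split frames into blocks, analyse them concurrently, then combine."""
    blocks = np.array_split(np.arange(len(trajectory)), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves block order, so frames stay in sequence.
        results = list(pool.map(lambda b: analyse_block(trajectory, b), blocks))
    return np.concatenate([np.asarray(r, dtype=float) for r in results])
```

Threads are used here only to keep the sketch short and portable. For CPU-bound work in pure Python, a process-based pool would usually be needed to see a real speed-up, and each worker would need its own handle on the trajectory.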

orbeckst avatar Feb 21 '24 18:02 orbeckst