haddock3

sasascore module

Open mgiulini opened this issue 1 year ago • 0 comments

You are about to submit a new Pull Request. Before continuing make sure you read the contributing guidelines and that you comply with the following criteria:

  • [x] You have stuck to Python. Please talk to us before adding other programming languages to HADDOCK3
  • [ ] Your PR is about CNS
  • [x] Your code is well documented: proper docstrings and explanatory comments for those tricky parts
  • [x] You structured the code into small functions as much as possible. You can use classes if there is a (state) purpose
  • [x] Your code follows our coding style
  • [x] You wrote tests for the new code
  • [x] tox tests pass. Run tox command inside the repository folder
  • [x] -test.cfg examples execute without errors. Inside examples/ run python run_tests.py -b
  • [x] PR does not add any dependencies, unless permission granted by the HADDOCK team
  • [x] PR does not break licensing
  • [ ] Your PR is about writing documentation for already existing code :fire:
  • [ ] Your PR is about writing tests for already existing code :godmode:

Closes #861 by creating a sasascore module, which scores PDB files against existing accessibility information.

As an example, if some glycosylation sites on chain A (say residues 40 and 50) are known to remain solvent-accessible upon complex formation, a penalty can be added if they are buried in the resulting model. Conversely, if some residues are known to be buried in the complex, a penalty can be imposed if they are accessible.
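The penalty logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the module's implementation: the function name `find_violations`, the per-residue relative accessibility map used as input, and the 0.25 cutoff are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical cutoff: a residue counts as accessible when its relative
# solvent accessibility is at or above this value (an assumption made
# for this sketch, not a value taken from the module).
ACCESSIBILITY_CUTOFF = 0.25


def find_violations(
    rel_sasa: Dict[Tuple[str, int], float],
    expected_accessible: Dict[str, Iterable[int]],
    expected_buried: Dict[str, Iterable[int]],
    cutoff: float = ACCESSIBILITY_CUTOFF,
) -> List[Tuple[str, int, str]]:
    """Return (chain, residue, observed_state) for every unmet expectation."""
    violations = []
    for chain, residues in expected_accessible.items():
        for res in residues:
            # Residues missing from the map are skipped, not penalised.
            value = rel_sasa.get((chain, res))
            if value is not None and value < cutoff:
                violations.append((chain, res, "buried"))
    for chain, residues in expected_buried.items():
        for res in residues:
            value = rel_sasa.get((chain, res))
            if value is not None and value >= cutoff:
                violations.append((chain, res, "accessible"))
    return violations


# Made-up relative accessibilities for a single model.
rel_sasa = {
    ("A", 40): 0.05,
    ("A", 50): 0.60,
    ("B", 22): 0.40,
    ("B", 23): 0.10,
    ("B", 24): 0.02,
}
violations = find_violations(rel_sasa, {"A": [40, 50]}, {"B": [22, 23, 24]})
score = len(violations)  # one penalty point per unmet expectation
```

With these made-up values, residue 40 of chain A is buried although it was expected to be accessible, and residue 22 of chain B is accessible although it was expected to be buried, so the sketch reports two violations.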

An example application of the module:

[sasascore]
resdic_accessible_A = [40,50]
resdic_buried_B = [22,23,24]

This will create a sasascore.tsv file, analogous to the other scoring tsv files.

structure       original_name   md5     score
cluster_1_model_1.pdb   emref_1.pdb     None    2
cluster_1_model_2.pdb   emref_2.pdb     None    2
cluster_2_model_1.pdb   emref_4.pdb     None    5
cluster_3_model_1.pdb   emref_3.pdb     None    5
cluster_2_model_2.pdb   emref_5.pdb     None    5

Here the score is the number of times the input information has not been satisfied (the lower the better). A file named violations.tsv is also produced, listing the violating residues for each model:

structure               bur_A   acc_B
cluster_1_model_1.pdb   40      22
cluster_1_model_2.pdb   50      23
cluster_2_model_1.pdb   40,50   22,23,24
cluster_3_model_1.pdb   40,50   22,23,24
cluster_2_model_2.pdb   40,50   22,23,24
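As a quick consistency check, the per-model score can be recomputed from the violations table by counting the listed residues. The sketch below assumes whitespace-separated columns and comma-separated residue lists, as in the excerpt above; the helper name `count_violations` is hypothetical, and how the real file encodes a model with no violations is not covered here.

```python
from typing import Dict

# Content copied from the violations.tsv excerpt above.
VIOLATIONS_TEXT = """\
structure             bur_A acc_B
cluster_1_model_1.pdb 40 22
cluster_1_model_2.pdb 50 23
cluster_2_model_1.pdb 40,50 22,23,24
cluster_3_model_1.pdb 40,50 22,23,24
cluster_2_model_2.pdb 40,50 22,23,24
"""


def count_violations(text: str) -> Dict[str, int]:
    """Map each structure name to its total number of listed residues."""
    lines = [line for line in text.splitlines() if line.strip()]
    counts = {}
    for line in lines[1:]:  # skip the header row
        structure, *cells = line.split()
        # Each cell is a comma-separated list of violating residues.
        counts[structure] = sum(len(cell.split(",")) for cell in cells)
    return counts


counts = count_violations(VIOLATIONS_TEXT)
for structure, n_violations in counts.items():
    print(structure, n_violations)
```

For the excerpt, the counts are 2, 2, 5, 5 and 5, which match the score column of sasascore.tsv shown earlier.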

mgiulini • Apr 16 '24 08:04