sasascore module
You are about to submit a new Pull Request. Before continuing make sure you read the contributing guidelines and that you comply with the following criteria:
- [x] You have stuck to Python. Please talk to us before adding other programming languages to HADDOCK3
- [ ] Your PR is about CNS
- [x] Your code is well documented: proper docstrings and explanatory comments for those tricky parts
- [x] You structured the code into small functions as much as possible. You can use classes if there is a (state) purpose
- [x] Your code follows our coding style
- [x] You wrote tests for the new code
- [x] `tox` tests pass. Run `tox` command inside the repository folder
- [x] `-test.cfg` examples execute without errors. Inside `examples/` run `python run_tests.py -b`
- [x] PR does not add any dependencies, unless permission granted by the HADDOCK team
- [x] PR does not break licensing
- [ ] Your PR is about writing documentation for already existing code :fire:
- [ ] Your PR is about writing tests for already existing code :godmode:
Closes #861 by creating a sasascore module, which scores PDB files against existing accessibility information.
As an example, if some glycosylation sites on chain A (say residues 40 and 50) are known to remain accessible upon complex formation, a penalty can be added if they are buried in the resulting model. Conversely, if some residues are known to be buried in the complex, a penalty can be imposed if they are accessible.
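The penalty logic described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the function name `count_violations` and its arguments are hypothetical, and the set of buried residues per chain is taken as given, since how burial is decided (for example, a solvent-accessibility cutoff) is not specified here.

```python
from typing import Dict, List, Set, Tuple


def count_violations(
    expected_accessible: Dict[str, List[int]],
    expected_buried: Dict[str, List[int]],
    buried_in_model: Dict[str, Set[int]],
) -> Tuple[int, Dict[str, List[int]]]:
    """Count residues whose burial state contradicts the input information.

    Hypothetical helper: `buried_in_model` maps each chain to the residues
    found buried in one model (assumed to be computed elsewhere).
    """
    violations: Dict[str, List[int]] = {}

    # Residues expected to be accessible are violated when they are buried.
    for chain, residues in expected_accessible.items():
        buried = buried_in_model.get(chain, set())
        wrongly_buried = [res for res in residues if res in buried]
        if wrongly_buried:
            violations[f"bur_{chain}"] = wrongly_buried

    # Residues expected to be buried are violated when they are accessible.
    for chain, residues in expected_buried.items():
        buried = buried_in_model.get(chain, set())
        wrongly_accessible = [res for res in residues if res not in buried]
        if wrongly_accessible:
            violations[f"acc_{chain}"] = wrongly_accessible

    # The score is the total number of violated residues (lower is better).
    score = sum(len(residues) for residues in violations.values())
    return score, violations


# Made-up model in which residue 40 of chain A is buried and
# residue 22 of chain B is accessible.
example_score, example_violations = count_violations(
    expected_accessible={"A": [40, 50]},
    expected_buried={"B": [22, 23, 24]},
    buried_in_model={"A": {40}, "B": {23, 24}},
)
```

In this made-up case one accessible-residue expectation and one buried-residue expectation are broken, so the model receives a penalty of 2.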
An example application of the module:
[sasascore]
resdic_accessible_A = [40,50]
resdic_buried_B = [22,23,24]
This will create a sasascore.tsv file, analogous to the other scoring tsv files.
structure original_name md5 score
cluster_1_model_1.pdb emref_1.pdb None 2
cluster_1_model_2.pdb emref_2.pdb None 2
cluster_2_model_1.pdb emref_4.pdb None 5
cluster_3_model_1.pdb emref_3.pdb None 5
cluster_2_model_2.pdb emref_5.pdb None 5
Here the score is the number of times the input information has not been satisfied (the lower the better). A file named violations.tsv is also produced, with a detailed picture of the violations:
structure bur_A acc_B
cluster_1_model_1.pdb 40 22
cluster_1_model_2.pdb 50 23
cluster_2_model_1.pdb 40,50 22,23,24
cluster_3_model_1.pdb 40,50 22,23,24
cluster_2_model_2.pdb 40,50 22,23,24
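As a quick consistency check, the scores in sasascore.tsv can be recomputed from violations.tsv by counting the residues listed in each row. The sketch below assumes the file is tab-separated with comma-separated residue numbers in each cell, and that a column with no violations holds an empty cell or a `-` placeholder; both are assumptions, not documented behaviour. The helper name `count_from_violations` is hypothetical.

```python
import csv
import io
from typing import Dict

# Two rows copied from the violations table above, written as tab-separated text.
VIOLATIONS_TSV = "\n".join(
    [
        "structure\tbur_A\tacc_B",
        "cluster_1_model_1.pdb\t40\t22",
        "cluster_2_model_1.pdb\t40,50\t22,23,24",
    ]
)


def count_from_violations(tsv_text: str) -> Dict[str, int]:
    """Return the number of violated residues for each structure."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts: Dict[str, int] = {}
    for row in reader:
        structure = row.pop("structure")
        total = 0
        for cell in row.values():
            # Assumption: an empty cell or "-" means no violations in that column.
            residues = [r.strip() for r in (cell or "").split(",")]
            total += sum(1 for r in residues if r and r != "-")
        counts[structure] = total
    return counts


violation_counts = count_from_violations(VIOLATIONS_TSV)
```

For the two rows used here the counts are 2 and 5, matching the scores reported for the same structures in sasascore.tsv.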