Added pdb_selb to filter by B-factor values
Hi,
I was working on some AlphaFold predictions and needed a tool to filter by B-factor, and then I saw #163. So I have written a script called pdb_selb that filters atoms by their B-factor values.
However, since the < and > signs can be interpreted by the shell as redirection operators, I have added a separate option for choosing the comparison operator, instead of writing the operator and the threshold value together in a single option.
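To illustrate the idea, here is a minimal sketch of how a named operator could be mapped to a comparison, so that no < or > ever reaches the shell. The operator names (`lt`, `le`, `gt`, `ge`, `eq`) and the helper `make_predicate` are assumptions made for this example and are not necessarily the option values or function names used in pdb_selb.

```python
import operator

# Hypothetical operator names; the actual option values in pdb_selb may differ.
COMPARATORS = {
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
    "eq": operator.eq,
}


def make_predicate(op_name, threshold):
    """Return a function that tests a B-factor against the threshold."""
    try:
        compare = COMPARATORS[op_name]
    except KeyError:
        raise ValueError(f"unknown operator: {op_name!r}") from None
    return lambda bfactor: compare(bfactor, threshold)


keep = make_predicate("ge", 70.0)
print(keep(85.2), keep(42.0))  # True False
```

Using names instead of symbols also means the value never needs to be quoted on the command line.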
I have also added tests and documentation for pdb_selb. As this is my first time contributing to this project, please let me know if you have any feedback or ideas for improvement!
PS: One more question/comment: the selection should act on a per-residue basis, meaning that whole residues should be kept or removed, not just a few atoms per residue. This is not an issue for pLDDT, but B-factors are atom-specific.
I have implemented a new option called filtering_mode that selects whether the mean, minimum, or maximum B-factor of a residue is used to filter residues. For the mean, I used Python's statistics.fmean instead of sum()/len() for better precision.
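As a rough sketch of the per-residue filtering described above (not the code in this PR), consecutive ATOM/HETATM records can be grouped by chain ID, residue number, and insertion code, and each group reduced to a single B-factor. The mode names (`mean`, `min`, `max`) and the function names `residue_key` and `filter_residues` are assumptions made for this example. Column positions follow the fixed-width PDB format.

```python
from itertools import groupby
from statistics import fmean

# Hypothetical mode names; the PR's filtering_mode values may be spelled differently.
REDUCERS = {"mean": fmean, "min": min, "max": max}


def residue_key(line):
    # Chain ID (column 22), residue number (columns 23-26), insertion code (column 27).
    return line[21], line[22:26], line[26]


def filter_residues(lines, keep, mode="mean"):
    """Yield the atom records of residues whose reduced B-factor passes keep()."""
    reduce_b = REDUCERS[mode]
    atoms = (l for l in lines if l.startswith(("ATOM", "HETATM")))
    # groupby only merges *consecutive* records that share the same key.
    for _, group in groupby(atoms, key=residue_key):
        records = list(group)
        # The B-factor occupies columns 61-66 of an ATOM/HETATM record.
        bfactors = [float(r[60:66]) for r in records]
        if keep(reduce_b(bfactors)):
            yield from records
```

Here `keep` is any function that takes a float and returns a bool, for example `lambda b: b >= 70.0`. Non-atom records are simply dropped in this sketch; a real tool would have to decide how to pass them through.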
While testing, I noticed that the code fails if there are nonconsecutive records for the same residue in a PDB file. The current code assumes the records of a residue should be consecutive and groups consecutive records with the same chain and residue ID to filter it.
We could mitigate this issue by sorting the PDB file with pdb_sort before filtering. However, since this would also re-sort files that are already sorted, an alternative is to keep track of the chain and residue IDs that have already been processed: if a later record reuses IDs that were already processed, we could throw an error and direct the user to sort their PDB file with pdb_sort.
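The second option could look roughly like the sketch below, which remembers every residue key it has finished with and raises an error if one of them comes back later. The function name `check_consecutive` and the error message are made up for this example, and resetting the bookkeeping at each MODEL record is an assumption about how multi-model files should be handled.

```python
def check_consecutive(lines):
    """Raise ValueError if a residue's records are split across the file."""
    seen = set()
    previous = None
    for number, line in enumerate(lines, start=1):
        if line.startswith("MODEL"):
            # Assumption: residue IDs legitimately repeat in each model.
            seen.clear()
            previous = None
            continue
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Chain ID, residue number, insertion code (columns 22, 23-26, 27).
        key = (line[21], line[22:26], line[26])
        if key != previous:
            if key in seen:
                raise ValueError(
                    f"line {number}: residue {key} reappears after other "
                    "residues; sort the file first (e.g. with pdb_sort)"
                )
            seen.add(key)
            previous = key
```

This needs a single pass and only stores one small tuple per residue, so already sorted files are not re-sorted or otherwise modified.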
This PR is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 15 days.
This PR was closed because it has been stalled for 15 days with no activity.