[BUG] mol.move().align() fails to align two proteins

Open akalpokas opened this issue 1 year ago • 2 comments

Describe the bug This might be more of a feature request than a bug report per se, since I don't think mol.move().align() was originally designed to work with proteins.

I am trying to align two proteins that have distinct configurations. Sequence-wise, these proteins are nearly identical except for residue ID 9. I have computed the mapping between these two proteins (mutant to wild-type) and want to align the wild-type protein to the mutant one (using the inverse mapping), so that I can make a perturbable protein with the mutant conformation. However, when I use the BioSimSpace.Align.rmsdAlign() function (which wraps sire.mol.move().align()) or sire.mol.move().align() directly, the resulting structure is just the wild-type protein, with no alignment performed. The function also runs without raising any errors.

If I use the BioSimSpace.Align.flexAlign() function (which uses fkcombu) with the same mapping, the two proteins are aligned properly; however, this takes a really long time to compute.

I suspect that even when alignment is done between two very similar conformations, the residues of the two proteins will not be aligned properly. I believe this could be fixed by looping over each residue in the target protein and aligning it with the reference structure. I could also work around this issue by extracting the residues of interest, aligning them individually, and updating the coordinates of the target (wild-type) residue, so that the coordinates of the hybrid residues won't be an issue during the merge.

To Reproduce Steps to reproduce the behavior:

  1. Extract the provided inputs.tar.gz file
  2. Run the script align.py via python

Expected behavior Alignment between two proteins in such a way that the saved output files (aligned_wt_rmsd_align.pdb/mol1_aligned.pdb) have the conformation of the mutant protein (frame_0.gro).

Input files inputs.tar.gz

Environment information

  • OS: Linux, Ubuntu 22.04.4 LTS
  • Version of Python: 3.12.3
  • Version of sire: 2024.1.0.dev
  • I confirm that I have checked this bug still exists in the latest released version of sire: yes

akalpokas, Apr 28 '24 16:04

The sire RMSD alignment just does rigid-body translations and rotations, so I assume that this is failing when the mapping is too large. It seems to work for me if I align just based on the sub-mapping for the region of interest, not the full one. (This is similar to what you suggest, i.e. just aligning the two residues then shifting everything else based on the translation and rotation vectors, but is probably easier.)
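
For illustration, the sub-mapping idea can be sketched with plain NumPy: fit a rigid-body transform on the mapped atoms of the region of interest only, then apply that same rotation and translation to every atom of the molecule being moved. This is a minimal sketch using the standard Kabsch algorithm, not the sire or BioSimSpace implementation; the function names `kabsch_transform` and `align_on_submapping`, and the `{mobile_index: reference_index}` dictionary layout for the mapping, are assumptions made for this example.

```python
import numpy as np


def kabsch_transform(mobile, reference):
    """Return (rotation, translation) that best superimposes mobile onto reference.

    Both inputs are (N, 3) arrays of matched points in the same order.
    """
    mobile = np.asarray(mobile, dtype=float)
    reference = np.asarray(reference, dtype=float)
    mob_centroid = mobile.mean(axis=0)
    ref_centroid = reference.mean(axis=0)
    # 3x3 covariance between the centred point sets.
    covariance = (mobile - mob_centroid).T @ (reference - ref_centroid)
    u, _, vt = np.linalg.svd(covariance)
    # Guard against an improper rotation (a reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = ref_centroid - rotation @ mob_centroid
    return rotation, translation


def align_on_submapping(mobile_coords, reference_coords, sub_mapping):
    """Fit on the atoms in sub_mapping only, then move the whole molecule.

    sub_mapping is a hypothetical {mobile_atom_index: reference_atom_index} dict.
    """
    mobile_coords = np.asarray(mobile_coords, dtype=float)
    reference_coords = np.asarray(reference_coords, dtype=float)
    mob_idx = list(sub_mapping.keys())
    ref_idx = list(sub_mapping.values())
    rotation, translation = kabsch_transform(
        mobile_coords[mob_idx], reference_coords[ref_idx]
    )
    # Coordinates are stored as rows, so apply the rotation as x @ R.T.
    return mobile_coords @ rotation.T + translation
```

Because the whole molecule moves as one rigid body, atoms outside the sub-mapping keep their internal geometry; only the region used for the fit is guaranteed to superimpose well.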

lohedges, Apr 29 '24 09:04

I have been able to temporarily work around the issue by extracting all of the residues from both proteins, aligning them individually, and then using the aligned coordinates to update the coordinates of the input protein. The alignment isn't ideal, but it does the trick for now.
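
A rough sketch of this kind of per-residue workaround, written with plain NumPy rather than the sire or BioSimSpace API, might look like the following. It is not the code that was later added to BioSimSpace. The names `fit_rigid` and `align_per_residue`, and the representation of each residue pair as two index arrays listing matched atoms in the same order, are assumptions made for this example.

```python
import numpy as np


def fit_rigid(mobile, reference):
    """Kabsch fit: return (rotation, translation) mapping mobile onto reference."""
    mob_centroid = mobile.mean(axis=0)
    ref_centroid = reference.mean(axis=0)
    covariance = (mobile - mob_centroid).T @ (reference - ref_centroid)
    u, _, vt = np.linalg.svd(covariance)
    # Guard against an improper rotation (a reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return rotation, ref_centroid - rotation @ mob_centroid


def align_per_residue(mobile_coords, reference_coords, residue_pairs):
    """Align each residue independently and write the result back.

    residue_pairs is a hypothetical iterable of (mobile_idx, reference_idx)
    pairs, each listing the matched atoms of one residue in the same order.
    Atoms that appear in no pair are left where they are.
    """
    reference_coords = np.asarray(reference_coords, dtype=float)
    updated = np.array(mobile_coords, dtype=float, copy=True)
    for mob_idx, ref_idx in residue_pairs:
        mob_idx = np.asarray(mob_idx)
        ref_idx = np.asarray(ref_idx)
        mob = updated[mob_idx]
        ref = reference_coords[ref_idx]
        if len(mob_idx) < 3:
            # Too few atoms to define a rotation: match centroids only.
            updated[mob_idx] = mob + (ref.mean(axis=0) - mob.mean(axis=0))
            continue
        rotation, translation = fit_rigid(mob, ref)
        updated[mob_idx] = mob @ rotation.T + translation
    return updated
```

Each residue is moved as its own rigid body, so bonds between neighbouring residues can end up stretched or compressed, which is consistent with the alignment being usable but not ideal.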

akalpokas, Apr 29 '24 09:04

Closing, as per-residue alignment code has been added to BioSimSpace.

akalpokas, Jun 21 '24 15:06