guess_TopologyAttrs guesses incorrect bonds for DMS lipids in MDAnalysis > 2.7.0
Expected Behavior
When using guess_TopologyAttrs to generate bonds for DSM lipids, the bonds around atom C3S (or any C*S) should be guessed correctly. For example:
("C3S", "C4S"), ("C3S", "C2S"), ("C3S", "H3S"), ("C3S", "H3T")
Actual Behavior
In MDAnalysis versions > 2.7.0, guess_TopologyAttrs produces incorrect extra bonds, e.g.:
[('C3S', 'C5S'), ('C3S', 'H4T'), ('C3S', 'H4S'), ('C3S', 'C4S'), ('C3S', 'HO3'), ('C3S', 'O3'), ('C3S', 'H3S'), ('C3S', 'H2S'), ('C3S', 'C2S'), ('C3S', 'NF'), ('C3S', 'C1S')]
This only happens for the lipid tail whose atoms are named C*S. The other tail, with atoms named C*T, behaves as expected.
Code to Reproduce
import MDAnalysis as mda

# Load structure with DMS lipid (minimal example)
u = mda.Universe("lipid_dms.pdb")
structure = u.select_atoms("resname DSM")

# Guess bonds
structure.guess_TopologyAttrs(to_guess=["bonds"], fudge_factor=0.5)

for b in structure.bonds:
    print(b)
Version Information
MDAnalysis: 2.7.0 → works as expected
MDAnalysis: > 2.7.0 → produces incorrect bonds
Python: 3.12
OS: Linux
Additional Notes
It seems that atom names like C3S may be misinterpreted as the element cesium (Cs), which has a much larger van der Waals radius, leading to spurious bonds.
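To make the suspected mechanism concrete, here is a minimal sketch in plain Python. It is not the MDAnalysis implementation: `naive_element` is a hypothetical guesser that drops digits and prefers a two-letter element match, and the radii are approximate literature values chosen for illustration. The bond test follows the usual distance criterion, where two atoms are bonded when their distance is below `fudge_factor * (r_a + r_b)`.

```python
import re

# Illustrative van der Waals radii in angstroms (approximate literature
# values; an assumption, not necessarily the table MDAnalysis ships).
VDW_RADII = {"H": 1.10, "C": 1.70, "N": 1.55, "O": 1.52, "CS": 3.43}


def naive_element(atom_name):
    """Hypothetical guesser: strip non-letters, prefer a two-letter match."""
    letters = re.sub(r"[^A-Za-z]", "", atom_name).upper()
    if letters[:2] in VDW_RADII:
        return letters[:2]
    return letters[:1]


def is_bonded(elem_a, elem_b, distance, fudge_factor=0.5):
    """Distance criterion: bonded when d < fudge_factor * (r_a + r_b)."""
    return distance < fudge_factor * (VDW_RADII[elem_a] + VDW_RADII[elem_b])


print(naive_element("C3S"))  # CS (read as cesium)
print(naive_element("C3T"))  # C

# Typical 1-3 carbon-carbon distance in an alkyl chain, roughly 2.5 angstroms.
d13 = 2.5
print(is_bonded("C", "C", d13))    # False: threshold is 0.5 * 3.40 = 1.70
print(is_bonded("CS", "CS", d13))  # True: threshold is 0.5 * 6.86 = 3.43
```

Under these assumptions, a second-neighbor pair such as C3S and C5S falls inside the inflated threshold, which would match the extra bonds listed above.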
@lilyminium do you know if this issue is related to the switch-over to the new guesser system?
Quite possibly. @ricard1997 could you please provide the PDB you're using?
Here I attach the file. It is a .gro file, and I have changed the extension to .csv so that GitHub allows me to upload it. Also, I made an error in my statement: the lipid with the problem is DSM, not DMS.
I appreciate that this is an annoying case of C*S being converted to CS instead of C - but honestly I think the answer here is that the PDB file should have had elements assigned to it, not that we should have attempted to correctly guess the difference between CS and C in an unclear atom name.
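One way to supply elements explicitly, rather than relying on name-based guessing, could look like the sketch below. The helper `element_from_name` is hypothetical, and it rests on the assumption that this lipid contains only H, C, N, O and P, so that the first letter of each atom name is its element. That assumption does not hold for general systems (ions, for example).

```python
# Assumption: the lipid contains only these elements, so the first letter
# of every atom name identifies the element unambiguously.
ALLOWED = {"H", "C", "N", "O", "P"}


def element_from_name(atom_name):
    """Hypothetical helper: map an atom name to an element by first letter."""
    first = atom_name.strip()[0].upper()
    if first not in ALLOWED:
        raise ValueError(f"cannot assign an element to atom name {atom_name!r}")
    return first


# Atom names taken from the bond lists in this report.
names = ["C3S", "C4S", "H3S", "H3T", "O3", "HO3", "NF"]
elements = [element_from_name(n) for n in names]
print(elements)  # ['C', 'C', 'H', 'H', 'O', 'H', 'N']
```

A list built this way for all atoms could then be attached with `u.add_TopologyAttr("elements", elements)` before guessing bonds. Whether the bond guesser reads elements or atom types should be checked against the installed MDAnalysis version.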
@ricard1997, which program and force field generated your file?
Do you have a topology file (e.g., PSF or TPR) that you could use?