How to keep parity on the residue & chain name assignment behaviour with v0.11 breaking change?
With release 0.11.0 the Topology class was changed.
https://docs.openforcefield.org/projects/toolkit/en/stable/releasehistory.html#atom-metadata-and-hierarchy-schemes-for-iterating-over-residues-chains-etc
In particular if I called off_mol.to_topology().to_openmm() prior to this version,
the toolkit will automatically try to find a residue name in the mol_to_residues dictionary and if not found, assign off_mol.name to be the residue name. Similar automation is done for chain assignment.
Beyond 0.11.0, the residue name is set to 'UNK' and chain id to 'X' https://github.com/openforcefield/openff-toolkit/blob/575501a2d736c176e8a88c387d8afa261a0c1f94/openff/toolkit/topology/topology.py#L1841-L1864
What is the best way to keep parity with this behaviour beyond version 0.11? Do I have to iterate through each atom in the off_mol and update the atom metadata field? Or is there a more elegant way? Thank you
If I understand correctly, you're trying to ensure the openmm.app.Topology has residues named according to the various Molecule.names? Mapping molecule names (which don't have standardized meanings) to residue names (which certainly do) is something of a transformation that arguably shouldn't have been done previously. I'm not sure if logic like below could be considered
if "residue_name" in atom.metadata:
atom_residue_name = atom.metadata["residue_name"]
else:
- atom_residue_name = "UNK"
+ atom_residue_name = "UNK" if atom.molecule.name is not None else atom.moleucle.name
Do I have to iterate through each atom ...
This is the only way I know how to do it (which isn't to say there isn't another way!) since residue information is stored on the atoms themselves. I previously found this to be less-than-ideal, but I forget if we discussed a different way of updating residue information.
The main author of this functionality (@j-wags) is out until December; I'd like his feedback before modifying any of this behavior. For now I think you're stuck iterating through the atoms, but in the future it might be smoother.
Thank you for the prompt reply Matt, this sounds good, it will be great if this can be improved in the future.
Thanks for your patience!
I think the documentation could be improved, as tracked in the linked issue. Two features I'd like to get public 👍/👎 on from Jeff when he returns are
- Setting residues (or other hierarchy information) from the
Topology/MoleculeAPIs - Falling back to
Molecule.name, if present and if atoms do not have residue names, in the above OpenMM conversion(s).
If either of these are good ideas, let's open separate tickets for them.
@mattwthompson is right - The only way to do it currently is to iterate over all atoms and update their metadata.
Setting residues (or other hierarchy information) from the Topology/Molecule APIs
👍 Good idea, I'd accept a PR with these changes
Falling back to Molecule.name, if present and if atoms do not have residue names, in the above OpenMM conversion(s).
👎 I agree that this was an unnecessary behavior change in 0.11, which I wish I'd caught. But now this is a behavior that users who DID promptly update to 0.11 (or who started using OFFTK since then) have built around for 16 months, and to revert it after all this time would effectively be a second behavior change. Considered as a feature request against the current behavior, the benefit of this change is ambiguous. Most molecule names are more than 3 characters, which leads to confusion when writing PDBs or other formats that expect 3-character resnames. And the automagic "sometimes we use the data from here, but other times we use the data from there" pattern is a common cause of user confusion/bugs in workflows that involve OFFTK. So overall I don't think I'd welcome this change, though maybe I'm overlooking something.