Provide a ChemistryFixer for molecule prep assistance
(I thought we had an issue for this already, but I don't see one.)
There are two types of assistance we want to provide for dealing with chemistry of input molecules:
- Sanity checks that the chemistry makes sense (e.g. see #61), which will be applied to all inputs, even those prepared by experts/with expert tools
- Chemistry preparation assistance to attempt to ensure chemistry is probably correct
See #61 for some related discussion, and also this comment and following.
Possible usage might look like:
```python
from openforcefield.typing.helpers import ChemistryFixer
fixed_topology = ChemistryFixer(topology)
...
```
Part of the reasoning here is that experts/workflows (e.g. Orion) will often want to be certain that molecules coming into ForceField are not being modified by ForceField, as they are already assumed to be correct. So the default preparation needs to be only sanity checks. But many users (e.g. see #61) need more than this, such as some attempt to take a molecule which may not be correct, or at least may not be completely specified (such as having missing bond orders), and "make it so"; hence the need for ChemistryFixer.
This sounds really useful. Should we make a list of things that would need to be included in the ChemistryFixer? I'm thinking at least:
- Charging
- Aromaticity perception
- Perceive bond orders?
Should it all be included in one step, or broken into methods, something like:
```python
fixer = ChemistryFixer(topology)
fixer.perceive_aromaticity()
fixer.assign_partial_charges()
new_topology = fixer.topology
```
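To make the step-wise option concrete, here is a minimal, self-contained sketch. Everything in it is hypothetical: `ToyAtom`, `ToyTopology` and `ChemistryFixerSketch` are stand-ins invented for illustration, not the toolkit's classes, and the charge assignment is a deliberately trivial placeholder rather than a real charge model. The point it illustrates is that the fixer works on a copy, so the caller's topology is never modified, and that it records which steps were applied.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyAtom:
    """Hypothetical stand-in for an atom; not a toolkit class."""
    element: str
    partial_charge: Optional[float] = None


@dataclass
class ToyTopology:
    """Hypothetical stand-in for a topology; not a toolkit class."""
    atoms: List[ToyAtom]
    bonds: List[Tuple[int, int]] = field(default_factory=list)
    total_charge: int = 0


class ChemistryFixerSketch:
    """Applies optional fixes to a private copy of the input topology."""

    def __init__(self, topology: ToyTopology):
        # Deep copy so the caller's object is left exactly as it came in.
        self.topology = deepcopy(topology)
        self.applied_steps: List[str] = []

    def assign_partial_charges(self) -> "ChemistryFixerSketch":
        # Placeholder only: spread the total charge evenly over all atoms.
        atoms = self.topology.atoms
        if atoms:
            share = self.topology.total_charge / len(atoms)
            for atom in atoms:
                atom.partial_charge = share
        self.applied_steps.append("assign_partial_charges")
        return self


# Ammonium as a toy input with no partial charges assigned.
original = ToyTopology(
    atoms=[ToyAtom("N")] + [ToyAtom("H") for _ in range(4)],
    bonds=[(0, 1), (0, 2), (0, 3), (0, 4)],
    total_charge=1,
)
fixer = ChemistryFixerSketch(original)
fixer.assign_partial_charges()
new_topology = fixer.topology
```

Returning `self` from each step would also allow chaining, and `applied_steps` gives users a record of what was changed, which matters if the fixes are meant to be opt-in.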
@j-wags @andrrizzi I suggest this be on the roadmap for the near future.
Basically, pharma partners will often come in with molecules well prepared and not want our tools interfering with them, but academics may be on the opposite extreme (wrong protonation states, nonsensical valence, etc.). We need to ensure we have clear paths both to (a) clean up chemistry BEFORE bringing it in when desired, and (b) to just proceed with exactly the input chemistry, depending on what users want.
This may be naive, but right now I'm thinking that these sanity checks could be encapsulated in the new Molecule and Topology classes. We already do some sanity checks there such as stereochemistry that could be extended.
We may want to split this into separate issues, one for each sanity check to be implemented.
@andrrizzi note we're separately dealing with "sanity checks" (https://github.com/openforcefield/openforcefield/issues/61) which should be INTERNAL to the molecule/topology classes, and "chemistry fixes" which should be OPTIONAL. Everyone should get sanity checks, but only some users will want "cleanup".
In other words, "sanity checks" are for warning users. "Cleanup" is to try to fix problematic chemistry, and is potentially a longer-term issue.
I see. Currently, the stereochemistry sanity checks can optionally be disabled with a keyword argument. We could use the same design for the cleanup; the only difference is that, if we want cleanup disabled by default, its keyword arguments would default to off rather than on.
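A rough sketch of what those opposite defaults could look like is below. The function `prepare_molecule`, the dict-based molecule, and both helper functions are assumptions made up for this example; none of them are existing toolkit API. Sanity checks are on by default and only report, while cleanup is off by default and is the only path that can return a modified molecule.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Molecule = Dict[str, list]  # hypothetical: {"elements": [...], "bonds": [(i, j), ...]}
Check = Callable[[Molecule], List[str]]
Fix = Callable[[Molecule], Molecule]


def prepare_molecule(
    molecule: Molecule,
    checks: Sequence[Check] = (),
    fixes: Sequence[Fix] = (),
    *,
    sanity_checks: bool = True,  # opt-out: everyone gets these by default
    cleanup: bool = False,       # opt-in: never modify input unless asked
) -> Tuple[Molecule, List[str]]:
    """Return the (possibly cleaned) molecule and any sanity-check messages."""
    messages: List[str] = []
    if sanity_checks:
        for check in checks:
            messages.extend(check(molecule))
    if cleanup:
        for fix in fixes:
            molecule = fix(molecule)
    return molecule, messages


def check_has_hydrogens(molecule: Molecule) -> List[str]:
    # Hypothetical check: report, but never change, the input.
    if "H" not in molecule["elements"]:
        return ["no hydrogens present; were protons stripped?"]
    return []


def drop_duplicate_bonds(molecule: Molecule) -> Molecule:
    # Hypothetical fix: returns a new dict rather than mutating the input.
    unique = sorted({tuple(sorted(bond)) for bond in molecule["bonds"]})
    return {**molecule, "bonds": unique}


water = {"elements": ["O", "H", "H"], "bonds": [(0, 1), (1, 0), (0, 2)]}
untouched, messages = prepare_molecule(
    water, [check_has_hydrogens], [drop_duplicate_bonds]
)
cleaned, _ = prepare_molecule(
    water, [check_has_hydrogens], [drop_duplicate_bonds], cleanup=True
)
```

With the defaults, the input object comes back unchanged; only the `cleanup=True` call returns a molecule with the duplicate bond removed.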
@davidlmobley : I think it's most critical for us to get a concrete idea of what specific input formats / sources users will be bringing molecules in from. The more we know about what kinds of data to expect, the better we can be prepared to fix it.
The referenced thread specifically highlighted two issues:
- The need for helpful sanity checks that return useful error messages about errors in chemistry and how to fix them
- A mechanism for fixing common problems: The one highlighted was water molecules that contain H-H bonds. I agree with @andrrizzi that we could have a mode (e.g. cleanup=True) that could clean up clear, unambiguous problems like this. The more cases we know about, the better.
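As an illustration of what an "unambiguous" cleanup for the water case might look like, here is a small sketch. It assumes a bare representation (a list of element symbols plus a list of bonded index pairs) and a hypothetical function name, `remove_water_hh_bonds`; it is not toolkit code. It only touches a connected group of exactly one O and two H where the oxygen is already bonded to both hydrogens, so removing the H-H bond cannot strand an atom.

```python
from typing import Dict, List, Set, Tuple


def remove_water_hh_bonds(
    elements: List[str], bonds: List[Tuple[int, int]]
) -> List[Tuple[int, int]]:
    """Return bonds with H-H bonds inside complete water molecules removed."""
    neighbors: Dict[int, Set[int]] = {i: set() for i in range(len(elements))}
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    water_atoms: Set[int] = set()
    seen: Set[int] = set()
    for start in range(len(elements)):
        if start in seen:
            continue
        # Collect the connected component containing `start`.
        component: Set[int] = set()
        stack = [start]
        while stack:
            atom = stack.pop()
            if atom in component:
                continue
            component.add(atom)
            stack.extend(neighbors[atom] - component)
        seen |= component

        # Only act on an isolated O/H/H group whose O already holds both H.
        if sorted(elements[a] for a in component) == ["H", "H", "O"]:
            oxygen = next(a for a in component if elements[a] == "O")
            if neighbors[oxygen] == component - {oxygen}:
                water_atoms |= component

    return [
        (i, j)
        for i, j in bonds
        if not (
            i in water_atoms
            and j in water_atoms
            and elements[i] == "H"
            and elements[j] == "H"
        )
    ]


# Water with a spurious H-H bond, as in the case from the thread.
fixed_bonds = remove_water_hh_bonds(["O", "H", "H"], [(0, 1), (0, 2), (1, 2)])
```

Anything that does not match the pattern exactly (molecular hydrogen, or a water missing one O-H bond) is returned unchanged, which keeps the fix on the conservative side.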
Other issues that will be common:
- Someone tries to bring in a PDB file (a) without CONECT entries, (b) with erroneous CONECT entries, (c) without protons at all, and/or (d) without CONECT entries and with connectivity that cannot be inferred from the coordinates in a way which makes chemical sense
- Someone tries to bring in a mol2 or SDF file with only a 2D geometry (rather than 3D), potentially without protons
- Someone brings in a mol2 file with incorrect bond orders
- Someone loads a file (mol2, sdf, PDB) with incorrect valence
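For the incorrect-valence case in the list above, a sanity check could be as simple as the sketch below. The function name, the input representation, and especially the `ASSUMED_MAX_VALENCE` table are assumptions for illustration: the limits are loose upper bounds chosen to tolerate common charged states (e.g. ammonium, hydronium), not a validated chemistry model, and elements missing from the table are skipped rather than guessed at.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed, simplified upper bounds on total bond order per element.
ASSUMED_MAX_VALENCE: Dict[str, int] = {"H": 1, "C": 4, "N": 4, "O": 3, "F": 1}


def find_valence_problems(
    elements: List[str], bonds: List[Tuple[int, int, float]]
) -> List[str]:
    """Return one message per atom whose total bond order exceeds the table."""
    totals: Dict[int, float] = defaultdict(float)
    for i, j, order in bonds:
        totals[i] += order
        totals[j] += order

    problems: List[str] = []
    for index, element in enumerate(elements):
        limit = ASSUMED_MAX_VALENCE.get(element)
        if limit is None:
            continue  # unknown element: no opinion rather than a false alarm
        if totals[index] > limit:
            problems.append(
                f"atom {index} ({element}) has total bond order "
                f"{totals[index]:g}, above the assumed limit of {limit}"
            )
    return problems


# Methane is fine; a hydrogen bridging two carbons is flagged.
methane_ok = find_valence_problems(
    ["C", "H", "H", "H", "H"],
    [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1)],
)
bridging_h = find_valence_problems(["C", "H", "C"], [(0, 1, 1), (1, 2, 1)])
```

A check like this only reports; it leaves the decision about how to repair the structure to the user or to an optional cleanup step.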
For the record, I see "Sanity checks" and a "Chemistry fixer" as two separate issues, so I think we want something like https://github.com/openforcefield/openforcefield/issues/61 implemented eventually as well.