`Constraints` collection is not treated when writing GROMACS topology
Description
The issue is in a way related to #1166. At first, I was creating openmm System object with constraints set to HBonds. In the resulting lipid topology, I got not bonds to hydrogens and no constraints section. All those hydrogens just flew away from the lipid in the MD run. This is related to what was discussed in #1005 and #1008, but I am happy to take some action for GROMACS API.
I believe, this is because Constraints collection is not treated at all.
I think there are several solutions, which I can work on. They work on different levels, and it would be great to get your suggestion of what is an optimal choice, before implementing something.
- First solution, is reading in constraints on bonds to hydrogens as bonds. This will require checking the masses or chemical element of an atom involved in the constraint. The problem here, that some force constant (and distance) should be assigned to the bond and this is not a straightforward thing to implement, maybe.
- Other option, is writing bonds to hydrogens as bonds, when exporting interchange with GROMACS. At this level, the problem with bond parameter assignment is still present, but can be at least partially solved by (a) letting the user to define those parameters, (b) informing the user, that bonds to hydrogen must be constrained upstream (in the mdp used for runs)
- Maybe the easiest choice is to explicitly write down the constraint into the GROMACS topology. This is maybe my preferred option, though it also comes with some downsides (I believe that constraints in GROMACS can only be perturbed on CPUs currently). So let me know if you would like to try out some solutions, and I am happy to help with it.
Reproduction
from openmm.app import Modeller, PME, HBonds, ForceField, PDBFile
from openmm import NonbondedForce
from openmm.unit import nanometer
from openff.interchange import Interchange
for lipid, name in zip(["popc", "pope", "dppc"], ["POP", "POP", "DPP"]):
print(lipid)
pdb = PDBFile("memb_protein_trimer.pdb")
m = Modeller(pdb.topology, pdb.positions)
ff = ForceField("amber14-all.xml", "amber14/tip3pfb.xml")
m.addHydrogens(ff)
attempts_left = 5
while attempts_left > 0:
try:
m.addMembrane(ff, lipidType=lipid.upper())
break
except Exception as e:
print(e)
attempts_left -= 1
print(len(list(m.topology.residues())))
names = []
for i, r in enumerate(m.topology.residues()):
if r.name not in names:
names.append(r.name)
if r.name == name:
print(i)
found = i
break
print(names)
m.delete(list(m.topology.residues())[:found])
m.delete(list(m.topology.residues())[1:])
system = ff.createSystem(
m.topology,
nonbondedMethod=PME,
nonbondedCutoff=1.0 * nanometer,
constraints = HBonds,
)
PDBFile.writeFile(
m.topology, m.positions, open(f"test_memb_{lipid}.pdb", "w")
)
for force in system.getForces():
if isinstance(force, NonbondedForce):
n_p = force.getNumParticles()
print(n_p)
charges = [
force.getParticleParameters(i)[0]._value for i in range(n_p)
]
print(charges)
print(sum(charges))
i = Interchange.from_openmm(
system, m.topology, m.positions, system.getDefaultPeriodicBoxVectors()
)
off_ch = [
x.parameters["charge"].m
for x in i.collections["Electrostatics"].potentials.values()
]
print(off_ch)
print(sum(off_ch))
i.to_gromacs(lipid, _merge_atom_types=True)
break
# i.to_pdb(f"{lipid}.pdb")
# i.topology.molecule(0).to_file(f"{lipid}.sdf")
Several things going on here, I want to make sure I'm not missing any facts before we think about next steps forward
I believe, this is because
Constraintscollection is not treated at all.
If I understand GROMACS files correctly, constraints like the <Constraint smirks="[#1:1]-[*:2]" id="c1"></Constraint> parameter in Sage are usually treated as runtime options and live in the .mdp files instead of .top files. There's a special case for water and some confusing options for free energy calculations but I'm not sure this section is meant to be used for "hydrogen bond" constraints between hydrogens and their bonded heavy atoms. I looked through $GMXDATA/top for some examples but only found special cases, like rigid methyl and ammonium groups, which doesn't seem to be what we want here. Maybe this is as simple as it looks and we can guess that func 1 does what we need.
As you've figured out by now, problem usually lies in what input data is provided. The cases blur together in my memory, but some that come to mind which are related include
- Being given constrained bonds but not being given the distance at which they're meant to be constrained
- Ambiguity around if constrained (topological) bonds should have associated (physical) bonds, i.e. harmonic bond parameters
- Needing to specify force constants for constraints, even though constraints by definition do not use harmonic potentials
I'm not sure which is the root cause in your script without being able to run it, but I bet at least one of this list is affecting you here.
Before looking more closely at what does happen, I think what should happen is
- If constrained hydrogens are not (topologically) bonded when importing from external sources, something should throw an exceptionally loud error somewhere in the pipeline. It wouldn't make sense to have an ethanol molecule with six hydrogens and two bonds so this state should be considered invalid.
- If bonds are (topologically) constrained and don't have (physical) interactions, either
a. it should be okay to write GROMACS files[^1] but the user should either be warned or need to opt-in to this case
b. a
[ constraints ]section[^2] should be written within each[ moleculetype ], where appropriate, erroring when equilibrium bond forces are not known - Filling in force constants with fake values should only happen when absolutely necessary, i.e. the linked case in which this (obviously bad) behavior is accepted practice in Amber tooling
[^1]: I'm pretty sure this is valid, just writing out lines within a the [ bonds ] section that list the bonded atom indices but don't list the equilibrium bond length and force constant
[^2]: Again, assuming this is the correct usage of this directive
and some confusing options for free energy calculations but I'm not sure this section is meant to be used for "hydrogen bond" constraints between hydrogens and their bonded heavy atoms.
This is something that I can definitely test, but if I recall correctly, any constraints can go there and by default they all should be treated with LINKS (in principle there is no difference between bonds to hydrogens and other bond types). In .mdp you indeed can say that you want to treat all the bonds to hydrogens (or all bonds) as constraints, but there is no reason, why you can't add hand-crafted constraints.
Being given constrained bonds but not being given the distance at which they're meant to be constrained
I believe in this case, an error should be raised, when Collection is generated.
I'm not sure which is the root cause in your script without being able to run it, but I bet at least one of this list is affecting you here.
Initially hydrogens are topologically bonded, but when system is created:
system = ff.createSystem(
m.topology,
nonbondedMethod=PME,
nonbondedCutoff=1.0 * nanometer,
constraints = HBonds,
)
Bonds to hydrogens are removed from Forces in openMM and are stored as Constraints. This is why, when interchange is created:
i = Interchange.from_openmm(
system, m.topology, m.positions, system.getDefaultPeriodicBoxVectors()
)
bonds to hydrogens end up in the constraints collection.
If constrained hydrogens are not (topologically) bonded when importing from external sources, something should throw an exceptionally loud error somewhere in the pipeline. It wouldn't make sense to have an ethanol molecule with six hydrogens and two bonds so this state should be considered invalid.
I am not sure that in my case, I want an error to be raised. Yes, we don't see any bonds in the original OpenMM system, but in the topology we should see them.
I checked, number of bonds in the topology (133) and number of harmonic bond force in the system (51):
# N bonds in topology
print(m.topology.getNumBonds()) # 133
# N bonded forces in system
for force in system.getForces():
if isinstance(force, HarmonicBondForce):
n_b = force.getNumBonds()
print(n_b) # 51
The obtained values, are ones, that I expect - the difference is exactly the number of hydrogens in a single lipid. So I consider those topologically bonded, but forces not present, since hydrogens are constrained. So I assume, that I am in your case 2. And I think that the proposed solution should be:
- Check if constraints are present as bonds in the topology. If not - raise an error.
I believe, this should only apply to bonds to hydrogens.
- Write
constraintssection for GROMACS
I can test if it works with bonds to hydrogens
Okay, I think writing [ constraints ] in the GROMACS files is a good path to explore. This would be a fairly significant behavior change - compared to the current recommendation of just preparing the .mdp file in a particular manner - but I think it could be worth it.
Independently of that change, could we work around your original issue by
-
not passing constraints through to OpenMM
.createSystem(..., constraints=None) - using
constraints = h-bondsin the.mdpfile?
I figure if you're not actually running the simulations in OpenMM, it's more important to preserve all information in the OpenMM step and optimize for GROMACS runtime.
I wrote up a more discrete feature request here: https://github.com/openforcefield/openff-interchange/issues/1180
Independently of that change, could we work around your original issue by
- not passing constraints through to OpenMM .createSystem(..., constraints=None)
- using constraints = h-bonds in the .mdp file?
This is exactly what I ended up doing (see #1166), but I can imagine a usecase, where one would run some MD in OpenMM with explicit constraints, and then would want to use the exact same system to run a different simulation in GROMACS (the example here can be running simple MD first, and then running some protocol that is readily available in GROMACS e.g. AWH, but not in OpenMM). In this case, the error I encountered can occur.