`_simple_topology_from_openmm` fails with `RecursionError` on large topologies
I am trying the experimental Interchange.from_openmm() feature and I noticed that topology creation for multi-chain proteins would fail withRecursionError: maximum recursion depth exceeded:
# using this two chain protein: https://files.rcsb.org/view/8U7W.pdb
from openmm.app import PDBFile
# this is called by the Interchange.from_openmm method
from openff.interchange.components.toolkit import _simple_topology_from_openmm
pdb = PDBFile("8u7w.pdb")
_simple_topology_from_openmm(pdb.topology)
The pdb above contains waters and ligand, but even after stripping them the same error manifests, so long as multiple protein chains are present.
Tracing down the errors, it ultimately is raised at _add_molecule_keep_cache function in openff.toolkit. Let me know if I should make the issue there instead.
Thank you.
This is a totally valid failure; I've run into it many times before but apparently never documented it. I haven't been able to figure out if the issue is here or in the toolkit, but until I can isolate the root cause let's keep it here.
Thanks, let me know how it goes. Just wondering in the meanwhile do you have any recommendations on converting such openmm systems into e.g. prmtops instead of via interchange?
I have some ideas - could we redirect to https://github.com/orgs/openforcefield/discussions/35?
I've found what I believe to be the root cause of this issue - Topology.add_molecule relies on deepcopying. Molecule.__deepcopy__ has custom behavior, but _SimpleMoleucle does not, and instead falls back to the extremely careful copy.deepcopy. This will require significant overhauls to the toolkit's behavior, so I'll work on it over there but provide updates here when appropriate.
This might be fixed by https://github.com/openforcefield/openff-toolkit/pull/1805
Hey @hjuinj - think we've fixed this!
In [1]: # using this two chain protein: https://files.rcsb.org/view/8U7W.pdb
...: !wget https://files.rcsb.org/view/8U7W.pdb
...:
...: from openmm.app import PDBFile
...:
...: # this is called by the Interchange.from_openmm method
...: from openff.interchange.components.toolkit import _simple_topology_from_openmm
...:
...: pdb = PDBFile("8u7w.pdb")
--2024-04-04 13:02:42-- https://files.rcsb.org/view/8U7W.pdb
Resolving files.rcsb.org (files.rcsb.org)... 128.6.159.100
Connecting to files.rcsb.org (files.rcsb.org)|128.6.159.100|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: ‘8U7W.pdb’
8U7W.pdb [ <=> ] 738.18K 2.62MB/s in 0.3s
2024-04-04 13:02:42 (2.62 MB/s) - ‘8U7W.pdb’ saved [755892]
In [2]: %%timeit
...: _simple_topology_from_openmm(pdb.topology)
...:
...:
958 ms ± 9.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Sorry for the delay - but thanks for the incredibly straightforward MWE, this allowed me to almost instantly drop in and retry this.
thanks so much @mattwthompson, how do I pull in this fix? do I just need to upgrade to the latest openff-toolkit release? or also need to upgrade interchange too?
I think upgrading to toolkit 0.15.0+ would do it, but upgrading Interchange (0.3.25) might bring in bugfixes or useful new features depending on what you're doing
https://docs.openforcefield.org/projects/toolkit/en/stable/releasehistory.html#id3 https://docs.openforcefield.org/projects/interchange/en/stable/releasehistory.html#current-development
@mattwthompson
Thanks again, I really would like to use this as a way to create an Interchange object from OpenMM, but for me this effectively halts:
from openmm.app import *
from openmm import *
from openmm.unit import *
import os
os.environ["INTERCHANGE_EXPERIMENTAL"] = "1"
from sys import stdout
pdb = PDBFile('tmp.pdb') # from openmm example https://raw.githubusercontent.com/openmm/openmm/master/examples/input.pdb
forcefield = ForceField('amber14-all.xml', 'amber14/tip3pfb.xml')
system = forcefield.createSystem(pdb.topology, nonbondedMethod=PME, nonbondedCutoff=1*nanometer, constraints=HBonds)
# Create an Interchange object
out = Interchange.from_openmm(
topology=pdb.topology,
system=system,
positions=pdb.positions,
box_vectors=pdb.topology.getPeriodicBoxVectors(),
)
Am I not using it correctly or is there still some issues?
I'm not sure what's causing your code to hang - a modified version below takes me only a few seconds to run. Maybe could you run this script and share the results?
https://github.com/openforcefield/status/blob/main/devtools/support/debug.sh
In [1]: !wget https://raw.githubusercontent.com/openmm/openmm/master/examples/input.pdb
--2024-04-10 15:22:10-- https://raw.githubusercontent.com/openmm/openmm/master/examples/input.pdb
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 594292 (580K) [text/plain]
Saving to: ‘input.pdb’
input.pdb 100%[===========================================================================>] 580.36K --.-KB/s in 0.09s
2024-04-10 15:22:10 (6.21 MB/s) - ‘input.pdb’ saved [594292/594292]
In [2]: import openmm.app
...: import openmm.unit
...:
...: import os
...:
...: from openff.interchange import Interchange
...:
...: os.environ["INTERCHANGE_EXPERIMENTAL"] = "1"
...:
...: pdb_file = openmm.app.PDBFile("input.pdb")
...: system = openmm.app.ForceField("amber14-all.xml", "amber14/tip3pfb.xml").createSystem(
...: pdb_file.topology,
...: nonbondedMethod=openmm.app.PME,
...: nonbondedCutoff=1 * openmm.unit.nanometer,
...: constraints=openmm.app.HBonds,
...: )
...:
...: out = Interchange.from_openmm(
...: topology=pdb_file.topology,
...: system=system,
...: positions=pdb_file.positions,
...: box_vectors=pdb_file.topology.getPeriodicBoxVectors(),
...: )
In [3]: %%timeit
...: Interchange.from_openmm(
...: topology=pdb_file.topology,
...: system=system,
...: positions=pdb_file.positions,
...: box_vectors=pdb_file.topology.getPeriodicBoxVectors(),
...: )
...:
5.51 s ± 172 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
ah @mattwthompson I am so sorry, I mistakenly thought the interchange from openmm conversion is the time blocker, but in fact, is the subsequent file coversion that I didn't include in my above code that grinds the program to a halt, specifically:
out.to_prmtop("./tmp.prmtop")
Converting to e.g. amber format seems to be problematic with the halting and if I try instead out.to_top("tmp.top") I get
MissingBondError: Failed to find parameters for bond with topology indices (0, 1)
In that case the problem is just that the OpenMM object is missing bond parameters that the GROMACS (and Amber) files need. Because constraints=openmm.app.HBonds was used, OpenMM chooses not to populate bond parameters of constrained atom pairs. Taking that argument out (and ensuring later GROMACS/Amber/etc. files do use whatever constraints are sought!) might work.
Something similar has been reported with a different use case (https://github.com/openforcefield/openff-toolkit/issues/1824) and in general this is similar to problems we have with missing information in water models (https://github.com/openforcefield/openff-interchange/issues/732 https://github.com/openforcefield/openff-interchange/issues/843).
The Amber writer might just be slow for large systems, that's something I unfortunately haven't had the time to work on optimizing. For a few thousand atoms it should work, if slowly
Depending on the protein force field you want to use (and if you have other components) you may be better off using our port of ff14SB? We have an example using it with an OpenFF ligand in case you haven't come across this: https://docs.openforcefield.org/en/latest/examples/openforcefield/openff-interchange/protein_ligand/protein_ligand.html#protein-component
Thanks @mattwthompson, the above code snippet passes the protein parametersation when I do constraints=None, however, it subsequently fails because water bonds are not tracked. Is there a straightforward way for me to turn the OpenMM example input.pdb above which contains protein + water into an .gro and .top file via interchange?
Not right now, I don't think, if the input has missing bonds. I think it's probably worth making a special case out of this, adding in bonds when something really looks like water. I'll take mess around with a stripped-down test case and update here if I run into any surprises. Thanks for your patience and feedback!
With #965 (soon to be in 0.3.26), I can load that PDB file through. It makes some guesses about what the parameters should be and the bond and angle parameters are pretty arbitrary, so it might take some after-running-a-script editing of the GROMACS files or other hacking for the water model to be accurately represented.
There are still some small energy differences - could be due to water bonds being considered where they shouldn't be, not sure. Please be careful using this code path since there's a lot of sharp edges, still.
import openmm.app
import openmm.unit
import os
from openff.interchange import Interchange
from openff.interchange.drivers import get_openmm_energies, get_gromacs_energies
os.environ["INTERCHANGE_EXPERIMENTAL"] = "1"
pdb_file = openmm.app.PDBFile("input.pdb")
system = openmm.app.ForceField("amber14-all.xml", "amber14/tip3pfb.xml").createSystem(
pdb_file.topology,
nonbondedMethod=openmm.app.PME,
nonbondedCutoff=1 * openmm.unit.nanometer,
constraints=None,
)
out = Interchange.from_openmm(
topology=pdb_file.topology,
system=system,
positions=pdb_file.positions,
box_vectors=pdb_file.topology.getPeriodicBoxVectors(),
)
print(get_gromacs_energies(out))
print(get_openmm_energies(out))
Bond: 535.8565673828125 kilojoule / mole
Angle: 1261.9296875 kilojoule / mole
Torsion: 1812.1295166015625 kilojoule / mole
RBTorsion: 0.0 kilojoule / mole
Nonbonded: None
vdW: 20140.49871826172 kilojoule / mole
Electrostatics: -138738.99938964844 kilojoule / mole
Energies:
Bond: 542.2653182463949 kilojoule / mole
Angle: 1261.687059590436 kilojoule / mole
Torsion: 1896.524260454296 kilojoule / mole
RBTorsion: None
Nonbonded: -118597.39807930728 kilojoule / mole
vdW: None
Electrostatics: None
I think the original issue has been resolved, please feel free to bring other issues to our attention as they arise/persist