Issues with Gromacs Input Files
Hello,
I'm running BSS version 2022.1.0, 8.g3f6b1c18 on linux, installed using conda. I'm attempting to run a short simulation with the gromacs input files supplied here. All relevant inputs/ notebooks are here.
I've run into a few issues:
-
Failure of getSystem(block=True). During equilibration steps (see bdr4_equilibration.ipynb) this command sometimes fails the first time it is run. However, it runs correctly the second time (without repeating the simulation). The system was also unexpectedly unstable (LINCS warnings) and exploded during "PMEMD NPT equilibration for 2ns without restraints" (Fatal error: 3 particles communicated to PME rank 1 are more than 2/3 times the cut-off out of the domain decomposition cell of their charge group in dimension x. This usually means that your system is not well equilibrated.) . I had similar issues when using sander. The input was the ligand 1 complex for brd4.
-
Single point energy calculation works with Gromacs but fails with Amber (see gh_issue.ipynb). Error reads: "EXTRA POINTS: nnb too small!". Input was the ligand 1 complex for jnk1.
-
BSS fails to read Amber inputs generated with ParmEd (see bottom of gh_issue.ipynb). I converted the jnk1 input (ligand 1 complex) from gro/top to parm7/rst7 according to the discussion here. However, upon attempting to reload the parm7/rst files, I get
[OSError: Failed to read molecules from ['jnk1_system.parm7', 'jnk1_system.rst7']. It looks like you failed to include a topology file.]()
Thanks very much.
Hi there,
For part of question 1): I'm not sure why this happens but I've recently fixed this (last) week by simply re-trying the trjconv command if it happens to fail the first time. (Weirdly the failures only ever happen within getSystem() when calling a subprocesses, i.e. using the exact same commands on the command-line works every time!).
I should be able to check the other issues at some point tomorrow.
Cheers.
Hi,
Thanks for the quick response. Great - I'll update to the latest version.
Cheers
Out of interest, is there are reason why you need to use ParmEd to convert the GROMACS files, rather than using BioSImSpace directly, i.e.:
import BioSImSpace as BSS
# Load GROMACS system.
system = BSS.IO.readMolecules(["jnk1_complex.gro", "jnk1_complex.top"])
# Write to AMBER format.
BSS.IO.saveMolecules("test", system, ["prm7", "rst7"])
(This is what would be done behind the scenes if you loaded GROMACS files and tried to run a simulation with AMBER.)
This works, but the resulting topology file looks quite different to the one generated by ParmEd. A single-point calculation would test whether the information is self-consistent, despite the differences.
The original issue I was having was instability during equilibration using sander through BSS and @jmichel80 mentioned that there had been issues previously with file conversion. To rule this out, I wanted to repeat the file conversion using ParmEd and re-run the equilibration to see if I got the same issues. Unfortunately I've not been able to run a single-point calculation using the BSS-converted files (see first part of see gh_issue.ipynb), or to load in the files produced by ParmEd to run a single-point calculation on those.
Hi again,
I'm really not sure what's going on with those GROMACS files provided in the repository you linked to. Out of interest, I looked at their example notebook which shows how the files are generated. If I reproduce their code, but instead write to AMBER format files rather than GROMACS, then everything works, i.e. I can load back the files with BioSimSpace and I get the same results when performing single-point energy calculations with AMBER and GROMACS.
For reference, they are just parameterising the protein using FF14SB and "openff_unconstrained-2.0.0" for the ligand. It isn't possible to do this directly in BioSimSpace since the protein PDB contains missing atom types and the SDF file is required in order to provide the required stereochemistry information for the ligand.
Here's my script. (You'll just need to clone the abfe-benchmark repo to your working directory.)
import BioSimSpace as BSS
import parmed
from openff.toolkit.topology import Molecule
from openff.toolkit.typing.engines.smirnoff import ForceField as OFF_ForceField
try:
from openmm import XmlSerializer, app, unit
from openmm.app import HBonds, NoCutoff, PDBFile, ForceField
except ImportError:
from simtk import unit
from simtk.openmm import XmlSerializer, app
from simtk.openmm.app import HBonds, NoCutoff, PDBFile
# Load in the ligand.
ligand_molecule = Molecule("abfe-benchmark/structures/jnk1/ligand1.sdf")
# Specify the "Sage" forcefield.
force_field = OFF_ForceField("openff_unconstrained-2.0.0.offxml")
# Parametrize the ligand molecule by creating a Topology object from it.
ligand_system = force_field.create_openmm_system(ligand_molecule.to_topology())
# Read in the coordinates of the ligand from the PDB file.
ligand_pdbfile = PDBFile("abfe-benchmark/structures/jnk1/ligand1.pdb")
# Convert the ligand system to a ParmEd object.
ligand_parmed_structure = parmed.openmm.load_topology(ligand_pdbfile.topology,
ligand_system,
ligand_pdbfile.positions)
# Write the ligand structure to AMBER format.
ligand_parmed_structure.save("ligand1.rst7", overwrite=True)
ligand_parmed_structure.save("ligand1.parm7", overwrite=True)
# Parse the protein PDB file.
protein_pdbfile = PDBFile("abfe-benchmark/structures/jnk1/protein.pdb")
# Load the AMBER protein force field through OpenMM.
omm_forcefield = app.ForceField("amber14/protein.ff14SB.xml", "amber14/tip3p.xml")
# Parameterize the protein.
protein_system = omm_forcefield.createSystem(protein_pdbfile.topology,
nonbondedCutoff=1*unit.nanometer,
nonbondedMethod=app.NoCutoff,
constraints=None,
rigidWater=False)
# Convert the protein System into a ParmEd Structure.
protein_parmed_structure = parmed.openmm.load_topology(
protein_pdbfile.topology,
protein_system,
xyz=protein_pdbfile.positions)
# Write the protein structure to AMBER format.
protein_parmed_structure.save("protein.rst7", overwrite=True)
protein_parmed_structure.save("protein.parm7", overwrite=True)
# Load the ligand and protein with BioSimSpace.
ligand = BSS.IO.readMolecules("ligand1.*7")[0]
protein = BSS.IO.readMolecules("protein.*7")[0]
# Combine to form a complex.
complx = (ligand + protein).toSystem()
# Create a single-step minimisation protocol.
protocol = BSS.Protocol.Minimisation(steps=1)
# Create processes for AMBER and GROMACS simulations.
process_amb = BSS.Process.Amber(complx, protocol)
process_gmx = BSS.Process.Gromacs(complx, protocol)
# Modify the GROMACS config to perform zero steps.
config = process_gmx.getConfig()
config[1] = "nsteps = 0"
process_gmx.setConfig(config)
# Run the processes.
process_amb.start()
process_amb.wait()
process_gmx.start()
process_gmx.wait()
# Compare energies. (Where direct comparisons can be made.)
print(f"Bond energies... AMBER = {process_amb.getBondEnergy()}, GROMACS = {process_gmx.getBondEnergy()}")
print(f"Angle energies... AMBER = {process_amb.getAngleEnergy()}, GROMACS = {process_gmx.getAngleEnergy()}")
print(f"Dihedral energies... AMBER = {process_amb.getDihedralEnergy()}, GROMACS = {process_gmx.getDihedralEnergy()}")
This outputs:
Bond energies... AMBER = 202.0904 kcal/mol, GROMACS = 202.0662 kcal/mol
Angle energies... AMBER = 1111.9015 kcal/mol, GROMACS = 1111.8045 kcal/mol
Dihedral energies... AMBER = 4336.2379 kcal/mol, GROMACS = 4315.6788 kcal/mol
I'm not yet sure if the issue with the GROMACS files is our ability to read them, or whether they were invalid in the first place. I'll try to reproduce the above writing to GROMACS format, i.e. using the output of the script, rather what was uploaded to the benchmark repository.
Cheers.
I've just re-run the above script, instead using ParmEd to write to GROMACS format. As you've found, on conversion to AMBER format within BioSimSpace, any AMBER simulation fails with the EXTRA POINTS: nnb too small!" message. While the GROMACS simulations work, the single point energies are different (but close) to those that I obtained above using the the AMBER files written by ParmEd. The output was as follows:
Bond energies... AMBER = None, GROMACS = 241.7304 kcal/mol
Angle energies... AMBER = None, GROMACS = 1124.9594 kcal/mol
Dihedral energies... AMBER = None, GROMACS = 4337.7390 kcal/mol
Given the inconsistencies, I'm not sure whether the issue is with ParmEd or us.
That's very helpful @lohedges ! @fjclark could you check whether the AMBER prm files produced by @lohedges 's script work with SOMD ? We can feedback to Bayer next week the above issues.
Thanks very much @lohedges. @jmichel80 will do.
@jmichel80 I can confirm that the AMBER files produced by @lohedges's script work with SOMD. However, test alchemical simulations crash when I use the same config file I'm using for my current work, giving "NaN or Inf has been generated along the simulation" during minimisation. This turned out to be due to the use of "constraint = allbonds". Do you have any suggestions as to why this is happening, given that the system appeared well equilibrated (I have used an equilibration scheme almost identical to the one which produced input which works with this config file).
All relevant input is here. I have:
-Produced AMBER format files for brd4, lig1 complex using @lohedges 's script (this failed at first due to this issue - despite having compatible versions of ParmEd and OpenMM in my environment, an incompatible version of ParmEd was bundled with AmberTools, which worked after manually modifying decorators.py).
-Equilibrated using pmemd and pmemd.cuda - see input.
-Run a short production run using SOMD through BSS - successful
-Run single alchemical windows using my standard config file with (failure) and without (success) constraint set to "allbonds".
Thanks
This is odd, constraints are disabled before energy minimisation. See https://github.com/michellab/Sire/blob/devel/wrapper/Tools/OpenMMMD.py#L1507-L1520
Also your single point energy before minimisation is identical before minimisation between the two runs, so I expect the same input was passed to integrator.minimiseEnergy()
Is the error reproducible ?
You could try disabling minimisation, that step may not be necessary for an ABFE run started from a structure equilibrated at discharge = 0.0
I've repeated the calculations, only adding or removing "set_constraints = allbonds" to the config. Adding this consistently causes the simulations to crash during minimisation. I'm running my modified version of SOMD with Boresch restraints, but I've repeated this with the latest version of sire as contained within BSS-dev - I observe the same thing.
When disabling minimisation, I get:
2 NaN or Inf has been generated along the simulation
3 Starting /home/finlayclark/sire_boresch_2022.app/bin/somd-freenrg: number of threads equals 20
4 srun: error: ishikawa: task 0: Exited with exit code 255
Very similar to with minimisation, only no output related to setting up the system is displayed.
Out of interest, do the original GROMACS files work with SOMD? (For example, if you just perform minimisation and equilibration with GROMACS rather than PMEMD.) It would be interesting to know if you experience the same constraint related crashes with those, too.
Could you setup from scratch the input files with BioSimSpace (starting from pdb files extracted from the pmemd equilibration and solvating in BioSimSpace) to verify if SOMD runs without errors in that case ? If so comparing the prm7 files should indicate a discrepancy in the handling of bonded terms.
@lohedges I'm not able to equilibrate the original GROMACS files in BSS using GROMACS - I get failures during my final NPT equilibration. Based on this, it seems likely that the issue is with the original files.
Could you setup from scratch the input files with BioSimSpace (starting from pdb files extracted from the pmemd equilibration and solvating in BioSimSpace.
I tried this myself, but the original PDB files provided in the repository contained unknown atom types, so parametrisation with ff14sb failed. (Perhaps it would work with PDBs extracted from the AMBER files, but that would rely on the AMBER files being correct in the first place, which we're currently unsure about.) I also couldn't parameterise the ligand within BioSimSpace since the SDF was required for stereochemistry. However, the parametrisation of the ligand should be identical in either case. (Same sequence of commands used.) The protein might differ though, since they are using OpenMM to parameterise, rather than AmberTools.
I'm not able to equilibrate the original GROMACS files in BSS using GROMACS - I get failures during my final NPT equilibration. Based on this, it seems likely that the issue is with the original files.
Thanks for confirming.
Cheers.
@jmichel80, it looks like constraints aren't disabled before energy minimisation in runFreeNrg(), only run() https://github.com/michellab/Sire/blob/6ff6b04d56a3b76fb18f7ba2317557d15cd41cb6/wrapper/Tools/OpenMMMD.py#L1686-L1695 .
However, even when I modify the my version of Sire to disable constraints before minimisation in runFreeNrg(), I still get the same failure during minimisation:
CONSTRAINT OFF
Starting /home/finlayclark/sire_no_constraint_minim.app/bin/somd-freenrg: number of threads equals 20
Traceback (most recent call last):
File "/home/finlayclark/sire_no_constraint_minim.app/share/Sire/scripts/somd-freenrg.py", line 146, in <module>
OpenMMMD.runFreeNrg(params)
File "/home/finlayclark/sire_no_constraint_minim.app/lib/python3.7/site-packages/Sire/Tools/__init__.py", line 176, in inner
retval = func()
File "/home/finlayclark/sire_no_constraint_minim.app/lib/python3.7/site-packages/Sire/Tools/OpenMMMD.py", line 1696, in runFreeNrg
system = integrator.minimiseEnergy(system, minimise_tol.val, minimise_max_iter.val)
RuntimeError: Particle coordinate is nan
srun: error: ishikawa: task 0: Exited with exit code 1
To follow up on this. The code to disable constraints in run() or runFreeNrg() is buggy. Changing constraint parameters via e.g. integrator.setConstraintType("none") does not cause the openMM system to be reinitialised because Integrator_OpenMM.initialise() is not subsequently called so no effect.
It is likely that the crash during energy minimisation is due to the an incompatibility between the way bonded parameters are defined in the parameter file "openff_unconstrained-2.0.0.offxml" here
and code to setup constraints/bonded terms for the solute in SOMD
https://github.com/michellab/Sire/blob/devel/corelib/src/libs/SireMove/openmmfrenergyst.cpp#L1802-L1814 vs https://github.com/michellab/Sire/blob/devel/corelib/src/libs/SireMove/openmmfrenergyst.cpp#L1868-L1905
Hello @lohedges ,
As Bayer does not have AMBER, I've been asked to check the consistency between revised input files: brd4_inputs.tar.gz
The AMBER inputs produce stable MD with sander (10 ps) and pmemd.cuda (1 ns) for both the complexes. The GROMACS input for the ligand 1 complex also seems to produce stable MD (currently at 0.5 ns).
However, when I attempt to load any of the GROMACS inputs into BSS (2022.1.0+5.g1a606e31, but I've also tried with 2022.2.1), I get the following error:
---------------------------------------------------------------------------
UserWarning Traceback (most recent call last)
File ~/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/BioSimSpace/IO/_io.py:377, in readMolecules(files, property_map)
376 try:
--> 377 system = _SireIO.MoleculeParser.read(files, property_map)
378 except Exception as e:
UserWarning: Exception 'SireMol::missing_atom' thrown by the thread 'master'.
There is no atom with the number "2" in the layout "{00000000-0000-0000-0000-000000000000}".
Thrown from FILE: /usr/share/miniconda/envs/sire_build/conda-bld/sire_1645533092052/work/corelib/src/libs/SireMol/moleculeinfodata.cpp, LINE: 3320, FUNCTION: virtual QList<SireMol::AtomIdx> SireMol::MoleculeInfoData::map(SireMol::AtomNum) const
__Backtrace__
( 0) /home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Qt/../../../.././libSireError.so.2022 ([0x7fd2a6e1ae42] ++0x42)
-- SireError::getBackTrace()
( 1) /home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Qt/../../../.././libSireError.so.2022 ([0x7fd2a6e18dbe] ++0xae)
-- SireError::exception::exception(QString, QString)
/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Mol/../../../../libSireMol.so.2022(+0x199c8e) [0x7fd2603e1c8e]
/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Mol/../../../../libSireMol.so.2022(+0x145104) [0x7fd26038d104]
( 4) /home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Mol/../../../../libSireMol.so.2022 ([0x7fd2603f89cc] ++0x3c)
-- SireMol::AtomNum::map(SireMol::MolInfo const&) const
( 5) /home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Mol/../../../../libSireMol.so.2022 ([0x7fd260559851] ++0x31)
-- SireMol::MoleculeInfoData::atomIdx(SireMol::AtomID const&) const
( 6) /home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Mol/../../../../libSireMol.so.2022 ([0x7fd260552706] ++0x26)
-- SireMol::MoleculeInfo::atomIdx(SireMol::AtomID const&) const
( 7) /home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/IO/../../../../libSireIO.so.2022 ([0x7fd1fabd6ce7] ++0x247)
-- SireIO::GroTop::getBondProperties(SireMol::MoleculeInfo const&, SireIO::GroMolType const&) const
/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/IO/../../../../libSireIO.so.2022(+0x2205cb) [0x7fd1fabd75cb]
/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/IO/../../../../libSireIO.so.2022(+0x21f262) [0x7fd1fabd6262]
/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Base/../../../.././libtbb.so.12(+0x16092) [0x7fd29fb1c092]
( 11) /home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/IO/../../../../libSireIO.so.2022 ([0x7fd1fabd8156] ++0xb66)
-- SireIO::GroTop::createMolecule(QString, QStringList&, SireBase::PropertyMap const&) const
/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/IO/../../../../libSireIO.so.2022(+0x221373) [0x7fd1fabd8373]
/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/IO/../../../../libSireIO.so.2022(+0x22178d) [0x7fd1fabd878d]
/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Base/../../../.././libtbb.so.12(+0x23302) [0x7fd29fb29302]
/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Base/../../../.././libtbb.so.12(+0x1e794) [0x7fd29fb24794]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7fd2cc2d6609]
( 17) /lib/x86_64-linux-gnu/libc.so.6 ([0x7fd2cc095163] ++0x43)
-- clone
__EndTrace__
Exception 'SireMol::missing_atom' thrown by the thread 'master'.
There is no atom with the number "2" in the layout "{00000000-0000-0000-0000-000000000000}".
Thrown from FILE: /usr/share/miniconda/envs/sire_build/conda-bld/sire_1645533092052/work/corelib/src/libs/SireMol/moleculeinfodata.cpp, LINE: 3320, FUNCTION: virtual QList<SireMol::AtomIdx> SireMol::MoleculeInfoData::map(SireMol::AtomNum) const
The above exception was the direct cause of the following exception:
OSError Traceback (most recent call last)
/home/finlayclark/Documents/research/bayer_benchmark/testing_brd4_joe/brd4/test_input.ipynb Cell 17 in <module>
----> [1](vscode-notebook-cell://ssh-remote%2Bishikawa.chem.ed.ac.uk/home/finlayclark/Documents/research/bayer_benchmark/testing_brd4_joe/brd4/test_input.ipynb#W3sdnNjb2RlLXJlbW90ZQ%3D%3D?line=0) a = BSS.IO.readMolecules(["ligand1/complex.top", f"ligand1/complex.gro"])
File ~/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/BioSimSpace/IO/_io.py:397, in readMolecules(files, property_map)
395 msg = "Failed to read molecules from: %s" % files
396 if _isVerbose():
--> 397 raise IOError(msg) from e
398 else:
399 raise IOError(msg) from None
OSError: Failed to read molecules from: ['ligand1/complex.top', 'ligand1/complex.gro']
The missing atom number changes if I run the command repeatedly.
When I check the single-point energies for the ligand 1 complex for AMBER, I get:
Bond energies... AMBER = 1357.79 kJ/mol
Angle energies... AMBER = 1763.71 kJ/mol
Dihedral energies... AMBER = 6153.36 kJ/mol
For GROMACS, (bypassing BSS), I get the same angle and dihedral energies, but a very different bond energy:
Bond 354.189 kJ/mol
Angle 1763.71 kJ/mol
Dih. (proper+ improper) 6153.37 kJ/mol
Do you know what might be causing the above error and discrepancies?
Thanks very much.
Thanks for this, I should be able to take a look later. The atom number might change because the parsing is done in parallel, so the order might not be reproducible. (Although things are sorted afterwards.)
I think the issue is that the atom names in the *.gro files don't correspond to those from state A in the topology, so the Sire parser isn't finding an appropriate match. For example, here's a merge of ethane and methanol written to file by BSS. (State A = ethane, state B = methanol.)
gro
BioSimSpace System
8
1LIG C 1 1.349 1.431 1.352
1LIG C 2 1.502 1.428 1.361
1LIG H 3 1.311 1.369 1.434
1LIG H 4 1.304 1.386 1.263
1LIG H 5 1.307 1.530 1.367
1LIG H 6 1.553 1.335 1.387
1LIG H 7 1.521 1.495 1.445
1LIG H 8 1.549 1.469 1.271
0.00000 0.00000 0.00000
top
...
[ atoms ]
; nr type0 resnr residue atom cgnr charge0 mass0 type1 charge1 mass1
1 c3 1 LIG C 1 -0.094350 12.010000 oh -0.598800 16.000000
2 c3 1 LIG C 2 -0.094350 12.010000 c3 0.116700 12.010000
3 hc 1 LIG H 3 0.031450 1.008000 ho 0.396000 1.008000
4 hc 1 LIG H 4 0.031450 1.008000 hc_du 0.000000 1.008000
5 hc 1 LIG H 5 0.031450 1.008000 hc_du 0.000000 1.008000
6 hc 1 LIG H 6 0.031450 1.008000 h1 0.028700 1.008000
7 hc 1 LIG H 7 0.031450 1.008000 h1 0.028700 1.008000
8 hc 1 LIG H 8 0.031450 1.008000 h1 0.028700 1.008000
...
Now look at any of the examples from the archive you uploaded, e.g. ligand1/ligand.{gro/top}:
gro
GROningen MAchine for Chemical Simulation
3530
1Ligan C1x 1 2.990 1.896 0.172
1Ligan C2x 2 2.520 1.512 0.139
1Ligan C3x 3 2.568 1.493 -0.169
1Ligan C4x 4 2.923 1.312 -0.546
1Ligan C5x 5 3.065 1.112 -0.469
1Ligan C6x 6 2.854 1.174 -0.351
1Ligan O1x 7 3.211 1.386 -0.471
1Ligan Cl1x 8 2.482 1.848 -0.677
...
top
...
[ atoms ]
; nr type resnr residue atom cgnr charge mass typeB chargeB massB
; residue 1 Ligand 2 rtp Ligand 2 q 0.0
1 C1 1 Ligand 2 C1x 1 -0.12445362 12.010780 ; qtot -0.124454
2 C1 1 Ligand 2 C2x 2 -0.02585362 12.010780 ; qtot -0.150307
3 C1 1 Ligand 2 C3x 3 -0.04085362 12.010780 ; qtot -0.191161
4 C1 1 Ligand 2 C4x 4 -0.11282062 12.010780 ; qtot -0.303981
5 C1 1 Ligand 2 C5x 5 -0.11282062 12.010780 ; qtot -0.416802
6 C1 1 Ligand 2 C6x 6 -0.11282062 12.010780 ; qtot -0.529623
7 O1 1 Ligand 2 O1x 7 -0.53605362 15.999430 ; qtot -1.065676
8 Cl1 1 Ligand 2 Cl1x 8 -0.08845362 35.453200 ; qtot -1.154130
...
None of the dummy types, i.e. *x are present in the atomtypes section either, e.g.:
[ atomtypes ]
; name at.num mass charge ptype sigma epsilon
C1 6 12.010780 0.00000000 A 0.33795318 0.45538912
O1 8 15.999430 0.00000000 A 0.30398122 0.87950233
Cl1 17 35.453200 0.00000000 A 0.33075278 1.1112708
C2 6 12.010780 0.00000000 A 0.34806469 0.36350306
N1 7 14.006720 0.00000000 A 0.3206876 0.7016213
O2 8 15.999430 0.00000000 A 0.30251065 0.70485815
S1 16 32.065500 0.00000000 A 0.35635949 1.046
H1 1 1.007947 0.00000000 A 0.26445434 0.066021356
H2 1 1.007947 0.00000000 A 0.25725815 0.06531786
H3 1 1.007947 0.00000000 A 0.25832257 0.068656285
O3 8 15.999430 0.00000000 A 0.31507524 0.635968
H4 1 1.007947 0.00000000 A 1 0
Na1 11 22.989769 0.00000000 A 0.33283976 0.01158968
Cl2 17 35.453200 0.00000000 A 0.44010397 0.4184
I'll see if I can modify the files for consistency in order to get them to load.
(For reference, I'm not sure if GROMACS cares about anything in the gro file, since it likely just matches things up by number. For our parsers, we have a few self-consistency checks when combining files to create a system, since the coordinates need not come from a GROMACS file at all, i.e. we need some way to be able to map entries in one file to the information in the topology, and line-by-line doesn't always work.)
I just modified the name of one of the atoms in the ethane-methanol gro file and it does give an error, although different and more informative to what you are seeing:
BioSimSpace System
8
1LIG C1 1 1.349 1.431 1.352
1LIG C 2 1.502 1.428 1.361
1LIG H 3 1.311 1.369 1.434
1LIG H 4 1.304 1.386 1.263
1LIG H 5 1.307 1.530 1.367
1LIG H 6 1.553 1.335 1.387
1LIG H 7 1.521 1.495 1.445
1LIG H 8 1.549 1.469 1.271
0.00000 0.00000 0.00000
UserWarning: SireError::incompatible_error: Incompatibility between the files, could not match all atoms to the corresponding topology file. Check atom and residue names! (call sire.error.get_last_error_details() for more info)
It looks like this issue would trip you up, but isn't the source of the exact error that you are seeing.
It may be failing because of the weird formatting of the state B information, e.g.:
1 C1 1 Ligand 2 C1x 1 -0.12445362 12.010780 ; qtot -0.124454
I imagine that our parser isn't just ignoring stuff beyond the semi-colon, rather expecting valid information related to state B. Having fixed the names in the gro file, I'll try reading it as a single state file to see if that works.
Oh, and just to note that we currently don't have read support for GROMACS FEP files, we can only write to them. The information would only ever be read from the information corresponding to state A.
Okay, I've fixed it. The GROMACS files are formatted in a way which can't be parsed by Sire due to the inconsistent naming between gro and top and the use of whitespace in molecule type and residue names. For example the atomtypes line has a Ligand 2 entry for the residue, which is being broken into two records. In the corresponding gro file, the residue is named Ligan due to the space constraints.
In order to get things to read I needed to:
- Update all of the atom names in the gro file to match the atom names in state A from the
atomtypessection forLigand 2. - Replace all instances of
Ligand 2in the top file withLiganto avoid splitting on whitespace and to match the residue name in the gro file.
Here is an archive with the (hopefully) fixed GROMACS files for ligand1. This will allow you to load state A.
I'm not sure how these input files were originally generated, but the files look like they are pushing the limits of the GROMACS formatting and naming conventions. I didn't think that GROMACS used fixed width, so am surprised that things like Ligand 2 aren't causing issues elsewhere.
Cheers.
Brilliant, thank you very much for sorting this out! I'll send Bayer a link to this issue.
No problem. If you could check that files loaded after modification are sane, that would be great.
Could we also capture in this issue how the input files were generated by Bayer ?
The files seem to be sane. Comparing the single-point energies of ligand1 from your fixed input and the AMBER input gives:
Bond energies... AMBER = 12.1355 kcal/mol, GROMACS = 12.9934 kcal/mol Angle energies... AMBER = 107.9002 kcal/mol, GROMACS = 107.9861 kcal/mol Dihedral energies... AMBER = 19.3379 kcal/mol, GROMACS = 19.3600 kcal/mol
Although when I convert the fixed GROMACS inputs to AMBER, process_amber.getBondEnergy() gives None for some reason:
lig_gmx = BSS.IO.readMolecules([f"fixed_input/fixed_ligand1/ligand.gro", f"fixed_input/fixed_ligand1/ligand.top"])
# Create a single-step minimisation protocol.
protocol = BSS.Protocol.Minimisation(steps=1)
# Create processes for AMBER and GROMACS simulations.
process_gmx = BSS.Process.Gromacs(lig_gmx, protocol)
process_amb = BSS.Process.Amber(lig_gmx, protocol)
# Modify the GROMACS config to perform zero steps.
config = process_gmx.getConfig()
config[1] = "nsteps = 0"
process_gmx.setConfig(config)
# Run the processes.
process_amb.start()
process_amb.wait()
process_gmx.start()
process_gmx.wait()
# Compare energies. (Where direct comparisons can be made.)
print(f"Bond energies... AMBER = {process_amb.getBondEnergy()}, GROMACS = {process_gmx.getBondEnergy()}")
print(f"Angle energies... AMBER = {process_amb.getAngleEnergy()}, GROMACS = {process_gmx.getAngleEnergy()}")
print(f"Dihedral energies... AMBER = {process_amb.getDihedralEnergy()}, GROMACS = {process_gmx.getDihedralEnergy()}")
Outputs
Bond energies... AMBER = None, GROMACS = 12.9934 kcal/mol
Angle energies... AMBER = 107.9843 kcal/mol, GROMACS = 107.9861 kcal/mol
Dihedral energies... AMBER = 19.3600 kcal/mol, GROMACS = 19.3600 kcal/mol
@jmichel80 I'm afraid I don't yet have any more information on how the inputs were generated beyond what was in Joseph's email - that this is an altered workflow which is yet to be pushed to github.
@lohedges , I was also wondering about comparing bond energies. In your comment above (https://github.com/michellab/BioSimSpace/issues/289#issuecomment-1082868601) you use these when comparing single-point energies. However, when I convert one of these systems (ligand 1 complex) loaded from the same (AMBER) input to GROMACS, I find that the bond energy changes by over 100 kcal / mol (although the angle and dihedral energies are unaffected).
Bond energies... AMBER = 324.5188 kcal/mol, GROMACS = 84.6532 kcal/mol
Angle energies... AMBER = 421.5360 kcal/mol, GROMACS = 421.5368 kcal/mol
Dihedral energies... AMBER = 1470.6892 kcal/mol, GROMACS = 1464.7347 kcal/mol
Is this unexpected or does this just depend on the details of the interconversion?
Thanks.