BioSimSpace icon indicating copy to clipboard operation
BioSimSpace copied to clipboard

Issues with Gromacs Input Files

Open fjclark opened this issue 3 years ago • 41 comments

Hello,

I'm running BSS version 2022.1.0, 8.g3f6b1c18 on linux, installed using conda. I'm attempting to run a short simulation with the gromacs input files supplied here. All relevant inputs/ notebooks are here.

I've run into a few issues:

  1. Failure of getSystem(block=True). During equilibration steps (see bdr4_equilibration.ipynb) this command sometimes fails the first time it is run. However, it runs correctly the second time (without repeating the simulation). The system was also unexpectedly unstable (LINCS warnings) and exploded during "PMEMD NPT equilibration for 2ns without restraints" (Fatal error: 3 particles communicated to PME rank 1 are more than 2/3 times the cut-off out of the domain decomposition cell of their charge group in dimension x. This usually means that your system is not well equilibrated.) . I had similar issues when using sander. The input was the ligand 1 complex for brd4.

  2. Single point energy calculation works with Gromacs but fails with Amber (see gh_issue.ipynb). Error reads: "EXTRA POINTS: nnb too small!". Input was the ligand 1 complex for jnk1.

  3. BSS fails to read Amber inputs generated with ParmEd (see bottom of gh_issue.ipynb). I converted the jnk1 input (ligand 1 complex) from gro/top to parm7/rst7 according to the discussion here. However, upon attempting to reload the parm7/rst files, I get [OSError: Failed to read molecules from ['jnk1_system.parm7', 'jnk1_system.rst7']. It looks like you failed to include a topology file.]()

Thanks very much.

fjclark avatar Mar 29 '22 16:03 fjclark

Hi there,

For part of question 1): I'm not sure why this happens but I've recently fixed this (last) week by simply re-trying the trjconv command if it happens to fail the first time. (Weirdly the failures only ever happen within getSystem() when calling a subprocesses, i.e. using the exact same commands on the command-line works every time!).

I should be able to check the other issues at some point tomorrow.

Cheers.

lohedges avatar Mar 29 '22 17:03 lohedges

Hi,

Thanks for the quick response. Great - I'll update to the latest version.

Cheers

fjclark avatar Mar 29 '22 17:03 fjclark

Out of interest, is there are reason why you need to use ParmEd to convert the GROMACS files, rather than using BioSImSpace directly, i.e.:

import BioSImSpace as BSS

# Load GROMACS system.
system = BSS.IO.readMolecules(["jnk1_complex.gro", "jnk1_complex.top"])

# Write to AMBER format.
BSS.IO.saveMolecules("test", system, ["prm7", "rst7"])

(This is what would be done behind the scenes if you loaded GROMACS files and tried to run a simulation with AMBER.)

This works, but the resulting topology file looks quite different to the one generated by ParmEd. A single-point calculation would test whether the information is self-consistent, despite the differences.

lohedges avatar Mar 29 '22 17:03 lohedges

The original issue I was having was instability during equilibration using sander through BSS and @jmichel80 mentioned that there had been issues previously with file conversion. To rule this out, I wanted to repeat the file conversion using ParmEd and re-run the equilibration to see if I got the same issues. Unfortunately I've not been able to run a single-point calculation using the BSS-converted files (see first part of see gh_issue.ipynb), or to load in the files produced by ParmEd to run a single-point calculation on those.

fjclark avatar Mar 29 '22 17:03 fjclark

Hi again,

I'm really not sure what's going on with those GROMACS files provided in the repository you linked to. Out of interest, I looked at their example notebook which shows how the files are generated. If I reproduce their code, but instead write to AMBER format files rather than GROMACS, then everything works, i.e. I can load back the files with BioSimSpace and I get the same results when performing single-point energy calculations with AMBER and GROMACS.

For reference, they are just parameterising the protein using FF14SB and "openff_unconstrained-2.0.0" for the ligand. It isn't possible to do this directly in BioSimSpace since the protein PDB contains missing atom types and the SDF file is required in order to provide the required stereochemistry information for the ligand.

Here's my script. (You'll just need to clone the abfe-benchmark repo to your working directory.)

import BioSimSpace as BSS

import parmed

from openff.toolkit.topology import Molecule
from openff.toolkit.typing.engines.smirnoff import ForceField as OFF_ForceField

try:
    from openmm import XmlSerializer, app, unit
    from openmm.app import HBonds, NoCutoff, PDBFile, ForceField
except ImportError:
    from simtk import unit
    from simtk.openmm import XmlSerializer, app
    from simtk.openmm.app import HBonds, NoCutoff, PDBFile

# Load in the ligand.
ligand_molecule = Molecule("abfe-benchmark/structures/jnk1/ligand1.sdf")

# Specify the "Sage" forcefield.
force_field = OFF_ForceField("openff_unconstrained-2.0.0.offxml")

# Parametrize the ligand molecule by creating a Topology object from it.
ligand_system = force_field.create_openmm_system(ligand_molecule.to_topology())

# Read in the coordinates of the ligand from the PDB file.
ligand_pdbfile = PDBFile("abfe-benchmark/structures/jnk1/ligand1.pdb")

# Convert the ligand system to a ParmEd object.
ligand_parmed_structure = parmed.openmm.load_topology(ligand_pdbfile.topology,
											          ligand_system,
											          ligand_pdbfile.positions)

# Write the ligand structure to AMBER format.
ligand_parmed_structure.save("ligand1.rst7", overwrite=True)
ligand_parmed_structure.save("ligand1.parm7", overwrite=True)

# Parse the protein PDB file.
protein_pdbfile = PDBFile("abfe-benchmark/structures/jnk1/protein.pdb")

# Load the AMBER protein force field through OpenMM.
omm_forcefield = app.ForceField("amber14/protein.ff14SB.xml", "amber14/tip3p.xml")

# Parameterize the protein.
protein_system = omm_forcefield.createSystem(protein_pdbfile.topology,
                                             nonbondedCutoff=1*unit.nanometer,
                                             nonbondedMethod=app.NoCutoff,
                                             constraints=None,
                                             rigidWater=False)

# Convert the protein System into a ParmEd Structure.
protein_parmed_structure = parmed.openmm.load_topology(
                                        protein_pdbfile.topology,
                                        protein_system,
                                        xyz=protein_pdbfile.positions)

# Write the protein structure to AMBER format.
protein_parmed_structure.save("protein.rst7", overwrite=True)
protein_parmed_structure.save("protein.parm7", overwrite=True)

# Load the ligand and protein with BioSimSpace.
ligand = BSS.IO.readMolecules("ligand1.*7")[0]
protein = BSS.IO.readMolecules("protein.*7")[0]

# Combine to form a complex.
complx = (ligand + protein).toSystem()

# Create a single-step minimisation protocol.
protocol = BSS.Protocol.Minimisation(steps=1)

# Create processes for AMBER and GROMACS simulations.
process_amb = BSS.Process.Amber(complx, protocol)
process_gmx = BSS.Process.Gromacs(complx, protocol)

# Modify the GROMACS config to perform zero steps.
config = process_gmx.getConfig()
config[1] = "nsteps = 0"
process_gmx.setConfig(config)

# Run the processes.
process_amb.start()
process_amb.wait()
process_gmx.start()
process_gmx.wait()

# Compare energies. (Where direct comparisons can be made.)
print(f"Bond energies... AMBER = {process_amb.getBondEnergy()}, GROMACS = {process_gmx.getBondEnergy()}")
print(f"Angle energies... AMBER = {process_amb.getAngleEnergy()}, GROMACS = {process_gmx.getAngleEnergy()}")
print(f"Dihedral energies... AMBER = {process_amb.getDihedralEnergy()}, GROMACS = {process_gmx.getDihedralEnergy()}")

This outputs:

Bond energies... AMBER = 202.0904 kcal/mol, GROMACS = 202.0662 kcal/mol
Angle energies... AMBER = 1111.9015 kcal/mol, GROMACS = 1111.8045 kcal/mol
Dihedral energies... AMBER = 4336.2379 kcal/mol, GROMACS = 4315.6788 kcal/mol

I'm not yet sure if the issue with the GROMACS files is our ability to read them, or whether they were invalid in the first place. I'll try to reproduce the above writing to GROMACS format, i.e. using the output of the script, rather what was uploaded to the benchmark repository.

Cheers.

lohedges avatar Mar 30 '22 09:03 lohedges

I've just re-run the above script, instead using ParmEd to write to GROMACS format. As you've found, on conversion to AMBER format within BioSimSpace, any AMBER simulation fails with the EXTRA POINTS: nnb too small!" message. While the GROMACS simulations work, the single point energies are different (but close) to those that I obtained above using the the AMBER files written by ParmEd. The output was as follows:

Bond energies... AMBER = None, GROMACS = 241.7304 kcal/mol
Angle energies... AMBER = None, GROMACS = 1124.9594 kcal/mol
Dihedral energies... AMBER = None, GROMACS = 4337.7390 kcal/mol

Given the inconsistencies, I'm not sure whether the issue is with ParmEd or us.

lohedges avatar Mar 30 '22 09:03 lohedges

That's very helpful @lohedges ! @fjclark could you check whether the AMBER prm files produced by @lohedges 's script work with SOMD ? We can feedback to Bayer next week the above issues.

jmichel80 avatar Mar 30 '22 10:03 jmichel80

Thanks very much @lohedges. @jmichel80 will do.

fjclark avatar Mar 30 '22 10:03 fjclark

@jmichel80 I can confirm that the AMBER files produced by @lohedges's script work with SOMD. However, test alchemical simulations crash when I use the same config file I'm using for my current work, giving "NaN or Inf has been generated along the simulation" during minimisation. This turned out to be due to the use of "constraint = allbonds". Do you have any suggestions as to why this is happening, given that the system appeared well equilibrated (I have used an equilibration scheme almost identical to the one which produced input which works with this config file).

All relevant input is here. I have:

-Produced AMBER format files for brd4, lig1 complex using @lohedges 's script (this failed at first due to this issue - despite having compatible versions of ParmEd and OpenMM in my environment, an incompatible version of ParmEd was bundled with AmberTools, which worked after manually modifying decorators.py).

-Equilibrated using pmemd and pmemd.cuda - see input.

-Run a short production run using SOMD through BSS - successful

-Run single alchemical windows using my standard config file with (failure) and without (success) constraint set to "allbonds".

Thanks

fjclark avatar Mar 31 '22 09:03 fjclark

This is odd, constraints are disabled before energy minimisation. See https://github.com/michellab/Sire/blob/devel/wrapper/Tools/OpenMMMD.py#L1507-L1520

Also your single point energy before minimisation is identical before minimisation between the two runs, so I expect the same input was passed to integrator.minimiseEnergy()

Is the error reproducible ?

You could try disabling minimisation, that step may not be necessary for an ABFE run started from a structure equilibrated at discharge = 0.0

jmichel80 avatar Mar 31 '22 09:03 jmichel80

I've repeated the calculations, only adding or removing "set_constraints = allbonds" to the config. Adding this consistently causes the simulations to crash during minimisation. I'm running my modified version of SOMD with Boresch restraints, but I've repeated this with the latest version of sire as contained within BSS-dev - I observe the same thing.

When disabling minimisation, I get:

  2 NaN or Inf has been generated along the simulation
  3 Starting /home/finlayclark/sire_boresch_2022.app/bin/somd-freenrg: number of threads equals 20
  4 srun: error: ishikawa: task 0: Exited with exit code 255

Very similar to with minimisation, only no output related to setting up the system is displayed.

fjclark avatar Mar 31 '22 10:03 fjclark

Out of interest, do the original GROMACS files work with SOMD? (For example, if you just perform minimisation and equilibration with GROMACS rather than PMEMD.) It would be interesting to know if you experience the same constraint related crashes with those, too.

lohedges avatar Mar 31 '22 10:03 lohedges

Could you setup from scratch the input files with BioSimSpace (starting from pdb files extracted from the pmemd equilibration and solvating in BioSimSpace) to verify if SOMD runs without errors in that case ? If so comparing the prm7 files should indicate a discrepancy in the handling of bonded terms.

jmichel80 avatar Mar 31 '22 10:03 jmichel80

@lohedges I'm not able to equilibrate the original GROMACS files in BSS using GROMACS - I get failures during my final NPT equilibration. Based on this, it seems likely that the issue is with the original files.

fjclark avatar Mar 31 '22 11:03 fjclark

Could you setup from scratch the input files with BioSimSpace (starting from pdb files extracted from the pmemd equilibration and solvating in BioSimSpace.

I tried this myself, but the original PDB files provided in the repository contained unknown atom types, so parametrisation with ff14sb failed. (Perhaps it would work with PDBs extracted from the AMBER files, but that would rely on the AMBER files being correct in the first place, which we're currently unsure about.) I also couldn't parameterise the ligand within BioSimSpace since the SDF was required for stereochemistry. However, the parametrisation of the ligand should be identical in either case. (Same sequence of commands used.) The protein might differ though, since they are using OpenMM to parameterise, rather than AmberTools.

I'm not able to equilibrate the original GROMACS files in BSS using GROMACS - I get failures during my final NPT equilibration. Based on this, it seems likely that the issue is with the original files.

Thanks for confirming.

Cheers.

lohedges avatar Mar 31 '22 11:03 lohedges

@jmichel80, it looks like constraints aren't disabled before energy minimisation in runFreeNrg(), only run() https://github.com/michellab/Sire/blob/6ff6b04d56a3b76fb18f7ba2317557d15cd41cb6/wrapper/Tools/OpenMMMD.py#L1686-L1695 .

However, even when I modify the my version of Sire to disable constraints before minimisation in runFreeNrg(), I still get the same failure during minimisation:

CONSTRAINT OFF 
Starting /home/finlayclark/sire_no_constraint_minim.app/bin/somd-freenrg: number of threads equals 20
Traceback (most recent call last):
  File "/home/finlayclark/sire_no_constraint_minim.app/share/Sire/scripts/somd-freenrg.py", line 146, in <module>
    OpenMMMD.runFreeNrg(params)
  File "/home/finlayclark/sire_no_constraint_minim.app/lib/python3.7/site-packages/Sire/Tools/__init__.py", line 176, in inner
    retval = func()
  File "/home/finlayclark/sire_no_constraint_minim.app/lib/python3.7/site-packages/Sire/Tools/OpenMMMD.py", line 1696, in runFreeNrg
    system = integrator.minimiseEnergy(system, minimise_tol.val, minimise_max_iter.val)
RuntimeError: Particle coordinate is nan 
srun: error: ishikawa: task 0: Exited with exit code 1

fjclark avatar Mar 31 '22 16:03 fjclark

To follow up on this. The code to disable constraints in run() or runFreeNrg() is buggy. Changing constraint parameters via e.g. integrator.setConstraintType("none") does not cause the openMM system to be reinitialised because Integrator_OpenMM.initialise() is not subsequently called so no effect.

It is likely that the crash during energy minimisation is due to the an incompatibility between the way bonded parameters are defined in the parameter file "openff_unconstrained-2.0.0.offxml" here

and code to setup constraints/bonded terms for the solute in SOMD

https://github.com/michellab/Sire/blob/devel/corelib/src/libs/SireMove/openmmfrenergyst.cpp#L1802-L1814 vs https://github.com/michellab/Sire/blob/devel/corelib/src/libs/SireMove/openmmfrenergyst.cpp#L1868-L1905

jmichel80 avatar Apr 04 '22 13:04 jmichel80

Hello @lohedges ,

As Bayer does not have AMBER, I've been asked to check the consistency between revised input files: brd4_inputs.tar.gz

The AMBER inputs produce stable MD with sander (10 ps) and pmemd.cuda (1 ns) for both the complexes. The GROMACS input for the ligand 1 complex also seems to produce stable MD (currently at 0.5 ns).

However, when I attempt to load any of the GROMACS inputs into BSS (2022.1.0+5.g1a606e31, but I've also tried with 2022.2.1), I get the following error:

---------------------------------------------------------------------------
UserWarning                               Traceback (most recent call last)
File ~/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/BioSimSpace/IO/_io.py:377, in readMolecules(files, property_map)
    376 try:
--> 377     system = _SireIO.MoleculeParser.read(files, property_map)
    378 except Exception as e:

UserWarning: Exception 'SireMol::missing_atom' thrown by the thread 'master'.
There is no atom with the number "2" in the layout "{00000000-0000-0000-0000-000000000000}".
Thrown from FILE: /usr/share/miniconda/envs/sire_build/conda-bld/sire_1645533092052/work/corelib/src/libs/SireMol/moleculeinfodata.cpp, LINE: 3320, FUNCTION: virtual QList<SireMol::AtomIdx> SireMol::MoleculeInfoData::map(SireMol::AtomNum) const
__Backtrace__
(  0) /home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Qt/../../../.././libSireError.so.2022 ([0x7fd2a6e1ae42] ++0x42)
  -- SireError::getBackTrace()

(  1) /home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Qt/../../../.././libSireError.so.2022 ([0x7fd2a6e18dbe] ++0xae)
  -- SireError::exception::exception(QString, QString)

/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Mol/../../../../libSireMol.so.2022(+0x199c8e) [0x7fd2603e1c8e]
/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Mol/../../../../libSireMol.so.2022(+0x145104) [0x7fd26038d104]
(  4) /home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Mol/../../../../libSireMol.so.2022 ([0x7fd2603f89cc] ++0x3c)
  -- SireMol::AtomNum::map(SireMol::MolInfo const&) const

(  5) /home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Mol/../../../../libSireMol.so.2022 ([0x7fd260559851] ++0x31)
  -- SireMol::MoleculeInfoData::atomIdx(SireMol::AtomID const&) const

(  6) /home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Mol/../../../../libSireMol.so.2022 ([0x7fd260552706] ++0x26)
  -- SireMol::MoleculeInfo::atomIdx(SireMol::AtomID const&) const

(  7) /home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/IO/../../../../libSireIO.so.2022 ([0x7fd1fabd6ce7] ++0x247)
  -- SireIO::GroTop::getBondProperties(SireMol::MoleculeInfo const&, SireIO::GroMolType const&) const

/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/IO/../../../../libSireIO.so.2022(+0x2205cb) [0x7fd1fabd75cb]
/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/IO/../../../../libSireIO.so.2022(+0x21f262) [0x7fd1fabd6262]
/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Base/../../../.././libtbb.so.12(+0x16092) [0x7fd29fb1c092]
( 11) /home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/IO/../../../../libSireIO.so.2022 ([0x7fd1fabd8156] ++0xb66)
  -- SireIO::GroTop::createMolecule(QString, QStringList&, SireBase::PropertyMap const&) const
/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/IO/../../../../libSireIO.so.2022(+0x221373) [0x7fd1fabd8373]
/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/IO/../../../../libSireIO.so.2022(+0x22178d) [0x7fd1fabd878d]
/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Base/../../../.././libtbb.so.12(+0x23302) [0x7fd29fb29302]
/home/finlayclark/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/Sire/Base/../../../.././libtbb.so.12(+0x1e794) [0x7fd29fb24794]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7fd2cc2d6609]
( 17) /lib/x86_64-linux-gnu/libc.so.6 ([0x7fd2cc095163] ++0x43)
  -- clone

__EndTrace__
Exception 'SireMol::missing_atom' thrown by the thread 'master'.
There is no atom with the number "2" in the layout "{00000000-0000-0000-0000-000000000000}".
Thrown from FILE: /usr/share/miniconda/envs/sire_build/conda-bld/sire_1645533092052/work/corelib/src/libs/SireMol/moleculeinfodata.cpp, LINE: 3320, FUNCTION: virtual QList<SireMol::AtomIdx> SireMol::MoleculeInfoData::map(SireMol::AtomNum) const

The above exception was the direct cause of the following exception:

OSError                                   Traceback (most recent call last)
/home/finlayclark/Documents/research/bayer_benchmark/testing_brd4_joe/brd4/test_input.ipynb Cell 17 in <module>
----> [1](vscode-notebook-cell://ssh-remote%2Bishikawa.chem.ed.ac.uk/home/finlayclark/Documents/research/bayer_benchmark/testing_brd4_joe/brd4/test_input.ipynb#W3sdnNjb2RlLXJlbW90ZQ%3D%3D?line=0) a = BSS.IO.readMolecules(["ligand1/complex.top", f"ligand1/complex.gro"])

File ~/anaconda3/envs/biosimspace-dev/lib/python3.9/site-packages/BioSimSpace/IO/_io.py:397, in readMolecules(files, property_map)
    395 msg = "Failed to read molecules from: %s" % files
    396 if _isVerbose():
--> 397     raise IOError(msg) from e
    398 else:
    399     raise IOError(msg) from None

OSError: Failed to read molecules from: ['ligand1/complex.top', 'ligand1/complex.gro']

The missing atom number changes if I run the command repeatedly.

When I check the single-point energies for the ligand 1 complex for AMBER, I get:

Bond energies... AMBER = 1357.79 kJ/mol
Angle energies... AMBER = 1763.71 kJ/mol
Dihedral energies... AMBER = 6153.36 kJ/mol

For GROMACS, (bypassing BSS), I get the same angle and dihedral energies, but a very different bond energy:

Bond                                        354.189   kJ/mol
Angle                                       1763.71   kJ/mol
Dih. (proper+ improper)          6153.37   kJ/mol

Do you know what might be causing the above error and discrepancies?

Thanks very much.

fjclark avatar Oct 19 '22 10:10 fjclark

Thanks for this, I should be able to take a look later. The atom number might change because the parsing is done in parallel, so the order might not be reproducible. (Although things are sorted afterwards.)

lohedges avatar Oct 19 '22 11:10 lohedges

I think the issue is that the atom names in the *.gro files don't correspond to those from state A in the topology, so the Sire parser isn't finding an appropriate match. For example, here's a merge of ethane and methanol written to file by BSS. (State A = ethane, state B = methanol.)

gro

BioSimSpace System
    8
    1LIG      C    1   1.349   1.431   1.352
    1LIG      C    2   1.502   1.428   1.361
    1LIG      H    3   1.311   1.369   1.434
    1LIG      H    4   1.304   1.386   1.263
    1LIG      H    5   1.307   1.530   1.367
    1LIG      H    6   1.553   1.335   1.387
    1LIG      H    7   1.521   1.495   1.445
    1LIG      H    8   1.549   1.469   1.271
   0.00000   0.00000   0.00000

top

...
[ atoms ]
;   nr   type0  resnr residue  atom   cgnr    charge0        mass0   type1    charge1        mass1
     1      c3      1     LIG     C      1  -0.094350    12.010000      oh  -0.598800    16.000000
     2      c3      1     LIG     C      2  -0.094350    12.010000      c3   0.116700    12.010000
     3      hc      1     LIG     H      3   0.031450     1.008000      ho   0.396000     1.008000
     4      hc      1     LIG     H      4   0.031450     1.008000   hc_du   0.000000     1.008000
     5      hc      1     LIG     H      5   0.031450     1.008000   hc_du   0.000000     1.008000
     6      hc      1     LIG     H      6   0.031450     1.008000      h1   0.028700     1.008000
     7      hc      1     LIG     H      7   0.031450     1.008000      h1   0.028700     1.008000
     8      hc      1     LIG     H      8   0.031450     1.008000      h1   0.028700     1.008000
...

Now look at any of the examples from the archive you uploaded, e.g. ligand1/ligand.{gro/top}:

gro

GROningen MAchine for Chemical Simulation
 3530
    1Ligan  C1x    1   2.990   1.896   0.172
    1Ligan  C2x    2   2.520   1.512   0.139
    1Ligan  C3x    3   2.568   1.493  -0.169
    1Ligan  C4x    4   2.923   1.312  -0.546
    1Ligan  C5x    5   3.065   1.112  -0.469
    1Ligan  C6x    6   2.854   1.174  -0.351
    1Ligan  O1x    7   3.211   1.386  -0.471
    1Ligan Cl1x    8   2.482   1.848  -0.677
...

top

...
[ atoms ]
;   nr       type  resnr residue  atom   cgnr    charge       mass  typeB    chargeB      massB
; residue    1 Ligand 2 rtp Ligand 2 q 0.0
    1         C1      1 Ligand 2    C1x      1 -0.12445362  12.010780   ; qtot -0.124454
    2         C1      1 Ligand 2    C2x      2 -0.02585362  12.010780   ; qtot -0.150307
    3         C1      1 Ligand 2    C3x      3 -0.04085362  12.010780   ; qtot -0.191161
    4         C1      1 Ligand 2    C4x      4 -0.11282062  12.010780   ; qtot -0.303981
    5         C1      1 Ligand 2    C5x      5 -0.11282062  12.010780   ; qtot -0.416802
    6         C1      1 Ligand 2    C6x      6 -0.11282062  12.010780   ; qtot -0.529623
    7         O1      1 Ligand 2    O1x      7 -0.53605362  15.999430   ; qtot -1.065676
    8        Cl1      1 Ligand 2   Cl1x      8 -0.08845362  35.453200   ; qtot -1.154130
...

None of the dummy types, i.e. *x are present in the atomtypes section either, e.g.:

[ atomtypes ]
; name    at.num    mass    charge ptype  sigma      epsilon
C1             6  12.010780  0.00000000  A     0.33795318     0.45538912
O1             8  15.999430  0.00000000  A     0.30398122     0.87950233
Cl1           17  35.453200  0.00000000  A     0.33075278      1.1112708
C2             6  12.010780  0.00000000  A     0.34806469     0.36350306
N1             7  14.006720  0.00000000  A      0.3206876      0.7016213
O2             8  15.999430  0.00000000  A     0.30251065     0.70485815
S1            16  32.065500  0.00000000  A     0.35635949          1.046
H1             1   1.007947  0.00000000  A     0.26445434    0.066021356
H2             1   1.007947  0.00000000  A     0.25725815     0.06531786
H3             1   1.007947  0.00000000  A     0.25832257    0.068656285
O3             8  15.999430  0.00000000  A     0.31507524       0.635968
H4             1   1.007947  0.00000000  A              1              0
Na1           11  22.989769  0.00000000  A     0.33283976     0.01158968
Cl2           17  35.453200  0.00000000  A     0.44010397         0.4184

I'll see if I can modify the files for consistency in order to get them to load.

(For reference, I'm not sure if GROMACS cares about anything in the gro file, since it likely just matches things up by number. For our parsers, we have a few self-consistency checks when combining files to create a system, since the coordinates need not come from a GROMACS file at all, i.e. we need some way to be able to map entries in one file to the information in the topology, and line-by-line doesn't always work.)

lohedges avatar Oct 19 '22 12:10 lohedges

I just modified the name of one of the atoms in the ethane-methanol gro file and it does give an error, although different and more informative to what you are seeing:

BioSimSpace System
    8
    1LIG      C1   1   1.349   1.431   1.352
    1LIG      C    2   1.502   1.428   1.361
    1LIG      H    3   1.311   1.369   1.434
    1LIG      H    4   1.304   1.386   1.263
    1LIG      H    5   1.307   1.530   1.367
    1LIG      H    6   1.553   1.335   1.387
    1LIG      H    7   1.521   1.495   1.445
    1LIG      H    8   1.549   1.469   1.271
   0.00000   0.00000   0.00000
UserWarning: SireError::incompatible_error: Incompatibility between the files, could not match all atoms to the corresponding topology file. Check atom and residue names! (call sire.error.get_last_error_details() for more info)

It looks like this issue would trip you up, but isn't the source of the exact error that you are seeing.

lohedges avatar Oct 19 '22 12:10 lohedges

It may be failing because of the weird formatting of the state B information, e.g.:

 1         C1      1 Ligand 2    C1x      1 -0.12445362  12.010780   ; qtot -0.124454

I imagine that our parser isn't just ignoring stuff beyond the semi-colon, rather expecting valid information related to state B. Having fixed the names in the gro file, I'll try reading it as a single state file to see if that works.

lohedges avatar Oct 19 '22 13:10 lohedges

Oh, and just to note that we currently don't have read support for GROMACS FEP files, we can only write to them. The information would only ever be read from the information corresponding to state A.

lohedges avatar Oct 19 '22 13:10 lohedges

Okay, I've fixed it. The GROMACS files are formatted in a way which can't be parsed by Sire due to the inconsistent naming between gro and top and the use of whitespace in molecule type and residue names. For example the atomtypes line has a Ligand 2 entry for the residue, which is being broken into two records. In the corresponding gro file, the residue is named Ligan due to the space constraints.

In order to get things to read I needed to:

  1. Update all of the atom names in the gro file to match the atom names in state A from the atomtypes section for Ligand 2.
  2. Replace all instances of Ligand 2 in the top file with Ligan to avoid splitting on whitespace and to match the residue name in the gro file.

Here is an archive with the (hopefully) fixed GROMACS files for ligand1. This will allow you to load state A.

I'm not sure how these input files were originally generated, but the files look like they are pushing the limits of the GROMACS formatting and naming conventions. I didn't think that GROMACS used fixed width, so am surprised that things like Ligand 2 aren't causing issues elsewhere.

Cheers.

lohedges avatar Oct 19 '22 15:10 lohedges

Brilliant, thank you very much for sorting this out! I'll send Bayer a link to this issue.

fjclark avatar Oct 19 '22 15:10 fjclark

No problem. If you could check that files loaded after modification are sane, that would be great.

lohedges avatar Oct 19 '22 16:10 lohedges

Could we also capture in this issue how the input files were generated by Bayer ?

jmichel80 avatar Oct 19 '22 16:10 jmichel80

The files seem to be sane. Comparing the single-point energies of ligand1 from your fixed input and the AMBER input gives:

Bond energies... AMBER = 12.1355 kcal/mol, GROMACS = 12.9934 kcal/mol Angle energies... AMBER = 107.9002 kcal/mol, GROMACS = 107.9861 kcal/mol Dihedral energies... AMBER = 19.3379 kcal/mol, GROMACS = 19.3600 kcal/mol

Although when I convert the fixed GROMACS inputs to AMBER, process_amber.getBondEnergy() gives None for some reason:

lig_gmx = BSS.IO.readMolecules([f"fixed_input/fixed_ligand1/ligand.gro", f"fixed_input/fixed_ligand1/ligand.top"])

# Create a single-step minimisation protocol.
protocol = BSS.Protocol.Minimisation(steps=1)

# Create processes for AMBER and GROMACS simulations.
process_gmx = BSS.Process.Gromacs(lig_gmx, protocol)
process_amb = BSS.Process.Amber(lig_gmx, protocol)

# Modify the GROMACS config to perform zero steps.
config = process_gmx.getConfig()
config[1] = "nsteps = 0"
process_gmx.setConfig(config)

# Run the processes.
process_amb.start()
process_amb.wait()
process_gmx.start()
process_gmx.wait()

# Compare energies. (Where direct comparisons can be made.)
print(f"Bond energies... AMBER = {process_amb.getBondEnergy()}, GROMACS = {process_gmx.getBondEnergy()}")
print(f"Angle energies... AMBER = {process_amb.getAngleEnergy()}, GROMACS = {process_gmx.getAngleEnergy()}")
print(f"Dihedral energies... AMBER = {process_amb.getDihedralEnergy()}, GROMACS = {process_gmx.getDihedralEnergy()}")

Outputs

Bond energies... AMBER = None, GROMACS = 12.9934 kcal/mol
Angle energies... AMBER = 107.9843 kcal/mol, GROMACS = 107.9861 kcal/mol
Dihedral energies... AMBER = 19.3600 kcal/mol, GROMACS = 19.3600 kcal/mol

fjclark avatar Oct 19 '22 17:10 fjclark

@jmichel80 I'm afraid I don't yet have any more information on how the inputs were generated beyond what was in Joseph's email - that this is an altered workflow which is yet to be pushed to github.

fjclark avatar Oct 19 '22 17:10 fjclark

@lohedges , I was also wondering about comparing bond energies. In your comment above (https://github.com/michellab/BioSimSpace/issues/289#issuecomment-1082868601) you use these when comparing single-point energies. However, when I convert one of these systems (ligand 1 complex) loaded from the same (AMBER) input to GROMACS, I find that the bond energy changes by over 100 kcal / mol (although the angle and dihedral energies are unaffected).

Bond energies... AMBER = 324.5188 kcal/mol, GROMACS = 84.6532 kcal/mol
Angle energies... AMBER = 421.5360 kcal/mol, GROMACS = 421.5368 kcal/mol
Dihedral energies... AMBER = 1470.6892 kcal/mol, GROMACS = 1464.7347 kcal/mol

Is this unexpected or does this just depend on the details of the interconversion?

Thanks.

fjclark avatar Oct 19 '22 17:10 fjclark