somd-freenrg crashes when ligand atoms are completely non-interacting
**Describe the bug**
I've recently been trying to implement multiple distance restraints for absolute binding free energy calculations in BioSimSpace and SOMD (modified BSS, modified sire, and an example of how to run a complete calculation). The idea is that the ligand's intermolecular interactions are removed with many distance restraints active, and there is then a final stage where all but one of the restraints are released. The correction for releasing the ligand to the standard-state volume can then be calculated easily.
During the final release stage, where the ligand is completely non-interacting, I've been getting crashes. The config file I was using with my modified version of sire was:
save coordinates = True
ncycles = 50
nmoves = 25000
ncycles_per_snap = 4
buffered coordinates frequency = 500
timestep = 4 * femtosecond
reaction field dielectric = 78.4
cutoff type = cutoffperiodic
cutoff distance = 12 * angstrom
barostat = True
pressure = 1.00000 atm
thermostat = True
temperature = 300.00 kelvin
constraint = hbonds-notperturbed
energy frequency = 100
lambda array = (0.0, 0.125, 0.25, 0.375, 0.5, 1.0)
lambda_val = 0.0
perturbed residue number = 293
random seed = -1
gpu = 0
center solute = False
minimise = True
hydrogen mass repartitioning factor = 3
use distance restraints = True
use permanent distance restraints = True
distance restraints dictionary = {(1474, 4687): (4.432198728689354, 20.0, 0.5557321930560404), (1472, 4686): (3.164351608202915, 20.0, 0.6751681461642662), (1474, 4690): (4.700974930995326, 20.0, 0.6701990692583388), (1488, 4688): (6.7858681626072315, 20.0, 0.7488970818377805), (2444, 4677): (4.863475388033847, 20.0, 0.7671705708955612), (1472, 4691): (6.367026275053535, 20.0, 0.821252569114769), (2261, 4683): (6.773986513796716, 20.0, 0.8828616026118992), (1490, 4685): (7.720582022157998, 20.0, 0.9347068069890287), (286, 4674): (3.8127377709679453, 20.0, 0.9511268213071902), (374, 4680): (6.181106731821419, 20.0, 0.9637979205880889), (1521, 4692): (3.7598878700775633, 20.0, 0.9830902561404447), (1529, 4684): (7.87209938990775, 20.0, 1.0582870196044247), (374, 4693): (4.975946331072206, 20.0, 1.063773425343931), (346, 4676): (8.470528418491696, 20.0, 1.0492604125738865), (284, 4675): (4.725973194492003, 20.0, 1.1798376555744614), (390, 4682): (8.350568045090748, 20.0, 1.2118321370763052), (2436, 4679): (4.32826597759919, 20.0, 1.2565532487514752), (307, 4678): (7.55090821416431, 20.0, 1.3127014576568437), (1421, 4681): (8.80522670899781, 20.0, 1.3688291245090927), (259, 4673): (9.195983552051707, 20.0, 2.1342422686136686)}
permanent distance restraints dictionary = {(1474, 4689): (3.875183995573175, 20.0, 0.5185270232355128)}
turn on receptor-ligand restraints mode = True
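For concreteness, here is a rough sketch of how I read each (r0, k, D) entry above, and of the standard-state correction for releasing the single permanent restraint. The flat-bottomed-harmonic form is my assumption about the convention, not SOMD's actual code:

```python
# Rough sketch, not the exact SOMD implementation: each (r0, k, D) entry is
# read as a flat-bottomed harmonic - zero within D of r0, quadratic with
# force constant k (kcal/mol/A^2) outside it.
import numpy as np
from scipy.integrate import quad

kT = 0.0019872041 * 300.0  # kcal/mol at 300 K


def restraint_energy(r, r0, k, D):
    """Flat-bottomed harmonic restraint energy (kcal/mol) at distance r (A)."""
    excess = abs(r - r0) - D
    return k * excess**2 if excess > 0.0 else 0.0


# The single permanent restraint above, between atoms 1474 and 4689.
r0, k, D = 3.875183995573175, 20.0, 0.5185270232355128

# Effective volume accessible under the restraint, and the free energy of
# releasing the ligand into the standard-state volume V0.
V0 = 1660.0  # A^3 per molecule at 1 M
v_eff, _ = quad(
    lambda r: 4.0 * np.pi * r**2 * np.exp(-restraint_energy(r, r0, k, D) / kT),
    0.0, 50.0,
)
print(f"Release correction: {-kT * np.log(V0 / v_eff):.2f} kcal/mol")
```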
And the pert file just mapped dummy atoms to dummy atoms, e.g.:
version 1
molecule LIG
atom
name C
initial_type du
final_type du
initial_LJ 0.00000 0.00000
final_LJ 0.00000 0.00000
initial_charge 0.00000
final_charge 0.00000
endatom
atom
name C1
initial_type du
final_type du
initial_LJ 0.00000 0.00000
final_LJ 0.00000 0.00000
initial_charge 0.00000
final_charge 0.00000
endatom
...
I still observed crashes at random values of lambda when I dropped the timestep to 1 fs and when I switched to Langevin dynamics.
To check whether this was due to my changes to sire or to the restraints, I ran some sets of calculations with Sire 2023.4.0 (a073701d68cd61f9be56e360be3b3ee3f812bbba/GitHub/2023-08-07T17:45:41+01:00 | UNCLEAN), still with the ligand non-interacting but with no restraints. Please see the input files. For each test I ran four repeats of each window. I got two crashes at lambda = 0 (NaN), and all simulations at lambda = 1 resulted in Segmentation fault (core dumped) from somd-freenrg -C somd.cfg -p CUDA -m somd.pert -c somd.rst7 -t somd.prm7.
Have I chosen some configuration parameters poorly, or should this not be happening?
Also, when I watch the trajectories back, the ligand makes large jumps after every 50th frame. There are 600 frames, and 600 / 50 = 12 while ncycles / ncycles_per_snap = 50 / 4 = 12.5, so I assume this is an artifact of how the trajectories are written?
**To Reproduce**
Please see the input files. Run run.sh with sire 2023.4.0. (I observed two crashes in four repeats of the stage.)
**Expected behavior**
Stable MD.
**(please complete the following information):**
- OS: Ubuntu 22.04
- Version of Python: 3.10.12
- Version of sire: 2023.4.0 (a073701d68cd61f9be56e360be3b3ee3f812bbba/GitHub/2023-08-07T17:45:41+01:00 | UNCLEAN)
- I confirm that I have checked this bug still exists in the latest released version of sire: [no] (As I would rather avoid rerunning too many simulations unless there have been very recent changes which are likely to affect this)
Thank you.
hi @fjclark - do you also observe crashes with MIF inputs from the MDR paper repo (that presumably did not cause runtime errors with earlier versions of sire used for the MDR paper) ?
Can confirm that I see the same segmentation fault at lambda = 1. Will see if I can trace it.
Just to note that I don't see a mapping from dummy ambertype to dummy ambertype in your example pert files, but this doesn't seem to be the source of the problem:
version 1
molecule LIG
atom
name C
initial_type C3
final_type du
initial_LJ 0.00000 0.00000
final_LJ 0.00000 0.00000
initial_charge 0.00000
final_charge 0.00000
endatom
atom
name C1
initial_type C3
final_type du
initial_LJ 0.00000 0.00000
final_LJ 0.00000 0.00000
initial_charge 0.00000
final_charge 0.00000
endatom
...
> hi @fjclark - do you also observe crashes with MIF inputs from the MDR paper repo (that presumably did not cause runtime errors with earlier versions of sire used for the MDR paper)?
Hi @jmichel80, I'll check now (yes, I did not observe these crashes previously).
For lambda = 1, I can confirm that the segfault occurs on this line of OpenMMMD.py. I'll add some debugging statements to the C++ code to see where it's blowing up. (Obviously this bit works at lambda = 0, so NaNs there might be completely unrelated.)
> Just to note that I don't see a mapping from dummy ambertype to dummy ambertype in your example pert files, but this doesn't seem to be the source of the problem: [...]
Sorry, I should have explained - this is because the standard version of sire doesn't allow atoms to start and end as dummy atoms, so I've had to modify the pert file. For my modified version of sire, this is allowed (https://github.com/OpenBioSim/sire/commit/45135642ec94caa2b611971a4d948bda2b15cd42) and the pert file is as shown in my original post.
Also, the pert files are missing an endmolecule record at the end, but again this doesn't seem to affect the segfault. Looking at the output of the parser, it seems to get to the initial_charge of the final atom and then crash:
...
"name O"
"initial_type O3"
"final_type du"
"initial_LJ 3.03981 0.21021"
"final_LJ 0.00000 0.00000"
"initial_charge -0.00000"
"final_charge -0.00000"
"endatom"
"atom"
"name O1"
"initial_type O3"
"final_type du"
"initial_LJ 0.00000 0.00000"
"final_LJ 0.00000 0.00000"
"initial_charge0.00000"
[1] 110167 segmentation fault (core dumped) somd-freenrg -C somd.cfg -t somd.prm7 -c somd.rst7 -m somd.pert -p CUDA
Ah, I see - there is no space in initial_charge0.00000 for the final record. Printing the line and the words (the tokens produced when the line is split) makes it clear what's going wrong. Not sure why I didn't spot this above :man_facepalming:
("atom")
"name O"
("name", "O")
"initial_type O3"
("initial_type", "O3")
"final_type du"
("final_type", "du")
"initial_LJ 3.03981 0.21021"
("initial_LJ", "3.03981", "0.21021")
"final_LJ 0.00000 0.00000"
("final_LJ", "0.00000", "0.00000")
"initial_charge -0.00000"
("initial_charge", "-0.00000")
"final_charge -0.00000"
("final_charge", "-0.00000")
"endatom"
("endatom")
"atom"
("atom")
"name O1"
("name", "O1")
"initial_type O3"
("initial_type", "O3")
"final_type du"
("final_type", "du")
"initial_LJ 0.00000 0.00000"
("initial_LJ", "0.00000", "0.00000")
"final_LJ 0.00000 0.00000"
("final_LJ", "0.00000", "0.00000")
"initial_charge0.00000"
("initial_charge0.00000")
[1] 114032 segmentation fault (core dumped) somd-freenrg -C somd.cfg -t somd.prm7 -c somd.rst7 -m somd.pert -p CUDA
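To illustrate the failure mode with a hypothetical Python analogue (the real parser is C++ inside sire): splitting the malformed line on whitespace yields a single token, so unchecked access to the second token runs off the end of the token array - an IndexError in Python, a segfault in C++.

```python
# Hypothetical Python analogue of the C++ parser's failure mode.
line = "initial_charge0.00000"   # the malformed record, missing its space
words = line.split()             # -> ["initial_charge0.00000"], one token only
key, value = words[0], words[1]  # IndexError here; out-of-bounds read in C++
```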
After fixing this, the lambda = 1 window runs. Not sure if it will NaN later, though.
Sorry about this - it looks like I messed up editing a few of the pert files, so that initial LJ terms remained for O. The pert file does seem to be correct for lambda = 0, though, which is where the NaNs occur.
No problem. I'll look at that one to see if I can get it to fail.
No crashes yet, but I can confirm that I see the ligand "jump" after every 50 frames. It almost looks like everything is slightly rotated within the box. It's visible for the protein and ions too, but harder to tell when waters are included. Not sure if this is some buffering error, i.e. a larger jump in time between frames 49-50, 99-100, ..., than between any other pair of frames, or whether something else is going on.
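One quick way to quantify the jumps (a sketch assuming MDAnalysis is available; the topology/trajectory file names are placeholders for the SOMD outputs):

```python
import numpy as np
import MDAnalysis as mda

# Placeholder file names for the SOMD topology and trajectory.
u = mda.Universe("somd.prm7", "traj000000001.dcd", topology_format="PRMTOP")
lig = u.select_atoms("resname LIG")

# Ligand centre of mass at every frame.
com = np.array([lig.center_of_mass() for ts in u.trajectory])

# Frame-to-frame displacement; a buffering artifact should show up as
# spikes at frames 49 -> 50, 99 -> 100, and so on.
jumps = np.linalg.norm(np.diff(com, axis=0), axis=1)
for i in np.argsort(jumps)[::-1][:12]:
    print(f"frame {i} -> {i + 1}: {jumps[i]:.2f} A")
```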
It seems like this is due to the use of OFF vs GAFF. When I reparameterise the ligand with GAFF2, keeping the initial coordinates identical, I haven't yet observed any crashes. I think the original ligand force field is OFF2 (set up by a collaborator - I have asked for confirmation). Tests run:
- Applying intermolecular restraints to the decoupled ligand (requires my modified version of Sire - happy to supply inputs if needed), using `constraint=hbonds-notperturbed`. OFF system: 2/6 simulations crash with NaN. GAFF2 system: 0/18 simulations crash.
- Setting `constraint=allbonds`. OFF system: immediately crashes in all cases. GAFF2 system: runs as expected in all cases.
The input systems are here.
This is another example of the issue discussed here, where `constraint=allbonds` causes OFF systems to crash.
Thanks.
Is this still an open issue? I'm doing some spring cleaning ;-)
I'm afraid this is still an open issue - using GAFF instead of any OFF seems to improve things substantially, but with more tests I still get the occasional crash with a fully decoupled ligand. I would also prefer to use OFF2 with `constraint=allbonds`, but this always causes crashes.
Would it be worth trying this in somd2? If you don't see the crash then we can debug where in somd things are going wrong. And if you still see the crash then this is definitely something we need to fix in somd2 :-)
Yes, definitely. As discussed, I'll have a shot in the next few weeks.
I've tried some similar tests with SOMD2:
Overall Set-Up
- Systems: Protein-ligand complexes where the ligand is parameterised with GAFF2 or OFF2, taken from the comment above.
- All input/scripts/setup: gh_issue_test_off_gaff_stability.tar.gz
The scripts are almost the same as those used in the somd2 ABFE issue. The only differences are:
- Most examples don't use any restraints, so can be run with standard SOMD2. The tests with restraints require my modified SOMD2 branch.
- I only used two windows, as I was looking for instabilities at the end state and wanted a sanity check that things were still stable at lambda = 0.
Software
- BioSimSpace: 2024.1.0.dev
- Sire: 2024.1.0.dev
- SOMD2: my modified SOMD2 branch, which has only very minor changes from SOMD2 main (merged in this morning before installing)
- Python: 3.11.8
- OS: Ubuntu 22.04
Tests
GAFF2
Sanity check with many windows
Carried out once to check that we got a free energy change of roughly the expected size:
barostat_frequency: 25
charge_scale_factor: 0.2
checkpoint_frequency: 100 ps
com_reset_frequency: 10
constraint: h_bonds
coulomb_power: 0.0
cutoff: "12 \xC5"
cutoff_type: rf
dynamic_constraints: true
energy_frequency: 10 ps
equilibration_time: '0 '
equilibration_timestep: 0.001 ps
frame_frequency: 20 ps
h_mass_factor: 1.5
include_constrained_energies: false
integrator: langevin_middle
lambda_schedule: "LambdaSchedule(\n morph: (-\u03BB + 1) * initial + \u03BB * final\n\
\ restraint: 1\n)"
log_file: somd2.log
log_level: info
max_gpus: 1
max_threads: 20
minimise: true
num_lambda: 34
output_directory: output
overwrite: false
pert_file: null
**perturbable_constraint: bonds_not_heavy_perturbed**
platform: cuda
pressure: 1 atm
restart: false
**restraints: null**
run_parallel: true
**runtime: 100 ps**
save_trajectories: true
save_velocities: false
shift_delta: "2 \xC5"
somd1_compatibility: false
swap_end_states: false
temperature: 298 K
**timestep: 0.004 ps**
- No crashes observed
- Free energy change of roughly the expected magnitude (~340 kcal/mol).
"Standard" Settings
10 repeats with two windows each:
barostat_frequency: 25
charge_scale_factor: 0.2
checkpoint_frequency: 100 ps
com_reset_frequency: 10
constraint: h_bonds
coulomb_power: 0.0
cutoff: "12 \xC5"
cutoff_type: rf
dynamic_constraints: true
energy_frequency: 10 ps
equilibration_time: '0 '
equilibration_timestep: 0.001 ps
frame_frequency: 20 ps
h_mass_factor: 1.5
include_constrained_energies: false
integrator: langevin_middle
lambda_schedule: "LambdaSchedule(\n morph: (-\u03BB + 1) * initial + final * \u03BB\
\n restraint: 1\n)"
log_file: somd2.log
log_level: info
max_gpus: 1
max_threads: 20
minimise: true
num_lambda: 2
output_directory: output_1
overwrite: false
pert_file: null
**perturbable_constraint: bonds_not_heavy_perturbed**
platform: cuda
pressure: 1 atm
restart: false
**restraints: null**
run_parallel: true
**runtime: 100 ps**
save_trajectories: true
save_velocities: false
shift_delta: "2 \xC5"
somd1_compatibility: false
swap_end_states: false
temperature: 298 K
**timestep: 0.004 ps**
- NaN crash observed in 1/10 runs
- Crash observed at lambda = 1, so likely the same issue as discussed in the somd2 ABFE issue, where the molecule is not properly coupled to the heat bath (the issue was slightly improved by using a Brownian integrator)
OFF2
"Standard" Settings
10 repeats with two windows each, same config as for GAFF2.
- NaN crashes observed for all runs for all lambda windows during dynamics
With timestep of 0.5 fs
Config the same as "standard", but with a 0.5 fs timestep.
- NaN crashes for all runs for all lambda windows during dynamics
With timestep of 0.5 fs and perturbable_constraint=none
Config the same as "standard", but with a 0.5 fs timestep and constraints removed via `perturbable_constraint=none`.
- NaN crash observed in 3/10 runs
- Crashes only observed at lambda = 1
Overall Points
- Issues are broadly the same as SOMD1
- GAFF2 simulations are generally much more stable than OFF2 simulations
- GAFF2
  - Fairly stable with `bonds_not_heavy_perturbed`. A single crash observed when the ligand was fully decoupled, possibly with the same cause as that observed for benzene in the SOMD2 issue.
- OFF2
  - Very unstable with `bonds_not_heavy_perturbed` - all dynamics crash with NaN at lambda = 0 and 1. My understanding is that `bonds_not_heavy_perturbed` constrains heavy bonds, so this seems to be the same issue as when we use `constraint=allbonds` with SOMD1.
  - Crashes at lambda = 0 are removed by using `perturbable_constraint=none`, but several crashes at lambda = 1 persist, so OFF2 still seems less stable than GAFF2.
Conclusion
Constraints on heavy bonds cause the OFF2 system to blow up at all values of lambda, but can be used on the GAFF2 system. Without constraints on heavy bonds, both systems seem stable when the ligand is fully interacting, but show crashes when the ligand is fully decoupled (and unrestrained).
Could you try with `h_bonds_not_perturbed`? I found that `bonds_not_perturbed` would end up constraining everything for the tyk2 set, and I also couldn't run anything. (I think this was possibly due to minimising with constraints for a non-equilibrated system.)
Also, it might be worth checking which constraints are applied for the different force fields, and whether things are better if you use the `somd1_compatibility` mode. (Once a dynamics object is created, you can use `d.get_constraints()` to print which bonds are constrained.)
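Something along these lines should print the constrained pairs (a sketch: the file names are placeholders, the perturbable setup is omitted, and the constraint keyword spellings follow the somd2 configs above):

```python
import sire as sr

# Placeholder input files; perturbation setup omitted for brevity.
mols = sr.load("somd.prm7", "somd.rst7")

d = mols.dynamics(
    constraint="h_bonds",
    perturbable_constraint="h_bonds_not_heavy_perturbed",
)

# Print every constrained atom pair so the force fields can be compared.
for constraint in d.get_constraints():
    print(constraint)
```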
hi @fjclark - you could try using a Nose-Hoover integrator and adding a sub-thermostat to control the temperature of the solute separately from the rest of the system; see http://docs.openmm.org/development/api-python/generated/openmm.openmm.NoseHooverIntegrator.html
see also https://github.com/openmm/openmm/issues/2910
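For reference, a minimal sketch of the suggestion at the OpenMM level - the particle indices are illustrative (taken to match the ligand atom numbers in the restraints dictionary above), and wiring this into SOMD would need changes on the sire side:

```python
import openmm
from openmm import unit

# Global Nose-Hoover chain thermostatting the whole system.
integrator = openmm.NoseHooverIntegrator(
    298.0 * unit.kelvin, 1.0 / unit.picosecond, 4.0 * unit.femtosecond
)

# Illustrative ligand particle indices (matching the atom numbers used in
# the restraints dictionary above).
ligand_atoms = list(range(4673, 4694))

# Sub-thermostat so the otherwise-decoupled ligand is coupled to its own
# heat bath, independently of the rest of the system.
integrator.addSubsystemThermostat(
    ligand_atoms,                                  # particles to thermostat
    [],                                            # no thermostatted pairs
    298.0 * unit.kelvin, 1.0 / unit.picosecond,    # absolute motion
    298.0 * unit.kelvin, 1.0 / unit.picosecond,    # relative (pair) motion
)
```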
Thanks for the suggestions.
h_bonds constraints
@lohedges:
> Could you try with `h_bonds_not_perturbed`?
OFF2 h_bonds_not_heavy_perturbed
Same as "standard" protocol other than perturbable_constraint=h_bonds_not_heavy_perturbed and 20 repeats.
- This removes the instabilities at lambda != 1. NaNs at lambda =1 observed in 2/20 runs.
GAFF2 h_bonds_not_perturbed
- No NaNs observed over 20 repeats. So possibly more stable than with `bonds_not_heavy_perturbed` for GAFF2, but we can't say for sure, as we only saw one crash in that case.
OFF2 with SOMD1 Compatibility Mode
- Everything blows up during every dynamics attempt as before. As expected given that this happened with SOMD1.
OFF2 with SOMD1 Compatibility Mode and h_bonds_not_perturbed
- 0/10 repeats crash.
Constraints Applied
There are 33 bonds in the ligand: 11 to H and 22 involving only heavy atoms.
- OFF2/GAFF2 with `bonds_not_heavy_perturbed`: all 33 bonds constrained
- OFF2/GAFF2 with `h_bonds_not_heavy_perturbed`: only the 11 bonds to H constrained
So, as with SOMD1, constraints between heavy atoms are unusable with OFF2 but don't seem to make the GAFF2 simulations any less stable.
I'm going to bump this to 2024.2.0 as we've made a lot of progress, but more is needed before we put in all the fixes for 2024.1.0
I am bumping this to 2024.3.0. Could you check whether the latest fixes in 2024.2.0.dev have fixed the instabilities? They have fixed instabilities in other systems.
I retested the OFF2 simulations above (10 repeats, "standard" settings) with sire 2024.2.0.dev, but unfortunately all 10 simulations fail with NaN, as before.
Just to check: are you still using the somd1 compatibility mode? All recent fixes are applied in that layer.
Thanks - I wasn't. However, they all still crash when I turn on somd1 compatibility mode. Full config:
barostat_frequency: 25
charge_scale_factor: 0.2
checkpoint_frequency: 100 ps
com_reset_frequency: 10
constraint: h_bonds
coulomb_power: 1.0
cutoff: "12 \xC5"
cutoff_type: rf
dynamic_constraints: true
energy_frequency: 10 ps
equilibration_constraints: false
equilibration_time: '0 '
equilibration_timestep: 0.001 ps
frame_frequency: 20 ps
h_mass_factor: 1.5
include_constrained_energies: false
integrator: langevin_middle
lambda_energy: null
lambda_schedule: "LambdaSchedule(\n morph: initial * (-\u03BB + 1) + final * \u03BB\
\n restraint: 1\n)"
lambda_values: null
log_file: somd2.log
log_level: info
max_gpus: 1
max_threads: 20
minimise: true
num_lambda: 2
output_directory: output_1
overwrite: false
pert_file: null
perturbable_constraint: h_bonds_not_heavy_perturbed
platform: cuda
pressure: 1 atm
restart: false
restraints:
- !!python/object/apply:sire.legacy.MM._MM.BoreschRestraints
state: !!python/tuple
- sire_pickle_data: AAABSnheuxoUfZOBgYGxY+J8v1knHwQA2UxIbEaQnI1h6Tp0NUA+MxADAeshKH0YSh9B0g9SA1UnFAiRFwqC0sEO/Axx8QbP/yOpZwHZb//lZn3StnW19t+XrDgYcrINSZ4VZJ4DS0n7n53Kix3Ya3YqL5pTYf9t0ZzyWdmBDiIQk5HUs4HMg4giAJI8O8J96KoYGGzs2XpAoqs0n50G+R/keIYihlSGYoYSIJ3IkMmQx1ACUwUAAs1Nbw==
sire_pickle_version: 1
run_parallel: true
runtime: 100 ps
save_energy_components: false
save_trajectories: true
save_velocities: false
shift_delta: "1 \xC5"
somd1_compatibility: true
swap_end_states: false
temperature: 298 K
timestep: 0.004 ps
write_config: true
No problem. Thanks for confirming.