Charge inconsistency between the files in gromacs_solvated.tar.gz and gromacs.tar.gz
I just found that the atomic charges of some of the molecules differ between gromacs_solvated.tar.gz and gromacs.tar.gz.
For example, for mobley_820789, the atomic charges in gromacs/mobley_820789.top are:
While in gromacs_solvated/mobley_820789.top:
These are quite different, and only the parameters in gromacs/mobley_820789.top reproduce the result in the literature.
Openff-toolkit was used to regenerate the atomic charges of mobley_820789 with gaff-1.8, and the regenerated charges were consistent with the content in gromacs/mobley_820789.top.
How were the charges in gromacs_solvated generated? And were they processed after gaff-1.7?
Wow, yes, these are VERY different. That's quite concerning.
At this point (given how much time has elapsed) I don't have any information on provenance other than what's present in our scripts and paper. I certainly HOPE all of this was generated fully consistently via the scripts, but my superficial first impression of the files is that this is what one would get from some kind of human error in the generation protocol (e.g. someone copied the wrong file somewhere), which makes me worried. Do you have a sense of how widespread this problem is?
Well, I did a simple analysis today. If we compare the nonbonded parameters only, 452 molecules have parameters that differ between gromacs_solvated.tar.gz and gromacs.tar.gz. For some of these the charges are close, suggesting that they may have been generated with different random seeds.
If we loosen the tolerance on the charge difference to 0.01 e, 29 molecules still have different parameters between the two sets:
mobley_6334915
mobley_3047364
mobley_1735893
mobley_6861308
mobley_2523689
mobley_628086
mobley_9979854
mobley_4936555
mobley_2929847
mobley_5948990
mobley_820789
mobley_6727159
mobley_2364370
mobley_7754849
mobley_5200358
mobley_7455579
mobley_3259411
mobley_902954
mobley_3572203
mobley_4792268
mobley_8754702
mobley_7326706
mobley_3802803
mobley_2269032
mobley_5890803
mobley_5571660
mobley_3265457
mobley_1770205
mobley_8124669
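For anyone who wants to repeat this kind of comparison, here is a minimal sketch of a per-atom charge check between two topologies. It is not the script used for the numbers above. It assumes the standard GROMACS `[ atoms ]` column order (charge in the seventh column) and that the solute is the first `[ atoms ]` block in each file. The function names and the example charges are made up for illustration and are not taken from the dataset.

```python
def read_solute_charges(top_text):
    """Return the charges from the first [ atoms ] block of a GROMACS topology.

    Assumption: the solute is the first molecule defined in the file, so any
    later [ atoms ] blocks (e.g. inline water) are ignored.
    """
    charges = []
    in_atoms = False
    for raw in top_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop ';' comments
        if not line or line.startswith("#"):  # skip blanks and preprocessor lines
            continue
        if line.startswith("["):
            if in_atoms:
                break  # the first [ atoms ] block has ended
            in_atoms = line.strip("[] ").lower() == "atoms"
            continue
        if in_atoms:
            fields = line.split()
            if len(fields) >= 7:
                # columns: nr type resnr residue atom cgnr charge [mass]
                charges.append(float(fields[6]))
    return charges


def max_charge_difference(charges_a, charges_b):
    """Largest absolute per-atom charge difference between two charge lists."""
    if len(charges_a) != len(charges_b):
        raise ValueError("atom counts differ; topologies are not comparable")
    return max((abs(a - b) for a, b in zip(charges_a, charges_b)), default=0.0)


# Illustrative topologies with made-up charges (not real dataset values).
TOP_A = """
[ moleculetype ]
MOL 3

[ atoms ]
; nr type resnr residue atom cgnr charge mass
1 c3 1 MOL C1 1 -0.1000 12.010
2 hc 1 MOL H1 2  0.0500  1.008
3 hc 1 MOL H2 3  0.0500  1.008

[ bonds ]
1 2
"""
TOP_B = TOP_A.replace("-0.1000", "-0.1300").replace("0.0500", "0.0650")

diff = max_charge_difference(read_solute_charges(TOP_A), read_solute_charges(TOP_B))
print(f"max |dq| = {diff:.4f} e, exceeds 0.01 e tolerance: {diff > 0.01}")
```

In practice one would read each pair of `.top` files from the two extracted archives and pass their contents to `read_solute_charges`.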
If you want to fix this problem, I recommend regenerating the files in gromacs_solvated.tar.gz, as they contain parameters that do not reproduce the calculated solvation free energies.
If you want, I can submit a PR to fix this later.
Just to add to this, I've found quite a number of compounds where the total charge does not add up to 0, but is off by roughly 10^-4 to 10^-2 e. If desired, I can post a complete list of molIDs for which this is the case. This could possibly be related to the inconsistencies between the charges in the gromacs and gromacs_solvated folders, because the number of affected compounds is similar (489, to be precise). It is also strange, as GROMACS should not allow topologies with charges like this to run without throwing at least a warning.
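As a rough illustration of the net-charge check described here, the sketch below sums a list of partial charges and reports how far the total is from the nearest integer. The function name, the tolerance, and the example charges are hypothetical and chosen only to show a residual of the size mentioned above.

```python
import math


def net_charge_residual(charges):
    """Return (total charge) minus (nearest integer charge), in units of e."""
    total = math.fsum(charges)  # fsum avoids adding float round-off of its own
    return total - round(total)


# Made-up charges written to four decimals; they sum to -0.0001 e.
example = [-0.1234, 0.0411, 0.0411, 0.0411]
residual = net_charge_residual(example)
if abs(residual) > 1e-6:  # tolerance is an arbitrary choice for this sketch
    print(f"net charge is off by {residual:.1e} e")
```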
Did you look further into this problem? I'm now torn between regenerating the topologies myself or selecting the compounds with no charge inconsistencies for further analysis. For now, I went with the latter.
Thanks. I'm not sure why there are inconsistencies between the directories, but I can comment on the overall issue. Certain tools my group used in generating and exporting GROMACS topologies tended not to write partial charges to sufficient precision, which would result in (essentially) rounding error accumulating so that the charges don't total exactly zero. In more recent workflows we've often added a step where we took the resulting discrepancy (1e-4 to 1e-2) and distributed a compensating amount across all the atoms in the molecule to make the total exactly zero.
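A minimal sketch of that kind of correction step follows. It assumes the discrepancy is spread evenly over all atoms, which is one simple choice and not necessarily what the group's actual workflow does. The function name and the example charges are hypothetical.

```python
import math


def neutralize(charges, target=0.0):
    """Shift every charge by the same amount so the total equals `target`."""
    if not charges:
        return []
    residual = math.fsum(charges) - target
    shift = residual / len(charges)  # equal share of the discrepancy per atom
    return [q - shift for q in charges]


# Made-up charges with a -0.0001 e rounding residual.
example = [-0.1234, 0.0411, 0.0411, 0.0411]
fixed = neutralize(example)
print(f"total before: {math.fsum(example):.1e} e, after: {math.fsum(fixed):.1e} e")
```

Note that if the corrected charges are again written out with only a few decimals, rounding can reintroduce a small residual, so the output precision matters as much as the correction itself.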
That said, we might well have run the calculations with a slightly noninteger charge, as in GROMACS this will typically result in a warning that the charge is not zero and it gets compensated for by a uniform neutralizing background charge. This might result in a very small difference in the hydration free energy, but for neutral molecules it would typically be smaller than the statistical error (unless you're running VERY long simulations to get very high precision), so not a particularly significant issue.
But yes, the partial charges in the two directories ought to be consistent, so I'm not sure how that happened.
I see, I had suspected a rounding error. Thank you for elaborating on your procedure and a very valuable resource!