Reparametrization Instructions

Open jolayfield opened this issue 5 years ago • 9 comments

My group has been looking to re-parametrize GFN2-xTB to calculate the vibrational structure of a specific class of small molecules. Instructions for doing this appear to be missing from the documentation, and we would be happy to contribute a section. I am not sure whether the omission is intentional, but if such a section is wanted, we can add it.

jolayfield avatar Jul 23 '20 20:07 jolayfield

We did not add information on the parametrisation so far since this is usually not the primary concern for users of xtb. Of course, any contributions to this document are highly appreciated.

The parametrisation tools we use internally are somewhat involved and not available with xtb, so creating a more standard way to access and manipulate the xTB Hamiltonian would certainly be necessary. Also, certain features of the xTB Hamiltonian, especially the self-consistent D4 in GFN2-xTB, add a caveat to any parametrisation attempt, since they require a specially modified xtb binary.

awvwgk avatar Jul 24 '20 09:07 awvwgk

I would also be interested. Si was recently reparameterized for GFN1-xTB (10.1021/acs.jcim.1c01170). I am guessing that GFN1-xTB is easier to reparameterize than GFN2-xTB? That could be a starting point for this section of the documentation.

RaphaelRobidas avatar Dec 21 '21 18:12 RaphaelRobidas

I created some basic instructions for parametrization of the xTB Hamiltonian at https://tblite.readthedocs.io/en/latest/tutorial/fitting.html. The base parametrization doesn't really matter, as long as the infrastructure can handle the different parametrizations on an equal footing.

awvwgk avatar Dec 21 '21 18:12 awvwgk

This is great, thank you. The procedure seems to work so far. There is only one detail which is unclear to me in that tutorial, namely the format of the reference data. Is this described a bit more elsewhere?

It is mentioned that the structures must be in Turbomole format. I am guessing that each directory thus contains the reference energy as a Turbomole output file? Unfortunately, I do not have access to Turbomole and would benefit from details on how to convert my reference data into the correct format.

RaphaelRobidas avatar Dec 21 '21 19:12 RaphaelRobidas

> It is mentioned that the structures must be in Turbomole format.

The geometry input format doesn't really matter; coord, xyz, gen, ein, mol, sdf, pdb, and vasp are currently supported. It should, however, be consistent to allow automatic processing.

> I am guessing that each directory thus contains the reference energy as Turbomole output file?

Currently I'm using a format from DFTB+ called tagged data (see https://github.com/tblite/tblite/blob/main/man/tblite-tag.5.adoc). Using a more standardized format in the future would be preferable.

Feedback is welcome.

awvwgk avatar Dec 21 '21 21:12 awvwgk

Thanks, I can get the fitting process running. The file format is not a huge issue in my opinion, since the calculation output files need to be parsed and formatted anyway. cclib actually parses the gradients for most packages. It would be fairly straightforward to write a helper script which parses the necessary information from raw output files and generates reference data in the required format. I'll code it, if you'd like.

Also, the fitting requires the virial. I'm not sure what that corresponds to, and it does not seem to be a term used in the output files (or at least not one that refers to a 3x3 matrix). Does it require a certain keyword, or am I just missing the right synonym?

RaphaelRobidas avatar Dec 21 '21 23:12 RaphaelRobidas
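
As a starting point for the helper script proposed above, here is a minimal sketch of a writer for tagged-data entries. It assumes each entry consists of a header of the form `name :real:rank:shape` followed by the values, and that energies and gradients are given in atomic units; both assumptions, as well as the expected value ordering for array entries, should be checked against the tblite-tag man page linked earlier. The function name `format_tagged` and the three-values-per-line layout are hypothetical choices, not part of tblite, and parsing the raw output (for example with cclib) is deliberately left out.

```python
from typing import Sequence


def format_tagged(name: str, values: Sequence[float], shape: Sequence[int] = ()) -> str:
    """Format one entry as 'name :real:rank:shape' followed by its values.

    The header layout is an assumption based on the tagged data format;
    verify it against the tblite-tag man page before relying on it.
    """
    # A scalar has an empty shape and exactly one value.
    expected = 1
    for dim in shape:
        expected *= dim
    if len(values) != expected:
        raise ValueError(f"{name}: expected {expected} values, got {len(values)}")

    dims = ",".join(str(dim) for dim in shape)
    header = f"{name:<20}:real:{len(shape)}:{dims}"

    # Write the values in the order given, three per line (arbitrary choice).
    rows = [values[i:i + 3] for i in range(0, len(values), 3)]
    body = "\n".join("".join(f"{v:24.15E}" for v in row) for row in rows)
    return f"{header}\n{body}\n"


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not real reference data.
    print(format_tagged("energy", [-1.5]), end="")
    print(format_tagged("gradient", [0.0, 0.0, 0.1, 0.0, 0.0, -0.1], shape=(3, 2)), end="")
```

Keeping the formatter separate from the parser means the same function can be reused for any source of reference data, whichever quantum chemistry package produced it.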

The virial (pressure) is related to the stress tensor / lattice gradient and is usually only available from periodic DFT programs. tblite always calculates and prints it, but entries which don't have an equivalent in the reference will be ignored.

awvwgk avatar Dec 22 '21 07:12 awvwgk

Thanks a lot for the help! The reparameterization runs as expected when the virial is omitted. The fit tends either to diverge to very large values (which become NaN) or to stay quite close to the initial parameters. I'm guessing this depends on the initial guess and is an expected challenge of the fitting procedure.

RaphaelRobidas avatar Dec 23 '21 16:12 RaphaelRobidas

Actually, I just realized that the reference energies need to be normalized in some way to be compared with the calculated xTB energies. Best practices for this process would be useful to have in the documentation as well.

RaphaelRobidas avatar Dec 24 '21 14:12 RaphaelRobidas
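
One common way to make energies from two methods comparable, sketched below, is to compare relative energies instead of absolute ones: every energy is shifted by the energy of a chosen anchor structure computed with the same method. This is only an illustration of the normalization question raised above, not a procedure described in this thread or in the tblite tutorial, and it only makes sense for structures with identical composition (for example conformers). The function names and the numbers are hypothetical.

```python
from typing import Dict


def relative_energies(energies: Dict[str, float], anchor: str) -> Dict[str, float]:
    """Shift all energies so that the anchor structure sits at zero."""
    base = energies[anchor]
    return {name: energy - base for name, energy in energies.items()}


def relative_energy_errors(
    reference: Dict[str, float], calculated: Dict[str, float], anchor: str
) -> Dict[str, float]:
    """Difference between calculated and reference relative energies."""
    ref_rel = relative_energies(reference, anchor)
    calc_rel = relative_energies(calculated, anchor)
    return {name: calc_rel[name] - ref_rel[name] for name in reference}


if __name__ == "__main__":
    # Placeholder energies in Hartree, for illustration only.
    reference = {"conf_a": -100.000, "conf_b": -99.990}
    calculated = {"conf_a": -20.000, "conf_b": -19.995}
    for name, error in relative_energy_errors(reference, calculated, "conf_a").items():
        print(f"{name}: {error:+.6f}")
```

The large constant offset between the two methods cancels because each set is shifted by its own anchor energy, so only the error in the energy differences remains.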