General API for preparing an IOData object before dumping
Motivation
At the moment, it is mostly assumed that an `IOData` instance contains all the right attributes in the right form before it is passed on to a `dump_one` call. Some file formats (WFN, WFX, and potentially also FCHK) modify the `IOData` object to make it compatible with the format. Typical modifications include:
- WFX/WFN: converting basis to Cartesian functions (with transformation of the MO coeffs)
- WFX/WFN: decontracting the basis (with transformation of the MO coeffs)
- FCHK: converting (natural) orbitals to a density matrix; recontracting basis sets.
- More generally (though possibly too far-fetched): reverse-engineering contractions from WFN/WFX files.
This is problematic for several reasons:
- It makes `dump_one` functions long and complicated.
- Users may not be aware that a conversion takes place, which may result in loss of information.
- It may sometimes be desirable to disable conversions, e.g. when they are optional or when the user does not want any conversions (and prefers an exception to be raised instead). The latter is typical when converting large data sets, where data must be preserved and any unintended loss of information due to conversion is unacceptable.
- Some of the current conversions introduce redundant data, which results in inefficient use of storage.
See also:
- https://github.com/theochem/iodata/pull/164#discussion_r467966558
- https://github.com/theochem/iodata/pull/252#pullrequestreview-2099358284
Proposal
- Add an optional `prepare_dump` function to the file format modules. If present, it takes an `IOData` instance as argument and returns a potentially modified one. The given `IOData` instance is not modified. An option `allow_changes=False` should be added to control whether conversions are permitted. If this flag is set to `False` and the file cannot be written without conversion, an exception is raised. If this flag is set to `True`, a warning is emitted when a conversion is applied.
- The `dump_one` and `dump_many` functions in the file format modules call the new `prepare_dump` function before dumping.
- Add an `allow_changes=False` option to the `dump_one` and `dump_many` functions in the file format modules. This is passed on to the `prepare_dump` function. The dump functions return the potentially modified `IOData` instance(s).
- Add an `allow_changes=False` option to the `dump_one` and `dump_many` functions in the module `iodata.api`. This is passed on to the dump functions of the selected file format. These dump functions also return the potentially modified `IOData` instance(s).
- Factor out reusable utility functions that modify the `IOData` object, e.g. manipulations of the basis set and the corresponding changes to the MO coefficients.
- Add an option to the `iodata-convert` script to enable or disable modifications before dumping.
- Add a basic sanity check to `dump_one` and `dump_many` that required attributes are not `None` before creating a file. Without this check, a missing attribute raises an error during writing, which may leave the output overwritten with an empty file. This is never useful and may occasionally lead to data loss. This type of pre-flight check could be added to `prepare_dump`, but it is better to write one general implementation for all file formats, so that it is always performed.
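The proposal above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the IOData implementation: the frozen `IOData` dataclass, `DumpError`, `PrepareDumpError`, `PrepareDumpWarning`, `REQUIRED` and the toy format (which only accepts one-line titles) are all hypothetical stand-ins. It shows three points from the proposal: `prepare_dump` returns a possibly modified copy and never touches its input, `allow_changes=False` turns a needed conversion into an exception while `allow_changes=True` turns it into a warning, and `dump_one` checks required attributes before the output file is opened.

```python
import dataclasses
import warnings
from typing import Optional


# Hypothetical, stripped-down stand-in for IOData (illustration only).
@dataclasses.dataclass(frozen=True)
class IOData:
    atnums: Optional[tuple] = None
    atcoords: Optional[tuple] = None
    title: Optional[str] = None


class DumpError(Exception):
    """Hypothetical: raised when a file cannot be written."""


class PrepareDumpError(DumpError):
    """Hypothetical: raised when a conversion is needed but not allowed."""


class PrepareDumpWarning(Warning):
    """Hypothetical: emitted when prepare_dump applies a conversion."""


# Attributes the toy format cannot do without.
REQUIRED = ("atnums", "atcoords")


def prepare_dump(data: IOData, allow_changes: bool = False) -> IOData:
    """Return a version of data that the toy one-line-title format can store.

    The input object is never modified; a new object is returned if needed.
    """
    if data.title is not None and "\n" in data.title:
        if not allow_changes:
            raise PrepareDumpError("Multi-line title not supported. Set allow_changes=True.")
        warnings.warn("Joining multi-line title into one line.", PrepareDumpWarning, stacklevel=2)
        return dataclasses.replace(data, title=" ".join(data.title.splitlines()))
    return data


def dump_one(data: IOData, filename: str, *, allow_changes: bool = False) -> IOData:
    # Generic pre-flight check, done before the output file is opened,
    # so an existing file is never replaced by an empty one.
    missing = [name for name in REQUIRED if getattr(data, name) is None]
    if missing:
        raise DumpError(f"Missing required attributes: {', '.join(missing)}")
    # Format-specific checks and conversions, also before opening the file.
    data = prepare_dump(data, allow_changes)
    with open(filename, "w") as f:
        f.write(f"{data.title or ''}\n")
        for atnum, xyz in zip(data.atnums, data.atcoords):
            f.write(f"{atnum} {xyz[0]} {xyz[1]} {xyz[2]}\n")
    # Return the (possibly converted) object that was actually written.
    return data
```

Making the dataclass frozen is one way to guarantee that `prepare_dump` cannot modify its argument in place; `dataclasses.replace` then builds the modified copy.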
TODO list
- [x] Validate that required fields are present in `dump_one` and `dump_many`. In `dump_one`, this can be done before even creating the file. In `dump_many`, this check is possible on the first frame before creating a file, but not for later ones. See #337
- [x] Implement a light version of the `prepare_dump` API and use it for validity checks: JSON, FCHK, WFX, WFN, Molden, Molekel. This does not add any actual conversion yet. (These will be implemented in later pull requests.) See #344
- [x] Split `FileFormatError` into `LoadError` and `DumpError` and update the format modules to use these consistently. The current use of exceptions in the format modules is not consistent. The same applies to `FileFormatWarning`. See #345
- [x] Replace `lit.error` by raising `LoadError` directly and extend `LoadError` with the logic in `lit.error`. Update the contributor guide accordingly. See #348
- [x] Replace `lit.warn` by direct `warnings.warn()` calls using an improved `LoadWarning` class that contains the logic now implemented in `lit.warn()`. Also extend `DumpWarning` with a file or filename argument, like `LoadWarning`. Do not use `lit.warn`; explain `LoadWarning` in the contributor guide. See #349
- [x] Extend `DumpError` with a file or filename argument, like `LoadError`. Directly subclass exceptions from `Exception` instead of `ValueError`. (The latter improves testing when using `pytest.raises`.) See #349
- [x] Convert MOs to unrestricted if the format does not support `occs_aminusb`: WFN, WFX, Molden, Molekel. See #352
- [x] Replace catch-all constructs like `warnings.catch_warnings()` in unit tests by checks for more specific warnings. See #353
- [x] Make optional arguments in `iodata.api` keyword-only by inserting `*,` in the argument list. See #355
- [x] Rename the argument `iodata` in `iodata.api` to `data` for consistency with the rest of the code. See #356
- [x] Turn `Shell` attributes into arrays (now lists) with converter functions, in analogy to `IOData` attributes. See #371
- [x] Move the `convert_*` functions from `basis` and `orbitals` to the `convert` module, which becomes the lower-level analog of the `prepare` module. It does similar things, but without the context of dumping data to files. See #372
- [x] Write a prepare function to segment a basis before dumping. This can be used by the following formats: Molden, WFN, WFX, FCHK (except for SP shells), Molekel. There should be an option to leave SP shells in place while segmenting all others. At the same time, fix the third point in #256. See #373
- [ ] Write a prepare function to sort shells by center. (Molden and Molekel assume that shells are grouped by center.)
- [ ] Decontract the basis and convert MOs in `prepare_dump` for WFN and WFX. This would also fix #258.
- [ ] Convert MOs to a Cartesian basis in `prepare_dump` for WFN and WFX. This would also fix #259.
- [ ] Make all convert functions consistent: when no changes are needed, they return a reference to the input objects.
- [x] Add an `--allow-changes` option to the command-line interface. See #374
- [ ] Extend the contributor guide with the following:
  - Explain how to raise errors and warnings in `dump_one`.
  - Explain how to write a `prepare_dump` function, and how to raise errors and warnings in it.
- [x] Split the getting started page into four pages: command-line usage, loading, dumping and input writing, as also mentioned in #210. See #351
- [ ] Expand the getting started page on command-line usage to illustrate the `--allow-changes` option.
- [ ] Expand the getting started page on dumping to illustrate the `allow_changes` keyword argument.
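A prepare function that sorts shells by center, one of the open items above, has to permute the rows of the MO coefficients together with the shells. The sketch below is not IOData's implementation; it assumes a simplified, hypothetical `Shell` record with only `icenter`, `angmoms` and `kinds` fields, MO coefficients stored with one row per basis function in shell order, and the basis functions of each shell kept together. A stable sort preserves the original order of shells on the same center, so no more data is reordered than necessary.

```python
from dataclasses import dataclass

import numpy as np


# Hypothetical minimal shell record, holding only the fields needed here.
@dataclass
class Shell:
    icenter: int
    angmoms: list  # one angular momentum per (generalized) contraction
    kinds: list  # "c" (Cartesian) or "p" (pure) per contraction

    @property
    def nbasis(self) -> int:
        """Number of basis functions in this shell."""
        return sum(
            (angmom + 1) * (angmom + 2) // 2 if kind == "c" else 2 * angmom + 1
            for angmom, kind in zip(self.angmoms, self.kinds)
        )


def sort_shells_by_center(shells, mo_coeffs):
    """Return shells sorted by center and MO coefficients with permuted rows.

    mo_coeffs has shape (nbasis, norb), with rows following the shell order.
    Python's sorted() is stable, so shells on the same center keep their order.
    The input list and array are not modified.
    """
    # Row offset of each shell's first basis function in the original order.
    offsets = np.cumsum([0] + [shell.nbasis for shell in shells])
    order = sorted(range(len(shells)), key=lambda ishell: shells[ishell].icenter)
    rows = np.concatenate([np.arange(offsets[i], offsets[i + 1]) for i in order])
    return [shells[i] for i in order], mo_coeffs[rows]
```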
Another example of a required conversion is discussed in #252: many formats do not support restricted orbitals with "unrestricted occupation numbers". In this case, the orbitals need to be converted to unrestricted form to be able to write a file.
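The arithmetic of this conversion can be sketched with NumPy as below. It assumes that `occs` holds the total (alpha plus beta) occupation of each spatial orbital and `occs_aminusb` the alpha minus beta difference, so the alpha and beta occupations are half their sum and half their difference. The function name and argument layout are hypothetical; the unrestricted result lists all alpha orbitals first, then all beta orbitals.

```python
import numpy as np


def restricted_to_unrestricted(coeffs, occs, occs_aminusb, energies):
    """Hypothetical helper: rewrite restricted orbitals with spin-resolved
    occupations as an unrestricted set (alpha block first, then beta).

    Assumes occs = occ_alpha + occ_beta and occs_aminusb = occ_alpha - occ_beta,
    both given per spatial orbital.
    """
    occs = np.asarray(occs, dtype=float)
    occs_aminusb = np.asarray(occs_aminusb, dtype=float)
    occsa = 0.5 * (occs + occs_aminusb)
    occsb = 0.5 * (occs - occs_aminusb)
    # Both spin channels share the same spatial orbitals and orbital energies.
    new_coeffs = np.hstack([coeffs, coeffs])
    new_occs = np.concatenate([occsa, occsb])
    new_energies = np.concatenate([energies, energies])
    return new_coeffs, new_occs, new_energies
```

This conversion stores every spatial orbital twice, which is one example of the redundant data mentioned in the motivation.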