Clone all io code from molmod

Open tovrstra opened this issue 7 years ago • 14 comments

See:

  • https://github.com/molmod/molmod/tree/master/molmod/io
  • https://github.com/molmod/molmod/tree/master/molmod/io/test
  • https://github.com/molmod/molmod/tree/master/molmod/data/test

tovrstra avatar Jan 25 '19 15:01 tovrstra

Once the draft version of #43 for orca is ready we can collaborate on transferring the io code from molmod and tamkin.

evohringer avatar Mar 04 '19 13:03 evohringer

Dear Esteban, thank you for your work on #43. Shall we split the task of porting the IO functions from tamkin and molmod, to avoid duplicate effort? Apart from what has already been implemented in iodata, cpmd.py and gamess.py both appear in the IO modules of tamkin and molmod. I am currently working on qchem.py. @evohringer

FanwangM avatar Mar 12 '19 14:03 FanwangM

Dear Esteban, I am trying to fix the dependency problem for code taken from tamkin, which mostly means copying some code from molmod: https://github.com/theochem/iodata/issues/36#issuecomment-472716912. If we divide the tasks by clarifying which file formats we are going to port and who takes care of each, we will know which supporting code from tamkin and molmod is required. Then we can fix the dependency issues before porting the parsers.

The other option is to add the dependency code gradually, at the risk of duplicate work and messy files. Do you have any suggestions or a better way? Thank you. @evohringer

I can fix these imports soon:

from molmod.periodic import periodic
from molmod.io import PunchFile
from molmod import angstrom, amu, calorie, avogadro
from molmod.io.common import SlicedReader

FanwangM avatar Mar 14 '19 06:03 FanwangM

@fwmeng88 Thanks for tackling this. Since you are keeping us updated through this issue, the risk for overlap is small.

We should avoid making molmod a dependency, even temporarily, because it contains outdated code that is not fully consistent with similar code in IOData, e.g. the way the units are defined. When you need more units, you can define them here, based on constants from SciPy:

https://github.com/theochem/iodata/blob/master/iodata/utils.py#L38
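
As a rough illustration (not the actual content of iodata/utils.py), the units from the molmod imports above could be derived from SciPy constants along these lines. The sketch assumes that values are stored internally in atomic units, so multiplying a number by a unit converts it into atomic units; `kcalmol` and `bond_length` are hypothetical names added only for the example.

```python
import scipy.constants as spc

# Conversion factors into atomic units (assumption: internal values are
# stored in atomic units, so "0.74 * angstrom" is a length in bohr).
angstrom = spc.angstrom / spc.value("atomic unit of length")
amu = spc.atomic_mass / spc.value("atomic unit of mass")
calorie = spc.calorie / spc.value("atomic unit of energy")
avogadro = spc.Avogadro

# Hypothetical derived unit built from the ones above.
kcalmol = 1000 * calorie / avogadro

# Hypothetical usage: an H-H bond length of 0.74 angstrom, in bohr.
bond_length = 0.74 * angstrom
```

Defining every unit as a ratio of SI values keeps the factors tied to one set of CODATA constants instead of hard-coded numbers.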

Similarly, there is some overlap with the following part of iodata too:

https://github.com/theochem/iodata/blob/master/iodata/periodic.py

It would be better to directly use these modules instead of their MolMod counterparts.

All formats in MolMod that make use of the SlicedReader can only be ported over after #26 is fixed. It is a mechanism for reading (parts of) trajectories. I'm not sure if we need the SlicedReader functionality in IOData. We might simply read in a whole trajectory rather than subsampling it while reading. @evohringer Would it be useful to subsample a trajectory upon reading, e.g. to load only every 100th time step into memory?
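
To make the question concrete, here is a minimal sketch of subsampling upon reading, assuming a format reader is written as a generator that yields one frame at a time. The names `subsample` and `fake_trajectory` are hypothetical and exist neither in molmod nor in IOData.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

Frame = TypeVar("Frame")


def subsample(frames: Iterable[Frame], start: int = 0,
              stop: Optional[int] = None, step: int = 1) -> Iterator[Frame]:
    """Lazily yield every ``step``-th frame from ``start`` up to ``stop``."""
    return islice(frames, start, stop, step)


def fake_trajectory(nframe: int) -> Iterator[dict]:
    """Stand-in for a format-specific reader: one dict per time step."""
    for istep in range(nframe):
        yield {"step": istep}


# Keep only every 100th time step of a 1000-step trajectory.
selected = [frame["step"] for frame in subsample(fake_trajectory(1000), step=100)]
```

In this sketch the skipped frames are still parsed, but they are never kept in memory, so a generic helper like this could replace format-specific slicing logic.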

tovrstra avatar Mar 14 '19 10:03 tovrstra

I think subsampling is faster and easier to do in the software that created the trajectories. Would we return a list of dictionaries when reading in the whole trajectory? How do we implement that?
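
One possible shape for this, as a sketch only: a generator that yields one dictionary per frame, wrapped in `list()` when the whole trajectory is wanted. The example uses a multi-frame XYZ file because its layout is simple; the function names and dictionary keys are hypothetical, and coordinates are kept in the units of the file.

```python
from typing import Iterable, Iterator, List


def iter_xyz_frames(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dictionary per frame of a multi-frame XYZ file."""
    lines = iter(lines)  # share one iterator between the loop and next()
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines between frames
        natom = int(line)
        title = next(lines).strip()
        symbols, coordinates = [], []
        for _ in range(natom):
            words = next(lines).split()
            symbols.append(words[0])
            coordinates.append([float(word) for word in words[1:4]])
        yield {"title": title, "symbols": symbols, "coordinates": coordinates}


def load_xyz_trajectory(lines: Iterable[str]) -> List[dict]:
    """Read the whole trajectory into a list of dictionaries."""
    return list(iter_xyz_frames(lines))


example = [
    "2", "frame 0", "H 0.0 0.0 0.0", "H 0.0 0.0 0.74",
    "2", "frame 1", "H 0.0 0.0 0.0", "H 0.0 0.0 0.80",
]
trajectory = load_xyz_trajectory(example)
```

Keeping the generator separate from the list-building wrapper leaves room to process long trajectories frame by frame later on.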

evohringer avatar Mar 14 '19 14:03 evohringer

We will start with the following formats: psf, gromacs, charmm and pdb

We will try to implement them without dependencies outside iodata.

evohringer avatar Mar 14 '19 14:03 evohringer

Great! I'm also fine with not supporting subsampling. I should still work out how trajectories could be handled in #7. I'll comment there.

tovrstra avatar Mar 14 '19 16:03 tovrstra

Dear Esteban, I will take care of wfx, qchem, gamess and cpmd; wfx is almost done.

FanwangM avatar Mar 15 '19 04:03 FanwangM

@evohringer and @tovrstra can you please decide what to do with these formats left from molmod: https://github.com/molmod/molmod/tree/master/molmod/io

  1. atrj
  2. cml
  3. cpmd
  4. crystal
  5. dlpoly
  6. gamess
  7. gromacs
  8. lammps
  9. psf

@evohringer have you started working on psf, gromacs, and charmm as mentioned before?

FarnazH avatar Jul 15 '20 02:07 FarnazH

@FarnazH: We decided to skip psf, gromacs and charmm and focus on the pdb format instead, which can be written by all MD packages. But in principle we could add them in the future if needed.

Maybe @tovrstra is better placed to comment on cpmd, cp2k and crystal, since I have no experience with these formats.

evohringer avatar Jul 15 '20 12:07 evohringer

Perhaps @evohringer and @tovrstra could make a priority-ordered list of the most important file formats, in case anyone is inclined to add support? @BradenDKelly is adding *.mwfn (multiwfn). Of the ones I saw, the ability to parse a GAMESS punch file would be relevant, as we would then have reasonable support for GAMESS, Psi4, Gaussian, Orca, and Q-Chem (at least), which covers a lot of the quantum chemistry space.

P.S. This is a copy of the message in the Tamkin issue (#36), but that issue and this one are clearly related.

PaulWAyers avatar Jul 15 '20 14:07 PaulWAyers

@evohringer I understand that PDB is a better format (well-defined and versatile), but if I am not wrong, there is information in the output files that is not printed in PDB, and that is probably why the parsers were added to molmod in the first place. For example, the gromacs parser in molmod (https://github.com/molmod/molmod/blob/master/molmod/io/gromacs.py) reads time, position, velocity, and cell information from a *.gro trajectory. While we are at it, I think it's useful to add these, especially because we just need to port them from molmod.
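
To illustrate what such a port would have to extract, here is a minimal sketch of a *.gro frame reader. It is not the molmod implementation. It relies on the fixed-width GRO layout (a title line that may contain `t= <time>`, an atom count, one line per atom, then a box line) and assumes velocities are present and the box is orthorhombic. The function name and dictionary keys are hypothetical, and values stay in GROMACS units (nm, ps, nm/ps).

```python
import re
from typing import Iterable, Iterator


def iter_gro_frames(lines: Iterable[str]) -> Iterator[dict]:
    """Yield time, atom names, positions, velocities and cell per frame."""
    lines = iter(lines)  # share one iterator between the loop and next()
    for title in lines:
        if not title.strip():
            continue  # assumption: titles are never blank
        match = re.search(r"\bt=\s*(\S+)", title)
        time = float(match.group(1)) if match else None
        natom = int(next(lines))
        names, positions, velocities = [], [], []
        for _ in range(natom):
            line = next(lines)
            names.append(line[10:15].strip())
            # Three 8-character position fields, then three velocity fields.
            positions.append([float(line[20 + 8 * i:28 + 8 * i]) for i in range(3)])
            velocities.append([float(line[44 + 8 * i:52 + 8 * i]) for i in range(3)])
        cell = [float(word) for word in next(lines).split()]
        yield {"time": time, "names": names, "positions": positions,
               "velocities": velocities, "cell": cell}


# Build a small example frame with the GRO fixed-width atom format.
atom_fmt = "{:5d}{:<5s}{:>5s}{:5d}" + "{:8.3f}" * 3 + "{:8.4f}" * 3
example = [
    "Two water atoms, t= 0.5",
    "    2",
    atom_fmt.format(1, "WATER", "OW1", 1, 0.126, 1.624, 1.679, 0.1227, -0.0580, 0.0434),
    atom_fmt.format(1, "WATER", "HW2", 2, 0.190, 1.661, 1.747, 0.8085, 0.3191, -0.7791),
    "   1.82060   1.82060   1.82060",
]
frames = list(iter_gro_frames(example))
```

The time, velocities and cell in each dictionary are exactly the pieces of information that a PDB file would not carry.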

FarnazH avatar Jul 15 '20 16:07 FarnazH

@FarnazH No problem. @lmacaya will port the gromacs format as "gromacs.py".

evohringer avatar Jul 17 '20 21:07 evohringer

@RichRick1 please take a look at the gamess format in molmod, which needs to be transferred to iodata and added to the iodata/formats module. Thanks. https://github.com/molmod/molmod/blob/master/molmod/io/gamess.py

FarnazH avatar Aug 05 '20 19:08 FarnazH