
Slow reading speeds of weights from HDF5s

Open alexrd opened this issue 4 years ago • 2 comments

It can take a remarkably long time to read weights from an HDF5. iotop shows a sustained read rate of 220-230 KB/s, yet it takes tens of minutes just to read in the weights, which total only 768 KB of data. I'm guessing each weight is stored in a separate "chunk"? We should rearrange this to allow for more efficient reading.

For the record, this is my code for reading the weights:

```python
import numpy as np

# we.h5 is the open wepy HDF5 file handle
wts = np.array([
    np.array(we.h5[f'runs/0/trajectories/{i}/weights']).flatten()
    for i in range(48)
])
```

It takes ~30 min to read the weights from a 2000-cycle, 48-walker h5 file. For comparison, the RMSDs are stored in the same HDF5 file, but as an observable, and they take only 1-2 seconds to read.
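One way to check the per-frame-chunk hypothesis is to inspect the dataset's chunk shape with h5py and count how many chunks a full read has to touch. This is a minimal sketch, not wepy code: the `report_chunking` helper and the `demo.h5` file it builds are illustrative, with the demo dataset deliberately chunked one frame at a time to mimic per-cycle appends during a simulation.

```python
import h5py
import numpy as np

def report_chunking(path, dset_path):
    """Return (shape, chunks, n_chunks) for one HDF5 dataset."""
    with h5py.File(path, 'r') as f:
        dset = f[dset_path]
        chunks = dset.chunks  # None means contiguous (unchunked) storage
        if chunks is None:
            n_chunks = 1
        else:
            # ceil-divide each dimension to count chunks along it
            n_chunks = int(np.prod(
                [-(-s // c) for s, c in zip(dset.shape, chunks)]))
        return dset.shape, chunks, n_chunks

# Build a tiny demo file chunked one frame at a time.
with h5py.File('demo.h5', 'w') as f:
    f.create_dataset('runs/0/trajectories/0/weights',
                     data=np.ones((2000, 1)), chunks=(1, 1))

shape, chunks, n_chunks = report_chunking(
    'demo.h5', 'runs/0/trajectories/0/weights')
print(shape, chunks, n_chunks)  # (2000, 1) (1, 1) 2000
```

If `n_chunks` is on the order of the number of cycles, every full read of the weights pays per-chunk overhead thousands of times, which would explain the huge gap between the sustained I/O rate and the tiny payload.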

alexrd avatar May 19 '21 13:05 alexrd

How is your HDF5 file arranged/what post-processing did you do? Is each run linked into the main one?

salotz-sitx avatar May 21 '21 14:05 salotz-sitx

Is this still a problem? I've been doing some benchmarking on our systems, and on an HPC system, which node you are on relative to the parallel file system can have a huge impact.

For testing I was extracting a subset of the data and then rechunking it. In this one I am rechunking the positions based on the number of atoms:

```shell
h5copy -v -i input/raw.wepy.h5 -s '/runs/0/trajectories/0' -o _output/traj0.h5 -d 'traj'
h5repack -v -i _output/traj0.h5 -o _output/traj0_goodchunks.h5 -l traj/positions:CHUNK=1x98391x3
```
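The same rechunking can also be sketched directly in h5py, which is handy when the CLI tools aren't available. This is a hedged illustration, not part of wepy: `repack_dataset` and the demo file names are made up, and the demo uses small dimensions (10 frames, 5 atoms) rather than the real 98391-atom system, with the target chunk shape being one full frame per chunk as in the h5repack command above.

```python
import h5py
import numpy as np

def repack_dataset(src_path, dst_path, dset, chunks):
    """Copy one dataset into a fresh file with the given chunk shape."""
    with h5py.File(src_path, 'r') as src, h5py.File(dst_path, 'w') as dst:
        data = src[dset][:]  # one slow read through the old layout
        dst.create_dataset(dset, data=data, chunks=chunks)

# Demo: rechunk per-element storage into one-frame-per-chunk storage.
with h5py.File('raw_demo.h5', 'w') as f:
    f.create_dataset('traj/positions',
                     data=np.zeros((10, 5, 3)), chunks=(1, 1, 3))

repack_dataset('raw_demo.h5', 'repacked_demo.h5',
               'traj/positions', chunks=(1, 5, 3))

with h5py.File('repacked_demo.h5', 'r') as f:
    print(f['traj/positions'].chunks)  # (1, 5, 3)
```

Note this reads the whole dataset into memory, so for large position arrays the h5repack route is the safer choice.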

Unfortunately there is nothing we can do to get the chunking into good shape while the simulation is running. It has to be a post-processing step, and since running h5repack can be slow I wouldn't want it to happen automatically.

salotz-sitx avatar Sep 10 '21 14:09 salotz-sitx