Unable to synchronously open object (object 'nu' doesn't exist)
Hello! This work has been a great help for driving innovation in machine-learning-based simulation, and I thank you for your contribution. I was recently studying your project code.
I went to the /data_gen_NLE/ReactionDiffusionEq/ folder to generate the Reaction-Diffusion dataset, then went to the /pdebench/models/ folder and ran run_forward_1D.sh to train the network. The command was:
CUDA_VISIBLE_DEVICES='0' python3 train_models_forward.py +args=config_ReacDiff.yaml ++args.filename='ReacDiff_Nu1.0_Rho1.0.hdf5' ++args.model_name='FNO'
Then, I encountered this bug.
Similarly, I went to the /data_gen_NLE/BurgersEq/ folder to generate the Burgers dataset and then trained the network with the command:
CUDA_VISIBLE_DEVICES='2,3' python3 train_models_forward.py +args=config_Bgs.yaml ++args.filename='1D_Burgers_Sols_Nu1.0.hdf5' ++args.model_name='FNO'
and encountered a similar bug.
However, when I tested with the dataset downloaded via the /pdebench/data_download/ directory, the program ran successfully.
I wonder if it is a problem with the HDF5 file. I used HDFView to check the data format and found that the t-axis has 202 points (from 0 to 2.01) and the x-axis has 1024 points (from 0 to 1), but the solution tensor is stored with a 2 × 5000 leading shape.
The config file used to generate the 1D_Burgers_Sols_Nu1.0.hdf5 file is:
Hi. Thank you for your kind report. This likely originates from pmap, which splits the batch dimension from (N_b, ...) into (N_GPU, N_b/N_GPU, ...). Please try reshaping the batch dimension of the resulting file from the latter back to the original batch size. In addition, our forward script does not support multi-GPU training, so please use only one GPU for training.
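To make the shape issue concrete, here is a minimal, hedged sketch of what the pmap split looks like; the sizes (10000 samples, 1024 grid points) are only illustrative and not taken from the actual config:

```python
import jax
import jax.numpy as jnp

# Illustrative sizes only: 10000 samples of a 1D field with 1024 grid points.
# Assumes the device count evenly divides the number of samples.
n_samples, nx = 10000, 1024
n_dev = jax.local_device_count()      # e.g. 2 when two GPUs are visible

u = jnp.zeros((n_samples, nx))

# pmap requires the leading axis to equal the number of devices, so the
# generation script feeds it a (n_dev, n_samples // n_dev, nx) array ...
u_split = u.reshape(n_dev, n_samples // n_dev, nx)
print(u_split.shape)                  # (2, 5000, 1024) with 2 GPUs

# ... and the pmapped solver returns its output with that extra device axis,
# e.g. (n_dev, n_samples // n_dev, nt, nx), which is the 2 x 5000 layout seen
# in HDFView. Reshaping back to (n_samples, nt, nx) restores the expected form.
```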
Thanks for your reply.
# Parallelize the solver over local devices (pmap) and over the per-device batch (vmap).
vm_evolve = jax.pmap(jax.vmap(evolve, axis_name='j'), axis_name='i')
local_devices = jax.local_device_count()

# pmap expects a leading axis equal to the device count, so split the batch first.
uu = vm_evolve(u.reshape([local_devices, cfg.multi.numbers // local_devices, -1]))

# Fold the device axis back into the original batch dimension before saving.
save_dim = [cfg.multi.numbers] + list(uu.shape[-2:])
uu_reshape = uu.reshape(save_dim)
jnp.save(cwd + cfg.multi.save + '1D_Advection_Sols_beta' + str(beta)[:5], uu_reshape)
This is my solution. For the 1D Advection dataset, I created a new variable uu_reshape to change the original shape of uu. However, this approach does not carry over to datasets of different dimensionality: save_dim has to be set by hand for each dataset. Is there a unified approach?
For the 1D PDEs (Advection, Burgers, and ReactionDiffusion), this issue has been fixed in commit https://github.com/pdebench/PDEBench/commit/b470a487fc37644c79d1c34d632d3360eddee357. The main problem was that the extraneous dimension due to JAX's pmap wasn't properly reshaped before serializing.
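As a side note, one dimension-agnostic way to undo the pmap split without hard-coding save_dim per dataset is to collapse the two leading axes while keeping all trailing ones. This is just a sketch with made-up sizes, not necessarily the exact code used in the commit:

```python
import jax.numpy as jnp

# Hypothetical pmap output of shape (n_dev, n_samples // n_dev, nt, nx);
# tiny sizes here purely for illustration.
uu = jnp.zeros((2, 5, 201, 256))

# Merge the device axis into the batch axis, keeping every trailing dimension,
# so the same line works for 1D, 2D, or 3D solution arrays.
uu_flat = uu.reshape((-1,) + uu.shape[2:])
print(uu_flat.shape)   # (10, 201, 256)
```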
To generate 1D Advection dataset, try the following:
# generate data and save as .npy array
$ cd PDEBench/pdebench/data_gen/data_gen_NLE/AdvectionEq
$ CUDA_VISIBLE_DEVICES='2,3' python3 advection_multi_solution_Hydra.py +multi=beta1e0.yaml
# serialize to hdf5 by transforming npy file
$ cd ..
$ python Data_Merge.py
To generate 1D Burgers dataset, try the following:
# generate data and save as .npy array
$ cd PDEBench/pdebench/data_gen/data_gen_NLE/BurgersEq/
$ CUDA_VISIBLE_DEVICES='0,2' python3 burgers_multi_solution_Hydra.py +multi=1e-1.yaml
# serialize to hdf5 by transforming npy file
$ cd ..
$ python Data_Merge.py
To generate 1D ReactionDiffusion dataset, try the following:
# generate data and save as .npy array
$ cd PDEBench/pdebench/data_gen/data_gen_NLE/ReactionDiffusionEq/
$ CUDA_VISIBLE_DEVICES='0,1' python3 reaction_diffusion_multi_solution_Hydra.py +multi=Rho2e0_Nu5e0.yaml
# serialize to hdf5 by transforming npy file
$ cd ..
$ python Data_Merge.py
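After Data_Merge.py finishes, it may be worth verifying that the regenerated HDF5 file contains the expected objects before launching training. Here is a minimal sketch using h5py; the filename is only an example:

```python
import h5py

# Example filename; substitute the file produced by Data_Merge.py.
with h5py.File("1D_Burgers_Sols_Nu1.0.hdf5", "r") as f:
    # Print every group and dataset so a missing key (such as the 'nu'
    # object named in the original error) is easy to spot.
    def show(name, obj):
        shape = obj.shape if isinstance(obj, h5py.Dataset) else "(group)"
        print(name, shape)
    f.visititems(show)
```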