Some problems with training time
Thanks for your brilliant work, and your code is really easy to read.

I'm quite interested in your work and was planning to try it. I noticed you said the EBMs can be trained on a single Tesla V100 GPU in 3 days on CLEVR. I have an A40, which should be sufficient, but when I ran the code, each iteration took about 10 seconds, so the estimated cost for one epoch is around 12 hours. I think something might be wrong.

Is Slurm a must? I did not use Slurm. Or will training get faster after the first epoch? I noticed there are some buffers.

I could really use some help, thank you 💗
Maybe check whether the program is actually running on the GPU you were assigned.
If multiple programs are competing for the same resources at the same time, this kind of slowdown can happen.
But it shouldn't happen if you have the GPU to yourself, assuming you are using the same setup.
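One quick, stdlib-only sanity check (a sketch, not from the repo) is to look at `CUDA_VISIBLE_DEVICES`, which Slurm sets to pin a job to its assigned GPUs. If it's unset when you launch manually, the process can see every GPU on the node and may end up contending with other jobs. The helper name here is my own, for illustration:

```python
import os


def describe_gpu_visibility(env=None):
    """Report which GPUs this process is restricted to.

    Slurm (and manual `CUDA_VISIBLE_DEVICES=0 python train.py ...` launches)
    use this variable to limit a process to specific GPUs. When it is unset,
    the process sees all GPUs on the node.
    """
    if env is None:
        env = os.environ
    visible = env.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        return "all GPUs visible (no restriction set)"
    return f"restricted to GPU(s): {visible}"


if __name__ == "__main__":
    print(describe_gpu_visibility())
```

You can also just run `nvidia-smi` while training to confirm your process is on the expected GPU and that its utilization is high; near-zero GPU utilization with 10 s/iteration usually points to the work landing on the CPU or a contended device.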