Shared pinned buffers

Open bengioe opened this issue 2 years ago • 3 comments

This PR implements a better way of sharing torch tensors between processes: (large enough) shared tensors are created once and then reused as a transfer mechanism. Doing this on the fragment environment seh_frag.py, I'm getting a 30% wall time improvement for simple settings, with batch size 64 (I'm sure we could have fun maxing that out and seeing how far we can take GPU utilization).

Some notes:

  • The effect is mostly felt when sampling (which is where most time is spent in the first place); sending Batch and GraphActionCategoricals through shared buffers reduces sampling time
  • Passing batches to the training loop (which are much bigger and "rarer") doesn't seem to give a significant speedup, but I've implemented it nonetheless for future proofing
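As a rough illustration of the "allocate once, reuse as a transfer mechanism" idea, here is a minimal sketch using only the Python standard library. It is not the code in this PR: `multiprocessing.shared_memory` stands in for the shared torch tensors, and `BUFFER_BYTES`, `write_floats` and `read_floats` are hypothetical names. The second handle is attached in the same process purely to keep the example short; normally the consumer would attach by name from another process.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical capacity: "large enough" for the biggest message we expect.
BUFFER_BYTES = 1024


def write_floats(buf, values):
    """Copy a list of floats into the start of buf; return the element count."""
    n = len(values)
    if 8 * n > len(buf):
        raise ValueError("message larger than the preallocated buffer")
    struct.pack_into(f"{n}d", buf, 0, *values)
    return n


def read_floats(buf, n):
    """Read n floats back from the start of buf."""
    return list(struct.unpack_from(f"{n}d", buf, 0))


# Allocate the buffer once, up front.
producer = shared_memory.SharedMemory(create=True, size=BUFFER_BYTES)
# A consumer attaches to the same memory by name.
consumer = shared_memory.SharedMemory(name=producer.name)

# Only the element count needs to travel through a queue or pipe;
# the payload itself goes through the shared buffer.
count = write_floats(producer.buf, [0.5, 1.5, 2.5])
received = read_floats(consumer.buf, count)

consumer.close()
producer.close()
producer.unlink()  # free the segment once nobody needs it
```

The point of the design is that the per-message cost becomes a copy into memory that already exists, rather than allocating and serializing a fresh tensor for every transfer.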

Other changes:

  • Removed local grad clipping, which is not quite correct; the difference is minimal but relevant, and there's also a nice speedup
  • Made all algorithms inherit from GFNAlgorithm
  • global_cfg is set for all algorithms
  • cond_info is now folded into the batch object rather than being passed as an argument everywhere
  • Fixed GraphActionCategorical.entropy: when masks were used, gradients wrt logits would be NaN.
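For readers unfamiliar with the masked-entropy pitfall, here is a small pure-Python sketch of one common way it shows up. It assumes masked actions are given a logit of -inf, which may not be how `GraphActionCategorical` handles masks, and it only shows the forward-value analogue of the problem (the issue fixed here concerned gradients). All function names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def naive_entropy(logits):
    """-sum(p * log p) over every entry, masked or not."""
    logp = log_softmax(logits)
    return -sum(math.exp(lp) * lp for lp in logp)


def masked_entropy(logits, mask):
    """Same sum, but masked entries are skipped instead of multiplied."""
    logp = log_softmax(logits)
    return -sum(math.exp(lp) * lp for lp, keep in zip(logp, mask) if keep)


mask = [True, True, False]
logits = [0.0, 0.0, float("-inf")]  # assumption: masked logit is -inf

naive = naive_entropy(logits)        # contains the term 0 * -inf
safe = masked_entropy(logits, mask)  # entropy of a fair two-way choice
```

The masked entry has p = 0 and log p = -inf, and their product is NaN under IEEE arithmetic, so it has to be excluded from the sum rather than relied on to vanish. In an autograd framework a similar product can produce NaN gradients even when the forward value is patched up afterwards.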

Note, EnvelopeQL is still in a broken state, will fix in #127

bengioe avatar Feb 23 '24 22:02 bengioe

I'm of a mind to merge this actually. It's not the cleanest implementation possible but there are significant gains here (as mentioned, a 30% speedup with the default settings on seh_frag.py). Will test across tasks and report back.

bengioe avatar Mar 01 '24 01:03 bengioe

Made significant simplifications to the method by subclassing Pickler/Unpickler, and found some very tricky bugs (I was misusing pinned CUDA buffers and ended up with rare race conditions). Speedups remain (it might even be a bit faster).
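To give a flavour of what subclassing Pickler/Unpickler can look like, here is a minimal stdlib-only sketch. It is an assumption about one possible approach, not the code in this PR: it uses pickle's `persistent_id` / `persistent_load` hooks to divert `bytes` payloads into a preallocated buffer, with a `bytearray` standing in for a shared pinned tensor. `BufferPickler`, `BufferUnpickler` and the `"buf"` tag are hypothetical names.

```python
import io
import pickle


class BufferPickler(pickle.Pickler):
    """Writes bytes payloads into a preallocated buffer, not the pickle stream."""

    def __init__(self, file, shared):
        super().__init__(file)
        self.shared = shared
        self.offset = 0

    def persistent_id(self, obj):
        if isinstance(obj, bytes):
            start, end = self.offset, self.offset + len(obj)
            if end > len(self.shared):
                raise ValueError("preallocated buffer is too small")
            self.shared[start:end] = obj
            self.offset = end
            # Only this small reference ends up in the stream.
            return ("buf", start, end)
        return None  # everything else is pickled normally


class BufferUnpickler(pickle.Unpickler):
    """Resolves buffer references written by BufferPickler."""

    def __init__(self, file, shared):
        super().__init__(file)
        self.shared = shared

    def persistent_load(self, pid):
        tag, start, end = pid
        if tag != "buf":
            raise pickle.UnpicklingError(f"unknown persistent id: {pid!r}")
        return bytes(self.shared[start:end])


shared = bytearray(64)  # stand-in for a buffer both processes can see
message = {"step": 3, "payload": b"\x01\x02\x03\x04", "mask": b"\x00\x01"}

stream = io.BytesIO()
BufferPickler(stream, shared).dump(message)
stream.seek(0)
restored = BufferUnpickler(stream, shared).load()
```

In a real multi-process setup the reader must not touch the buffer until the writer has finished with it, and the writer must not reuse it until the reader is done; getting that ordering wrong is the kind of thing that leads to rare race conditions.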

bengioe avatar Mar 11 '24 18:03 bengioe

Merged with trunk + made a few fixes. Pretty happy with this now!

bengioe avatar May 09 '24 15:05 bengioe