Derrick Blakely

9 comments by Derrick Blakely

I believe all of the segfaults are caused by building with an older version of gcc. I'm using a conda environment and ran `conda install gcc_linux-64 gxx_linux-64` and reinstalled PyTorch...

> The PyG `DataParallel` wrapper is closely related to the PyTorch one. Hence, you should be fine using the [`SyncBatchNorm`](https://pytorch.org/docs/master/nn.html#torch.nn.SyncBatchNorm) recently introduced [here](https://github.com/pytorch/pytorch/issues/2584). Training on one big graph across multiple...

I'm interested and can start working on this sometime soon.

Sorry @josiahbjorgaard, I did not.

Hey there, sorry to nag, but any chance of moving this along? Anything I can do to help?

I'd also greatly appreciate this feature! 🙏 In the meantime, I feel like it would be nice to have DeepSpeed raise a value error or at least give a warning...

Hey @eric-mitchell, I've been wondering the same thing. Any info you could share on this?

I ran into the same issue a few months ago and didn't have any success with `average_log_prob=True` -- the model's outputs became very degenerate. Ultimately I left `average_log_prob=False` and had to...
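
For context, here is a minimal sketch of what a flag like `average_log_prob` plausibly changes, assuming it switches between summing and averaging per-token log-probabilities over the unmasked response tokens. The function name `sequence_log_prob` and the toy numbers are hypothetical illustrations, not code from the repository.

```python
import math


def sequence_log_prob(token_log_probs, mask, average_log_prob=False):
    """Aggregate per-token log-probs over unmasked positions (hypothetical helper).

    average_log_prob=False returns the sum, which grows in magnitude with length.
    average_log_prob=True returns the per-token mean, which is length-normalized.
    """
    kept = [lp for lp, m in zip(token_log_probs, mask) if m]
    total = sum(kept)
    if average_log_prob:
        # Guard against an all-masked sequence.
        return total / max(len(kept), 1)
    return total


# Toy data (assumed): every token has probability 0.5 under the model.
short = [math.log(0.5)] * 4
long = [math.log(0.5)] * 16

short_sum = sequence_log_prob(short, [1] * 4)
long_sum = sequence_log_prob(long, [1] * 16)
short_avg = sequence_log_prob(short, [1] * 4, average_log_prob=True)
long_avg = sequence_log_prob(long, [1] * 16, average_log_prob=True)

print(short_sum, long_sum)  # the summed value scales with sequence length
print(short_avg, long_avg)  # the averaged value does not
```

With summation, a longer response gets a proportionally larger-magnitude log-probability even when its per-token quality is identical, which is one reason sequence length can matter when `average_log_prob=False`.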

Hey @yata0, the author mentioned some ideas [here](https://github.com/eric-mitchell/direct-preference-optimization/issues/35#issuecomment-1705906371) and I tried each of those 4 suggestions. All of them helped to some extent. To "normalize" the data lengths, I simply...