Mark Worrall
I can pick this up.
PR above for this. ^^^ BTW: I [noticed](https://github.com/LAION-AI/Open-Assistant/blob/8fbed95aa5476b71f1503176ee8ef62167f11ec6/model/reward/instructor/rank_datasets.py#L116) we're breaking ties somewhat arbitrarily for [`WebGPT`](https://huggingface.co/datasets/openai/webgpt_comparisons), yet the dataset contains many tied comparisons - was this deliberate?
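For illustration, a minimal sketch of the alternative I have in mind: dropping tied pairs rather than ordering them arbitrarily. The `score_0`/`score_1` and `answer_0`/`answer_1` fields follow the `openai/webgpt_comparisons` schema; the `build_pairs` helper itself is hypothetical, not code from the repo:

```python
from datasets import load_dataset


def build_pairs(split: str = "train"):
    """Return (preferred, rejected) answer pairs, skipping ties."""
    dataset = load_dataset("openai/webgpt_comparisons", split=split)
    pairs = []
    for row in dataset:
        if row["score_0"] == row["score_1"]:
            continue  # drop ties instead of breaking them arbitrarily
        # order so the preferred answer always comes first
        if row["score_0"] > row["score_1"]:
            pairs.append((row["answer_0"], row["answer_1"]))
        else:
            pairs.append((row["answer_1"], row["answer_0"]))
    return pairs
```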
I'm happy to pick this up. I'm new around here (this would be my first contribution) so would happily work on it with input from others. Let me know if...
Hi @sanagno, I have quite a few questions (sorry!). Questions related to this issue ❓ 1. Should the sampling happen per epoch, or once at the start of training...
Thanks for your thoughtful comments @sanagno - greatly appreciated. 🙇 I'll have a think about the best way to implement the sampling per epoch, but from a quick look, overriding...
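A minimal sketch of the per-epoch direction, assuming a torch-style `Sampler` (the `PerEpochSubsetSampler` name and its parameters are illustrative, not from the repo):

```python
import torch
from torch.utils.data import Sampler


class PerEpochSubsetSampler(Sampler):
    """Draw a fresh random subset of the dataset each epoch.

    Hypothetical sketch: set_epoch reseeds the draw, mirroring how
    torch's DistributedSampler reshuffles across epochs.
    """

    def __init__(self, dataset_size: int, num_samples: int, seed: int = 0):
        self.dataset_size = dataset_size
        self.num_samples = num_samples
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        # called by the training loop so each epoch samples a new subset
        self.epoch = epoch

    def __iter__(self):
        g = torch.Generator()
        g.manual_seed(self.seed + self.epoch)
        perm = torch.randperm(self.dataset_size, generator=g)
        return iter(perm[: self.num_samples].tolist())

    def __len__(self):
        return self.num_samples
```

The training loop would call `sampler.set_epoch(epoch)` at the top of each epoch so the subset changes, rather than sampling once at the start of training.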
Just a quick update on this: the single GPU version is working and was simple enough. I've got a bit more to do to get it to work in a...
Hey @Sobsz, would you like any help with this issue? I'm new around here and this feels like a good issue to get familiar with parts of the SFT pipeline...
Is this still open and needing work? I'd be very happy to help if so.
Hi @thaumstrial! 👋 I read [your write-up](https://thaumstrial.github.io/post/qian-chang-t5-encoder-mo-gai-ban-jiang-li-mo-xing/) on how you trained the reward model and found that it didn't work. I don't think I quite follow the exact procedure you used...
Nice, thanks - will take a look!