Mark Worrall
I can pick this up.
PR above for this. ^^^ BTW: I [noticed](https://github.com/LAION-AI/Open-Assistant/blob/8fbed95aa5476b71f1503176ee8ef62167f11ec6/model/reward/instructor/rank_datasets.py#L116) we're breaking ties somewhat arbitrarily for [`WebGPT`](https://huggingface.co/datasets/openai/webgpt_comparisons), yet the dataset contains many tied comparisons - was this deliberate?
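For illustration, a minimal sketch of the alternative I have in mind: dropping tied pairs rather than ordering them arbitrarily. The `score_0`/`score_1` and `answer_0`/`answer_1` fields follow the `openai/webgpt_comparisons` schema; the `build_pairs` helper itself is hypothetical, not code from the repo:

```python
from datasets import load_dataset


def build_pairs(split: str = "train"):
    """Return (preferred, rejected) answer pairs, skipping ties."""
    dataset = load_dataset("openai/webgpt_comparisons", split=split)
    pairs = []
    for row in dataset:
        if row["score_0"] == row["score_1"]:
            continue  # drop ties instead of breaking them arbitrarily
        # order so the preferred answer always comes first
        if row["score_0"] > row["score_1"]:
            pairs.append((row["answer_0"], row["answer_1"]))
        else:
            pairs.append((row["answer_1"], row["answer_0"]))
    return pairs
```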
I'm happy to pick this up. I'm new around here (this would be my first contribution) so would happily work on it with input from others. Let me know if...
Hi @sanagno, I have quite a few questions (sorry!). Questions related to this issue ❓ 1. Should the sampling happen per epoch, or once at the start of training...
Thanks for your thoughtful comments @sanagno - greatly appreciated. 🙇 I'll have a think about the best way to implement the sampling per epoch, but from a quick look, overriding...
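A minimal sketch of the per-epoch direction, assuming a torch-style `Sampler` (the `PerEpochSubsetSampler` name and its parameters are illustrative, not from the repo):

```python
import torch
from torch.utils.data import Sampler


class PerEpochSubsetSampler(Sampler):
    """Draw a fresh random subset of the dataset each epoch.

    Hypothetical sketch: set_epoch reseeds the draw, mirroring how
    torch's DistributedSampler reshuffles across epochs.
    """

    def __init__(self, dataset_size: int, num_samples: int, seed: int = 0):
        self.dataset_size = dataset_size
        self.num_samples = num_samples
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        # called by the training loop so each epoch samples a new subset
        self.epoch = epoch

    def __iter__(self):
        g = torch.Generator()
        g.manual_seed(self.seed + self.epoch)
        perm = torch.randperm(self.dataset_size, generator=g)
        return iter(perm[: self.num_samples].tolist())

    def __len__(self):
        return self.num_samples
```

The training loop would call `sampler.set_epoch(epoch)` at the top of each epoch so the subset changes, rather than sampling once at the start of training.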
Just a quick update on this: the single GPU version is working and was simple enough. I've got a bit more to do to get it to work in a...
Hey @Sobsz, would you like any help with this issue? I'm new around here and this feels like a good issue to get familiar with parts of the SFT pipeline...
Is this still open and needing work? I'd be very happy to help if so.
Hi @thaumstrial! 👋 I read [your write-up](https://thaumstrial.github.io/post/qian-chang-t5-encoder-mo-gai-ban-jiang-li-mo-xing/) on how you trained the reward model and found that it didn't work. I don't think I quite follow the exact procedure you used...
Nice, thanks - will take a look!