Sotiris Anagnostidis
One of the baselines presented in the InstructGPT [paper](https://arxiv.org/pdf/2203.02155.pdf) is a "properly" prompted GPT-3 model (see footnote 6 and Section 3.5). Before the user's instruction/prompt, a specified...
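A minimal sketch of the idea: a fixed instruction-following prefix is prepended to whatever the user types before querying the plain (non-fine-tuned) model. The prefix text and helper below are invented for illustration, not the exact prefix from the paper:

```python
# Hypothetical prefix; the paper uses its own prefix with dialogue-style examples.
FEWSHOT_PREFIX = (
    "Complete each task as instructed.\n\n"
    "Instruction: Translate 'bonjour' to English.\n"
    "Response: Hello.\n\n"
)

def build_prompt(user_instruction: str) -> str:
    """Prepend the fixed prefix so a plain LM is steered to follow instructions."""
    return f"{FEWSHOT_PREFIX}Instruction: {user_instruction}\nResponse:"

print(build_prompt("Summarize the plot of Hamlet in one sentence."))
```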
It would be nice to have a shortcut, something like Ctrl/Cmd+Enter, to press Review and Submit.
RLHF
(mostly) Placeholder code for RLHF. @theblackcat102 Let me know what you think about the structure. I think at this point we should choose one model and start to create scripts that...
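For reference, here is a rough sketch of how such a script could look using the Hugging Face `trl` library (assuming its PPO API circa v0.4; the model name, prompt, and constant reward are placeholders, and this is not a proposal for the actual structure):

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# gpt2 is only a stand-in until we settle on a model.
config = PPOConfig(model_name="gpt2", batch_size=1, mini_batch_size=1)
model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token

ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

query = tokenizer("Explain RLHF in one sentence.", return_tensors="pt").input_ids[0]
full = ppo_trainer.generate(query, max_new_tokens=16)[0]
response = full[query.shape[0]:]  # strip the prompt, keep generated tokens only
# A real run would score the response with the reward model; 1.0 is a stub.
stats = ppo_trainer.step([query], [response], [torch.tensor(1.0)])
```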
When generating with decoder models, we can cache intermediate activations (the attention keys and values) to avoid recomputing them. This is done by default in the `transformers` implementation when generating multiple new tokens. In our...
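For reference, a minimal sketch of what this looks like in `transformers` (gpt2 is just a stand-in model): `use_cache=True` is the default in `generate()`, and the same cache can be threaded through manual step-by-step decoding via `past_key_values`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("RLHF aligns language models", return_tensors="pt")

with torch.no_grad():
    # generate() reuses past key/value activations by default (use_cache=True),
    # so each new token only attends over cached states instead of re-running
    # the full prefix through the model.
    out = model.generate(**inputs, max_new_tokens=20, use_cache=True)

    # The cache can also be passed explicitly when decoding one step at a time:
    step = model(**inputs, use_cache=True)
    next_token = step.logits[:, -1:].argmax(-1)
    step = model(input_ids=next_token, past_key_values=step.past_key_values)
```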
We need to make sure that different splits of the dataset are used for SFT, reward, and RL training. Basically, the [sft_dataset](https://github.com/LAION-AI/Open-Assistant/blob/main/model/model_training/custom_datasets/prompt_dialogue.py#L14) and [reward](https://github.com/LAION-AI/Open-Assistant/blob/main/model/reward/instructor/rank_datasets.py#L304) datasets need to use the same splits.
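One way to enforce this, sketched below with the `datasets` library (the dataset name, seed, and fractions are placeholders): shuffle once with a fixed seed and carve out disjoint index ranges, so the SFT, reward, and RL stages can never see each other's examples:

```python
from datasets import load_dataset

ds = load_dataset("some/dialogue-dataset", split="train")  # placeholder name
ds = ds.shuffle(seed=42)  # fixed seed => identical split in every training script

n = len(ds)
sft_ds    = ds.select(range(0, int(0.5 * n)))             # 50% supervised fine-tuning
reward_ds = ds.select(range(int(0.5 * n), int(0.8 * n)))  # 30% reward model
rl_ds     = ds.select(range(int(0.8 * n), n))             # 20% RL prompts
```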
Still TODOs:
- Need to fix #1661
- @theblackcat102 please provide scripts on how you are preprocessing data for the RM

We also need:
- Simpler RM based on only...
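For context, reward models on preference data are commonly trained with a pairwise ranking loss over chosen/rejected responses, as in InstructGPT. A minimal sketch (the function and sample rewards are illustrative, not this repo's implementation):

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(chosen_rewards: torch.Tensor,
                          rejected_rewards: torch.Tensor) -> torch.Tensor:
    """-log sigmoid(r_chosen - r_rejected), averaged over the batch."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

loss = pairwise_ranking_loss(torch.tensor([1.2, 0.3]), torch.tensor([0.1, -0.5]))
```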
Make the current sampler work correctly for distributed training (see the sketch after this list):
- split the dataset per epoch per device
- fix a small error that caused the fractions/sizes to be ignored
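PyTorch's built-in `DistributedSampler` already implements the per-device, per-epoch behavior described above and can serve as a reference for fixing the custom sampler. A minimal sketch (the toy dataset, world size, batch size, and epoch count are placeholders):

```python
import torch
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

dataset = TensorDataset(torch.arange(1000))  # toy stand-in dataset
world_size, rank = 2, 0  # normally taken from torch.distributed at runtime

# Shards the dataset across processes; each rank sees a disjoint subset.
sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank, shuffle=True)
loader = DataLoader(dataset, sampler=sampler, batch_size=8)

for epoch in range(3):
    sampler.set_epoch(epoch)  # reshuffle deterministically each epoch
    for batch in loader:
        pass  # training step goes here
```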
We need a better evaluation pipeline to quantify model performance and compare models with each other. Some ideas include:
- Evaluating on datasets for which we already have the...
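As one concrete, cheap comparison point, held-out perplexity can already rank checkpoints against each other. A minimal sketch (the model and text are placeholders, and this is only one of the ideas above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

text = "Held-out evaluation text would go here."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing input_ids as labels makes the model return the mean
    # next-token cross-entropy; exponentiating gives perplexity.
    loss = model(**inputs, labels=inputs["input_ids"]).loss
print(f"perplexity: {torch.exp(loss).item():.2f}")
```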