Sotiris Anagnostidis

Results: 13 issues by Sotiris Anagnostidis

Typos, refactoring, QA-labels and GPT-JT training

ml

One of the baselines presented in the InstructGPT [paper](https://arxiv.org/pdf/2203.02155.pdf) is a "properly" prompted GPT-3 model (see footnote 6 and section 3.5). Before the instruction/prompt specified by the user, a specified...

ml

Would be nice to have a shortcut, something like Ctrl/Cmd+Enter to press Review and Submit.

website
UI/UX
beta feedback

(mostly) Placeholder code for RLHF. @theblackcat102 Let me know what you think about the structure. I think at this point we should choose one model and start to create scripts that...

ml

When generating with decoder models, we can cache intermediate activations to avoid recomputing them. This is done by default in the `transformers` implementation when generating multiple new tokens. In our...

ml
needs discussion
inference
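
The caching idea in that snippet can be illustrated with a minimal, self-contained toy (a stand-in for intuition only, not the actual `transformers` implementation): without a cache, the keys/values for the whole sequence are recomputed at every generation step; with a cache, only the newest token is processed and its key/value appended.

```python
# Toy sketch of KV caching during autoregressive decoding.
# `attend`, and the key/value rules (t + 1, t * 2), are made up for
# illustration; they are not how a real decoder computes attention.

def attend(query, keys, values):
    # Simplistic dot-product-style "attention": weighted sum of values.
    scores = [query * k for k in keys]
    total = sum(scores) or 1
    return sum(s * v for s, v in zip(scores, values)) / total

def generate_no_cache(prompt, steps):
    tokens = list(prompt)
    for _ in range(steps):
        # Recompute keys/values for the entire sequence every step: O(n) work
        # per step, O(n^2) over the whole generation.
        keys = [t + 1 for t in tokens]
        values = [t * 2 for t in tokens]
        tokens.append(int(attend(tokens[-1], keys, values)) % 7)
    return tokens

def generate_with_cache(prompt, steps):
    tokens = list(prompt)
    # Cache keys/values for tokens that were already processed.
    keys = [t + 1 for t in tokens]
    values = [t * 2 for t in tokens]
    for _ in range(steps):
        nxt = int(attend(tokens[-1], keys, values)) % 7
        tokens.append(nxt)
        keys.append(nxt + 1)    # only the new token is processed
        values.append(nxt * 2)
    return tokens
```

Both paths produce identical outputs; the cached one simply avoids reprocessing the prefix, which is the saving `use_cache` buys in real decoder models.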

We need to make sure that different splits of the dataset are used for sft, reward and rl training. Basically the [sft_dataset](https://github.com/LAION-AI/Open-Assistant/blob/main/model/model_training/custom_datasets/prompt_dialogue.py#L14) and [reward](https://github.com/LAION-AI/Open-Assistant/blob/main/model/reward/instructor/rank_datasets.py#L304) datasets need to use the same splits.

ml
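
One way to guarantee the stages agree on splits is to derive the split from a stable per-example id rather than from shuffle order, so every training script reproduces the same disjoint partition independently. `split_of` below is a hypothetical helper sketching that idea, not code from either linked file.

```python
import hashlib

def split_of(example_id: str,
             fractions=(("sft", 0.7), ("reward", 0.2), ("rl", 0.1))):
    """Deterministically assign an example to one split by hashing its id.

    Because the assignment depends only on the id, the sft, reward and rl
    pipelines can each call this independently and still produce the same
    disjoint partition. Ids, names and fractions here are illustrative.
    """
    h = int(hashlib.sha256(example_id.encode()).hexdigest(), 16)
    u = (h % 10**8) / 10**8  # uniform-ish value in [0, 1)
    acc = 0.0
    for name, frac in fractions:
        acc += frac
        if u < acc:
            return name
    return fractions[-1][0]  # guard against float rounding at the boundary
```

Usage: each dataset loader filters its examples with `split_of(ex_id) == "sft"` (or `"reward"`, `"rl"`), and no example can land in two splits.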

Still TODOs:

- Need to fix #1661
- @theblackcat102 please provide scripts on how you are preprocessing data for the RM

We also need:

- Simpler RM based on only...

ml

Make the current sampler work correctly for distributed training:

- split the dataset per epoch per device
- fix a small error that caused the fractions/sizes to be ignored

ml
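
The per-epoch, per-device split can be sketched with the standard library. `shard_indices` is a hypothetical helper mirroring what `torch.utils.data.DistributedSampler` does (epoch-seeded shuffle, even padding, rank-strided slice); it is not the project's actual sampler.

```python
import random

def shard_indices(dataset_len, epoch, rank, world_size, seed=0):
    """Return the dataset indices one device should see in one epoch.

    Every device seeds the shuffle with (seed + epoch), so all ranks agree
    on the permutation, then each takes its own strided slice. Padding by
    repeating the first few indices keeps shards the same length.
    """
    rng = random.Random(seed + epoch)
    order = list(range(dataset_len))
    rng.shuffle(order)
    pad = (-len(order)) % world_size
    order += order[:pad]
    return order[rank::world_size]
```

Calling this fresh each epoch (analogous to `DistributedSampler.set_epoch`) gives a new shuffle while keeping the shards disjoint up to the padding.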

We need a better evaluation pipeline to quantify model performance and compare models with each other. Some ideas include - Evaluating on datasets for which we already have the...

ml

Always save the last model at the end of training.

ml