Sotiris Anagnostidis
One of the baselines presented in the InstructGPT [paper](https://arxiv.org/pdf/2203.02155.pdf) is a "properly" prompted GPT-3 model (see footnote 6 and Section 3.5). Before the user's instruction/prompt, a specified...
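A minimal sketch of the idea: a fixed instruction-following prefix is prepended to whatever the user types before querying the plain (non-fine-tuned) model. The prefix text and helper below are invented for illustration, not the exact prefix from the paper:

```python
# Hypothetical prefix; the paper uses its own prefix with dialogue-style examples.
FEWSHOT_PREFIX = (
    "Complete each task as instructed.\n\n"
    "Instruction: Translate 'bonjour' to English.\n"
    "Response: Hello.\n\n"
)

def build_prompt(user_instruction: str) -> str:
    """Prepend the fixed prefix so a plain LM is steered to follow instructions."""
    return f"{FEWSHOT_PREFIX}Instruction: {user_instruction}\nResponse:"

print(build_prompt("Summarize the plot of Hamlet in one sentence."))
```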
It would be nice to have a shortcut, something like Ctrl/Cmd+Enter, to press Review and Submit.
RLHF
(mostly) Placeholder code for RLHF. @theblackcat102 Let me know what you think about the structure. I think at this point we should choose one model and start to create scripts that...
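For reference, here is a rough sketch of how such a script could look using the Hugging Face `trl` library (assuming its PPO API circa v0.4; the model name, prompt, and constant reward are placeholders, and this is not a proposal for the actual structure):

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# gpt2 is only a stand-in until we settle on a model.
config = PPOConfig(model_name="gpt2", batch_size=1, mini_batch_size=1)
model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token

ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

query = tokenizer("Explain RLHF in one sentence.", return_tensors="pt").input_ids[0]
full = ppo_trainer.generate(query, max_new_tokens=16)[0]
response = full[query.shape[0]:]  # strip the prompt, keep generated tokens only
# A real run would score the response with the reward model; 1.0 is a stub.
stats = ppo_trainer.step([query], [response], [torch.tensor(1.0)])
```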
When generating with decoder models, we can cache intermediate activations (the attention keys and values) to avoid recomputing them. This is done by default in the `transformers` implementation when generating multiple new tokens. In our...
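For reference, a minimal sketch of what this looks like in `transformers` (gpt2 is just a stand-in model): `use_cache=True` is the default in `generate()`, and the same cache can be threaded through manual step-by-step decoding via `past_key_values`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("RLHF aligns language models", return_tensors="pt")

with torch.no_grad():
    # generate() reuses past key/value activations by default (use_cache=True),
    # so each new token only attends over cached states instead of re-running
    # the full prefix through the model.
    out = model.generate(**inputs, max_new_tokens=20, use_cache=True)

    # The cache can also be passed explicitly when decoding one step at a time:
    step = model(**inputs, use_cache=True)
    next_token = step.logits[:, -1:].argmax(-1)
    step = model(input_ids=next_token, past_key_values=step.past_key_values)
```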
We need to make sure that different splits of the dataset are used for SFT, reward, and RL training. Basically, the [sft_dataset](https://github.com/LAION-AI/Open-Assistant/blob/main/model/model_training/custom_datasets/prompt_dialogue.py#L14) and [reward](https://github.com/LAION-AI/Open-Assistant/blob/main/model/reward/instructor/rank_datasets.py#L304) datasets need to use the same splits.
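One way to enforce this, sketched below with the `datasets` library (the dataset name, seed, and fractions are placeholders): shuffle once with a fixed seed and carve out disjoint index ranges, so the SFT, reward, and RL stages can never see each other's examples:

```python
from datasets import load_dataset

ds = load_dataset("some/dialogue-dataset", split="train")  # placeholder name
ds = ds.shuffle(seed=42)  # fixed seed => identical split in every training script

n = len(ds)
sft_ds    = ds.select(range(0, int(0.5 * n)))             # 50% supervised fine-tuning
reward_ds = ds.select(range(int(0.5 * n), int(0.8 * n)))  # 30% reward model
rl_ds     = ds.select(range(int(0.8 * n), n))             # 20% RL prompts
```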
Still TODOs:
- Need to fix #1661
- @theblackcat102 please provide scripts on how you are preprocessing data for the RM

We also need:
- Simpler RM based on only...
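For context, reward models on preference data are commonly trained with a pairwise ranking loss over chosen/rejected responses, as in InstructGPT. A minimal sketch (the function and sample rewards are illustrative, not this repo's implementation):

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(chosen_rewards: torch.Tensor,
                          rejected_rewards: torch.Tensor) -> torch.Tensor:
    """-log sigmoid(r_chosen - r_rejected), averaged over the batch."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

loss = pairwise_ranking_loss(torch.tensor([1.2, 0.3]), torch.tensor([0.1, -0.5]))
```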
Make the current sampler work correctly for distributed training (see the sketch after this list):
- split the dataset per epoch per device
- fix a small error that caused the fractions/sizes to be ignored
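PyTorch's built-in `DistributedSampler` already implements the per-device, per-epoch behavior described above and can serve as a reference for fixing the custom sampler. A minimal sketch (the toy dataset, world size, batch size, and epoch count are placeholders):

```python
import torch
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

dataset = TensorDataset(torch.arange(1000))  # toy stand-in dataset
world_size, rank = 2, 0  # normally taken from torch.distributed at runtime

# Shards the dataset across processes; each rank sees a disjoint subset.
sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank, shuffle=True)
loader = DataLoader(dataset, sampler=sampler, batch_size=8)

for epoch in range(3):
    sampler.set_epoch(epoch)  # reshuffle deterministically each epoch
    for batch in loader:
        pass  # training step goes here
```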
We need a better evaluation pipeline to quantify model performance and compare models with each other. Some ideas include:
- Evaluating on datasets for which we already have the...
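As one concrete, cheap comparison point, held-out perplexity can already rank checkpoints against each other. A minimal sketch (the model and text are placeholders, and this is only one of the ideas above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

text = "Held-out evaluation text would go here."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing input_ids as labels makes the model return the mean
    # next-token cross-entropy; exponentiating gives perplexity.
    loss = model(**inputs, labels=inputs["input_ids"]).loss
print(f"perplexity: {torch.exp(loss).item():.2f}")
```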