SPPO
The official implementation of Self-Play Preference Optimization (SPPO)
I noticed that in [data-mistral-7b-instruct-sppo-iter1](https://huggingface.co/datasets/UCLA-AGI/data-mistral-7b-instruct-sppo-iter1) the `rm_scores` column is a list of length 7, whereas there are only 5 generated responses. data-mistral-7b-instruct-sppo-iter2 and data-mistral-7b-instruct-sppo-iter3 look correct; both have length = 5.
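For reference, a minimal way to check this with the `datasets` library (a sketch; the `train` split name is an assumption about the dataset layout):

```python
from datasets import load_dataset

# Inspect the length of rm_scores in the iter1 dataset.
ds = load_dataset("UCLA-AGI/data-mistral-7b-instruct-sppo-iter1", split="train")
print(len(ds[0]["rm_scores"]))  # reportedly 7, despite only 5 generations
```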
I am trying to reproduce the Mistral-7B-SPPO Iter1 model. However, after my first iteration, the model I trained diverged significantly from the published Mistral-7B-SPPO Iter1 model when comparing the results...
https://github.com/uclaml/SPPO/blob/e524519cc87e9e48cd4da30588f7aa566638df4c/scripts/compute_prob.py#L39 From my understanding of the code, the score list here is the output of `blender.rank(*, return_scores=True)`, which should output the average relative score of each response in the...
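For context, this is how PairRM scores are typically obtained with `llm-blender` (the prompt and candidates below are illustrative):

```python
import llm_blender

blender = llm_blender.Blender()
blender.loadranker("llm-blender/PairRM")

inputs = ["What is the capital of France?"]
candidates = [["Paris.", "Lyon.", "The capital of France is Paris."]]

# With return_scores=True, rank() returns per-candidate relative scores
# (one row per input) instead of integer ranks.
scores = blender.rank(inputs, candidates, return_scores=True, batch_size=1)
print(scores)
```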
## Issue Data generation requires exactly 8 GPUs to be present, which prevents the code from running properly on machines with fewer than 8 GPUs (for instance, I am using...
Hello authors, great work! I added a quick PR to adapt generation to run on fewer than 8 GPUs if needed: https://github.com/uclaml/SPPO/pull/24. This is a minimally invasive change.
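The idea, as a rough sketch (the data and helper structure here are illustrative, not the PR's actual code): shard the prompts across however many GPUs are visible instead of assuming exactly 8.

```python
import torch

prompts = ["p0", "p1", "p2", "p3", "p4"]  # illustrative data

# Use whatever hardware is available instead of a hard-coded 8.
num_gpus = max(torch.cuda.device_count(), 1)
shards = [prompts[i::num_gpus] for i in range(num_gpus)]

for rank, shard in enumerate(shards):
    # In the real script each shard would be served by a vLLM worker
    # pinned to cuda:{rank}; here we only show the partitioning.
    print(f"cuda:{rank} -> {shard}")
```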
Dear authors, may I know how we can train the iterative DPO baseline model using this repo? Is there a convenient way to modify the SPPO code?
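One possible route, offered as an assumption rather than something the repo documents: recent versions of TRL's `DPOTrainer` expose a `loss_type` switch that covers both standard DPO (`"sigmoid"`) and an SPPO-style loss (`"sppo_hard"`), so a DPO baseline could reuse the same preference data with only the loss swapped. `model`, `tokenizer`, and `dataset` are assumed to be prepared as for SPPO training:

```python
from trl import DPOConfig, DPOTrainer

config = DPOConfig(
    output_dir="dpo-baseline-iter1",
    loss_type="sigmoid",  # standard DPO; TRL's SPPO-style loss is "sppo_hard"
    beta=0.1,             # illustrative value
)
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions take tokenizer= instead
)
trainer.train()
```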
```
step 10: {'loss': 119743.8516, 'grad_norm': 938286.7284407256, 'learning_rate': 2.0161290322580643e-09, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -128.30323791503906, 'logps/chosen': -178.66146850585938, 'logits/rejected': -0.7681801915168762, 'logits/chosen': -0.792536735534668, 'epoch': 0.0}
step 20: {'loss':...
```
Hey guys! For anyone interested, I recently submitted a pull request to implement SPPO in the Axolotl trainer; you can follow it here: https://github.com/axolotl-ai-cloud/axolotl/pull/1735. Original SPPO implementation fork:...
I found that the current repository configuration is not compatible with Gemma2. The reason might be that transformers and vllm are not fully compatible with Gemma2. Could you share the...
Hi, when I follow the default steps to set up the environment, `pip install vllm` automatically installs vllm 0.5.0.post1, which requires transformers>=4.40.0. When installing SPPO (transformers==4.36.2 are...
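A quick way to confirm the clash before training (a sketch; the exact vllm version to pin against transformers 4.36.2 is left as something to verify, not asserted here):

```python
from importlib.metadata import version

# Print the installed versions to confirm the conflict described above.
print("vllm:", version("vllm"))                  # e.g. 0.5.0.post1
print("transformers:", version("transformers"))  # SPPO expects 4.36.2
# vllm 0.5.0.post1 declares transformers>=4.40.0, so the transformers==4.36.2
# pin cannot be satisfied alongside it; an older vllm built against
# transformers 4.36.x would be needed.
```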