
The official implementation of Self-Play Preference Optimization (SPPO)

15 SPPO issues

I noticed that in [data-mistral-7b-instruct-sppo-iter1](https://huggingface.co/datasets/UCLA-AGI/data-mistral-7b-instruct-sppo-iter1) the `rm_scores` column is a list of length 7, whereas there are only 5 generated responses. data-mistral-7b-instruct-sppo-iter2 and data-mistral-7b-instruct-sppo-iter3 look correct and both have length = 5
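For anyone who wants to check this themselves, a minimal sketch is below; the split name `train` is an assumption and may differ in the released dataset.

```python
# Minimal sanity check of rm_scores lengths in the released dataset.
# The split name "train" is assumed; adjust if the dataset uses another split.
from datasets import load_dataset

ds = load_dataset("UCLA-AGI/data-mistral-7b-instruct-sppo-iter1", split="train")
lengths = {len(x) for x in ds["rm_scores"]}
print("distinct rm_scores lengths:", lengths)  # the issue reports 7 here, 5 in iter2/iter3
```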

I am trying to reproduce the Mistral-7B-SPPO Iter1 model. However, after my first iteration, the model I trained diverged significantly from the published Mistral-7B-SPPO Iter1 model when comparing the results...

https://github.com/uclaml/SPPO/blob/e524519cc87e9e48cd4da30588f7aa566638df4c/scripts/compute_prob.py#L39 From my understanding of the code, the score list here is the output of `blender.rank(*, return_scores=True)`, which should output the average relative score of the response in the...
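For context, the documented llm-blender usage of the PairRM ranker looks roughly like the sketch below; the example prompt and candidates are made up, and this is not the repository's `compute_prob.py` code.

```python
import llm_blender

blender = llm_blender.Blender()
blender.loadranker("llm-blender/PairRM")  # pairwise ranker used by SPPO

inputs = ["What is the capital of France?"]                       # made-up example
candidates = [["Paris.", "It might be Lyon.", "I am not sure."]]  # made-up example

# With return_scores=True, rank() returns a (num_inputs, num_candidates) array of
# relative scores per candidate instead of integer rankings.
scores = blender.rank(inputs, candidates, return_scores=True, batch_size=1)
print(scores)
```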

## Issue
Data generation requires exactly 8 GPUs to be present. This prevents the code from running properly on machines with fewer than 8 GPUs (for instance, I am using...

Hello authors, great work! I added a quick PR to adapt generation to run on fewer than 8 GPUs if needed: https://github.com/uclaml/SPPO/pull/24. This is a minimally invasive change.
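A minimal sketch of the general idea, sharding prompts across however many GPUs are visible instead of assuming exactly 8; the function below is illustrative and is not the repository's generation script or the PR's code.

```python
import torch

def shard_prompts(prompts, num_shards=None):
    """Split prompts into one contiguous chunk per available GPU (hypothetical helper)."""
    if num_shards is None:
        num_shards = max(torch.cuda.device_count(), 1)
    shard_size = (len(prompts) + num_shards - 1) // num_shards
    return [prompts[i:i + shard_size] for i in range(0, len(prompts), shard_size)]

shards = shard_prompts([f"prompt {i}" for i in range(100)])
print(len(shards), [len(s) for s in shards])
```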

Dear authors, may I know how we can train the iterative DPO baseline model using this repo? Is there a convenient way to modify the sppo code?
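One possible approach (not an official feature of this repo) is to convert each iteration's generation data into standard (chosen, rejected) pairs, picking the best and worst responses by `rm_scores`, and then train with an ordinary DPO trainer in place of the SPPO loss. In the sketch below, the column names other than `rm_scores` and the split name are assumptions.

```python
# Hypothetical sketch: build a DPO preference dataset from SPPO-style generation data.
from datasets import load_dataset

ds = load_dataset("UCLA-AGI/data-mistral-7b-instruct-sppo-iter1", split="train")

def to_dpo_pair(row, num_responses=5):
    responses = [row[f"generate_{i}"] for i in range(num_responses)]  # assumed column names
    scores = row["rm_scores"][:num_responses]
    best = max(range(num_responses), key=lambda i: scores[i])
    worst = min(range(num_responses), key=lambda i: scores[i])
    return {
        "prompt": row["prompt"],        # assumed column name
        "chosen": responses[best],
        "rejected": responses[worst],
    }

dpo_ds = ds.map(to_dpo_pair, remove_columns=ds.column_names)
```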

```
step 10: {'loss': 119743.8516, 'grad_norm': 938286.7284407256, 'learning_rate': 2.0161290322580643e-09, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -128.30323791503906, 'logps/chosen': -178.66146850585938, 'logits/rejected': -0.7681801915168762, 'logits/chosen': -0.792536735534668, 'epoch': 0.0}
step 20: {'loss':...
```

Hey guys! For anyone who is interested, I recently submitted a pull request that implements SPPO in the Axolotl trainer; you can follow the pull request here: https://github.com/axolotl-ai-cloud/axolotl/pull/1735 Original SPPO implementation fork:...

I found that the current repository configuration is not compatible with Gemma2. The reason might be that transformers and vllm are not fully compatible with Gemma2. Could you share the...

Hi, when I follow the default steps to set up the environment, `pip install vllm` will automatically install vllm 0.5.0.post1, which requires transformers>=4.40.0. When installing SPPO ( transformers==4.36.2 are...
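A quick way to confirm which versions actually got installed is the small helper below; it is not part of the repo, and the pins shown are just the ones mentioned in this issue.

```python
import importlib.metadata as md

# Pins below come from this issue's description, not from authoritative requirements.
for pkg, pinned in {"transformers": "4.36.2", "vllm": "0.5.0.post1"}.items():
    print(f"{pkg}: installed {md.version(pkg)}, issue mentions {pinned}")
```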