
The official implementation of Self-Play Fine-Tuning (SPIN)

24 SPIN issues (sorted by recently updated)

Hi, I reproduced this project. The test results look similar to the paper's, but the generated results look bad, e.g.: {'real': [{'role': 'user', 'content': 'Describe the neural pathways that connect the hippocampus...
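For readers trying to reproduce this, a minimal sketch for inspecting the training data format, assuming the released `UCLA-AGI/SPIN_iter0` dataset has a `train` split with `real` and `generated` columns, each a list of `{"role": ..., "content": ...}` chat messages:

```python
# Minimal sketch for inspecting a SPIN-style sample.
# Assumes the UCLA-AGI/SPIN_iter0 layout: `real` and `generated` columns,
# each a list of {"role": ..., "content": ...} chat messages.
from datasets import load_dataset

ds = load_dataset("UCLA-AGI/SPIN_iter0", split="train")
sample = ds[0]

for key in ("real", "generated"):
    print(f"--- {key} ---")
    for msg in sample[key]:
        # Truncate long contents so the real/generated comparison stays readable.
        print(f"[{msg['role']}] {msg['content'][:200]}")
```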

Hi there, great job on the project! I'm looking to clarify whether the UCLA-AGI/zephyr-7b-sft-full-SPIN-iter1 model was fine-tuned on top of UCLA-AGI/zephyr-7b-sft-full-SPIN-iter0 or alignment-handbook/zephyr-7b-sft-full. The paper suggests that training progresses from...

1. At iteration 0, p_{\theta_0} = p_{SFT}, and the global optimum p_{\theta_1} after iteration 1 of the following objective will still be p_{SFT}. Thus, the following iterates of p_{\theta} will always...
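For context, a sketch of the iteration-$t$ objective as I read it from the paper (notation may differ slightly from the repo's): the new model $p_\theta$ is trained to separate real responses $y$ from responses $y'$ drawn from the previous iterate $p_{\theta_t}$,

$$
L_{\mathrm{SPIN}}(\theta;\theta_t)=\mathbb{E}_{x\sim q,\; y\sim p_{\mathrm{data}}(\cdot\mid x),\; y'\sim p_{\theta_t}(\cdot\mid x)}\left[\ell\!\left(\lambda\log\frac{p_{\theta}(y\mid x)}{p_{\theta_t}(y\mid x)}-\lambda\log\frac{p_{\theta}(y'\mid x)}{p_{\theta_t}(y'\mid x)}\right)\right],\qquad \ell(t)=\log\!\left(1+e^{-t}\right).
$$

If $p_{\theta_t}$ already equals $p_{\mathrm{data}}$, then $y$ and $y'$ are identically distributed and $\theta=\theta_t$ is a global minimizer, which is the fixed-point behaviour the question refers to.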

The following part of the paper explains the difference between SPIN and DPO. It claims that DPO improves the model using **instance**-level information while SPIN works at the **distribution**...
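For comparison, a sketch of the standard DPO loss (the symbols $\beta$, $\pi_{\mathrm{ref}}$, and the pair $(y_w, y_l)$ follow the DPO paper, not this repo):

$$
L_{\mathrm{DPO}}(\theta)=-\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log\sigma\!\left(\beta\log\frac{\pi_{\theta}(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}-\beta\log\frac{\pi_{\theta}(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right].
$$

Here $(y_w, y_l)$ is a fixed labelled pair per prompt, whereas in SPIN the rejected response is resampled from the opponent $p_{\theta_t}$, so the expectation runs over the opponent's whole output distribution rather than over individual labelled instances.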

In the paper, it seems that you combined the two steps of reinforcement learning into one step, forming an end-to-end training method. The specific algorithm is shown in the figure....
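For anyone skimming the thread, the overall procedure as I understand it from the paper alternates between generating synthetic responses with the current model and running a DPO-style update against it; the function names below are hypothetical placeholders, not this repository's API:

```python
# Hypothetical sketch of the SPIN outer loop as described in the paper.
# `generate_responses`, `build_pairs`, and `train_with_spin_loss` are
# placeholder names, not functions from this repository.
def self_play_fine_tune(model, sft_prompts, sft_responses, num_iters=3):
    opponent = model  # iteration 0 starts from the SFT model
    for t in range(num_iters):
        # Self-play step: the frozen opponent produces synthetic responses.
        synthetic = generate_responses(opponent, sft_prompts)
        # Pair real SFT responses ("chosen") with synthetic ones ("rejected").
        pairs = build_pairs(sft_prompts, real=sft_responses, generated=synthetic)
        # Single end-to-end update: a DPO-style loss against the opponent,
        # rather than separate reward-modelling and RL stages.
        model = train_with_spin_loss(model, ref_model=opponent, pairs=pairs)
        opponent = model  # the updated model becomes the next opponent
    return model
```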

Hello! Thanks for the open-source code release. I have been trying to run fine-tuning with a phi-2 3B model on a 40GB A100 GPU; while running `accelerate launch spin/run_spin.py...
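Not the repo's actual configuration, but for anyone hitting out-of-memory errors on a single 40GB card, the usual generic `transformers` levers look something like this (values below are illustrative only, not the settings used by `run_spin.py`):

```python
# Illustrative memory-saving settings for a single 40GB GPU; these are
# generic transformers TrainingArguments, not the repo's config files.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs/phi-2-spin-iter0",  # hypothetical path
    per_device_train_batch_size=1,          # keep activation memory small
    gradient_accumulation_steps=16,         # recover the effective batch size
    gradient_checkpointing=True,            # trade compute for memory
    bf16=True,                              # half-precision training on A100
)
```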

Is the checkpoint provided in this repo trained with the old version of zephyr-7b-sft-full?

- base model: `alignment-handbook/zephyr-7b-sft-full`
- train data: `UCLA-AGI/SPIN_iter0`

I used the default hyper-parameters to train the model and tested it locally with `HuggingFaceH4/open_llm_leaderboard`. The results on `allenai/ai2_arc` are as follows:...

Hi, Thank you for your work. We're re-evaluating experiments using an updated SFT ckpt from https://huggingface.co/alignment-handbook/zephyr-7b-sft-full and using lm-evaluation-harness v0.4.0 for evaluation. We've noticed a significant performance drop in GSM8k....
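For anyone re-running these numbers, a minimal local evaluation sketch with lm-evaluation-harness v0.4.x; the checkpoint path, few-shot count, and batch size below are illustrative, not the settings used in the paper:

```python
# Minimal GSM8K evaluation with lm-evaluation-harness v0.4.x.
# Checkpoint, few-shot count, and batch size are illustrative only.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=alignment-handbook/zephyr-7b-sft-full,dtype=bfloat16",
    tasks=["gsm8k"],
    num_fewshot=5,
    batch_size=8,
)
print(results["results"]["gsm8k"])
```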

This awesome work is quite interesting. Did you ever try: 1. training with SPIN only; 2. training with SFT + SPIN + DPO; 3. mixing SPIN +...