
The official implementation of Self-Play Fine-Tuning (SPIN)

24 SPIN issues (sorted by recently updated)

Hi, I reproduced this project. The test results look similar to the paper's, but the generated results look bad, e.g.: {'real': [{'role': 'user', 'content': 'Describe the neural pathways that connect the hippocampus...
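For readers trying to reproduce this, a minimal sketch for inspecting the training data format, assuming the released `UCLA-AGI/SPIN_iter0` dataset has a `train` split with `real` and `generated` columns, each a list of `{"role": ..., "content": ...}` chat messages:

```python
# Minimal sketch for inspecting a SPIN-style sample.
# Assumes the UCLA-AGI/SPIN_iter0 layout: `real` and `generated` columns,
# each a list of {"role": ..., "content": ...} chat messages.
from datasets import load_dataset

ds = load_dataset("UCLA-AGI/SPIN_iter0", split="train")
sample = ds[0]

for key in ("real", "generated"):
    print(f"--- {key} ---")
    for msg in sample[key]:
        # Truncate long contents so the real/generated comparison stays readable.
        print(f"[{msg['role']}] {msg['content'][:200]}")
```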

Hi there, great job on the project! I'm looking to clarify whether the UCLA-AGI/zephyr-7b-sft-full-SPIN-iter1 model was fine-tuned on top of UCLA-AGI/zephyr-7b-sft-full-SPIN-iter0 or alignment-handbook/zephyr-7b-sft-full. The paper suggests that training progresses from...

1. At iteration 0, p_{\theta_0} = p_{SFT}, and the global optimum p_{\theta_1} after iteration 1 of the following objective will still be p_{SFT}. Thus, the following iterates of p_{\theta} will always...
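For context, a sketch of the iteration-$t$ objective as I read it from the paper (notation may differ slightly from the repo's): the new model $p_\theta$ is trained to separate real responses $y$ from responses $y'$ drawn from the previous iterate $p_{\theta_t}$,

$$
L_{\mathrm{SPIN}}(\theta;\theta_t)=\mathbb{E}_{x\sim q,\; y\sim p_{\mathrm{data}}(\cdot\mid x),\; y'\sim p_{\theta_t}(\cdot\mid x)}\left[\ell\!\left(\lambda\log\frac{p_{\theta}(y\mid x)}{p_{\theta_t}(y\mid x)}-\lambda\log\frac{p_{\theta}(y'\mid x)}{p_{\theta_t}(y'\mid x)}\right)\right],\qquad \ell(t)=\log\!\left(1+e^{-t}\right).
$$

If $p_{\theta_t}$ already equals $p_{\mathrm{data}}$, then $y$ and $y'$ are identically distributed and $\theta=\theta_t$ is a global minimizer, which is the fixed-point behaviour the question refers to.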

The following part of the paper explains the difference between SPIN and DPO. It claims that DPO improves the model using **instance**-level information while SPIN works at the **distribution**...
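For comparison, a sketch of the standard DPO loss (the symbols $\beta$, $\pi_{\mathrm{ref}}$, and the pair $(y_w, y_l)$ follow the DPO paper, not this repo):

$$
L_{\mathrm{DPO}}(\theta)=-\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log\sigma\!\left(\beta\log\frac{\pi_{\theta}(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}-\beta\log\frac{\pi_{\theta}(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right].
$$

Here $(y_w, y_l)$ is a fixed labelled pair per prompt, whereas in SPIN the rejected response is resampled from the opponent $p_{\theta_t}$, so the expectation runs over the opponent's whole output distribution rather than over individual labelled instances.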

In the paper, it seems that you combined the two steps of reinforcement learning into one step, forming an end-to-end training method. The specific algorithm is shown in the figure....
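For anyone skimming the thread, the overall procedure as I understand it from the paper alternates between generating synthetic responses with the current model and running a DPO-style update against it; the function names below are hypothetical placeholders, not this repository's API:

```python
# Hypothetical sketch of the SPIN outer loop as described in the paper.
# `generate_responses`, `build_pairs`, and `train_with_spin_loss` are
# placeholder names, not functions from this repository.
def self_play_fine_tune(model, sft_prompts, sft_responses, num_iters=3):
    opponent = model  # iteration 0 starts from the SFT model
    for t in range(num_iters):
        # Self-play step: the frozen opponent produces synthetic responses.
        synthetic = generate_responses(opponent, sft_prompts)
        # Pair real SFT responses ("chosen") with synthetic ones ("rejected").
        pairs = build_pairs(sft_prompts, real=sft_responses, generated=synthetic)
        # Single end-to-end update: a DPO-style loss against the opponent,
        # rather than separate reward-modelling and RL stages.
        model = train_with_spin_loss(model, ref_model=opponent, pairs=pairs)
        opponent = model  # the updated model becomes the next opponent
    return model
```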

Hello! Thanks for the open-source code release. I have been trying to run fine-tuning with a phi-2 3B model on a 40GB A100 GPU; while running `accelerate launch spin/run_spin.py...
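Not the repo's actual configuration, but for anyone hitting out-of-memory errors on a single 40GB card, the usual generic `transformers` levers look something like this (values below are illustrative only, not the settings used by `run_spin.py`):

```python
# Illustrative memory-saving settings for a single 40GB GPU; these are
# generic transformers TrainingArguments, not the repo's config files.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs/phi-2-spin-iter0",  # hypothetical path
    per_device_train_batch_size=1,          # keep activation memory small
    gradient_accumulation_steps=16,         # recover the effective batch size
    gradient_checkpointing=True,            # trade compute for memory
    bf16=True,                              # half-precision training on A100
)
```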

Is the checkpoint provided in this repo trained with the old version of zephyr-7b-sft-full?

- base model: `alignment-handbook/zephyr-7b-sft-full`
- train data: `UCLA-AGI/SPIN_iter0`

I used the default hyper-parameters to train the model and tested it locally with `HuggingFaceH4/open_llm_leaderboard`. The results on `allenai/ai2_arc` are as follows:...

Hi, Thank you for your work. We're re-evaluating experiments using an updated SFT ckpt from https://huggingface.co/alignment-handbook/zephyr-7b-sft-full and using lm-evaluation-harness v0.4.0 for evaluation. We've noticed a significant performance drop in GSM8k....
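For anyone re-running these numbers, a minimal local evaluation sketch with lm-evaluation-harness v0.4.x; the checkpoint path, few-shot count, and batch size below are illustrative, not the settings used in the paper:

```python
# Minimal GSM8K evaluation with lm-evaluation-harness v0.4.x.
# Checkpoint, few-shot count, and batch size are illustrative only.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=alignment-handbook/zephyr-7b-sft-full,dtype=bfloat16",
    tasks=["gsm8k"],
    num_fewshot=5,
    batch_size=8,
)
print(results["results"]["gsm8k"])
```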

This awesome work is quite interesting. Did you ever try: 1. training with SPIN only; 2. training with SFT + SPIN + DPO; 3. mixing SPIN +...