Huizhuo Angela Yuan comments

Results: 24 comments by Huizhuo Angela Yuan

Thesis discussion: Why can the end-to-end algorithm work properly?

> In the paper, it seems that you combined the two steps of reinforcement learning into one step, forming an end-to-end training method. The specific algorithm is shown in the...

Thesis discussion: Why can the end-to-end algorithm work properly?

First of all, the algorithm requires a monotonically decreasing and convex function $\ell$. Note that $\ell(0) = \log(2) \approx 0.6931$. Therefore, the value of this formula is...
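As a quick sanity check of the properties mentioned in this comment, here is a minimal sketch that assumes $\ell$ is the logistic loss $\ell(t) = \log(1 + e^{-t})$. The comment excerpt does not name the function, so this choice is an assumption; it is consistent with the stated value $\ell(0) = \log(2)$. The function name `logistic_loss` and the evaluation grid are illustrative only.

```python
import math


def logistic_loss(t: float) -> float:
    """Assumed form of ell: log(1 + exp(-t)), written in a numerically stable way."""
    # For t >= 0 this is log1p(exp(-t)); for t < 0 it is -t + log1p(exp(t)).
    return max(-t, 0.0) + math.log1p(math.exp(-abs(t)))


# Value at zero, matching the figure quoted in the comment.
value_at_zero = logistic_loss(0.0)

# Evaluate on an evenly spaced grid from -3 to 3 to check shape numerically.
grid = [-3.0 + 0.5 * i for i in range(13)]
values = [logistic_loss(t) for t in grid]

# Monotonically decreasing: each value is larger than the next one.
is_decreasing = all(a > b for a, b in zip(values, values[1:]))

# Convex: second differences on an evenly spaced grid are non-negative.
is_convex = all(
    values[i - 1] - 2.0 * values[i] + values[i + 1] >= 0.0
    for i in range(1, len(values) - 1)
)

print(round(value_at_zero, 4), is_decreasing, is_convex)
```

The grid check is only a numerical illustration, not a proof; any other decreasing convex $\ell$ would pass the same two checks but generally give a different value at zero.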

Confused about iterations

> Hi there, great job on the project!
>
> I'm looking to clarify whether the UCLA-AGI/zephyr-7b-sft-full-SPIN-iter1 model was fine-tuned on top of UCLA-AGI/zephyr-7b-sft-full-SPIN-iter0 or alignment-handbook/zephyr-7b-sft-full. The paper suggests that...

Confused about iterations

Hi @junkangwu, thanks for your follow-up question. In all iterations, the num_train_epochs parameter is set to 6. This setting is enforced explicitly in the training script. For instance, in...

Question about the checkpoint provided in this repo

Yes, and we have updated the scripts and guidelines to reproduce the results in our paper. We pinned the model version by adding a model_revision parameter in config.yaml. Please check our recent...
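To illustrate what pinning a revision means in practice, here is a hedged sketch of how a `model_revision` value could be passed to the Hugging Face `from_pretrained` call through its `revision` argument. The dictionary below is a hypothetical stand-in for config.yaml, not the repository's actual file. The base model name is an assumption taken from the iterations thread, and the commit hash is the one quoted in the question in this thread. The helper names `revision_kwargs` and `load_pinned_model` are made up for this example.

```python
# Hypothetical stand-in for the relevant fields of config.yaml.
config = {
    # Assumed base model, as named in the "Confused about iterations" thread.
    "model_name_or_path": "alignment-handbook/zephyr-7b-sft-full",
    # Commit hash quoted in the question in this thread.
    "model_revision": "ac6e600eefcce74f5e8bae1035d4f66019e93190",
}


def revision_kwargs(cfg: dict) -> dict:
    """Build the keyword arguments that pin the checkpoint version."""
    # Fall back to the default branch when no revision is configured.
    return {"revision": cfg.get("model_revision", "main")}


def load_pinned_model(cfg: dict):
    """Load the model at the pinned revision (downloads the weights when called)."""
    # Imported lazily so the rest of the snippet runs without transformers installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        cfg["model_name_or_path"], **revision_kwargs(cfg)
    )


print(revision_kwargs(config))
```

Pinning to a commit hash rather than a branch name keeps the starting checkpoint fixed even if the model repository is updated later, which is the point of the reply above.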

Question about the checkpoint provided in this repo

> If I want to reproduce the results reported in SPIN's paper, should I use revision=ac6e600eefcce74f5e8bae1035d4f66019e93190 and train with the SPIN dataset (iter 0, 1, 2, 3) provided on Hugging Face? https://github.com/uclaml/SPIN/blob/main/configs/config.yaml

Unable to reproduce performance

SPIN == DPO in self-iteration?

GPU Memory question