Results: 2 issues by srzhu97
I noticed that in [data-mistral-7b-instruct-sppo-iter1](https://huggingface.co/datasets/UCLA-AGI/data-mistral-7b-instruct-sppo-iter1) the `rm_scores` column is a list of length 7, whereas there are only 5 generated responses. data-mistral-7b-instruct-sppo-iter2 and data-mistral-7b-instruct-sppo-iter3 look correct: both have `rm_scores` of length 5.
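A length mismatch like this could be checked with a short script. The following is only a minimal sketch: the helper names `rm_score_length_counts` and `find_length_mismatches` are hypothetical, the toy rows are made-up placeholders rather than real dataset values, and the only thing taken from the report is the `rm_scores` column name and the expected count of 5.

```python
from collections import Counter
from typing import Any, Iterable, List, Mapping


def rm_score_length_counts(
    rows: Iterable[Mapping[str, Any]], column: str = "rm_scores"
) -> Counter:
    """Count how many rows have each `rm_scores` list length."""
    return Counter(len(row[column]) for row in rows)


def find_length_mismatches(
    rows: Iterable[Mapping[str, Any]], expected: int = 5, column: str = "rm_scores"
) -> List[int]:
    """Return the indices of rows whose score list is not `expected` long."""
    return [i for i, row in enumerate(rows) if len(row[column]) != expected]


# Toy rows for illustration only (placeholder scores, not real data).
toy_rows = [
    {"rm_scores": [0.1, 0.2, 0.3, 0.4, 0.5]},  # 5 scores: matches 5 responses
    {"rm_scores": [0.1] * 7},                  # 7 scores: the reported mismatch
]

print(rm_score_length_counts(toy_rows))
print(find_length_mismatches(toy_rows))
```

To run the same check on the real data, the rows could presumably be obtained with `datasets.load_dataset("UCLA-AGI/data-mistral-7b-instruct-sppo-iter1")`; the split name to pass is not stated here, so it would need to be checked on the dataset page.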
I am trying to reproduce the Mistral-7B-SPPO Iter1 model. However, after my first iteration, the model I trained diverged significantly from the published Mistral-7B-SPPO Iter1 model when comparing the results...