matouk98
Thanks a lot for your great work! I have learned a lot from your code. I have a question about optim.py: line 67 calls the function getattr, but I couldn't find them...
Hi, hanjun. Thanks a lot for your great work! I have a question about the hierarchical Q-Learning mentioned in the paper. In equation 11, there are 2M Q functions and...
Hi, congratulations on the great work, and thanks for open-sourcing it! I am running step 3.2 with pair-preference-model-LLaMA3-8B. However, I encountered the warning "Some weights of LlamaForSequenceClassification were not initialized...
I have some questions about the iterative pipeline. Please correct me if my understanding is wrong. Thank you so much! From the report, \pi_0 should be the SFT policy trained...