On the Use of v Vector in Direction Preference Alignment Method

Open LiuChen19960902 opened this issue 1 year ago • 0 comments

Hello,

Thank you for sharing your excellent paper on "Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards". I have some questions regarding the Direction Preference Alignment method, specifically about how the user preference vector ( v ) is incorporated during the model training process.

To be more specific, I would like to understand how the user preference vector ( v ) is actually integrated into the model during the training phase. From my understanding, should the attribute weights be directly concatenated onto the prompt (as you mentioned in the system prompt)?

I look forward to your response. Thanks!

Jun 05 '24 09:06 LiuChen19960902