direct-preference-optimization


Where is the config file for IPO?

Open • 3244we opened this issue 1 year ago • 1 comment

It seems that the IPO config file is missing here, which prevents IPO from running.

May 07 '24 15:05 3244we

```yaml
# do DPO preference-based training
name: ipo

# the temperature parameter for DPO; lower values mean we care less about
# the reference model
beta: ???

# the noise parameter for conservative DPO; should be in range (0, 0.5); interpreted as
# the fraction of preference pairs that are flipped
# eps=0 is the original DPO loss in the DPO paper
label_smoothing: 0

# if true, use a uniform (maximum entropy) reference model
reference_free: false
```
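For reference, here is a minimal sketch of how the three fields in this config could enter a per-pair loss. It assumes the standard DPO loss, the conservative-DPO variant described in the `label_smoothing` comment, and the IPO squared loss $(h - 1/(2\beta))^2$, with the config's `beta` assumed to stand in for IPO's regularization strength. The function name `preference_losses` and its `ipo` flag are hypothetical and are not this repository's API; plain Python floats stand in for tensors. Treating `reference_free` as a zero reference log-ratio is also an assumption of this sketch. In OmegaConf-style YAML, `beta: ???` marks a mandatory value that must be filled in before the config is used.

```python
import math
from typing import List, Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_losses(
    policy_chosen_logps: Sequence[float],
    policy_rejected_logps: Sequence[float],
    reference_chosen_logps: Sequence[float],
    reference_rejected_logps: Sequence[float],
    beta: float,
    label_smoothing: float = 0.0,
    reference_free: bool = False,
    ipo: bool = False,  # hypothetical switch: IPO loss instead of (conservative) DPO
) -> List[float]:
    """Per-pair preference losses from summed sequence log-probabilities (sketch)."""
    n = len(policy_chosen_logps)
    if not (len(policy_rejected_logps) == len(reference_chosen_logps)
            == len(reference_rejected_logps) == n):
        raise ValueError("all log-prob sequences must have the same length")

    losses = []
    for pc, pr, rc, rr in zip(policy_chosen_logps, policy_rejected_logps,
                              reference_chosen_logps, reference_rejected_logps):
        pi_logratio = pc - pr
        # Assumption: a uniform reference model gives both responses the same
        # log-probability, so its log-ratio is taken to be 0.
        ref_logratio = 0.0 if reference_free else rc - rr
        h = pi_logratio - ref_logratio
        if ipo:
            # IPO: squared loss pulling h toward 1/(2*beta); label_smoothing is unused.
            loss = (h - 1.0 / (2.0 * beta)) ** 2
        else:
            # Conservative DPO: a fraction label_smoothing of pairs is assumed flipped;
            # label_smoothing=0 gives the original DPO loss -log sigmoid(beta * h).
            loss = (-(1.0 - label_smoothing) * log_sigmoid(beta * h)
                    - label_smoothing * log_sigmoid(-beta * h))
        losses.append(loss)
    return losses


if __name__ == "__main__":
    # The policy prefers the chosen response more strongly than the reference does.
    args = ([-10.0], [-14.0], [-11.0], [-13.0])
    print(preference_losses(*args, beta=0.1))
    print(preference_losses(*args, beta=0.1, label_smoothing=0.1))
    print(preference_losses(*args, beta=0.1, ipo=True))
```

In this sketch `label_smoothing` only affects the DPO branch, whereas `reference_free` changes the log-ratio `h` used by both branches.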

Is this config right?

May 07 '24 15:05 3244we