direct-preference-optimization
Where is the config file for IPO?
It seems that the IPO config file is missing from the repo, which prevents IPO training from running. I drafted the following by adapting the DPO loss config:
```yaml
# do IPO preference-based training
name: ipo

# the temperature parameter for DPO; lower values mean we care less about
#   the reference model
beta: ???

# the noise parameter for conservative DPO; should be in range (0, 0.5); interpreted as
#   the fraction of preference pairs that are flipped
# eps=0 is the original DPO loss in the DPO paper
label_smoothing: 0

# if true, use a uniform (maximum entropy) reference model
reference_free: false
```
Is this config correct?
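For context, here is a minimal sketch of how I understand a loss config like this gets consumed, following the IPO paper's squared loss (logits - 1/(2*beta))^2 in place of DPO's logistic loss. The function name `preference_loss` and its exact signature are my assumptions, not necessarily what the repo's trainers.py exposes:

```python
import torch
import torch.nn.functional as F

def preference_loss(policy_chosen_logps: torch.Tensor,
                    policy_rejected_logps: torch.Tensor,
                    reference_chosen_logps: torch.Tensor,
                    reference_rejected_logps: torch.Tensor,
                    beta: float,
                    label_smoothing: float = 0.0,
                    ipo: bool = False,
                    reference_free: bool = False) -> torch.Tensor:
    """Per-pair DPO/IPO loss over summed per-sequence log-probs (sketch)."""
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = reference_chosen_logps - reference_rejected_logps
    if reference_free:
        # reference_free: true means a uniform (maximum entropy) reference,
        # so the reference term cancels out of the log-ratio
        ref_logratios = torch.zeros_like(pi_logratios)
    logits = pi_logratios - ref_logratios
    if ipo:
        # IPO: squared regression of the log-ratio gap toward 1/(2*beta);
        # here beta plays the role of tau in the IPO paper
        losses = (logits - 1 / (2 * beta)) ** 2
    else:
        # (conservative) DPO: label_smoothing is the eps flip probability;
        # label_smoothing=0 recovers the original DPO loss
        losses = (-F.logsigmoid(beta * logits) * (1 - label_smoothing)
                  - F.logsigmoid(-beta * logits) * label_smoothing)
    return losses
```

If the real implementation looks like this, `label_smoothing` is never read on the IPO path, so leaving it at 0 should be harmless.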
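One more thing worth noting: `???` is Hydra's marker for a mandatory value, so `beta` still has to be supplied at launch time, e.g. something like `python -u train.py loss=ipo loss.beta=0.1` (other flags omitted). That assumes the file above is saved as `config/loss/ipo.yaml` so Hydra can resolve `loss=ipo`.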