
Was it your intention to recreate the wandb tables inside the training iterator loop?

Open · huskydoge opened this issue · 0 comments

https://github.com/eric-mitchell/direct-preference-optimization/blob/f8b8c0f49dc92a430bae41585f9d467d3618fe2f/trainers.py#L297C1-L302C1

```python
if self.config.sample_during_eval:
    all_policy_samples, all_reference_samples = [], []
    policy_text_table = wandb.Table(columns=["step", "prompt", "sample"])
    if self.config.loss.name in {'dpo', 'ipo'}:
        reference_text_table = wandb.Table(columns=["step", "prompt", "sample"])
```

I just want to make sure this isn't a bug. Since the table has a "step" column, I assume it is meant to record the samples generated during eval over the whole training run. However, in the wandb UI I only see `eval_batch_size` rows of policy/reference samples from the first eval, and the table is never updated after that.
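If the table is meant to span the whole run, one option is to keep the rows in an object created once (for example in the trainer's `__init__`) rather than building a new `wandb.Table` inside the eval branch. The sketch below is a minimal, dependency-free illustration of that idea; `EvalSampleHistory` and its `add` method are hypothetical names, not part of the repo, and a trainer would need one instance for policy samples and another for reference samples.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class EvalSampleHistory:
    """Hypothetical helper that keeps eval samples for the whole run.

    It is created once, outside the eval branch, so rows from earlier
    evals are not discarded when the next eval starts.
    """
    columns: List[str] = field(default_factory=lambda: ["step", "prompt", "sample"])
    rows: List[list] = field(default_factory=list)

    def add(self, step: int, prompts: Sequence[str], samples: Sequence[str]) -> None:
        # One row per (prompt, sample) pair, tagged with the current step.
        for prompt, sample in zip(prompts, samples):
            self.rows.append([step, prompt, sample])


# Simulate two evals: the history keeps the rows from both of them.
history = EvalSampleHistory()
for step in (0, 1000):
    history.add(step, ["p1", "p2"], [f"s1@{step}", f"s2@{step}"])
```

With this layout the "step" column actually distinguishes rows from different evals, which is what the column seems intended for.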


Besides, on the topic of updating wandb tables, there is a still-unresolved bug in wandb itself that affects this code:
https://github.com/eric-mitchell/direct-preference-optimization/blob/f8b8c0f49dc92a430bae41585f9d467d3618fe2f/trainers.py#L341C1-L345C1

```python
if self.config.sample_during_eval:
    wandb.log({"policy_samples": policy_text_table}, step=self.example_counter)
    if self.config.loss.name in {'dpo', 'ipo'}:
        wandb.log({"reference_samples": reference_text_table}, step=self.example_counter)
```

Logging the table again under the same key like this does not update it in the wandb UI; only the first logged version is shown.

  • Here is a possible solution: https://github.com/wandb/wandb/issues/2981#issuecomment-1997445737
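A commonly suggested workaround for this kind of wandb behavior is to build a brand-new table object from all accumulated rows before every `wandb.log` call, instead of re-logging the same object. This is an assumption about the general approach, not a transcription of the linked comment, so check that comment for specifics. The helper `fresh_table` is hypothetical; the table class is passed in so the sketch runs without wandb installed, and with wandb it would be `wandb.Table`, whose constructor accepts `columns` and `data` keyword arguments.

```python
from typing import Any, Callable, Sequence


def fresh_table(
    table_cls: Callable[..., Any],
    columns: Sequence[str],
    rows: Sequence[Sequence[Any]],
) -> Any:
    """Hypothetical helper: build a new table from all rows logged so far.

    A new object is constructed for every log call rather than mutating
    and re-logging the same table (assumed workaround).
    """
    # Copy each row so the logged snapshot is independent of later appends.
    return table_cls(columns=list(columns), data=[list(r) for r in rows])


# Sketch of use inside the trainer (assumes `rows` accumulated across evals):
# wandb.log({"policy_samples": fresh_table(wandb.Table, columns, rows)},
#           step=self.example_counter)
```

Because each call produces a separate object containing every row so far, the most recently logged table should reflect all evals up to that step.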

huskydoge, Apr 04 '24 15:04