Was it your intention to recreate the wandb tables inside the training iterator on every eval?
https://github.com/eric-mitchell/direct-preference-optimization/blob/f8b8c0f49dc92a430bae41585f9d467d3618fe2f/trainers.py#L297C1-L302C1
```python
if self.config.sample_during_eval:
    all_policy_samples, all_reference_samples = [], []
    policy_text_table = wandb.Table(columns=["step", "prompt", "sample"])
    if self.config.loss.name in {'dpo', 'ipo'}:
        reference_text_table = wandb.Table(columns=["step", "prompt", "sample"])
```
Just to make sure this isn't a bug: since there is a "step" column, I assume the table is meant to record samples from every eval throughout the whole training run. However, in the wandb UI I only see eval_batch_size rows of policy/reference samples from the first eval, and the table is never updated afterwards. Note also that these tables are re-created at the start of every eval, so any single table can only ever hold one eval's worth of samples anyway.
Besides, updating a wandb table runs into an actual bug in wandb that remains unresolved, and the logging code here is affected:
https://github.com/eric-mitchell/direct-preference-optimization/blob/f8b8c0f49dc92a430bae41585f9d467d3618fe2f/trainers.py#L341C1-L345C1
```python
if self.config.sample_during_eval:
    wandb.log({"policy_samples": policy_text_table}, step=self.example_counter)
    if self.config.loss.name in {'dpo', 'ipo'}:
        wandb.log({"reference_samples": reference_text_table}, step=self.example_counter)
```
This won't update the table in the wandb UI: as far as I can tell, wandb treats a Table as immutable once it has been logged, so logging the same object again under the same key does not refresh what is displayed.
- Here is a possible solution: https://github.com/wandb/wandb/issues/2981#issuecomment-1997445737
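
For anyone hitting this, here is a minimal sketch of that workaround as I understand it (the helper name `log_policy_samples` and the surrounding variables are mine, not from the repo): accumulate the rows in a plain list that lives outside the per-eval block, and construct a brand-new `wandb.Table` from that list every time you call `wandb.log`.

```python
import wandb

# Sketch of the workaround, assuming wandb.init() has already been called.
# The idea: keep the accumulated rows yourself and build a *fresh* Table on
# every log call, instead of mutating and re-logging the same Table object.

columns = ["step", "prompt", "sample"]
all_rows = []  # persists across evals, instead of being recreated each eval


def log_policy_samples(samples, example_counter):
    """samples: iterable of (prompt, sample) pairs from the current eval."""
    for prompt, sample in samples:
        all_rows.append([example_counter, prompt, sample])
    # A new Table object each time, so wandb doesn't skip the update.
    table = wandb.Table(columns=columns, data=all_rows)
    wandb.log({"policy_samples": table}, step=example_counter)
```

Applied to trainers.py, that would mean moving the row accumulation out of the per-eval block and rebuilding `policy_text_table` / `reference_text_table` right before each `wandb.log` call.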