
Recommendations for obtaining validation dataset loss after each epoch

Open dcsuka opened this issue 1 year ago • 3 comments

For finetuning using a custom dataset, message converter function, and csv column format, how do we obtain validation losses on a separate csv with the same format at the end of each epoch? Do we need to wait until after training to run on all the checkpointed files?

Also, how can we generate outputs using the same message converter function and tune run generate, using a csv file with a single row as input?

dcsuka avatar Jun 01 '24 00:06 dcsuka

None of our recipes currently support running a validation check each epoch. The easiest way to get this functionality would be to copy the recipe you want, duplicate _setup_data as something like _setup_val_data, call it during setup, and then add a validation loop after the training loop at the end of each epoch.

You can also request that this be in our recipes by default, but we'd have to discuss whether it's worth the extra complexity. As for the generate recipe, it's not meant for generating from a csv; that seems more like an evaluation flow, but we can discuss that too.
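A minimal sketch of the pattern described above: duplicate the data setup for a validation split, then run an eval pass after each epoch's training loop. The names `_setup_val_data`, `model`, and `loss_fn` here are illustrative stand-ins, not torchtune APIs; the real recipe would build DataLoaders from your csv and use the recipe's model and loss.

```python
def _setup_data(rows):
    # Stand-in: a real recipe builds a DataLoader from the training csv here.
    return list(rows)

def _setup_val_data(rows):
    # Same as _setup_data, but fed the separate validation csv.
    return list(rows)

def loss_fn(pred, target):
    # Stand-in loss: squared error.
    return (pred - target) ** 2

def model(x):
    # Stand-in "model": identity plus a fixed offset.
    return x + 0.5

def train(train_rows, val_rows, epochs=2):
    train_data = _setup_data(train_rows)
    val_data = _setup_val_data(val_rows)
    val_losses = []
    for epoch in range(epochs):
        for x, y in train_data:
            _ = loss_fn(model(x), y)  # training step (optimizer omitted)
        # Validation pass after this epoch's training loop,
        # with no gradient updates.
        total = sum(loss_fn(model(x), y) for x, y in val_data)
        val_losses.append(total / len(val_data))
    return val_losses

losses = train([(1.0, 1.5)], [(2.0, 2.5), (3.0, 3.0)], epochs=2)
print(losses)  # one averaged validation loss per epoch
```

In a real recipe you would also wrap the validation pass in `torch.no_grad()` and set the model to eval mode before it.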

pbontrager avatar Jun 03 '24 16:06 pbontrager

I'm glad you brought this up because this is a common workflow (validation or generation while training) that we need to improve on. I would follow @pbontrager's suggestion of modifying our existing recipe with a validation step, but I do think this should eventually be a default recipe or an option in an existing recipe. @pbontrager maybe we should start setting up a location for community contributed recipes like this one that would be widely useful.

For generating on a single row of custom csv data, you can use utils.generate directly after loading your csv file, applying the same transforms (such as through InstructDataset or ChatDataset), and running generate on a single row of token IDs at a time. Or you can update the generate recipe locally to add this flexibility. We should consider providing more direct examples of this in our documentation, or the option to run on a custom dataset (cc @ebsmothers)
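The flow suggested above can be sketched as follows: read one row from a csv with the same columns used for training, apply the same message-converter transform, and pass the resulting token IDs to a generate utility. The `to_prompt`, `tokenize`, and `generate` functions below are stubs standing in for your custom converter, your configured tokenizer, and the generation utility; they are not torchtune APIs.

```python
import csv
import io

def to_prompt(row):
    # Stand-in for the custom message converter used at training time.
    return f"{row['instruction']}\n{row['input']}"

def tokenize(text):
    # Stub tokenizer: one "token ID" per character.
    return [ord(c) for c in text]

def generate(token_ids, max_new_tokens=4):
    # Stub for a generate utility: echoes the prompt tokens
    # followed by dummy generated tokens.
    return token_ids + [0] * max_new_tokens

# A single-row csv in the same column format as the training data.
csv_text = "instruction,input\nSummarize,hello\n"
row = next(csv.DictReader(io.StringIO(csv_text)))

tokens = tokenize(to_prompt(row))
output = generate(tokens)
print(len(output) - len(tokens))  # number of newly generated tokens
```

The key point is that the csv row goes through the identical prompt/tokenization path as training before reaching generation, so the model sees the format it was finetuned on.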

RdoubleA avatar Jun 03 '24 16:06 RdoubleA

@dcsuka Were you able to add the validation loss to your recipe?

MaxFrax avatar Sep 25 '24 14:09 MaxFrax

I've started working on this for lora_singledevice_finetune https://github.com/MaxFrax/torchtune/tree/add_validation_set_lorasingledevice

If anybody wants to contribute or share tips, feel free to reach out

MaxFrax avatar Nov 15 '24 16:11 MaxFrax

hey @MaxFrax, that's so nice! :)

We are aware of this gap in torchtune, and it's something that we will focus on next quarter. We do support evaluation through EleutherAI's LM Evaluation Harness, but our docs don't yet cover what users should do when evaluating on custom datasets.

We would love to take a look at your fork when you are done with your changes!

felipemello1 avatar Nov 15 '24 16:11 felipemello1