segyges

Results 2 comments of segyges

This needs to cook more. Apparently protobuf is actually used, and I think we probably want to change the version of flashattention.

If dataset processing was repeated between the runs, it's possible the train/val split is different; if any of the data in the validation set was previously in the training set...
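
One rough way to check for this kind of overlap is sketched below. It assumes each example can be reduced to a plain string; `fingerprint`, `leaked_examples`, and `assign_split` are hypothetical helper names for illustration, not anything from the project's actual data pipeline. The second helper shows one possible way to keep the split stable across reprocessing by deriving it from a content hash instead of a shuffle.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Stable content hash of one example (assumes examples are plain strings)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def leaked_examples(old_train, new_val):
    """Return the items of new_val whose content also appeared in old_train."""
    seen = {fingerprint(t) for t in old_train}
    return [t for t in new_val if fingerprint(t) in seen]


def assign_split(text: str, val_fraction: float = 0.1) -> str:
    """Assign an example to 'train' or 'val' from its hash alone.

    The assignment depends only on the content, so re-running the
    processing step gives every example the same split as before.
    """
    # Map the first 32 bits of the hash to a number in [0, 1).
    bucket = int(fingerprint(text)[:8], 16) / 0x100000000
    return "val" if bucket < val_fraction else "train"
```

If `leaked_examples(old_train, new_val)` comes back non-empty, validation numbers from the second run are not directly comparable to the first.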