2 comments by segyges
This needs to cook more. Apparently protobuf is actually used, and I think we probably want to change the version of FlashAttention.
If dataset processing was repeated between the runs, it's possible the train/val split is different; if any of the data in the validation set was previously in the training set...
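
One way to rule this out, as a rough sketch only: derive the split from a hash of a stable per-document ID so that reprocessing yields the same split, then check the new validation IDs against the earlier training IDs. The names `assign_split` and `leaked_ids`, the ID format, and the 5% validation fraction are hypothetical and not taken from this repo's data pipeline.

```python
import hashlib


def assign_split(doc_id: str, val_fraction: float = 0.05) -> str:
    """Assign a document to 'train' or 'val' based on a hash of its ID.

    Hypothetical helper: the result depends only on doc_id and val_fraction,
    so repeating dataset processing does not move documents between splits.
    """
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    # Reduce the first 8 bytes of the digest to a bucket in 0..9999.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "val" if bucket < val_fraction * 10_000 else "train"


def leaked_ids(old_train_ids, new_val_ids):
    """Return IDs in the new validation set that were in the earlier training set."""
    return set(old_train_ids) & set(new_val_ids)


if __name__ == "__main__":
    # Illustrative IDs only; real IDs would come from the dataset.
    ids = [f"doc-{i}" for i in range(1000)]
    val_ids = [d for d in ids if assign_split(d) == "val"]
    print(f"{len(val_ids)} of {len(ids)} documents assigned to val")
    print("overlap with an earlier training set:", leaked_ids(["doc-1", "doc-2"], val_ids))
```

If `leaked_ids` comes back non-empty for the two runs being compared, the validation numbers from the later run are not comparable to the earlier ones.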