OneTrainer [Enhancement] New features about text augmentation

Hello,

It would be good to have a preview for text augmentation, just like the one for image augmentation. There are also some potential augmentations for captions:

Randomly drop a caption chunk with a given probability. This would need a list of strings to protect caption chunks that the user doesn't want dropped.
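
To make the idea concrete, here is a minimal sketch of what I have in mind. It is not existing OneTrainer code: the function name `drop_caption_chunks`, its parameters, the comma delimiter, and the substring matching against the protected list are all assumptions made for illustration.

```python
import random


def drop_caption_chunks(caption, drop_prob, keep=(), delimiter=",", rng=None):
    """Drop each delimiter-separated chunk with probability `drop_prob`.

    Hypothetical helper: a chunk that contains any string from `keep`
    (substring match, an assumption) is never dropped.
    """
    rng = rng or random.Random()
    # Split into chunks and discard empty ones (e.g. from a trailing comma).
    chunks = [c.strip() for c in caption.split(delimiter) if c.strip()]
    kept = [
        c for c in chunks
        # Protected chunks always survive; others survive with 1 - drop_prob.
        if any(k in c for k in keep) or rng.random() >= drop_prob
    ]
    return (delimiter + " ").join(kept)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the example is repeatable
    print(drop_caption_chunks("1girl, red hair, outdoors, smiling", 0.5,
                              keep=["1girl"], rng=rng))
```

Passing a seeded `rng` would keep the augmentation reproducible across runs, and a preview could simply call this a few times on one caption and list the results.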

I also have a question about dataset shuffling. Does the msg dataloader shuffle the order of the training data in each epoch? I think it should be normal to always shuffle the dataset.

Many thanks!

Dec 01 '23 11:12 ultraman-blazar

Yes, the dataset is shuffled on each epoch. The shuffling is deterministic. So if you restart training it will use the same order again, starting from the point where you stopped.
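
As a rough illustration of the idea (this is not the actual dataloader code; the function names and the way the seed and epoch are combined are assumptions for the sketch), deterministic per-epoch shuffling with resume can look like this:

```python
import random


def epoch_order(num_samples, seed, epoch):
    """Return the sample order for one epoch.

    Hypothetical sketch: the RNG is seeded from (seed, epoch), so the same
    epoch always yields the same permutation of the sample indices.
    """
    rng = random.Random(seed * 1_000_003 + epoch)
    order = list(range(num_samples))
    rng.shuffle(order)
    return order


def remaining_after_restart(num_samples, seed, epoch, samples_done):
    """Rebuild the epoch's order and skip what was already consumed."""
    return epoch_order(num_samples, seed, epoch)[samples_done:]


if __name__ == "__main__":
    full = epoch_order(8, seed=42, epoch=3)
    resumed = remaining_after_restart(8, seed=42, epoch=3, samples_done=5)
    print(full)
    print(resumed)  # the last three entries of `full`
```

Because the order depends only on the seed and the epoch number, nothing about the permutation has to be saved in a checkpoint beyond the position reached.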

Dec 02 '23 00:12 Nerogar

Thanks! That explains why the loss curves are identical when I repeat the exact same experiment.

Dec 02 '23 05:12 ultraman-blazar

@Nisekoixmy This has been mostly implemented in the merged pull request #518.

Nov 10 '24 00:11 O-J1