[Enhancement] New features about text augmentation

Open ultraman-blazar opened this issue 2 years ago • 2 comments

Hello,

It would be useful to have a preview for text augmentation, just like the one for image augmentation. There are also some potential augmentations for captions:

  • Randomly drop caption chunks with a given probability. This needs a list of strings naming the caption chunks that the user does not want dropped.
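To make the request concrete, here is a minimal sketch of what such a caption-dropout augmentation could look like. It is not OneTrainer code: the function name `drop_caption_chunks`, its parameters, and the assumption that chunks are comma-separated are all hypothetical and only serve as illustration.

```python
import random


def drop_caption_chunks(caption, drop_prob, keep=(), rng=None, sep=","):
    """Drop each caption chunk with probability drop_prob.

    Hypothetical helper for illustration only. Chunks listed in `keep`
    (compared case-insensitively) are never dropped.
    """
    rng = rng or random.Random()
    protected = {k.strip().lower() for k in keep}
    # Split the caption into chunks and discard empty ones.
    chunks = [c.strip() for c in caption.split(sep) if c.strip()]
    kept = [
        c for c in chunks
        # A protected chunk always survives; any other chunk survives
        # only when the random draw is at or above drop_prob.
        if c.lower() in protected or rng.random() >= drop_prob
    ]
    return (sep + " ").join(kept)


if __name__ == "__main__":
    caption = "1girl, red hair, smiling, outdoors"
    # Passing a seeded generator makes the augmentation reproducible.
    print(drop_caption_chunks(caption, 0.5, keep=["1girl"], rng=random.Random(0)))
```

Passing a seeded `random.Random` keeps the result reproducible, which would also make a preview of the augmentation stable between refreshes.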

I also have a question about dataset shuffling. Does the msg dataloader shuffle the order of the training data in each epoch? I think it is normal to always shuffle the dataset.

Many thanks!

ultraman-blazar, Dec 01 '23 11:12

> I also have a question about dataset shuffling. Does the msg dataloader shuffle the order of the training data in each epoch? I think it is normal to always shuffle the dataset.

Yes, the dataset is shuffled on each epoch. The shuffling is deterministic. So if you restart training it will use the same order again, starting from the point where you stopped.
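As a rough illustration of what deterministic per-epoch shuffling with resume means, here is a small sketch. It is not the actual dataloader implementation: `epoch_order`, `remaining_after_resume`, and the way the seed is combined with the epoch number are assumptions made for the example.

```python
import random


def epoch_order(num_samples, epoch, seed=0):
    """Return the sample order for one epoch (hypothetical helper).

    The generator is seeded from (seed, epoch), so the same epoch always
    produces the same permutation, even after a restart.
    """
    rng = random.Random(seed * 1_000_003 + epoch)
    order = list(range(num_samples))
    rng.shuffle(order)
    return order


def remaining_after_resume(num_samples, epoch, step_in_epoch, seed=0):
    """Recompute the epoch's order and skip the samples already seen."""
    return epoch_order(num_samples, epoch, seed)[step_in_epoch:]


if __name__ == "__main__":
    full = epoch_order(10, epoch=3)
    resumed = remaining_after_resume(10, epoch=3, step_in_epoch=4)
    # Resuming continues exactly where the interrupted run stopped.
    print(full[4:] == resumed)
```

Because the order depends only on the seed and the epoch number, nothing about the shuffle has to be saved in a checkpoint apart from the current position.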

Nerogar, Dec 02 '23 00:12

> > I also have a question about dataset shuffling. Does the msg dataloader shuffle the order of the training data in each epoch? I think it is normal to always shuffle the dataset.
>
> Yes, the dataset is shuffled on each epoch. The shuffling is deterministic. So if you restart training it will use the same order again, starting from the point where you stopped.

Thanks! That explains what I see in the loss curves: they are identical when I rerun the exact same experiment.

ultraman-blazar, Dec 02 '23 05:12

@Nisekoixmy This has been mostly implemented in the merged pull request #518

O-J1, Nov 10 '24 00:11