
[Feat] Clearer handling of cropping and resolutions

Open MartinoCesaratto opened this issue 1 year ago • 7 comments

Describe your use-case.

Right now the quick start guide suggests that I shouldn't really bother with resizing my dataset images, since OneTrainer will handle it if I activate resolution bucketing. However, I noticed that when selecting multiple training resolutions, a batch size of 1 uses all samples, but a batch size of 2 results in fewer than half the steps, so some images are no longer used.
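A plausible explanation (my assumption, not confirmed from OneTrainer's code) is that each resolution bucket only yields full batches, so every bucket independently drops its remainder images:

```python
# Hypothetical illustration: images per bucket (made-up counts).
buckets = {"512x640": 5, "640x512": 3, "768x768": 1}

batch_size = 2
# If incomplete batches are dropped per bucket, each bucket contributes
# floor(n / batch_size) steps and loses its remainder images.
steps = sum(n // batch_size for n in buckets.values())  # 2 + 1 + 0 = 3
used = steps * batch_size                               # 6 of 9 images
print(steps, used)
```

With batch size 1 the same buckets would give 9 steps and use all 9 images, which would match the behavior described above.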

What's not really clear is what actually happens. Let's take an example:

  • I set training resolutions to 512, 640, 768, 960.
  • I have a 639x641 image: is it always cropped to 512x640, or sometimes to 512x512?
  • I have a 256x320 image: is it upscaled to 512x640, or can it sometimes end up at 768x960?

I also noticed that even with crop jitter enabled the preview is static. If I have a 1024x512 image, do I get crops like image[0:960, 0:512] and image[64:1024, 0:512], or are the crops always centered? Will it sometimes be cropped to resolutions other than 960x512?
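For reference, a minimal sketch of how aspect-ratio bucketing is commonly implemented (this is my assumption about the general technique, not OneTrainer's actual code): pick the allowed bucket whose aspect ratio is closest to the image's, then resize and crop to it.

```python
# Sketch: assign an image to the bucket with the nearest aspect ratio.
# `buckets` is a hypothetical list derived from the training resolutions.
def pick_bucket(width, height, buckets):
    aspect = width / height
    return min(buckets, key=lambda wh: abs(wh[0] / wh[1] - aspect))

# Buckets one might get from training resolutions 512 and 640 (assumed).
buckets = [(512, 512), (512, 640), (640, 512)]

print(pick_bucket(639, 641, buckets))  # aspect ~0.997 -> (512, 512)
```

Under this scheme the 639x641 example would always land in 512x512, because its aspect ratio is nearest to 1:1 — but whether OneTrainer behaves this way is exactly the question.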

What would you like to see as a solution?

I have 5 proposals to improve both clarity and training:

  1. "Use all images" option: when batch size > 1, always try to have batch_size images for every resolution, even if that means using crops that cover less of the original images
  2. Correctly show crop jitter's effect in the preview (assuming right now it only shows a centered square crop and not what's actually used)
  3. "Vary scaling" option: where possible, also use samples downscaled to lower resolutions, not only the maximum one
  4. When using samples below a set resolution (even if upscaled), optionally add a configurable tag (for example "low resolution, low quality") to the prompt; likewise above a certain resolution (for example "high resolution")
  5. Allow setting both horizontal and vertical resolutions, so that I can specify something like "384, 512x512, 768" and get "384x384, 384x768, 512x512, 768x768, 768x384" as the set of allowed resolutions
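Proposal 5 could be sketched as a small spec parser. The function name and semantics below are my own guess at the proposal (bare numbers pair with each other in both orientations, "WxH" fixes a single resolution), not an existing OneTrainer option:

```python
# Hypothetical parser for a spec like "384, 512x512, 768".
def expand_resolutions(spec):
    bare, fixed = [], []
    for token in spec.split(","):
        token = token.strip().lower()
        if "x" in token:
            w, h = token.split("x")
            fixed.append((int(w), int(h)))
        else:
            bare.append(int(token))
    # Bare numbers combine with each other (including themselves) in
    # both orientations; "WxH" tokens are taken as-is.
    allowed = set(fixed)
    for w in bare:
        for h in bare:
            allowed.add((w, h))
    return sorted(allowed)

print(expand_resolutions("384, 512x512, 768"))
# [(384, 384), (384, 768), (512, 512), (768, 384), (768, 768)]
```

This reproduces the set of allowed resolutions given in the example above.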

Have you considered alternatives? List them here.

Right now I could probably keep multiple copies of each image with different resolutions/aspect ratios/cropping, but it would take a lot of copies to truly cover every possible crop of every image.

MartinoCesaratto avatar Apr 26 '24 13:04 MartinoCesaratto

I really can't trust any of the scripts when it comes to bucketing and auto-resize.

I wish there were a button that processed the images and saved them exactly as they would be used during training.

So if bucketing is enabled, they could be saved under per-bucket-resolution folders.

FurkanGozukara avatar Apr 26 '24 14:04 FurkanGozukara

The main reason I'm asking is that I noticed training with multiple resolutions slightly improves quality, but on some datasets it seems to overfit to a subset of the images at each available resolution. I'd like each image to be used at multiple scales to prevent this.

MartinoCesaratto avatar Apr 26 '24 14:04 MartinoCesaratto

I have had similar questions about how scaling works

dathide avatar May 19 '24 21:05 dathide

What I can add is that if you set a number of repeats > 1 in your concept settings and enable crop jitter, you actually seem to get multiple versions of the same picture(s), judging by the number of cached items. This also makes sense given that each epoch performs a full pass over all training data. So my guess is that for each repeat a crop-jitter "instance" of each image is created, and each of these instances is used once per epoch.

gilga2024 avatar May 22 '24 18:05 gilga2024

I was wondering if an option to save all images and captions arranged in buckets, exactly as they were sent to torch, might be of help here.

What I mean: I'd like to run a training and end up with a directory containing a subdirectory per epoch, each with subdirectories for every step inside that epoch. Each step directory would hold a copy of all images in that step's batch, plus a prompt file for each image, exactly as they were when that batch was sent to torch.

The idea is that this would make it easy to see how OT changes images and prompts based on concept parameter settings, and how it groups the concept images into batches.
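A debug-dump helper in the spirit of this proposal could look like the sketch below. The batch structure (a list of (image_path, prompt) pairs) and the directory naming are my own assumptions, not OneTrainer's real internal format:

```python
# Hypothetical helper: copy each batch's images and prompts into
# epoch/step subdirectories for later inspection.
import shutil
from pathlib import Path

def dump_batch(out_root, epoch, step, batch):
    step_dir = Path(out_root) / f"epoch_{epoch:03d}" / f"step_{step:05d}"
    step_dir.mkdir(parents=True, exist_ok=True)
    for image_path, prompt in batch:
        image_path = Path(image_path)
        # Copy the image as it appeared in the batch, and write the
        # prompt next to it with the same stem.
        shutil.copy2(image_path, step_dir / image_path.name)
        (step_dir / f"{image_path.stem}.txt").write_text(prompt)
```

In a real integration this would need to be called after augmentation (crop jitter, resizing) so the saved images reflect what torch actually receives, which is the whole point of the proposal.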

dkalintsev avatar Nov 10 '24 01:11 dkalintsev

I would personally prefer SD-Scripts-style duplication over dropping input data. I'd rather have all the information from the dataset used in training than have any of it discarded. Maybe offer users an option to choose the strategy, so everyone is happy :)

DanPli avatar Nov 24 '24 22:11 DanPli