Custom resolution dataset, ex 1024 x 768?
Is it possible to train with a custom resolution dataset, ex 1024 x 768? I tried something like this, but didn't work:
accelerate launch train_controlnet.py --pretrained_model_name_or_path=$MODEL_DIR --output_dir=$OUTPUT_DIR --dataset_name=fusing/fill50k --resolution=1024x768 --learning_rate=1e-5 --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" --validation_prompt "red circle with blue background" "cyan circle with brown floral background" --train_batch_size=4
thanks!
The dimensions should be a multiple of 64.
But the question is how to deal with a dataset in which the images' width and height are not equal, e.g., 1024 x 768. If we resize the images to 512 x 512, how do we then upscale the outputs back to the original size? These are the questions I'm facing.
Both dimensions of the input must be a multiple of 64, because the model performs convolutions on the input and concatenates the condition to the adapter. Regarding upscaling, I don't know; you can either crop the input images or resize them while preserving the aspect ratio.
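As a sketch of the "resize while preserving the aspect ratio" option: the helper below (`resize_to_multiple_of_64` is just an illustrative name, not part of the training script) scales an image to a target width and then rounds both dimensions down to the nearest multiple of 64:

```python
from PIL import Image

def resize_to_multiple_of_64(image: Image.Image, target_width: int) -> Image.Image:
    """Scale to roughly target_width, preserving aspect ratio,
    then round both dimensions down to a multiple of 64."""
    scale = target_width / image.width
    w = (int(image.width * scale) // 64) * 64
    h = (int(image.height * scale) // 64) * 64
    return image.resize((w, h), Image.BICUBIC)
```

Note the rounding slightly distorts the aspect ratio; cropping the excess instead of resizing avoids that, at the cost of losing border pixels.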
also see here https://github.com/lllyasviel/ControlNet/issues/365
So the model only accepts input images that are all the same size? My dataset's widths and heights are guaranteed to be multiples of 64 (obtained from the canny preprocessing code in this repository), but an error is reported: RuntimeError: stack expects each tensor to be equal size, but got [512, 896, 3] at entry 0 and [512, 1152, 3] at entry 1
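That error comes from batch collation, not the model itself: the dataloader stacks the images in a batch into one tensor, so every image in a batch must have identical dimensions. Being a multiple of 64 is not enough. One common workaround (a sketch, not the script's built-in behavior; `preprocess` and `TARGET` are illustrative names) is to resize so the image covers a fixed target size and then center-crop, so every sample ends up identical:

```python
from PIL import Image

TARGET = (768, 1024)  # (height, width), both multiples of 64

def preprocess(image: Image.Image) -> Image.Image:
    """Resize so the image covers TARGET (preserving aspect ratio),
    then center-crop so every sample has the same dimensions."""
    th, tw = TARGET
    scale = max(tw / image.width, th / image.height)
    resized = image.resize(
        (round(image.width * scale), round(image.height * scale)), Image.BICUBIC
    )
    left = (resized.width - tw) // 2
    top = (resized.height - th) // 2
    return resized.crop((left, top, left + tw, top + th))
```

Apply the same transform to the conditioning image so the pair stays aligned. The alternative, if you want to avoid cropping, is aspect-ratio bucketing: group images of the same size into the same batch.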