Custom resolution dataset, ex 1024 x 768?
Is it possible to train with a custom resolution dataset, ex 1024 x 768? I tried something like this, but didn't work:
accelerate launch train_controlnet.py --pretrained_model_name_or_path=$MODEL_DIR --output_dir=$OUTPUT_DIR --dataset_name=fusing/fill50k --resolution=1024x768 --learning_rate=1e-5 --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" --validation_prompt "red circle with blue background" "cyan circle with brown floral background" --train_batch_size=4
thanks!
The dimensions should be a multiple of 64.
But the question is how to deal with a dataset in which the images' width and height are not equal, e.g., 1024 x 768. If we resize the images to 512 x 512, how do we then upscale the outputs back to the original size? These are the questions I'm facing.
Both dimensions of the input must be a multiple of 64, because the model performs convolutions on the input and concatenates the condition to the adapter. Regarding upscaling, I don't know; you can either crop the input images or resize them while preserving the aspect ratio.
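As a sketch of the "resize while preserving the aspect ratio" option: the helper below (`resize_to_multiple_of_64` is just an illustrative name, not part of the training script) scales an image to a target width and then rounds both dimensions down to the nearest multiple of 64:

```python
from PIL import Image

def resize_to_multiple_of_64(image: Image.Image, target_width: int) -> Image.Image:
    """Scale to roughly target_width, preserving aspect ratio,
    then round both dimensions down to a multiple of 64."""
    scale = target_width / image.width
    w = (int(image.width * scale) // 64) * 64
    h = (int(image.height * scale) // 64) * 64
    return image.resize((w, h), Image.BICUBIC)
```

Note the rounding slightly distorts the aspect ratio; cropping the excess instead of resizing avoids that, at the cost of losing border pixels.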
also see here https://github.com/lllyasviel/ControlNet/issues/365
So the model only accepts input images that are all the same size? My dataset's widths and heights are guaranteed to be multiples of 64 (obtained from the canny preprocessing code in this repository), but an error is reported: RuntimeError: stack expects each tensor to be equal size, but got [512, 896, 3] at entry 0 and [512, 1152, 3] at entry 1
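That error comes from batch collation, not the model itself: the dataloader stacks the images in a batch into one tensor, so every image in a batch must have identical dimensions. Being a multiple of 64 is not enough. One common workaround (a sketch, not the script's built-in behavior; `preprocess` and `TARGET` are illustrative names) is to resize so the image covers a fixed target size and then center-crop, so every sample ends up identical:

```python
from PIL import Image

TARGET = (768, 1024)  # (height, width), both multiples of 64

def preprocess(image: Image.Image) -> Image.Image:
    """Resize so the image covers TARGET (preserving aspect ratio),
    then center-crop so every sample has the same dimensions."""
    th, tw = TARGET
    scale = max(tw / image.width, th / image.height)
    resized = image.resize(
        (round(image.width * scale), round(image.height * scale)), Image.BICUBIC
    )
    left = (resized.width - tw) // 2
    top = (resized.height - th) // 2
    return resized.crop((left, top, left + tw, top + th))
```

Apply the same transform to the conditioning image so the pair stays aligned. The alternative, if you want to avoid cropping, is aspect-ratio bucketing: group images of the same size into the same batch.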