Will you release the in-context data generation pipelines?
We plan to release some data synthesis scripts, but they first have to pass internal compliance approval at ByteDance. Our plan B is to upload a demo dataset with just a few samples (fewer than 10).
If you are in a hurry, you can go straight to the paper, especially Supplementary F. It is not hard to prototype a replica of our process from the description in the paper, which starts from a t2i script. 😊
I read the paper and tried to reproduce the process, and I have some questions. When I have two images, should the input be the entire image on the left (the ref image), or the ref image cropped after text detection?
If you are trying to train a single-image conditioned S2I model, we recommend treating the entire image (either the left or right one) as the ref_img and the other as the tgt_img. This is the approach we used in our paper.
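To make the pairing concrete, here is a minimal sketch of that approach: split a side-by-side image into halves and use each whole half as `ref_img` with the other as `tgt_img`. This is an illustration only, not the authors' released code; the image is a toy 2D pixel list (in practice you would use PIL or NumPy), and the helper names `split_halves` and `make_pairs` are hypothetical.

```python
# Sketch: build (ref_img, tgt_img) training pairs from a side-by-side
# image, treating each entire half as the reference and the other half
# as the target, as suggested above. Helper names are illustrative.

def split_halves(image):
    """Split a side-by-side image (list of pixel rows) into left/right halves."""
    mid = len(image[0]) // 2
    left = [row[:mid] for row in image]
    right = [row[mid:] for row in image]
    return left, right

def make_pairs(image):
    """Return both orderings: (left as ref, right as tgt) and vice versa."""
    left, right = split_halves(image)
    return [(left, right), (right, left)]

# Toy 2x4 "image": left half is all 0s, right half is all 1s.
img = [[0, 0, 1, 1],
       [0, 0, 1, 1]]
pairs = make_pairs(img)
```

For a single-image conditioned S2I model you would typically keep only one ordering per sample, or keep both as a simple form of data augmentation.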
Hello, we have open-sourced the dataset (UNO-1M) used in our paper and released all the instructions used in the in-context data generation pipelines. We hope these will help you.