Constraining the output to within the borders?

Open rjp opened this issue 3 years ago • 2 comments

(Might be able to be solved as part of https://github.com/CompVis/latent-diffusion/issues/34 where e.g. transparent areas are forbidden?)

I'm generating movie posters, book covers, etc., and most of the time the output runs off the edge of the image (see attachment).

It would be super if there were a way to hint at or constrain the output. The model shouldn't have seen anything cut off like that in its training sets, I think? VQGAN-CLIP doesn't have this issue, but it also doesn't generate output as good or as quickly, which is why I'd prefer to use LD.

[Attachment: 000544_BROGUE_NATION_in_the_style_of_a_1950s_book_cover_cl1oz9co00003ucobpewjzwmd_s9 0_3x2]

rjp, Apr 07 '22 19:04

I am not sure whether this could be solved by "Text + partial image prompting" (#34), but I have implemented #34 and opened a PR: https://github.com/CompVis/latent-diffusion/pull/57

Maybe you could try to enforce a writing desk in the image, then generate the book/poster on that desk? I'm not sure whether it works.

lxj616, Apr 19 '22 13:04

@rjp You'll need to do multiple generation steps with a little masking in between.

First, create an image mask that leaves an editable region where you want the text. Then generate a stack of text prompts for the text part.

Then take that image and create a mask where you want the frame. Run text-to-image for a frame.

Then take that image and create a mask where you want the picture, probably inside that frame. Run a text-to-image prompt for that.

Hopefully it will generate images that stay within the frame.
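
The staged workflow above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the API of this repository or of PR #57: `inpaint` is a hypothetical callback standing in for whatever masked text-to-image sampler you use, `fake_inpaint` is a toy replacement so the sketch runs on its own, and the boxes and prompts are made-up examples. Masks use 1 for "may be repainted" and 0 for "keep".

```python
from typing import Callable, List, Tuple

Mask = List[List[int]]            # 1 = editable cell, 0 = keep existing content
Box = Tuple[int, int, int, int]   # (left, top, right, bottom), right/bottom exclusive
Canvas = List[List[str]]          # toy "image": one character per cell


def box_mask(width: int, height: int, box: Box) -> Mask:
    """Return a height x width mask that is 1 inside `box` and 0 elsewhere."""
    left, top, right, bottom = box
    return [
        [1 if left <= x < right and top <= y < bottom else 0 for x in range(width)]
        for y in range(height)
    ]


def staged_generation(
    canvas: Canvas,
    stages: List[Tuple[str, Box]],
    inpaint: Callable[[Canvas, Mask, str], Canvas],
) -> Canvas:
    """Run each (prompt, box) stage in order, feeding each result into the next stage."""
    height, width = len(canvas), len(canvas[0])
    for prompt, box in stages:
        mask = box_mask(width, height, box)
        canvas = inpaint(canvas, mask, prompt)
    return canvas


def fake_inpaint(canvas: Canvas, mask: Mask, prompt: str) -> Canvas:
    """Hypothetical stand-in for a real sampler: marks editable cells with the prompt's first letter."""
    tag = prompt[0]
    return [
        [tag if editable else cell for cell, editable in zip(row, mask_row)]
        for row, mask_row in zip(canvas, mask)
    ]


WIDTH, HEIGHT = 12, 8
stages = [
    ("Title lettering, 1950s book cover", (2, 0, 10, 2)),  # 1. text region
    ("frame, ornate border", (1, 2, 11, 8)),               # 2. frame region
    ("picture, painted landscape", (2, 3, 10, 7)),         # 3. picture inside the frame
]
blank: Canvas = [["." for _ in range(WIDTH)] for _ in range(HEIGHT)]
result = staged_generation(blank, stages, fake_inpaint)
for row in result:
    print("".join(row))
```

Because every box stops short of the canvas edge, the outermost columns are never marked editable, which is the property you want when the goal is to keep the poster inside the borders. With a real sampler you would replace `fake_inpaint` with a call into your own masked generation code.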

Finally, thanks to @lxj616 for sharing an effective masking technique. It really gives a massive boost to the creative process.

Instead of Mining for Art, we can now sculpt what is mined into something more refined.

Tollanador, Jun 06 '22 01:06