
Latents / seeds are a mess. Make it easier to replicate a generated image using a seed.

exo-pla-net opened this issue Sep 21 '22 · 10 comments

Problem:

We often generate images with a batch_size >1.

However, by default every image in the batch after the first has a seed that is unknown to the user, so all but the first image in a batch can't be directly replicated.

To get around this, the docs suggest that we manually feed in latents.

What's a latent?? Like most devs, I'm arriving here with zero domain expertise.

But whatever, I figured it out (a latent seems to be an image of white noise, generated from a seed, which the diffuser looks at to begin dreaming up its image), and I did as I was told.

I decided it would make sense for the seeds in a batch to be sequential: for any given batch of images, if you call txt2img(prompt="astronaut riding horse", myManualSeed=42069, batch_size=6), the second image in the batch can be replicated with the seed myManualSeed + 1, and so on:

def getSequentialLatents(settings: DreamSettings, pipe=txt2imgPipe):
    theDevice = "cuda"
    generator = torch.Generator(device=theDevice)
    batchWidth = settings.batchWidth
    width = settings.width
    height = settings.height
    latents = None
    thisSeed = settings.seed
    for _ in range(batchWidth):
        # re-seed the generator so each image's latent comes from a known seed
        generator = generator.manual_seed(thisSeed)
        newLatent = torch.randn(
            (1, pipe.unet.in_channels, height // 8, width // 8),
            generator=generator,
            device=theDevice,
        )
        latents = newLatent if latents is None else torch.cat((latents, newLatent))
        thisSeed += 1
    return latents

That was a lot of pain merely to know the seeds present in a batch! This is a basic need for generating and refining images, so I believe it should be handled under the hood.

Furthermore, this hacky solution doesn't work for img2img, as "latents" can't be specified!

img2imgPipe(latents = sequentialLatents)
---------------------------------------------------------------------------
TypeError: __call__() got an unexpected keyword argument 'latents'

So, there is currently no easy way to know the seeds that make up your batch in img2img. If you want to perform more inference steps specifically on the second image in an img2img batch, you're out of luck.

Proposed solution:

Make manual_seed() create sequential seeds for a batch by default, as I have sketched out above, and make it do this universally: for txt2img, img2img, and inpainting.

Then, if you specify a seed for a batch, you will know that the second image of a batch will be (seed+1), and so on. Simple and easy.

exo-pla-net avatar Sep 21 '22 22:09 exo-pla-net

For img2img and inpainting you have to provide a generator instead of latents. You can take a look at how I did it in diffusion-ui-backend.

In my case, you either don't provide a seed or you provide a comma-separated list of seeds. Once the images are generated, you can select a single image and regenerate only that one.

leszekhanusz avatar Sep 21 '22 22:09 leszekhanusz

@leszekhanusz Looking at your code, it appears that you're only using batch_size of 1. The problem of unknown seeds only appears when batch_size > 1.

(Why not just keep batch_size at 1? Because generating a batch of 4 images is much faster than generating images 4 times.)

exo-pla-net avatar Sep 21 '22 22:09 exo-pla-net

Yes, currently the only way with batch_size > 1 is to regenerate the full batch with the same seed and know the index of the image you want. The generator is used in several different places in the img2img pipeline, so I'm not sure there is a way to modify the pipeline to set the latents deterministically at all of those places for a specific image in the batch.

leszekhanusz avatar Sep 21 '22 23:09 leszekhanusz

Hey @exo-pla-net,

Sorry, you are right that we are maybe a bit too technical here. Would this Colab help you: https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines#tweak-prompts-reusing-seeds-and-latents ?

We should indeed think more about how to make the docs better.

patrickvonplaten avatar Sep 27 '22 09:09 patrickvonplaten

Hi @patrickvonplaten, I'm afraid I wasn't very clear about the issue.

The issue is that latents can only be specified in txt2img, not img2img. Therefore, in img2img with batch_size > 1, the seeds of all images after the first are unknowable.

Without looking under the hood to see the feasibility of this, I am proposing that when batch_size > 1, the generator should make its latents from a series of sequential seeds. So, if you specify generator.manual_seed(42), you would then be able to replicate the second image in the batch using generator.manual_seed(43).

This solution would do two things:

  1. make all generated images in a batch directly replicable (across all pipes and all future pipes), and
  2. make image replication easier on the user, no longer requiring the user to figure out and implement latents to replicate images in a batch.

In other words, instead of:

> pseudoPipe(generator = generator.manual_seed(42), batch_size = 3)
Image 1: Seed 42
Image 2: Seed <unknowable chaos seed>
Image 3: Seed <unknowable chaos seed>

We would have:

> pseudoPipe(generator = generator.sequential_seed(42), batch_size = 3)
Image 1: Seed 42
Image 2: Seed 43
Image 3: Seed 44

exo-pla-net avatar Sep 27 '22 20:09 exo-pla-net

I'm afraid that the full explanation of the current behavior is even worse than this:

> Image 1: Seed 42
> Image 2: Seed <unknowable chaos seed>
> Image 3: Seed <unknowable chaos seed>

There's a category of schedulers described as stochastic: they add noise during inference steps. DDIM is one of these https://github.com/huggingface/diffusers/blob/c16761e9d94a3374710110ba5e3087cb9f8ba906/src/diffusers/schedulers/scheduling_ddim.py#L277

The way the current implementation is written means that even if you run two jobs with the same generator and their initial latents for Image 1 are the same, the noise added to Image 1 during each step is going to be different depending on the batch size.

i.e. if generator(seed=42) returns chunks of noise A, B, C, D, then at batch sizes 1, 2, and 4, those end up at:

chunk   batch_size 1      batch_size 2      batch_size 4
  A     image 1, step 1   image 1, step 1   image 1, step 1
  B     image 1, step 2   image 2, step 1   image 2, step 1
  C     image 1, step 3   image 1, step 2   image 3, step 1
  D     image 1, step 4   image 2, step 2   image 4, step 1
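To make the interleaving concrete, here is a toy sketch (not the actual scheduler code) that models the generator's output as an ordered stream of same-shaped chunks, the way the table above does:

```python
import torch

CHUNK = (4,)  # toy per-image noise shape; real latents are much larger

def noise_stream(seed, n):
    # One seeded generator yields an ordered stream of chunks: A, B, C, ...
    g = torch.Generator().manual_seed(seed)
    return [torch.randn(CHUNK, generator=g) for _ in range(n)]

A, B, C, D = noise_stream(42, 4)

# batch_size 1: image 1 consumes the stream alone, so its steps use
# A, B, C, D in order.
# batch_size 2: each step draws one chunk per image, so step 1 hands
# A to image 1 and B to image 2; step 2 hands out C and D.
# "Image 1, step 2" is therefore B at batch_size 1 but C at batch_size 2.
bs1_img1_step2 = B
bs2_img1_step2 = C
print(torch.equal(bs1_img1_step2, bs2_img1_step2))  # False: different noise
```

So even with identical initial latents, image 1's trajectory through a stochastic scheduler diverges as soon as the batch size changes.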

[thoughts continued in next comment…]

keturn avatar Sep 28 '22 18:09 keturn

What do you do about it? You could, as exo-pla-net suggested, have a generator for each entry in the batch. But currently, schedulers are blissfully unaware of the batch width, and shoving that loop over the generators (plus the concatenation of the results) into every scheduler, or any other place that needs noise, would be no fun.

In response to that, you could write some sort of meta-generator that does that part:

class MetaGenerator:
    def __init__(self, sub_generators, channels=4, height=64, width=64):
        self.sub_generators = sub_generators
        self.chunk_size = channels * height * width

    def rand(self, size):
        chunks = []
        for generator in self.sub_generators:
            # each sub-generator contributes one image-sized chunk
            chunks.append(generator.rand(min(size, self.chunk_size)))
            size -= self.chunk_size
            if size <= 0:  # FIXME: yell if given size is not consistent
                break      #        with len(sub_generators)
        return torch.cat(chunks)

This has the advantage of being compatible with the current pipeline and scheduler interfaces, but it also makes a lot of assumptions about how it's going to be invoked. If anything at any point anywhere during the pipeline decides to ask that generator for any amount of noise for any other reason, everything gets thrown off in some difficult-to-detect way.

(Did you remember to adjust MetaGenerator's expected chunk_size depending on whether there's a guidance function doubling the amount of input per step? Do you know every such implementation detail of every scheduler or guidance function you might choose?)

[continued…]

keturn avatar Sep 28 '22 19:09 keturn

My favorite idea for this so far is to use a coordinate-based noise system.

A torch.Generator is a one-dimensional function, and it has internal state that advances its position every time it's called.

A coordinate-based function would look more like this:

def noise(position, shape, seed) -> np.ndarray:

for use like

latents = noise(
    position = (0, 0, 0),
    shape = (4, height, width), 
    seed = 42,
)

to say "give me the three-dimensional box of noise(seed=42) that starts at (0, 0, 0) and is 4 layers deep, height tall and width wide."

That's an example I developed for three dimensions. It's a slightly different use case, but it runs into the same problems we've been discussing: if you change width with a one-dimensional noise generator, everything gets all out of place, even if you just wanted to make things 12% wider, or shift them to the left a bit, etc.

Adding another few dimensions — instead of (channel, height, width), using (step, batch_index, channel, height, width) — would enable us to do things like

channels = 4
noise(
    position = (  # starting from
        4, # step
        2, # batch entry
        0, 
        0, 
        0,
    ),
    shape = (
        1, # one step's worth
        3, # three consecutive batch items [2, 3, 4]
        channels, 
        height,
        width,
    ), 
    seed=42
)

Being explicit about the dimensionality and shape of the noise makes it a lot easier to reproduce later.

The major caveats being:

  • This would still require some changes to the way schedulers access noise functions.
  • While we can make a procedural noise function like this that's consistent across platforms, the seeds used for this are absolutely not going to be comparable with the seeds used by torch.Generator.
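As a rough sketch of the interface only: one could derive a sub-seed from the global seed plus the block's starting position. (Everything here is made up for illustration, and unlike a true procedural noise function, this does not give consistent values across overlapping boxes — it only makes a given (position, shape, seed) triple reproducible.)

```python
import torch

def noise(position, shape, seed):
    # Hypothetical sketch: fold the block's starting coordinates into the
    # seed, so the same (position, shape, seed) always yields the same
    # noise, independent of any generator's internal state.
    sub_seed = hash((seed,) + tuple(position)) % (1 << 32)
    g = torch.Generator().manual_seed(sub_seed)
    return torch.randn(shape, generator=g)

# Reproducing batch item 2's initial latents later only needs its
# coordinates; there is no stream to replay:
latents = noise(position=(0, 2, 0, 0, 0), shape=(4, 64, 64), seed=42)
```

(Python's hash of a tuple of ints is deterministic across runs, unlike its string hashing, so this particular sub-seed derivation is stable.)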

keturn avatar Sep 28 '22 20:09 keturn

Thanks for the thorough analysis, @keturn.

Given:

  • Batches are crucial for fast image generation.
  • Seeds are currently broken/inaccessible in batches.
  • Knowing seeds is crucial for exploring the seed space of a prompt and tweaking promising seeds, so batches are currently broken.
  • Hugging Face has the clout to drive the wide adoption of a better alternative.

I think it makes sense for a better noise generator to start here.

We're building the foundation that machine creativity will stand on. If we don't get this right, the foundation will be weak, user experience will be worse, and progress will be slower for the foreseeable future.

I say we make a better generator and let everyone else adopt it, instead of worrying about our compatibility with something fundamentally broken.

exo-pla-net avatar Sep 28 '22 21:09 exo-pla-net

Throwing in another idea here. What do you think about changing the function argument:

generator: torch.Generator, ...

to

Union[List[torch.Generator], torch.Generator]

which in short means that we allow passing a list of generators, one of which is then applied per batch dimension. We can easily check that if isinstance(generator, list), the length has to match the batch dimension. This should solve all reproducibility issues, no?
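A minimal sketch of what that could look like when building the initial latents (make_latents is a hypothetical helper for illustration, not a diffusers API; shapes are illustrative):

```python
import torch

def make_latents(generator, batch_size, channels=4, height=64, width=64):
    # Accept either a single generator or one generator per batch item.
    if isinstance(generator, list):
        assert len(generator) == batch_size
        return torch.cat([
            torch.randn((1, channels, height, width), generator=g)
            for g in generator
        ])
    return torch.randn((batch_size, channels, height, width), generator=generator)

# One known seed per image: image i is reproducible from seed 42 + i alone.
gens = [torch.Generator().manual_seed(42 + i) for i in range(3)]
latents = make_latents(gens, batch_size=3)

# Later, reproduce only the second image's latents from its seed:
redo = torch.randn((1, 4, 64, 64), generator=torch.Generator().manual_seed(43))
print(torch.equal(latents[1:2], redo))  # True
```

Because each batch item has its own generator, knowing one image's seed is enough to regenerate that image alone, regardless of the batch it originally appeared in.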

What do you think here @pcuenca @anton-l @patil-suraj ?

patrickvonplaten avatar Sep 29 '22 18:09 patrickvonplaten

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Oct 24 '22 15:10 github-actions[bot]

Some great ideas here that would be worth implementing.

exo-pla-net avatar Oct 24 '22 18:10 exo-pla-net

I support the requirement expressed by @exo-pla-net. Knowing the exact seed used to generate each image in a batch seems like a basic, mandatory feature for providing reproducibility at the image level. The sequential-seed idea raised above seems good enough to me as a solution; it is exactly what the Stable Diffusion WebUI does.

alexisrolland avatar Nov 14 '22 06:11 alexisrolland

@patrickvonplaten I like the idea of passing multiple generators, but this could still be a bit complex for users. What about a thin wrapper like this:

class StackedRandomGenerator:
    def __init__(self, device, seeds):
        super().__init__()
        self.generators = [torch.Generator(device).manual_seed(int(seed) % (1 << 32)) for seed in seeds]

    def randn(self, size, **kwargs):
        assert size[0] == len(self.generators)
        return torch.stack([torch.randn(size[1:], generator=gen, **kwargs) for gen in self.generators])

    def randn_like(self, input):
        return self.randn(input.shape, dtype=input.dtype, layout=input.layout, device=input.device)

    def randint(self, *args, size, **kwargs):
        assert size[0] == len(self.generators)
        return torch.stack([torch.randint(*args, size=size[1:], generator=gen, **kwargs) for gen in self.generators])

This will take multiple seeds and create multiple generators inside.

patil-suraj avatar Nov 15 '22 11:11 patil-suraj

Hmm, I don't fully understand why we would need to provide a function such as randn if one can pass multiple generators directly to diffusers?

patrickvonplaten avatar Nov 18 '22 12:11 patrickvonplaten

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Dec 12 '22 15:12 github-actions[bot]

Commenting on this issue to keep it from being auto-marked as stale.

Being able to retrieve seed values for each image generated in a batch is still an important usability requirement to allow for reproducibility of results.

There does not seem to be a clear way to do that for now (or did I miss something?).

Thanks

alexisrolland avatar Dec 12 '22 15:12 alexisrolland

Agree this is currently not nicely handled indeed. I'll open a PR for it :-)

patrickvonplaten avatar Dec 15 '22 20:12 patrickvonplaten

https://github.com/huggingface/diffusers/pull/1718 should solve this. Also adding a nice doc page for it.

patrickvonplaten avatar Dec 15 '22 23:12 patrickvonplaten

> #1718 should solve this. Also adding a nice doc page for it.

Thank you! Can't wait to see if it works!

ktncktnc avatar Dec 16 '22 03:12 ktncktnc