
Should call out the change in UNet model's attention heads

Open liuliu opened this issue 3 years ago • 0 comments

It is well known that SD2 changed the text encoder, and that downstream developers need to take notice and swap it in. It is much less known that the UNet model changed as well. In particular, this line has caused the most trouble and can explain why a lot of people have problems running the base model with their old code:

https://github.com/Stability-AI/stablediffusion/blob/main/configs/stable-diffusion/v2-inference.yaml#L32

In most implementations (written for SDv1 models), multi-head attention is implemented as a single matrix multiplication covering all heads, so the weight shapes are unchanged and scripts can load the SDv2 weights as is.
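To illustrate why the weights load without complaint, here is a minimal NumPy sketch. The channel width (320), the token count, and the `split_heads` helper are assumptions chosen for illustration, not taken from the repository. The fused projection matrix has the same shape regardless of how many heads the result is later split into:

```python
import numpy as np

def split_heads(x, num_heads):
    """Reshape (tokens, channels) into (num_heads, tokens, channels // num_heads)."""
    tokens, channels = x.shape
    assert channels % num_heads == 0, "channels must divide evenly into heads"
    return x.reshape(tokens, num_heads, channels // num_heads).transpose(1, 0, 2)

channels = 320  # assumed layer width, for illustration only
rng = np.random.default_rng(0)

# One fused projection for all heads: its shape depends only on `channels`.
w_q = rng.standard_normal((channels, channels))
x = rng.standard_normal((16, channels))  # 16 tokens

q = x @ w_q                   # a single matrix multiplication for every head
q_8 = split_heads(q, 8)       # 8 heads of 40 channels each
q_5 = split_heads(q, 5)       # 5 heads of 64 channels each

print(w_q.shape, q_8.shape, q_5.shape)
```

The head count only enters at the reshape step, which is why a checkpoint loader has no way to notice that the configuration is wrong.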

However, because the configuration now fixes the number of channels per head rather than the number of heads, the model will generate garbage values if people who ported Stable Diffusion to other platforms don't update their network configuration to match.
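The effect can be sketched in NumPy under some loud assumptions: the old configuration is taken to use 8 heads and the new one 64 channels per head, the layer width is 320, and the weights are random. `resolve_heads` and `attention` are hypothetical helpers written for this example, not functions from the repository. With identical weights, the two head splits produce different outputs:

```python
import numpy as np

def resolve_heads(channels, num_heads=-1, num_head_channels=-1):
    """Hypothetical helper: head count for a layer with `channels` features."""
    if num_head_channels != -1:
        # Fixed channels per head: the head count grows with the layer width.
        assert channels % num_head_channels == 0
        return channels // num_head_channels
    return num_heads  # fixed head count: the head width grows instead

def attention(x, w_q, w_k, w_v, num_heads):
    """Self-attention over x of shape (tokens, channels) with fused projections."""
    tokens, channels = x.shape
    head_dim = channels // num_heads

    def split(t):
        return t.reshape(tokens, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    probs = np.exp(scores)
    probs = probs / probs.sum(axis=-1, keepdims=True)
    out = probs @ v  # (heads, tokens, head_dim)
    return out.transpose(1, 0, 2).reshape(tokens, channels)

channels = 320  # assumed layer width, for illustration only
rng = np.random.default_rng(0)
w_q, w_k, w_v = (
    rng.standard_normal((channels, channels)) / np.sqrt(channels) for _ in range(3)
)
x = rng.standard_normal((16, channels))

old_heads = resolve_heads(channels, num_heads=8)           # assumed old-style setting
new_heads = resolve_heads(channels, num_head_channels=64)  # assumed new-style setting

out_old = attention(x, w_q, w_k, w_v, old_heads)
out_new = attention(x, w_q, w_k, w_v, new_heads)
mismatch = not np.allclose(out_old, out_new)
print(old_heads, new_heads, mismatch)
```

Both calls run without error and return tensors of the same shape, so a port that keeps the old head count fails silently rather than crashing.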

I saw a few mentions on HN of people who could not make the 512 base model work, so I want to call it out here.

liuliu, Nov 25 '22 03:11