Lamma
Hello, in the original K-Diffusion paper the authors report FID scores for CIFAR in the low single digits (e.g., 1.8). However, the FID scores from this repo all give in the...
### System Info

Running a standard training loop where I save the optimizer state_dict using `opt.state_dict()`. Upon loading using `opt.load_state_dict()` to resume, the model immediately NaNs after the first backprop...
In the paper, FFTConv is used on sequence lengths > 8192; however, this line in the cpp code has: `TORCH_CHECK(fft_size >= 16 && fft_size