Lamma

3 issues opened by Lamma:

Hello, in the original K-Diffusion paper the authors report FID scores for CIFAR in the low single digits (e.g., 1.8). However, the FID scores from this repo all give in the...
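For reference, FID is the Fréchet distance between two Gaussians fitted to Inception features of real and generated images: ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)). The NumPy sketch below computes only that final distance from precomputed means and covariances. Feature extraction is omitted, and the function name `frechet_distance` is illustrative rather than taken from this repo. Because the result depends entirely on the statistics fed in, comparing against a paper's number assumes both sides computed those statistics the same way.

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2).

    Uses tr(sqrt(S1 S2)) = tr(sqrt(S1^{1/2} S2 S1^{1/2})), so every square root
    is taken of a symmetric PSD matrix and scipy.linalg.sqrtm is not needed.
    """
    diff = mu1 - mu2
    # Symmetric square root of sigma1 via its eigendecomposition.
    w, v = np.linalg.eigh(sigma1)
    sqrt_s1 = (v * np.sqrt(np.clip(w, 0, None))) @ v.T
    m = sqrt_s1 @ sigma2 @ sqrt_s1
    # Clip tiny negative eigenvalues caused by floating-point round-off.
    tr_covmean = np.sum(np.sqrt(np.clip(np.linalg.eigvalsh(m), 0, None)))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2 * tr_covmean)
```

With identical statistics the distance is 0; shifting one mean by (3, 4) while both covariances are the identity gives 25.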

System Info: Running a standard training loop in which I save the optimizer state with `opt.state_dict()`. When I load it back with `opt.load_state_dict()` to resume, the model produces NaNs immediately after the first backprop...

Labels: Bug, Medium Priority, Contributions Welcome, Optimizers
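One way to narrow down a resume bug like this is to check that a resumed run reproduces an uninterrupted one step for step. The sketch below uses a hypothetical scalar `ToyAdam` written in plain Python (not PyTorch's or any library's optimizer) to illustrate the invariant: the saved state must include the step counter and both moment estimates, and after `load_state_dict()` the trajectory should be identical. Parameters are assumed to be checkpointed separately, as `model.state_dict()` would be in PyTorch; here the scalar `x` simply stays in memory.

```python
import copy
import math

class ToyAdam:
    """Hypothetical scalar Adam, used only to illustrate a state_dict round-trip."""

    def __init__(self, lr=0.1, betas=(0.9, 0.999), eps=1e-8):
        self.lr, self.betas, self.eps = lr, betas, eps
        self.state = {"step": 0, "exp_avg": 0.0, "exp_avg_sq": 0.0}

    def step(self, param, grad):
        b1, b2 = self.betas
        s = self.state
        s["step"] += 1
        s["exp_avg"] = b1 * s["exp_avg"] + (1 - b1) * grad
        s["exp_avg_sq"] = b2 * s["exp_avg_sq"] + (1 - b2) * grad * grad
        # Bias correction depends on the step count, so it must be restored too.
        m_hat = s["exp_avg"] / (1 - b1 ** s["step"])
        v_hat = s["exp_avg_sq"] / (1 - b2 ** s["step"])
        return param - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)

    def state_dict(self):
        return copy.deepcopy(self.state)

    def load_state_dict(self, sd):
        self.state = copy.deepcopy(sd)

def grad_fn(x):
    return 2.0 * (x - 3.0)  # gradient of the toy loss (x - 3)^2

def run(n_steps, resume_at=None):
    x, opt = 0.0, ToyAdam()
    for i in range(n_steps):
        if i == resume_at:
            saved = opt.state_dict()    # checkpoint the optimizer state
            opt = ToyAdam()             # fresh optimizer, as after a restart
            opt.load_state_dict(saved)  # resume from the checkpoint
        x = opt.step(x, grad_fn(x))
    return x
```

Here `run(20)` and `run(20, resume_at=10)` return the same value. If the equivalent check fails for the real optimizer, the point where the two runs diverge indicates what the checkpoint is missing or restoring incorrectly.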

In the paper, FFTConv is used on sequence lengths greater than 8192; however, this line in the C++ code has: `TORCH_CHECK(fft_size >= 16 && fft_size ...`
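FFTConv computes a long convolution by multiplying in the frequency domain, so `fft_size` has to cover the padded sequence. The NumPy sketch below assumes a common padding rule, `fft_size = 2 * L`, which makes the FFT's circular convolution equal to the causal linear convolution for the first `L` outputs. Under that assumption a sequence of length 8192 would need `fft_size = 16384`. The function name `fft_conv` and the padding rule are illustrative and not taken from the repo's C++ kernel.

```python
import numpy as np

def fft_conv(u, k):
    """Causal convolution of u (shape [..., L]) with kernel k (length L) via FFT.

    Zero-pads both inputs to fft_size = 2 * L (an assumed padding rule), so the
    circular convolution computed by the FFT has no wrap-around, then keeps the
    first L outputs.
    """
    L = u.shape[-1]
    fft_size = 2 * L
    u_f = np.fft.rfft(u, n=fft_size)  # transform along the last axis
    k_f = np.fft.rfft(k, n=fft_size)  # broadcasts against any batch dims of u
    y = np.fft.irfft(u_f * k_f, n=fft_size)
    return y[..., :L]

rng = np.random.default_rng(0)
u, k = rng.normal(size=8), rng.normal(size=8)
print(np.allclose(fft_conv(u, k), np.convolve(u, k)[:8]))  # True
```

Padding to at least `2 * L - 1` is what removes the wrap-around; rounding up to `2 * L` keeps `fft_size` a power of two whenever `L` is one.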