SpeeD Timestep Sampling
This PR implements the timestep sampling method from *A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training*.
The paper claims 3× faster pretraining at the same quality.
Usage
- Set the timestep distribution to `SPEED`.
⚠️ Notes
- Validated for pretraining only; finetuning impact unknown, but the core concept may apply.
TODO
- [ ] To be tested
- [ ] Minimize the change: the current approach modifies `_get_timestep_discrete` and requires access to `betas`/`sigmas`, which is not ideal.
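For anyone reviewing, here is a minimal sketch of the asymmetric timestep distribution in the spirit of SpeeD, assuming NumPy and a standard DDPM `betas` schedule. The function names, the 1% change threshold, and the `suppress` factor are illustrative assumptions, not taken from the official repo:

```python
import numpy as np

def speed_timestep_probs(betas, suppress=0.5):
    """Piecewise-uniform timestep distribution in the spirit of SpeeD.

    `betas` is the usual DDPM beta schedule. `suppress` (a hypothetical
    knob, not from the official repo) down-weights the "convergence area"
    where alpha_bar barely changes between consecutive steps.
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    # Per-step change of the process; tiny values mark the convergence area.
    delta = np.abs(np.diff(alpha_bar, prepend=1.0))
    # Find the first timestep after the peak where the change drops below
    # 1% of the peak change (illustrative threshold).
    peak = int(np.argmax(delta))
    below = np.nonzero(delta[peak:] < 0.01 * delta[peak])[0]
    tau = peak + int(below[0]) if below.size else len(delta)
    probs = np.ones(len(betas), dtype=np.float64)
    probs[tau:] *= suppress  # suppress the trivial late timesteps
    return probs / probs.sum(), tau

def sample_timesteps(probs, batch_size, rng=None):
    """Draw a batch of training timesteps from the asymmetric distribution."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.choice(len(probs), size=batch_size, p=probs)
```

This would slot into `_get_timestep_discrete` as a replacement for uniform sampling, which is why the current implementation needs `betas`/`sigmas`.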
Should I expect better quality for the same number of steps when fine-tuning? Or what should I pay attention to in order to test it?
@Koratahiu The image you included returns a 404.
I usually use "debiased estimation" as the loss weight function. Should I set it to constant when using SpeeD?
> Should I expect better quality for the same number of steps when fine-tuning? Or what should I pay attention to in order to test it?
Yeah, if it works, then it should converge faster in the same number of steps.
> I usually use "debiased estimation" as the loss weight function. Should I set it to constant when using SpeeD?
The paper mentions that it’s compatible with loss weight functions (e.g., p2, min-SNR, debiased estimation, etc.), and in their official repo, they set the loss weight function to p2 by default.
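For reference, the p2 weight (from *Perception Prioritized Training of Diffusion Models*, which the SpeeD repo defaults to) is `1 / (k + SNR_t)^gamma` with `SNR_t = alpha_bar_t / (1 - alpha_bar_t)`. A minimal sketch, with `k` and `gamma` defaults taken from the p2 paper rather than from this PR:

```python
import numpy as np

def p2_loss_weight(betas, k=1.0, gamma=1.0):
    """p2 loss weight per timestep: 1 / (k + SNR_t)**gamma.

    SNR_t = alpha_bar_t / (1 - alpha_bar_t); k=1, gamma=1 are the
    defaults from the p2 paper. Higher-noise timesteps get weight
    closer to 1, low-noise (high-SNR) timesteps are down-weighted.
    """
    alpha_bar = np.cumprod(1.0 - betas)
    snr = alpha_bar / (1.0 - alpha_bar)
    return 1.0 / (k + snr) ** gamma
```

Since this weighting is applied per sampled timestep, it composes with any timestep distribution, including SpeeD's asymmetric one.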