SpeeD Timestep Sampling
This PR implements the timestep sampling method from *A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training*.
The paper claims 3× faster pretraining at the same quality.
Usage
- Set the timestep distribution to `SPEED`.
⚠️ Notes
- Validated for pretraining only; finetuning impact unknown, but the core concept may apply.
TODO
- [ ] To be tested
- [ ] Minimize the change: the current approach modifies `_get_timestep_discrete` and requires access to `betas`/`sigmas`, which is not ideal.
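For anyone reviewing, here is a minimal sketch of the asymmetric timestep distribution in the spirit of SpeeD, assuming NumPy and a standard DDPM `betas` schedule. The function names, the 1% change threshold, and the `suppress` factor are illustrative assumptions, not taken from the official repo:

```python
import numpy as np

def speed_timestep_probs(betas, suppress=0.5):
    """Piecewise-uniform timestep distribution in the spirit of SpeeD.

    `betas` is the usual DDPM beta schedule. `suppress` (a hypothetical
    knob, not from the official repo) down-weights the "convergence area"
    where alpha_bar barely changes between consecutive steps.
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    # Per-step change of the process; tiny values mark the convergence area.
    delta = np.abs(np.diff(alpha_bar, prepend=1.0))
    # Find the first timestep after the peak where the change drops below
    # 1% of the peak change (illustrative threshold).
    peak = int(np.argmax(delta))
    below = np.nonzero(delta[peak:] < 0.01 * delta[peak])[0]
    tau = peak + int(below[0]) if below.size else len(delta)
    probs = np.ones(len(betas), dtype=np.float64)
    probs[tau:] *= suppress  # suppress the trivial late timesteps
    return probs / probs.sum(), tau

def sample_timesteps(probs, batch_size, rng=None):
    """Draw a batch of training timesteps from the asymmetric distribution."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.choice(len(probs), size=batch_size, p=probs)
```

This would slot into `_get_timestep_discrete` as a replacement for uniform sampling, which is why the current implementation needs `betas`/`sigmas`.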
Should I expect better quality for the same number of steps when fine-tuning? Or what should I pay attention to in order to test it?
@Koratahiu The image you included returns a 404.
I usually use "debiased estimation" as the loss weight function. Should I set it to constant when using SpeeD?
> Should I expect better quality for the same number of steps when fine-tuning? Or what should I pay attention to in order to test it?
Yeah, if it works, then it should converge faster in the same number of steps.
> I usually use "debiased estimation" as the loss weight function. Should I set it to constant when using SpeeD?
The paper mentions that it’s compatible with loss weight functions (e.g., p2, min-SNR, debiased estimation, etc.), and in their official repo, they set the loss weight function to p2 by default.
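For reference, the p2 weight (from *Perception Prioritized Training of Diffusion Models*, which the SpeeD repo defaults to) is `1 / (k + SNR_t)^gamma` with `SNR_t = alpha_bar_t / (1 - alpha_bar_t)`. A minimal sketch, with `k` and `gamma` defaults taken from the p2 paper rather than from this PR:

```python
import numpy as np

def p2_loss_weight(betas, k=1.0, gamma=1.0):
    """p2 loss weight per timestep: 1 / (k + SNR_t)**gamma.

    SNR_t = alpha_bar_t / (1 - alpha_bar_t); k=1, gamma=1 are the
    defaults from the p2 paper. Higher-noise timesteps get weight
    closer to 1, low-noise (high-SNR) timesteps are down-weighted.
    """
    alpha_bar = np.cumprod(1.0 - betas)
    snr = alpha_bar / (1.0 - alpha_bar)
    return 1.0 / (k + snr) ** gamma
```

Since this weighting is applied per sampled timestep, it composes with any timestep distribution, including SpeeD's asymmetric one.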