Compiled Optimizers: Accelerate all Advanced Optimizers with pre-compilation
This pull request introduces a new boolean option, **Compiled Optimizer**, to all advanced optimizers, allowing the core update logic to be compiled with `torch.compile` (tested on PyTorch 2.8).
By using `torch.compile`, we can fuse operations and optimize the computational graph, resulting in significant performance improvements in high-throughput or heavily parallel environments.
Includes: #1020 and #1064
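To make the idea concrete, here is a minimal, hypothetical sketch of what "compiling the core update logic" means. It is not the code from this PR; `adam_update` and its parameters are illustrative names only, and the math shown is just a plain Adam step.

```python
import torch

# Hypothetical sketch (not this PR's code): wrapping the per-parameter update
# math in torch.compile lets the inductor backend fuse the chain of elementwise
# ops into far fewer kernel launches.
@torch.compile
def adam_update(param, grad, exp_avg, exp_avg_sq, step, lr, beta1, beta2, eps):
    # Standard Adam update; every tensor op below is a fusion candidate.
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    bias_correction1 = 1 - beta1 ** step
    bias_correction2 = 1 - beta2 ** step
    denom = (exp_avg_sq / bias_correction2).sqrt().add_(eps)
    param.addcdiv_(exp_avg, denom, value=-lr / bias_correction1)

# Example call with dummy tensors (runs on CPU; CUDA benefits the most):
p = torch.randn(1024, 1024)
g = torch.randn_like(p)
m, v = torch.zeros_like(p), torch.zeros_like(p)
adam_update(p, g, m, v, step=1, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8)
```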
When to use:

- Features that add noticeable overhead on the optimizer side; with `torch.compile`, their overhead becomes negligible:
  - OrthoGrad: introduces 33% overhead at small batch sizes.
  - 1-bit Factored mode: also introduces some overhead.
  - 3-state optimizers like AdEMAMix: more states mean more optimizer calculations.
- Full fine-tuning: larger models may spend more time in optimizer-side calculations.
- Orthogonal optimizers: Muon and AdaMuon have noticeable overhead in their orthogonalization ops; `torch.compile` should reduce it (a rough sketch follows after this list).
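As an illustration of the Muon/AdaMuon point above, here is a hedged sketch of a Newton-Schulz orthogonalization loop wrapped in `torch.compile`. The coefficients follow the publicly available Muon reference implementation; `orthogonalize` is an illustrative name, not a function from this repo.

```python
import torch

# Illustrative only: a Newton-Schulz orthogonalization loop in the style used by
# Muon. Compiling it lets the matmuls and elementwise ops be scheduled as one
# graph instead of many small kernel launches.
@torch.compile
def orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    a, b, c = 3.4445, -4.7750, 2.0315   # quintic iteration coefficients
    X = G / (G.norm() + 1e-7)           # normalize so the iteration converges
    transposed = G.size(0) > G.size(1)  # iterate on the smaller Gram matrix
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * A @ A
        X = a * X + B @ X
    if transposed:
        X = X.T
    return X

# Example: turn a gradient matrix into an approximately orthogonal update direction.
W_grad = torch.randn(512, 2048)
O = orthogonalize(W_grad)
```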
## Usage

- `git fetch origin pull/1083/head:compile_optm`
- `git checkout compile_optm`
- Run `install.bat` or `update.bat`
## TODO
- [ ] Ensure backward compatibility with older backups.
## Known Issues
Thanks to @dxqb for initial support and helpful insights!