optimum
Add Pixelated Butterfly for efficient sparse training
Description
Pixelated butterfly (Pixelfly) is a training technique that uses a simple, fixed sparsity pattern, based on flat block butterfly and low-rank matrices, to sparsify most network layers. From the paper:
> On the ImageNet classification and WikiText-103 language modeling tasks, our sparse models train up to 2.5× faster than the dense MLP-Mixer, Vision Transformer, and GPT-2 medium with no drop in accuracy.
This seems like an interesting approach to consider for integration with optimum, and builds on prior work by @madlag using pytorch_block_sparse.
- Paper: https://arxiv.org/abs/2112.00029
- Twitter thread: https://twitter.com/BeidiChen/status/1469135402850082816?s=20
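
To make the fixed sparsity pattern mentioned in the description more concrete, here is a minimal pure-Python sketch of a flat block butterfly mask and the resulting parameter budget. It is an illustration under assumptions, not the reference implementation from the paper: it assumes a block is kept when its row and column block indices are equal or differ in exactly one bit (up to a maximum stride), and that the low-rank term is a product of two `dim x rank` factors. All function names (`flat_butterfly_block_mask`, `expand_blocks`, `density`, `param_count`) are hypothetical and are not part of optimum or pytorch_block_sparse.

```python
def flat_butterfly_block_mask(n_blocks, max_stride=None):
    """Block-level 0/1 pattern for a flat butterfly (assumed form).

    Block (i, j) is kept when i == j or when j == i XOR stride for a
    power-of-two stride <= max_stride.
    """
    if n_blocks < 1 or n_blocks & (n_blocks - 1):
        raise ValueError("n_blocks must be a power of two")
    if max_stride is None:
        max_stride = n_blocks // 2
    mask = [[0] * n_blocks for _ in range(n_blocks)]
    for i in range(n_blocks):
        mask[i][i] = 1  # diagonal blocks are always kept
        stride = 1
        while stride <= max_stride and stride < n_blocks:
            mask[i][i ^ stride] = 1  # partner block at this stride
            stride *= 2
    return mask


def expand_blocks(block_mask, block_size):
    """Expand a block-level mask to an element-level mask."""
    n = len(block_mask) * block_size
    return [
        [block_mask[r // block_size][c // block_size] for c in range(n)]
        for r in range(n)
    ]


def density(mask):
    """Fraction of nonzero entries in a 0/1 mask."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total


def param_count(n_blocks, block_size, rank, max_stride=None):
    """Return (sparse, low_rank, dense) parameter counts for one square layer."""
    dim = n_blocks * block_size
    kept_blocks = sum(map(sum, flat_butterfly_block_mask(n_blocks, max_stride)))
    sparse = kept_blocks * block_size ** 2
    low_rank = 2 * dim * rank  # two dim x rank factors (assumed form)
    return sparse, low_rank, dim * dim


if __name__ == "__main__":
    block_mask = flat_butterfly_block_mask(8)
    for row in block_mask:
        print("".join("#" if v else "." for v in row))
    print("density:", density(expand_blocks(block_mask, 4)))
    print("params (sparse, low_rank, dense):", param_count(8, 32, 16))
```

Because the pattern is fixed and block-aligned, a mask like this could in principle be handed to a block-sparse kernel, which is where the connection to pytorch_block_sparse might come in; how that would be wired into optimum is left open here.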