Add Pixelated Butterfly for efficient sparse training

Open lewtun opened this issue 4 years ago • 0 comments

Description

Pixelated Butterfly (Pixelfly) is a training technique that uses a simple, fixed sparsity pattern based on flat block butterfly and low-rank matrices to sparsify most network layers. From the paper:

On the ImageNet classification and WikiText-103 language modeling tasks, our sparse models train up to 2.5× faster than the dense MLP-Mixer, Vision Transformer, and GPT-2 medium with no drop in accuracy.

This seems like an interesting approach to consider for integration with optimum, and it builds on prior work by @madlag using pytorch_block_sparse.
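
To make the sparsity pattern concrete, here is a rough sketch of a block-level flat butterfly mask and the resulting parameter count when a low-rank term is added. It is an illustration under my own assumptions, not the paper's or any library's implementation: I assume a square layer, a power-of-two number of blocks, and a pattern formed by the diagonal plus the butterfly "partner" blocks at strides 1, 2, 4, and so on (the paper's actual pattern may cap the maximum stride). The names `flat_block_butterfly_mask` and `param_counts` are hypothetical and not part of optimum or pytorch_block_sparse.

```python
def flat_block_butterfly_mask(n_blocks):
    """Block-level mask: entry (i, j) is True when block (i, j) is kept.

    Assumption: n_blocks is a power of two, and the pattern is the union of
    the diagonal with the butterfly-factor patterns of stride 1, 2, 4, ...
    """
    if n_blocks < 1 or n_blocks & (n_blocks - 1):
        raise ValueError("n_blocks must be a power of two")
    # Start from the block diagonal.
    mask = [[i == j for j in range(n_blocks)] for i in range(n_blocks)]
    stride = 1
    while stride < n_blocks:
        for i in range(n_blocks):
            # Butterfly partner of block row i at this stride.
            mask[i][i ^ stride] = True
        stride *= 2
    return mask


def param_counts(n_blocks, block_size, rank):
    """Return (dense, sparse_plus_low_rank) parameter counts for a square layer."""
    dim = n_blocks * block_size
    kept_blocks = sum(sum(row) for row in flat_block_butterfly_mask(n_blocks))
    sparse = kept_blocks * block_size * block_size
    low_rank = 2 * dim * rank  # U and V, each of shape (dim, rank)
    return dim * dim, sparse + low_rank


if __name__ == "__main__":
    # Print the block pattern for an 8 x 8 grid of blocks.
    for row in flat_block_butterfly_mask(8):
        print("".join("#" if keep else "." for keep in row))
    # Compare dense vs. sparse + low-rank parameter counts for a 1024 x 1024 layer.
    print(param_counts(n_blocks=32, block_size=32, rank=32))
```

Because the mask is fixed ahead of training and defined at block granularity, it could in principle be handed to a block-sparse kernel, which is where the connection to pytorch_block_sparse comes in. How the sparse and low-rank terms are combined and trained is left out of this sketch.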

Paper: https://arxiv.org/abs/2112.00029
Twitter thread: https://twitter.com/BeidiChen/status/1469135402850082816?s=20

Dec 12 '21 16:12 lewtun