Aggressor
Ultra-minimal autoregressive diffusion model for image generation
Aggressor: Ultra-minimal autoregressive diffusion model for image and speech generation
| CIFAR | MNIST | AUDIO |
The simplest possible implementation of the paper "Autoregressive Image Generation without Vector Quantization".
Key Features
- Simple Architecture: A tiny transformer for autoregression and an MLP for diffusion.
- Minimal Dependencies: Built from scratch using only basic MLX operations.
- Single-File Implementation: The entire model lives in one Python file, aggressor.py.
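To make the split between the two parts concrete, here is a rough control-flow sketch in plain Python (not MLX, and not code from aggressor.py). It assumes the usual arrangement for this family of models: an outer autoregressive loop produces a conditioning vector from the tokens generated so far, and an inner loop denoises one continuous token from Gaussian noise. `toy_transformer`, `toy_denoiser`, and the constants are hypothetical stand-ins with no learned weights.

```python
import random

DIM = 4          # hypothetical size of one continuous token
NUM_TOKENS = 3   # hypothetical sequence length
NUM_STEPS = 10   # hypothetical number of denoising steps per token


def toy_transformer(prev_tokens):
    """Stand-in for the transformer: a conditioning vector that depends
    only on previously generated tokens (here, simply their mean)."""
    if not prev_tokens:
        return [0.0] * DIM
    n = len(prev_tokens)
    return [sum(tok[d] for tok in prev_tokens) / n for d in range(DIM)]


def toy_denoiser(x, cond):
    """Stand-in for the MLP denoiser: moves x halfway toward the
    conditioning vector. A trained denoiser would predict noise instead."""
    return [0.5 * xi + 0.5 * ci for xi, ci in zip(x, cond)]


def generate(rng):
    tokens = []
    for _ in range(NUM_TOKENS):                        # autoregressive loop
        cond = toy_transformer(tokens)                 # sees the past only
        x = [rng.gauss(0.0, 1.0) for _ in range(DIM)]  # start from pure noise
        for _ in range(NUM_STEPS):                     # per-token denoising loop
            x = toy_denoiser(x, cond)
        tokens.append(x)                               # continuous value, no codebook
    return tokens


samples = generate(random.Random(0))
```

The point of the sketch is the loop structure: each token stays a continuous vector, so no vector-quantized codebook is needed.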
Components
- Aggressor: Main model class combining transformer and diffusion.
- Transformer: Multi-layer transformer with attention and MLP blocks.
- Denoiser: MLP-based diffusion process with time embedding.
- Scheduler: Handles forward and backward processes for diffusion.
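As an illustration of what a scheduler and a time embedding typically compute, the following is a minimal sketch in plain Python (not MLX). It assumes a standard DDPM-style linear noise schedule and a sinusoidal timestep embedding; the function names, the 1000-step schedule, and the beta range are assumptions for this example and may not match aggressor.py.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed DDPM-style defaults)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta_s) for s up to t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def add_noise(x0, eps, alpha_bar_t):
    """Forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    a, s = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * x + s * e for x, e in zip(x0, eps)]


def predict_x0(xt, eps_hat, alpha_bar_t):
    """Backward direction: invert the forward process given a noise estimate."""
    a, s = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [(x - s * e) / a for x, e in zip(xt, eps_hat)]


def time_embedding(t, dim):
    """Sinusoidal embedding of timestep t into dim features (dim must be even)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


betas = linear_betas(1000)
alpha_bars = cumulative_alphas(betas)

rng = random.Random(0)
x0 = [0.5, -0.25, 1.0]
eps = [rng.gauss(0.0, 1.0) for _ in x0]
xt = add_noise(x0, eps, alpha_bars[500])       # noised sample at step 500
x0_rec = predict_x0(xt, eps, alpha_bars[500])  # exact only because eps is known
emb = time_embedding(500, 8)
```

In training, a denoiser would receive `xt`, the time embedding, and the transformer's conditioning vector, and would be trained to predict `eps`; here the true `eps` is passed back in only to show that the two directions are consistent.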
Usage
python aggressor.py
(Training on 60,000 images for 20 epochs takes approximately 7 to 8 minutes on an 8GB M2 MacBook.)
Acknowledgements
Thanks to lucidrains' fantastic code that inspired this project. The official implementation is available here.

