

Aggressor: Ultra-minimal autoregressive diffusion model for image and speech generation

[Sample outputs: CIFAR, MNIST, AUDIO]

The simplest possible implementation of Autoregressive Image Generation without Vector Quantization.

Key Features

  • Simple Architecture: A tiny transformer for autoregression and an MLP for diffusion.
  • Minimal Dependencies: Built from scratch using only basic MLX operations.
  • Single-File Implementation: Entire model in one Python file aggressor.py.
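To illustrate how a transformer for autoregression and an MLP for diffusion can fit together, here is a minimal sketch of the generation loop in plain Python. It is not taken from aggressor.py: the names `generate`, `condition_fn`, and `denoise_step_fn` are hypothetical, and the toy stand-ins at the bottom replace the real networks so the control flow can be run on its own.

```python
import random


def generate(n_tokens, token_dim, condition_fn, denoise_step_fn, n_steps=50, seed=0):
    """Generate continuous tokens one at a time (hypothetical sketch).

    condition_fn(prefix) -> conditioning vector; stands in for the transformer.
    denoise_step_fn(x, z, t) -> less noisy x; stands in for one reverse
    diffusion step of the MLP denoiser, conditioned on z.
    """
    rng = random.Random(seed)
    tokens = []
    for _ in range(n_tokens):
        # Autoregression: summarize everything generated so far.
        z = condition_fn(tokens)
        # Diffusion: start the next token from pure Gaussian noise...
        x = [rng.gauss(0.0, 1.0) for _ in range(token_dim)]
        # ...and refine it step by step, from the noisiest step down to 0.
        for t in reversed(range(n_steps)):
            x = denoise_step_fn(x, z, t)
        tokens.append(x)
    return tokens


# Toy stand-ins (assumptions for illustration only, not real networks).
def toy_condition(prefix):
    # The "context" is just the number of tokens generated so far.
    return [float(len(prefix))]


def toy_denoise_step(x, z, t):
    # Pull each coordinate halfway towards the conditioning value.
    return [0.5 * xi + 0.5 * z[0] for xi in x]


sample = generate(4, 3, toy_condition, toy_denoise_step)
```

Because each token is a continuous vector produced by the inner denoising loop, no codebook lookup or vector quantization step appears anywhere in the loop.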

Components

  • Aggressor: Main model class combining transformer and diffusion.
  • Transformer: Multi-layer transformer with attention and MLP blocks.
  • Denoiser: MLP-based diffusion process with time embedding.
  • Scheduler: Handles forward and backward processes for diffusion.
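As a rough illustration of what a diffusion scheduler and a time embedding typically compute, the sketch below uses only the Python standard library rather than MLX. The linear beta schedule, the default values, and the function names (`linear_betas`, `alpha_bars`, `add_noise`, `time_embedding`) are assumptions borrowed from the standard DDPM formulation; the actual schedule and embedding in aggressor.py may differ.

```python
import math
import random


def linear_betas(n_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed DDPM-style schedule)."""
    if n_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (n_steps - 1)
    return [beta_start + i * step for i in range(n_steps)]


def alpha_bars(betas):
    """Cumulative products of (1 - beta_t): the signal fraction kept at step t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def add_noise(x0, t, abar, rng):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    a = math.sqrt(abar[t])
    s = math.sqrt(1.0 - abar[t])
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [a * x + s * e for x, e in zip(x0, eps)]
    return xt, eps  # eps is the usual regression target for the denoiser


def time_embedding(t, dim):
    """Sinusoidal embedding of the step index t (dim assumed even)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


betas = linear_betas(1000)
abar = alpha_bars(betas)
x_t, eps = add_noise([0.5, -0.25, 1.0], t=500, abar=abar, rng=random.Random(0))
```

In this formulation the backward process runs the same schedule in reverse: the denoiser receives `x_t` together with `time_embedding(t, dim)` and predicts `eps`, which is then used to step from `x_t` towards a cleaner sample.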

Usage

python aggressor.py

(Training on 60,000 images for 20 epochs takes approximately 7 to 8 minutes on an 8GB M2 MacBook.)

Acknowledgements

Thanks to lucidrains' fantastic code that inspired this project. The official implementation is available here.