Aggressor: Ultra-minimal autoregressive diffusion model for image and speech generation

Sample outputs: CIFAR, MNIST, AUDIO

Key Features

Simple Architecture: A tiny transformer for autoregression and an MLP for diffusion.
Minimal Dependencies: Built from scratch using only basic MLX operations.
Single-File Implementation: The entire model lives in one Python file, aggressor.py.
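
The split between the two components can be illustrated with a rough sketch. This is not the code in aggressor.py: it uses plain Python lists instead of MLX so it runs anywhere, a single causal attention layer stands in for the tiny transformer, and every name, size, and the noise-mixing rule below is a hypothetical choice made for illustration. The idea shown is that the causal layer summarizes the tokens seen so far into a conditioning vector, and a small MLP uses that vector to predict the noise in the next token.

```python
import math
import random

random.seed(0)

def rand_matrix(rows, cols):
    """Small random weights, standing in for learned parameters."""
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    """Matrix-vector product on plain lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(tokens, Wq, Wk, Wv):
    """Single-head causal self-attention: position t attends to 0..t only."""
    q = [matvec(Wq, x) for x in tokens]
    k = [matvec(Wk, x) for x in tokens]
    v = [matvec(Wv, x) for x in tokens]
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for t in range(len(tokens)):
        scores = [scale * sum(a * b for a, b in zip(q[t], k[s]))
                  for s in range(t + 1)]
        weights = softmax(scores)
        out.append([sum(w * v[s][d] for s, w in enumerate(weights))
                    for d in range(len(v[0]))])
    return out

def denoiser_mlp(noisy_token, t, cond, W1, W2):
    """Two-layer MLP predicting the noise in noisy_token, given t and cond."""
    hidden = matvec(W1, noisy_token + [t] + cond)
    hidden = [max(0.0, h) for h in hidden]  # ReLU
    return matvec(W2, hidden)

# Hypothetical sizes chosen only for this sketch.
DIM, HIDDEN, SEQ = 4, 16, 5
Wq, Wk, Wv = (rand_matrix(DIM, DIM) for _ in range(3))
W1 = rand_matrix(HIDDEN, DIM + 1 + DIM)
W2 = rand_matrix(DIM, HIDDEN)

# A toy sequence of continuous tokens (e.g. image patches or audio frames).
tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(SEQ)]
cond = causal_attention(tokens, Wq, Wk, Wv)

# Noise token 1 at level t (assumed variance-preserving mix), then ask the
# MLP to predict that noise using the condition computed from token 0.
t = 0.5
noise = [random.gauss(0, 1) for _ in range(DIM)]
noisy_next = [math.sqrt(1 - t) * x + math.sqrt(t) * n
              for x, n in zip(tokens[1], noise)]
pred_noise = denoiser_mlp(noisy_next, t, cond[0], W1, W2)
```

In training, the squared error between pred_noise and noise would be the loss. At sampling time, the MLP would be applied repeatedly to denoise each new token before it is appended to the sequence.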

python aggressor.py

(Training on 60,000 images for 20 epochs takes approximately 7 to 8 minutes on an 8GB M2 MacBook.)
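
For a rough sense of scale, the timing above can be converted into training throughput. The helper below is a hypothetical name used only for this calculation, and the result is only as precise as the quoted 7 to 8 minute range.

```python
def images_per_second(images, epochs, minutes):
    """Average number of training images processed per second."""
    return images * epochs / (minutes * 60)

fast = images_per_second(60_000, 20, 7)  # about 2857 images/s
slow = images_per_second(60_000, 20, 8)  # 2500 images/s
print(f"{slow:.0f} to {fast:.0f} images per second")
```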

Thanks to lucidrains, whose fantastic code inspired this project. The official implementation is available here.