Genesis
A high-performance deep learning framework with educational clarity
📚 Documentation | 🚀 Quick Start | 📊 Performance
Overview
Genesis is a modern deep learning framework built from scratch, combining production-level performance with educational transparency. Featuring Triton-optimized kernels, automatic differentiation, and comprehensive neural network modules, Genesis serves both as a learning resource and a practical training framework.
Key Features
Core Capabilities
- 🔥 High Performance: Triton-optimized GPU kernels achieving near-native performance
- ⚡ Automatic Differentiation: Dynamic computational graph with full gradient support (see the sketch just after this list)
- 🧠 Neural Networks: Complete module library including transformers and attention mechanisms
- 🎯 Mixed Precision: AMP support with FP16/BF16 training
- 🚀 Distributed Training: Multi-GPU training with NCCL backend
- 📦 Model Support: Built-in LLM implementations (Qwen) with training pipelines
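To make the dynamic-graph bullet concrete, here is a minimal autograd sketch. It assumes Genesis follows the PyTorch-style `requires_grad`/`backward()` convention used in the examples below; the exact `genesis.tensor` constructor is an assumption, not confirmed by this README:

```python
import genesis

# Build y = sum(x * x); the graph is recorded as operations run
# (genesis.tensor and requires_grad are assumed PyTorch-like)
x = genesis.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x * x).sum()

# Walk the recorded graph backwards
y.backward()

print(x.grad)  # dy/dx = 2x -> [2.0, 4.0, 6.0]
```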
Technical Highlights
- Modular backend system with clean CPU/CUDA separation
- Advanced CUDA memory management with pooling and statistics
- Unified operation dispatch routing to optimal implementations
- Complete optimizer suite (Adam, AdamW, SGD) with schedulers
- Production-ready training pipeline with checkpointing
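As a sketch of what checkpointing in that pipeline could look like, assuming `genesis.save`/`genesis.load` and `state_dict()` mirror the PyTorch convention (hypothetical names, not confirmed by this README):

```python
# Hypothetical checkpoint round-trip, torch-style API assumed
checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "epoch": epoch,
}
genesis.save(checkpoint, "checkpoint.pth")

# ...later, to resume training
state = genesis.load("checkpoint.pth")
model.load_state_dict(state["model"])
optimizer.load_state_dict(state["optimizer"])
```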
Performance
Genesis delivers competitive performance through hand-optimized Triton kernels (a minimal kernel example follows the table):
| Operation | Efficiency vs Reference |
|---|---|
| Matrix Multiplication | ~95% |
| Softmax | ~112% |
| LayerNorm | ~120% |
| Multi-Head Attention | ~97% |
Benchmarked on an NVIDIA A100 GPU.
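For a flavor of what such a kernel looks like, here is a minimal Triton vector-add. It is an illustrative example of the technique, not code taken from the Genesis source tree:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```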
Quick Start
Installation
```bash
# Clone the repository
git clone https://github.com/phonism/genesis.git
cd genesis

# Install (CPU only)
pip install -e .

# Install with LLM support
pip install -e ".[llm]"

# Verify the installation
python -c "import genesis; print(genesis.__version__)"
```
Basic Usage
```python
import genesis
import genesis.nn as nn
import genesis.optim as optim

# Define the model
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 10)
        self.dropout = nn.Dropout(0.2)

    def forward(self, x):
        x = self.fc1(x).relu()
        x = self.dropout(x)
        return self.fc2(x)

# Training setup
model = Net()
optimizer = optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Training loop
for data, target in dataloader:
    output = model(data)
    loss = criterion(output, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
Mixed Precision Training
```python
from genesis.cuda import amp

scaler = amp.GradScaler()

for data, target in dataloader:
    # Run the forward pass in reduced precision
    with amp.autocast():
        output = model(data)
        loss = criterion(output, target)

    # Scale the loss to avoid FP16 gradient underflow, then unscale and step
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    optimizer.zero_grad()
```
Distributed Training
```bash
# Launch multi-GPU training with a single command
torchrun --nproc_per_node=4 train.py
```

Inside `train.py`:

```python
import genesis.distributed as dist
from genesis.distributed import DistributedDataParallel as DDP

# Initialize the process group
dist.init_process_group(backend='nccl')

# Wrap the model; gradients are synchronized automatically
model = DDP(model)

# Train normally from here on
```
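Putting the pieces together, a minimal `train.py` might look like the sketch below. It reuses `Net`, `criterion`, and `dataloader` from Basic Usage and assumes (not confirmed by this README) that per-rank device placement is handled when the model is built:

```python
import genesis.nn as nn
import genesis.optim as optim
import genesis.distributed as dist
from genesis.distributed import DistributedDataParallel as DDP

dist.init_process_group(backend='nccl')

# Net, criterion, and dataloader as defined in Basic Usage;
# per-rank device placement is assumed to happen here (hypothetical)
model = DDP(Net())
optimizer = optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for data, target in dataloader:
    output = model(data)
    loss = criterion(output, target)
    optimizer.zero_grad()
    loss.backward()   # DDP all-reduces gradients across ranks here
    optimizer.step()
```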
Architecture
```
genesis/
├── tensor.py          # Core tensor with autograd
├── function.py        # Autodiff functions
├── backends/          # CPU/CUDA implementations
│   ├── cpu.py
│   ├── cuda.py
│   └── cuda_memory.py
├── ops/               # Operation dispatch
├── nn/                # Neural network modules
│   ├── modules/       # Layer implementations
│   ├── functional.py  # Functional operations
│   └── triton_ops/    # Optimized kernels
├── optim/             # Optimizers
├── distributed/       # Multi-GPU support
└── cuda/              # CUDA utilities & AMP
```
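The `ops/` layer is what routes each operation to a backend. As an illustration of the general pattern (not the actual Genesis implementation), a device-keyed dispatch table might look like this:

```python
# Illustrative pattern only: route each op to a backend by device type
_REGISTRY = {}

def register(op, device):
    """Decorator that records an implementation for (op, device)."""
    def deco(fn):
        _REGISTRY[(op, device)] = fn
        return fn
    return deco

@register("add", "cpu")
def add_cpu(a, b):
    return a + b  # stand-in for a NumPy-backed implementation

@register("add", "cuda")
def add_cuda(a, b):
    return a + b  # stand-in for a Triton kernel launch

def dispatch(op, a, b, device="cpu"):
    # Look up the implementation registered for this device
    return _REGISTRY[(op, device)](a, b)
```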
Examples
Train Qwen LLM
```bash
cd apps/llm
python train_sft_qwen.py --amp --dtype fp16
```
Interactive Chat
```bash
cd apps/llm
python chat_qwen.py --checkpoint model.pth
```
Benchmarks
```bash
python benchmark/bench_matmul.py
python benchmark/bench_qwen_training.py
```
Documentation
- Getting Started Guide
- API Reference
- Architecture Overview
- Performance Tuning
Testing
```bash
# Run the test suite
pytest tests/ -v

# With coverage
pytest tests/ --cov=genesis --cov-report=html
```
Contributing
We welcome contributions! Genesis is designed to be hackable and educational.
```bash
# Development setup
pip install -e ".[dev]"

# Format and test before submitting
black genesis/ && isort genesis/
pytest tests/
```
See CONTRIBUTING.md for guidelines.
License
MIT License - see LICENSE for details.
Acknowledgments
Genesis builds on ideas from PyTorch, Triton, TinyGrad, and JAX. We thank these projects for their inspiration and the deep learning community for their support.
Built for deep learning researchers and practitioners
⭐ Star us on GitHub if you find Genesis useful!