DeepFX-Studio
Advanced Computer Vision Platform for AI-Powered Image Processing
Comprehensive reproduction of state-of-the-art neural network architectures for practical deployment
DeepFX Studio represents a comprehensive platform that bridges cutting-edge computer vision research with practical deployment. Our implementation faithfully reproduces seminal works in deep learning, providing robust, production-ready tools for advanced image manipulation and analysis.
If you find this project helpful, please consider giving it a star! Your support helps us continue developing cutting-edge AI tools and motivates us to keep improving the platform.
Development Team
| Role | Contributor | Primary Responsibilities |
|---|---|---|
| Lead Developer & DL Engineer | XBastille | Model Implementation, Training Pipeline Development, Research Reproduction |
| Full-Stack Engineer | Abhinab Choudhary | System Architecture, Backend Infrastructure, API Development |
| Frontend Developer | Soap-mac | User Interface Design, Frontend Implementation, UX Development |
Research Reproductions & Model Implementations
Our platform reproduces state-of-the-art computer vision models from peer-reviewed research, implementing them with careful attention to architectural details and training procedures.
Image Colorization
- Project: DeOldify (Open-Source)
- Original Author: Jason Antic
- Reference Implementation: Based on official DeOldify GitHub
- Description: Self-Attention Generative Adversarial Network (GAN) for colorizing and restoring old images, with the NoGAN approach for improved training stability.
- Training Strategy (Approximate, Typical Setup):
  - Dataset: ImageNet + historical photo collections (~100K+ images)
  - Batch Size: Often 16–32 (per GPU)
  - Learning Rate: Commonly 1e-4, cosine annealing scheduling
  - Loss Functions: Perceptual loss (VGG), L1, and feature matching loss
  - Optimizer: Adam (β1=0.5, β2=0.999)
  - Progression: Image size scales through 64×64 → 256×256 → 512×512
  - Augmentations: Flips, rotations, color jittering
  - Checkpoints: Models saved regularly based on validation loss
Note: Training details are provided for context and may vary depending on resources and dataset size. Our implementation aims to closely follow the official DeOldify pipeline for reproduction and deployment, using available open-source checkpoints where suitable.
- Module: ai_colorization/
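The training strategy above combines cosine annealing with progressive resizing. The following is a minimal, dependency-free sketch of how those two schedules could be expressed; it is not taken from the DeOldify code base. The function names and the equal-thirds stage boundaries are assumptions made for illustration, since the README only lists the image sizes.

```python
import math

# Image sizes come from the list above; the equal-thirds stage boundaries are an assumption.
SIZE_SCHEDULE = [(0.0, 64), (1 / 3, 256), (2 / 3, 512)]


def cosine_annealed_lr(step: int, total_steps: int,
                       base_lr: float = 1e-4, min_lr: float = 0.0) -> float:
    """Anneal from base_lr at step 0 to min_lr at total_steps along a half cosine."""
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def image_size_at(step: int, total_steps: int) -> int:
    """Return the training resolution of the stage that contains `step`."""
    progress = step / total_steps
    size = SIZE_SCHEDULE[0][1]
    for start, stage_size in SIZE_SCHEDULE:
        if progress >= start:
            size = stage_size
    return size
```

In a real training loop the two functions would be queried once per step to set the optimizer learning rate and the dataloader crop size.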
Real-World Super Resolution
- Paper: "Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data" (ICCVW 2021)
- Authors: Xintao Wang, Liangbin Xie, Chao Dong, Ying Shan
- arXiv: 2107.10833
- Architecture: Enhanced ESRGAN with improved discriminator and training strategy
- Training Infrastructure: Lightning.ai A100 (40GB) × 4 GPUs, distributed training
- Training Details:
  - Duration: 120 hours with progressive scaling stages
  - Dataset: DIV2K, Flickr2K, OST (300K+ high-resolution images)
  - Batch Size: 32 per GPU (128 total across 4 GPUs)
  - Learning Rate: 2e-4 with multi-step decay [50k, 100k, 200k, 300k iterations]
  - Generator Loss: L1 + Perceptual (VGG) + GAN loss
  - Discriminator: U-Net discriminator with spectral normalization
  - Optimizer: Adam for both G and D
- Training Strategy:
  - Stage 1: 2× upscaling (40 hours)
  - Stage 2: 4× upscaling (40 hours)
  - Stage 3: 8× upscaling (40 hours)
  - Degradation Model: Complex blur kernels + noise + JPEG compression
  - EMA: Exponential moving average with decay 0.999
- Module: ai_image_upscale/
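The multi-step learning-rate decay and the EMA of generator weights listed above can be sketched as follows. This is an illustrative, dependency-free version rather than the project's training code: the decay factor `gamma=0.5` is an assumption (the README gives the milestones but not the factor), and the weights are plain Python lists standing in for model parameters.

```python
from bisect import bisect_right

# Milestones are taken from the training details above.
MILESTONES = [50_000, 100_000, 200_000, 300_000]


def multistep_lr(iteration: int, base_lr: float = 2e-4, gamma: float = 0.5) -> float:
    """Multiply base_lr by gamma once for every milestone already reached.

    gamma=0.5 is an assumed value; only the milestones are documented.
    """
    return base_lr * gamma ** bisect_right(MILESTONES, iteration)


def ema_update(ema_weights, weights, decay: float = 0.999):
    """Blend the current weights into the running average with the given decay."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]
```

With a decay of 0.999 the averaged weights change slowly, which is why the EMA copy of the generator is usually the one kept for inference.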
Salient Object Detection & Background Removal
- Paper: "U²-Net: Going Deeper with Nested U-Structure for Salient Object Detection" (Pattern Recognition 2020)
- Authors: Xuebin Qin, Zichen Zhang, Chenyang Huang, Masood Dehghan, Osmar R. Zaiane, Martin Jagersand
- DOI: 10.1016/j.patcog.2020.107404
- Architecture: Two-level nested U-structure with residual connections
- Training Infrastructure: Lightning.ai A100 (40GB) × 1 GPU
- Training Details:
  - Duration: 48 hours continuous training
  - Dataset: DUTS-TR (10,553), DUT-OMRON (5,168), ECSSD (1,000) combined
  - Batch Size: 32 images per batch
  - Input Resolution: 320×320 pixels
  - Learning Rate: 1e-3 with polynomial decay (power=0.9)
  - Loss Function: Hybrid loss (BCE + IoU + SSIM)
  - Optimizer: SGD with momentum 0.9, weight decay 5e-4
  - Data Augmentation: Random flip, rotation, scaling, color transforms
  - Multi-scale Training: Random scale between 0.75-1.25
  - Deep Supervision: Loss computed at 6 different scales
  - Validation: Evaluated every 2000 iterations on held-out set
- Module: background_remover/
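As a rough illustration of the polynomial learning-rate decay and the deeply supervised hybrid loss described above, here is a small sketch on flat lists of pixel probabilities. It is not the project's training code: the SSIM term is omitted for brevity, all function names are hypothetical, and a real implementation would operate on PyTorch tensors.

```python
import math


def poly_lr(iteration: int, max_iterations: int,
            base_lr: float = 1e-3, power: float = 0.9) -> float:
    """Polynomial decay from base_lr at iteration 0 to zero at max_iterations."""
    progress = min(iteration, max_iterations) / max_iterations
    return base_lr * (1.0 - progress) ** power


def bce_loss(pred, target, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over pixels; predictions are clipped for stability."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)


def iou_loss(pred, target, eps: float = 1e-7) -> float:
    """Soft IoU loss: 1 - intersection / union computed on probabilities."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(p + t - p * t for p, t in zip(pred, target))
    return 1.0 - (inter + eps) / (union + eps)


def deep_supervision_loss(side_outputs, target) -> float:
    """Sum the BCE + IoU terms over every supervised output (SSIM left out here)."""
    return sum(bce_loss(o, target) + iou_loss(o, target) for o in side_outputs)
```

Passing six side outputs to `deep_supervision_loss` mirrors the six supervision scales listed above.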
Image Inpainting
- Primary Integration: HuggingFace Flux Inpainting by Alibaba (Alimama)
- Model: "black-forest-labs/FLUX.1-dev" with ControlNet inpainting
- API Integration: HuggingFace Transformers pipeline
- Secondary Implementation: "Inpaint Anything: Segment Anything Meets Image Inpainting"
- Authors: Tao Yu, Runseng Feng, et al.
- arXiv: 2304.06790
- Architecture:
  - Primary: FLUX.1-dev diffusion model with inpainting ControlNet
  - Secondary: SAM (Segment Anything) + LaMa (Large Mask Inpainting)
- Implementation Details:
  - Flux Pipeline: Direct API calls to HuggingFace inference endpoints
  - Mask Generation: Automated via SAM or manual user input
  - Resolution: Up to 1024×1024 for Flux, 512×512 for local pipeline
  - Inference Time: 10-15 seconds depending on image size and complexity
- Module: ai_image_editor/
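The resolution limits above imply that inputs are resized before inpainting. The helper below is a hypothetical sketch of that step, not code from the ai_image_editor/ module: `fit_resolution`, the backend names, and the rounding to a multiple of 16 are all assumptions for illustration (diffusion pipelines commonly require side lengths divisible by a small power of two).

```python
# Longest allowed side per backend, taken from the resolution limits above.
MAX_SIDE = {"flux": 1024, "local": 512}


def fit_resolution(width: int, height: int, backend: str, multiple: int = 16):
    """Shrink (never enlarge) an image size to the backend limit, keeping aspect ratio.

    Both sides are then rounded down to a multiple of `multiple` (assumed value).
    """
    limit = MAX_SIDE[backend]
    longest = max(width, height)
    if longest > limit:
        # Integer arithmetic avoids floating-point rounding surprises.
        width = width * limit // longest
        height = height * limit // longest

    def snap(value: int) -> int:
        return max(multiple, value // multiple * multiple)

    return snap(width), snap(height)
```

For example, a 2048×1024 upload sent to the Flux backend would be reduced to 1024×512, and the mask would need to be resized to the same dimensions.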
Photo-to-Anime Translation
- Paper: "AnimeGAN: A Novel Lightweight GAN for Photo Animation" & "AnimeGANv3"
- Authors: Jie Chen, Gang Liu, Xin Chen
- Architecture: Lightweight generative adversarial network with anime-specific losses
- Training Infrastructure: Lightning.ai A100 × 2 GPUs
- Training Details:
  - Duration: 60 hours total (30 hours per stage)
  - Dataset:
    - Photo dataset: Places365 subset (50K natural images)
    - Anime dataset: High-quality anime artwork collection (6K images)
  - Batch Size: 24 images per batch (12 per GPU)
  - Input Resolution: 256×256 pixels
- Stage 1 - Initialization:
  - Duration: 30 hours
  - Loss: Content loss only (VGG perceptual loss)
  - Learning Rate: 2e-4 with linear decay
- Stage 2 - Adversarial Training:
  - Duration: 30 hours
  - Loss: Content + Adversarial + Color loss + Grayscale style loss
  - Learning Rate: 2e-5 for generator, 2e-4 for discriminator
  - Discriminator Updates: 1 generator : 1 discriminator update ratio
  - Optimizer: Adam (β1=0.5, β2=0.999) for both networks
  - Color Loss Weight: λ_color = 10.0
  - Content Loss Weight: λ_content = 1.5
  - Style Loss Weight: λ_gray = 3.0
- Module: ai_filter/
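The two training stages differ mainly in which loss terms are active. The sketch below shows one way the weighted generator objective could be assembled from already-computed scalar loss values. It is illustrative only: the class and function names are hypothetical, the adversarial weight of 1.0 is an assumption (the README does not state it), and reusing λ_content in stage 1 is likewise assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    adversarial: float = 1.0  # assumption: this weight is not given in the README
    content: float = 1.5      # λ_content
    gray_style: float = 3.0   # λ_gray
    color: float = 10.0       # λ_color


def generator_loss(stage: int, content: float, adversarial: float = 0.0,
                   gray_style: float = 0.0, color: float = 0.0,
                   weights: LossWeights = LossWeights()) -> float:
    """Stage 1 uses the content term only; stage 2 adds the other three terms."""
    if stage == 1:
        return weights.content * content
    if stage == 2:
        return (weights.adversarial * adversarial
                + weights.content * content
                + weights.gray_style * gray_style
                + weights.color * color)
    raise ValueError(f"unknown training stage: {stage}")
```

Keeping the weights in one frozen dataclass makes it easy to log them alongside each checkpoint.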
Neural Style Transfer
- Foundational Paper: "A Neural Algorithm of Artistic Style" (arXiv 2015)
- Authors: Leon A. Gatys, Alexander S. Ecker, Matthias Bethge
- arXiv: 1508.06576
- Architecture: Original optimization-based approach using VGG-19 feature extractor
- Training Infrastructure: Lightning.ai A100 × 1 GPU
- Implementation Details:
  - Method: Direct optimization following original Gatys et al. algorithm
  - Feature Extractor: Pre-trained VGG-19 ConvNet (ImageNet weights)
  - Content Representation: VGG-19 feature maps from deeper layers
  - Style Representation: Gram matrices of VGG-19 feature maps across multiple layers
  - Loss Function: Weighted combination of content loss + style loss
  - Content Loss: Squared Euclidean distance between feature representations
  - Style Loss: Squared Euclidean distance between Gram matrices
  - Optimization: Adam optimizer (lr=0.02, β1=0.99, ε=1e-1)
  - Loss Weights: α=10 (content), β=40 (style)
- Processing Details:
  - Input Resolution: 400×400 pixels
  - Optimization Steps: Variable iterations until convergence
  - Processing Time: 30-60 seconds per image depending on quality settings
  - Methodology: Content image + Style image → Optimized stylized output
  - Hybrid Approach: Primary optimization method with TensorFlow Hub fallback for speed
- Module: artistic_image_creator/
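To make the content and style terms concrete, the following dependency-free sketch computes a Gram matrix and the weighted loss for a single layer. It is a simplified illustration and not the project's implementation: feature maps are plain nested lists (one row of spatial activations per channel), no per-layer normalization is applied, and the function names are hypothetical. The real pipeline extracts these features from VGG-19 and sums the style term over several layers.

```python
def gram_matrix(features):
    """Channel-by-channel inner products of a C x N feature map."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features] for fi in features]


def squared_distance(a, b) -> float:
    """Squared Euclidean distance between two equally shaped nested lists."""
    return sum((x - y) ** 2
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


def content_loss(generated, content) -> float:
    return squared_distance(generated, content)


def style_loss(generated, style) -> float:
    return squared_distance(gram_matrix(generated), gram_matrix(style))


def total_loss(generated, content, style, alpha: float = 10.0, beta: float = 40.0) -> float:
    """Weighted sum with the α (content) and β (style) values listed above."""
    return alpha * content_loss(generated, content) + beta * style_loss(generated, style)
```

During optimization the generated image's pixels are updated to reduce `total_loss`, while the content and style features stay fixed.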
Text-to-Image Synthesis
- Primary Model: Stable Diffusion 3.5 Large via HuggingFace Diffusers
- Implementation: HuggingFace Diffusers pipeline
- Architecture: Multimodal Diffusion Transformer (MMDiT) with CLIP and T5 text encoders
- Pipeline: "stabilityai/stable-diffusion-3.5-large"
- Features:
  - Resolution: Up to 1024×1024 pixels
  - Guidance Scale: Configurable classifier-free guidance
  - Inference Steps: 20-50 steps for optimal quality
  - Batch Generation: Multiple images per prompt
  - Seed Control: Reproducible generation with optional randomization
- Module: ai_text_to_image_generator/
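A minimal sketch of how the features above could map onto a Diffusers call is shown below. It is an assumption-laden illustration, not the module's actual code: `build_generation_args` and `generate` are hypothetical helpers, the default step count and guidance scale are arbitrary values within the ranges listed above, and running `generate` requires a CUDA GPU, the `torch` and `diffusers` packages, and access to the gated model weights.

```python
import random

MODEL_ID = "stabilityai/stable-diffusion-3.5-large"
MAX_SIDE = 1024  # upper resolution bound listed above


def build_generation_args(prompt, steps=28, guidance_scale=4.5,
                          width=1024, height=1024, num_images=1, seed=None):
    """Validate options and return (pipeline keyword arguments, seed actually used)."""
    if not 20 <= steps <= 50:
        raise ValueError("steps should stay within the 20-50 range")
    if max(width, height) > MAX_SIDE:
        raise ValueError(f"resolution is limited to {MAX_SIDE}x{MAX_SIDE}")
    if seed is None:  # optional randomization: draw a seed and report it back
        seed = random.randrange(2**32)
    args = {
        "prompt": prompt,
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
        "width": width,
        "height": height,
        "num_images_per_prompt": num_images,
    }
    return args, seed


def generate(prompt, **options):
    """Run the pipeline; heavy imports are deferred so the helper above stays lightweight."""
    import torch
    from diffusers import StableDiffusion3Pipeline

    args, seed = build_generation_args(prompt, **options)
    pipe = StableDiffusion3Pipeline.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    pipe = pipe.to("cuda")
    generator = torch.Generator(device="cpu").manual_seed(seed)
    images = pipe(generator=generator, **args).images
    return images, seed
```

Returning the seed alongside the images lets a caller reproduce a result later by passing the same seed back in.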
Todos
- [x] DeOldify Image Colorization - Self-attention GAN with NoGAN training
- [x] Real-ESRGAN Super Resolution - Enhanced ESRGAN with pure synthetic data training
- [x] U²-Net Background Removal - Nested U-structure for salient object detection
- [x] FLUX Image Inpainting - Advanced inpainting with ControlNet integration
- [x] AnimeGAN Photo Translation - Lightweight GAN for photo-to-anime conversion
- [x] Neural Style Transfer - Gatys algorithm with VGG-19 optimization
- [x] Stable Diffusion 3.5 - Text-to-image generation with HuggingFace integration
- [x] Django Web Platform - Complete web interface with user authentication
- [x] Lightning.ai Training - A100 GPU cluster training infrastructure
- [x] Azure Deployment - Live production deployment
- [x] NVIDIA Docker Support - GPU-accelerated containerization for better performance and access to GPU-based services
Support Our Work
If you appreciate our efforts in building this project, your support would mean the world to us!
Your support directly contributes to the development of cutting-edge computer vision tools and helps keep this project free and open-source for everyone!
Future Vision
AI-Powered Canvas Editor
We envision expanding DeepFX Studio into a comprehensive AI-Enhanced Canvas Editor - a unified creative workspace that combines all our AI tools with intuitive manual editing capabilities.
Planned Features:
- Unified Canvas Interface: A clean, blank workspace where users can create, edit, and combine multiple images seamlessly
- Integrated AI Toolkit: All 7 existing AI tools (colorization, upscaling, background removal, inpainting, style transfer, filters, text-to-image) accessible directly within the editor
- Manual Editing Tools: Essential editing capabilities including cropping, resizing, positioning, layering, and basic adjustments
- Smart Workflow: Upload existing images or generate new ones with text-to-image, then apply any combination of AI transformations and manual edits
- Multi-Image Projects: Work with multiple images simultaneously on a single canvas, applying different AI effects to individual elements
- One-Click Export: Save the entire canvas composition as a single final image
How it works: Users open the AI Editor mode to find a blank canvas with tool panels. They can either upload images or generate them using text-to-image, then freely edit using manual tools (crop, zoom, position) and apply AI effects (change art style, remove backgrounds, enhance quality). The final composition gets saved as one cohesive image.
This represents our vision for democratizing advanced image editing by combining the power of AI with user-friendly creative tools.
Development Roadmap: The implementation of these features depends on community support and project popularity. With sufficient backing through community engagement, we can dedicate the resources needed to make this vision a reality.
Showcase
YouTube Video Demo
Screenshots




Quick Start
Prerequisites & Installation
For detailed setup instructions, please refer to our comprehensive guides:
- Installation Guide: Docker setup and development environment configuration
- Setup Guide: Complete setup instructions with model placement
Ready to get started? Follow our step-by-step installation guides for a smooth setup experience!
System Architecture
DeepFX-Studio/
├── .github/                        # GitHub workflows and CI
│   └── workflows/
│       └── azure-deploy.yml        # Azure App Service CI/CD workflow
├── ai_colorization/                # DeOldify Implementation
├── ai_image_upscale/               # Real-ESRGAN Super-Resolution
├── background_remover/             # U²-Net Salient Object Detection
├── ai_image_editor/                # Flux Inpainting + SAM Integration
│   └── models/
│       ├── apply_fill.py           # Inpainting application logic
│       ├── apply_removal.py        # Object removal workflows
│       ├── apply_replace.py        # Object replacement pipelines
│       ├── controlnet_flux.py      # Flux ControlNet integration
│       ├── generate_masks.py       # Mask generation utilities
│       ├── lama_inpaint.py         # LaMa inpainting fallback
│       ├── pipeline_flux_controlnet_inpaint.py  # Main Flux pipeline
│       ├── sam_segment.py          # SAM segmentation
│       └── transformer_flux.py     # Flux transformer models
├── ai_filter/                      # AnimeGANv3 Implementation
├── artistic_image_creator/         # Neural Style Transfer
├── ai_text_to_image_generator/     # Stable Diffusion 3.5 API
├── dashboard/                      # User Dashboard & Analytics
├── website/                        # Landing & Information Pages
├── user_auth/                      # Django Allauth Integration
├── components/                     # Reusable UI Components
├── static/                         # Frontend Assets (TailwindCSS)
├── templates/                      # HTML Templates (All Apps)
├── deepfx_studio/                  # Main Django Project Configuration
├── INSTALLATION.md                 # Detailed Installation Guide
├── SETUP.md                        # Development Setup Guide
└── Dockerfile                      # Docker Configuration
Training Infrastructure Details
Lightning.ai A100 Cluster Configuration
- Hardware: NVIDIA A100 (40GB) GPUs
- Cluster Setup: Multi-node distributed training capability
- Memory: 100GB+ system RAM across nodes
Comprehensive Training Summary
| Model | GPUs | Training Time | Dataset Size | Memory/GPU | Key Training Details |
|---|---|---|---|---|---|
| DeOldify | A100 × 2 | 72 hours | 100K+ images | 35GB | Progressive training 64→256→512px |
| Real-ESRGAN | A100 × 4 | 120 hours | 300K+ images | 38GB | Multi-stage 2×→4×→8× upscaling |
| U²-Net | A100 × 1 | 48 hours | 16K+ images | 28GB | Multi-scale deep supervision |
| AnimeGANv3 | A100 × 2 | 60 hours | 56K+ images | 32GB | Two-stage adversarial training |
| NST Implementation | A100 × 1 | Per-image optimization | Custom content+style pairs | 25GB | Gatys algorithm with VGG-19 features |
Technology Stack & Integrations
Core Framework
| Deep Learning | Computer Vision | Web Framework | ML Platform |
|---|---|---|---|
| PyTorch 2.0+ | OpenCV 4.7+ | Django 4.2+ | HuggingFace Hub |
| Lightning.ai | Pillow-SIMD | TailwindCSS 3.3+ | HuggingFace Spaces |
| Transformers 4.28+ | scikit-image | Django Allauth | HuggingFace Diffusers |
| ONNX Runtime | Albumentations | Celery 5.2+ | Lightning AI Platform |
HuggingFace Integration Features
- Flux Inpainting: State-of-the-art inpainting via black-forest-labs/FLUX.1-dev
- Model Hub: Access to pre-trained checkpoints and fine-tuned variants
- Transformers Pipeline: Streamlined model loading and inference
- Diffusers Integration: Advanced text-to-image and image-to-image pipelines
- API Endpoints: Direct integration with HuggingFace inference API
Documentation & Resources
Comprehensive Documentation Suite
- Installation Guide: Complete setup with model placement diagrams
- Setup Guide: Docker configuration and development environment
- Training Logs: Detailed training curves and hyperparameter configurations
- Model Cards: Individual documentation for each implemented model
Issue Reporting
Found a bug or have a feature request? We'd love to hear from you!
- Report Issues: GitHub Issues
Attribution
Original Paper Attributions
We gratefully acknowledge the original authors of all reproduced papers:
- DeOldify by Jason Antic et al.
- Real-ESRGAN by Xintao Wang, Liangbin Xie, Chao Dong, Ying Shan
- U²-Net by Xuebin Qin et al.
- AnimeGANv3 by Jie Chen, Gang Liu, Xin Chen
- Neural Style Transfer by Leon A. Gatys, Alexander S. Ecker, Matthias Bethge
- FLUX.1 by Black Forest Labs
- Segment Anything by Meta AI
Open Source Computer Vision Platform
Faithful reproduction of state-of-the-art research with practical deployment
Development Team: XBastille (Lead) • Abhinab Choudhary (Full-Stack) • Soap-mac (Frontend)
Training Infrastructure: Lightning.ai A100 GPU Cluster
Integration Platform: HuggingFace Hub & APIs
Quick Links: Installation • Setup
