[BUG] Improve the pip packaging in a modular way
Current Package Size Analysis
Package Structure
multimind-sdk/
├── setup.py               # Main package configuration
├── requirements-base.txt  # Base dependencies (16 packages)
├── requirements.txt       # Full dependencies (150+ packages)
├── multimind/             # Source code (~40 modules)
│   ├── __init__.py
│   ├── agents/
│   ├── rag/
│   ├── fine_tuning/
│   ├── compliance/
│   ├── gateway/
│   └── ... (40+ modules)
└── examples/              # Example code
Current Installation Options
1. Basic Installation (pip install multimind-sdk)
# Installs: requirements-base.txt (16 packages)
# Size: ~50MB
# Dependencies:
- openai>=1.0.0
- anthropic>=0.5.0
- pydantic>=2.0.0
- python-dotenv>=1.0.0
- fastapi>=0.100.0
- python-jose[cryptography]>=3.3.0
- python-multipart>=0.0.6
- click>=8.1.0
- rich>=13.0.0
- requests>=2.26.0
- typing-extensions>=4.5.0
- pytest>=7.0.0
- pytest-asyncio>=0.21.0
- black>=23.0.0
- isort>=5.12.0
- mypy>=1.0.0
- ruff>=0.1.0
2. Full Installation (pip install multimind-sdk[full])
# Installs: requirements.txt (150+ packages)
# Size: ~3GB
# Major dependencies:
- torch==2.7.0 (2GB+)
- transformers==4.52.3 (500MB+)
- accelerate==1.7.0
- peft==0.15.2
- chromadb==1.0.10
- faiss-cpu==1.11.0
- sentence-transformers==4.1.0
- numpy==2.2.6
- pandas==2.2.3
- scikit-learn==1.6.1
- scipy==1.15.3
- onnxruntime==1.22.0
- opentelemetry-api==1.33.1
- pinecone-client==6.0.0
- ... (140+ more packages)
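The two install paths above are presumably wired together in setup.py via install_requires (base) and an extras_require group (full). Here is a minimal sketch of that wiring, assuming only the requirements file names from the package structure; the real setup.py may read or pin things differently.

```python
# setup.py (sketch) - base install comes from requirements-base.txt,
# the optional [full] extra comes from requirements.txt.
# Everything beyond the file names is illustrative, not the actual config.
from pathlib import Path
from setuptools import setup, find_packages

def read_requirements(filename: str) -> list[str]:
    """Return non-empty, non-comment lines from a requirements file."""
    lines = Path(filename).read_text().splitlines()
    return [line.strip() for line in lines if line.strip() and not line.startswith("#")]

setup(
    name="multimind-sdk",
    packages=find_packages(include=["multimind", "multimind.*"]),
    install_requires=read_requirements("requirements-base.txt"),  # ~50MB footprint
    extras_require={
        "full": read_requirements("requirements.txt"),            # ~3GB footprint
    },
)
```

With this layout, `pip install multimind-sdk` pulls only the base list, and `pip install "multimind-sdk[full]"` layers the heavy dependencies on top.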
Package Size Breakdown
Source Code Size
multimind/ directory: ~2MB
├── __init__.py: 3.2KB
├── config.py: 3.2KB
├── agents/: ~500KB
├── rag/: ~300KB
├── fine_tuning/: ~800KB
├── compliance/: ~200KB
├── gateway/: ~400KB
└── other modules: ~1MB
Dependency Size Analysis
Heavy Dependencies (>100MB each)
- PyTorch (torch==2.7.0): ~2GB
  - Deep learning framework
  - Used for fine-tuning and model operations
  - CPU version: ~800MB, GPU version: ~2GB
- Transformers (transformers==4.52.3): ~500MB
  - Hugging Face transformers library
  - Model loading and inference
  - Includes tokenizers and model code (weights are downloaded separately)
- Accelerate (accelerate==1.7.0): ~200MB
  - Hugging Face accelerate
  - Distributed training support
Medium Dependencies (10-100MB each)
- ChromaDB (chromadb==1.0.10): ~50MB
- FAISS (faiss-cpu==1.11.0): ~40MB
- Sentence Transformers (sentence-transformers==4.1.0): ~30MB
- NumPy (numpy==2.2.6): ~20MB
- Pandas (pandas==2.2.3): ~15MB
Light Dependencies (<10MB each)
- OpenAI client: ~5MB
- Anthropic client: ~3MB
- FastAPI: ~8MB
- Pydantic: ~2MB
- Click: ~1MB
- Rich: ~2MB
- ... (100+ more packages)
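The numbers above are estimates; they can be reproduced locally by summing the files recorded for each installed distribution. A small standard-library sketch (the package list below is just the distributions named above):

```python
# measure_sizes.py - rough on-disk size of installed distributions,
# using importlib.metadata from the standard library (Python 3.8+).
from importlib import metadata
from pathlib import Path

def dist_size_mb(name: str) -> float:
    """Sum the sizes of all files recorded for an installed distribution."""
    dist = metadata.distribution(name)
    total = 0
    for record in dist.files or []:
        path = Path(dist.locate_file(record))
        if path.exists():
            total += path.stat().st_size
    return total / (1024 * 1024)

for pkg in ["torch", "transformers", "chromadb", "faiss-cpu", "numpy", "pandas"]:
    try:
        print(f"{pkg}: {dist_size_mb(pkg):.0f} MB")
    except metadata.PackageNotFoundError:
        print(f"{pkg}: not installed")
```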
Impact on Existing Users
Current User Base
- 800+ downloads of multimind-sdk - users expect current functionality to keep working
- Cannot break backward compatibility
User Scenarios
Scenario 1: RAG-Only Users
# Current: Gets everything (3GB)
pip install multimind-sdk
# What they actually need: ~200MB
- OpenAI/Anthropic clients
- Sentence transformers
- ChromaDB/FAISS
- NumPy/scikit-learn
Scenario 2: Agent-Only Users
# Current: Gets everything (3GB)
pip install multimind-sdk
# What they actually need: ~10MB
- Click, Rich
- Async support
- Core utilities
Scenario 3: Fine-tuning Users
# Current: Gets everything (3GB)
pip install multimind-sdk
# What they actually need: ~2.5GB
- PyTorch, Transformers
- PEFT, Accelerate
- Datasets, Tokenizers
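For the agent-only and RAG-only scenarios to actually benefit from a smaller install, the code also has to avoid importing the heavy libraries at import time. A common pattern, sketched here with illustrative module, function, and extra names rather than the SDK's current API, is to guard optional dependencies and fail with a pointer to the right extra:

```python
# multimind/fine_tuning/_torch_check.py (illustrative) - guarded optional import
# so that an agents- or RAG-only install works without torch present.
try:
    import torch  # heavy optional dependency (~2GB)
    _HAS_TORCH = True
except ImportError:
    torch = None
    _HAS_TORCH = False

def require_torch(feature: str) -> None:
    """Raise a helpful error when a torch-backed feature is used without torch."""
    if not _HAS_TORCH:
        raise ImportError(
            f"{feature} requires PyTorch. Install it with: "
            "pip install 'multimind-sdk[fine-tuning]'"
        )

def fine_tune(model_name: str, dataset) -> None:
    require_torch("fine_tune()")
    # ... torch/transformers-based training would go here ...
```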
Recommendations for Existing Users
Immediate Actions (Keep Current Package)
- Don't change current package - 800+ users depend on it
- Keep backward compatibility - All existing installations must work
- Add better documentation - Help users understand size implications
Short-term Improvements
- Add feature-based extras (optional for users; see the sketch after this list)
- Improve documentation about package sizes
- Add size warnings for large installations
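A sketch of what those feature-based extras could look like, using the group names and rough sizes from the README section below; the exact dependency lists and version pins are assumptions, not the current setup.py:

```python
# setup.py (sketch) - feature-based extras layered on top of the existing
# base install. Group names mirror the README examples; pins are illustrative.
extras_require = {
    "agents": ["click>=8.1.0", "rich>=13.0.0"],          # ~10MB
    "rag": [
        "sentence-transformers>=4.1.0",
        "chromadb>=1.0.10",
        "faiss-cpu>=1.11.0",
        "numpy>=2.0",
        "scikit-learn>=1.6",
    ],                                                    # ~200MB
    "ai-core": [
        "torch>=2.7.0",
        "transformers>=4.52.0",
        "peft>=0.15.0",
        "accelerate>=1.7.0",
    ],                                                    # ~2.5GB
}
# Keep the existing [full] extra working as the union of all groups.
extras_require["full"] = sorted({dep for group in extras_require.values() for dep in group})
```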
Long-term Strategy
- Create modular packages alongside current package (see the sketch after this list)
- Encourage gradual migration to smaller packages
- Maintain legacy support for 1+ years
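For the modular-package direction, each feature area could eventually ship as its own distribution that depends on a small shared core. A sketch of one such distribution follows; the names multimind-core and multimind-rag are hypothetical and only illustrate the split:

```python
# setup.py for a hypothetical standalone "multimind-rag" distribution.
# Distribution names and version pins are illustrative; the real split would
# follow the roadmap above and keep existing import paths working.
from setuptools import setup, find_namespace_packages

setup(
    name="multimind-rag",
    version="0.1.0",
    packages=find_namespace_packages(include=["multimind.rag", "multimind.rag.*"]),
    install_requires=[
        "multimind-core>=0.1.0",        # hypothetical shared core (~10MB)
        "sentence-transformers>=4.1.0",
        "chromadb>=1.0.10",
        "faiss-cpu>=1.11.0",
    ],
)
```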
User Communication Strategy
1. Size Transparency
# README.md
## Package Sizes
### Current Installation
- `pip install multimind-sdk`: ~50MB (basic)
- `pip install multimind-sdk[full]`: ~3GB (complete)
### Recommended for New Users
- RAG only: `pip install multimind-sdk[rag]` (~200MB)
- Agents only: `pip install multimind-sdk[agents]` (~10MB)
- Full AI: `pip install multimind-sdk[ai-core]` (~2.5GB)
2. Backward Compatibility
## For Existing Users
Your current installation will continue to work:
```bash
pip install multimind-sdk  # Still works!
```
No breaking changes will be made to the current package.
## **Conclusion**
### **Current State**
- **Basic installation**: ~50MB (reasonable)
- **Full installation**: ~3GB (very large)
- **800+ existing users**: Must maintain compatibility
### **Recommended Actions**
1. **Keep current package unchanged** (critical)
2. **Add feature-based extras** (improvement)
3. **Create modular packages** (future)
4. **Maintain backward compatibility** (long-term)
### **Benefits**
- ✅ No disruption to existing users
- ✅ Better experience for new users
- ✅ Path to true modular architecture
- ✅ Sustainable development model
Thanks for laying this out so clearly; the breakdown by dependency class is super helpful!
If you're exploring ways to further modularize or compress install size (especially for RAG-heavy scenarios), one additional idea is to separate reasoning layers from memory/vector layers more explicitly. I've been experimenting with a semantic-first approach where the model logic is delegated to a .txt-based reasoning engine, and surprisingly it trims not only dependencies but also aligns system prompt reasoning with file-based behaviors.
I tested this with FAISS + agentic workflows + Streamlit in the loop; the real optimization came from treating the semantic logic as its own "package," letting me run meaningful queries without invoking large model weights.
If that's something your roadmap might intersect with, I'd be happy to share a proof-of-concept or replicate the setup.
Appreciate the work here; great issue.