agent-zero
Kokoro-tts
- Added Kokoro TTS support in preload and run_ui scripts.
- Introduced a new API endpoint for text-to-speech synthesis.
- Updated settings to include TTS enable/disable option.
- Refactored speech handling to utilize a centralized speech store.
- Enhanced UI with new speech button and SVG icon.
- Updated dependencies in requirements.txt for Kokoro TTS.
Docker Image Size Impact
- Current: ~9.17 GB
- Addition: ~400-450 MB (4-5% increase)
- Kokoro model: ~350 MB (or 80 MB if quantized)
- Dependencies: ~50-100 MB
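The percentage above can be checked with a few lines of arithmetic. This is only a sanity check of the quoted estimates; it assumes 1 GB = 1000 MB, and the helper name `percent_increase` is made up for illustration.

```python
# Figures are the estimates quoted above; 1 GB = 1000 MB is an assumption.
BASE_IMAGE_MB = 9.17 * 1000


def percent_increase(added_mb: float, base_mb: float = BASE_IMAGE_MB) -> float:
    """Return the added size as a percentage of the base image size."""
    return added_mb / base_mb * 100


model_mb = 350                         # unquantized Kokoro model
deps_low_mb, deps_high_mb = 50, 100    # dependency estimate range

low = percent_increase(model_mb + deps_low_mb)
high = percent_increase(model_mb + deps_high_mb)
print(f"{low:.1f}% to {high:.1f}%")  # prints "4.4% to 4.9%"
```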
Memory Consumption Impact
- Base memory: +80-150 MB when model is loaded
- Active TTS: +200-400 MB during audio generation
- Peak usage: +500-600 MB during heavy TTS workloads
Key Optimizations Already in Place
- Lazy loading - the model loads only on the first TTS request
- Single instance - shared pipeline across requests
- Async processing - non-blocking operations
- Text chunking - 300-char limits for memory efficiency
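The four optimizations above can be sketched together in a few lines. This is a minimal illustration of the pattern, not the code in this PR: `get_pipeline`, `chunk_text`, `synthesize`, and `fake_loader` are hypothetical names, and `fake_loader` stands in for whatever actually constructs the Kokoro pipeline. It assumes Python 3.9+ for `asyncio.to_thread`.

```python
import asyncio
import re
import threading
from typing import Any, Callable, List, Optional

MAX_CHARS = 300  # chunk limit quoted in the list above

_pipeline: Optional[Any] = None
_lock = threading.Lock()


def get_pipeline(loader: Callable[[], Any]) -> Any:
    """Load the model on first use and reuse the same instance afterwards."""
    global _pipeline
    if _pipeline is None:
        with _lock:
            if _pipeline is None:  # re-check inside the lock
                _pipeline = loader()
    return _pipeline


def chunk_text(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` chars, preferring sentence ends."""
    chunks: List[str] = []
    current = ""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


async def synthesize(text: str, loader: Callable[[], Any]) -> List[Any]:
    """Run blocking model calls in worker threads so the event loop stays free."""
    pipeline = await asyncio.to_thread(get_pipeline, loader)
    return [await asyncio.to_thread(pipeline, chunk) for chunk in chunk_text(text)]


def fake_loader() -> Callable[[str], bytes]:
    """Hypothetical stand-in for the real Kokoro pipeline constructor."""
    return lambda chunk: chunk.encode("utf-8")


if __name__ == "__main__":
    audio = asyncio.run(synthesize("Hello world. " * 100, fake_loader))
    print(len(audio), "chunks synthesized")
```

The double-checked lock keeps concurrent first requests from loading the model twice, and chunking bounds the amount of text handed to the model in a single call.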
Recommendations for Further Optimization
- Model quantization - reduce from 350 MB to 80 MB (about 77% savings)
- Memory monitoring - track actual usage patterns
- Resource limits - set container memory caps for TTS processes
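For the memory monitoring recommendation, one low-overhead option is to read the process's peak resident set size around a TTS call. The sketch below uses only the standard library `resource` module (Unix only); `peak_rss_mb` and `measure_peak_delta` are hypothetical helper names, and the lambda in the usage example merely stands in for a synthesis call.

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Return this process's peak resident set size in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def measure_peak_delta(fn, *args, **kwargs):
    """Call fn and return (result, growth of peak RSS in MB during the call)."""
    before = peak_rss_mb()
    result = fn(*args, **kwargs)
    return result, peak_rss_mb() - before


if __name__ == "__main__":
    # Stand-in workload; replace with the real TTS call when measuring.
    _, delta = measure_peak_delta(lambda: bytearray(50_000_000))
    print(f"peak RSS grew by {delta:.1f} MB")
```

Peak RSS is a high-water mark, so the delta only captures growth beyond the previous peak. It is useful for validating the estimates above, but it does not show steady-state usage.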
The impact is modest: a 4-5% increase in image size and at most about 600 MB of extra memory under heavy TTS load. The Kokoro model is designed for edge devices, so it is already optimized for low resource usage.