Kokoro-tts

Open TerminallyLazy opened this issue 7 months ago • 0 comments

Added Kokoro TTS support in preload and run_ui scripts.
Introduced a new API endpoint for text-to-speech synthesis.
Updated settings to include TTS enable/disable option.
Refactored speech handling to utilize a centralized speech store.
Enhanced UI with new speech button and SVG icon.
Updated dependencies in requirements.txt for Kokoro TTS.

Docker Image Size Impact

Current: ~9.17 GB
Addition: ~400-450 MB (4-5% increase)
- Kokoro model: ~350 MB (or 80 MB if quantized)
- Dependencies: ~50-100 MB
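
The percentages above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the estimates quoted in this issue and assumes decimal units (9.17 GB taken as 9170 MB). The function name `image_increase` is made up for illustration.

```python
# All sizes in MB, taken from the estimates in this issue.
BASE_IMAGE_MB = 9170          # ~9.17 GB current image (decimal units assumed)
DEPENDENCIES_MB = (50, 100)   # low and high estimate
MODEL_MB = {"full": 350, "quantized": 80}


def image_increase(model_mb, deps_mb, base_mb=BASE_IMAGE_MB):
    """Return (added MB, percent increase) for one model/dependency pairing."""
    added = model_mb + deps_mb
    return added, 100 * added / base_mb


for variant, model_mb in MODEL_MB.items():
    low_mb, low_pct = image_increase(model_mb, DEPENDENCIES_MB[0])
    high_mb, high_pct = image_increase(model_mb, DEPENDENCIES_MB[1])
    print(f"{variant}: +{low_mb}-{high_mb} MB ({low_pct:.1f}-{high_pct:.1f}%)")

# full: +400-450 MB (4.4-4.9%)
# quantized: +130-180 MB (1.4-2.0%)
```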

Memory Consumption Impact

Base memory: +80-150 MB when model is loaded
Active TTS: +200-400 MB during audio generation
Peak usage: +500-600 MB during heavy TTS workloads

Key Optimizations Already in Place

Lazy loading - model only loads on first TTS request
Single instance - shared pipeline across requests
Async processing - non-blocking operations
Text chunking - 300-char limits for memory efficiency
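
The four optimizations above could fit together roughly as follows. This is an illustrative sketch, not the code from this change: `_load_pipeline` is a hypothetical stand-in that returns fake audio bytes instead of loading Kokoro, and the names `get_pipeline`, `chunk_text`, and `synthesize` are assumptions. It requires Python 3.9+ for `asyncio.to_thread`.

```python
import asyncio
import threading

MAX_CHUNK_CHARS = 300  # chunk limit quoted in this issue

_pipeline = None
_pipeline_lock = threading.Lock()


def _load_pipeline():
    """Hypothetical stand-in for the real (expensive) model load."""
    def fake_pipeline(text):
        return b"\x00" * len(text)  # placeholder "audio", one byte per char
    return fake_pipeline


def get_pipeline():
    """Lazy loading + single instance: load on first use, then reuse."""
    global _pipeline
    if _pipeline is None:
        with _pipeline_lock:
            if _pipeline is None:  # re-check once the lock is held
                _pipeline = _load_pipeline()
    return _pipeline


def chunk_text(text, limit=MAX_CHUNK_CHARS):
    """Split text into chunks of at most `limit` chars, breaking on whitespace."""
    chunks, current = [], ""
    for word in text.split():
        # A single word longer than the limit is hard-split.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


async def synthesize(text):
    """Async processing: run blocking work in threads so the event loop stays free."""
    pipeline = await asyncio.to_thread(get_pipeline)
    audio_parts = []
    for chunk in chunk_text(text):
        audio_parts.append(await asyncio.to_thread(pipeline, chunk))
    return b"".join(audio_parts)
```

Calling `asyncio.run(synthesize("some text"))` triggers the model load only on the first request; later calls reuse the same pipeline object. Processing one chunk at a time keeps the working set bounded by the chunk size rather than by the full input length.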

Recommendations for Further Optimization

Model quantization - reduce from 350 MB to 80 MB (about 77% savings)
Memory monitoring - track actual usage patterns
Resource limits - set container memory caps for TTS processes
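
For the memory monitoring recommendation, one possible starting point is the standard-library `tracemalloc` module. This is a sketch under stated assumptions: the helper name `track_peak_memory` is made up, and `tracemalloc` only sees allocations made through Python's allocator, so memory held by native inference libraries would be underreported. Process-level RSS would need a different tool.

```python
import tracemalloc
from contextlib import contextmanager


@contextmanager
def track_peak_memory(label, results):
    """Record the peak Python-level allocation (in MB) of the wrapped block.

    Note: stops tracing on exit, so it should not be nested inside
    other code that relies on tracemalloc.
    """
    tracemalloc.start()
    try:
        yield
    finally:
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results[label] = peak_bytes / (1024 * 1024)


# Usage: wrap a TTS call (here simulated with a plain allocation).
measurements = {}
with track_peak_memory("simulated_tts", measurements):
    buffer = bytearray(8 * 1024 * 1024)  # stands in for audio generation
print(f"peak: {measurements['simulated_tts']:.1f} MB")
```

Logging these values per request over time would show whether the 200-400 MB and 500-600 MB estimates above hold in practice, and would give a basis for choosing a container memory cap.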

The impact is modest: roughly a 4-5% increase in image size, plus memory overhead of about 80-150 MB once the model is loaded and up to 500-600 MB at peak under heavy TTS workloads. The Kokoro model is designed for edge devices, so it is already optimized for low resource usage.

Jun 29 '25 02:06 TerminallyLazy