
Kokoro-tts

Open • TerminallyLazy opened this issue 7 months ago • 0 comments

  • Added Kokoro TTS support to the preload and run_ui scripts.
  • Introduced a new API endpoint for text-to-speech synthesis.
  • Updated settings to include an option to enable or disable TTS.
  • Refactored speech handling to use a centralized speech store.
  • Enhanced the UI with a new speech button and SVG icon.
  • Added the dependencies for Kokoro TTS to requirements.txt.

Docker Image Size Impact

  • Current: ~9.17 GB
  • Addition: ~400-450 MB (4-5% increase)
    • Kokoro model: ~350 MB (or 80 MB if quantized)
    • Dependencies: ~50-100 MB
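To check the arithmetic above, the following sketch recomputes the percentage increase from the quoted figures. It treats 1 GB as 1000 MB. Using 1024 gives roughly 4.3-4.8%, which is still inside the stated range. The "quantized" row only substitutes the quoted 80 MB model size. It is not a measured image size.

```python
# Size figures quoted above, in MB (1 GB treated as 1000 MB).
CURRENT_IMAGE_MB = 9170        # ~9.17 GB current image
MODEL_MB = 350                 # Kokoro model, unquantized
MODEL_MB_QUANTIZED = 80        # quoted model size if quantized
DEPS_MB = (50, 100)            # extra dependencies, low and high estimate


def increase_pct(extra_mb: float, base_mb: float = CURRENT_IMAGE_MB) -> float:
    """Growth of the image relative to its current size, in percent."""
    return 100.0 * extra_mb / base_mb


for label, model_mb in (("full", MODEL_MB), ("quantized", MODEL_MB_QUANTIZED)):
    # Total addition = model + dependencies, for the low and high estimates.
    low, high = (model_mb + deps for deps in DEPS_MB)
    print(f"{label}: +{low}-{high} MB, "
          f"{increase_pct(low):.1f}-{increase_pct(high):.1f}% larger")
```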

Memory Consumption Impact

  • Baseline: +80-150 MB once the model is loaded
  • Active TTS: +200-400 MB during audio generation
  • Peak: +500-600 MB under heavy TTS workloads

Key Optimizations Already in Place

  1. Lazy loading - the model loads only on the first TTS request
  2. Single instance - one pipeline is shared across all requests
  3. Async processing - synthesis does not block other operations
  4. Text chunking - input is split into chunks of at most 300 characters to limit memory use
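The following sketch shows one way the four optimizations above can work together. It is not the PR's code. `LazyTTS` and `chunk_text` are hypothetical names. `loader` stands in for whatever constructs the Kokoro pipeline, and the sketch assumes it returns a blocking callable that maps one text chunk to audio. The chunker packs whole words greedily up to 300 characters, which may differ from how the actual implementation splits text. The sketch assumes Python 3.10+.

```python
import asyncio
from typing import Any, Callable, List, Optional

MAX_CHUNK_CHARS = 300  # the 300-character limit described above


def chunk_text(text: str, limit: int = MAX_CHUNK_CHARS) -> List[str]:
    """Greedily pack whole words into chunks of at most `limit` characters."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # Hard-split any single word that is longer than the limit.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)  # current is non-empty on this branch
            current = word
    if current:
        chunks.append(current)
    return chunks


class LazyTTS:
    """Loads the TTS pipeline once, on first use, and shares it across requests."""

    def __init__(self, loader: Callable[[], Callable[[str], Any]]):
        self._loader = loader  # hypothetical factory for the real pipeline
        self._pipeline: Optional[Callable[[str], Any]] = None
        self._lock = asyncio.Lock()  # guards the one-time load

    async def _get_pipeline(self) -> Callable[[str], Any]:
        if self._pipeline is None:
            async with self._lock:
                if self._pipeline is None:  # re-check after acquiring the lock
                    # Loading is blocking (disk/CPU), so run it in a worker thread.
                    self._pipeline = await asyncio.to_thread(self._loader)
        return self._pipeline

    async def synthesize(self, text: str) -> List[Any]:
        pipeline = await self._get_pipeline()
        # Synthesize chunk by chunk, keeping each blocking call off the event loop.
        return [await asyncio.to_thread(pipeline, chunk) for chunk in chunk_text(text)]
```

Because the `None` check is repeated under an `asyncio.Lock`, concurrent first requests trigger only one model load. `asyncio.to_thread` keeps both the load and each inference call from blocking other requests.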

Recommendations for Further Optimization

  1. Model quantization - shrink the model from ~350 MB to ~80 MB (~77% smaller)
  2. Memory monitoring - track actual usage patterns
  3. Resource limits - set container memory caps for TTS processes
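For the memory-monitoring recommendation, the standard-library `resource` module is a lightweight place to start. It reports the process's peak resident set size. The sketch below is Unix only, and `peak_rss_mb` and `track_peak` are hypothetical helpers. It wraps a block of work, such as one TTS request, and logs how much that block raised the peak. Because the peak is a high-water mark, the delta is zero when an earlier operation already used more memory.

```python
import resource
import sys
import time
from contextlib import contextmanager
from typing import Callable, Iterator


def peak_rss_mb() -> float:
    """Peak resident set size of this process so far, in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


@contextmanager
def track_peak(label: str, log: Callable[[str], None] = print) -> Iterator[None]:
    """Log duration and growth of the peak RSS for the wrapped block."""
    before = peak_rss_mb()
    start = time.perf_counter()
    try:
        yield
    finally:
        after = peak_rss_mb()
        log(f"{label}: {time.perf_counter() - start:.2f}s, "
            f"peak RSS {after:.0f} MB (+{after - before:.0f} MB in this block)")
```

Container-level memory caps are set outside Python, for example with `docker run --memory`.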

The impact is modest: the image grows by about 5%, and memory overhead ranges from 80-150 MB with the model loaded to 500-600 MB at peak. The Kokoro model is designed for edge devices, so it is already optimized for low resource usage.

TerminallyLazy, Jun 29 '25 02:06