devika
Inference shouldn't have a timeout
Especially with larger local models on Ollama, inference can take quite some time, particularly while the model is first being loaded. Currently devika throws a timeout, basically rendering it useless for such a setup.
Seems like the timeout is hardcoded in src/llm/llm.py:
if int(elapsed_time) == 30:
    # warn at ~30 seconds
    emit_agent("inference", {"type": "warning", "message": "Inference is taking longer than expected"})
if elapsed_time > 60:
    # hard cutoff after 60 seconds
    raise concurrent.futures.TimeoutError
time.sleep(1)

response = future.result(timeout=60).strip()  # plus another 60s limit here
As a quick hack to make it work you can increase those values, or you can just comment out the first four lines (the two if checks) and drop the timeout on the last one, as shown below.
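That is, roughly this (just the edited snippet from above, nothing else in the file needs to change):

# if int(elapsed_time) == 30:
#     emit_agent("inference", {"type": "warning", "message": "Inference is taking longer than expected"})
# if elapsed_time > 60:
#     raise concurrent.futures.TimeoutError
time.sleep(1)

response = future.result().strip()  # no timeout argument: blocks until the model answers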
Anyway, I agree: it shouldn't have a timeout, or at least it should be easily configurable so it can be increased or disabled when needed.
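For instance, something along these lines could work. Note this is just a sketch: the DEVIKA_INFERENCE_TIMEOUT environment variable is made up here, and the surrounding while loop / start_time are assumed from the rest of llm.py:

import os

# Hypothetical setting: 0 disables the cutoff entirely, otherwise it's the limit in seconds
INFERENCE_TIMEOUT = float(os.environ.get("DEVIKA_INFERENCE_TIMEOUT", "60"))

while not future.done():
    elapsed_time = time.time() - start_time
    if int(elapsed_time) == 30:
        emit_agent("inference", {"type": "warning", "message": "Inference is taking longer than expected"})
    if INFERENCE_TIMEOUT > 0 and elapsed_time > INFERENCE_TIMEOUT:
        raise concurrent.futures.TimeoutError
    time.sleep(1)

# timeout=None means "wait forever" when the cutoff is disabled
response = future.result(timeout=INFERENCE_TIMEOUT or None).strip()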