llm-rs-python
Is streaming supported with LangChain's AsyncIteratorCallbackHandler?
I am getting no responses when using the LangChain callback AsyncIteratorCallbackHandler. It only gives this warning:
```
RuntimeWarning: coroutine 'AsyncCallbackManagerForLLMRun.on_llm_new_token' was never awaited
  run_manager.on_llm_new_token(chunk, verbose=self.verbose)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
```
I don't think the current LangChain wrapper supports async calls, but it shouldn't be too hard to add, as the model.stream() call already releases the GIL internally while generating tokens. You would, however, have to ensure that the model is never used in parallel, as that would probably cause memory-access problems, or simply crash if you have offloaded your model onto a GPU (see the sketch below).
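As a rough illustration of the locking idea, here is a minimal sketch. It assumes `model.stream(prompt)` is the blocking token generator mentioned above (the exact signature may differ); `agenerate` and `on_token` are hypothetical names, not part of llm-rs-python:

```python
import asyncio

# Serialize access to a single model instance: an asyncio.Lock guarantees
# the model is never driven by two tasks at once, while the blocking
# stream() loop runs in a worker thread so the event loop stays free.
_model_lock = asyncio.Lock()


async def agenerate(model, prompt: str, on_token) -> str:
    """Run the blocking stream() in a worker thread, one request at a time."""
    async with _model_lock:  # never let two generations overlap
        loop = asyncio.get_running_loop()
        queue: asyncio.Queue = asyncio.Queue()

        def produce():
            # Runs in a thread; stream() releases the GIL while generating.
            for token in model.stream(prompt):
                loop.call_soon_threadsafe(queue.put_nowait, token)
            loop.call_soon_threadsafe(queue.put_nowait, None)  # end sentinel

        producer = loop.run_in_executor(None, produce)
        chunks = []
        while (token := await queue.get()) is not None:
            chunks.append(token)
            await on_token(token)  # e.g. run_manager.on_llm_new_token
        await producer
        return "".join(chunks)
```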
Do you know which function needs to be implemented on LangChain's LLM class to enable async processing?
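If I remember correctly, LangChain's `LLM` base class routes its async entry points through an overridable `_acall` coroutine (the sync path goes through `_call`), and awaiting `run_manager.on_llm_new_token` there is what makes AsyncIteratorCallbackHandler receive tokens instead of raising the warning above. A rough, version-dependent sketch (check against your installed LangChain; `LlmRsAsync` is a hypothetical class name):

```python
from typing import Any, List, Optional

from langchain.callbacks.manager import (
    AsyncCallbackManagerForLLMRun,
    CallbackManagerForLLMRun,
)
from langchain.llms.base import LLM


class LlmRsAsync(LLM):
    """Hypothetical async-capable wrapper around an llm-rs model instance."""

    model: Any  # the loaded llm-rs model

    @property
    def _llm_type(self) -> str:
        return "llm-rs"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        # Sync path: the callback here is a plain method, no awaiting needed.
        chunks = []
        for token in self.model.stream(prompt):
            chunks.append(token)
            if run_manager:
                run_manager.on_llm_new_token(token)
        return "".join(chunks)

    async def _acall(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        # Awaiting the async callback avoids the "never awaited" warning.
        # For real use, the blocking stream() loop should also be moved off
        # the event loop, e.g. with the queue/executor pattern sketched above.
        chunks = []
        for token in self.model.stream(prompt):
            chunks.append(token)
            if run_manager:
                await run_manager.on_llm_new_token(token)
        return "".join(chunks)
```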