Support for Async
First off thank you for this project! It has been very useful to us.
I want to share that I'm a bit confused about async_mode. I use OpenAI, so I assumed it meant support for the acreate completion method. I ran a timing benchmark comparing raw acreate calls against guidance's async mode, and the performance difference is stark.
Here's the benchmark.
RAW openai:
```python
@timer_func
async def run_openai():
    tasks = [
        openai.ChatCompletion.acreate(
            model="gpt-3.5-turbo",
            temperature=0,
            messages=[{"role": "user", "content": "The best thing about the ocean is"}],
        )
        for i in range(10)
    ]
    chat_completion = await asyncio.gather(*tasks)
    return [chat.choices[0].message.content for chat in chat_completion]
```
Results:
run_openai took 15.321205854415894 seconds to complete its execution.
Guidance:
```python
@timer_func
async def run_program():
    tasks = [program() for i in range(10)]
    res = await asyncio.gather(*tasks)
    return [r['results'] for r in res]
```
Results:
run_program took 132.3437740802765 seconds to complete its execution.
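For reference, timer_func isn't shown above; both benchmarks assume something like the simple async-aware timing decorator below (my best guess at it, not the exact code used):

```python
import functools
import time

def timer_func(func):
    # Minimal timing decorator for async functions, assumed to match the one used above.
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        start = time.time()
        result = await func(*args, **kwargs)
        print(f"{func.__name__} took {time.time() - start} seconds to complete its execution.")
        return result
    return wrapper
```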
Performance is about the same for a single request.
A few questions:
- What is async_mode's use case?
- Is there a way to disable the "display" functions? I see update_display being called each time during async mode. Maybe I'm confused.
Thanks! I'll look into it when I get a chance; we have not done much performance optimization yet. It might be that your raw version runs in parallel while guidance runs sequentially, and that might be because we should be using acreate in the OpenAI LLM object.
If you or others get a chance to dig into this and learn more, post here (or just open a PR :) ).
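To spell out the parallel-vs-sequential distinction that reply is pointing at (a sketch only; program stands in for a guidance program, not the library's actual code path):

```python
import asyncio

async def run_sequentially(program, n=10):
    # Each call waits for the previous one: total time is roughly n * single-call latency.
    return [await program() for _ in range(n)]

async def run_concurrently(program, n=10):
    # All calls are started together: total time is close to a single call's latency,
    # but only if nothing inside program() blocks the event loop (e.g. a synchronous create).
    return await asyncio.gather(*[program() for _ in range(n)])
```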
Do you have any update on this? Has it been resolved?
These are the only calls to openai that the openai guidance module makes:
```python
if self.chat_mode:
    kwargs['messages'] = prompt_to_messages(kwargs['prompt'])
    del kwargs['prompt']
    del kwargs['echo']
    del kwargs['logprobs']
    # print(kwargs)
    out = await openai.ChatCompletion.acreate(**kwargs)
    out = add_text_to_chat_mode(out)
else:
    out = await openai.Completion.acreate(**kwargs)
```
So I am assuming that, contrary to the original post, this has now been implemented and is working?
I just became interested in this project, and async is a must. This issue should be relatively easy to trace, right? acreate should be used for every call, unless I'm missing something. I'll try it myself.
I find it hard to believe that something called update_display is blocking the event loop. Let's take a look.
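For what it's worth, any synchronous work inside the async path (a blocking HTTP request, or heavy display rendering) will serialize the whole gather. A toy example of the effect, with time.sleep standing in for the blocking work:

```python
import asyncio
import time

async def blocking_task():
    time.sleep(1)           # synchronous call: holds the event loop for the full second
    await asyncio.sleep(0)

async def nonblocking_task():
    await asyncio.sleep(1)  # yields to the loop, so the tasks can overlap

async def main():
    start = time.time()
    await asyncio.gather(*[blocking_task() for _ in range(10)])
    print(f"blocking: {time.time() - start:.1f}s")      # ~10s, runs one after another

    start = time.time()
    await asyncio.gather(*[nonblocking_task() for _ in range(10)])
    print(f"non-blocking: {time.time() - start:.1f}s")  # ~1s, runs concurrently

asyncio.run(main())
```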
Any news on this?
Does async work on the current version?
Guys, I want to tell you something.
asyncio + LLM libraries is the devil. Should it be? Of course not. We all want fast event loops, but how many repos have you had to fork just to fix their asyncio handling? I've lost count. This is especially common in an emerging 'field' where there are no solid libraries for anything.
I don't see anything about async in the docs for the new version. Personally, I'm just going to take it as a W and hand the calls off to run_in_executor.
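For anyone going that route, a rough sketch of the workaround (program here stands in for a synchronous guidance call; the names are illustrative):

```python
import asyncio

async def run_in_thread(program, n=10):
    # Offload the blocking/synchronous calls to the default thread pool so they can
    # overlap, without needing native async support from the library.
    loop = asyncio.get_running_loop()
    tasks = [loop.run_in_executor(None, program) for _ in range(n)]
    return await asyncio.gather(*tasks)
```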
Or, you know, we could contribute a fix. We just need the developers to confirm or deny the current implementation and approve a work plan/strategy.