
Support for huggingface/text-generation-inference

Open sam-h-bean opened this issue 2 years ago • 7 comments

This library from HF is pretty great and I get use out of it in production settings for LLMs. Would love to figure out how to integrate a system like this for LLM safety with it so I can use HF models, get dynamic batching, and be able to stream tokens with the guidance library!

sam-h-bean avatar May 18 '23 14:05 sam-h-bean

Are you using the client to connect to a running text-generation-inference server? You would probably create your own subclass of guidance.llms.LLM. If the text-generation-inference server is OpenAI compatible (I don't think it is...) then you could try the OpenAI client.
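Something like the sketch below is what I have in mind. It's untested, and the exact interface a guidance.llms.LLM subclass must implement may differ between versions, so treat the `__call__` method here as illustrative; the /generate endpoint and its inputs/parameters payload are TGI's standard REST API.

```python
import requests
import guidance

# Hypothetical sketch: forward guidance completion calls to a running
# text-generation-inference server. Which method(s) the base class actually
# requires may differ; __call__ is used here purely for illustration.
class TGI(guidance.llms.LLM):
    def __init__(self, base_url="http://localhost:8080"):
        super().__init__()
        self.base_url = base_url

    def __call__(self, prompt, max_tokens=64, temperature=0.7, stop=None, **kwargs):
        # TGI's POST /generate takes an "inputs" string plus a "parameters"
        # object and returns {"generated_text": ...}
        resp = requests.post(
            f"{self.base_url}/generate",
            json={
                "inputs": prompt,
                "parameters": {
                    "max_new_tokens": max_tokens,
                    "temperature": temperature,
                    "stop": stop or [],
                },
            },
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()["generated_text"]
```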

I'll take a look at it, as I'm testing out the guidance library and huggingface/text-generation-inference looks compelling. No promises, not a Microsoft employee, et cetera.

sheenobu avatar May 18 '23 15:05 sheenobu

@sheenobu let us know what you find out!

There are two ways to support this. The first is just to create an LLM backend like OpenAI's; that's the first step (which I think you are looking into).

Second, I was planning to work on the remote inference story for guidance here soon; it is still in flux a bit. But some key aspects will be:

  • One of the goals of guidance is to be able to send a whole program to a remote server for high-speed inference/control (lots of fine-grained template control with no REST overhead, etc.).
  • When user-level functions are called, we pause the program execution (like await does) and send it back to the client, which can eval the user function until it gets to a command that uses the LLM, then sends it back to the server for more eval (see the sketch after this list).
  • All this should happen while allowing seamless streaming of results to the client.
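Roughly, the flow would look like the pseudocode below. None of these names exist in guidance yet, and the design is still in flux, so this only illustrates the round-trip:

```python
# Pure pseudocode for the pause/resume flow described above; server, client,
# and all attribute names are hypothetical.
def run_remote(program, server, client):
    state = server.start(program)                 # ship the whole program over
    while not state.done:
        for chunk in state.stream():              # results stream back continuously
            client.emit(chunk)
        if state.paused_on_user_function:         # server paused at a user function,
            result = client.eval(state.fn, state.args)   # so evaluate it client-side
            state = server.resume(state.id, result)      # then resume server-side eval
    return state.output
```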

I just share the above for context; it is not fully implemented yet :)

slundberg avatar May 18 '23 15:05 slundberg

This answers my question (issue #48) as well? Any ideas on the timeline?

bdambrosio avatar May 19 '23 03:05 bdambrosio

Looks like much of OpenAI's guidance.llms.LLM implementation applies to text-generation-inference, since they both support standard REST calls. I'm surprised the OpenAI one isn't using aiohttp instead of requests, considering it's in an asyncio context anyway, but I'm open to being told I'm missing something.

This was a very messy version I got working: https://gist.github.com/sheenobu/9bdd03609e2b1125a3cfd7e5cbd046fc. If you are desperate, you could probably extend guidance.llms.OpenAI and override the critical methods.
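From memory, that shortcut looks roughly like the sketch below. It's untested, and which method actually performs the HTTP request varies by guidance version, so `_rest_call` (and the base-class constructor arguments) are placeholders rather than the real interface:

```python
import aiohttp
import guidance

# Hedged sketch: subclass guidance.llms.OpenAI and reroute its HTTP call to a
# TGI server, using aiohttp since we're in an asyncio context anyway.
# "_rest_call" stands in for whatever method does the request in your version.
class TGIViaOpenAI(guidance.llms.OpenAI):
    def __init__(self, base_url="http://localhost:8080", **kwargs):
        super().__init__(**kwargs)  # constructor args are illustrative
        self.base_url = base_url

    async def _rest_call(self, **kwargs):
        async with aiohttp.ClientSession() as session:
            async with session.post(
                f"{self.base_url}/generate",
                json={
                    "inputs": kwargs.get("prompt", ""),
                    "parameters": {"max_new_tokens": kwargs.get("max_tokens", 64)},
                },
            ) as resp:
                data = await resp.json()
        # reshape TGI's response into the OpenAI-style completion dict
        # the rest of the class expects
        return {"choices": [{"text": data["generated_text"]}]}
```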

Like I said elsewhere, I'll have to drop this work for now. Thanks.

sheenobu avatar May 20 '23 00:05 sheenobu

I can take a swing at implementing the rest

sam-h-bean avatar May 27 '23 21:05 sam-h-bean

That would be GREAT, I haven't had much luck. I do have an llms-compatible server with access between encode and generate, and streaming access between generate and decode, if we need anything server-side to get full guidance capability...
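To show what I mean by those hook points (all names here are made up, just sketching the shape of the loop):

```python
# Hypothetical server loop: access between encode and generate, and streaming
# access between generate and decode, where guidance could steer the output.
def serve(prompt, model, tokenizer, hooks):
    ids = tokenizer.encode(prompt)
    ids = hooks.before_generate(ids)             # between encode and generate
    for token_id in model.generate_stream(ids):  # streaming token loop
        token_id = hooks.on_token(token_id)      # between generate and decode
        yield tokenizer.decode([token_id])
```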

bdambrosio avatar May 27 '23 21:05 bdambrosio

+1 to this being a feature that would be useful! It's not critical for us yet, but we could give it a try if @sam-h-bean doesn't finish up.

andreykurenkov avatar Jun 14 '23 00:06 andreykurenkov

+1 to this feature request.

HarshTrivedi avatar Jun 21 '23 03:06 HarshTrivedi

+1 to this

faizanahemad avatar Jul 14 '23 06:07 faizanahemad

+1

zacharyblank avatar Jul 17 '23 16:07 zacharyblank

+2

nkey0 avatar Jul 20 '23 14:07 nkey0

> This was a very messy version I got working: https://gist.github.com/sheenobu/9bdd03609e2b1125a3cfd7e5cbd046fc. If you are desperate, you could probably extend guidance.llms.OpenAI and override the critical methods.

It seems the provided Gist link is no longer valid. Could you kindly re-upload the code or provide an updated link? Thank you!

slchenchn avatar Sep 12 '23 09:09 slchenchn

+1

julienripoche avatar Sep 25 '23 12:09 julienripoche

Actually, is it really possible to extend full-fledged guidance to Text Generation Inference? For example, what could we do about additional logits processors such as TokenHealingLogitsProcessor?
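For context, token healing biases the logits at each generation step; in transformers terms that's a LogitsProcessor, which runs inside the model's generation loop, something a plain text-in/text-out REST call can't reach. The sketch below shows the general shape of that interface (it is not guidance's actual TokenHealingLogitsProcessor):

```python
import torch
from transformers import LogitsProcessor

# Illustration of why logits-level control has to live server-side: a
# processor like this sees the full logits tensor at every step. This is NOT
# guidance's TokenHealingLogitsProcessor, just the shape of the interface.
class BiasAllowedTokens(LogitsProcessor):
    def __init__(self, allowed_token_ids, bias=10.0):
        self.allowed = torch.tensor(allowed_token_ids)
        self.bias = bias

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor):
        scores[:, self.allowed] += self.bias  # nudge sampling toward allowed tokens
        return scores
```

Unless TGI exposes a way to install processors like this server-side, token healing can't ride on its text-in/text-out REST API alone.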

julienripoche avatar Sep 25 '23 13:09 julienripoche

@sam-h-bean have you managed to put a PR together for this?

marioplumbarius avatar Oct 26 '23 11:10 marioplumbarius