
Vicuna Inference demo code

Open CSerxy opened this issue 2 years ago • 1 comments

Hi Vicuna authors,

I appreciate your excellent work in making it public. I am curious if you could provide a demo code for using vicuna to do the inference in a non-interactive way. Many thanks in advance!

Best

CSerxy avatar May 26 '23 23:05 CSerxy

Hi, I'm not one of the authors, but in case you need a minimal example of how to use vicuna in a Python script, you can have a look at the description of issue #1666. I added one there.

Hope that helps!

Klettner avatar Jun 12 '23 13:06 Klettner

@Klettner thank you for your advice! But I wonder if it is possible to run inference with vicuna locally, instead of relying on OpenAI's servers? Many thanks! :-)

Zsbyqx20 avatar Jul 25 '23 02:07 Zsbyqx20

Yes, it is possible to do inference with vicuna locally if you have a good enough GPU. The code I provided uses an instance of vicuna running on a local server; the openai library is just the interface used for communication. The line `openai.api_base = "http://localhost:8000/v1"` sets the URL used for communication, so the requests go to localhost (i.e. the instance of vicuna running on your local computer/server) instead of to OpenAI.
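For reference, here is a minimal stdlib-only sketch of the same idea (the example in #1666 uses the openai client instead). It assumes a FastChat OpenAI-compatible API server is already running at `localhost:8000` (e.g. started with `python3 -m fastchat.serve.openai_api_server`) and that a model named `vicuna-7b-v1.3` is being served — both the port and the model name are assumptions you should adapt to your setup.

```python
import json
import urllib.request

# Local FastChat server exposing the OpenAI-compatible API -- NOT api.openai.com.
API_BASE = "http://localhost:8000/v1"

def build_request(prompt, model="vicuna-7b-v1.3"):
    """Build an OpenAI-style chat completion request aimed at the local server."""
    payload = {
        "model": model,  # assumed model name; use whatever your worker serves
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def ask_vicuna(prompt):
    """Send one non-interactive prompt to the local vicuna instance."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Since everything goes over plain HTTP, no OpenAI account or API key is involved; the same payload shape works with the openai client if you prefer it.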

Klettner avatar Jul 25 '23 18:07 Klettner

@Klettner Many thanks! I didn't really understand the API part clearly until you showed me this. I also followed this document and got it working. Thank you! 😄

Zsbyqx20 avatar Jul 26 '23 02:07 Zsbyqx20