
(WIP) OpenAPI client gen

Stonelinks opened this issue 2 years ago · 3 comments

Putting this up for feedback: this is the autogenerated OpenAPI client for the server.

There are a lot of commits in here because this branch is based on:

  • My "developer setup" branch: https://github.com/abetlen/llama-cpp-python/pull/135
  • The "better server fields / parameters" branch: https://github.com/abetlen/llama-cpp-python/pull/130

So the changes will be easier to review if / when those make it into master.

But the last 5 or 6 commits are what this PR is ultimately for: infrastructure to autogenerate a client from the OpenAPI specification.

At a high level, here's what these commits do:

  • Implement a script that downloads the openapi.json from the server to a local folder (a sketch of this step follows the list)
  • Implement another script that uses the downloaded spec to generate a client via autorest
  • Check in the new client's dependencies, as well as the generated client itself
  • Some runnable examples (that work!)
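
For concreteness, here's a minimal sketch of what the download step could look like. This is not the exact script in the branch: the host/port, output path, and function name are my own assumptions, though /openapi.json is where FastAPI serves the spec by default.

```python
# Hypothetical sketch of the spec-download step, not the script in this
# branch. Assumes a locally running server that exposes its spec at the
# FastAPI-default /openapi.json path.
import json
import urllib.request

SERVER_URL = "http://localhost:8000"  # assumed default host/port
OUT_PATH = "openapi.json"             # assumed local output location


def download_spec(server_url: str = SERVER_URL, out_path: str = OUT_PATH) -> None:
    """Fetch the server's OpenAPI spec and write it to a local file."""
    with urllib.request.urlopen(f"{server_url}/openapi.json") as resp:
        spec = json.load(resp)
    with open(out_path, "w") as f:
        json.dump(spec, f, indent=2)


if __name__ == "__main__":
    download_spec()
```

The second script would then point autorest (running in Docker) at this file to emit the client.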

Missing:

  • Tests :)

Maybe the most controversial part of this is the decision to go with autorest when there are other codegen solutions out there for OpenAPI clients. I wrote a long commit message about this (which I'll copy-paste here), but here are my thoughts on the matter:

It's got some things going for it:

  • It works!
  • It's tested and documented
  • It gives decent configuration options for the generator
  • It allows for decent customization by users and consumers of the client
  • It can (eventually) support streaming responses from the server
  • The only dependency is Docker
  • It's actually supported by a large organization and a large number of developers
  • It could be trivially extended to generate clients for multiple languages besides Python

The downsides:

  • It's still generating a lot of code that isn't very idiomatic Python (but it's still reasonable to follow)
  • It depends on some Microsoft / Azure libraries for basic things... they're probably fine, but it's still a bit weird to be using msrest and azure-core instead of, like, pydantic and requests
  • The Docker container is slow to start, since it always downloads the latest version of autorest

Stonelinks · May 03 '23 00:05

https://github.com/Stonelinks/llama-cpp-python/issues/4

Stonelinks · May 03 '23 00:05

Fixed up the last commit to include examples that:

  • Synchronously use the OpenAPI client to make requests
  • Asynchronously use the OpenAPI client to make requests
  • Make a streaming request (just uses the autogenerated models for now; a rough sketch follows below)
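
Since the streaming example only uses the autogenerated models, its core is plain HTTP. Here's a rough sketch of the shape it takes; the endpoint path, payload fields, and SSE "data:" framing are assumptions based on the server's OpenAI-compatible API, and requests stands in for whatever HTTP layer the real example uses.

```python
# Rough sketch of a streaming request, not the exact example in this PR.
# Assumes the server exposes an OpenAI-compatible /v1/completions endpoint
# that streams server-sent-event chunks prefixed with "data: ".
import json

import requests

SERVER_URL = "http://localhost:8000"  # assumed default host/port


def stream_completion(prompt: str) -> None:
    """POST a streaming completion request and print tokens as they arrive."""
    payload = {"prompt": prompt, "max_tokens": 32, "stream": True}
    with requests.post(
        f"{SERVER_URL}/v1/completions", json=payload, stream=True
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line.startswith(b"data: "):
                continue  # skip blank keep-alive lines between events
            data = line[len(b"data: "):]
            if data == b"[DONE]":
                break
            chunk = json.loads(data)  # could be parsed into a generated model
            print(chunk["choices"][0]["text"], end="", flush=True)


if __name__ == "__main__":
    stream_completion("Hello")
```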

Stonelinks · May 03 '23 01:05

  • It's still generating a lot of code that isn't very idiomatic Python (but it's still reasonable to follow)

@Stonelinks I found this too, so I co-founded a company to generate idiomatic client code. Check out Fern as an easy way to pass in your OpenAPI spec and get a Python client that supports sync and async usage out of the box. Read more about the Python generator.

dannysheridan · May 31 '23 00:05