FastChat
Integrate IPEX-LLM into serve worker
Why are these changes needed?
Description: `ipex-llm` is a library for running LLMs on Intel CPU/XPU (from laptop to GPU to cloud) with very low latency, using INT4/FP4/INT8/FP8 quantization (for any PyTorch model). This PR adds `ipex-llm` integration to FastChat.
An example of using Arc A770 for serving:
https://github.com/lm-sys/FastChat/assets/110874468/65d1b42d-5a6e-4ddd-bf8f-573e1e306a5d
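As a rough sketch of how the integration would be used, something like the following could launch a FastChat deployment backed by an IPEX-LLM worker. The exact worker module name, flag names, and low-bit format string here are assumptions for illustration, not the definitive interface added by this PR:

```shell
# Launch the FastChat controller as usual.
python3 -m fastchat.serve.controller &

# Launch a model worker accelerated by ipex-llm.
# The module path, --low-bit value, and --device flag below are
# illustrative assumptions; the actual names are defined by the PR.
python3 -m fastchat.serve.model_worker \
    --model-path lmsys/vicuna-7b-v1.5 \
    --low-bit sym_int4 \
    --device xpu &

# Serve the web UI on top of the controller.
python3 -m fastchat.serve.gradio_web_server
```

The `--low-bit` option would select the quantization format (e.g. INT4), and `--device xpu` would target an Intel GPU such as the Arc A770 used in the recording above.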
Related issue number (if applicable)
None.
Checks
- [x] I've run `format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed.
- [x] I've made sure the relevant tests are passing (if applicable).
@infwinston Hi, many thanks for maintaining this fantastic project. Any thoughts on this integration PR? :smiley: