FastChat
Integrate IPEX-LLM into serve worker
Why are these changes needed?
Description: `ipex-llm` is a library for running LLMs on Intel CPU/XPU (from laptop to GPU to cloud) with very low latency, using INT4/FP4/INT8/FP8 quantization (for any PyTorch model). This PR adds `ipex-llm` integration to FastChat.
An example of using Arc A770 for serving:
https://github.com/lm-sys/FastChat/assets/110874468/65d1b42d-5a6e-4ddd-bf8f-573e1e306a5d
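As a rough sketch of how the integration would be used, something like the following could launch a FastChat deployment backed by an IPEX-LLM worker. The exact worker module name, flag names, and low-bit format string here are assumptions for illustration, not the definitive interface added by this PR:

```shell
# Launch the FastChat controller as usual.
python3 -m fastchat.serve.controller &

# Launch a model worker accelerated by ipex-llm.
# The module path, --low-bit value, and --device flag below are
# illustrative assumptions; the actual names are defined by the PR.
python3 -m fastchat.serve.model_worker \
    --model-path lmsys/vicuna-7b-v1.5 \
    --low-bit sym_int4 \
    --device xpu &

# Serve the web UI on top of the controller.
python3 -m fastchat.serve.gradio_web_server
```

The `--low-bit` option would select the quantization format (e.g. INT4), and `--device xpu` would target an Intel GPU such as the Arc A770 used in the recording above.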
Related issue number (if applicable)
None.
Checks
- [x] I've run `format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed.
- [x] I've made sure the relevant tests are passing (if applicable).
@infwinston Hi, many thanks for maintaining this fantastic project. Any thoughts on this integration PR? :smiley: