[Refactor & Feature] Refactor `xtuner chat` to support `lmdeploy` & `vLLM`
Motivation
- `xtuner chat` can plug into inference engines for acceleration
- Models trained with xtuner can be deployed directly
- Makes it easier to build gradio applications on top of xtuner
- Guarantees the chat template is consistent between training and deployment
- Simplifies the deployment workflow
Usage
- `xtuner chat` launch commands
```bash
# HF
python xtuner/tools/new_chat.py internlm/internlm-chat-7b
# LMDeploy (w/o adapter)
python xtuner/tools/new_chat.py internlm/internlm-chat-7b --lmdeploy
# vLLM (w/o adapter)
python xtuner/tools/new_chat.py internlm/internlm-chat-7b --vllm
# HF Moss
python xtuner/tools/new_chat.py meta-llama/Llama-2-7b-hf --adapter xtuner/Llama-2-7b-qlora-moss-003-sft --bot-name Llama2 --prompt-template moss_sft --system-prompt moss_sft --with-plugins calculate solve search
# LMDeploy Moss (w/o adapter)
python xtuner/tools/new_chat.py MOSS_MERGED --bot-name Llama2 --prompt-template moss_sft --system-prompt moss_sft --with-plugins calculate solve search --lmdeploy
# Lagent (HF only)
python xtuner/tools/new_chat.py internlm/internlm-7b --adapter xtuner/internlm-7b-qlora-msagent-react --lagent
# Llava (HF only)
python xtuner/tools/new_chat.py internlm/internlm-chat-7b \
    --visual-encoder openai/clip-vit-large-patch14-336 \
    --llava xtuner/llava-internlm-7b \
    --prompt-template internlm_chat \
    --image $IMAGE_PATH
```
- `ChatBot` usage
```python
from threading import Thread

from xtuner.chat import BaseChat, CHAT_TEMPLATE

template = CHAT_TEMPLATE['internlm2-chat']

################# HF inference #####################
from xtuner.chat import HFBot

bot = HFBot('internlm/internlm2-chat-7b')
hf_bot = BaseChat(bot, chat_template=template)

## Chat
print(hf_bot.chat('Who are you?'))

## Streaming output
streamer = hf_bot.create_streamer()
hf_bot.chat('Who are you?', streamer=streamer)

## Iterable streaming (for gradio)
streamer = hf_bot.create_streamer(iterable=True)
chat_kwargs = dict(text='Who are you?', streamer=streamer)
thread = Thread(target=hf_bot.chat, kwargs=chat_kwargs)
thread.start()
for new_text in streamer:
    print(new_text, flush=True, end='')

## Clear history
hf_bot.reset_history()

## Offline batch inference
results = hf_bot.predict(['Who are you?', 'What is your name?'])
```
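The point of the refactor is that accelerated backends plug into the same `BaseChat` front end. A minimal sketch of switching backends follows; the class names `LMDeployBot` and `VllmBot` are assumptions (only `HFBot` and `HFLlavaBot` are named in this description), so check the actual exports in `xtuner.chat`:

```python
################# Accelerated inference (sketch) #####################
## Hypothetical: `LMDeployBot` / `VllmBot` are assumed names,
## not confirmed by this PR.
from xtuner.chat import BaseChat, CHAT_TEMPLATE
from xtuner.chat import LMDeployBot, VllmBot  # assumed exports

template = CHAT_TEMPLATE['internlm2-chat']

lmdeploy_bot = BaseChat(LMDeployBot('internlm/internlm2-chat-7b'),
                        chat_template=template)
print(lmdeploy_bot.chat('Who are you?'))

vllm_bot = BaseChat(VllmBot('internlm/internlm2-chat-7b'),
                    chat_template=template)
## Offline batch inference goes through the same `predict` API
results = vllm_bot.predict(['Who are you?', 'What is your name?'])
```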
```python
################# HF Llava inference #####################
from threading import Thread

from xtuner.chat import CHAT_TEMPLATE, HFLlavaBot, LlavaChat

template = CHAT_TEMPLATE['internlm2-chat']

bot = HFLlavaBot(
    'internlm/internlm2-chat-7b',
    'xtuner/llava-internlm2-7b',
    'openai/clip-vit-large-patch14-336')
image1 = 'https://llava.hliu.cc/file=/nobackup/haotian/code/LLaVA_dev/llava/serve/examples/extreme_ironing.jpg'
image2 = 'https://llava.hliu.cc/file=/nobackup/haotian/code/LLaVA_dev/llava/serve/examples/waterview.jpg'
llava_bot = LlavaChat(bot, image1, chat_template=template)

## Chat
print(llava_bot.chat('What is unusual about this image?'))

## Streaming output
streamer = bot.create_streamer()
llava_bot.chat('What is unusual about this image?', streamer=streamer)

## Iterable streaming (for gradio)
streamer = bot.create_streamer(iterable=True)
chat_kwargs = dict(text='What is unusual about this image?', streamer=streamer)
thread = Thread(target=llava_bot.chat, kwargs=chat_kwargs)
thread.start()
for new_text in streamer:
    print(new_text, flush=True, end='')

## Clear history
llava_bot.reset_history()

## Switch the image
llava_bot.reset_image(image2)
print(llava_bot.chat('What are the things I should be cautious about when I visit here?'))
```
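Since the iterable streamer exists mainly for gradio, here is a minimal sketch of wiring it into a gradio app. Only the `BaseChat`/`HFBot` calls come from this PR; the gradio side (`gr.ChatInterface`) is ordinary gradio usage and an illustrative assumption:

```python
from threading import Thread

import gradio as gr

from xtuner.chat import BaseChat, CHAT_TEMPLATE, HFBot

bot = BaseChat(HFBot('internlm/internlm2-chat-7b'),
               chat_template=CHAT_TEMPLATE['internlm2-chat'])


def respond(message, history):
    ## Run the blocking chat call in a thread and yield partial replies
    streamer = bot.create_streamer(iterable=True)
    Thread(target=bot.chat,
           kwargs=dict(text=message, streamer=streamer)).start()
    reply = ''
    for new_text in streamer:
        reply += new_text
        yield reply


gr.ChatInterface(respond).launch()
```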
TODO
- [x] Test HF Chat
- [x] Test LMDeploy Chat
- [x] Test vLLM Chat
- [x] Test HF Predict
- [x] Test LMDeploy Predict
- [x] Test vLLM Predict
- [ ] Test HF Moss Chat
- [ ] Test LMDeploy Moss Chat (w/o adapter)
- [ ] Test HF Lagent Chat
- [x] Test HF Llava Chat
New Args
- `repetition-penalty`
- `lmdeploy` (LMDeploy)
- `dynamic-ntk` (LMDeploy)
- `logn-attn` (LMDeploy)
- `rope_scaling_factor` (LMDeploy)
- `batch-size` (LMDeploy)
- `predict`: path of the file whose prompts need to be predicted offline (see the sketch after this list)
- `predict-repeat`
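For reference, the offline prediction path maps onto the `predict` API shown above. A minimal sketch, assuming the prediction file holds one prompt per line (the actual file format is not specified here):

```python
## Hypothetical illustration of the flow behind `predict` /
## `predict-repeat`; the one-prompt-per-line format is an assumption.
from xtuner.chat import BaseChat, CHAT_TEMPLATE, HFBot

bot = BaseChat(HFBot('internlm/internlm2-chat-7b'),
               chat_template=CHAT_TEMPLATE['internlm2-chat'])

with open('questions.txt') as f:
    prompts = [line.strip() for line in f if line.strip()]

results = bot.predict(prompts)  # offline batch inference
for prompt, result in zip(prompts, results):
    print(prompt, '->', result)
```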
BC-Breaking Changes
- Remove `torch-dtype`
- Remove `offload-folder`
- Remove `no-streamer` (only streaming output is supported)
@pppppM Is this ready to use now?