[Refactor & Feature] Refactor `xtuner chat` to support `lmdeploy` & `vLLM`
Motivation
- `xtuner chat` can plug into inference engines for acceleration
- Models trained with xtuner can be deployed directly
- Makes it easier to build gradio applications on top of xtuner
- Guarantees the chat template is consistent between training and deployment
- Simplifies the deployment workflow
Usage
- `xtuner chat` launch commands
```bash
# HF
python xtuner/tools/new_chat.py internlm/internlm-chat-7b
# LMDeploy (w/o adapter)
python xtuner/tools/new_chat.py internlm/internlm-chat-7b --lmdeploy
# vLLM (w/o adapter)
python xtuner/tools/new_chat.py internlm/internlm-chat-7b --vllm
# HF Moss
python xtuner/tools/new_chat.py meta-llama/Llama-2-7b-hf --adapter xtuner/Llama-2-7b-qlora-moss-003-sft --bot-name Llama2 --prompt-template moss_sft --system-prompt moss_sft --with-plugins calculate solve search
# LMDeploy Moss (w/o adapter)
python xtuner/tools/new_chat.py MOSS_MERGED --bot-name Llama2 --prompt-template moss_sft --system-prompt moss_sft --with-plugins calculate solve search --lmdeploy
# Lagent (HF only)
python xtuner/tools/new_chat.py internlm/internlm-7b --adapter xtuner/internlm-7b-qlora-msagent-react --lagent
# Llava (HF only)
python xtuner/tools/new_chat.py internlm/internlm-chat-7b \
    --visual-encoder openai/clip-vit-large-patch14-336 \
    --llava xtuner/llava-internlm-7b \
    --prompt-template internlm_chat \
    --image $IMAGE_PATH
```
- `ChatBot` usage
```python
from threading import Thread

from xtuner.chat import BaseChat, CHAT_TEMPLATE

template = CHAT_TEMPLATE['internlm2-chat']

################# HF inference #####################
from xtuner.chat import HFBot

bot = HFBot('internlm/internlm2-chat-7b')
hf_bot = BaseChat(bot, chat_template=template)

## Chat
print(hf_bot.chat('Who are you?'))

## Streaming output
streamer = hf_bot.create_streamer()
hf_bot.chat('Who are you?', streamer=streamer)

## Iterable streaming (for gradio)
streamer = hf_bot.create_streamer(iterable=True)
chat_kwargs = dict(text='Who are you?', streamer=streamer)
thread = Thread(target=hf_bot.chat, kwargs=chat_kwargs)
thread.start()
for new_text in streamer:
    print(new_text, flush=True, end='')

## Clear history
hf_bot.reset_history()

## Offline batch inference
results = hf_bot.predict(['Who are you?', 'What is your name?'])
```
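The point of the refactor is that accelerated backends plug into the same `BaseChat` front end. A minimal sketch of switching backends follows; the class names `LMDeployBot` and `VllmBot` are assumptions (only `HFBot` and `HFLlavaBot` are named in this description), so check the actual exports in `xtuner.chat`:

```python
################# Accelerated inference (sketch) #####################
## Hypothetical: `LMDeployBot` / `VllmBot` are assumed names,
## not confirmed by this PR.
from xtuner.chat import BaseChat, CHAT_TEMPLATE
from xtuner.chat import LMDeployBot, VllmBot  # assumed exports

template = CHAT_TEMPLATE['internlm2-chat']

lmdeploy_bot = BaseChat(LMDeployBot('internlm/internlm2-chat-7b'),
                        chat_template=template)
print(lmdeploy_bot.chat('Who are you?'))

vllm_bot = BaseChat(VllmBot('internlm/internlm2-chat-7b'),
                    chat_template=template)
## Offline batch inference goes through the same `predict` API
results = vllm_bot.predict(['Who are you?', 'What is your name?'])
```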
```python
################# HF Llava inference #####################
from threading import Thread

from xtuner.chat import CHAT_TEMPLATE, HFLlavaBot, LlavaChat

template = CHAT_TEMPLATE['internlm2-chat']

bot = HFLlavaBot(
    'internlm/internlm2-chat-7b',
    'xtuner/llava-internlm2-7b',
    'openai/clip-vit-large-patch14-336')
image1 = 'https://llava.hliu.cc/file=/nobackup/haotian/code/LLaVA_dev/llava/serve/examples/extreme_ironing.jpg'
image2 = 'https://llava.hliu.cc/file=/nobackup/haotian/code/LLaVA_dev/llava/serve/examples/waterview.jpg'
llava_bot = LlavaChat(bot, image1, chat_template=template)

## Chat
print(llava_bot.chat('What is unusual about this image?'))

## Streaming output
streamer = bot.create_streamer()
llava_bot.chat('What is unusual about this image?', streamer=streamer)

## Iterable streaming (for gradio)
streamer = bot.create_streamer(iterable=True)
chat_kwargs = dict(text='What is unusual about this image?', streamer=streamer)
thread = Thread(target=llava_bot.chat, kwargs=chat_kwargs)
thread.start()
for new_text in streamer:
    print(new_text, flush=True, end='')

## Clear history
llava_bot.reset_history()

## Switch the image
llava_bot.reset_image(image2)
print(llava_bot.chat('What are the things I should be cautious about when I visit here?'))
```
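Since the iterable streamer exists mainly for gradio, here is a minimal sketch of wiring it into a gradio app. Only the `BaseChat`/`HFBot` calls come from this PR; the gradio side (`gr.ChatInterface`) is ordinary gradio usage and an illustrative assumption:

```python
from threading import Thread

import gradio as gr

from xtuner.chat import BaseChat, CHAT_TEMPLATE, HFBot

bot = BaseChat(HFBot('internlm/internlm2-chat-7b'),
               chat_template=CHAT_TEMPLATE['internlm2-chat'])


def respond(message, history):
    ## Run the blocking chat call in a thread and yield partial replies
    streamer = bot.create_streamer(iterable=True)
    Thread(target=bot.chat,
           kwargs=dict(text=message, streamer=streamer)).start()
    reply = ''
    for new_text in streamer:
        reply += new_text
        yield reply


gr.ChatInterface(respond).launch()
```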
TODO
- [x] Test HF Chat
- [x] Test LMDeploy Chat
- [x] Test vLLM Chat
- [x] Test HF Predict
- [x] Test LMDeploy Predict
- [x] Test vLLM Predict
- [ ] Test HF Moss Chat
- [ ] Test LMDeploy Moss Chat (w/o adapter)
- [ ] Test HF Lagent Chat
- [x] Test HF Llava Chat
New Args
- `repetition-penalty`
- `lmdeploy` (LMDeploy)
- `dynamic-ntk` (LMDeploy)
- `logn-attn` (LMDeploy)
- `rope_scaling_factor` (LMDeploy)
- `batch-size` (LMDeploy)
- `predict`: path of the file whose prompts need to be predicted offline (see the sketch after this list)
- `predict-repeat`
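For reference, the offline prediction path maps onto the `predict` API shown above. A minimal sketch, assuming the prediction file holds one prompt per line (the actual file format is not specified here):

```python
## Hypothetical illustration of the flow behind `predict` /
## `predict-repeat`; the one-prompt-per-line format is an assumption.
from xtuner.chat import BaseChat, CHAT_TEMPLATE, HFBot

bot = BaseChat(HFBot('internlm/internlm2-chat-7b'),
               chat_template=CHAT_TEMPLATE['internlm2-chat'])

with open('questions.txt') as f:
    prompts = [line.strip() for line in f if line.strip()]

results = bot.predict(prompts)  # offline batch inference
for prompt, result in zip(prompts, results):
    print(prompt, '->', result)
```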
BC-Breaking Changes
- Remove `torch-dtype`
- Remove `offload-folder`
- Remove `no-streamer` (only streaming output is supported)
@pppppM Is this ready to use now?