
In the web-UI demo, inference only supports plain language-model chat; there is no multimodal inference

Open a2382625920 opened this issue 1 year ago • 5 comments

For example: I have fine-tuned Yi-VL-6B and want to chat with it through the web-UI inference page, but that page has no dedicated image-input option, only plain chat. I hope this can be improved.

a2382625920 commented on Feb 27 '24 06:02

For example: I have fine-tuned Yi-VL-6B and want to chat with it through the web-UI inference page, but that page has no dedicated image-input option, only plain chat. I hope this can be improved.

Correction: the real problem is that yi-vl-6B cannot chat at all without image input, so web-UI inference only works for pure language models.

a2382625920 commented on Feb 27 '24 06:02

Yes, the web UI does not support inference with multimodal models for now. You can use swift infer to run inference from the command line.

Jintao-Huang commented on Feb 27 '24 07:02
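For reference, command-line inference of a fine-tuned checkpoint with swift infer looks roughly like the sketch below. The checkpoint path is a placeholder, and the exact flag names are an assumption based on the ms-swift 1.x documentation; they may differ between versions:

```shell
# Hypothetical invocation (flag names may change between ms-swift versions):
# --ckpt_dir must point to your fine-tuned output directory,
# --eval_human true starts an interactive prompt loop in the terminal.
swift infer \
    --ckpt_dir output/yi-vl-6b-chat/vx-xxx/checkpoint-xxx \
    --eval_human true
```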

Yes, the web UI does not support inference with multimodal models for now. You can use swift infer to run inference from the command line.

OK, thanks. Then would it be possible to loop over a JSON file, extract the key fields, and run inference on every entry in the file?

a2382625920 commented on Feb 28 '24 07:02
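The loop being asked about can be sketched independently of any model, which makes it easy to test without a GPU. In this minimal sketch, `extract_records`, `run_batch`, the stub predictor, and the record layout (`id` plus a two-turn `conversations` list) are illustrative assumptions, not ms-swift API:

```python
def extract_records(valid_data):
    """Yield (id, query, ground truth) from each record; assumes the common
    [{'id': ..., 'conversations': [{'value': query}, {'value': answer}]}] layout."""
    for item in valid_data:
        yield (item.get("id"),
               item["conversations"][0]["value"],
               item["conversations"][1]["value"])


def run_batch(valid_data, predict):
    """Run `predict` (any callable: query -> response) over every record."""
    results = []
    for image_id, query, truth in extract_records(valid_data):
        results.append({
            "image_id": image_id,
            "query": query,
            "ground_truth": truth,
            "prediction": predict(query),
        })
    return results


# Example with a stub predictor in place of a real model:
sample = [{"id": "img_001",
           "conversations": [{"value": "Describe the image."},
                             {"value": "A cat."}]}]
out = run_batch(sample, predict=lambda q: "stub answer")
print(out[0]["prediction"])
```

In real use, `predict` would wrap the model call (e.g. `model.chat(...)`), and `results` would be dumped to JSON at the end.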

Something like this (this is Qwen-VL's batch-prediction code):

```python
import json
import time

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

DEFAULT_CKPT_PATH = 'your/path'  # path to the fine-tuned output directory


def _load_model_tokenizer():
    tokenizer = AutoTokenizer.from_pretrained(
        DEFAULT_CKPT_PATH,
        trust_remote_code=True,
        resume_download=True,
    )
    model = AutoPeftModelForCausalLM.from_pretrained(
        DEFAULT_CKPT_PATH,  # path to the output directory
        device_map="cuda",
        trust_remote_code=True,
    ).eval()
    # model.generation_config = GenerationConfig.from_pretrained(
    #     DEFAULT_CKPT_PATH, trust_remote_code=True, resume_download=True,
    # )
    return model, tokenizer


def _parse_text(text):
    # HTML-escape the model output for display (from Qwen's web demo).
    lines = text.split("\n")
    lines = [line for line in lines if line != ""]
    count = 0
    for i, line in enumerate(lines):
        if "```" in line:
            count += 1
            items = line.split("`")
            if count % 2 == 1:
                lines[i] = f'<pre><code class="language-{items[-1]}">'
            else:
                lines[i] = "<br></code></pre>"
        else:
            if i > 0:
                if count % 2 == 1:
                    line = line.replace("`", r"\`")
                    line = line.replace("<", "&lt;")
                    line = line.replace(">", "&gt;")
                    line = line.replace(" ", "&nbsp;")
                    line = line.replace("*", "&ast;")
                    line = line.replace("_", "&lowbar;")
                    line = line.replace("-", "&#45;")
                    line = line.replace(".", "&#46;")
                    line = line.replace("!", "&#33;")
                    line = line.replace("(", "&#40;")
                    line = line.replace(")", "&#41;")
                    line = line.replace("$", "&#36;")
                lines[i] = "<br>" + line
    return "".join(lines)


def predict(message):
    start = time.time()
    message = _parse_text(message)
    print("User: " + message)
    history = []
    response, history = model.chat(tokenizer, message, history=history)
    full_response = _parse_text(response)
    print("Qwen-VL-Chat: " + full_response)
    print(f"Elapsed: {time.time() - start}s")
    return full_response


model, tokenizer = _load_model_tokenizer()

if __name__ == '__main__':
    with open(r'your/path', 'r', encoding='utf-8') as f:
        valid_data = json.load(f)
    data_list = []
    start = time.time()
    for index, item in enumerate(valid_data):
        image_id = item.get("id")
        print(f'Progress {index + 1}/{len(valid_data)}; {image_id}')
        conversations = item.get("conversations")
        query = conversations[0].get("value")
        true_ = conversations[1].get("value")
        full_response = predict(query)
        print("Ground truth: ", true_)
        data_list.append({
            "image_name": image_id,
            "ground_truth": true_,
            "prediction": full_response,
            "query": query,
        })
    print(f"Total elapsed: {time.time() - start}s")

    output_path = 'your/predict.json'
    with open(output_path, 'w', encoding='utf-8') as f:
        json.dump(data_list, f, ensure_ascii=False, indent=2)

    print(f"Output saved to {output_path}")
```

a2382625920 commented on Feb 28 '24 07:02

Yes, the web UI does not support inference with multimodal models for now. You can use swift infer to run inference from the command line.

In the earlier 1.5.4 version, qwen-vl-chat could be loaded and images could be supplied for inference with "<img>xxx.jpg</img>"; as of 1.7.0 this is no longer supported. Can it be restored?

GUOGUO-lab commented on Mar 22 '24 02:03
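The `<img>xxx.jpg</img>` convention mentioned above just embeds the image path in the query string, which can be sketched as a tiny helper. `build_query` is a hypothetical function written for illustration, not part of ms-swift:

```python
def build_query(image_path: str, text: str) -> str:
    """Prefix a chat query with the <img>...</img> tag that older
    ms-swift versions accepted for image input."""
    return f"<img>{image_path}</img>{text}"


print(build_query("xxx.jpg", "What is in this picture?"))
```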