[Feature] Multi-GPU deployment of the 6b-32k model keeps failing, but the 2k model works fine. Any help?
Is your feature request related to a problem? Please describe.
(chatglm) dingyifan@ciccy004:~/code/ChatGLM2-6B$ python web_demo.py
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████| 7/7 [00:06<00:00, 1.07it/s]
web_demo.py:89: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
user_input = gr.Textbox(show_label=False, placeholder="Input...", lines=10).style(
Running on local URL: http://127.0.0.1:7860
To create a public link, set share=True in launch().
Traceback (most recent call last):
File "/home/anaconda/envs/chatglm/lib/python3.8/site-packages/gradio/queueing.py", line 388, in call_prediction
output = await route_utils.call_process_api(
File "/home/anaconda/envs/chatglm/lib/python3.8/site-packages/gradio/route_utils.py", line 219, in call_process_api
output = await app.get_blocks().process_api(
File "/home/anaconda/envs/chatglm/lib/python3.8/site-packages/gradio/blocks.py", line 1437, in process_api
result = await self.call_function(
File "/home/anaconda/envs/chatglm/lib/python3.8/site-packages/gradio/blocks.py", line 1123, in call_function
prediction = await utils.async_iteration(iterator)
File "/home/anaconda/envs/chatglm/lib/python3.8/site-packages/gradio/utils.py", line 503, in async_iteration
return await iterator.__anext__()
File "/home/anaconda/envs/chatglm/lib/python3.8/site-packages/gradio/utils.py", line 496, in __anext__
return await anyio.to_thread.run_sync(
File "/home/anaconda/envs/chatglm/lib/python3.8/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/anaconda/envs/chatglm/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/anaconda/envs/chatglm/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/home/anaconda/envs/chatglm/lib/python3.8/site-packages/gradio/utils.py", line 479, in run_sync_iterator_async
return next(iterator)
File "/home/anaconda/envs/chatglm/lib/python3.8/site-packages/gradio/utils.py", line 629, in gen_wrapper
yield from f(*args, **kwargs)
File "web_demo.py", line 65, in predict
for response, history, past_key_values in model.stream_chat(tokenizer, input, history, past_key_values=past_key_values,
File "/home/anaconda/envs/chatglm/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/home/dingyifan/.cache/huggingface/modules/transformers_modules/model-32k/modeling_chatglm.py", line 1072, in stream_chat
for outputs in self.stream_generate(**inputs, past_key_values=past_key_values,
File "/home/anaconda/envs/chatglm/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/home/dingyifan/.cache/huggingface/modules/transformers_modules/model-32k/modeling_chatglm.py", line 1157, in stream_generate
outputs = self(
File "/home/anaconda/envs/chatglm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/anaconda/envs/chatglm/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/dingyifan/.cache/huggingface/modules/transformers_modules/model-32k/modeling_chatglm.py", line 946, in forward
transformer_outputs = self.transformer(
File "/home/anaconda/envs/chatglm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/dingyifan/.cache/huggingface/modules/transformers_modules/model-32k/modeling_chatglm.py", line 836, in forward
hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
File "/home/anaconda/envs/chatglm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/dingyifan/.cache/huggingface/modules/transformers_modules/model-32k/modeling_chatglm.py", line 655, in forward
presents = torch.cat((presents, kv_cache), dim=0)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument tensors in method wrapper_CUDA_cat)
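Judging by the accelerate/hooks.py frames, the model was dispatched across GPUs by accelerate, so later GLM blocks live on cuda:1 while the accumulated `presents` cache stays on cuda:0, and the `torch.cat` at modeling_chatglm.py line 655 then sees two devices. Below is a minimal standalone repro plus a workaround sketch (hypothetical, assumes at least two CUDA devices; the `.to(presents.device)` move is a guess at a local patch, not an official fix):

```python
import torch

# Hypothetical standalone repro (assumes >= 2 CUDA GPUs). torch.cat raises the
# exact RuntimeError above when its inputs live on different devices.
presents = torch.zeros(1, 4, device="cuda:0")   # cache accumulated on GPU 0
kv_cache = torch.zeros(1, 4, device="cuda:1")   # a later layer's cache on GPU 1

# torch.cat((presents, kv_cache), dim=0)  # RuntimeError: Expected all tensors...

# Workaround: move the per-layer cache onto the device holding `presents`
# before concatenating, mirroring the failing line in modeling_chatglm.py.
presents = torch.cat((presents, kv_cache.to(presents.device)), dim=0)
print(presents.shape, presents.device)  # torch.Size([2, 4]) cuda:0
```

Applying the same `.to(presents.device)` move inside the cached modeling_chatglm.py would presumably unblock streaming, at the cost of an extra cross-GPU copy per layer per step.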
Solutions
I don't know.
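One thing worth trying (a sketch, not a confirmed fix): the ChatGLM2-6B repo's README loads multi-GPU through `utils.load_model_on_gpus`, which dispatches layers with a hand-written device map instead of accelerate's automatic one. Assuming the 32k checkpoint keeps the same layer layout that utils.py was written for, the loading code in web_demo.py would look roughly like:

```python
from transformers import AutoTokenizer
from utils import load_model_on_gpus  # utils.py ships with the ChatGLM2-6B repo

# Hypothetical replacement for the model loading in web_demo.py. MODEL_PATH and
# num_gpus are assumptions; point MODEL_PATH at the local chatglm2-6b-32k checkpoint.
MODEL_PATH = "THUDM/chatglm2-6b-32k"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = load_model_on_gpus(MODEL_PATH, num_gpus=2)
model = model.eval()
```

If the 32k variant has a different number of transformer layers, the device map built inside load_model_on_gpus would need to be adjusted to match.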
Additional context
No response
I'm hitting the same problem; switching back to ChatGLM2-6B works fine across multiple GPUs.
Same problem here; hoping this gets fixed.