
The following error is raised when merging llama3; the same problem also occurs when using zero3.

Open 1518630367 opened this issue 1 year ago • 4 comments

Traceback (most recent call last):
  File "/home/wumao/xtuner-main/xtuner/tools/model_converters/pth_to_hf.py", line 158, in <module>
    main()
  File "/home/wumao/xtuner-main/xtuner/tools/model_converters/pth_to_hf.py", line 78, in main
    model = BUILDER.build(cfg.model)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/home/wumao/xtuner-main/xtuner/model/sft.py", line 115, in __init__
    self._prepare_for_lora(peft_model, use_activation_checkpointing)
  File "/home/wumao/xtuner-main/xtuner/model/sft.py", line 144, in _prepare_for_lora
    self.llm = get_peft_model(self.llm, self.lora)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/mapping.py", line 136, in get_peft_model
    return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config, adapter_name=adapter_name)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/peft_model.py", line 1094, in __init__
    super().__init__(model, peft_config, adapter_name)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/peft_model.py", line 129, in __init__
    self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 136, in __init__
    super().__init__(model, config, adapter_name)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 148, in __init__
    self.inject_adapter(self.model, adapter_name)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 325, in inject_adapter
    self._create_and_replace(peft_config, adapter_name, target, target_name, parent, current_key=key)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 220, in _create_and_replace
    new_module = self._create_new_module(lora_config, adapter_name, target, **kwargs)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 295, in _create_new_module
    new_module = dispatcher(target, adapter_name, lora_config=lora_config, **kwargs)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/tuners/lora/bnb.py", line 506, in dispatch_bnb_4bit
    new_module = Linear4bit(target, adapter_name, **fourbit_kwargs)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/tuners/lora/bnb.py", line 293, in __init__
    self.update_layer(
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/tuners/lora/layer.py", line 122, in update_layer
    self.to(weight.device)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1173, in to
    return self._apply(convert)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 804, in _apply
    param_applied = fn(param)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1166, in convert
    raise NotImplementedError(
NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.

Process finished with exit code 1

1518630367 avatar May 14 '24 06:05 1518630367
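The final NotImplementedError is generic PyTorch behaviour rather than anything specific to xtuner or peft: a parameter on the meta device has a shape and dtype but no data, so .to() has nothing to copy. A minimal sketch that reproduces the same message with a recent PyTorch 2.x (the layer size is arbitrary):

```python
import torch
import torch.nn as nn

# Building a module under the meta device gives parameters with shapes but no storage.
with torch.device("meta"):
    layer = nn.Linear(8, 8)

try:
    layer.to("cpu")  # copying out of a meta tensor is impossible
except NotImplementedError as err:
    print(err)  # "Cannot copy out of meta tensor; no data! ..."

# to_empty() allocates fresh, uninitialized storage on the target device instead of copying.
layer = layer.to_empty(device="cpu")
print(layer.weight.is_meta)  # False
```

Note that to_empty() only allocates uninitialized storage, so it moves the module but does not recover the missing weights.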

Not enough GPU memory, so some of the model's parameters end up as meta tensors. Append --device cpu to the command.

pppppM avatar May 14 '24 06:05 pppppM
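A quick way to confirm that diagnosis is to check which parameters are still on the meta device right after the model is built, before the PEFT wrapping. This is a hypothetical diagnostic sketch, not xtuner code; model stands for the object returned by BUILDER.build(cfg.model) in pth_to_hf.py:

```python
# Hypothetical diagnostic: list parameters that were left on the meta device.
meta_params = [name for name, p in model.named_parameters() if p.is_meta]
print(f"{len(meta_params)} parameters on the meta device")
print(meta_params[:10])
```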

Not enough GPU memory, so some of the model's parameters end up as meta tensors. Append --device cpu to the command.

In theory an A100 80G should not run out of memory. I checked, and the error is raised before GPU memory usage even reaches 1/8. Also, if I use cpu I get the following error: pth_to_hf.py: error: unrecognized arguments: --device cpu

1518630367 avatar May 14 '24 06:05 1518630367

Not enough GPU memory, so some of the model's parameters end up as meta tensors. Append --device cpu to the command.

With the old version of xtuner the merge finishes in seconds; the new version hits this problem.

1518630367 avatar May 14 '24 06:05 1518630367

@1518630367 Which versions are the old and new xtuner you used?

pppppM avatar May 16 '24 02:05 pppppM

@1518630367 This problem has been located and fixed in #697.

pppppM avatar May 17 '24 08:05 pppppM