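Training fails, how do I fix it?
Environment: Aliyun PAI-DSW, modelscope:1.10.0-pytorch2.1.0tensorflow2.14.0-gpu-py310. I followed the steps and everything succeeded apart from a few incompatibility errors. After uploading images and clicking train, the model downloads all completed without problems, but then ERROR was displayed. The backend log is as follows: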
Traceback (most recent call last):
File "/mnt/workspace/facechain/facechain/train_text_to_image_lora.py", line 1224, in
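Could anyone help take a look?
It looks like gradients are not enabled on the VAE latents when training the LoRA.
After the latents are obtained around line 1100, add one line:
latents.requires_grad_(True)
That should fix it.
Which file is that in, train_text_to_image_lora.py or app.py? I couldn't find it. Thanks a lot.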
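For anyone wondering what that one-liner changes, here is a minimal PyTorch sketch. It is not taken from train_text_to_image_lora.py: the tensor shape and the toy loss are made up for illustration, and it assumes the original failure is the usual autograd complaint about a loss that does not require grad (the full traceback is not shown above).

```python
import torch

# Stand-in for VAE latents (hypothetical shape). Anything produced under
# no_grad carries no autograd graph.
with torch.no_grad():
    latents = torch.randn(2, 4)

loss = (latents ** 2).mean()
print(loss.requires_grad)  # False

# backward() on a loss with no graph raises a RuntimeError.
try:
    loss.backward()
    backward_error = None
except RuntimeError as err:
    backward_error = str(err)
print(backward_error)

# The suggested fix: switch on gradient tracking for the latents in place,
# then recompute the loss so that it is attached to a graph.
latents.requires_grad_(True)
loss = (latents ** 2).mean()
loss.backward()
print(latents.grad.shape)
```

Whether this is enough in the real script depends on how the latents feed into the UNet there, which this thread does not show.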
Training fails: how do I install the mmcv package on Linux?
Process Process-1:
Traceback (most recent call last):
  File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/utils/registry.py", line 210, in build_from_cfg
    return obj_cls._instantiate(**args)
  File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/models/base/base_model.py", line 67, in _instantiate
    return cls(**kwargs)
  File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/models/cv/face_detection/scrfd/damofd_detect.py", line 31, in __init__
    super().__init__(model_dir, **kwargs)
  File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/models/cv/face_detection/scrfd/scrfd_detect.py", line 33, in __init__
    from mmcv import Config
ModuleNotFoundError: No module named 'mmcv'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/utils/registry.py", line 212, in build_from_cfg
    return obj_cls(**args)
  File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/pipelines/cv/face_detection_pipeline.py", line 36, in __init__
    super().__init__(model=model, **kwargs)
  File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/pipelines/base.py", line 100, in __init__
    self.model = self.initiate_single_model(model, **kwargs)
  File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/pipelines/base.py", line 53, in initiate_single_model
    return Model.from_pretrained(
  File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/models/base/base_model.py", line 183, in from_pretrained
    model = build_model(model_cfg, task_name=task_name)
  File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/models/builder.py", line 35, in build_model
    model = build_from_cfg(
  File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/utils/registry.py", line 215, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
ModuleNotFoundError: DamoFdDetect: No module named 'mmcv'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib64/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib64/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/saizong/stable-diffusion/extensions/facechain/facechain/inference.py", line 25, in _data_process_fn_process
    Blipv2()(input_img_dir)
  File "/home/saizong/stable-diffusion/extensions/facechain/facechain/data_process/preprocessing.py", line 207, in __init__
    self.face_detection = pipeline(task=Tasks.face_detection, model='damo/cv_ddsar_face-detection_iclr23-damofd', model_revision='v1.1')
  File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/pipelines/builder.py", line 170, in pipeline
    return build_pipeline(cfg, task_name=task)
  File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/pipelines/builder.py", line 65, in build_pipeline
    return build_from_cfg(
  File "/home/saizong/.local/lib/python3.11/site-packages/modelscope/utils/registry.py", line 215, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
ModuleNotFoundError: FaceDetectionPipeline: DamoFdDetect: No module named 'mmcv'
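The root cause in that chain is the first exception: modelscope's scrfd_detect.py runs "from mmcv import Config" and mmcv is not installed. A quick way to check the environment before retrying is sketched below. The helper name check_attr is my own, not part of modelscope or mmcv. As far as I know, mmcv 2.x moved Config into mmengine, so this import probably needs an mmcv 1.x build; the usual route for that is OpenMIM (pip install -U openmim, then mim install mmcv-full), though I have not verified that prebuilt wheels exist for Python 3.11.

```python
import importlib
import importlib.util


def check_attr(module_name: str, attr: str) -> str:
    """Report whether module_name is installed and exposes attr (hypothetical helper)."""
    if importlib.util.find_spec(module_name) is None:
        return f"{module_name}: not installed"
    module = importlib.import_module(module_name)
    if hasattr(module, attr):
        return f"{module_name}.{attr}: available"
    return f"{module_name}: installed, but has no attribute {attr!r}"


# The import that fails in the traceback is `from mmcv import Config`.
print(check_attr("mmcv", "Config"))
```

Run it with the same interpreter that launches the web UI (here the python3.11 under /home/saizong/.local), since a package installed for a different Python will not be found.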
I resolved the error by adding "loss.requires_grad = True" after line 1033 in train_text_to_image_lora.py
train_loss += avg_loss.item() / args.gradient_accumulation_steps
loss.requires_grad = True
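A note on what this workaround does, as a small PyTorch sketch rather than the actual training code. Here weight is a hypothetical stand-in for a trainable LoRA parameter, and the no_grad block simulates a forward pass that recorded no autograd graph. Under those assumptions, setting requires_grad on the loss stops backward() from raising, but no gradient reaches the weight.

```python
import torch

# Hypothetical trainable parameter standing in for a LoRA weight.
weight = torch.nn.Parameter(torch.ones(3))

# Simulate a forward pass that recorded no graph (for example, one run under no_grad).
with torch.no_grad():
    loss = (weight * 2.0).sum()

loss.requires_grad = True  # the workaround from the post above
loss.backward()            # no longer raises

print(weight.grad)  # None
```

If the same thing happens in the real script, training would run to completion without updating the LoRA weights, so it may be worth checking that the saved weights actually change.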
I added this, but then a new problem came up.
0%|████████████████████████████████████████████████████████████████████████████| 7/7 [00:03<00:00, 2.09it/s]
Traceback (most recent call last):
File "/root/facechain-main/facechain/train_text_to_image_lora.py", line 1225, in
OK, thanks a lot!
I resolved the error by adding "loss.requires_grad = True" after line 1033 in train_text_to_image_lora.py
train_loss += avg_loss.item() / args.gradient_accumulation_steps
loss.requires_grad = True
I added this, but a new problem happened.
0%|████████████████████████████████████████████████████████████████████████████| 7/7 [00:03<00:00, 2.09it/s]
Traceback (most recent call last):
  File "/root/facechain-main/facechain/train_text_to_image_lora.py", line 1225, in <module>
    main()
  File "/root/facechain-main/facechain/train_text_to_image_lora.py", line 1211, in main
    pipeline.unet.load_attn_procs(args.output_dir)
  File "/root/miniconda3/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/diffusers/loaders/unet.py", line 297, in load_attn_procs
    raise ValueError(f"Module {key} is not a LoRACompatibleConv or LoRACompatibleLinear module.")
ValueError: Module down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q is not a LoRACompatibleConv or LoRACompatibleLinear module.
I met this problem too...
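Since the ValueError is raised inside diffusers' load_attn_procs, it may depend on which diffusers version is installed. That is a guess on my part, not something confirmed in this thread. If you hit the same error, posting your package versions would help narrow it down. A minimal sketch using only the standard library follows; the package list is my assumption about what is relevant.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages assumed to matter for this error; adjust the list as needed.
for name, version in installed_versions(["diffusers", "peft", "torch", "modelscope"]).items():
    print(f"{name}: {version or 'not installed'}")
```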
Please try out the newest version, facechain-fact, which is train-free with 10s inference.