
Training error on Linux

Open xujipm opened this issue 2 years ago • 2 comments

running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 78
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 26
  num epochs / epoch数: 20
  batch size per device / バッチサイズ: 3
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 520
steps:   0%|                                                                                    | 0/520 [00:00<?, ?it/s]epoch 1/20
Traceback (most recent call last):
  File "/home/stable/lora-scripts/./sd-scripts/train_network.py", line 699, in <module>
    train(args)
  File "/home/stable/lora-scripts/./sd-scripts/train_network.py", line 538, in train
    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
  File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/accelerate/utils/operations.py", line 490, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 381, in forward
    sample, res_samples = downsample_block(
  File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py", line 612, in forward
    hidden_states = attn(hidden_states, encoder_hidden_states=encoder_hidden_states).sample
  File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/diffusers/models/attention.py", line 216, in forward
    hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/diffusers/models/attention.py", line 484, in forward
    hidden_states = self.attn1(norm_hidden_states) + hidden_states
  File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stable/lora-scripts/sd-scripts/library/train_util.py", line 1700, in forward_xformers
    out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None)  # 最適なのを選んでくれる
  File "/home/stable/anaconda3/lib/python3.10/site-packages/xformers/ops/memory_efficient_attention.py", line 975, in memory_efficient_attention
    return op.apply(query, key, value, attn_bias, p, scale).reshape(output_shape)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/xformers/ops/memory_efficient_attention.py", line 360, in forward
    out, lse = cls.FORWARD_OPERATOR(
  File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/_ops.py", line 442, in __call__
    return self._op(*args, **kwargs or {})
NotImplementedError: Could not run 'xformers::efficient_attention_forward_cutlass' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'xformers::efficient_attention_forward_cutlass' is only available for these backends: [BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].

BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:140 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at ../aten/src/ATen/functorch/DynamicLayer.cpp:488 [backend fallback]
Functionalize: registered at ../aten/src/ATen/FunctionalizeFallbackKernel.cpp:291 [backend fallback]
Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at ../aten/src/ATen/ConjugateFallback.cpp:18 [backend fallback]
Negative: registered at ../aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at ../aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:64 [backend fallback]
AutogradOther: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:35 [backend fallback]
AutogradCPU: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:39 [backend fallback]
AutogradCUDA: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:47 [backend fallback]
AutogradXLA: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:51 [backend fallback]
AutogradMPS: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:59 [backend fallback]
AutogradXPU: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:43 [backend fallback]
AutogradHPU: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:68 [backend fallback]
AutogradLazy: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:55 [backend fallback]
Tracer: registered at ../torch/csrc/autograd/TraceTypeManual.cpp:296 [backend fallback]
AutocastCPU: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:482 [backend fallback]
AutocastCUDA: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:324 [backend fallback]
FuncTorchBatched: registered at ../aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:743 [backend fallback]
FuncTorchVmapMode: fallthrough registered at ../aten/src/ATen/functorch/VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1064 [backend fallback]
VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at ../aten/src/ATen/functorch/TensorWrapper.cpp:189 [backend fallback]
PythonTLSSnapshot: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:148 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at ../aten/src/ATen/functorch/DynamicLayer.cpp:484 [backend fallback]
PythonDispatcher: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:144 [backend fallback]

steps:   0%|                                                                                    | 0/520 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/stable/anaconda3/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/stable/anaconda3/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/stable/anaconda3/bin/python', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/model.ckpt', '--train_data_dir=./train/yazi', '--output_dir=./output', '--logging_dir=./logs', '--resolution=512,704', '--network_module=networks.lora', '--max_train_epochs=20', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=ba_yazi_V10', '--train_batch_size=3', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--use_8bit_adam', '--noise_offset', '0']' returned non-zero exit status 1.
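The step counts in the log header follow from simple arithmetic. 78 image-repeats at a batch size of 3 give 26 batches per epoch. With no gradient accumulation, 26 batches × 20 epochs give 520 optimization steps.

The sketch below reproduces that calculation. `training_steps` is a hypothetical helper, not part of sd-scripts. It assumes steps per epoch = ceil(batches / accumulation steps). It also ignores aspect-ratio bucketing: with `--enable_bucket`, each bucket is batched separately, so partial batches can raise the real count.

```python
import math


def training_steps(num_images_x_repeats: int, batch_size: int, epochs: int,
                   grad_accum_steps: int = 1) -> tuple[int, int]:
    """Return (batches per epoch, total optimization steps).

    Assumption: a single bucket, so batches = ceil(images / batch_size).
    With aspect-ratio bucketing, each bucket is ceil-divided on its own.
    """
    batches_per_epoch = math.ceil(num_images_x_repeats / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)
    return batches_per_epoch, steps_per_epoch * epochs


print(training_steps(78, 3, 20))  # (26, 520)
```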

Could someone experienced take a look and tell me what is going on?

xujipm, Mar 27 '23
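The `NotImplementedError` above says that the CUDA kernel `xformers::efficient_attention_forward_cutlass` is not registered. This commonly happens when the installed xformers wheel was built for a different PyTorch or CUDA version than the one in the environment. Running `python -m xformers.info` prints xformers' build details for comparison.

The sketch below is a quick check, assuming the goal is to confirm the kernel works before launching training. `xformers_cuda_ok` and `drop_xformers_if_broken` are hypothetical helpers, not part of lora-scripts or sd-scripts.

- The probe runs `memory_efficient_attention` once on tiny tensors.
- If the probe fails, the helper strips `--xformers` from the launch arguments. Training should then presumably use the default attention path, more slowly and with more VRAM.

```python
import importlib


def xformers_cuda_ok() -> bool:
    """Return True only if xformers' memory_efficient_attention runs on the current GPU."""
    try:
        torch = importlib.import_module("torch")
        xops = importlib.import_module("xformers.ops")
    except ImportError:
        return False  # torch or xformers is not importable in this environment
    if not torch.cuda.is_available():
        return False
    # Tiny fp16 tensors in [batch, seq_len, heads, head_dim] layout.
    q = torch.randn(1, 8, 1, 32, device="cuda", dtype=torch.float16)
    try:
        xops.memory_efficient_attention(q, q, q)
    except (RuntimeError, ValueError) as exc:
        # NotImplementedError (seen in the traceback above) subclasses RuntimeError.
        print(f"xformers probe failed: {type(exc).__name__}: {exc}")
        return False
    return True


def drop_xformers_if_broken(args: list[str], xformers_ok: bool) -> list[str]:
    """Remove the --xformers flag from a train_network.py argument list if the probe failed."""
    return list(args) if xformers_ok else [a for a in args if a != "--xformers"]


if __name__ == "__main__":
    ok = xformers_cuda_ok()
    print("xformers usable:", ok)
    print(drop_xformers_if_broken(["--xformers", "--cache_latents"], ok))
```

Dropping the flag is only a workaround. The lasting fix is an xformers build that matches the installed `torch` and CUDA versions.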

ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/anyio/streams/memory.py", line 94, in receive
    return self.receive_nowait()
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/anyio/streams/memory.py", line 89, in receive_nowait
    raise WouldBlock
anyio.WouldBlock

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/middleware/base.py", line 78, in call_next
    message = await recv_stream.receive()
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/anyio/streams/memory.py", line 114, in receive
    raise EndOfStream
anyio.EndOfStream

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/middleware/base.py", line 108, in __call__
    response = await self.dispatch_func(request, call_next)
  File "/media/zhi/sd/Ai-test/new-lora-scripts/lora-scripts/gui.py", line 123, in add_cache_control_header
    response = await call_next(request)
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/middleware/base.py", line 84, in call_next
    raise app_exc
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/middleware/base.py", line 70, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
    return await dependant.call(**values)
  File "/media/zhi/sd/Ai-test/new-lora-scripts/lora-scripts/gui.py", line 117, in create_toml_file
    f.write(toml.dumps(j))
AttributeError: module 'toml' has no attribute 'dumps'

When I click Train, I get this error immediately.

jzjbyq, May 08 '23
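The PyPI `toml` package does provide `toml.dumps`. This `AttributeError` therefore suggests that the `toml` module `gui.py` imported is not that package.

One plausible cause, which is an assumption not confirmed in this thread, is that `toml` is not installed in the `loratrain` environment. In that case, a directory named `toml` somewhere on `sys.path` could be picked up as a namespace package instead.

The minimal sketch below uses a hypothetical helper, `diagnose_module`. It reports where a module would be loaded from and whether that module defines the expected attribute.

```python
import importlib
import importlib.util


def diagnose_module(name: str, attr: str) -> dict:
    """Report where `name` would be imported from and whether it defines `attr`."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"found": False, "origin": None, "search_locations": [], "has_attr": False}
    module = importlib.import_module(name)
    return {
        "found": True,
        # origin is the file that gets loaded; for a namespace package (a bare
        # directory without __init__.py) it is None on Python 3.7+.
        "origin": spec.origin,
        "search_locations": list(spec.submodule_search_locations or []),
        "has_attr": hasattr(module, attr),
    }


if __name__ == "__main__":
    # Run with the same interpreter that starts gui.py.
    print(diagnose_module("toml", "dumps"))
```

You might see `origin` as `None`, or pointing somewhere other than the environment's `site-packages/toml/__init__.py`, or `has_attr` as `False`. In any of those cases, installing `toml` into that environment is a reasonable next step.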

pip install albumentations toml accelerate einops voluptuous -i https://pypi.tuna.tsinghua.edu.cn/simple

HardySimpson, May 16 '23
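Before reinstalling, it can help to check which of the packages from that `pip install` line are actually missing in the active environment.

The sketch below does that check. `missing_modules` and `pip_command` are hypothetical helpers. The sketch assumes each package's import name matches its pip name, which holds for these five.

```python
import importlib.util
import sys

REQUIRED = ["albumentations", "toml", "accelerate", "einops", "voluptuous"]
MIRROR = "https://pypi.tuna.tsinghua.edu.cn/simple"


def missing_modules(names: list[str]) -> list[str]:
    """Return the names that cannot be found by the current interpreter."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def pip_command(names: list[str], index_url: str = MIRROR) -> list[str]:
    """Build a pip install command bound to this interpreter (sys.executable -m pip)."""
    return [sys.executable, "-m", "pip", "install", *names, "-i", index_url]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
        print(" ".join(pip_command(missing)))
    else:
        print("All required modules are importable.")
```

Using `sys.executable -m pip` instead of a bare `pip` keeps the install inside the interpreter that will run the training. This is worth doing when several conda environments are in play, as in this thread (`anaconda3` base vs. the `loratrain` env).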