ColossalAI
[BUG]: `Segmentation fault (core dumped)` when running ldm on the CIFAR-10 dataset
🐛 Describe the bug
...
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
DiffusionWrapper has 865.91 M params.
=========================================================================================
No pre-built kernel is found, build and load the cpu_adam kernel during runtime now
=========================================================================================
Emitting ninja build file /home/fangfei/.cache/colossalai/torch_extensions/torch1.13_cu11.7/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.413104295730591 seconds
=========================================================================================
No pre-built kernel is found, build and load the fused_optim kernel during runtime now
=========================================================================================
Detected CUDA files, patching ldflags
Emitting ninja build file /home/fangfei/.cache/colossalai/torch_extensions/torch1.13_cu11.7/build.ninja...
Building extension module fused_optim...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_optim...
Time to load fused_optim op: 1.8385069370269775 seconds
Segmentation fault (core dumped)
This happens when running Stable Diffusion with `python main.py --logdir /tmp/ --train --base configs/train_colossalai_cifar10.yaml`, and I cannot locate the cause of the error.
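One generic way to narrow down a segfault like this is Python's built-in `faulthandler` module, which dumps the Python-level traceback of every thread when the process receives a fatal signal. This is a minimal sketch, assuming it is placed at the very top of `main.py`. Running `python -X faulthandler main.py ...` has the same effect. It only shows the Python frames that were active at the crash, not the native C++/CUDA frames inside an extension such as `fused_optim`.

```python
import faulthandler
import sys

# On SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL, print the Python traceback
# of all threads to stderr before the process dies.
faulthandler.enable(file=sys.stderr, all_threads=True)
```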
Environment
pytorch-lightning==1.8.1
There may be a problem with your `--logdir`. Please check whether your `/tmp` folder has enough free space to store the log files.
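A quick way to act on that suggestion is to check the free space where `--logdir` points. This is a minimal sketch using the standard library's `shutil.disk_usage`. The helper name `free_gib` is hypothetical.

```python
import shutil


def free_gib(path: str) -> float:
    """Hypothetical helper: free space at `path`, in GiB."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / 1024**3


if __name__ == "__main__":
    logdir = "/tmp/"  # the --logdir passed to main.py in the report above
    print(f"{logdir}: {free_gib(logdir):.1f} GiB free")
```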
We have updated a lot. This issue was closed due to inactivity. Thanks.