Always getting a CUDA Out Of Memory error when training LoRA despite adjusting the batch_size
Here are my logs:
STARTIG JOB WITH CONFIG:
adaptive_loss_weight: null
allow_tf32: true
backup_every: 1000
batch_size: 4
bucketeer_random_ratio: 0.05
captions_getter: null
checkpoint_extension: safetensors
checkpoint_path: output
clip_image_model_name: openai/clip-vit-large-patch14
clip_text_model_name: laion/CLIP-ViT-bigG-14-laion2B-39B-b160k
dataset_filters: null
dist_file_subfolder: ''
dtype: null
effnet_checkpoint_path: models/effnet_encoder.safetensors
ema_beta: null
ema_iters: null
ema_start_iters: null
experiment_id: stage_c_3b_lora
generator_checkpoint_path: models/stage_c_bf16.safetensors
grad_accum_steps: 4
image_size: 768
lora_checkpoint_path: null
lr: 0.0001
model_version: 3.6B
module_filters:
- .attn
multi_aspect_ratio:
- 1/1
- 1/2
- 1/3
- 2/3
- 3/4
- 1/5
- 2/5
- 3/5
- 4/5
- 1/6
- 5/6
- 9/16
output_path: output
previewer_checkpoint_path: models/previewer.safetensors
rank: 4
save_every: 100
train_tokens:
- - '[fernando]'
  - ^dog
training: true
updates: 10000
use_fsdp: false
wandb_entity: quocanh34
wandb_project: StableCascade
warmup_updates: 1
webdataset_path: file:data/fernando.tar
INFO:
adaptive_loss: null
ema_loss: null
iter: 0
total_steps: 0
train_tokens: null
wandb_run_id: 7spfifem
['transforms', 'clip_preprocess', 'gdf', 'sampling_configs', 'effnet_preprocess']
Training with batch size 4 (1/GPU)
['dataset', 'dataloader', 'iterator']
DATA:
dataloader: DataLoader
dataset: WebDataset
iterator: Bucketeer
training: NoneType
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
/usr/local/lib/python3.10/dist-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:07<00:00, 3.88s/it]
Updating tokens: [(49408, '[fernando]')]
LoRA training 128 layers
['tokenizer', 'text_model', 'generator', 'effnet', 'previewer', 'lora']
MODELS:
effnet: EfficientNetEncoder - trainable params 0
generator: StageC - trainable params 3592249360
generator_ema: NoneType - Not a nn.Module
image_model: CLIPVisionModelWithProjection - trainable params 0
lora: ModuleDict - trainable params 3147008
previewer: Previewer - trainable params 0
text_model: CLIPTextModelWithProjection - trainable params 1280
tokenizer: CLIPTokenizerFast - Not a nn.Module
training: NoneType - Not a nn.Module
['lora']
OPTIMIZERS:
generator: NoneType
lora: AdamW
training: NoneType
[]
SCHEDULERS:
lora: GradualWarmupScheduler
training: NoneType
['transforms', 'clip_preprocess', 'gdf', 'sampling_configs', 'effnet_preprocess']
EXTRAS:
clip_preprocess: Compose(
    Resize(size=224, interpolation=bicubic, max_size=None, antialias=warn)
    CenterCrop(size=(224, 224))
    Normalize(mean=(0.48145466, 0.4578275, 0.40821073), std=(0.26862954, 0.26130258, 0.27577711))
)
effnet_preprocess: Compose(
    Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
)
gdf: <gdf.GDF object at 0x7f40dd905d80>
sampling_configs: {'cfg': 5, 'sampler': <gdf.samplers.DDPMSampler object at 0x7f40dd907430>, 'shift': 1, 'timesteps': 20}
training: None
transforms: Compose(
    ToTensor()
    Resize(size=768, interpolation=bilinear, max_size=None, antialias=True)
    SmartCrop(
      (saliency_model): MicroResNet(
        (downsampler): Sequential(
          (0): ReflectionPad2d((4, 4, 4, 4))
          (1): Conv2d(3, 8, kernel_size=(9, 9), stride=(4, 4))
          (2): InstanceNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (3): ReLU()
          (4): ReflectionPad2d((1, 1, 1, 1))
          (5): Conv2d(8, 16, kernel_size=(3, 3), stride=(2, 2))
          (6): InstanceNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (7): ReLU()
          (8): ReflectionPad2d((1, 1, 1, 1))
          (9): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2))
          (10): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (11): ReLU()
        )
        (residual): Sequential(
          (0): ResBlock(
            (resblock): Sequential(
              (0): ReflectionPad2d((1, 1, 1, 1))
              (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
              (2): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
              (3): ReLU()
              (4): ReflectionPad2d((1, 1, 1, 1))
              (5): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
              (6): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
            )
          )
          (1): Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1), groups=32, bias=False)
          (2): ResBlock(
            (resblock): Sequential(
              (0): ReflectionPad2d((1, 1, 1, 1))
              (1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
              (2): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
              (3): ReLU()
              (4): ReflectionPad2d((1, 1, 1, 1))
              (5): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
              (6): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
            )
          )
        )
        (segmentator): Sequential(
          (0): ReflectionPad2d((1, 1, 1, 1))
          (1): Conv2d(64, 16, kernel_size=(3, 3), stride=(1, 1))
          (2): InstanceNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (3): ReLU()
          (4): Upsample2d()
          (5): ReflectionPad2d((4, 4, 4, 4))
          (6): Conv2d(16, 1, kernel_size=(9, 9), stride=(1, 1))
          (7): Sigmoid()
        )
      )
    )
)
TRAINING STARTING...
STARTING AT STEP: 1/40000
0%| | 0/40000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/workspace/StableCascade/train/train_c_lora.py", line 332, in <module>
Per this issue: https://github.com/Stability-AI/StableCascade/issues/26
You likely need to downsize to the 1B model; the 3.6B Stage C generator is probably just too large for your GPU, even with a small batch size. Update these two keys in your config:
model_version: 1B
generator_checkpoint_path: models/stage_c_lite_bf16.safetensors
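Against the config you pasted, that would look roughly like this (a minimal sketch: only these two keys change, everything else stays as you have it, and the lite checkpoint needs to be downloaded into models/ first):

# Stage C LoRA config - sketch of the two changed keys (all other keys unchanged)
model_version: 1B                                               # was: 3.6B
generator_checkpoint_path: models/stage_c_lite_bf16.safetensors # was: models/stage_c_bf16.safetensors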
Thanks mate