
Results 55 comments of Thomas-MMJ

> i tried `pip install triton==1.0.0`

There is no PyPI 1.0.0 build of triton for Windows, and there is nothing recent either. You will have to build triton from source on...

Updated to the latest, but I still get the two failures if I run the one test first.

```
pytest ./tests/test_layers_utils.py::AttentionBlockTest ./tests/test_models_unet.py
================================================================================= short test summary info ==================================================================================
FAILED tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate - IndexError:...
```

Now this is bizarre: if I run test_attention_block_default and then test_from_pretrained_accelerate immediately after each other, test_from_pretrained_accelerate passes; if I run all three, the first two pass and the third fails....

Changing the order also changes the results. Here none fail:

```
pytest ./tests/test_layers_utils.py::AttentionBlockTests::test_attention_block_default \
  ./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate_wont_change_results \
  ./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate \
  ./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_hub
```

Here 1 fails:

```
pytest ./tests/test_layers_utils.py::AttentionBlockTests::test_attention_block_default \
  ./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate \
  ./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate_wont_change_results \
  ./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_hub
```

Note that it is reproducible in WSL Debian Linux on this same device; the Debian install is using a different PyTorch, etc.

@sgugger suggested that the memory wasn't being cleared. If I add

```python
import gc

import torch


def clear_memory(self):
    if torch.cuda.is_available():
        torch.cuda.synchronize()
        torch.cuda.empty_cache()
    # https://forums.fast.ai/t/clearing-gpu-memory-pytorch/14637
    gc.collect()
```

(from https://www.programcreek.com/python/?CodeExample=clear+memory) and run it at the end...

I've submitted a pull request to fix this; hopefully it will be reviewed and merged this coming week.

To work on a 3090 with 12GB you need to use DeepSpeed.

```
accelerate launch --use_deepspeed --zero_stage=2 --gradient_accumulation_steps=1 \
  --offload_param_device=cpu --offload_optimizer_device=cpu train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME --use_auth_token \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR...
```
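For reference, the same DeepSpeed settings can also live in accelerate's YAML config (normally written by running `accelerate config`) instead of being passed as launch flags. The key names below are my assumption based on accelerate's DeepSpeed plugin and are worth double-checking against your accelerate version:

```yaml
# Sketch of a default_config.yaml -- key names assumed, verify
# against the output of `accelerate config` for your version
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 2
  gradient_accumulation_steps: 1
  offload_optimizer_device: cpu
  offload_param_device: cpu
mixed_precision: fp16
num_processes: 1
```

With that in place, the launch line reduces to `accelerate launch train_dreambooth.py ...` with only the script arguments.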