Thomas-MMJ
> i tried `pip install triton==1.0.0`

There is no PyPI 1.0.0 build of triton for Windows. There is nothing recent. You will have to get triton built from source on...
Any progress on the blockers?
Updated to latest, but I still get the two fails if I run the one test first.

```
pytest ./tests/test_layers_utils.py::AttentionBlockTest ./tests/test_models_unet.py
```

```
================================================================================= short test summary info ==================================================================================
FAILED tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate - IndexError:...
```
Now this is bizarre: if I run test_attention_block_default and then test_from_pretrained_accelerate immediately after each other, test_from_pretrained_accelerate passes; if I run all three, the first two pass and the third fails...
Changing the order also changes the results. Here none fail:

```
pytest ./tests/test_layers_utils.py::AttentionBlockTests::test_attention_block_default \
  ./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate_wont_change_results \
  ./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate \
  ./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_hub
```

Here 1 fails:

```
pytest ./tests/test_layers_utils.py::AttentionBlockTests::test_attention_block_default \
  ./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate \
  ./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate_wont_change_results \
  ./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_hub
```
Note that it is reproducible in WSL Debian Linux on this same device; the Debian environment is using a different PyTorch, etc.
@sgugger suggested that the memory wasn't being cleared. If I add

```
import gc

import torch


# Meant to be added as a method on the test class.
def clear_memory(self):
    if torch.cuda.is_available():
        torch.cuda.synchronize()
        torch.cuda.empty_cache()  # https://forums.fast.ai/t/clearing-gpu-memory-pytorch/14637
    gc.collect()
```

from https://www.programcreek.com/python/?CodeExample=clear+memory and run it at the end...
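For illustration only, not the actual diffusers test code: one way to "run it at the end" of every test is a `tearDown` hook. A minimal sketch against a stand-in base class (`MemoryClearingTestCase` is a hypothetical name):

```
import gc
import unittest

import torch


class MemoryClearingTestCase(unittest.TestCase):
    """Hypothetical base class; real suites would mix this into their test classes."""

    def clear_memory(self):
        if torch.cuda.is_available():
            torch.cuda.synchronize()
            torch.cuda.empty_cache()
        gc.collect()

    def tearDown(self):
        super().tearDown()
        # Release GPU allocations so the next test doesn't inherit this one's
        # memory state; the order-dependence above suggests exactly that leak.
        self.clear_memory()
```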
I've submitted a pull request to fix this; hopefully it will be reviewed and merged this coming week.
Note: it has been discussed, and there are changes I need to make before it is accepted.
To work on a 3090 with 12GB you need to use DeepSpeed.

```
accelerate launch --use_deepspeed --zero_stage=2 --gradient_accumulation_steps=1 \
  --offload_param_device=cpu --offload_optimizer_device=cpu train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME --use_auth_token \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR...
```
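Roughly, ZeRO stage 2 shards optimizer state, and the two offload flags push parameters and optimizer state into system RAM, which is what brings the GPU footprint down far enough. The command also assumes the usual DreamBooth shell variables are exported beforehand; the values below are placeholders, not the ones from this run:

```
export MODEL_NAME="CompVis/stable-diffusion-v1-4"  # placeholder model id
export INSTANCE_DIR="./instance_images"            # photos of the training subject
export CLASS_DIR="./class_images"                  # prior-preservation class images
export OUTPUT_DIR="./dreambooth_out"               # where checkpoints are written
```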