CLIP Training Example Bug - Overfitting
### System Info
- `transformers` version: 4.40.1
- Platform: Linux-5.15.0-1053-gcp-x86_64-with-glibc2.29
- Python version: 3.8.10
- Huggingface_hub version: 0.23.0
- Safetensors version: 0.4.1
- Accelerate version: 0.26.1
- Accelerate config:
  - compute_environment: LOCAL_MACHINE
  - distributed_type: MULTI_GPU
  - mixed_precision: fp16
  - use_cpu: False
  - debug: False
  - num_processes: 4
  - machine_rank: 0
  - num_machines: 1
  - gpu_ids: all
  - rdzv_backend: static
  - same_network: True
  - main_training_function: main
  - downcast_bf16: no
  - tpu_use_cluster: False
  - tpu_use_sudo: False
  - tpu_env: []
- PyTorch version (GPU?): 2.1.0+cu118 (True)
- Tensorflow version (GPU?): 2.13.1 (True)
- Flax version (CPU?/GPU?/TPU?): 0.7.2 (cpu)
- Jax version: 0.4.13
- JaxLib version: 0.4.13
- Using GPU in script?: YES
- Using distributed or parallel set-up in script?: NO
### Who can help?
@patil-suraj

I am training a CLIP model for Sanskrit, but training keeps diverging. I have tried increasing the weight decay from 0.1 to 0.4, and I also tried adding early stopping:
early_stop = EarlyStoppingCallback(3)
1st run: standard params (lr 5e-5, weight_decay 0.1, no max_grad_norm).
2nd run: weight_decay raised to 0.2, max_grad_norm set to 0.9.
I have tried numerous other configurations, and every run diverges.
### Information
- [X] The official example scripts
- [X] My own modified scripts
### Tasks
- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
### Reproduction
I initialized the CLIP model with this script: https://gist.github.com/humanely/b73cb9e53c3879c3bc50890f6318cb7b
The tokenizer was built using this script: https://gist.github.com/humanely/fdf2f46d37cc9de8d84e1579006a7828
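The gists above are external, but a from-scratch CLIP initialization with a custom tokenizer typically looks like the sketch below. The vocabulary size here is an assumed placeholder; the key point is that it must match the custom Sanskrit tokenizer, since a mismatch between tokenizer and text-embedding size is a common cause of unstable training.

```python
from transformers import CLIPConfig, CLIPModel, CLIPTextConfig, CLIPVisionConfig

# Assumed vocab size of the custom Sanskrit tokenizer (placeholder value);
# it must equal len(tokenizer), or the text embedding matrix will be
# mis-sized relative to the token ids the tokenizer produces.
VOCAB_SIZE = 32_000

text_config = CLIPTextConfig(vocab_size=VOCAB_SIZE)
vision_config = CLIPVisionConfig()
config = CLIPConfig.from_text_vision_configs(text_config, vision_config)

# Randomly initialized CLIP -- no pretrained weights, so the contrastive
# loss starts high and needs a large dataset and long schedule to converge.
model = CLIPModel(config)
```

Because nothing is pretrained here, slow or non-convergence is expected with small datasets, independent of the weight-decay and clipping settings discussed above.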
### Expected behavior
Expected the eval_loss of CLIP to converge normally, to under 0.1.
Other discussions:
- https://github.com/huggingface/diffusers/issues/7836
- https://github.com/huggingface/diffusers/discussions/7835
Hi @humanely, thanks for raising an issue!
This is a question best placed in our forums. We try to reserve the github issues for feature requests and bug reports.