
CLIP Training Example Bug - Overfitting

humanely opened this issue 1 year ago · 1 comment

System Info

  • transformers version: 4.40.1
  • Platform: Linux-5.15.0-1053-gcp-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • Huggingface_hub version: 0.23.0
  • Safetensors version: 0.4.1
  • Accelerate version: 0.26.1
  • Accelerate config: - compute_environment: LOCAL_MACHINE
    • distributed_type: MULTI_GPU
    • mixed_precision: fp16
    • use_cpu: False
    • debug: False
    • num_processes: 4
    • machine_rank: 0
    • num_machines: 1
    • gpu_ids: all
    • rdzv_backend: static
    • same_network: True
    • main_training_function: main
    • downcast_bf16: no
    • tpu_use_cluster: False
    • tpu_use_sudo: False
    • tpu_env: []
  • PyTorch version (GPU?): 2.1.0+cu118 (True)
  • Tensorflow version (GPU?): 2.13.1 (True)
  • Flax version (CPU?/GPU?/TPU?): 0.7.2 (cpu)
  • Jax version: 0.4.13
  • JaxLib version: 0.4.13
  • Using GPU in script?: YES
  • Using distributed or parallel set-up in script?: NO

Who can help?

@patil-suraj I am training CLIP for Sanskrit, but it keeps diverging. I have tried raising the weight decay from 0.1 to 0.4, and I also tried adding early stopping: early_stop = EarlyStoppingCallback(3). With standard params (lr 5e-5, weight_decay 0.1, no max_grad_norm):

[training loss curve image]

2nd run (weight decay 0.2, max_grad_norm 0.9):

[training loss curve image]

I have tried numerous other configurations, and every run diverges.
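For reference, a minimal sketch of how these runs could be configured with the Trainer API. This is an assumed reconstruction, not the exact training script; model, train_ds, and eval_ds are hypothetical handles, and EarlyStoppingCallback additionally requires load_best_model_at_end and a metric to monitor:

```python
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="clip-sanskrit",        # placeholder path
    learning_rate=5e-5,                # "standard params" from the first run
    weight_decay=0.1,                  # raised to 0.2 / 0.4 in later runs
    max_grad_norm=0.9,                 # added in the second run
    fp16=True,                         # matches the accelerate config above
    evaluation_strategy="steps",
    load_best_model_at_end=True,       # required for early stopping
    metric_for_best_model="eval_loss",
    greater_is_better=False,           # lower eval_loss is better
)

trainer = Trainer(
    model=model,                       # hypothetical: the freshly initialized CLIP
    args=args,
    train_dataset=train_ds,            # hypothetical dataset handles
    eval_dataset=eval_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```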

Information

  • [X] The official example scripts
  • [X] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [X] My own task or dataset (give details below)

Reproduction

I initialized the CLIP model with this script: https://gist.github.com/humanely/b73cb9e53c3879c3bc50890f6318cb7b

The tokenizer was built using this: https://gist.github.com/humanely/fdf2f46d37cc9de8d84e1579006a7828
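For context, here is a minimal sketch of initializing an untrained CLIP with a custom tokenizer in transformers. It is not the author's gist (linked above); the tokenizer file name and save directory are placeholder assumptions.

```python
from transformers import (
    CLIPConfig,
    CLIPModel,
    CLIPTextConfig,
    CLIPVisionConfig,
    PreTrainedTokenizerFast,
)

# Load the custom-built tokenizer (placeholder file name).
tokenizer = PreTrainedTokenizerFast(tokenizer_file="sanskrit-tokenizer.json")

# Size the text tower's embedding table to the new vocabulary;
# the vision tower keeps its default configuration.
text_config = CLIPTextConfig(vocab_size=tokenizer.vocab_size)
config = CLIPConfig.from_text_vision_configs(text_config, CLIPVisionConfig())

model = CLIPModel(config)  # randomly initialized weights, trained from scratch
model.save_pretrained("clip-sanskrit-init")      # placeholder path
tokenizer.save_pretrained("clip-sanskrit-init")
```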

Expected behavior

Expected the eval_loss for CLIP to converge normally, to under 0.1.

Other discussions:
https://github.com/huggingface/diffusers/issues/7836
https://github.com/huggingface/diffusers/discussions/7835

humanely avatar May 06 '24 18:05 humanely

Hi @humanely, thanks for raising an issue!

This is a question best placed in our forums. We try to reserve GitHub issues for feature requests and bug reports.

amyeroberts avatar May 07 '24 08:05 amyeroberts