
CLIP Training Example Bug - Overfitting

humanely opened this issue 1 year ago · 1 comment

System Info

  • transformers version: 4.40.1
  • Platform: Linux-5.15.0-1053-gcp-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • Huggingface_hub version: 0.23.0
  • Safetensors version: 0.4.1
  • Accelerate version: 0.26.1
  • Accelerate config: - compute_environment: LOCAL_MACHINE
    • distributed_type: MULTI_GPU
    • mixed_precision: fp16
    • use_cpu: False
    • debug: False
    • num_processes: 4
    • machine_rank: 0
    • num_machines: 1
    • gpu_ids: all
    • rdzv_backend: static
    • same_network: True
    • main_training_function: main
    • downcast_bf16: no
    • tpu_use_cluster: False
    • tpu_use_sudo: False
    • tpu_env: []
  • PyTorch version (GPU?): 2.1.0+cu118 (True)
  • Tensorflow version (GPU?): 2.13.1 (True)
  • Flax version (CPU?/GPU?/TPU?): 0.7.2 (cpu)
  • Jax version: 0.4.13
  • JaxLib version: 0.4.13
  • Using GPU in script?: YES
  • Using distributed or parallel set-up in script?: NO

Who can help?

@patil-suraj I am training CLIP for Sanskrit, but it keeps diverging. I have tried raising the weight decay from 0.1 to 0.4, and I also tried adding early stopping: early_stop = EarlyStoppingCallback(3). With standard params (lr 5e-5, weight_decay 0.1, no max_grad_norm):

[training loss curve image]

2nd run (weight decay 0.2, max_grad_norm 0.9):

[training loss curve image]

I have tried numerous other configurations, and every run diverges.
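For reference, a minimal sketch of how these runs could be configured with the Trainer API. This is an assumed reconstruction, not the exact training script; model, train_ds, and eval_ds are hypothetical handles, and EarlyStoppingCallback additionally requires load_best_model_at_end and a metric to monitor:

```python
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="clip-sanskrit",        # placeholder path
    learning_rate=5e-5,                # "standard params" from the first run
    weight_decay=0.1,                  # raised to 0.2 / 0.4 in later runs
    max_grad_norm=0.9,                 # added in the second run
    fp16=True,                         # matches the accelerate config above
    evaluation_strategy="steps",
    load_best_model_at_end=True,       # required for early stopping
    metric_for_best_model="eval_loss",
    greater_is_better=False,           # lower eval_loss is better
)

trainer = Trainer(
    model=model,                       # hypothetical: the freshly initialized CLIP
    args=args,
    train_dataset=train_ds,            # hypothetical dataset handles
    eval_dataset=eval_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```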

Information

  • [X] The official example scripts
  • [X] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [X] My own task or dataset (give details below)

Reproduction

I initialized the CLIP model with this script: https://gist.github.com/humanely/b73cb9e53c3879c3bc50890f6318cb7b

The tokenizer was built using this: https://gist.github.com/humanely/fdf2f46d37cc9de8d84e1579006a7828
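For context, here is a minimal sketch of initializing an untrained CLIP with a custom tokenizer in transformers. It is not the author's gist (linked above); the tokenizer file name and save directory are placeholder assumptions.

```python
from transformers import (
    CLIPConfig,
    CLIPModel,
    CLIPTextConfig,
    CLIPVisionConfig,
    PreTrainedTokenizerFast,
)

# Load the custom-built tokenizer (placeholder file name).
tokenizer = PreTrainedTokenizerFast(tokenizer_file="sanskrit-tokenizer.json")

# Size the text tower's embedding table to the new vocabulary;
# the vision tower keeps its default configuration.
text_config = CLIPTextConfig(vocab_size=tokenizer.vocab_size)
config = CLIPConfig.from_text_vision_configs(text_config, CLIPVisionConfig())

model = CLIPModel(config)  # randomly initialized weights, trained from scratch
model.save_pretrained("clip-sanskrit-init")      # placeholder path
tokenizer.save_pretrained("clip-sanskrit-init")
```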

Expected behavior

Expected the eval_loss for CLIP to converge normally, to under 0.1.

Other discussions:
https://github.com/huggingface/diffusers/issues/7836
https://github.com/huggingface/diffusers/discussions/7835

humanely avatar May 06 '24 18:05 humanely

Hi @humanely, thanks for raising an issue!

This is a question best placed in our forums. We try to reserve GitHub issues for feature requests and bug reports.

amyeroberts avatar May 07 '24 08:05 amyeroberts