DropoutNet - Use official config & sample data but AUC and loss worsen with more training steps

Open martin0258 opened this issue 1 year ago • 2 comments

Description

I attempted to train the official DropoutNet model using the provided sample Taobao dataset and the sample configuration file. However, during training, I observed that the AUC decreased and the losses increased as the training steps progressed. Based on my understanding, the expected behavior is that the AUC should increase and the losses should decrease as training continues.

Steps to reproduce

  • OS: Ubuntu 20.04
  • GPU: 1 NVIDIA RTX 3090
  • Python: 3.10.16
  • TensorFlow: 2.14.0 with CUDA

  1. Clone the EasyRec repo (commit: https://github.com/alibaba/EasyRec/commit/4b0b1f5a2a990b253737ec532611fa4f9387d372)
  2. Install EasyRec.
  3. Download the sample Taobao dataset:
     wget http://easyrec.oss-cn-beijing.aliyuncs.com/data/git_oss_sample_data/data_test_tb_data_b1579db090d72b3b70b59ba3c7692701 -O tb_data.tar.gz
     tar -zxf tb_data.tar.gz
  4. Run the training with the sample DropoutNet config and the sample dataset:
     python -m easy_rec.python.train_eval --pipeline_config_path samples/model_config/dropoutnet_on_taobao.config

Actual training result

TensorBoard:

tensorboard --logdir experiments/dropoutnet_taobao_ckpt/eval_val

[image]

Initial AUC and loss: [image]

Final AUC and loss: [image]
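
To share numbers rather than screenshots, the eval scalars could be read back from the event files. This is a minimal sketch, assuming the `tensorboard` package is installed and that the metrics were written as classic scalar summaries. The tag names are not known in advance, so the helper returns every scalar tag it finds. `read_eval_scalars` and `first_and_last` are hypothetical helper names, not part of EasyRec.

```python
from typing import Dict, List, Tuple


def read_eval_scalars(logdir: str) -> Dict[str, List[Tuple[int, float]]]:
    """Return {tag: [(step, value), ...]} for every scalar tag under logdir."""
    # Imported lazily so the pure helper below works without tensorboard.
    from tensorboard.backend.event_processing.event_accumulator import (
        EventAccumulator,
    )

    acc = EventAccumulator(logdir)
    acc.Reload()  # parse the event files found on disk
    return {
        tag: [(event.step, event.value) for event in acc.Scalars(tag)]
        for tag in acc.Tags()["scalars"]
    }


def first_and_last(points: List[Tuple[int, float]]):
    """Return the (step, value) pairs with the smallest and largest step."""
    ordered = sorted(points)
    return ordered[0], ordered[-1]


# Example usage (assumes the eval directory from the tensorboard command):
#   scalars = read_eval_scalars("experiments/dropoutnet_taobao_ckpt/eval_val")
#   for tag, points in sorted(scalars.items()):
#       if points:
#           (s0, v0), (s1, v1) = first_and_last(points)
#           print(f"{tag}: step {s0} -> {v0:.4f}, step {s1} -> {v1:.4f}")
```

Printing the first and last value per tag gives a compact text summary that can be pasted into the issue alongside the screenshots.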

Expected behavior

  • AUC should increase with more training steps.
  • Losses should decrease with more training steps.
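
The expectation above can also be stated as a simple check on the eval curves. This is a sketch only, assuming each metric is available as a list of `(step, value)` pairs. It fits a least-squares line and looks at the sign of the slope, which is a crude trend test and not something EasyRec provides. `slope` and `matches_expectation` are hypothetical names.

```python
from typing import Sequence, Tuple

Points = Sequence[Tuple[int, float]]


def slope(points: Points) -> float:
    """Least-squares slope of value against step."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two (step, value) points")
    mean_x = sum(step for step, _ in points) / n
    mean_y = sum(value for _, value in points) / n
    var_x = sum((step - mean_x) ** 2 for step, _ in points)
    if var_x == 0:
        raise ValueError("all points share the same step")
    cov = sum((step - mean_x) * (value - mean_y) for step, value in points)
    return cov / var_x


def matches_expectation(auc_points: Points, loss_points: Points) -> bool:
    """True when AUC trends upward and loss trends downward."""
    return slope(auc_points) > 0 and slope(loss_points) < 0
```

A run like the one reported here, where AUC falls and loss rises over the eval steps, would make `matches_expectation` return `False`.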

Could you please confirm if this is expected behavior or if there might be an issue with the sample configuration or dataset? If additional debugging information is needed, I am happy to provide more details.

Thank you!

martin0258 • Jan 07 '25 21:01

For your reference: I could not reproduce the same AUC/loss curve with more steps (2500 -> 25000).

[image]

martin0258 • Jan 07 '25 21:01