Changing the dataset caused NaN loss
Hi, while following the tutorial example in SimpleDnn, I swapped in my own dataset, which after one-hot encoding has 43 features and 11 labels. However, I ran into the error 'NaN loss during training'. I checked the labels, which range from 1-11 and don't include any zeros. Does anyone have the same problem? Also, when adanet reaches the second layer, the log shows: INFO: tensorflow: Report materialization [1000/??] What could this problem be?
Try removing the ReportMaterializer from the Estimator constructor you are using. That said, the NaN could be caused by several things; the most common are dividing by zero or applying log to zero.
Can you copy here what your estimator construction code looks like?
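For illustration, here is a minimal TF 1.x sketch (not part of the tutorial) showing how a log of zero turns a loss into NaN, and the usual epsilon guard:

```python
import tensorflow as tf  # TF 1.x, as in the tutorial code below

labels = tf.constant([[0.0, 1.0, 0.0]])
probs = tf.constant([[0.0, 1.0, 0.0]])  # one class has a hard 0 probability

# 0 * log(0) evaluates to 0 * -inf, which is NaN.
naive_loss = -tf.reduce_sum(labels * tf.log(probs))
# A small epsilon keeps the log finite.
safe_loss = -tf.reduce_sum(labels * tf.log(probs + 1e-8))

with tf.Session() as sess:
    print(sess.run([naive_loss, safe_loss]))  # [nan, ~0.0]
```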
I didn't change any code from the DNN example, so I think the estimator construction is exactly the same. I already tried adding 0.01 to each feature and changed the labels from 0-10 to 1-11, so there shouldn't be any zeros in the input.
```python
estimator = adanet.Estimator(
    # Since we are predicting housing prices, we'll use a regression
    # head that optimizes for MSE.
    head=tf.contrib.estimator.regression_head(
        loss_reduction=tf.losses.Reduction.SUM_OVER_BATCH_SIZE),

    # Define the generator, which defines our search space of subnetworks
    # to train as candidates to add to the final AdaNet model.
    subnetwork_generator=SimpleDNNGenerator(
        optimizer=tf.train.RMSPropOptimizer(learning_rate=LEARNING_RATE),
        learn_mixture_weights=learn_mixture_weights,
        seed=RANDOM_SEED),

    # Lambda is the strength of complexity regularization. A larger
    # value will penalize more complex subnetworks.
    adanet_lambda=adanet_lambda,

    # The number of train steps per iteration.
    max_iteration_steps=TRAIN_STEPS // ADANET_ITERATIONS,

    # The evaluator will evaluate the model on the full training set to
    # compute the overall AdaNet loss (train loss + complexity
    # regularization) to select the best candidate to include in the
    # final AdaNet model.
    evaluator=adanet.Evaluator(
        input_fn=input_fn("train", training=False, batch_size=BATCH_SIZE)),

    # Configuration for Estimators.
    config=tf.estimator.RunConfig(
        save_summary_steps=5000,
        save_checkpoints_steps=5000,
        tf_random_seed=RANDOM_SEED,
        model_dir=model_dir))
```
@kylechang523 : Hello, I am facing a similar problem. Did you find a solution for it? Thanks!
@kylechang523 @priyanka-chaudhary
You can also get NaNs during training if your learning rate is too high. Try decreasing the learning rate to see if that makes a difference.
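For example (the value below is only illustrative), the change would go into the optimizer passed to the generator in the code posted above:

```python
# Hypothetical: drop the tutorial's learning rate by an order of magnitude and re-run.
LEARNING_RATE = 0.001

subnetwork_generator = SimpleDNNGenerator(
    optimizer=tf.train.RMSPropOptimizer(learning_rate=LEARNING_RATE),
    learn_mixture_weights=learn_mixture_weights,
    seed=RANDOM_SEED)
```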
I already tried that but still the same.
If you don't mind building from HEAD on master, you can try the new adanet.Estimator(debug=True) parameter for finding NaNs in your datasets. It will be available in our upcoming v0.6.0 release.
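Roughly, it would sit alongside the constructor arguments shown earlier (a sketch only, based on the code posted above and the flag described in this comment):

```python
estimator = adanet.Estimator(
    head=tf.contrib.estimator.regression_head(
        loss_reduction=tf.losses.Reduction.SUM_OVER_BATCH_SIZE),
    subnetwork_generator=SimpleDNNGenerator(
        optimizer=tf.train.RMSPropOptimizer(learning_rate=LEARNING_RATE),
        learn_mixture_weights=learn_mixture_weights,
        seed=RANDOM_SEED),
    max_iteration_steps=TRAIN_STEPS // ADANET_ITERATIONS,
    # Per the comment above: checks the input data for NaNs during training.
    debug=True)
```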
In my case, the NaN values were the result of NaNs in the training dataset. While working on a multiclass classifier, the problem turned out to be a DataFrame positional filter on the one-hot-encoded labels.
Fixing the target dataset resolved my issue; hope this helps someone else. Best of luck.
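A quick way to catch this kind of issue before training is to scan both frames for NaN rows (the DataFrame names and shapes below are hypothetical stand-ins, not from the thread):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the one-hot-encoded features (43 columns) and labels (11 classes).
features_df = pd.DataFrame(np.random.rand(100, 43))
labels_df = pd.DataFrame(np.eye(11)[np.random.randint(0, 11, size=100)])

# A misaligned positional filter can leave NaN rows after a join/reindex;
# check both frames before feeding them to the input_fn.
for name, df in [("features", features_df), ("labels", labels_df)]:
    bad_rows = df[df.isna().any(axis=1)].index.tolist()
    if bad_rows:
        print(name, "has NaNs in rows:", bad_rows)
    else:
        print(name, "has no NaNs")
```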