
What's the difference between decoder and evaler?

Open DapengFeng opened this issue 5 years ago • 0 comments

When I run `bazel-bin/lingvo/trainer --job=evaler_test --run_locally=gpu --model=car.waymo.StarNetVehicle --logdir=/data3/fengdapeng/waymo/log --logtostderr`, I get the results `eval_test: step: 3120, error/center_distance: 0, error/height: 0, error/length: 0, error/rotation_deg: 0, error/width: 0, loss: 8.7859153e-05, loss/classification: 8.7859153e-05, loss/regression: 0, loss/regression/corner: 0, loss/regression/dim: 0, loss/regression/loc: 0, loss/regression/rot: 0, num_samples_in_batch: 4`, but no AP metrics are reported.

When I run `bazel-bin/lingvo/trainer --job=decoder_test --run_locally=gpu --model=car.waymo.StarNetVehicle --logdir=/data3/fengdapeng/waymo/log --logtostderr`, I get the following errors instead:

`ERROR:tensorflow:Session failed to close after 30 seconds. Continuing after this point may leave your program in an undefined state.
E0323 11:33:13.130135 139881355396864 session.py:1639] Session failed to close after 30 seconds. Continuing after this point may leave your program in an undefined state.
E0323 11:33:13.131589 139881355396864 base_runner.py:239] decoder_dev done (fatal error): <class 'AssertionError'>
I0323 11:33:13.131879 139881355396864 base_runner.py:113] decoder_dev exception: 0 vs. 5
E0323 11:33:13.160843 139881355396864 base_runner.py:246] Traceback (most recent call last):
E0323 11:33:13.161247 139881355396864 base_runner.py:246] File "/data3/fengdapeng/lingvo/bazel-bin/lingvo/trainer.runfiles/main/lingvo/base_runner.py", line 195, in _RunLoop
E0323 11:33:13.161435 139881355396864 base_runner.py:246] loop_func(*loop_args)
E0323 11:33:13.161572 139881355396864 base_runner.py:246] File "/data3/fengdapeng/lingvo/bazel-bin/lingvo/trainer.runfiles/main/lingvo/trainer.py", line 1199, in _Loop
E0323 11:33:13.161701 139881355396864 base_runner.py:246] if not path or self.DecodeCheckpoint(sess, path):
E0323 11:33:13.161825 139881355396864 base_runner.py:246] File "/data3/fengdapeng/lingvo/bazel-bin/lingvo/trainer.runfiles/main/lingvo/trainer.py", line 1254, in DecodeCheckpoint
E0323 11:33:13.161944 139881355396864 base_runner.py:246] decode_out = self._model_task.PostProcessDecodeOut(dec_out, dec_metrics)
E0323 11:33:13.162065 139881355396864 base_runner.py:246] File "/data3/fengdapeng/lingvo/bazel-bin/lingvo/trainer.runfiles/main/lingvo/tasks/car/point_detector.py", line 299, in PostProcessDecodeOut
E0323 11:33:13.162252 139881355396864 base_runner.py:246] dec_metrics_dict)
E0323 11:33:13.162390 139881355396864 base_runner.py:246] File "/data3/fengdapeng/lingvo/bazel-bin/lingvo/trainer.runfiles/main/lingvo/tasks/car/waymo/waymo_decoder.py", line 248, in PostProcessDecodeOut
E0323 11:33:13.162521 139881355396864 base_runner.py:246] detection_heights_in_pixels=heights,
E0323 11:33:13.162678 139881355396864 base_runner.py:246] File "/data3/fengdapeng/lingvo/bazel-bin/lingvo/trainer.runfiles/main/lingvo/tasks/car/ap_metric.py", line 430, in Update
E0323 11:33:13.162800 139881355396864 base_runner.py:246] num_points, rotations, speed))
E0323 11:33:13.162916 139881355396864 base_runner.py:246] File "/data3/fengdapeng/lingvo/bazel-bin/lingvo/trainer.runfiles/main/lingvo/tasks/car/ap_metric.py", line 205, in _AddGroundtruth
E0323 11:33:13.163055 139881355396864 base_runner.py:246] '{} vs. {}'.format(classid, self.metadata.NumClasses()))
E0323 11:33:13.163175 139881355396864 base_runner.py:246] AssertionError: 0 vs. 5
E0323 11:33:13.163317 139881355396864 base_runner.py:246]
2020-03-23 11:33:14.484811: I lingvo/core/ops/record_yielder.cc:365] 0x7f35d64bb140Basic record yielder exit`
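
For anyone reading the traceback: the failing assertion formats its message from `classid` and `self.metadata.NumClasses()`, so `0 vs. 5` means a groundtruth class id of 0 was checked against 5 classes. The exact condition in `ap_metric.py` does not appear in the log. The sketch below is only a guess at the kind of range check that would produce this message, assuming class id 0 is reserved and therefore rejected; `check_class_id` and `try_check` are hypothetical names, not Lingvo functions.

```python
def check_class_id(classid: int, num_classes: int) -> None:
    """Hypothetical range check; the real condition in ap_metric.py is not shown in the log."""
    # Assumption: id 0 is reserved, so valid ids are 1 .. num_classes - 1.
    # The message format matches the one visible in the traceback.
    assert 0 < classid < num_classes, '{} vs. {}'.format(classid, num_classes)


def try_check(classid: int, num_classes: int) -> str:
    """Runs the check and reports the outcome as a string instead of raising."""
    try:
        check_class_id(classid, num_classes)
    except AssertionError as err:
        return 'AssertionError: {}'.format(err)
    return 'ok'


print(try_check(0, 5))  # AssertionError: 0 vs. 5
print(try_check(1, 5))  # ok
```

If the real check has this shape, the decoder would be receiving groundtruth boxes labeled with class 0, which would point at the input data or label mapping rather than at the model itself.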

DapengFeng, Mar 23 '20 03:03