PySGG

BGNN training problem during validation (images_per_batch can only be one during the validation process)

Open · hszhoushen opened this issue 3 years ago · 0 comments

When training on a Tesla V100, each card can be fed 12 images at a time on the VG dataset; however, it seems each card can only validate one image at a time during the validation process. Is there any way to validate 12 images at a time during validation as well?

Training script (.sh):

```sh
python tools/relation_train_net.py \
    --config-file "configs/e2e_relBGNN_vg.yaml" \
    DEBUG False \
    EXPERIMENT_NAME "BGNN-PreCls" \
    SOLVER.IMS_PER_BATCH $[3*4] \
    TEST.IMS_PER_BATCH $[4] \
    SOLVER.VAL_PERIOD 3000 \
    SOLVER.CHECKPOINT_PERIOD 3000 \
    MODEL.ROI_RELATION_HEAD.USE_GT_BOX True \
    MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL True
```
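For reference, here is a minimal sketch of the batch-splitting rule used by maskrcnn-benchmark-derived data loaders (PySGG's loader appears to follow the same convention, but check `pysgg/data/build.py` to confirm): both `SOLVER.IMS_PER_BATCH` and `TEST.IMS_PER_BATCH` are totals across all GPUs, so with the `$[3*4]` / `$[4]` values above (which suggest a 4-GPU run) each card would see 3 images per iteration during training but only 1 during validation. The function name below is illustrative, not taken from the repo.

```python
# Sketch (assumption): IMS_PER_BATCH is the total batch size across all GPUs,
# and each card gets IMS_PER_BATCH // num_gpus images per iteration.
def images_per_gpu(ims_per_batch: int, num_gpus: int) -> int:
    assert ims_per_batch % num_gpus == 0, "batch size must divide evenly across GPUs"
    return ims_per_batch // num_gpus

# With the flags above on an assumed 4-GPU setup:
print(images_per_gpu(12, 4))  # SOLVER.IMS_PER_BATCH $[3*4] -> 3 training images per card
print(images_per_gpu(4, 4))   # TEST.IMS_PER_BATCH  $[4]    -> 1 validation image per card
```

If that convention holds, raising `TEST.IMS_PER_BATCH` (e.g. to `$[3*4]` as well) would be the usual way to validate more images per card at once, assuming memory allows.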

Problem encountered:

```
instance name: sgdet-BGNNPredictor/(2022-07-01_13)BGNN-PreCls(resampling)  elapsed time: 0:06:51  eta: 3 days, 7:48:18  iter: 100/70000  loss: 0.6129 (0.7214)  loss_rel: 0.1183 (0.1323)  pre_rel_classify_loss_iter-0: 0.1641 (0.2069)  pre_rel_classify_loss_iter-1: 0.1628 (0.1891)  pre_rel_classify_loss_iter-2: 0.1618 (0.1932)  time: 3.9448 (4.1101)  data: 0.0559 (0.0689)  lr: 0.026707  max mem: 19994

[07/01 13:31:28 pysgg]: relness module pretraining..
[07/01 13:31:28 pysgg]: Start validating
[07/01 13:31:28 pysgg]: Start evaluation on VG_stanford_filtered_with_attribute_val dataset(5000 images).
  0%|          | 0/417 [00:06<?, ?it/s]
Traceback (most recent call last):
  File "tools/relation_train_net.py", line 714, in <module>
    main()
  File "tools/relation_train_net.py", line 705, in main
    model = train(cfg, args.local_rank, args.distributed, logger)
  File "tools/relation_train_net.py", line 496, in train
    val_result = run_val(cfg, model, val_data_loaders, distributed, logger)
  File "tools/relation_train_net.py", line 565, in run_val
    logger=logger,
  File "/lintianlin_group_v100/lgzhou/scene_graph_generation/bgnn/pysgg/engine/inference.py", line 123, in inference
    timer=inference_timer, logger=logger)
  File "/lintianlin_group_v100/lgzhou/scene_graph_generation/bgnn/pysgg/engine/inference.py", line 41, in compute_on_dataset
    output = model(images.to(device), targets, logger=logger)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/amp/_initialize.py", line 197, in new_fwd
    **applier(kwargs, input_caster))
  File "/lintianlin_group_v100/lgzhou/scene_graph_generation/bgnn/pysgg/modeling/detector/generalized_rcnn.py", line 52, in forward
    x, result, detector_losses = self.roi_heads(features, proposals, targets, logger)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/lintianlin_group_v100/lgzhou/scene_graph_generation/bgnn/pysgg/modeling/roi_heads/roi_heads.py", line 69, in forward
    x, detections, loss_relation = self.relation(features, detections, targets, logger)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/lintianlin_group_v100/lgzhou/scene_graph_generation/bgnn/pysgg/modeling/roi_heads/relation_head/relation_head.py", line 215, in forward
    logger,
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/lintianlin_group_v100/lgzhou/scene_graph_generation/bgnn/pysgg/modeling/roi_heads/relation_head/roi_relation_predictors.py", line 604, in forward
    roi_features, union_features, inst_proposals, rel_pair_idxs, rel_binarys, logger
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/lintianlin_group_v100/lgzhou/scene_graph_generation/bgnn/pysgg/modeling/roi_heads/relation_head/model_bgnn.py", line 796, in forward
    rel_pair_inds,
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/lintianlin_group_v100/lgzhou/scene_graph_generation/bgnn/pysgg/modeling/roi_heads/relation_head/model_msg_passing.py", line 261, in forward
    obj_embed_by_pred_dist = self.obj_embed_on_prob_dist(obj_labels.long())
AttributeError: 'NoneType' object has no attribute 'long'
```
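The crash itself happens because `obj_labels` is `None` when the embedding lookup in `model_msg_passing.py` runs, which points to the proposals carrying no ground-truth labels at validation time (the instance name in the log says `sgdet-BGNNPredictor`). Below is a minimal, self-contained sketch of that failure pattern and the kind of guard Scene-Graph-Benchmark-style predictors usually use; only the layer name `obj_embed_on_prob_dist` is taken from the traceback, everything else (shapes, the soft-distribution fallback) is an assumption for illustration, not PySGG's actual code.

```python
import torch
import torch.nn as nn

# Hypothetical illustration of the failing pattern: the object-label embedding is
# looked up with obj_labels.long(), which raises AttributeError when no ground-truth
# labels are attached to the proposals (obj_labels is None during sgdet validation).
num_classes, embed_dim = 151, 200
obj_embed_on_prob_dist = nn.Embedding(num_classes, embed_dim)

obj_labels = None                         # what validation sees when GT labels are absent
obj_logits = torch.randn(8, num_classes)  # dummy per-box classification scores

if obj_labels is not None:
    # training / GT-label modes: direct embedding lookup (the line that crashes)
    obj_embed_by_pred_dist = obj_embed_on_prob_dist(obj_labels.long())
else:
    # common fallback in SGG codebases: soft lookup via the predicted class distribution
    obj_embed_by_pred_dist = obj_logits.softmax(dim=1) @ obj_embed_on_prob_dist.weight

print(obj_embed_by_pred_dist.shape)  # torch.Size([8, 200])
```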

hszhoushen · Jul 01 '22 14:07