
Training ScanNet200 Error

Open · xiaotiancai899 opened this issue 2 years ago · 1 comment

When I train on the ScanNet200 dataset with this command:

```
python main_instance_segmentation.py \
  general.experiment_name="scannet200" \
  general.project_name="scannet200" \
  data/datasets=scannet200 \
  general.num_targets=201 \
  data.num_labels=200 \
  general.eval_on_segments=true \
  general.train_on_segments=true
```

At epoch 87, the following error occurred:

```
Epoch 85: 100%|█████████| 241/241 [04:45<00:00, 1.19s/it, loss=131, v_num=t200] Checkpoint created
Epoch 86: 100%|█████████| 241/241 [04:45<00:00, 1.19s/it, loss=128, v_num=t200] Checkpoint created
Epoch 87:  52%|████▋    | 125/241 [02:30<02:20, 1.21s/it, loss=135, v_num=t200]
Traceback (most recent call last):
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/hydra/_internal/utils.py", line 347, in <lambda>
    lambda: hydra.run(
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 107, in run
    return run_job(
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/hydra/core/utils.py", line 128, in run_job
    ret.return_value = task_function(task_cfg)
  File "/home/pub-7/Documents/Mask3D/main_instance_segmentation.py", line 108, in main
    train(cfg)
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/hydra/main.py", line 27, in decorated_main
    return task_function(cfg_passthrough)
  File "/home/pub-7/Documents/Mask3D/main_instance_segmentation.py", line 84, in train
    runner.fit(model)
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
    self._call_and_handle_interrupt(
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 737, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1168, in _run
    results = self._run_stage()
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1254, in _run_stage
    return self._run_train()
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1285, in _run_train
    self.fit_loop.run()
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 270, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 174, in advance
    batch = next(data_fetcher)
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 185, in __next__
    return self.fetching_function()
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 264, in fetching_function
    self._fetch_next_batch(self.dataloader_iter)
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 278, in _fetch_next_batch
    batch = next(iterator)
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/trainer/supporters.py", line 558, in __next__
    return self.request_next_batch(self.loader_iters)
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/trainer/supporters.py", line 570, in request_next_batch
    return apply_to_collection(loader_iters, Iterator, next)
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/utilities/apply_func.py", line 100, in apply_to_collection
    return function(data, *args, **kwargs)
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1356, in _next_data
    return self._process_data(data)
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/pub-7/Documents/Mask3D/datasets/semseg.py", line 486, in __getitem__
    coordinates = elastic_distortion(
  File "/home/pub-7/Documents/Mask3D/datasets/semseg.py", line 832, in elastic_distortion
    pointcloud[:, :3] = coords + interp(coords) * magnitude
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/scipy/interpolate/_rgi.py", line 325, in __call__
    result = self._evaluate_linear(indices,
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/scipy/interpolate/_rgi.py", line 358, in _evaluate_linear
    values += np.asarray(self.values[edge_indices]) * weight[vslice]
IndexError: index 70368744177703 is out of bounds for axis 1 with size 54
```
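For context, the failing line in `elastic_distortion` samples a random noise field on a coarse grid and linearly interpolates it at every point coordinate. Below is a minimal sketch of that mechanism for illustration only — it is not the repo's exact code, and the `granularity` and `magnitude` values are made up:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def elastic_distortion_sketch(coords, granularity=0.2, magnitude=0.4):
    """Displace points by a smooth random noise field (illustrative only)."""
    mins, maxs = coords.min(0), coords.max(0)
    # Coarse grid covering the point cloud, padded by one cell on each side.
    axes = [np.arange(lo - granularity, hi + 2 * granularity, granularity)
            for lo, hi in zip(mins, maxs)]
    # One 3-vector of displacement noise per grid node.
    noise = np.random.normal(size=[len(a) for a in axes] + [3])
    interp = RegularGridInterpolator(
        tuple(axes), noise, bounds_error=False, fill_value=0.0)
    # This mirrors the failing line: coords + interp(coords) * magnitude
    return coords + interp(coords) * magnitude

pts = np.random.rand(1000, 3).astype(np.float32)
distorted = elastic_distortion_sketch(pts)
```

Given that shape of code, an out-of-bounds index as enormous as `70368744177703` suggests a query coordinate ended up outside (or garbled relative to) the noise grid in a way the interpolator's linear-evaluation path did not guard, so checking the offending scene for NaN/inf coordinates or a grid that does not cover all points would be a reasonable first debugging step.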

During handling of the above exception, another exception occurred:

```
Traceback (most recent call last):
  File "/home/pub-7/Documents/Mask3D/main_instance_segmentation.py", line 114, in <module>
    main()
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/hydra/main.py", line 32, in decorated_main
    _run_hydra(
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/hydra/_internal/utils.py", line 346, in _run_hydra
    run_and_report(
  File "/home/pub-7/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/hydra/_internal/utils.py", line 267, in run_and_report
    print_exception(etype=None, value=ex, tb=final_tb)  # type: ignore
TypeError: print_exception() got an unexpected keyword argument 'etype'
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb: / 1.048 MB of 1.048 MB uploaded (0.000 MB deduped)
wandb: Run history:
wandb:            epoch ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb:         lr-AdamW ▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▆▆▆▆▇▇▇▇██
wandb:    train_loss_ce █▇▅▆▆▅▆▅▅▅▄▅▅▅▃▄▄▃▄▃▃▃▃▄▃▃▃▃▄▃▃▃▂▂▃▃▂▂▁▁
wandb:  train_loss_ce_0 ████████████▇█▇▇▇▇▇▇▇▇▇▆▆▆▆▅▆▅▅▅▄▃▄▄▃▃▁▁
wandb:  train_loss_ce_1 ██▆▆▆▅▆▅▅▅▄▅▅▄▃▄▄▃▄▃▃▃▃▄▃▃▃▃▅▃▃▃▃▃▄▄▃▃▁▁
wandb: train_loss_ce_10 █▇▆▆▅▅▆▅▅▅▄▅▅▄▃▄▄▃▄▃▃▃▃▄▃▃▃▃▄▃▃▃▂▂▃▃▂▂▁▁
```

(The trailing `TypeError` seems to come from Hydra's own error reporter passing an `etype=` keyword that this Python version's `traceback.print_exception` no longer accepts; it just masks the original `IndexError`.)

Any ideas about that? Thanks so much in advance.
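In case it helps anyone hitting the same crash: one defensive workaround (a sketch, not a verified fix — `safe_interp` is a name I made up) is to clip the query coordinates into the interpolator's own grid range before evaluating, so no out-of-range point can reach the out-of-bounds index lookup inside `_evaluate_linear`:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def safe_interp(interp, coords):
    """Evaluate a RegularGridInterpolator after clipping coords into its grid."""
    lows = np.array([axis[0] for axis in interp.grid])
    highs = np.array([axis[-1] for axis in interp.grid])
    return interp(np.clip(coords, lows, highs))

# Toy noise field on a 5x5x5 grid.
axes = tuple(np.linspace(0.0, 1.0, 5) for _ in range(3))
noise = np.random.normal(size=(5, 5, 5))
interp = RegularGridInterpolator(axes, noise)

pts = np.array([[0.5, 0.5, 0.5],
                [1.7, -0.3, 0.2]])  # second point lies outside the grid
vals = safe_interp(interp, pts)
```

Alternatively, disabling the elastic-distortion augmentation entirely should sidestep the crash, at a possible cost in accuracy.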

xiaotiancai899 avatar Jun 10 '23 02:06 xiaotiancai899

@JonasSchult

xiaotiancai899 avatar Jun 10 '23 06:06 xiaotiancai899