Question regarding loading pretrained weights
How can I load pretrained weights provided in your repository during training time?
Use [COMMAND] --resume-from {A MODEL PATH} or [COMMAND] --cfg-options load_from={A MODEL PATH}. The former loads both the model and the optimizer state; the latter loads only the model weights.
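To make the difference concrete, here is a toy sketch using plain dictionaries. It is not the mmcv implementation: the function names and the `runner` dictionary are hypothetical, and the checkpoint layout (`state_dict`, `optimizer`, `meta`) and the restored iteration counter are assumptions based on how mmcv checkpoints are usually structured.

```python
def load_from(checkpoint, runner):
    """Mimic load_from: copy the model weights only (simplified illustration)."""
    runner["model"] = dict(checkpoint["state_dict"])
    return runner


def resume_from(checkpoint, runner):
    """Mimic --resume-from: restore weights, optimizer state and progress."""
    runner["model"] = dict(checkpoint["state_dict"])
    runner["optimizer"] = dict(checkpoint["optimizer"])
    # Assumption: the iteration counter is stored under checkpoint["meta"].
    runner["iter"] = checkpoint["meta"]["iter"]
    return runner


# Hypothetical checkpoint contents, for illustration only.
checkpoint = {
    "state_dict": {"backbone.weight": 0.5},
    "optimizer": {"momentum": 0.9},
    "meta": {"iter": 180000},
}

finetune = load_from(checkpoint, {"model": {}, "optimizer": {}, "iter": 0})
resumed = resume_from(checkpoint, {"model": {}, "optimizer": {}, "iter": 0})
```

In this sketch `finetune` starts from iteration 0 with a fresh optimizer, while `resumed` continues from the stored iteration. So load_from is the usual choice for fine-tuning from released weights, and --resume-from is for continuing an interrupted run.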
Should I add this in the script file dist_train_partially.sh?
No, just append it to your command. For example: bash tools/dist_train_partially.sh semi 0 10 8 --resume-from {A MODEL PATH}.
I followed your instructions, but I am getting this error:
Traceback (most recent call last):
File "/hdd/purbayan/SoftTeacher/tools/train.py", line 198, in <module>
main()
File "/hdd/purbayan/SoftTeacher/tools/train.py", line 186, in main
train_detector(
File "/hdd/purbayan/SoftTeacher/tools/ssod/apis/train.py", line 205, in train_detector
runner.load_checkpoint(cfg.load_from)
File "/hdd/purbayan/envs/env_st/lib/python3.9/site-packages/mmcv/runner/base_runner.py", line 337, in load_checkpoint
return load_checkpoint(
File "/hdd/purbayan/envs/env_st/lib/python3.9/site-packages/mmcv/runner/checkpoint.py", line 528, in load_checkpoint
checkpoint = _load_checkpoint(filename, map_location, logger)
File "/hdd/purbayan/envs/env_st/lib/python3.9/site-packages/mmcv/runner/checkpoint.py", line 467, in _load_checkpoint
return CheckpointLoader.load_checkpoint(filename, map_location, logger)
File "/hdd/purbayan/envs/env_st/lib/python3.9/site-packages/mmcv/runner/checkpoint.py", line 245, in load_checkpoint
return checkpoint_loader(filename, map_location)
File "/hdd/purbayan/envs/env_st/lib/python3.9/site-packages/mmcv/runner/checkpoint.py", line 262, in load_from_local
checkpoint = torch.load(filename, map_location=map_location)
File "/hdd/purbayan/envs/env_st/lib/python3.9/site-packages/torch/serialization.py", line 608, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/hdd/purbayan/envs/env_st/lib/python3.9/site-packages/torch/serialization.py", line 777, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input
index created!
2022-01-20 19:19:52,119 - mmdet.ssod - INFO - load checkpoint from local path: /hdd/purbayan/SoftTeacher/pretrained_weights/1/split-1/iter_180000.pth
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3938940) of binary: /hdd/purbayan/envs/env_st/bin/python
Traceback (most recent call last):
File "/hdd/purbayan/envs/env_st/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/hdd/purbayan/envs/env_st/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/hdd/purbayan/envs/env_st/lib/python3.9/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/hdd/purbayan/envs/env_st/lib/python3.9/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/hdd/purbayan/envs/env_st/lib/python3.9/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/hdd/purbayan/envs/env_st/lib/python3.9/site-packages/torch/distributed/run.py", line 710, in run
elastic_launch(
File "/hdd/purbayan/envs/env_st/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/hdd/purbayan/envs/env_st/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
tools/train.py FAILED
------------------------------------------------------------
Failures:
[1]:
time : 2022-01-20_19:19:59
host : insrisrvsr-0275
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 3938941)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2022-01-20_19:19:59
host : insrisrvsr-0275
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 3938940)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
It seems the model file is corrupted: "EOFError: Ran out of input" from torch.load usually means the file is empty or truncated. Which model did you try? I will try it on my machine.
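For anyone hitting the same error, a quick check along these lines may help tell an empty or failed download apart from a usable checkpoint before starting training. This is a minimal sketch using only the standard library; `diagnose_checkpoint` and the labels it returns are hypothetical names, not part of SoftTeacher, mmcv, or PyTorch.

```python
import os
import sys
import zipfile


def diagnose_checkpoint(path):
    """Return a rough label describing the file at `path` (hypothetical helper)."""
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        # An empty file makes pickle raise "EOFError: Ran out of input".
        return "empty"
    if zipfile.is_zipfile(path):
        # torch.save has written zip archives by default since PyTorch 1.6.
        return "zip"
    with open(path, "rb") as f:
        head = f.read(64).lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        # Assumption: a failed download saved a web page instead of the weights.
        return "html"
    # Could be a legacy (pre-1.6) pickle checkpoint or a truncated file.
    return "legacy-or-unknown"


if __name__ == "__main__" and len(sys.argv) > 1:
    print(diagnose_checkpoint(sys.argv[1]))
```

A result of "empty" or "html" suggests the file should be downloaded again. "zip" only means the container looks intact, not that the weights match the config.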
I tried this one: https://drive.google.com/drive/folders/1QA8sAw49DJiMHF-Cr7q0j7KgKjlJyklV. These are the checkpoints for 1% labelled data provided in your repository.
It seems some files were corrupted because my Google One subscription expired. I have re-uploaded the models. Could you try again? https://drive.google.com/file/d/1dUWoWDmYqNBx6lko59xrs2ZMGGuzn_5y/view?usp=sharing
I tried this file: https://drive.google.com/file/d/1dUWoWDmYqNBx6lko59xrs2ZMGGuzn_5y/view?usp=sharing. It works perfectly now. Thank you very much for the prompt responses. Are the weight files in the repository updated now?
I have checked, and the generated link has not changed.