
[KeyError: 542] when running SRe2L/validate/train_FKD.py with the given config after modifying PyTorch code

Open Luo-Zhongwei opened this issue 1 year ago • 5 comments

cd /home/zhanglf/lzw/code/SRe2L_ ; /usr/bin/env /home/zhanglf/anaconda3/envs/iid/bin/python /home/zhanglf/.vscode-server/extensions/ms-python.debugpy-2024.8.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher 35311 -- /home/zhanglf/lzw/code/SRe2L_/SRe2L/validate/train_FKD.py --batch-size 1000 --gradient-accumulation-steps 2 --model resnet18 --cos -j 4 -T 20 --mix-type cutmix --output-dir ./save/val_rn18_fkd/rn18_\[4K\]_T20/ --train-dir /home/zhanglf/lzw/code/SRe2L_/syn_data/rn18_bn0.01_\[4K\]_x_l2_x_tv.crop --val-dir /data/ImageNet/val --fkd-path /home/zhanglf/lzw/code/SRe2L_/FKD_cutmix_fp16
wandb: Currently logged in as: lsy. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.6
wandb: Run data is saved locally in /home/zhanglf/lzw/code/SRe2L_/wandb/run-20240903_080243-ltj9xxlo
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run generous-monkey-5
wandb: ⭐️ View project at https://wandb.ai/lsy/Temperature
wandb: 🚀 View run at https://wandb.ai/lsy/Temperature/runs/ltj9xxlo
======= FKD: dataset info ======
path: /home/zhanglf/lzw/code/SRe2L_/FKD_cutmix_fp16
num img: 1476
batch size: 1000
max epoch: 300

load data successfully
=> loading student model 'resnet18'
/home/zhanglf/anaconda3/envs/iid/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
  warnings.warn(
/home/zhanglf/anaconda3/envs/iid/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing weights=None.
  warnings.warn(msg)

Epoch: 0
Traceback (most recent call last):
  File "/home/zhanglf/anaconda3/envs/iid/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/zhanglf/anaconda3/envs/iid/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/zhanglf/.vscode-server/extensions/ms-python.debugpy-2024.8.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/home/zhanglf/.vscode-server/extensions/ms-python.debugpy-2024.8.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/home/zhanglf/.vscode-server/extensions/ms-python.debugpy-2024.8.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/home/zhanglf/.vscode-server/extensions/ms-python.debugpy-2024.8.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/zhanglf/.vscode-server/extensions/ms-python.debugpy-2024.8.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/zhanglf/.vscode-server/extensions/ms-python.debugpy-2024.8.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/home/zhanglf/lzw/code/SRe2L/SRe2L/validate/train_FKD.py", line 363, in <module>
    main()
  File "/home/zhanglf/lzw/code/SRe2L/SRe2L/validate/train_FKD.py", line 182, in main
    train(model, args, epoch)
  File "/home/zhanglf/lzw/code/SRe2L/SRe2L/validate/train_FKD.py", line 222, in train
    for batch_idx, batch_data in enumerate(args.train_loader):
  File "/home/zhanglf/anaconda3/envs/iid/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/zhanglf/anaconda3/envs/iid/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/home/zhanglf/anaconda3/envs/iid/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/home/zhanglf/anaconda3/envs/iid/lib/python3.9/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/zhanglf/anaconda3/envs/iid/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/zhanglf/anaconda3/envs/iid/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 57, in fetch
    mix_index, mix_lam, mix_bbox, soft_label = self.dataset.load_batch_config(possibly_batched_index[0])
  File "/home/zhanglf/lzw/code/SRe2L/SRe2L/relabel/utils_fkd.py", line 170, in load_batch_config
    batch_idx = self.img2batch_idx_list[self.epoch][img_idx]
KeyError: 542

Luo-Zhongwei • Sep 03 '24 00:09

Could you print the values of self.epoch and img_idx to provide more information for debugging?
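One way to collect that information is to wrap the failing lookup so the error message carries its own context. The following is only a sketch, not code from the repository: the helper name `lookup_batch_idx` is hypothetical, and it assumes `img2batch_idx_list` is a sequence of per-epoch dict-like mappings from image index to batch index, which is inferred from the traceback line `self.img2batch_idx_list[self.epoch][img_idx]`.

```python
def lookup_batch_idx(img2batch_idx_list, epoch, img_idx):
    """Return the batch index for img_idx, or raise a KeyError with context.

    Hypothetical helper: assumes img2batch_idx_list[epoch] is a dict-like
    mapping {image index -> batch index}.
    """
    epoch_map = img2batch_idx_list[epoch]
    if img_idx not in epoch_map:
        # Report how many images the mapping covers and its key range,
        # so a size mismatch is visible directly in the error message.
        raise KeyError(
            f"img_idx={img_idx} missing for epoch={epoch}; "
            f"mapping has {len(epoch_map)} entries "
            f"(min key={min(epoch_map, default=None)}, "
            f"max key={max(epoch_map, default=None)})"
        )
    return epoch_map[img_idx]


if __name__ == "__main__":
    # Toy mapping for one epoch: three images spread over two batches.
    toy = [{0: 0, 1: 0, 2: 1}]
    print(lookup_batch_idx(toy, epoch=0, img_idx=2))
    try:
        lookup_batch_idx(toy, epoch=0, img_idx=542)
    except KeyError as err:
        print(err.args[0])
```

Because the DataLoader re-raises worker exceptions in the main process, the enriched message would show up in the same place as the original `KeyError: 542`.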

zeyuanyin • Sep 03 '24 08:09

> Could you print the values of self.epoch and img_idx to provide more information for debugging?

Hi, thanks for replying.

Epoch: 0 self.epoch: 0 img_idx: 542

Luo-Zhongwei • Sep 03 '24 18:09

Hi, the printed values are: Epoch: 0 self.epoch: 0 img_idx: 542

Luo-Zhongwei • Sep 17 '24 09:09

I can't reproduce your error. Did you follow the instructions at https://github.com/VILA-Lab/SRe2L/tree/main/SRe2L/validate to run the experiments? Or did you modify any settings, such as batch-size? Please share more details; they will help me figure out why the error happened.
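A quick consistency check along the following lines might help narrow this down before training starts. It is a sketch under assumptions, not the repository's code: `find_uncovered_indices` is a hypothetical helper, and the per-epoch mapping is assumed to be dict-like and keyed by image index, as the failing lookup in the traceback suggests. It lists every dataset index that has no entry in the mapping, which is exactly the condition that produces a `KeyError` such as `542`.

```python
def find_uncovered_indices(num_images, epoch_map):
    """List dataset indices in range(num_images) that have no batch entry.

    Hypothetical helper: epoch_map is assumed to be a dict-like mapping
    {image index -> batch index} for a single epoch.
    """
    return [idx for idx in range(num_images) if idx not in epoch_map]


if __name__ == "__main__":
    # Toy example: the dataset has 5 images, but the stored batch config
    # only covers indices 0, 1 and 3.
    toy_epoch_map = {0: 0, 1: 0, 3: 1}
    missing = find_uncovered_indices(5, toy_epoch_map)
    print(f"{len(missing)} uncovered indices, first few: {missing[:10]}")
```

If the list is non-empty, the training images and the stored batch configuration do not describe the same set of images, which would point to a mismatch between the data or settings used to generate the soft labels and those used for training.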

zeyuanyin • Sep 23 '24 08:09

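> I can't reproduce your error. Did you follow the instructions at https://github.com/VILA-Lab/SRe2L/tree/main/SRe2L/validate to run the experiments? Or did you modify any settings, such as batch-size? Please share more details; they will help me figure out why the error happened.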

OK, I will provide more info. Thanks for the reply.

Luo-Zhongwei • Sep 29 '24 08:09