Error when loading pretrained weights

Open zhhiyuan opened this issue 1 year ago • 0 comments

Thank you for your excellent work. When evaluating your pretrained model on the HM3D dataset with the script objnav-eval-v2-hm3d.sh, I get an error about a missing configuration file:

Traceback (most recent call last):
  File "/home/mdisk1/heqisheng/embody/navigation/zson/zson/run.py", line 90, in <module>
    main()
  File "/home/mdisk1/heqisheng/embody/navigation/zson/zson/run.py", line 38, in main
    run_exp(**vars(args))
  File "/home/mdisk1/heqisheng/embody/navigation/zson/zson/run.py", line 85, in run_exp
    config = get_config(exp_config, opts)
  File "/home/mdisk1/heqisheng/embody/navigation/zson/zson/zson/config.py", line 259, in get_config
    config.TASK_CONFIG = get_task_config(config.BASE_TASK_CONFIG_PATH)
  File "/home/mdisk1/heqisheng/embody/navigation/zson/zson/zson/config.py", line 155, in get_task_config
    config.merge_from_file(config_path)
  File "/home/zhangzhiyuan/miniconda3/envs/zson/lib/python3.7/site-packages/yacs/config.py", line 211, in merge_from_file
    with open(cfg_filename, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'configs/tasks/pointnav.yaml'
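The path in this error is relative, so Python resolves it against the working directory of the process rather than against the location of run.py. A minimal sketch for checking where such a path actually points is given here; the helper name `describe_relative_path` is hypothetical and not part of zson or habitat-lab.

```python
import os
from pathlib import Path


def describe_relative_path(rel_path, cwd=None):
    """Return (absolute path, whether it is an existing file).

    Hypothetical helper: rel_path is resolved against cwd, or against the
    current working directory when cwd is None, which is what open() does.
    """
    base = Path(cwd) if cwd is not None else Path.cwd()
    resolved = (base / rel_path).resolve()
    return resolved, resolved.is_file()


# Path copied from the FileNotFoundError message.
resolved, exists = describe_relative_path("configs/tasks/pointnav.yaml")
print(f"cwd: {os.getcwd()}")
print(f"resolves to: {resolved}")
print(f"exists: {exists}")
```

Running this from the same directory as the evaluation script shows whether the file is missing altogether or only missing relative to that directory.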

After I changed the configuration file path to configs/tasks/objectnav_v1.yaml, loading failed with a different error: the shapes of the weights in the checkpoint do not match the shapes in the model.

Traceback (most recent call last):
  File "/home/mdisk1/heqisheng/embody/navigation/zson/zson/run.py", line 90, in <module>
    main()
  File "/home/mdisk1/heqisheng/embody/navigation/zson/zson/run.py", line 38, in main
    run_exp(**vars(args))
  File "/home/mdisk1/heqisheng/embody/navigation/zson/zson/run.py", line 86, in run_exp
    execute_exp(config, run_type)
  File "/home/mdisk1/heqisheng/embody/navigation/zson/zson/run.py", line 71, in execute_exp
    trainer.eval()
  File "/home/mdisk1/heqisheng/embody/navigation/zson/habitat-lab-challenge-2022/habitat_baselines/common/base_trainer.py", line 112, in eval
    checkpoint_index=ckpt_idx,
  File "/home/mdisk1/heqisheng/embody/navigation/zson/zson/zson/trainer.py", line 179, in _eval_checkpoint
    msg = self.agent.load_state_dict(ckpt_dict["state_dict"], strict=False)
  File "/home/zhangzhiyuan/miniconda3/envs/zson/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1672, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ZSON_PPO:
	size mismatch for actor_critic.net.state_encoder.rnn.weight_ih_l0: copying a param with shape torch.Size([2048, 1568]) from checkpoint, the shape in current model is torch.Size([1536, 1568]).
	size mismatch for actor_critic.net.state_encoder.rnn.weight_hh_l0: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 512]).
	size mismatch for actor_critic.net.state_encoder.rnn.bias_ih_l0: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]).
	size mismatch for actor_critic.net.state_encoder.rnn.bias_hh_l0: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]).
	size mismatch for actor_critic.net.state_encoder.rnn.weight_ih_l1: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 512]).
	size mismatch for actor_critic.net.state_encoder.rnn.weight_hh_l1: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 512]).
	size mismatch for actor_critic.net.state_encoder.rnn.bias_ih_l1: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]).
	size mismatch for actor_critic.net.state_encoder.rnn.bias_hh_l1: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]).
Exception ignored in: <function VectorEnv.__del__ at 0x7fa9f2275050>
Traceback (most recent call last):
  File "/home/mdisk1/heqisheng/embody/navigation/zson/habitat-lab-challenge-2022/habitat/core/vector_env.py", line 592, in __del__
    self.close()
  File "/home/mdisk1/heqisheng/embody/navigation/zson/habitat-lab-challenge-2022/habitat/core/vector_env.py", line 463, in close
    write_fn((CLOSE_COMMAND, None))
  File "/home/mdisk1/heqisheng/embody/navigation/zson/habitat-lab-challenge-2022/habitat/core/vector_env.py", line 118, in __call__
    self.write_fn(data)
  File "/home/mdisk1/heqisheng/embody/navigation/zson/habitat-lab-challenge-2022/habitat/utils/pickle5_multiprocessing.py", line 62, in send
    self.send_bytes(buf.getvalue())
  File "/home/zhangzhiyuan/miniconda3/envs/zson/lib/python3.7/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/home/zhangzhiyuan/miniconda3/envs/zson/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/home/zhangzhiyuan/miniconda3/envs/zson/lib/python3.7/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
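The mismatched shapes follow a recognizable pattern. In PyTorch, the recurrent weights stack one block of hidden_size rows per gate: 4 for nn.LSTM, 3 for nn.GRU, 1 for nn.RNN. With a hidden size of 512, 2048 = 4 × 512 and 1536 = 3 × 512, so the checkpoint looks like it was saved from an LSTM state encoder while the current configuration builds a GRU. The sketch below only does this arithmetic on the shapes from the error message; the helper name `guess_rnn_type` is hypothetical, and it assumes no LSTM projection (proj_size) is in use. Which configuration option selects the RNN type in this codebase is not shown in the traceback.

```python
# Number of gate blocks stacked along dim 0 of weight_hh_l0 in torch.nn:
# nn.RNN -> 1, nn.GRU -> 3, nn.LSTM -> 4.
GATES_TO_RNN = {1: "RNN", 3: "GRU", 4: "LSTM"}


def guess_rnn_type(weight_hh_shape):
    """Guess the RNN cell type from the shape of weight_hh_l0.

    Hypothetical helper. Assumes the shape is
    (num_gates * hidden_size, hidden_size), i.e. no LSTM projection.
    """
    rows, hidden_size = weight_hh_shape
    if hidden_size <= 0 or rows % hidden_size != 0:
        return "unknown"
    return GATES_TO_RNN.get(rows // hidden_size, "unknown")


# Shapes of weight_hh_l0 copied from the error message.
checkpoint_type = guess_rnn_type((2048, 512))
model_type = guess_rnn_type((1536, 512))
print("checkpoint:", checkpoint_type, "| current model:", model_type)
```

If this reading is right, the likely cause is that the evaluation run did not pick up the same RNN setting that was used in training, which may be related to the configuration file that was swapped in.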

Could it be that I have made a mistake in some settings? Could you give me some advice? Thank you.

zhhiyuan, Jun 06 '24 08:06