[Fix] Fix issue #6
Introduction
Issue #6 is caused by a bug: the condition os.path.exists(complete_path_logs_current_training_job) is always True.
When we start a new training job, the code at https://github.com/MIT-SPARK/PD-MeshNet/blob/e3f6c01ceff260778daf5fea66125413309e4399/pd_mesh_net/utils/base_training_job.py#L202
    # Create the checkpoint subfolder if nonexistent.
    self.__checkpoint_subfolder = os.path.join(self.__log_folder,
                                               self.__training_job_name,
                                               'checkpoints')
    if (not os.path.exists(self.__checkpoint_subfolder)):
        try:
            os.makedirs(self.__checkpoint_subfolder)
        except OSError:
            raise OSError("Error while trying to create folder "
                          f"'{self.__checkpoint_subfolder}'. Exiting.")
creates the folder $PD_MESH_NET_ROOT/training_logs/new_job/checkpoints. Because os.makedirs also creates any missing parent folders, the folder $PD_MESH_NET_ROOT/training_logs/new_job already exists from this point on.
Then, at https://github.com/MIT-SPARK/PD-MeshNet/blob/e3f6c01ceff260778daf5fea66125413309e4399/pd_mesh_net/utils/base_training_job.py#L217, complete_path_logs_current_training_job is defined as $PD_MESH_NET_ROOT/training_logs/new_job, a folder that was already created by the checkpoint code above. As a result, os.path.exists(complete_path_logs_current_training_job) is always True, and the program tries to load a previous checkpoint even though we launched a new training job.
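The behavior described above can be reproduced outside the project with a minimal sketch. The folder names and variables here (log_folder, job_folder, checkpoint_subfolder) are stand-ins chosen for illustration, and a temporary directory plays the role of $PD_MESH_NET_ROOT/training_logs; this is not the code from base_training_job.py.

```python
import os
import tempfile

# A temporary directory stands in for the training-log folder (assumption).
with tempfile.TemporaryDirectory() as log_folder:
    job_folder = os.path.join(log_folder, "new_job")
    checkpoint_subfolder = os.path.join(job_folder, "checkpoints")

    # Before the checkpoint subfolder is created, the job folder is absent.
    existed_before = os.path.exists(job_folder)

    # os.makedirs creates every missing parent, including the job folder.
    os.makedirs(checkpoint_subfolder)

    # A later existence check on the job folder can no longer tell a new
    # job apart from a resumed one.
    exists_after = os.path.exists(job_folder)

print(f"job folder existed before makedirs: {existed_before}")
print(f"job folder exists after makedirs: {exists_after}")
```

The check on the job folder only carries information if it runs before anything is created beneath that folder.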
Changes
To fix it, move the code that creates the checkpoint subfolder so that it runs after the if (not os.path.exists(complete_path_logs_current_training_job)) check.
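The reordering can be sketched as follows. The helper prepare_job_folders and its return value are hypothetical and exist only to illustrate the order of operations; the actual change lives inside the class in base_training_job.py and may handle the two branches differently.

```python
import os
import tempfile


def prepare_job_folders(log_folder, training_job_name):
    """Hypothetical helper: check for an existing job, then create folders."""
    job_folder = os.path.join(log_folder, training_job_name)

    # Step 1: decide whether this is a new job BEFORE creating anything
    # under the job folder.
    is_new_job = not os.path.exists(job_folder)

    # Step 2: only now create the checkpoint subfolder (and its parents).
    checkpoint_subfolder = os.path.join(job_folder, "checkpoints")
    os.makedirs(checkpoint_subfolder, exist_ok=True)

    return is_new_job, checkpoint_subfolder


# A temporary directory stands in for the training-log folder (assumption).
with tempfile.TemporaryDirectory() as log_folder:
    first_run_is_new, _ = prepare_job_folders(log_folder, "new_job")
    second_run_is_new, _ = prepare_job_folders(log_folder, "new_job")

print(f"first launch treated as new job: {first_run_is_new}")
print(f"second launch treated as new job: {second_run_is_new}")
```

With this ordering, the first launch is recognized as a new job, and only a later launch with the same job name finds the folder and takes the resume path.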