[Fix]fix issue #6

Open HarmonJiang opened this issue 4 years ago • 0 comments

Introduction

Issue #6 is caused by a bug: the statement os.path.exists(complete_path_logs_current_training_job) always evaluates to True. When we start a new training job, this code https://github.com/MIT-SPARK/PD-MeshNet/blob/e3f6c01ceff260778daf5fea66125413309e4399/pd_mesh_net/utils/base_training_job.py#L202

# Create the checkpoint subfolder if nonexistent.
self.__checkpoint_subfolder = os.path.join(self.__log_folder,
                                           self.__training_job_name,
                                           'checkpoints')
if (not os.path.exists(self.__checkpoint_subfolder)):
    try:
        os.makedirs(self.__checkpoint_subfolder)
    except OSError:
        raise OSError("Error while trying to create folder "
                      f"'{self.__checkpoint_subfolder}'. Exiting.")

creates the folder $PD_MESH_NET_ROOT/training_logs/new_job/checkpoints, which means that its parent folder $PD_MESH_NET_ROOT/training_logs/new_job already exists from that point on. Later, at https://github.com/MIT-SPARK/PD-MeshNet/blob/e3f6c01ceff260778daf5fea66125413309e4399/pd_mesh_net/utils/base_training_job.py#L217, complete_path_logs_current_training_job is defined as $PD_MESH_NET_ROOT/training_logs/new_job, i.e. the folder that was created just before this definition. As a result, os.path.exists(complete_path_logs_current_training_job) is always True, and the program tries to load a previous checkpoint even though we launched a new training job.
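The effect can be reproduced outside the repository. The following is only a minimal sketch, not code from PD-MeshNet: it uses a temporary directory in place of $PD_MESH_NET_ROOT/training_logs, and the variable names are illustrative assumptions. It shows that creating the checkpoints subfolder with os.makedirs also creates the parent job folder, so a later existence check on the job folder can no longer tell a new job from a resumed one.

```python
import os
import tempfile

# Hypothetical stand-ins: a temporary directory plays the role of the
# log folder, and "new_job" is the name of a job that was never run.
with tempfile.TemporaryDirectory() as log_folder:
    job_folder = os.path.join(log_folder, "new_job")
    checkpoint_subfolder = os.path.join(job_folder, "checkpoints")

    # Before anything is created, the job folder does not exist.
    existed_before = os.path.exists(job_folder)

    # os.makedirs creates all missing intermediate folders, including
    # the job folder itself.
    os.makedirs(checkpoint_subfolder)

    # The same check now succeeds, although the job is brand new.
    exists_after = os.path.exists(job_folder)

print(existed_before, exists_after)
```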

Changes

To fix it, move the code that creates the checkpoint subfolder after the if (not os.path.exists(complete_path_logs_current_training_job)) statement.
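The reordering can be sketched as follows. This is an illustration under assumptions rather than the actual patch: prepare_job_folders is a hypothetical helper, not a function of base_training_job.py, and a temporary directory stands in for the log folder. The point is only the order of operations: check whether the job folder exists first, and create the checkpoint subfolder afterwards.

```python
import os
import tempfile


def prepare_job_folders(log_folder, training_job_name):
    """Hypothetical helper: returns (is_new_job, checkpoint_subfolder)."""
    job_folder = os.path.join(log_folder, training_job_name)

    # Decide whether this is a new job BEFORE creating any subfolder,
    # so that the check is not affected by the folder creation below.
    is_new_job = not os.path.exists(job_folder)

    # Only now create the checkpoint subfolder (and, implicitly, the
    # job folder) if it does not exist yet.
    checkpoint_subfolder = os.path.join(job_folder, "checkpoints")
    os.makedirs(checkpoint_subfolder, exist_ok=True)

    return is_new_job, checkpoint_subfolder


with tempfile.TemporaryDirectory() as log_folder:
    # First launch: the job folder is missing, so the job is new.
    first_run, ckpt = prepare_job_folders(log_folder, "new_job")
    # Second launch with the same name: the folder exists, so resume.
    second_run, _ = prepare_job_folders(log_folder, "new_job")
    ckpt_exists = os.path.isdir(ckpt)

print(first_run, second_run, ckpt_exists)
```

With this order, only a second launch under the same job name is treated as a resume.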

HarmonJiang, Jan 20 '22 16:01