Enable auto checkpointing on SIGTERM
Motivation
As a follow-up to #252, we want CoreNeuron to be able to create checkpoints right before an allocation expires. Since most job schedulers send a SIGTERM before SIGKILL, we implement a handler for that signal. It may, however, be necessary to tune how long before the kill the scheduler sends this signal, since long simulations may take some time to write everything out.
Implementation
Checkpoints are created in a folder _corenrn_ckpt inside the output root, and only if a minimum amount of time has elapsed.
On startup, this directory is checked for existence if no --restore option is provided.
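For reviewers, here is a minimal sketch of the logic described above. It is not the code in this PR. All function and variable names are hypothetical, the meaning of "elapsed" (time since the run started) and the threshold value are assumptions, and C++17 is assumed for std::filesystem. The handler only sets a flag, because very little is safe to do inside a signal handler; the simulation loop would poll the flag and write the checkpoint itself.

```cpp
#include <chrono>
#include <csignal>
#include <filesystem>
#include <system_error>

// Flag set by the handler and polled by the simulation loop (hypothetical name).
// volatile std::sig_atomic_t is the type that is safe to write from a handler.
volatile std::sig_atomic_t g_sigterm_received = 0;

// The handler does nothing except record that SIGTERM arrived.
extern "C" void on_sigterm(int) {
    g_sigterm_received = 1;
}

// Register the handler once at startup.
void install_sigterm_handler() {
    std::signal(SIGTERM, on_sigterm);
}

// Checkpoint only if SIGTERM was seen and the run has lasted long enough.
// Assumption: "elapsed" is measured from the start of the run.
bool should_checkpoint(std::chrono::seconds elapsed,
                       std::chrono::seconds min_elapsed) {
    return g_sigterm_received != 0 && elapsed >= min_elapsed;
}

// Location of the automatic checkpoint inside the output root.
std::filesystem::path auto_checkpoint_dir(const std::filesystem::path& output_root) {
    return output_root / "_corenrn_ckpt";
}

// Startup check: look for the folder only when --restore was not given.
bool auto_checkpoint_present(const std::filesystem::path& output_root,
                             bool restore_flag_given) {
    if (restore_flag_given) {
        return false;  // an explicit --restore takes precedence
    }
    std::error_code ec;  // non-throwing overload: a missing path yields false
    return std::filesystem::is_directory(auto_checkpoint_dir(output_root), ec);
}
```

In a loop like this, the caller would test should_checkpoint(...) between time steps and, when it returns true, write into auto_checkpoint_dir(output_root) before exiting. What the startup check does when the folder exists is left to the caller here.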
I don't really understand why the CI could fail in GPU. @pramodk Ideas?
Sorry for the delay. This issue is being investigated; you can ignore this error.
Can you rebase this @ferdonline?