Generated data splits should be tracked along with model outputs, and not stored with data.

Open yalaudah opened this issue 5 years ago • 0 comments

The data splits generated by the prepare_dutchf3.py script should be stored and tracked along with the outputs of each model run (logs, snapshots, configs, etc.), not separately. This means the split generation should run as part of the data prep in the training scripts, for every run, rather than once up front. We should also make sure that all the required parameters (e.g. stride or section_stride) are stored in the config files.
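
A rough sketch of what this could look like, not the actual implementation: the function names, the `splits/` subdirectory, the file names, and the `n_sections` / `val_fraction` parameters are all hypothetical, and `make_section_splits` is only a stand-in for the real logic in prepare_dutchf3.py. The point is that the splits and the parameters that produced them are written into the same directory as the run's other outputs.

```python
import json
from pathlib import Path


def make_section_splits(n_sections, section_stride, val_fraction):
    """Hypothetical stand-in for the split logic in prepare_dutchf3.py."""
    # Take every section_stride-th section index.
    indices = list(range(0, n_sections, section_stride))
    n_val = int(len(indices) * val_fraction)
    # Hold out the last n_val selected sections for validation.
    split_at = len(indices) - n_val
    return {"train": indices[:split_at], "val": indices[split_at:]}


def prepare_run_splits(run_dir, split_config):
    """Write the splits, and the parameters that produced them, into run_dir."""
    split_dir = Path(run_dir) / "splits"
    split_dir.mkdir(parents=True, exist_ok=True)

    splits = make_section_splits(**split_config)
    for name, indices in splits.items():
        lines = "\n".join(str(i) for i in indices) + "\n"
        (split_dir / f"{name}.txt").write_text(lines)

    # Store the split parameters next to the split files so the run is
    # self-describing.
    config_text = json.dumps(split_config, indent=2, sort_keys=True)
    (split_dir / "split_config.json").write_text(config_text)
    return split_dir
```

A training script would call `prepare_run_splits(run_dir, split_config)` at startup, with `run_dir` being the same directory that receives the logs and snapshots, and then read its train/val lists from the returned directory.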

Otherwise, newer model runs might silently reuse older data splits, and there is no way to tell which data split was used with a given model.

Note: This also requires changes to the Docker implementation and to the README file.

Apr 23 '20 19:04 yalaudah