# Replicate an experiment

## Why
Well, it's in the name, isn't it?
Currently, our attempt at replication is to check out an experiment, at which point it displays the command you originally ran to start training. In theory, you can re-run that command and the experiment will be reproduced. In practice, a lot of things can change in the meantime, so the results may differ.
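For illustration, the current workflow is roughly this (the experiment ID and training command here are made up):

```console
$ replicate checkout 9f2a1c4              # check out a past experiment (hypothetical ID)
$ python train.py --learning_rate=0.01    # the recorded command, re-run by hand
```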
## How
Perhaps we need some sort of mode for re-running a previous experiment, or basing a new experiment on an existing one. (See also #291.)
It could:
- Warn you if training data/dependencies/etc. are different. (Or a "strict" mode to ensure exact reproducibility!)
- Inherit params, and perhaps let you override them. ("Replicate this experiment, but with this param set." See the sketch below.)
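To make those bullets concrete, here is a purely hypothetical CLI sketch; none of these flags exist today, and `re-run`, `--strict`, and `--param` are invented names:

```console
# Strict mode: refuse to run if code, training data, or dependencies differ
$ replicate re-run 9f2a1c4 --strict

# Base a new experiment on an old one, overriding a single param
$ replicate re-run 9f2a1c4 --param learning_rate=0.001
```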
## Related
- #291
- #294
## From #415
I think we're also interested in making training completely reproducible. The problem with the current system of just recording the params is that we don't know how to feed those params back into an experiment to reproduce the training. This might mean deeper integration with configuration libraries, or maybe doing some lightweight configuration management ourselves.
I think this is where some of the configuration libraries might be helpful, as they tend to tackle exactly this problem: tying parameter definition/validation to execution (i.e., knowing how to use an existing configuration to run something). That seems like it would save the redundant effort of rolling your own config management.
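As a rough sketch of the "lightweight configuration management ourselves" option (everything here is hypothetical; `train` stands in for the user's entry point):

```python
import yaml

def train(learning_rate: float, num_epochs: int) -> None:
    """Stand-in for the user's training entry point."""
    print(f"training with lr={learning_rate}, epochs={num_epochs}")

# At experiment time: record the exact params alongside the experiment
params = {"learning_rate": 0.01, "num_epochs": 10}
with open("params.yaml", "w") as f:
    yaml.safe_dump(params, f)

# At replication time: load the recorded params and feed them straight back in
with open("params.yaml") as f:
    recorded = yaml.safe_load(f)
train(**recorded)
```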
Most of my replies will be biased towards Spock (since I wrote it, it's easier for me to know what it can do), so I'll talk specifically about functionality there, but I think it should apply to at least OmegaConf as well.
Spock has a chained command `.save()` (docs are here) on the builder object that dumps the entire configuration state (plus git info like branch, commit SHA, etc.) to a markup file (YAML, TOML, or JSON). That dumped file is 100% usable as input to the Spock builder itself and supports all of the underlying features (e.g. inheritance, command-line overrides, config file composition). It would therefore be a simple way to "replicate" an experiment (just load the exported config file) or to run a slightly "tweaked" version of the same experiment (load the config file and tweak it via a command-line override or config composition, which could then be saved to file again for replication). It also adheres nicely to one of your mantras, "It's just plain old files on S3", since all you would need to store are the markup config files.
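A minimal sketch of that round trip, assuming Spock's `@spock`/`SpockBuilder` API (names and signatures may differ across versions):

```python
from spock import spock, SpockBuilder

@spock
class TrainConfig:
    learning_rate: float = 0.01
    num_epochs: int = 10

# Build the config; .save() dumps the full state (+ git info) to a markup
# file, and .generate() returns the built configuration namespace
config = SpockBuilder(TrainConfig, desc="train").save(
    user_specified_path="./configs"
).generate()
print(config.TrainConfig.learning_rate)
```

Replicating would then just mean pointing the builder back at the dumped YAML (e.g. `python train.py --config ./configs/<saved>.yaml`), and a tweaked run adds an override on top, something like `--TrainConfig.learning_rate 0.001` (exact flag spellings may vary by version).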
Obviously, this is all predicated on the user adhering to the "slightly opinionated" way Spock defines parameters (via some decorator magic on attrs), generating the configuration object, and subsequently using that configuration object within their Python code to access configuration parameters (the same should hold for OmegaConf and other config libraries). This is the cost/downside if you are trying to stay un-opinionated or keep Replicate dependency-light...
Not trying to push one way or another, but I think this might be a good, low-bar way to address part of the replication issue...