Design test procedures for `examples/`
We need a way to automatically test examples to ensure they keep working as the framework core changes. One solution would be to mark these tests as slow to avoid running them every time. Also, slow operations such as dataset downloading and loading, model training, and the like should all be mocked (perhaps a set of mocking tools could be created).
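A minimal sketch of the mocking idea, assuming pytest as the runner and Python's stdlib `unittest.mock` for the mocking tools (`load_dataset` below is a hypothetical stand-in for a slow example step, not an actual framework function):

```python
from unittest import mock
import urllib.request

def load_dataset(url):
    # Hypothetical slow example step: downloads a dataset over the network.
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def test_load_dataset_mocked():
    # Patch the slow network call so the example logic runs instantly.
    fake_resp = mock.MagicMock()
    fake_resp.__enter__.return_value.read.return_value = b"fake data"
    with mock.patch("urllib.request.urlopen", return_value=fake_resp):
        assert load_dataset("http://example.com/data") == b"fake data"
```

The genuinely slow variants could additionally be marked with `@pytest.mark.slow` and deselected by default via `-m "not slow"`.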
Currently, the test examples are not part of the docs and some of them are outdated.
When we add them, it shouldn't be too hard to implement automatic execution of these examples as part of the existing test suite. I'm not sure about the mocking tools; I think that's a long-term goal (probably because I'm not a big fan of mocking). Ideally, we would have an external runner (e.g. the TakeLab server) connected to CI that runs slow examples only when file content changes or new files are added. This means we would need the server only from time to time. IMO, these examples (only BERT comes to mind) are "slow", but not so slow that we need to mock parts of them.
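The change-detection part could be sketched roughly like this (a sketch only; `filter_example_files` and `changed_examples` are hypothetical names, and the base ref depends on the CI setup):

```python
import subprocess

def filter_example_files(changed_paths, examples_dir="examples"):
    # Pure helper: keep only Python files under examples/.
    prefix = examples_dir.rstrip("/") + "/"
    return [p for p in changed_paths
            if p.startswith(prefix) and p.endswith(".py")]

def changed_examples(base_ref="origin/master"):
    # Ask git which files changed relative to the base branch; the CI job
    # (or the external runner) would then re-run only these examples.
    out = subprocess.run(
        ["git", "diff", "--name-only", base_ref],
        capture_output=True, text=True, check=True,
    ).stdout
    return filter_example_files(out.splitlines())
```

Keeping the path filtering pure makes that part trivially unit-testable, while the `git` call stays a thin wrapper.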
This sounds fantastic. If we could set up a CI that detects changes and runs examples only in those cases, we could probably get away without any mocking whatsoever, which sounds great.
I'd say the docs are a separate issue (though a valid point), so I wouldn't like to expand this issue too much.
How would we go about checking the correctness of the non-deterministic parts of the examples, e.g. training a model, which should be a fairly common example case? I see breaking examples into subfunctions and testing those subfunctions separately as a good first step, but would that hurt the "aesthetics" of the examples?
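The subfunction idea could look something like this (a sketch; the function names are illustrative, not from the actual examples): each example stays a top-to-bottom script via `main()`, while the fast, deterministic steps are importable and unit-testable without running the slow training:

```python
def build_vocab(tokens):
    # Fast, deterministic step that a unit test can exercise directly.
    return {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}

def train(vocab, epochs=3):
    # Slow, non-deterministic step; tests would mock or skip this.
    ...

def main():
    vocab = build_vocab(["the", "cat", "sat", "on", "the", "mat"])
    train(vocab)

if __name__ == "__main__":
    main()
```

Since the script still reads linearly through `main()`, the aesthetic cost may be small.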
We can either fix the training of the model (it can be made deterministic, at the cost of speed) or simply not check performance metrics (unless they are relevant) as long as the training completes.
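A minimal sketch of the first option, using a toy update driven by Python's `random` module (a real framework would also need to seed e.g. numpy and torch, and forcing deterministic algorithms there is what costs speed):

```python
import random

def seed_everything(seed):
    # Sketch: a real setup would also seed numpy, torch, CUDA, etc.
    random.seed(seed)

def train_step(weights, lr=0.1):
    # Toy "training" update with random noise; deterministic once seeded.
    return [w - lr * (random.random() - 0.5) for w in weights]

seed_everything(42)
run_a = train_step([0.0, 1.0])
seed_everything(42)
run_b = train_step([0.0, 1.0])
assert run_a == run_b  # same seed, same result
```

The second option is just a smoke test: run the training and assert only that it finishes without raising.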
But can we guarantee the correctness of training examples?
Not sure what you mean by this.
I'd defer this to post-1.1.0.
Will be closed via #318