playground
playground copied to clipboard
Add dataset to illustrate overfitting
This PR adds a dataset to exhibit the concept of overfitting. The dataset is linearly separable along the main diagonal, but its positive training samples only span a subset of the area under the diagonal while its positive testing samples span the entire area under the diagonal. Deep models with more complex functions will overfit the training data to minimize training loss while increasing the test loss, which illustrates overfitting.
Previously this was not possible as all the current datasets have testing and training samples drawn from the same distribution. To accommodate this type of dataset this PR refactors the way data generators are stored to include separate training and testing generators.