Add dataset to illustrate overfitting

Open aboggust opened this issue 7 years ago • 0 comments

This PR adds a dataset to exhibit the concept of overfitting. The dataset is linearly separable along the main diagonal, but its positive training samples only span a subset of the area under the diagonal while its positive testing samples span the entire area under the diagonal. Deep models with more complex functions will overfit the training data to minimize training loss while increasing the test loss, which illustrates overfitting.

Previously this was not possible as all the current datasets have testing and training samples drawn from the same distribution. To accommodate this type of dataset this PR refactors the way data generators are stored to include separate training and testing generators.

Jan 28 '19 01:01 aboggust