Accelerate learning by automatically reducing the size of the training dataset.

Open: emmanuellujan opened this issue 3 years ago • 1 comment

Description here.

Sep 19 '22 23:09 emmanuellujan

Progress in https://github.com/cesmix-mit/PotentialLearning.jl/pull/25 by @dallasfoster. From the PR discussion:

A DPP (determinantal point process) subset selector was added, along with a random subset selector. The unit tests are sufficient.
PotentialLearning.jl already includes an HDBSCAN implementation, which could perhaps be made more robust.
The AL (active learning) code that uses HDBSCAN has some limitations compared with DPP subsampling.
A comparison between the two methods could make a nice publication. First, a parallel Julia version of the HDBSCAN-based method (and/or of methods based on other clustering algorithms) has to be implemented.
This issue could be closed once a follow-up issue is opened. The new issue could have a more specific title that better reflects the original description.
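
For readers unfamiliar with the idea, here is a minimal sketch of diversity-based subset selection next to a random baseline. It is written in plain Python rather than Julia and is not the PotentialLearning.jl API: `rbf_kernel`, `greedy_dpp_subset`, and `random_subset` are hypothetical names, the RBF kernel and its `length_scale` are assumptions, and the selector is a greedy MAP approximation to a DPP (it picks the point with the largest remaining conditional variance at each step) rather than an exact DPP sampler.

```python
import math
import random


def rbf_kernel(points, length_scale=1.0):
    """Similarity matrix K[i][j] = exp(-|x_i - x_j|^2 / (2 * length_scale^2))."""
    n = len(points)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            K[i][j] = math.exp(-d2 / (2.0 * length_scale ** 2))
    return K


def greedy_dpp_subset(K, k, tol=1e-12):
    """Greedy MAP approximation to a DPP with kernel K (hypothetical helper).

    Keeps an incremental Cholesky-style factor so that d[i] is the variance
    of item i conditioned on the items selected so far.
    """
    n = len(K)
    d = [K[i][i] for i in range(n)]  # conditional variances
    c = [[] for _ in range(n)]       # partial factor rows
    selected = []
    for _ in range(min(k, n)):
        candidates = [i for i in range(n) if i not in selected]
        j = max(candidates, key=lambda i: d[i])
        if d[j] <= tol:
            break  # remaining items are (numerically) redundant
        selected.append(j)
        root = math.sqrt(d[j])
        for i in candidates:
            if i == j:
                continue
            e = (K[j][i] - sum(a * b for a, b in zip(c[j], c[i]))) / root
            c[i].append(e)
            d[i] -= e * e
    return selected


def random_subset(n, k, seed=None):
    """Baseline: k distinct indices chosen uniformly at random."""
    return random.Random(seed).sample(range(n), k)


# Toy dataset: two tight clusters of near-duplicates plus one isolated point.
points = [(0.0, 0.0), (0.0, 0.0), (0.01, 0.0), (5.0, 5.0), (5.0, 5.01), (10.0, 0.0)]
K = rbf_kernel(points)
print("greedy DPP-style:", greedy_dpp_subset(K, 3))
print("random baseline: ", random_subset(len(points), 3, seed=0))
```

On this toy data the greedy selector takes one representative from each group of near-duplicates, while the random baseline can return several points from the same cluster, which is the redundancy that dataset reduction aims to remove.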

Jan 03 '23 21:01 emmanuellujan