MultivariateStats.jl

LDA docs clarification

Open pdimens opened this issue 3 years ago • 5 comments

The docs for LDA are quite technical and would benefit from user-friendly explanations of the values returned at each step of a typical workflow, including predict(lda_results, newdata), at the same level of detail as the PCA section.

pdimens, May 31 '22 19:05

Yes, some of the docs are sparse and need more content, e.g. examples and explanations. If you have something in mind, please let us know or send updates.

wildart, Jun 01 '22 19:06

I'd love to, but I'm struggling a bit with the MC-LDA, which is why I bring this up =/

pdimens, Jun 01 '22 19:06

You can try to replicate the PCA example with the Iris dataset to show LDA in dimensionality reduction mode; see this scikit-learn example or this one.

If you want to show how LDA is used as a classifier, it is more complicated, because the package doesn't provide any classifier. You can start by splitting the data into training and testing parts. Fit the MC-LDA model to the training data (you can also reduce dimensionality, so you can visualize the results). Next, calculate predictions from the training data, and feed the original data and the predicted data into a classifier, e.g. nearest neighbors. Look at MLBase.jl for some classification primitives and performance evaluation functionality.
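
To make the steps above concrete, here is a rough sketch in plain Python rather than Julia. It is not this package's API: every function name below is made up for illustration, and the data is a deterministic toy set with two features and three classes. It fits a single discriminant direction on the training split, projects both splits onto it, and classifies the test points with a 1-nearest-neighbor rule on the projected values.

```python
# Toy sketch: LDA as a 1-D projection followed by a nearest-neighbor classifier.
# All names are hypothetical; this is NOT the MultivariateStats.jl API.
# Assumes exactly two features and a non-singular within-class scatter matrix.

def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def fit_lda_direction(X, y, iters=50):
    """Return the leading discriminant direction for 2-feature data."""
    mu = mean(X)
    sw = [[0.0, 0.0], [0.0, 0.0]]  # within-class scatter
    sb = [[0.0, 0.0], [0.0, 0.0]]  # between-class scatter
    for c in sorted(set(y)):
        pts = [x for x, label in zip(X, y) if label == c]
        mc = mean(pts)
        for p in pts:
            d = (p[0] - mc[0], p[1] - mc[1])
            for i in range(2):
                for j in range(2):
                    sw[i][j] += d[i] * d[j]
        d = (mc[0] - mu[0], mc[1] - mu[1])
        for i in range(2):
            for j in range(2):
                sb[i][j] += len(pts) * d[i] * d[j]
    # Explicit 2x2 inverse of the within-class scatter.
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    m = [[sum(inv[i][k] * sb[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    # Power iteration for the leading eigenvector of inv(Sw) * Sb.
    w = (1.0, 1.0)
    for _ in range(iters):
        v = (m[0][0] * w[0] + m[0][1] * w[1], m[1][0] * w[0] + m[1][1] * w[1])
        norm = (v[0] ** 2 + v[1] ** 2) ** 0.5
        w = (v[0] / norm, v[1] / norm)
    return w

def project(w, X):
    """Map each 2-feature point to its coordinate along direction w."""
    return [w[0] * x[0] + w[1] * x[1] for x in X]

def nearest_neighbor(train_z, train_y, z):
    """Label of the training point whose projection is closest to z."""
    best = min(range(len(train_z)), key=lambda i: abs(train_z[i] - z))
    return train_y[best]

# Deterministic toy data: three classes separated along the first feature,
# with larger spread along the second feature.
X, y = [], []
for c in range(3):
    for dx in (-0.5, 0.0, 0.5):
        for dy in (-2.0, 0.0, 2.0):
            X.append((3.0 * c + dx, dy))
            y.append(c)

# Split: even-indexed points for training, odd-indexed points for testing.
X_train, y_train = X[0::2], y[0::2]
X_test, y_test = X[1::2], y[1::2]

w = fit_lda_direction(X_train, y_train)   # fit on the training part only
z_train = project(w, X_train)             # reduced training data
z_test = project(w, X_test)               # reduced held-out data
predicted = [nearest_neighbor(z_train, y_train, z) for z in z_test]
accuracy = sum(p == t for p, t in zip(predicted, y_test)) / len(y_test)
print(accuracy)
```

The point of the sketch is only the division of labor: the LDA step produces a projection, and a separate classifier (here nearest neighbor) makes the class decision on the projected data.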

wildart, Jun 01 '22 20:06

Ah, I see. So MulticlassLDA does not provide functionality analogous to the lda function in the R package MASS? I was trying to follow along with this interesting post to see if it can be done with the present functionality. I suppose not. Unfortunately I don't have the maths background to address this myself, but I will have a look at what MASS is doing out of curiosity.

pdimens, Jun 01 '22 20:06

Basically, your article does what I proposed above. However, the post's part on dimensionality reduction is convoluted and unrelated to LDA. The scikit-learn example is more straightforward about the reduction.

I realized that MC-LDA still has the problems that you reported in #187, and those must be addressed first to make the interface more usable.

wildart, Jun 01 '22 21:06