Unsupervised

Open brooksjessup opened this issue 5 years ago • 1 comment

Explore the Data Using Pandas- typo: "interpretation. <3 your data"

Why not apply some of the preprocessing techniques from the last lesson here on the music reviews data?

Creating the DTM using scikit-learn- Explanation needed for why it's necessary to remove numbers.
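For context, here is a minimal sketch of how numbers can end up in (or be kept out of) the vocabulary when building the DTM. The two toy reviews and the letters-only `token_pattern` regex are assumptions for illustration and are not taken from the notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy documents (hypothetical, not from the music reviews data)
docs = [
    "Released in 1999, this album has 12 tracks",
    "A great album with 3 great singles",
]

# The default token pattern keeps any token of 2+ word characters,
# so multi-digit numbers such as "1999" become vocabulary entries.
default_vec = CountVectorizer()
default_vec.fit(docs)
default_vocab = sorted(default_vec.vocabulary_)

# One way to drop numbers: only accept tokens made of 2+ letters.
letters_only = CountVectorizer(token_pattern=r"(?u)\b[a-zA-Z]{2,}\b")
letters_only.fit(docs)
letters_vocab = sorted(letters_only.vocabulary_)

print(default_vocab)
print(letters_vocab)
```

Numeric tokens like years or track counts add columns to the DTM that rarely help describe a topic, which may be the explanation the notebook wants to give.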

Topic Modeling- typo: "what the ext is about" -> "what the text is about".

The paragraph on the "theory" behind LDA is very dense and difficult to parse.

It is unnecessary to fit-transform both the tf-idf vectorizer and CountVectorizer here; one or the other is enough.

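Fitting the LDA model raises an error: "LatentDirichletAllocation(n_topics=10...)" -> "LatentDirichletAllocation(n_components=10...)". The n_topics parameter was renamed to n_components in scikit-learn, and the old name no longer works in current versions.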
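A minimal sketch of the corrected call, assuming a recent scikit-learn. The toy documents, `n_components=2`, and `random_state=0` are placeholders for illustration; the notebook uses 10 topics on the music reviews data.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy documents (hypothetical stand-ins for the music reviews)
docs = [
    "guitar riffs and loud drums drive this rock album",
    "the rock band plays heavy guitar and drums",
    "smooth jazz piano with a gentle saxophone solo",
    "a jazz trio with piano bass and saxophone",
]

# Build the document-term matrix from raw counts
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)

# n_components replaces the old n_topics argument
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(dtm)

print(doc_topics.shape)       # one row per document, one column per topic
print(lda.components_.shape)  # one row per topic, one column per term
```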

It might be nice to include an interpretation of the 10 topics identified by the model.
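One possible starting point for that interpretation is to list the highest-weighted words per topic. This is only a sketch: `top_words` is a hypothetical helper, the toy documents stand in for the reviews, and the `get_feature_names_out` call assumes scikit-learn 1.0 or later (with a fallback to the older `get_feature_names`).

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy documents (hypothetical stand-ins for the music reviews)
docs = [
    "guitar riffs and loud drums drive this rock album",
    "the rock band plays heavy guitar and drums",
    "smooth jazz piano with a gentle saxophone solo",
    "a jazz trio with piano bass and saxophone",
]

vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)

# Newer scikit-learn exposes get_feature_names_out; older versions use get_feature_names
if hasattr(vectorizer, "get_feature_names_out"):
    feature_names = list(vectorizer.get_feature_names_out())
else:
    feature_names = list(vectorizer.get_feature_names())


def top_words(model, names, n_top=3):
    """Return the n_top highest-weighted words for each topic (hypothetical helper)."""
    topics = []
    for weights in model.components_:
        top_idx = np.argsort(weights)[::-1][:n_top]  # indices sorted by descending weight
        topics.append([names[i] for i in top_idx])
    return topics


topics = top_words(lda, feature_names)
for i, words in enumerate(topics):
    print(f"Topic {i}: {', '.join(words)}")
```

A short sentence per topic describing what the top words have in common would then serve as the interpretation.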

The cosine similarity example at the end of the notebook raises an error.

Further resources- The link for the blog post is broken. Remove it?

Jan 07 '21 05:01 brooksjessup

Hi @brooksjessup -- Can you commit and push these changes? Please close this issue when you are done. Let me know if you have any questions. Thanks!

Feb 11 '21 21:02 EastBayEv