Unsupervised
Explore the Data Using Pandas- typo: "interpretation. <3 your data"
Why not apply some of the preprocessing techniques from the last lesson here on the music reviews data?
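A rough sketch of what that could look like, assuming the last lesson covered lowercasing and punctuation removal. The DataFrame and the column name `body` are hypothetical stand-ins for the music reviews data:

```python
import pandas as pd

# Hypothetical stand-in for the music reviews DataFrame; the real column name may differ.
reviews = pd.DataFrame(
    {"body": ["An AMAZING debut!!", "Dull, over-produced... skip it."]}
)

# Lowercase, replace punctuation with spaces, then collapse repeated whitespace.
reviews["body_clean"] = (
    reviews["body"]
    .str.lower()
    .str.replace(r"[^\w\s]", " ", regex=True)
    .str.replace(r"\s+", " ", regex=True)
    .str.strip()
)

print(reviews["body_clean"].tolist())
```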
Creating the DTM using scikit-learn- Explanation needed for why it's necessary to remove numbers.
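One way the notebook could motivate this is to show the vocabulary with and without numbers. A minimal sketch, assuming the DTM is built with `CountVectorizer`; the two example sentences are made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up sentences standing in for the review text.
docs = [
    "Released in 1997, this album has 12 tracks of guitar rock.",
    "The 2nd album adds synths and 808 drums.",
]

# The default token pattern keeps digit tokens such as "1997" and "12",
# which add columns to the DTM without saying much about the topic.
default_vec = CountVectorizer()
default_vec.fit(docs)
default_vocab = set(default_vec.vocabulary_)

# Restricting tokens to runs of two or more letters drops them.
alpha_vec = CountVectorizer(token_pattern=r"(?u)\b[a-zA-Z]{2,}\b")
dtm = alpha_vec.fit_transform(docs)
alpha_vocab = set(alpha_vec.vocabulary_)

# Tokens that only the default pattern keeps.
print(sorted(default_vocab - alpha_vocab))
```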
Topic Modeling- typo: "what the ext is about" -> "what the text is about". The paragraph on the "theory" behind LDA is very dense and difficult to parse.
It is unnecessary to fit-transform both tf-idf and CountVectorizer here - one or the other is fine.
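Error message fitting the LDA model: "LatentDirichletAllocation(n_topics=10...)" -> "LatentDirichletAllocation(n_components=10...)". The n_topics argument was renamed to n_components in scikit-learn, and current releases reject the old name.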
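For reference, a small sketch of the corrected call. The documents are made up and the model uses 2 topics instead of 10 only to keep the example tiny; the other arguments in the notebook's call are not reproduced here:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Made-up documents standing in for the review text.
docs = [
    "guitar riffs and loud drums drive this rock record",
    "heavy guitar solos and pounding drums on every rock track",
    "smooth synth pads and electronic beats fill the dance floor",
    "electronic synth loops and dance beats all night",
]

dtm = CountVectorizer(stop_words="english").fit_transform(docs)

# n_components replaces the old n_topics argument.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(dtm)

# One row per document, one column per topic.
print(doc_topics.shape)
```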
It might be nice to include an interpretation of the 10 topics identified by the model.
Error message in cosine similarity example at end of notebook.
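Without assuming what causes the error, here is a working call for comparison. `cosine_similarity` in scikit-learn expects 2D inputs, so a single document is passed as a one-row slice. The documents are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up documents standing in for the review text.
docs = [
    "loud guitar and drums",
    "quiet piano and strings",
    "loud drums and bass guitar",
]

tfidf = TfidfVectorizer().fit_transform(docs)

# Similarity between every pair of documents.
sims = cosine_similarity(tfidf)

# Similarity of the first document to all documents.
# The slice tfidf[0:1] keeps the input 2D (one row).
first_vs_all = cosine_similarity(tfidf[0:1], tfidf)

print(sims.round(2))
```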
Further resources- The link for the blog post is broken. Remove it?
Hi @brooksjessup -- Can you commit and push these changes? Please close this comment when you are done. Let me know if you have any questions. Thanks!