Manuel Bickel comments

13 comments by Manuel Bickel

Topic modeling guide

We might add some aspects regarding downstream analysis (and maybe visualization, depending on the target audience or format of publication). Regarding downstream analysis, we might do (feel free to change/adapt/add):...

How to train/modify collocation model with existing (ngram) dictionary? (question)

Thank you for your reply. So I was halfway on the right track by introducing dashes into the ngrams of the dictionary. Something like `cc_model$collocation_stat`...

How to train/modify collocation model with existing (ngram) dictionary? (question)

That's fine, I guess you have more important/complex problems to solve than some dictionary lookups. For the time being I think I can use my workaround, but as soon...

How to train/modify collocation model with existing (ngram) dictionary? (question)

Just realized that I had forgotten to insert the helper function that finds trailing ngrams into the code, sorry for that. I have updated my last code comment accordingly so...

How to train/modify collocation model with existing (ngram) dictionary? (question)

This is an update on this issue, though not yet a solution. As per your first comment in this thread, I have created a `collocation_stat` from a `cc_dictionary` (here only...

Norvig spell corrector

As a side note / hint on spell checking: I just stumbled upon the [ropensci/hunspell](https://github.com/ropensci/hunspell) package. I have not dug into the details of the implementation, but the basic idea is that...

[question] Topic number selection using Cross Validation

Thank you for your question. As with any task that involves selecting the right number of clusters, topics, etc., there is no single correct answer. Each selection criterion has...

Reimplement createJSON() from LDAvis

With respect to the Jensen-Shannon divergence, I think that the fix proposed by [Maren-Eckhoff](https://github.com/maren-eckhoff) and pending as an [open pull request](https://github.com/cpsievert/LDAvis/pull/77) already solves the problem. See adapted function and test...
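
For reference, a sketch of the standard definition of the Jensen-Shannon divergence between two discrete distributions $P$ and $Q$ over the same vocabulary. The symbols $P$, $Q$, $M$, $p_i$, $q_i$ and $m_i$ are notation introduced here for illustration; they are not taken from the comment or from the pull request.

$$
\mathrm{JSD}(P \parallel Q) = \tfrac{1}{2}\,\mathrm{KL}(P \parallel M) + \tfrac{1}{2}\,\mathrm{KL}(Q \parallel M), \qquad M = \tfrac{1}{2}(P + Q)
$$

$$
\mathrm{KL}(P \parallel M) = \sum_i p_i \log \frac{p_i}{m_i}, \qquad \text{with the convention } 0 \log 0 = 0
$$

A term-by-term implementation that evaluates $p_i \log(p_i / m_i)$ directly produces NaN in floating point whenever $p_i = 0$, so zero entries generally need explicit handling. Whether that is the specific problem the pull request addresses is not stated in this snippet.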

Reimplement createJSON() from LDAvis

Maybe my comment was misleading, sorry. I agree that LDAvis will have to be reimplemented; I just wanted to confirm that the fix works for this purpose. Hence, in the first...

compare biterm topic modelling to rainette, LDA, coclustering, structural topic model, embedding clustering, autoencoders

I have not worked with short texts, so unfortunately I have no good sources at hand. Maybe Japanese haiku, to make text mining more philosophical ;-)? Side note: sorry for...