text2vec

Reimplement createJSON() from LDAvis

Open · dselivanov opened this issue 8 years ago · 3 comments

It seems that the LDAvis package isn't actively maintained and won't be updated on CRAN in the near future. In particular, we need an option to not reorder topics, as well as fixes for NaN in jensenShannon (see https://github.com/cpsievert/LDAvis/issues/56):

  1. https://github.com/cpsievert/LDAvis/pull/77
  2. https://github.com/cpsievert/LDAvis/pull/80
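To make the NaN failure mode concrete, here is a small illustration. `js_naive` is a hypothetical function written only for this demo and is not claimed to be LDAvis's exact code: it is a Jensen-Shannon divergence without any zero guard. It returns NaN as soon as either distribution has a zero entry, because R evaluates `0 * log(0)` as `0 * -Inf`, which is NaN (and `0 / 0` is NaN when both vectors are zero at the same position).

```r
# Hypothetical, deliberately unguarded JS divergence (illustration only)
js_naive <- function(x, y) {
  m <- 0.5 * (x + y)
  0.5 * (sum(x * log(x / m)) + sum(y * log(y / m)))
}

p <- c(0.25, 0, 0.25, 0, 0.5)
q <- c(0, 0.25, 0.25, 0, 0.5)

0 * log(0)      # NaN: log(0) is -Inf, and 0 * -Inf is NaN
js_naive(p, q)  # NaN: p[2] == 0, and position 4 is zero in both vectors
```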

dselivanov · Jan 09 '18 12:01

Regarding the Jensen-Shannon divergence, I think the fix proposed by Maren-Eckhoff, which is pending as an open pull request, already solves the problem. See the adapted function and test below.

There was one last comment in the above-mentioned issue 56 about still getting NaN, but it did not include an example. As far as I understand, there should be no NaNs as long as the input data is valid, which it should be at this point (please correct me if I am wrong).
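For reference, with natural logarithms and $m_i = \tfrac{1}{2}(p_i + q_i)$, the Jensen-Shannon divergence is

$$
\mathrm{JS}(P \,\|\, Q) = \frac{1}{2}\sum_i p_i \log\frac{p_i}{m_i} + \frac{1}{2}\sum_i q_i \log\frac{q_i}{m_i},
$$

with the standard convention $0 \log(0/m_i) = 0$. Whenever $p_i > 0$ we also have $m_i > 0$, so the only problematic terms are those with $p_i = 0$ (or $q_i = 0$), which is exactly what the `ifelse()` guard sets to zero. For probability vectors, $0 \le \mathrm{JS}(P \,\|\, Q) \le \log 2$.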

```r
# adapted jensenShannon
jensenShannon <- function(x, y) {
    m <- 0.5*(x + y)
    # fix proposed by Maren-Eckhoff to avoid 0 * log(0), which evaluates to NaN
    # https://github.com/cpsievert/LDAvis/issues/56
    0.5*(sum(ifelse(x==0,0,x*log(x/m)))+sum(ifelse(y==0,0,y*log(y/m))))
}
# create phi for testing
p <-     c(0.25,   0, 0.25, 0,0.5)
q <-     c(   0,0.25, 0.25, 0,0.5)
zeros <- c(   0,   0,    0, 0,  0) # not a valid distribution (rows should sum to one), included only for the demo
phi <- rbind(p, q, qrev = rev(q), prev = rev(p), zeros)
#       [,1] [,2] [,3] [,4] [,5]
# p     0.25 0.00 0.25 0.00 0.50
# q     0.00 0.25 0.25 0.00 0.50
# qrev  0.50 0.00 0.25 0.25 0.00
# prev  0.50 0.00 0.25 0.00 0.25
# zeros 0.00 0.00 0.00 0.00 0.00
# requires the proxy package
dist.mat <- proxy::dist(x = phi, method = jensenShannon)
pca.fit <- stats::cmdscale(dist.mat, k = 2)
#                [,1]       [,2]
# p      4.600278e-02 -0.1037688
# q      2.600304e-01 -0.0176260
# qrev  -2.600304e-01 -0.0176260
# prev  -4.600278e-02 -0.1037688
# zeros  2.073058e-16  0.2427896
```
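To double-check the no-NaN claim without depending on the proxy package, the sketch below computes the full pairwise divergence matrix for the same test `phi` with a plain base-R loop (it plays the role of `proxy::dist`, but returns the full square matrix). Two entries can be verified by hand: for `p` and `q` only one position differs on each side, giving 0.25 · log 2, and the all-zero row against any distribution gives 0.5 · log 2.

```r
# Guarded JS divergence, same logic as the adapted function above
jensenShannon <- function(x, y) {
  m <- 0.5 * (x + y)
  0.5 * (sum(ifelse(x == 0, 0, x * log(x / m))) +
         sum(ifelse(y == 0, 0, y * log(y / m))))
}

p <- c(0.25, 0, 0.25, 0, 0.5)
q <- c(0, 0.25, 0.25, 0, 0.5)
zeros <- rep(0, 5)
phi <- rbind(p, q, qrev = rev(q), prev = rev(p), zeros)

# Full pairwise divergence matrix in base R (no proxy dependency)
n <- nrow(phi)
js <- matrix(0, n, n, dimnames = list(rownames(phi), rownames(phi)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    js[i, j] <- jensenShannon(phi[i, ], phi[j, ])
  }
}

any(is.nan(js))                                    # FALSE
isTRUE(all.equal(js["p", "q"], 0.25 * log(2)))     # TRUE
isTRUE(all.equal(js["zeros", "p"], 0.5 * log(2)))  # TRUE
```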

manuelbickel · Jan 16 '18 13:01

True, but:

  1. the PR has not been merged yet;
  2. I doubt the maintainer will upload an updated package to CRAN in the near future.

dselivanov · Jan 17 '18 06:01

Sorry, my comment may have been misleading. I agree that LDAvis will have to be reimplemented; I just wanted to confirm that the fix works for this purpose. As a first step, a modified copy of createJSON() could quickly solve the issues raised above, as far as generating the data for the visualization is concerned. Reimplementing the visualization itself is, of course, a separate matter.
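As an illustration of what the "do not reorder topics" option in such a modified copy might look like, here is a minimal sketch. It assumes that createJSON() orders topics by their corpus-wide proportion, computed from `theta` weighted by `doc.length`; `order_topics()` and its `reorder` argument are hypothetical names, not part of LDAvis or text2vec.

```r
# Hypothetical helper: not part of LDAvis or text2vec.
# Computes each topic's share of all tokens and, unless reorder = FALSE,
# sorts topics by decreasing share (assumed to mirror createJSON()).
order_topics <- function(phi, theta, doc.length, reorder = TRUE) {
  # phi: K x V topic-term matrix; theta: D x K document-topic matrix;
  # doc.length: number of tokens in each of the D documents.
  # theta * doc.length scales row d of theta by doc.length[d].
  topic.frequency <- colSums(theta * doc.length)
  topic.proportion <- topic.frequency / sum(topic.frequency)
  o <- if (reorder) {
    order(topic.proportion, decreasing = TRUE)
  } else {
    seq_along(topic.proportion)
  }
  list(phi = phi[o, , drop = FALSE],
       theta = theta[, o, drop = FALSE],
       topic.proportion = topic.proportion[o],
       order = o)
}

# Toy example: 3 documents, 2 topics, 3 terms
theta <- rbind(c(0.1, 0.9), c(0.2, 0.8), c(0.6, 0.4))
doc.length <- c(10, 20, 30)
phi <- rbind(c(0.5, 0.5, 0), c(0, 0.5, 0.5))

sorted <- order_topics(phi, theta, doc.length)                  # topic 2 first (37/60 vs 23/60)
kept   <- order_topics(phi, theta, doc.length, reorder = FALSE) # original model order
sorted$order
kept$order
```

With `reorder = FALSE`, topic k in the visualization data stays topic k of the fitted model, which makes it easier to match the plot against the model's own topic indices.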

manuelbickel · Jan 17 '18 07:01