Reimplement createJSON() from LDAvis
It seems that the LDAvis package is not actively maintained and won't be updated on CRAN in the near future. In particular, we need an option to not reorder topics and a fix for the NaN values in jensenShannon (see https://github.com/cpsievert/LDAvis/issues/56):
- https://github.com/cpsievert/LDAvis/pull/77
- https://github.com/cpsievert/LDAvis/pull/80
With respect to the Jensen-Shannon divergence, I think the fix proposed by Maren-Eckhoff, which is pending as an open pull request, already solves the problem. See the adapted function and test below.
There was one last comment in issue 56 mentioned above about still getting NaN, but it did not provide an example. As far as I understand, there should be no NaNs as long as the input data is fine, which it should be at this point. (Please correct me if I am wrong.)
# adapted jensenShannon
jensenShannon <- function(x, y) {
  m <- 0.5 * (x + y)
  # introduced fix proposed by Maren-Eckhoff to avoid log(0)
  # https://github.com/cpsievert/LDAvis/issues/56
  0.5 * (sum(ifelse(x == 0, 0, x * log(x / m))) + sum(ifelse(y == 0, 0, y * log(y / m))))
}
#create phi for testing
p <- c(0.25, 0, 0.25, 0,0.5)
q <- c( 0,0.25, 0.25, 0,0.5)
zeros <- c( 0, 0, 0, 0, 0) # not a valid row, since rows should sum to one; just for demo
phi <- rbind(p, q, qrev = rev(q), prev = rev(p), zeros)
#       [,1] [,2] [,3] [,4] [,5]
# p     0.25 0.00 0.25 0.00 0.50
# q     0.00 0.25 0.25 0.00 0.50
# qrev  0.50 0.00 0.25 0.25 0.00
# prev  0.50 0.00 0.25 0.00 0.25
# zeros 0.00 0.00 0.00 0.00 0.00
dist.mat <- proxy::dist(x = phi, method = jensenShannon)
pca.fit <- stats::cmdscale(dist.mat, k = 2)
#                [,1]       [,2]
# p      4.600278e-02 -0.1037688
# q      2.600304e-01 -0.0176260
# qrev  -2.600304e-01 -0.0176260
# prev  -4.600278e-02 -0.1037688
# zeros  2.073058e-16  0.2427896
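To make the source of the NaN explicit, here is a minimal sketch in base R that compares an unguarded form of the divergence with the guarded one. The names `js_unguarded` and `js_guarded` are made up for this illustration, and the unguarded form is only my assumption of what the pre-fix computation looks like, not a copy of the LDAvis source.

```r
# Hypothetical unguarded form: a zero entry gives 0 * log(0) = 0 * -Inf = NaN
js_unguarded <- function(x, y) {
  m <- 0.5 * (x + y)
  0.5 * (sum(x * log(x / m)) + sum(y * log(y / m)))
}

# Guarded form: terms with a zero probability contribute 0 to the sum
js_guarded <- function(x, y) {
  m <- 0.5 * (x + y)
  0.5 * (sum(ifelse(x == 0, 0, x * log(x / m))) +
         sum(ifelse(y == 0, 0, y * log(y / m))))
}

p <- c(0.25, 0, 0.25, 0, 0.5)
q <- c(0, 0.25, 0.25, 0, 0.5)

is.nan(js_unguarded(p, q))  # the zero entries propagate NaN through sum()
js_guarded(p, q)            # finite; equals 0.25 * log(2) for these inputs
```

Since `ifelse` only selects between the two vectors, the NaN terms are still computed internally but never reach `sum()`.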
True, but
- The PR has not been merged yet
- I doubt the maintainer will upload an updated package to CRAN in the near future
Maybe my comment was misleading, sorry. I agree that LDAvis will have to be reimplemented; I just wanted to confirm that the fix works for this purpose. Hence, as a first step, a modified copy of createJSON might quickly solve the issues raised above in terms of creating the data for visualization. Another matter is, of course, the potential reimplementation of the visualization itself.
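For the other requested change, the option to not reorder topics, a modified copy could isolate the ordering step behind a flag. The sketch below is only an assumption about how that might look: `order_topics` and its `reorder.topics` argument are hypothetical names, and ordering by token-weighted topic proportion is my understanding of the current behavior rather than something confirmed in this thread.

```r
# Hypothetical helper, not part of LDAvis: returns the topic order to apply
# to the rows of phi and the columns of theta.
order_topics <- function(theta, doc.length, reorder.topics = TRUE) {
  stopifnot(nrow(theta) == length(doc.length))
  # expected number of tokens per topic (each row of theta is scaled by its document length)
  topic.frequency <- colSums(theta * doc.length)
  topic.proportion <- topic.frequency / sum(topic.frequency)
  if (reorder.topics) {
    order(topic.proportion, decreasing = TRUE)
  } else {
    seq_along(topic.proportion)  # keep the topics in their input order
  }
}

# toy input: 3 documents, 2 topics
theta <- rbind(c(0.1, 0.9), c(0.2, 0.8), c(0.6, 0.4))
doc.length <- c(10, 20, 10)

order_topics(theta, doc.length)                          # topic 2 first, it is more frequent
order_topics(theta, doc.length, reorder.topics = FALSE)  # input order preserved
```

With the flag off, topic indices in the visualization would match those of the fitted model, which is the point of the request.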