Olaf

Results 87 comments of Olaf

Thank you, this is great! It would be interesting to see how the performance evolves as the number of rows grows larger. I will try a few things and report...

Actually, that was easy to test. Perhaps the table creation is a bottleneck here? ``` tib % as.tokens(), tt_default = tibble::tibble(text = tib) %>% unnest_tokens(output = "word", input = "text"),...

@kbenoit the more I think about it the more I am convinced there should be some inefficiency hiding somewhere... how is it possible that the performance deteriorates so much as...

Thank you @koheiw for the clarification. I guess one key question is how to make good use of the indexing (which costs a little extra processing). Which `quanteda` function takes...

@kbenoit interesting. Do we have an idea which `tokens_` operation is most expensive computationally (that is, without indexing)? That could be one additional argument in favor of using `quanteda` in...

@kbenoit using the usual trick does not work:

```
> fcm_select(mymatrix, min_freq = 2)
Error in dfm_select(x, pattern, selection, valuetype, case_insensitive, :
  unused argument (min_freq = 2)
```

OK, I figured it out using the (not yet) documented options in `dfm_trim`: ``` dfm_trim(mymatrix, min_termfreq = 10, termfreq_type = 'count') Feature co-occurrence matrix of: 15 by 4 features. 15 x...

@kbenoit is this the correct way to proceed?
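
For comparison, here is a minimal sketch of a different route to a similar result: trim rare features on the `dfm` first, then build the co-occurrence matrix from the trimmed `dfm`. The toy texts and the object names (`txt`, `dfmat`, `fcmat`) are made up for illustration, and this is not necessarily the approach the maintainers recommend. Note that it filters on term frequency in the documents, not on co-occurrence counts, so it is not equivalent to trimming the `fcm` itself.

```r
library(quanteda)

# Hypothetical toy corpus, just to have something to count.
txt <- c(d1 = "a b c a b", d2 = "a b d", d3 = "a c e")

# Tokenize and build the document-feature matrix.
dfmat <- dfm(tokens(txt))

# Keep only features that occur at least twice in total
# (termfreq_type = "count" means raw counts, not proportions).
dfmat_trimmed <- dfm_trim(dfmat, min_termfreq = 2, termfreq_type = "count")

# Build the feature co-occurrence matrix from the trimmed dfm
# (document-level co-occurrence, the default context).
fcmat <- fcm(dfmat_trimmed)

# Features that survived the trimming.
featnames(fcmat)
```

In this toy example only `a`, `b` and `c` occur at least twice, so `d` and `e` never enter the co-occurrence matrix.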

Haaa, that's sad. R/sparklyr are pretty hot these days... maybe in the future? How difficult would it be for you to do it? Thanks!!