Olaf comments

Results: 87 comments of Olaf

performance issue with tokenization

Thank you, this is great! It would be interesting to see how the performance evolves as the number of rows grows larger. I will try a few things and report...

performance issue with tokenization

Actually, that was easy to test. Perhaps the table creation is a bottleneck here? ``` tib % as.tokens(), tt_default = tibble::tibble(text = tib) %>% unnest_tokens(output = "word", input = "text"),...

performance issue with tokenization

@kbenoit the more I think about it the more I am convinced there should be some inefficiency hiding somewhere... how is it possible that the performance deteriorates so much as...

performance issue with tokenization

Thank you @koheiw for the clarification. I guess one key question is how to make good use of the indexing (which costs a little extra processing). Which `quanteda` function takes...

performance issue with tokenization

@kbenoit interesting. Do we have an idea which `tokens_` operation is most expensive computationally (that is, without indexing)? That could be one additional argument in favor of using `quanteda` in...

filtering a FCM matrix

@kbenoit using the usual trick does not work:
```
> fcm_select(mymatrix, min_freq = 2)
Error in dfm_select(x, pattern, selection, valuetype, case_insensitive, :
  unused argument (min_freq = 2)
```

filtering a FCM matrix

OK, I figured it out using the (not yet) documented options in `dfm_trim`: ``` dfm_trim(mymatrix, min_termfreq = 10, termfreq_type = 'count') Feature co-occurrence matrix of: 15 by 4 features. 15 x...

filtering a FCM matrix

@kbenoit is this the correct way to proceed?

compatibility with sparklyr?

Haaa, that's sad. R/sparklyr are pretty hot these days... maybe in the future? How difficult would it be for you to do it? Thanks!!

compatibility with sparklyr?

really cool