Kenneth Benoit comments

Results: 308 comments of Kenneth Benoit

Consider parallelizing tokenization

Great idea to put this into `quanteda_options`, so that we don't have to change `tokens()`, now or in the future, while parallelism evolves more in R. By turning it off by...

Consider parallelizing tokenization

Exactly - or even having the value be: "future" (for `future_lapply`), "parallel" (for `mclapply`), or "base" (for `lapply`). But it doesn't matter as much if we do it via options,...
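
A minimal sketch of the value-to-backend mapping described in this comment, for illustration only. The helper name `choose_lapply()` is hypothetical and not part of quanteda; the three values and the functions they map to are the ones named above. `future.apply` is only needed if `"future"` is actually selected, because `switch()` evaluates only the matching branch.

```r
# Hypothetical helper (not a quanteda function): map a backend name to an
# lapply-style function, using the three values mentioned in the comment.
choose_lapply <- function(backend = c("base", "parallel", "future")) {
  backend <- match.arg(backend)
  switch(backend,
    base     = lapply,                        # sequential, always available
    parallel = parallel::mclapply,            # forking; no parallelism on Windows
    future   = future.apply::future_lapply    # requires the future.apply package
  )
}

# Usage: split two toy documents on spaces with the sequential backend.
texts <- c(d1 = "a b c", d2 = "d e")
apply_fun <- choose_lapply("base")
toks <- apply_fun(texts, function(x) strsplit(x, " ", fixed = TRUE)[[1]])
```

Because the choice is resolved from a single value, the calling code stays the same whichever backend is selected, which is the appeal of driving it from an option rather than from an argument to `tokens()`.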

Improved graphics for co-occurrence networks

Thanks! Good idea.

Proposal for extending tokens objects to store annotations

Thanks @jwijffels that's really useful. We want to achieve all of these, although 3 was not in my immediate plan (but I think the scheme could be extended to use...

Output of fcm(x, context = "window", count = "boolean")

Hi - Yes that is intended, although I can see how it could have been implemented differently. Think of window as a "width" where the default "document" size is the...

Output of fcm(x, context = "window", count = "boolean")

Thanks @odelmarcelle for explaining the use case. In earlier versions of what is now `quanteda.textstats::textstat_collocations()` we used a similar method for computing pmi for ordered, adjacent word co-occurrences. What @koheiw...

How to return tokens matching a dictionary lookup?

That's a good and quick ("kwic"? 😄) solution! But how would we deal with the nested dictionary issue, so that in d1, we don't match "not good" as "pattern =...

Allows adding customized rules to the ICU tokenizer

Fully agreed with @koheiw, thanks @odelmarcelle this is great. I have been slow in replying because I'm just getting over COVID but we will review this thoroughly soon.

topfeature function return weird result

Can you illustrate what you are trying to do, using a keyword and an analysis of some of the inaugural corpus texts, so that we can reproduce the issue?

topfeature function return weird result

Since `dfm_tfidf()` does not work on fcm objects, I can only think that you have created a dfm first to which you are applying `dfm_tfidf()` before creating the fcm. If...