Kenneth Benoit issues

Results: 104 issues of Kenneth Benoit

How to return tokens matching a dictionary lookup?

This comes from https://github.com/quanteda/quanteda.sentiment/issues/11, which is a more general question about how a function can return the set of original tokens matching a dictionary lookup, not just using `tokens_select()`, but...

question

dictionary

Rewrite quanteda.io pkgdown site

- [ ] Add an article about extending **quanteda**
  - suggestions on when to load which packages
  - how to use Imports, how to extend generics
- [ ] Update...

documentation

modularisation

Add convert(x, to = "kerasR") functionality

Following the discussion in #1138, we are thinking of extending `convert()` so that we could go from a dfm using:

```r
convert(anydfm, to = "kerasR")
```

Problem is we haven't...

question

dfm

compatibility

Add a learnr site

This looks awesome: https://rstudio.github.io/learnr/ Would be nice to integrate this with https://tutorials.quanteda.io.

documentation

POS feature selection

Add the ability to extract parts of speech (using OpenNLP) as features, as an option to dfm. This means we should think of modularising the objects that define dfm "features"....

enhancement

tokens

design

How to predict in advance a kwic search that will take a very long time?

What would be reasonable limits on what we allow a user to request from the pattern-matching functions? The problem appears to lie mainly in the number of patterns....

question

performance

Proposal for extending tokens objects to store annotations

This is a restart of #536, following on two use cases I've encountered in the past two days.

### Idea

Provide a way for a `tokens` object to store the...

Redesign docvars internals to improve efficiency and integrity

I've started a Request for Comment to serve as an ongoing discussion board, rather than a string of issues. See https://github.com/quanteda/quanteda/wiki/Proposal-for-changing-docvars. This will affect or resolve the following issues: -...

tokens

infrastructure

design

dfm

corpus

Add a vignette for a right-to-left language

Right-to-left languages pose special challenges for **quanteda** (and R in general) for tokenising and indexing, although this may depend on locale issues that are hard to test for us (since...

enhancement

tokens

Add sentence parsing exceptions as language-dependent global setting

Exceptions such as Mr., Dr., Prof., etc. are currently hard-wired into `tokenize.character()`. These could be listed for each language and made user-accessible through `settings()`.

enhancement

tokens