Tom Aarsen comments

Results: 337 comments of Tom Aarsen

IndexError: string index out of range

@kootenpv The bug at display is definitely tricky - but it boils down to the following:

```python
>>> character = "İ"
>>> character
'İ'
>>> len(character)
1
>>> character.lower()
'i̇'
```

...
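The preview above is cut off, so what the comment goes on to show is not known. As a minimal sketch of one property of this character that is easy to verify in plain Python, the snippet below inspects the code points produced by `str.lower()`. The variable names `lowered`, `code_points` and `names` are introduced here for illustration only.

```python
import unicodedata

character = "İ"  # U+0130, LATIN CAPITAL LETTER I WITH DOT ABOVE
lowered = character.lower()

# List the individual code points of the lowercased string.
code_points = [f"U+{ord(ch):04X}" for ch in lowered]
names = [unicodedata.name(ch) for ch in lowered]

print(len(character), len(lowered))
print(code_points)
print(names)
```

Lowercasing turns the single code point into two (a plain `i` followed by a combining dot above), so an index computed on the original string is not guaranteed to be valid for the lowercased one.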

Suggestions and discussions regarding `pre-commit`

> Input coming from my time enabling various pre-commit hooks at my company: I believe that whoever adds a pre-commit should also be responsible for linting the whole repository with...

Suggestions and discussions regarding `pre-commit`

I quickly ran `mypy` over the `nltk` folder, which resulted in:

```diff
-Found 134 errors in 58 files (checked 345 source files)
```

(Note, I believe it's common for fixing...

Suggestions and discussions regarding `pre-commit`

`pyupgrade` seems to be a nice addition to modernize the NLTK codebase somewhat. It seems to err on the side of caution, only modifying where it's sure we'll be happy...

Suggestions and discussions regarding `pre-commit`

@dannysepler If I understood that blogpost correctly, then it proposes the inclusion of a file in the root directory called `.git-blame-ignore-revs`, so that users can use `git blame --ignore-revs-file .git-blame-ignore-revs...

Suggestions and discussions regarding `pre-commit`

> On another note, @iliakur mentioned [here](https://github.com/nltk/nltk/pull/2774#issuecomment-892528031) that pre-commit hooks ought to be added to the CI as well. This can supposedly be done with [pre-commit.ci](https://pre-commit.ci/), which mentions adding...

TweetTokenizer add new emoticons characters

@danielafe7-usp No, this is not (conveniently) possible. TweetTokenizer detects emoticons using a specific regular expression: https://github.com/nltk/nltk/blob/54221dec0bae2642d1642d182d8a381c88b86bd0/nltk/tokenize/casual.py#L57-L70 This regex will not match `:-*` (because `*` is not considered a valid option...
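To illustrate the general idea of an emoticon regex rejecting `:-*`, here is a rough sketch using only the standard library. The patterns `SIMPLE_EMOTICON` and `EXTENDED_EMOTICON` and the helper `is_emoticon` are hypothetical and heavily simplified; they are not the expression from `nltk/tokenize/casual.py`, and they do not show how `TweetTokenizer` itself could be configured.

```python
import re

# Hypothetical, simplified pattern for illustration only:
# eyes, an optional nose, then a mouth.
# This is NOT the regular expression used in nltk/tokenize/casual.py.
SIMPLE_EMOTICON = re.compile(r"[:;=8][\-o']?[)\](\[dDpP/]")

# The same sketch with `*` added to the set of accepted mouths.
EXTENDED_EMOTICON = re.compile(r"[:;=8][\-o']?[)\](\[dDpP/*]")


def is_emoticon(text, pattern=SIMPLE_EMOTICON):
    """Return True if the whole string matches the given emoticon pattern."""
    return pattern.fullmatch(text) is not None


print(is_emoticon(":-)"))
print(is_emoticon(":-*"))
print(is_emoticon(":-*", EXTENDED_EMOTICON))
```

In this sketch `:-)` is accepted, `:-*` is rejected by the first pattern because `*` is not in the mouth character class, and it is accepted once the class is extended.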

TweetTokenizer add new emoticons characters

@ajdapretnar I feel like this is a bit of a complex situation. To give some context:

```python
>>> string = "🇸🇮🤝🇺🇦"
>>> len(string)
5
>>> string[0]
'🇸'
```

Long story...
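The length of 5 can be unpacked by looking at the individual code points. The following is a small sketch using the standard `unicodedata` module; the variable name `names` is introduced here for illustration.

```python
import unicodedata

string = "🇸🇮🤝🇺🇦"

# Each flag is a pair of regional indicator code points,
# while the handshake is a single code point.
names = [unicodedata.name(ch) for ch in string]

for ch, name in zip(string, names):
    print(f"U+{ord(ch):04X}", name)
```

The two flags contribute two code points each and the handshake contributes one, which is why `len(string)` is 5 and `string[0]` is only half of the first flag.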

"Smart" issue

@stevenbird

```python
import nltk

def part_of_speech(word):
    token = nltk.word_tokenize(word)
    pos = nltk.pos_tag(token)
    return pos

print(part_of_speech('smart'))
print(part_of_speech('smarter'))
print(part_of_speech('smartest'))
print(part_of_speech('I am smart'))
print(part_of_speech('I am smarter'))
print(part_of_speech('I am smartest'))
print(part_of_speech('smart person'))
print(part_of_speech('smarter person'))
```

...

Type hinting / annotation (PEP 484)?

I would love to see Python 3.5+ type hinting now that we no longer support versions that break with this type hinting.
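For readers unfamiliar with PEP 484, a minimal sketch of what such annotations look like follows. The function `token_lengths` is hypothetical and is not part of NLTK; it only shows the annotation syntax using the `typing` module available since Python 3.5.

```python
from typing import List, Tuple


def token_lengths(tokens: List[str]) -> List[Tuple[str, int]]:
    """Pair each token with its length (hypothetical helper, not an NLTK function)."""
    return [(token, len(token)) for token in tokens]


print(token_lengths(["smart", "er"]))
```

Annotations like these are not enforced at runtime, but a checker such as `mypy` can use them to flag mismatched argument or return types.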