Delete the current sentence & word tokenizers/parsers

Open atimmer opened this issue 6 years ago • 0 comments

Explanation

The current sentence and word tokenizers/parsers take HTML into account. In https://github.com/Yoast/javascript/issues/406 we will build sentence and word parsers that assume the text no longer contains HTML.

Once all of the text analysis library code relies on the tree instead of the old (flawed) parsers, we can delete the old parsers.
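
To illustrate why dropping HTML handling simplifies things, here is a rough sketch of what plain-text tokenizers could look like. `splitSentences` and `splitWords` are hypothetical names, not the yoastseo API or the parsers planned in the linked issue, and the splitting rules are deliberately naive (no handling of abbreviations, quotes, or language-specific rules).

```javascript
// Hypothetical helpers for illustration only; not part of yoastseo.
// Both assume the input is plain text with all HTML already removed.

/**
 * Splits plain text into sentences.
 * Naive rule: a sentence ends at ".", "!" or "?" followed by whitespace.
 */
function splitSentences(text) {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((sentence) => sentence.trim())
    .filter((sentence) => sentence.length > 0);
}

/**
 * Splits a plain-text sentence into words.
 * A word is a run of letters, digits, apostrophes, or hyphens.
 */
function splitWords(sentence) {
  const matches = sentence.match(/[\p{L}\p{N}'’-]+/gu);
  return matches === null ? [] : matches;
}
```

For example, `splitSentences("Hello world. How are you?")` yields two sentences, and `splitWords("How are you?")` yields the three words without the trailing punctuation. Because there are no tags to skip, neither function needs to track markup state.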

Technical decisions

The files I am talking about are:

  • https://github.com/Yoast/javascript/blob/develop/packages/yoastseo/src/stringProcessing/SentenceTokenizer.js
  • https://github.com/Yoast/javascript/blob/develop/packages/yoastseo/src/stringProcessing/getWords.js

If this has not been done yet, we should also make sure that all the tests are implemented for the new parsers. Tests with HTML shouldn't be ported. Old tests:

  • https://github.com/Yoast/javascript/blob/develop/packages/yoastseo/spec/stringProcessing/getSentencesSpec.js
  • https://github.com/Yoast/javascript/blob/develop/packages/yoastseo/spec/stringProcessing/getWordsSpec.js

Feedback?

atimmer, Nov 20 '19 16:11