Delete the current sentence & word tokenizers/parsers
Explanation
The current sentence and word tokenizers/parsers take HTML into account. In https://github.com/Yoast/javascript/issues/406 we will build parsers for sentences and words that assume there is no HTML in the text anymore.
Once all of the text analysis library code relies on the tree instead of the old (flawed) parsers, we can delete the old parsers.
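To illustrate why dropping HTML support simplifies things, here is a minimal sketch of what tokenizers could look like when they can assume plain text. This is not the code from the linked issue or from yoastseo; `splitSentences` and `splitWords` are hypothetical names, and the splitting rules are deliberately naive (no handling of abbreviations, quotes, or language-specific rules).

```javascript
// Hypothetical sketch, not the yoastseo implementation.
// Assumption: the input is plain text; any HTML has already been
// handled elsewhere (e.g. by the tree), so no tag handling is needed.

/**
 * Splits plain text into sentences.
 * Naive rule: a sentence ends at ".", "!" or "?" followed by whitespace.
 *
 * @param {string} text Plain text without HTML.
 * @returns {string[]} The sentences found in the text.
 */
function splitSentences( text ) {
	return text
		.split( /(?<=[.!?])\s+/ )
		.map( ( sentence ) => sentence.trim() )
		.filter( ( sentence ) => sentence.length > 0 );
}

/**
 * Splits a plain-text sentence into words.
 * Splits on whitespace, then strips leading and trailing characters
 * that are neither letters nor digits (i.e. punctuation).
 *
 * @param {string} sentence Plain text without HTML.
 * @returns {string[]} The words found in the sentence.
 */
function splitWords( sentence ) {
	return sentence
		.split( /\s+/ )
		.map( ( word ) => word.replace( /^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/gu, "" ) )
		.filter( ( word ) => word.length > 0 );
}
```

Because neither function has to recognize tags, attributes, or block boundaries, each stays a few lines long. The real parsers will need more careful sentence-boundary rules than this sketch shows.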
Technical decisions
The files I am talking about are:
- https://github.com/Yoast/javascript/blob/develop/packages/yoastseo/src/stringProcessing/SentenceTokenizer.js
- https://github.com/Yoast/javascript/blob/develop/packages/yoastseo/src/stringProcessing/getWords.js
If this has not been done yet, we should also make sure that the old tests are ported to the new parsers. Tests that involve HTML shouldn't be ported. Old tests:
- https://github.com/Yoast/javascript/blob/develop/packages/yoastseo/spec/stringProcessing/getSentencesSpec.js
- https://github.com/Yoast/javascript/blob/develop/packages/yoastseo/spec/stringProcessing/getWordsSpec.js