Fix encoding entities
What is the purpose of this pull request?
Bug fix
Does your PR contain necessary tests?
All patches that change the editor code must include tests. You can always read more on PR testing, how to set the testing environment and how to create tests in the official CKEditor documentation.
This PR contains
- [x] Unit tests
- [x] Manual tests
Did you follow the CKEditor 4 code style guide?
Your code should follow the guidelines from the CKEditor 4 code style guide which helps keep the entire codebase consistent.
- [x] PR is consistent with the code style guide
What is the proposed changelog entry for this pull request?
* [#4941](https://github.com/ckeditor/ckeditor4/issues/4941): Fix: Some entities are wrongly encoded when using `entities_processNumerical = true`.
What changes did you make?
The entities plugin used `String#charCodeAt()`, which returns an integer from 0 to 65535 representing the UTF-16 code unit at the given index. A character whose code point is greater than 65535 is stored as a surrogate pair, and `charCodeAt()` returns only one half of that pair, so the resulting numeric entity was wrong. To fix this I used `codePointAt()`, which returns the full Unicode code point value at the given index.
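A minimal sketch of the difference, using U+1F600 as a sample character outside the Basic Multilingual Plane. The `toNumericEntity()` helper is hypothetical and only illustrates the idea; it is not the plugin's actual code.

```javascript
// U+1F600 is stored in a JavaScript string as two UTF-16 code units:
// the high surrogate 0xD83D and the low surrogate 0xDE00.
var emoji = '\uD83D\uDE00';

// Hypothetical helper (assumption): builds a decimal numeric entity.
function toNumericEntity( code ) {
	return '&#' + code + ';';
}

// charCodeAt() sees only the high surrogate, so the entity is wrong.
var fromCodeUnit = toNumericEntity( emoji.charCodeAt( 0 ) );

// codePointAt() combines both surrogates into the real code point.
var fromCodePoint = toNumericEntity( emoji.codePointAt( 0 ) );

console.log( fromCodeUnit );  // &#55357;
console.log( fromCodePoint ); // &#128512;
```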
Another issue was that the regex was unable to match and replace such characters as a whole. To fix this I updated the existing entities regex by adding the `u` (Unicode) flag.
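The effect of the `u` flag can be sketched as follows. The character class below is a simplified stand-in chosen for illustration (an assumption), not the regex the entities plugin actually builds.

```javascript
var text = 'a\uD83D\uDE00b';

// Without the u flag the regex works on UTF-16 code units,
// so each half of the surrogate pair is matched separately.
var withoutFlag = text.match( /[^\x00-\x7F]/g );

// With the u flag the regex works on code points,
// so the surrogate pair is matched as a single character.
var withFlag = text.match( /[^\x00-\x7F]/gu );

// Replacing each match with a numeric entity now yields one entity.
var encoded = text.replace( /[^\x00-\x7F]/gu, function( character ) {
	return '&#' + character.codePointAt( 0 ) + ';';
} );

console.log( withoutFlag.length ); // 2
console.log( withFlag.length );    // 1
console.log( encoded );            // a&#128512;b
```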
Unfortunately, this does not work on IE because `codePointAt()` and the `u` flag are not supported there. Although it is possible to correctly compute the HTML entity from the values of the surrogate pair (see d148812), I was unable to find a way to support the Unicode flag on IE.
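For reference, a code point can be derived from a surrogate pair with the standard UTF-16 decoding formula, using only `charCodeAt()`. This is a sketch of that arithmetic under the assumption that the input is a valid pair; the `surrogatePairToCodePoint()` name is hypothetical and the snippet is not taken from the referenced commit.

```javascript
// Hypothetical helper (assumption): decodes a valid UTF-16 surrogate pair.
// high must be in 0xD800-0xDBFF and low in 0xDC00-0xDFFF.
function surrogatePairToCodePoint( high, low ) {
	return ( high - 0xD800 ) * 0x400 + ( low - 0xDC00 ) + 0x10000;
}

var pair = '\uD83D\uDE00';

// charCodeAt() is available in IE, so no ES2015 API is needed here.
var decoded = surrogatePairToCodePoint( pair.charCodeAt( 0 ), pair.charCodeAt( 1 ) );
var entity = '&#' + decoded + ';';

console.log( entity ); // &#128512;
```

This covers only the conversion step; matching such characters without the `u` flag remains the open problem described above.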
Which issues does your PR resolve?
Closes #4941.