liquid Add support in regular expressions for UTF-8 whitespace detection

We ran into a nasty bug at Braze: a customer supplied a Liquid template containing UTF-8 non-breaking space characters, and it took a very long time to work out why it was not parsing correctly. The current regular expressions do not count those characters as whitespace: in Ruby, \s matches only ASCII whitespace, while [[:space:]] matches both ASCII and Unicode whitespace, including the non-breaking space (U+00A0).
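For reference, the difference between the two character classes can be reproduced in plain Ruby without Liquid. This is only a minimal sketch: the sample string and variable names are made up for illustration and are not taken from the Liquid codebase.

```ruby
# U+00A0 (non-breaking space), written as an escape so it is visible in source.
NBSP = "\u00A0"

# Hypothetical input: two words separated by a non-breaking space.
text = "a#{NBSP}b"

# \s only covers ASCII whitespace, so it does not see the NBSP.
ascii_match = /\s/.match?(text)

# The POSIX bracket class is Unicode-aware in Ruby and does match it.
posix_match = /[[:space:]]/.match?(text)

# Splitting shows the practical effect on tokenizing.
split_ascii = text.split(/\s+/)
split_posix = text.split(/[[:space:]]+/)

puts "\\s matches NBSP:          #{ascii_match}"
puts "[[:space:]] matches NBSP: #{posix_match}"
puts "split on \\s+:          #{split_ascii.length} token(s)"
puts "split on [[:space:]]+: #{split_posix.length} token(s)"
```

With \s the string stays a single token, which is roughly how a tag that looks correctly spaced can still fail to parse.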

I replaced \s with [[:space:]] everywhere, but I added only a single test case, which red-greens against the existing code. Getting full coverage of every possibility seemed excessive, although I'm open to implementing more thorough tests if they're needed before merging.

Co-authored-by: Chris Watkins [email protected]

Feb 15 '24 20:02 zachmccormick

I have signed the CLA!

Feb 15 '24 20:02 zachmccormick

It may also be smart to replace \w with [[:word:]] so that non-ASCII word characters are handled properly as well. However, I would imagine those are easier to spot visually and are less likely to be used accidentally.
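To illustrate the \w case, here is a small plain-Ruby sketch. The sample word is an arbitrary example and the variable names are hypothetical; nothing here is taken from the Liquid source.

```ruby
# "café" with a precomposed U+00E9, written as an escape to avoid ambiguity.
word = "caf\u00E9"

# \w is limited to [a-zA-Z0-9_], so the match stops before the accented letter.
ascii_words = word.scan(/\w+/)

# [[:word:]] is Unicode-aware and keeps the whole word together.
unicode_words = word.scan(/[[:word:]]+/)

puts "\\w+ length of first match:          #{ascii_words.first.length}"
puts "[[:word:]]+ length of first match: #{unicode_words.first.length}"
```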

Feb 15 '24 20:02 zachmccormick