non-breakable space and preview
Hi. I attached an example of csv file; it has only three rows. Row 1 and 3 have a non-breakable space at the end, Row 2 has a normal space instead (notepad++ screenshot):

I checked the option 'Trim leading & trailing whitespace from strings' before to create a project. It fails because I think it can't recognize non-breakable spaces. Once the project is created, the trim() command deletes spaces without any problems. Can you confirm that? Thanks a lot,
Fabio
Hi @fabiolinus, perhaps it would be easier to reproduce this if you could provide an example file (as a file, not a string in the issue, because GitHub might be sanitizing things).
I had a look at the code after @fabiolinus posted this on the mailing list and found (if I'm reading it correctly) that 'Trim leading & trailing whitespace from strings' on import and the GREL trim() command work differently.
- Trim leading & trailing whitespace from strings -> defined in TabularImportingParserBase https://github.com/OpenRefine/OpenRefine/blob/cb55cdfdf6f9ca916839778dc847cce803688998/main/src/com/google/refine/importers/TabularImportingParserBase.java#L187 which uses the Java String 'trim()' function )defines whitespace as any character with a code point less than or equal to the space code point (\u0020) )
- GREL 'trim()' -> uses com.google.common.base.CharMatcher.whitespace() which defines whitespace as any unicode whitespace character (see https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7Bwhitespace%7D)
The trim() command originally used Java String 'trim()' but was changed to the CharMatcher some time ago (12 years!) exactly because Java String trim() doesn't include non-breaking spaces (https://github.com/OpenRefine/OpenRefine/commit/f29f77e8f8a0024deb50b3a13dec2874c859a7fb)
Brilliant, thanks for the investigation! Then this is probably a good first issue.
I think #5105 is a superset of this.
And please, please remember me #4425
Sorry Antonin. I updated my post in github. Fabio Il domenica 11 settembre 2022 23:22:04 CEST, Antonin Delpeuch @.***> ha scritto:
Hi @fabiolinus, perhaps it would be easier to reproduce this if you could provide an example file (as a file, not a string in the issue, because GitHub might be sanitizing things).
— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>
Hey @wetneb If no one is working on this, can I pick this up? I see this is a good first issue and would like to use this opportunity to work on this.
Okay! As @tfmorris mentioned above this would probably be solved by #5105 so it is worth considering tackling that instead :)
Thanks! seems like someone is already working on this.
Yes, sure. Sorry again, Fabio Il domenica 11 settembre 2022 23:22:04 CEST, Antonin Delpeuch @.***> ha scritto:
Hi @fabiolinus, perhaps it would be easier to reproduce this if you could provide an example file (as a file, not a string in the issue, because GitHub might be sanitizing things).
— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>
Closed by #5336