Problems with Regex when using SPARQL REPLACE function with certain characters
The following SPARQL query, which replaces %c3%85 with the letter Å, works as expected:
select REPLACE("%c3%85-XYZ-%20%28-DEF-%29","%C3%85", "Å", 'i') where {}
#Result: Å-XYZ-%20%28-DEF-%29
However, when REPLACE calls are nested and the outer call uses a regex containing ., the outer replacement is applied one character before the position where the match is found:
select REPLACE(
REPLACE("%c3%85-XYZ-%20%28-DEF-%29","%C3%85", "Å", 'i'),
"%..(%..)*", "?", 'i')
where {}
# Result: Å-XYZ?8-DEF?9
# Expected: Å-XYZ-?-DEF-?
This only happens for some replacement characters, including all of ÆØÅæøå.
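To make the "jump back one character" concrete, here is a minimal Python sketch (not Virtuoso code). It uses Python's `re` module as an independent reference for the expected result, then applies the same matches with every offset moved back by one character. The helper `replace_with_shift` is hypothetical and exists only for this illustration. The one-character shift happens to equal the one extra byte that Å occupies in UTF-8, but whether a byte/character offset mix-up is the actual cause inside Virtuoso is an assumption, not something confirmed in this thread.

```python
import re

# Output of the inner REPLACE; "\u00c5" is the letter Å.
s = "\u00c5-XYZ-%20%28-DEF-%29"
pattern = re.compile(r"%..(%..)*", re.IGNORECASE)

# Reference behaviour: what the outer REPLACE is expected to return.
expected = pattern.sub("?", s)


def replace_with_shift(text, regex, repl, shift):
    """Hypothetical helper: replace matches, but move every match span by `shift` characters."""
    out = []
    pos = 0
    for m in regex.finditer(text):
        start, end = m.start() + shift, m.end() + shift
        out.append(text[pos:start])
        out.append(repl)
        pos = end
    out.append(text[pos:])
    return "".join(out)


# Same matches, applied one character too early.
shifted = replace_with_shift(s, pattern, "?", -1)

# Difference between UTF-8 byte length and character length of the string.
extra_bytes = len(s.encode("utf-8")) - len(s)

print(expected)     # Å-XYZ-?-DEF-?
print(shifted)      # Å-XYZ?8-DEF?9
print(extra_bytes)  # 1
```

The shifted output is identical to the result reported above, which at least suggests the match itself is found correctly and only the offsets used for the substitution are off.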
A workaround is to wrap the inner REPLACE in a CONCAT with an empty string, which seems to "reset" the string before it is passed to the outer REPLACE:
select REPLACE(
CONCAT(REPLACE("%c3%85-XYZ-%20%28-DEF-%29","%C3%85", "Å", 'i'),""),
"%..(%..)*", "?", 'i')
where {}
# Result: Å-XYZ-?-DEF-?
This was tested with Virtuoso version 07.20.3212 on Linux (x86_64-unknown-linux-gnu), Single Server Edition, using the Virtuoso SPARQL Query Editor.
I stumbled upon a similar issue. When I use a non-ASCII character in the replacement, it ends up broken:
SELECT (REPLACE(".", "\\.", "á") AS ?s)
WHERE {}
# Result: á
# Expected: á
However, when STR() is applied to the replacement, which should in theory be a no-op, the result is correct:
SELECT (REPLACE(".", "\\.", STR("á")) AS ?s)
WHERE {}
# Result: á
# Expected: á
Unfortunately, this workaround doesn't help for all non-ASCII characters:
SELECT (REPLACE(".", "\\.", STR("š")) AS ?s)
WHERE {}
# Result: ?
# Expected: š
This seems to be a general problem of STR(), which I've filed as #609.
Tested using Virtuoso version 07.20.3217.
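One observation that may help narrow this down: á exists in Latin-1 (ISO 8859-1), while š does not. The Python sketch below only illustrates that difference. The helper `to_latin1` is hypothetical, and the idea that STR() narrows its argument to a single-byte charset is an assumption on my part, not something confirmed for Virtuoso.

```python
def to_latin1(text):
    """Hypothetical helper: force text through Latin-1, replacing unrepresentable characters with '?'."""
    return text.encode("latin-1", errors="replace").decode("latin-1")


print(to_latin1("\u00e1"))  # á survives, it has a Latin-1 code point
print(to_latin1("\u0161"))  # š becomes ?, it has no Latin-1 code point
```

Under that assumption, any character outside Latin-1 would come back as ?, which matches the š case above.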
Any progress on this @openlink?
This issue is still unresolved; I have asked development to schedule time for it ...