IFS: UTF-8 support is incomplete
Test script:
set -o noglob
# test 1
IFS='£'
set -- : :
v="${#},$*"
echo "$v"
# test 2
IFS='£' # £ = C2 A3
v='abc§def ghi§jkl' # § = C2 A7 (same initial byte)
set -- $v
v="${#},${1-},${2-},${3-}"
echo "$v"
Expected output (UTF-8 locale):
2,:£:
1,abc§def ghi§jkl,,
Output on ~~ksh Version AJM 93u+ 2012-08-01~~ current development version:
2,:£:
1,abc?def ghi?jkl,,
In the second test, the § characters get mangled.
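The comments in test 2 point out that £ and § share their initial byte. A quick way to confirm that is to dump both encodings. This is only a minimal sketch: `hexbytes` is a hypothetical helper name, and it assumes a POSIX `od` and `tr` and a script file saved as UTF-8.

```shell
# hexbytes is a hypothetical helper: print the bytes of "$1" as lowercase hex.
hexbytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \t\n'
}

# Assuming this file is saved as UTF-8:
hexbytes '£'; echo    # c2a3
hexbytes '§'; echo    # c2a7
```

Both dumps start with c2, so code that compares IFS one byte at a time can mistake the first byte of § for the start of the delimiter.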
For reference, output on the latest release Version AJM 93u+ 2012-08-01:
2,:?:
1,abc?def ghi?jkl,,
@McDutchie This is definitely broken. I'm pretty sure I commented earlier that IFS only works in single-byte locales, but I can't find that comment now. However, I'm seeing slightly different behavior from what you documented. I suspect you made a copy/paste mistake and meant to say that the first output you showed was from the current source; i.e., 2017.0.0-devel-....
I took your script and modified it slightly to make understanding the behavior easier. Note that god on my macOS system is the GNU od command:
set -o noglob
IFS='£' # 0xC2 0xA3
print -n "$IFS" | god -tx1z
set -- : :
v="${#},$*"
print -n "$v" | god -tx1z
v='ab§cd ef§gh' # § = 0xC2 0xA7 (same initial byte)
set -- $v
v="${#},${1-},${2-},${3-}"
print -n "$v" | god -tx1z
Output from ksh93u+ included with macOS and ksh93v-:
0000000 c2 a3 >£<
0000002
0000000 32 2c 3a c2 3a >2,::<
0000005
0000000 31 2c 61 62 a7 63 64 20 65 66 a7 67 68 2c 2c >1,abcd efgh,,<
0000017
Output from ksh built from the current source:
0000000 c2 a3 >£<
0000002
0000000 32 2c 3a c2 a3 3a >2,:£:<
0000006
0000000 31 2c 61 62 a7 63 64 20 65 66 a7 67 68 2c 2c >1,abcd efgh,,<
0000017
So it appears that somewhere along the line we fixed the first test, probably as a result of my replacing most of the AST locale code with the platform's locale support.
P.S. Whoever fixes this should be certain to add appropriate unit tests.
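As a starting point, the two cases from the original report could be turned into checks along these lines. This is a rough sketch, not the project's actual test harness: `expect_eq` and the `failures` counter are hypothetical names, and it assumes a UTF-8 locale and a script file saved as UTF-8.

```shell
# Hypothetical regression-test sketch; expect_eq is an illustrative helper,
# not part of the ksh test suite. Assumes a UTF-8 locale and UTF-8 source.
failures=0

expect_eq() {   # usage: expect_eq LABEL EXPECTED ACTUAL
    if [ "$2" = "$3" ]; then
        printf 'ok: %s\n' "$1"
    else
        printf 'FAIL: %s: expected "%s", got "%s"\n' "$1" "$2" "$3"
        failures=$((failures + 1))
    fi
}

# Each case runs in a command substitution (a subshell), so the changes to
# IFS, noglob and the positional parameters do not leak into the caller.
join_result=$(
    IFS='£'
    set -- : :
    printf '%s\n' "${#},$*"
)
split_result=$(
    set -o noglob
    IFS='£'                  # £ = C2 A3
    v='abc§def ghi§jkl'      # § = C2 A7 (same initial byte)
    set -- $v
    printf '%s\n' "${#},${1-},${2-},${3-}"
)

expect_eq 'multibyte IFS joins "$*"' '2,:£:' "$join_result"
expect_eq 'shared lead byte is left intact' '1,abc§def ghi§jkl,,' "$split_result"
printf 'failures: %s\n' "$failures"
```

On a shell that still treats IFS byte by byte, one or both checks should report FAIL, which is the point of the test.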