IFS: UTF-8 support is incomplete

Open McDutchie opened this issue 6 years ago • 1 comment

Test script:

set -o noglob

# test 1
IFS='£'
set -- : :
v="${#},$*"
echo "$v"

# test 2
IFS='£'			# £ = C2 A3
v='abc§def ghi§jkl'	# § = C2 A7 (same initial byte)
set -- $v
v="${#},${1-},${2-},${3-}"
echo "$v"

Expected output (UTF-8 locale):

2,:£:
1,abc§def ghi§jkl,,

Output on ~~ksh Version AJM 93u+ 2012-08-01~~ current development version:

2,:£:
1,abc?def ghi?jkl,,

In the second test, the § characters get mangled.

For reference, output on the latest release Version AJM 93u+ 2012-08-01:

2,:?:
1,abc?def ghi?jkl,,

McDutchie avatar Aug 04 '19 00:08 McDutchie

@McDutchie This is definitely broken. I'm pretty sure I had commented earlier that IFS only works for single-byte locales, but I can't find that comment now. However, I'm seeing slightly different behavior than you documented. I suspect you made a copy/paste mistake and meant to say that the first output you showed was from the current source; i.e., 2017.0.0-devel-....

I took your script and modified it slightly to make understanding the behavior easier. Note that god on my macOS system is the GNU od command:

set -o noglob
IFS='£'  # 0xC2 0xA3
print -n "$IFS" | god -tx1z

set -- : :
v="${#},$*"
print -n "$v" | god -tx1z

v='ab§cd ef§gh'  # § = 0xC2 0xA7 (same initial byte)
set -- $v
v="${#},${1-},${2-},${3-}"
print -n "$v" | god -tx1z

Output from ksh93u+ included with macOS and ksh93v-:

0000000 c2 a3                                            >£<
0000002
0000000 32 2c 3a c2 3a                                   >2,::<
0000005
0000000 31 2c 61 62 a7 63 64 20 65 66 a7 67 68 2c 2c     >1,abcd efgh,,<
0000017

Output from ksh built from the current source:

0000000 c2 a3                                            >£<
0000002
0000000 32 2c 3a c2 a3 3a                                >2,:£:<
0000006
0000000 31 2c 61 62 a7 63 64 20 65 66 a7 67 68 2c 2c     >1,abcd efgh,,<
0000017

So somewhere along the line it appears we fixed the first test, probably as a result of my replacing most of the AST locale code with the platform's locale support. The second test is still broken: in both outputs each § (c2 a7) comes back as a lone a7 byte.
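
To make the byte pattern in those dumps easier to see, here is a minimal sketch using only standard tools. It is not ksh's splitting code; it only shows that £ and § share the lead byte c2 and what is left when that byte alone is removed. The helper name `hexbytes` is hypothetical, and the characters are built from octal escapes so the result does not depend on how this text is encoded.

```shell
# hexbytes is a hypothetical helper: print the bytes of "$1" as hex
# with no separators.
hexbytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

pound=$(printf '\302\243')     # £ = c2 a3
section=$(printf '\302\247')   # § = c2 a7 (same lead byte as £)

hexbytes "$pound"; echo        # c2a3
hexbytes "$section"; echo      # c2a7

# Simulate a byte-oriented consumer that drops every c2 byte.
# LC_ALL=C forces tr to work on bytes rather than characters.
stripped=$(printf 'ab%scd' "$section" | LC_ALL=C tr -d '\302')
hexbytes "$stripped"; echo     # 6162a76364
```

The last line matches the `61 62 a7 63 64` sequence at the start of the second-test dump: the continuation byte a7 survives without its lead byte, which is why a UTF-8 terminal shows `?` in its place.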

P.S. Whoever fixes this should be certain to add appropriate unit tests.
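
As a starting point, a regression check for the second test could look roughly like the sketch below. It is an assumption about shape only: it does not follow the ksh test suite's conventions, `split_report` is a hypothetical helper, and the expected value is the one from the original report, which assumes a UTF-8 locale.

```shell
# split_report is a hypothetical helper, not part of the ksh test suite.
# It splits "$2" using "$1" as IFS and prints "count,field1,field2,field3".
# The body runs in a subshell so IFS and noglob do not leak to the caller.
split_report() (
    set -o noglob              # keep pathname expansion out of the split
    IFS=$1
    set -- $2                  # unquoted on purpose: this is the field split
    printf '%s\n' "${#},${1-},${2-},${3-}"
)

pound=$(printf '\302\243')     # £ = c2 a3
sect=$(printf '\302\247')      # § = c2 a7, same lead byte as £

input="abc${sect}def ghi${sect}jkl"
expected="1,${input},,"        # § is not in IFS, so nothing should split
actual=$(split_report "$pound" "$input")

if [ "$actual" = "$expected" ]; then
    echo "ok: multibyte IFS left the input intact"
else
    echo "FAIL: multibyte IFS altered the input"
fi
```

The result depends on the shell and the locale it runs under, so no output is shown here. A similar check for the first test would compare "$*" after `set -- : :` against `:£:`.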

krader1961 avatar Aug 04 '19 04:08 krader1961