No UTF-8?

Open ghost opened this issue 7 years ago • 7 comments

Seems like this rc is really bad at everything other than ASCII characters in interactive mode: it enters some weird state when I start typing in another language. [image] Is it even fixable? rc from plan9 works correctly in a tty; in other terminals it thinks that non-standard characters are double-width, so it erases them incorrectly on backspace.

ghost avatar May 20 '18 19:05 ghost

I doubt this is rc's fault. It's almost entirely ignorant of character encodings, but since it largely slings around uninterpreted bytestrings it gets away with it. (I'm a bit surprised that ? globs seem to work correctly in the presence of multibyte characters as that's one place where I would expect to need minimal UTF-8 support.)

First thing to check is that your locale settings are correct. What is $LANG ?

TobyGoodwin avatar May 30 '18 21:05 TobyGoodwin

Yeah, seems like my locale.conf wasn't being read by rc, so I had to add LANG to .rcrc manually.

ghost avatar May 31 '18 04:05 ghost

Another problem: [image]

ghost avatar May 31 '18 04:05 ghost

Ah yes, thank you! In this case, the command name is being deliberately scrambled by protect() in which.c. It wants to avoid non-printing characters, but uses the ASCII-only isprint(). I reckon I can fix that to handle UTF-8 fairly easily (without need to drag in libicu, for instance). I then worry that we're being UTF-8-centric and what about the -16 and -32 encodings? I simply don't have enough experience to tackle those sensibly. If anyone does, do send a Pull Request!

TobyGoodwin avatar May 31 '18 08:05 TobyGoodwin
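
As an illustration of the UTF-8 fix discussed above, here is a minimal sketch of a filter that still replaces ASCII control characters but passes bytes with the high bit set through untouched. It is not rc's actual protect(); the name protect_utf8, its signature, and the '?' replacement character are all assumptions made for this example. It also does not validate the UTF-8 it passes through, and truncation at the end of a small buffer could split a multibyte sequence.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, NOT the protect() from rc's which.c.
 * Copies src into dst (at most dstlen - 1 bytes plus a terminator),
 * replacing non-printing ASCII bytes with '?'. Bytes >= 0x80 are
 * copied unchanged so that UTF-8 multibyte sequences survive. */
static void protect_utf8(const char *src, char *dst, size_t dstlen) {
    size_t i = 0;

    if (dstlen == 0)
        return;
    for (; *src != '\0' && i + 1 < dstlen; src++) {
        unsigned char c = (unsigned char)*src;
        if (c >= 0x80 || isprint(c))
            dst[i++] = (char)c;   /* printable ASCII or part of a UTF-8 sequence */
        else
            dst[i++] = '?';       /* ASCII control character */
    }
    dst[i] = '\0';
}
```

The trade-off is that any high-bit byte is trusted, including invalid sequences and C1 control codes encoded as UTF-8, which is one argument for simply dropping the filter instead.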

I'm not sure how this works, but if possible I would like to avoid having anything higher than UTF-8, especially since you don't have enough experience. Most of the time the simpler solution is more robust.

ghost avatar May 31 '18 08:05 ghost

Just saw this thread... I wonder why we bother with protect() at all? Removing it is the simplest solution, and it's in line with rc punting on all UTF-8 issues (for now).

rakitzis avatar Jun 18 '18 17:06 rakitzis

I think I wrote protect() when I was much younger. If so, it was in response to some hostile environment or other (might well have been a Windows 3.1 terminal emulation + telnet) and It Seemed Like A Good Idea At The Time.

TobyGoodwin avatar Jul 27 '18 07:07 TobyGoodwin

Should we get rid of protect(), @rakitzis?

xyb3rt avatar Jun 11 '23 11:06 xyb3rt

It's not useful as it stands. Please remove it.

rakitzis avatar Jun 11 '23 17:06 rakitzis

Looking into this, I saw that env -i rc behaves the same as env -i sh when built with EDIT=null on my system. I only got the behaviour from the original comment when building with EDIT=readline. It might be worth looking into other interactive programs that use readline (e.g. python) to see how they behave and how they get it right.

xyb3rt avatar Jun 11 '23 19:06 xyb3rt

Python gets it right. This is what they're doing: https://peps.python.org/pep-0538/

xyb3rt avatar Jun 12 '23 10:06 xyb3rt

OK does this boil down to something simple that can be done for rc?

rakitzis avatar Jun 12 '23 13:06 rakitzis

On Linux it basically boils down to overwriting LC_CTYPE to C.UTF-8 if it is C or POSIX at startup. Unfortunately on other systems the value needs to be slightly different.

I'll vote for doing nothing, because we can argue that an LC_CTYPE of C or POSIX asks for this behaviour.

xyb3rt avatar Jun 12 '23 20:06 xyb3rt
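
For reference, the coercion described above could be sketched roughly as follows. This is not code from rc or from CPython; the function names is_c_locale and coerce_ctype are made up for the example, and the candidate list follows the target locales named in PEP 538 (C.UTF-8, C.utf8, UTF-8). The sketch only changes the locale of the current process; exporting LC_CTYPE to child processes is left out.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if name is the plain "C" or "POSIX" locale. */
static int is_c_locale(const char *name) {
    return name != NULL &&
           (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0);
}

/* Hypothetical sketch of PEP 538 style coercion.
 * Adopts the environment's LC_CTYPE; if that resolves to C or POSIX,
 * tries a list of UTF-8 capable replacements instead.
 * Returns 1 if a replacement locale was installed, 0 otherwise. */
static int coerce_ctype(void) {
    static const char *const targets[] = { "C.UTF-8", "C.utf8", "UTF-8" };
    const char *cur = setlocale(LC_CTYPE, "");
    size_t i;

    if (!is_c_locale(cur))
        return 0;   /* a real locale is configured (or it is invalid): leave it */
    for (i = 0; i < sizeof targets / sizeof targets[0]; i++) {
        if (setlocale(LC_CTYPE, targets[i]) != NULL)
            return 1;   /* the system accepted this target */
    }
    return 0;           /* no UTF-8 capable fallback available */
}
```

Which of the three names succeeds, if any, depends on the system, which is the portability wrinkle mentioned above.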

I'm closing this, because rc is encoding-agnostic and works correctly with a properly configured locale.

xyb3rt avatar Jun 14 '23 06:06 xyb3rt