Results 25 comments of Arnold Robbins

You ask:

> I wonder why (onetrueawk) awk's length() function counts bytes instead of characters

The answer is that onetrueawk is not POSIX compliant, at least in this regard. Check...
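To make the bytes-versus-characters distinction concrete, here is a minimal sketch in Python rather than awk. The function names `byte_length` and `char_length` are made up for illustration and are not part of any awk implementation; the sketch only assumes the text is encoded as UTF-8.

```python
# Illustrative only: contrasts a byte count with a character count for
# UTF-8 text. Both helper names are hypothetical.

def byte_length(s: str) -> int:
    """Number of bytes in the UTF-8 encoding of s."""
    return len(s.encode("utf-8"))


def char_length(s: str) -> int:
    """Number of characters (Unicode code points) in s."""
    return len(s)


# "\u00ef" is a single character that takes two bytes in UTF-8.
sample = "na\u00efve"
print(byte_length(sample), char_length(sample))
```

A `length()` that counts bytes would report the first number for this string; one that counts characters would report the second.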

Hi. I am afraid that I disagree with you. Firstly, POSIX compatibility in this day and age wins out over backwards compatibility. Users of Asian languages (for example) expect these...

Gawk goes to a lot of work to make string handling efficient. First, strings are reference counted, to avoid lots of copying. Next, gawk caches the value of MB_CUR_MAX since...

In answer to your last question, gawk doesn't store a flag for a string being ASCII / non-ASCII. The decision is based exclusively on MB_CUR_MAX being == 1 or >...
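The locale-level decision described above might be sketched roughly as follows. This is Python, not gawk's actual C code: the function name `awk_like_length`, its parameters, and the use of `bytes.decode` are all assumptions made for illustration. The idea is only that the choice depends on the locale's maximum bytes per character (`MB_CUR_MAX` in C), not on a per-string flag.

```python
# Hypothetical sketch, not gawk's implementation: pick the counting
# strategy from the locale's MB_CUR_MAX value alone.

def awk_like_length(data: bytes, mb_cur_max: int, encoding: str = "utf-8") -> int:
    """Return the length of data as a length()-style function might."""
    if mb_cur_max == 1:
        # Single-byte locale: every byte is one character, so the
        # byte count is already the answer.
        return len(data)
    # Multibyte locale: decode and count characters instead.
    return len(data.decode(encoding))


raw = "na\u00efve".encode("utf-8")
print(awk_like_length(raw, 1), awk_like_length(raw, 4))
```

Note that a pure-ASCII string still goes through the multibyte path whenever the locale is multibyte, which matches the point that no ASCII / non-ASCII flag is stored per string.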

I note that the regexp, while syntactically valid, is semantically nonsense. `$` and `^` are always special in extended regular expressions, and this regexp has no meaning. Maybe the script...

There is no formal way to say "matches a zero width string". The traditional regexp `^$` is "the beginning of the string immediately followed by the end of the string",...

You have just gone down a very dark, deep, and twisty rabbit hole. As in "Alice In Wonderland". I am not going to follow you down there. :-) Wearing my...

This is really fodder for the POSIX committee. Maybe they even have something to say about it already. But that is whose lead I would follow. My two cents.

@plan9 The grammar is definitely an area where "Here there be dragons." Tread very, very carefully.

I will note that POSIX doesn't require this wart, and gawk only implements it with the `--traditional` option. I suspect it's actually a safe change. Or you might could condition...