
GAP potentially puts linebreaks between the bytes forming a UTF-8 character

Open zickgraf opened this issue 2 years ago • 6 comments

Consider the following situation:

gap> SizeScreen([80]);;
gap> Display(" →→→→→→→→→→→→→→→→→→→→→→→→→→→→");
 →→→→→→→→→→→→→→→→→→→→→→→→→�\
�→→

Observed behaviour

GAP puts a linebreak between the bytes forming the UTF-8 character →. In particular, if this happens inside the output in a .tst file, the file is no longer valid UTF-8.

Expected behaviour

The linebreak is inserted before or after the UTF-8 character.

I expect that this is a known bug, but I could not find an open issue for this.

Copy and paste GAP banner (to tell us about your setup)

 ┌───────┐   GAP 4.13dev built on 2023-12-15 03:25:31+0100
 │  GAP  │   https://www.gap-system.org
 └───────┘   Architecture: x86_64-pc-linux-gnu-default64-kv9
 Configuration:  gmp 6.2.1, GASMAN, readline
 Loading the library and packages ...
 Packages:   AClib 1.3.2, Alnuth 3.2.1, AtlasRep 2.1.7, AutPGrp 1.11, Browse 1.8.21, CaratInterface 2.3.5, CRISP 1.4.6, Cryst 4.1.26, CrystCat 1.1.10, CTblLib 1.3.6, curlInterface 2.3.2, FactInt 1.6.3, FGA 1.5.0, Forms 1.2.9, 
             GAPDoc 1.6.6, genss 1.6.8, IO 4.8.2, IRREDSOL 1.4.4, LAGUNA 3.9.6, orb 4.9.0, Polenta 1.3.10, Polycyclic 2.16, PrimGrp 3.4.4, RadiRoot 2.9, recog 1.4.2, ResClasses 4.7.3, SmallGrp 1.5.3, Sophus 1.27, SpinSym 1.5.2, 
             StandardFF 1.0, TomLib 1.2.9, TransGrp 3.6.5, utils 0.84
 Try '??help' for help. See also '?copyright', '?cite' and '?authors'

zickgraf avatar Dec 15 '23 13:12 zickgraf

Technically, I don't think GAP promises to use UTF-8 -- someone could be using Latin-1 for example.

So, there are various things to decide -- do we want to change printing based on terminal config, or just decide that nowadays everyone wants UTF-8?

ChrisJefferson avatar Dec 17 '23 03:12 ChrisJefferson

Just some ideas: Maybe an efficient solution could be to not insert linebreaks at all if a string contains any characters outside of the range of printable ASCII characters. Or a partial solution could maybe restrict linebreaks to be inserted only between printable ASCII characters. But maybe that would lead to too many inconsistencies :/
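To make the second idea concrete, here is a minimal sketch of a position check, assuming the string is UTF-8 encoded. `SafeBreakPosition` is a hypothetical helper written for illustration, not an existing GAP function, and it does not reflect how GAP's output code actually decides where to break. It only relies on the fact that UTF-8 continuation bytes have values 128 to 191, so a break is safe exactly when the next byte is not one of them.

```gap
# Hypothetical helper (not part of GAP): return the largest position p <= width
# such that a break after byte p does not fall inside a UTF-8 character.
SafeBreakPosition := function( str, width )
    local p;
    p := Minimum( width, Length( str ) );
    # Bytes 128..191 (binary 10xxxxxx) are UTF-8 continuation bytes. If the
    # byte following the break is one of them, move the break to the left.
    while p > 0 and p < Length( str )
          and IntChar( str[p+1] ) >= 128 and IntChar( str[p+1] ) < 192 do
        p := p - 1;
    od;
    return p;
end;

# " →→" is 1 + 3 + 3 = 7 bytes when the source is UTF-8 encoded. A break after
# byte 3 would split the first arrow, so the helper backs up to position 1.
SafeBreakPosition( " →→", 3 );
```

This counts bytes rather than displayed columns, so a line broken this way may look shorter than the screen width; it only guarantees that no character is cut in half.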

zickgraf avatar Dec 18 '23 09:12 zickgraf

This has reminded me of a PR I never got around to finishing (I've just looked at resurrecting it, will need some poking):

https://github.com/gap-system/gap/pull/5140

This disables GAP's linebreaks entirely (the reason this is a bit less trivial than you might think is that GAP combines line breaks with indentation -- personally I never want GAP to line break, but always want it to indent). I'm going to work on polishing it up over the next few days, then we can see if it would solve this problem, and maybe write some docs for it.

ChrisJefferson avatar Dec 18 '23 10:12 ChrisJefferson

Ah, I wasn't aware of that PR. I like the idea very much, this would also solve other issues I have.

zickgraf avatar Dec 19 '23 19:12 zickgraf

I have now updated #5140, so it applies to master and has some basic documentation. You should be able to run SetPrintFormattingStatus("*stdout*", rec(linewrap := false, indent := true));, which should stop UTF-8 characters getting chopped, and in general stop GAP from wrapping lines itself (instead letting your terminal do its normal thing).

I'd be interested to hear whether this handles UTF-8 well, or whether there are any unexpected issues.

ChrisJefferson avatar Jan 11 '24 08:01 ChrisJefferson

Very nice, thanks a lot! I just tried out the PR: In a terminal I do not see problems with UTF-8 characters anymore :-) In a tst file, I don't think I can currently affect the formatting of the output stream (which I think is an OutputTextString), right? But I guess we could possibly introduce a new option for Test which sets the formatting once #5140 is merged? In any case, I think #5140 is a huge improvement! I will use it for my local GAP build and will report if anything weird shows up.
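For a string stream that you create yourself, the existing boolean form of SetPrintFormattingStatus can already be used, independently of #5140. The sketch below assumes the documented two-argument form, where `false` switches off both line breaking and indentation for that stream. It does not reach the stream that Test creates internally, so it does not solve the .tst case. The names `str` and `out` are only for illustration.

```gap
# Collect output in a string instead of printing to the terminal.
str := "";;
out := OutputTextString( str, true );;

# Boolean form: false disables GAP's line breaking and indentation
# for this stream only.
SetPrintFormattingStatus( out, false );

PrintTo( out, " →→→→→→→→→→→→→→→→→→→→→→→→→→→→\n" );
CloseStream( out );

# str now holds the text as written, with no line continuation inserted,
# so no UTF-8 character is split.
```

The trade-off is that indentation is lost as well, which is the limitation the record form in #5140 is meant to remove.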

zickgraf avatar Jan 11 '24 13:01 zickgraf