
DrRacket does not combine "Combining Macron Below" with the previous character when rendering Unicode

Open · rxg opened this issue 3 years ago · 7 comments

On macOS, if I run (bytes->string/utf-8 (bytes #x77 #xcc #xb1)) in command-line racket, I get "w̱" (a w with an underline below it), but in DrRacket I get a w with the underline next to it, as though they were two separate characters.

See: https://en.wikipedia.org/wiki/Macron_below

For context, I noticed this while writing the word "Squamish" in its native orthography:

(define B (bytes #x53 #xe1 #xb8 #xb5 #x77 #x78 #xcc #xb1 #x77 #xc3 #xba #x37 #x6d #x65 #x73 #x68))

> (bytes->string/utf-8 B)
"Sḵwx̱wú7mesh"

In DrRacket, the k is rendered correctly with the line below it (to be fair, ḵ is a single precomposed Unicode character, U+1E35).

But the x is drawn with its line beside it rather than underneath.

Interestingly, calling string-length on the above string returns 12, but I'm guessing that is the appropriate answer if combining diacritical marks count toward the length.
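The count of 12 follows from string-length counting Unicode code points (Racket characters) rather than user-perceived characters. Below is a small sketch, not from the thread, that reproduces this. The helper count-base-chars is hypothetical and only approximates the visible character count by skipping nonspacing marks (general category Mn); it is not full Unicode grapheme segmentation.

```racket
#lang racket/base

;; The bytes from the original post, decoding to "Sḵwx̱wú7mesh".
;; ḵ is precomposed (U+1E35), but x̱ is "x" followed by
;; U+0331 COMBINING MACRON BELOW.
(define B (bytes #x53 #xe1 #xb8 #xb5 #x77 #x78 #xcc #xb1
                 #x77 #xc3 #xba #x37 #x6d #x65 #x73 #x68))
(define s (bytes->string/utf-8 B))

;; string-length counts code points, so U+0331 counts as its own char.
(string-length s) ; => 12

;; Hypothetical helper: count characters that are not nonspacing marks.
;; This only approximates "what the user sees"; it does not implement
;; Unicode grapheme cluster segmentation.
(define (count-base-chars str)
  (for/sum ([c (in-string str)])
    (if (eq? (char-general-category c) 'mn) 0 1)))

(count-base-chars s) ; => 11
```

Under this approximation the only character skipped is U+0331, since ú is also a precomposed character.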

rxg · Apr 26 '22 04:04

I think this has to do with the way the editor libraries draw text. It is possible to call draw-text in a way that makes the mark that's supposed to be under the "w" actually go under it (that's the combine? boolean passed to draw-text), but I don't know the ramifications of changing the editor library to use that drawing mode.

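#lang racket/gui

;; "w" followed by U+0331 COMBINING MACRON BELOW
(define str (bytes->string/utf-8 (bytes #x77 #xcc #xb1)))

;; paint-callback receives the canvas and its drawing context
(define (draw c dc)
  ;; combine? = #f: each character is drawn separately
  (send dc draw-text str 20 10 #f)
  ;; combine? = #t: combining characters may be drawn as one glyph with the preceding character
  (send dc draw-text str 20 40 #t))

(define f (new frame% [label ""] [width 400] [height 400]))
(define c (new canvas% [parent f] [paint-callback draw]))
(send f show #t)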

rfindler · Apr 26 '22 14:04

Thanks Robby! Curiously, if I put the full string into the above code, the second drawing (at 20, 40) renders differently on the canvas than in my DrRacket interaction from the original post. On the canvas at 20, 40 I see the second underline below the x that follows the w, whereas in my DrRacket interaction I see what looks like "w_x", with nothing above the underline. I don't understand how this interacts with fonts or font size, so the difference may come from the default canvas font not being the same as the one DrRacket uses (my font size is definitely different).

rxg · Apr 26 '22 19:04

Oh, yeah, good point! I see something different there too, and I'm not sure what's going on, actually. I've attached a screenshot; is that similar to what you're seeing? (The definitions window contains the above code, so str is the same string that's being passed to draw-text.)

[Screenshot: Screen Shot 2022-04-26 at 3 05 38 PM]

rfindler · Apr 26 '22 20:04

Yes, that's what I saw when I ran the code!

rxg · Apr 27 '22 00:04

Not sure it's helpful, but under Linux it appears to draw in only two different ways (I'm guessing the difference in this screenshot comes from the font).

[Screenshot: Screen Shot 2022-04-27 at 9 31 15 AM]

rfindler · Apr 27 '22 14:04

And here's Windows; it also looks like things get drawn in only two ways.

[Screenshot: Screen Shot 2022-04-27 at 9 57 41 AM]

rfindler · Apr 27 '22 14:04

It looks like a hard problem. There are a few similar reports; I'm posting them in case they give some hint:

  • DrRacket result prints "eé" instead of "ée" --> https://github.com/racket/drracket/issues/46

  • draw-text prints "p̂p" instead of "pp̂" --> https://github.com/racket/draw/issues/22

  • DrRacket editor shows "กO ำ" instead of "กำ" --> https://github.com/racket/drracket/issues/478

[In the first and second reports above, did the accent move in different directions, or am I just misinterpreting them?]

gus-massa · May 16 '22 00:05