
Diacritics misplaced with the default renderer

Open jbezos opened this issue 2 years ago • 8 comments

This could also be considered a bug in the hyphenation patterns, but with HarfBuzz it works as expected. Here is an MWE:

\documentclass{article}

\patterns{ е1 }
% \patterns{ 2^^^^0308  }

\usepackage{fontspec}

\setmainfont{Noto Sans}[
  % Renderer=Harfbuzz,
  Script=Cyrillic, Language=Bulgarian]
  
\begin{document}

азе^^^^0308аз

\end{document}

The umlaut is shifted to the right, but with HarfBuzz it’s correctly placed. It also works if we prevent a hyphen just before the diacritic.

jbezos avatar Oct 28 '23 06:10 jbezos

With HarfBuzz it's correctly placed if no break occurs, but it still has the hyphenation point there and therefore allows line breaking between the е and the diacritic, which is pretty much guaranteed to be wrong.

Therefore I think that this is, at least additionally, a bug in the hyphenation patterns. It might make sense to do a post-hyphenation pass to validate that no automatically inserted hyphenation points fall in the middle of grapheme clusters, to avoid such issues in general, something like https://gist.github.com/zauguin/e119669fa702b112c704a9337b30d446/revisions. Additionally, it might make sense to do Unicode normalization before hyphenation in order to avoid patterns not working with non-normalized text.
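
A minimal sketch of what such a post-hyphenation pass could look like. This is not the code from the gist. It assumes LuaLaTeX with the `luacode` package, checks only the range U+0300–U+036F instead of real grapheme clusters, and the names `is_combining` and `drop_breaks_before_marks` are made up for illustration:

```latex
\documentclass{article}
\usepackage{fontspec}
\usepackage{luacode}

\patterns{ е1 } % same problematic pattern as in the MWE

\begin{luacode*}
local disc_id  = node.id("disc")
local glyph_id = node.id("glyph")

-- Assumption: only the Combining Diacritical Marks block is tested.
local function is_combining(c)
  return c >= 0x0300 and c <= 0x036F
end

-- Remove discretionaries that sit directly before a combining mark.
-- Discretionaries with replacement text are left alone so no text is lost.
local function drop_breaks_before_marks(head)
  local n = head
  while n do
    local nxt = n.next
    if n.id == disc_id and n.replace == nil
       and nxt and nxt.id == glyph_id and is_combining(nxt.char) then
      head = node.remove(head, n)
      node.free(n)
    end
    n = nxt
  end
  return head
end

luatexbase.add_to_callback("hyphenate", function(head, tail)
  lang.hyphenate(head, tail)      -- the default hyphenation step
  drop_breaks_before_marks(head)  -- then discard the bad break points
end, "drop hyphenation points before combining marks")
\end{luacode*}

\setmainfont{Noto Sans}[Script=Cyrillic, Language=Bulgarian]

\begin{document}
азе^^^^0308аз
\end{document}
```

Because the pass runs in the hyphenate callback, before the font is applied, it should not depend on which renderer is selected.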

zauguin avatar Oct 29 '23 14:10 zauguin

I too think that this is a bug in the patterns. The topic came up a few years ago here https://tex.stackexchange.com/a/340164/2388, and recently on the luatex user list for Greek.

If luaotfload could do some pre/post processing in the right place, that would imho be quite good.

u-fischer avatar Oct 29 '23 14:10 u-fischer

Not sure if this should belong in luaotfload. I don't really mind if we add it there, but hyphenation is not really in scope, and touching the hyphenate callback might also be problematic for non-LaTeX users of luaotfload.

zauguin avatar Oct 29 '23 14:10 zauguin

Then I’ll fix it (at least for the moment) on the babel side, although it has to be fixed eventually in the patterns. I think adding patterns like 8^^^^0308 systematically for all languages will be safe (and let’s hope 9 is not used 🤞). @reutenauer
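
For illustration, such inhibiting patterns might look like the following for a few combining marks (the selection is arbitrary, not a proposed list). In Liang's pattern scheme the highest digit at a position wins and even digits forbid a break, so an 8 overrides anything except a 9:

```latex
% Sketch only: forbid a break before some combining marks
% in the language that is current when \patterns is executed.
\patterns{
  8^^^^0300  % combining grave accent
  8^^^^0301  % combining acute accent
  8^^^^0308  % combining diaeresis
}
```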

jbezos avatar Oct 29 '23 15:10 jbezos

After thinking a little bit about this, with some attempts to deal with the issue, I’m not sure this is a task for the hyphenation patterns, because it’s not language dependent — no combination of ‹letter› and ‹combining char› can be hyphenated regardless of the language, and that’s true also for non-LaTeX formats. Repeating the full list of combining chars (there are ~100 of them) in every set of patterns ‘just in case’ doesn’t make much sense to me.

In my tests, attempting to fix it on the babel side costs about 0.2–0.3 seconds per language on my system (patterns cannot simply be added, because to avoid duplicates we must first check that there isn’t already a similar one).

So, I think again it should be fixed by luaotfload and for any renderer. As Ulrike said:

If luaotfload could do some pre/post processing in the right place, that would imho be quite good.

jbezos avatar Nov 09 '23 08:11 jbezos

@jbezos Do you see any reason why this couldn't become part of a separate package which would then be loaded by babel (and maybe polyglossia)? Otherwise, my plan here would be to create a package based on the gist mentioned earlier; then we are completely node independent and have it applied at a more appropriate time than if luaotfload tried to do this as part of shaping. It could potentially add an optional normalization step; I'm guessing non-normalized text isn't exactly helpful for hyphenation either.

zauguin avatar Nov 11 '23 14:11 zauguin

@zauguin With lualatex + babel there is no real need for a package, because a simple transform can do the trick:

\babelposthyphenation{english}{ |[{0300}-{036F}] }{ remove, {} }

(Here | is a discretionary.) But it’s another loop for what I think should be handled by the font renderer.

jbezos avatar Nov 21 '23 16:11 jbezos

This just came up again: https://tex.stackexchange.com/q/709020/2388

u-fischer avatar Feb 08 '24 10:02 u-fischer