Name parser's incorrect handling of certain accents such as \\H

Open pszolovits opened this issue 6 years ago • 0 comments

This is a wonderful package, but I have run into a problem: in TeX and LaTeX, one can specify accents such as \H either in the form \H{o} or in the form \H o. The NameParser code splits the second form incorrectly:

biblib.algo.parse_names("Sz\\H{o}l\\H{o}, Abel and Sz\\H ol\\H o, Baker")
[Name(first='Abel', von='', last='Sz\\H{o}l\\H{o}', jr=''), Name(first='Baker', von='Sz\\H ol\\H', last='o', jr='')]

For reasons I don't understand, surrounding only the second \H o with braces makes Baker's name parse correctly, even though the first \H o still contains a space:

biblib.algo.parse_names("Sz\\H ol{\\H o}, Baker")
[Name(first='Baker', von='', last='Sz\\H ol{\\H o}', jr='')]

This particular problem can be solved by converting to Unicode first:

biblib.algo.parse_names(biblib.algo.tex_to_unicode("Sz\\H{o}l\\H{o}, Abel and Sz\\H ol\\H o, Baker"))
[Name(first='Abel', von='', last='Szőlő', jr=''), Name(first='Baker', von='', last='Szőlő', jr='')]

However, that approach strips out braces that are still needed, for example, to mark an institution name so that it is parsed entirely as a last name instead of being split into name components. E.g.,

biblib.algo.parse_names(biblib.algo.tex_to_unicode("Sz\\H ol\\H o, Baker and {NRC Committee}"))
[Name(first='Baker', von='', last='Szőlő', jr=''), Name(first='NRC', von='', last='Committee', jr='')]
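A possible workaround that keeps the braces is to rewrite the spaced form \H o into the braced form \H{o} before parsing, since the braced form parses correctly in the first example. The sketch below is only an illustration: the helper name brace_accent_args and the list of accent commands are my own assumptions, not part of biblib, and it only handles a single ASCII letter as the accent argument.

```python
import re

# Assumed (not exhaustive) set of one-letter TeX accent commands that take
# an argument: \H \b \c \d \k \r \t \u \v
ACCENT_LETTERS = "Hbcdkrtuv"

# Matches e.g. "\H o": backslash, accent letter, whitespace, one ASCII letter.
_SPACED_ACCENT = re.compile(r"\\([%s])\s+([A-Za-z])" % ACCENT_LETTERS)


def brace_accent_args(tex):
    """Rewrite '\\H o' as '\\H{o}' so no bare space follows an accent command.

    Hypothetical helper; forms that are already braced are left untouched.
    """
    return _SPACED_ACCENT.sub(r"\\\1{\2}", tex)


if __name__ == "__main__":
    print(brace_accent_args("Sz\\H ol\\H o, Baker and {NRC Committee}"))
```

The normalized string could then be passed to biblib.algo.parse_names in place of the original; I have not verified the parser's output on it beyond what the examples above show for the braced form.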

I think that the NameParser's algorithm must special-case accent specifiers such as \H, \r, \u, \v, etc., to be sure not to split tokens on spaces following them, just as it now special-cases spaces at a brace depth > 0.
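To make the suggestion concrete, here is a rough standalone sketch of a splitter that treats whitespace after an accent command the same way as whitespace inside braces. It is not biblib's actual NameParser code; the function name split_top_level and the ACCENTS set are hypothetical, and the real parser does more than split on whitespace.

```python
# Assumed (not exhaustive) set of one-letter TeX accent command names.
ACCENTS = set("Hbcdkrtuv")


def split_top_level(text):
    """Split on whitespace at brace depth 0, except directly after an accent.

    Hypothetical illustration only, not biblib's implementation.
    """
    tokens, current = [], []
    depth = 0
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == "\\" and i + 1 < n:
            # Read a control sequence: a run of letters, or one non-letter.
            j = i + 1
            if text[j].isalpha():
                while j < n and text[j].isalpha():
                    j += 1
            else:
                j += 1
            name = text[i + 1:j]
            current.append(text[i:j])
            i = j
            if name in ACCENTS:
                # Keep whitespace after the accent inside the current token,
                # provided an argument actually follows it.
                k = i
                while k < n and text[k].isspace():
                    k += 1
                if k < n:
                    current.append(text[i:k])
                    i = k
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
        if ch.isspace() and depth == 0:
            # Top-level whitespace ends the current token.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
        i += 1
    if current:
        tokens.append("".join(current))
    return tokens


if __name__ == "__main__":
    print(split_top_level("Sz\\H ol\\H o, Baker"))
```

With this rule, Sz\H ol\H o, stays a single token, while braced groups such as {NRC Committee} are kept together exactly as before.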

pszolovits, Sep 28 '19 17:09