
twitter: replace unicodeit with unicodeitplus

Open • GraemeWatt opened this issue 2 years ago • 1 comment

The new unicodeitplus package looks better suited than unicodeit to converting LaTeX expressions in paper titles to Unicode for the purpose of tweeting. It overcomes some of the limitations of UnicodeIt mentioned in svenkreiss/unicodeit#25. Switching over is a simple matter of replacing unicodeit.replace with unicodeitplus.parse. Most of the cleanup operations in the cleanup_latex function will no longer be needed, although unicodeitplus does not yet handle ~ or \rm. Before making the switch, it would be good to run some tests over all (or at least many) existing paper titles to identify remaining limitations of unicodeitplus.parse. I've already identified some problems with \sqrt having complicated arguments that I'll raise in a separate issue.
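Because unicodeitplus does not yet handle ~ or \rm, a small pre-cleaning step could run on each title before it is passed to unicodeitplus.parse. The sketch below is a minimal illustration under that assumption. It is not the existing cleanup_latex function. The name preclean_title is hypothetical, and so are the two choices it makes: mapping ~ to a plain space, and dropping the \rm switch while keeping the text it applies to.

```python
import re


def preclean_title(title: str) -> str:
    """Hypothetical pre-cleaning for constructs unicodeitplus does not yet handle."""
    # In LaTeX, '~' is a non-breaking space; map it to an ordinary space.
    title = title.replace("~", " ")
    # Remove the old-style '\rm' font switch plus any whitespace after it,
    # keeping the argument text. The word boundary (\b) leaves longer
    # commands such as '\rmfamily' untouched.
    title = re.sub(r"\\rm\b\s*", "", title)
    return title


if __name__ == "__main__":
    print(preclean_title(r"$\sqrt{s}=13~{\rm TeV}$"))
```

The cleaned string would then go to unicodeitplus.parse. Whether leftover braces such as {TeV} render cleanly is exactly the kind of thing the tests over existing titles would need to confirm.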

GraemeWatt, Jun 16 '23 14:06

I just wrote a Jupyter notebook that gets the titles of all (almost 10,000) HEPData records and compares the output from latex2text, unicodeit and unicodeitplus. I'll wait until a future release of unicodeitplus to address the remaining limitations before making the switch from unicodeit.
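A comparison like the one described could be organised roughly as follows. This is a hypothetical sketch, not the notebook itself. The function name compare_converters is illustrative, and so is the flagging rule. That rule flags a title when the converters disagree, when one raises, or when an output still contains a backslash, on the assumption that a leftover LaTeX command means a conversion limitation. In real use the dictionary would map names to the callables mentioned in this issue (for example unicodeit.replace and unicodeitplus.parse, plus a latex2text conversion), and the titles would come from the HEPData records. Fetching them is not shown.

```python
from typing import Callable, Dict, List

Converter = Callable[[str], str]


def compare_converters(titles: List[str], converters: Dict[str, Converter]) -> List[dict]:
    """Run every converter on every title and collect titles worth inspecting.

    A title is flagged if the converters disagree, if any of them raises,
    or if any output still contains a backslash (an unconverted LaTeX command).
    """
    flagged = []
    for title in titles:
        outputs = {}
        for name, convert in converters.items():
            try:
                outputs[name] = convert(title)
            except Exception as exc:  # record the failure and keep going
                outputs[name] = f"<error: {exc!r}>"
        disagree = len(set(outputs.values())) > 1
        leftover = [name for name, out in outputs.items() if "\\" in out]
        if disagree or leftover:
            flagged.append({"title": title, "outputs": outputs, "leftover_latex": leftover})
    return flagged


if __name__ == "__main__":
    # Toy stand-ins for real converters, only to show the report shape.
    def strip_dollars(s: str) -> str:
        return s.replace("$", "")

    def greek_alpha(s: str) -> str:
        return s.replace("$", "").replace("\\alpha", "α")

    titles = [r"Measurement of $\alpha$", "Plain title"]
    for row in compare_converters(titles, {"strip": strip_dollars, "greek": greek_alpha}):
        print(row)
```

Reporting only the flagged titles keeps the output manageable across roughly 10,000 records, leaving the titles where the converters agree and produce no leftover LaTeX out of the manual review.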

GraemeWatt, Jun 20 '23 11:06