react-pdf Copy-paste issue for registered font

Describe the bug

Copy-paste does not work correctly for PDFs generated with a registered font.

To Reproduce

I was generating PDF files using @react-pdf/renderer and registered the font like this:

Font.register({ family: 'Gelasio-Regular', src: __dirname + '/Gelasio-Regular.ttf' });

But when I copy-paste the content of the PDF file, the text does not come out correctly.

Here is the generated PDF file: test.pdf

When copy-pasting the content, I get this: Oce􀀄

I downloaded the Gelasio-Regular.ttf font file from: https://fonts.google.com/specimen/Gelasio

Other information

React-pdf version: 2.1.1

Jul 22 '22 09:07 arathithamban

Hi, same problem. When copying some words, the characters are changed. The same happens when searching for text within the PDF.

Sep 13 '22 08:09 anthares-dev

Which @react-pdf/renderer version are you using?

Sep 13 '22 08:09 ghost

I'm using 2.0.21. The issue appears with the fonts Nunito and OpenSans.

Sep 13 '22 10:09 anthares-dev

I'm using the custom font Noto-Sans and see no issue with the latest version, 3.0.0.

Sep 13 '22 10:09 ghost

+1 to seeing this issue when generating a PDF. I'm using the Roboto font from Google. I tried upgrading to React-PDF 3.0.0 and got the latest version of the font.

Sep 20 '22 21:09 ErnestMcBeard

Yeah. I tested with the latest version and a local Noto Sans SC font, and this issue can be reproduced.

Sep 21 '22 11:09 ghost

Issue can be reproduced with the following fonts: Nunito, Montserrat, NotoSans and Roboto

No problem if using Raleway or JosefinSans

My React-PDF version is 2.1.0

Sep 22 '22 14:09 anthares-dev

I'm facing the same issue with Arial. After extensive tests, we found that the problem is restricted to Arial Regular (weight 400) and Arial Bold (weight 700). Other font weights registered in the application, like 300, 500 and 900, work fine! The example below illustrates this: the first row of content uses font weight 500, the second uses 700, and the third uses regular 400:

Copying content from:

Output when pasting it somewhere else:

CODE NAME: "CONGENBILL" EDITION 1994
1Shiipe
:AD AM 2º:OS/ /IA:

With that figured out, we tried to get the font from other sources and CDNs, but we got the same result for both the TTF and WOFF formats.

In addition, we saw the same behavior from version 2.3 through 3.0 of @react-pdf/renderer.

Oct 17 '22 14:10 juliolmuller

I'm facing the same issue using the Google NotoSansTC font with @react-pdf/renderer version 3.0.0. Both Chinese and non-Chinese characters are copied as wrong text. I have tried the birdfont and fontforge workarounds, but none of them completely works for me. Importing and exporting the font using fontforge fixes part of the issue: most of the characters can be copied correctly, but some glyphs no longer display as they should.

Nov 28 '22 12:11 SongRongLee

@SongRongLee What is the fontforge workaround?

Nov 28 '22 12:11 ghost

@chathu-novade As described in this reply

Nov 28 '22 12:11 SongRongLee

We're having this same issue with Google Outfit. We only have Latin characters in our pdf. Both copy-paste and search show the same errors, e.g. "specific" becomes "specixc". If we generate the PDF from a different source (Figma), it does not have this error. The fontforge fix did not work.

I can provide PDFs and/or example code if requested.

Jan 24 '23 15:01 jxbaker-sep

FYI, if this is more important than the appearance of your text, I would advise removing the custom font altogether as it seems to solve the issue. In my case, it was because I'm using this for a resume that is often consumed by algorithms.

Mar 15 '23 14:03 justin-hackin

Our team recently made some progress on this problem. We hope our experience can help.

When embedding a Type 0 font into a PDF, there are two main methods for generating Unicode mappings for glyphs: bfrange and bfchar. Both specify how character codes map to Unicode code points.

Example using bfrange:

4 beginbfrange
<00> <26> <00>
<61> <7d> <61>
endbfrange

Example using bfchar:

8 beginbfchar
<815c> <815c>
<eb63> <eb63>
endbfchar
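To make the relationship between the two forms concrete, here is a minimal sketch (not taken from react-pdf or pdfkit; the helper name expandBfrange is hypothetical). It expands one bfrange entry of the form <lo> <hi> <dstStart> into the equivalent bfchar lines, assuming the destination is a single hex string that simply increments along with the source code. The array form of bfrange is not handled.

```javascript
// Hypothetical helper for illustration only; not part of react-pdf or pdfkit.
// Expands a bfrange entry "<lo> <hi> <dstStart>" into equivalent bfchar lines.
// Assumption: the destination is one hex string that increments with the source code.
function expandBfrange(line) {
  const hex = [...line.matchAll(/<([0-9a-fA-F]+)>/g)].map((m) => m[1]);
  if (hex.length !== 3) {
    throw new Error(`expected three hex strings, got: ${line}`);
  }
  const [lo, hi, dst] = hex;
  const start = parseInt(lo, 16);
  const end = parseInt(hi, 16);
  const dstStart = parseInt(dst, 16);

  const out = [];
  for (let code = start; code <= end; code++) {
    // Keep the same hex width as the original entry.
    const src = code.toString(16).padStart(lo.length, '0');
    const target = (dstStart + (code - start)).toString(16).padStart(dst.length, '0');
    out.push(`<${src}> <${target}>`);
  }
  return out;
}

console.log(expandBfrange('<61> <63> <61>').join('\n'));
```

Under that reading, a range entry such as <61> <7d> <61> is shorthand for one bfchar line per code from 61 to 7d, each mapped to the same value.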

React-pdf uses the bfrange method in its toUnicodeCmap function (which comes from pdfkit) to generate these mappings. While this works well in most PDF viewers, such as Adobe Reader and Mozilla's PDF.js, it causes problems in Chrome's built-in PDF viewer. Specifically, text can render normally but be copied as gibberish, especially when the PDF contains a large amount of text.

To resolve this issue, we re-implemented the toUnicodeCmap method to use bfchar entries instead of bfrange. This change resolved the gibberish text issue when viewing the PDF in Chrome.
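As a rough illustration of what emitting bfchar entries can look like, here is a minimal sketch. It is not the code from react-pdf, pdfkit, or any pull request; the function names (toHex16, utf16Hex, bfcharBlocks) and the input shape are assumptions made for this example. It assumes 2-byte character codes, where codePoints[i] is the Unicode code point for character code i, and it splits the output into blocks of at most 100 entries, which is the per-block limit as I understand the CMap format.

```javascript
// Hypothetical sketch; not the actual react-pdf/pdfkit implementation.
// Assumption: codePoints[i] is the Unicode code point for 2-byte character code i.

function toHex16(n) {
  return n.toString(16).padStart(4, '0');
}

// Encode a code point as UTF-16BE hex (surrogate pair above U+FFFF).
function utf16Hex(codePoint) {
  if (codePoint <= 0xffff) return toHex16(codePoint);
  const v = codePoint - 0x10000;
  return toHex16(0xd800 + (v >> 10)) + toHex16(0xdc00 + (v & 0x3ff));
}

// Build the bfchar section of a ToUnicode CMap, one entry per character code.
function bfcharBlocks(codePoints) {
  const entries = codePoints.map(
    (cp, code) => `<${toHex16(code)}> <${utf16Hex(cp)}>`
  );
  const blocks = [];
  // Assumed limit: at most 100 entries per beginbfchar/endbfchar block.
  for (let i = 0; i < entries.length; i += 100) {
    const chunk = entries.slice(i, i + 100);
    blocks.push([`${chunk.length} beginbfchar`, ...chunk, 'endbfchar'].join('\n'));
  }
  return blocks.join('\n');
}

console.log(bfcharBlocks([0x41, 0x1f600]));
```

Every character code gets its own explicit mapping here, so the output is larger than with ranges, but a viewer never has to interpret a range.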

The fonts we use to test include Noto Sans Traditional Chinese and DFKai-SB.

The greatest benefit of solving this problem is that we can use Google's fonts directly (without any fontforge tweaks) and reduce the network traffic on our servers :D

Sep 15 '23 15:09 victorfu

@victorfu Can you please share the rewritten method? I'm running into the same issue but I'm unfamiliar with pdf cmaps and encoding.

Oct 03 '23 14:10 peilong-du

@peilong-du check this pull request https://github.com/diegomura/react-pdf/pull/2408

Oct 09 '23 01:10 victorfu

Thanks @victorfu !!!

Oct 09 '23 13:10 peilong-du

Thanks!!! With #2408 I was able to build the library and create a patch with patch-package, which is a lifesaver: I'm using it to build my resume, and ATS software would otherwise read it wrong.

Oct 18 '23 19:10 irian-codes

Do we have an ETA for the next release after the pull request is merged?

@irian-codes I'm using this in a resume builder that I use for my own resume. Can you share the patch-package you built?

Update: Never mind. After some digging, I was able to use patch-package to patch it myself with #2408.

Oct 29 '23 09:10 sandeepdotcode

Closed by https://github.com/diegomura/react-pdf/pull/2488

Jan 15 '24 11:01 diegomura