page.drawText() inserts spaces when using Thai font
What were you trying to do?
I am trying to use the page.drawText() function to render text in the Thai language
Why were you trying to do this?
To build an application that creates PDF files containing text written in the Thai language
How did you attempt to do it?
The steps I followed are:
- Download Google Noto Sans Thai font
- Embed the font in the pdf-lib PDF document
- Invoke the page.drawText() function passing in the text in Thai
See code example provided in reproduction steps section below.
What actually happened?
The PDF file was created successfully, but it seems some large spaces have been inserted into the Thai text in the PDF.
I've copied the text from the PDF and pasted it below; notice the strange block characters that have been inserted.
แห่งได้เปดขึนแล้วในการขยายรถไฟใต้ดินลอนดอนครังใหญ่ครังแรกในศตวรรษนี
Those strange characters appear visually as large blank spaces in the PDF, e.g. like this:
แห่งได้เป ดขึ นแล้วในการขยายรถไฟใต้ดินลอนดอนครั งใหญ่ครั งแรกในศตวรรษนี
What did you expect to happen?
I expected the Thai text to be rendered as one continuous string without any strange characters or spaces inserted:
แห่งได้เปดขึนแล้วในการขยายรถไฟใต้ดินลอนดอนครังใหญ่ครังแรกในศตวรรษนี
How can we reproduce the issue?
- Create a Node JS project folder e.g. called 'pdf-test'
- cd pdf-test
- npm init -y
- npm i pdf-lib
- npm i @pdf-lib/fontkit
- Download Noto Sans Thai font from https://fonts.google.com/download?family=Noto%20Sans%20Thai
- Unzip the font and copy the TTF file from Noto_Sans_Thai/static/NotoSansThai/NotoSansThai-Regular.ttf into the project folder pdf-test so it can be loaded by the index.js script below
- Create a file called index.js and paste the code from below
- Run the index.js file using the command `node index.js`, which will create the PDF file containing some Thai text
- Use a PDF viewer/browser, e.g. Google Chrome, to view the rendered PDF
- Notice the extra spacing between some of the Thai characters
```js
const fs = require('fs');
const path = require('path');
const { PDFDocument, rgb } = require('pdf-lib');
const fontkit = require('@pdf-lib/fontkit');

(async function run() {
  const pdfDoc = await PDFDocument.create();
  pdfDoc.registerFontkit(fontkit);

  // Font downloaded from https://fonts.google.com/download?family=Noto%20Sans%20Thai
  // See also https://fonts.google.com/noto/specimen/Noto+Sans+Thai?query=thai
  const thaiFontBytes = fs.readFileSync(path.join(__dirname, './NotoSansThai-Regular.ttf'));
  const thaiFont = await pdfDoc.embedFont(thaiFontBytes);

  const page = pdfDoc.addPage();
  const { height } = page.getSize();
  const fontSize = 11;

  page.drawText('แห่งได้เปิดขึ้นแล้วในการขยายรถไฟใต้ดินลอนดอนครั้งใหญ่ครั้งแรกในศตวรรษนี้', {
    x: 50,
    y: height - 2 * fontSize,
    size: fontSize,
    font: thaiFont,
    color: rgb(0, 0.53, 0.71),
  });

  const pdfBytes = await pdfDoc.save();
  // Write synchronously so any I/O error surfaces instead of being ignored in a callback.
  fs.writeFileSync('thai-test.pdf', pdfBytes);
  console.log('PDF file saved.');
})().catch((err) => {
  console.error(err);
  process.exit(1);
});
```
Version
1.16.0
What environment are you running pdf-lib in?
Node
Required Reading
- [X] I have read www.sscce.org.
- [X] My report includes a Short, Self Contained, Correct (Compilable) Example.
- [X] I have read Smart Questions.
- [X] I have read 45 GitHub Issues Dos and Don'ts.
Additional Notes
No response
I'm also facing this problem. I suspect the bug is in the UnicodeLayoutEngine class in the @pdf-lib/fontkit library.
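One way to probe the UnicodeLayoutEngine hypothesis is to inspect what fontkit's layout step produces for the Thai string before pdf-lib writes anything to the PDF. The sketch below assumes the font object comes from `fontkit.create()` in @pdf-lib/fontkit and relies on `font.layout()`, which returns a glyph run whose glyphs expose `id`, `name` and `codePoints`. `listGlyphs` is a hypothetical helper, not part of either library.

```javascript
// Hypothetical diagnostic helper (not part of pdf-lib or fontkit).
// `font` is assumed to be a fontkit Font, e.g. fontkit.create(fontBytes);
// font.layout(text, features) applies the font's OpenType shaping (GSUB/GPOS).
function listGlyphs(font, text, features) {
  const run = font.layout(text, features);
  return run.glyphs.map((glyph) => ({
    id: glyph.id,
    name: glyph.name,
    // Unicode code points fontkit associates with this glyph, as U+XXXX strings.
    codePoints: glyph.codePoints
      .map((cp) => 'U+' + cp.toString(16).toUpperCase().padStart(4, '0'))
      .join(' '),
  }));
}
```

For example, `console.table(listGlyphs(fontkit.create(fs.readFileSync('NotoSansThai-Regular.ttf')), 'ครั้ง'))` prints one row per shaped glyph. Glyphs with unexpected or empty code points would be a reasonable place to start looking, although nothing in this thread confirms that this is the cause.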
Same for me, with many fonts.
Hey,
I see the same issue here. When I write to a document using fonts from the Google Fonts API, extra spaces (" ") are sometimes added to my text,
like this:

I'm looking for some insight 💡
@tudor-sandu, is this the issue you guys are experiencing?
Same here with Helvetica Neue Roman and Helvetica Neue Condensed. It inserts spaces, for example after the sequence "fi", but not after "i" or "f" by itself. For example, "Backoffice" becomes "Backoffi ce" and "fifi" becomes "fi fi".
For the Thai font, the issue can be resolved by using `.embedFont(fontBytes, { subset: true })`.
I don't know why this helps.
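Applied to the reproduction script, this workaround only changes the `embedFont` call. A minimal sketch, assuming `pdfDoc` is the PDFDocument from index.js with fontkit already registered; `embedThaiFont` is a hypothetical wrapper name, and why subsetting avoids the stray characters is not established here.

```javascript
// Hypothetical wrapper around pdf-lib's embedFont with the subset workaround.
// Assumes `pdfDoc` is a PDFDocument on which registerFontkit(fontkit) was called
// and `fontBytes` holds the contents of NotoSansThai-Regular.ttf.
async function embedThaiFont(pdfDoc, fontBytes) {
  // subset: true embeds only the glyphs used in the document
  // instead of the whole font file.
  return pdfDoc.embedFont(fontBytes, { subset: true });
}
```

In index.js, `const thaiFont = await embedThaiFont(pdfDoc, thaiFontBytes);` would take the place of the original `embedFont` line.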
The effect in the first post comes from extra bytes being added to the text that fall outside the valid range for the character set. In a PDF, if there is no character for a given byte sequence (UTF-8 is a variable-length, multi-byte encoding), the reader renders it as a space. When you copy the text, however, the actual data including the added bytes is copied, and when you paste it into a program that renders invalid or non-printable "characters" as placeholder glyphs (the squares in the first post), the data is displayed as hex (for example 10F0C1) instead of as a space.
Also, neither the examples here nor my own case look like the font simply lacking a proper glyph for a character.
I also ruled out that non-printable bytes were already present in the source text beforehand; they are being added when the PDF is rendered.
https://unicode-table.com/en/search/?q=10F0C1
https://www.unicode.org/charts/PDF/U100000.pdf Quote:
> The Supplementary Private Use Area-B block encompasses the entire range of Plane 16. The range U+100000..U+10FFFD is entirely designated for private use. The last two code points on the plane, U+10FFFE..U+10FFFF, are designated noncharacters. Consequently, no character code charts or names lists are provided for the majority of this block, except that a chart and names list are provided for the last 128 code points, to show the location of the noncharacters.
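To check whether text copied out of the PDF contains private-use code points like U+10F0C1 rather than ordinary spaces, a small standalone scan can help. This is a plain JavaScript sketch with no pdf-lib dependency; the function names are illustrative, and the ranges are Unicode's three Private Use Areas, with the noncharacters at the end of Planes 15 and 16 excluded as the quote above describes.

```javascript
// Illustrative helpers, not part of any library.
// Private Use Areas: BMP U+E000..U+F8FF, Plane 15 (PUA-A) U+F0000..U+FFFFD,
// Plane 16 (PUA-B) U+100000..U+10FFFD.
function isPrivateUse(cp) {
  return (cp >= 0xe000 && cp <= 0xf8ff) ||
         (cp >= 0xf0000 && cp <= 0xffffd) ||
         (cp >= 0x100000 && cp <= 0x10fffd);
}

// Returns the code-point index and U+XXXX form of every private-use character.
function findPrivateUseCodePoints(text) {
  const hits = [];
  let index = 0; // counts code points, not UTF-16 code units
  for (const ch of text) { // for...of iterates by code point, keeping surrogate pairs intact
    const cp = ch.codePointAt(0);
    if (isPrivateUse(cp)) {
      hits.push({ index, hex: 'U+' + cp.toString(16).toUpperCase().padStart(4, '0') });
    }
    index++;
  }
  return hits;
}
```

Pasting the copied PDF text into `findPrivateUseCodePoints(copiedText)` would list where the block characters sit, while ordinary Thai characters (U+0E00..U+0E7F) are not reported.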
The `.embedFont(fontBytes, { subset: true })` workaround also works for the Khmer font.
@akomm, regarding the extra spaces after "fi" with Helvetica Neue Roman and Helvetica Neue Condensed, try the following:

```js
await pdfDoc.embedFont(YOURFONT, { features: { liga: false } });
```
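Wrapped as a function, this suggestion might look like the sketch below. It assumes `pdfDoc` is a pdf-lib PDFDocument with fontkit registered; the `features` option of `embedFont` is handed to fontkit's layout, and `liga: false` switches off the font's standard ligatures so that "f" and "i" are laid out as two glyphs rather than one "fi" ligature glyph. `embedWithoutLigatures` is a hypothetical name.

```javascript
// Hypothetical wrapper: embed a font with standard ligatures (OpenType "liga") disabled.
// Assumes `pdfDoc` is a pdf-lib PDFDocument with fontkit registered.
async function embedWithoutLigatures(pdfDoc, fontBytes) {
  return pdfDoc.embedFont(fontBytes, { features: { liga: false } });
}
```

The trade-off is typographic: sequences such as "fi" are no longer drawn with the font's ligature glyph.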
It is definitely a bug, and in my opinion one that should be fixed: https://github.com/Hopding/pdf-lib/issues/490
The `.embedFont(fontBytes, { subset: true })` workaround also works for Calibri fonts.