pdftools

pdf_data(pdf, font_info = TRUE) throws an error when a page is a full-screen image

Open · cutterkom opened this issue 4 years ago · 2 comments

Running pdf_data() with font_info = TRUE fails when a page consists of a full-screen image.

The error is:

```
Error in FUN(X[[i]], ...) : is.data.frame(df) is not TRUE
```

This is the same behaviour as described in https://github.com/ropensci/pdftools/issues/88.

It would probably be a good idea to catch this case.

cutterkom, Mar 21 '22

I am observing the same problem. Running pdf_data() with font_info = TRUE on a PDF that contains either a full-page image or a blank page throws the same error:

```
Warning: Column sizes are not equal in DataFrame::push_back, object degrading to List
Error in FUN(X[[i]], ...) : is.data.frame(df) is not TRUE
```

Unfortunately, this is a common case for PDFs. Is there a straightforward workaround for this?
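One possible stopgap, until a fixed release is installed, is to process the PDF one page at a time. That way a single failing page does not abort the whole call. The sketch below is an assumption and was not proposed in this thread. It uses qpdf::pdf_split() from the separate qpdf package to write each page to its own file. It then wraps pdftools::pdf_data() in tryCatch() and returns NULL for any page that errors. The helper name pdf_data_safe is hypothetical, and both the pdftools and qpdf packages must be installed.

```r
# Hypothetical helper (not part of pdftools): extract per-page word data,
# returning NULL for pages where pdf_data() fails (e.g. blank or image-only pages).
pdf_data_safe <- function(path, font_info = TRUE) {
  # Split the input into single-page PDFs; pdf_split() returns their file paths.
  page_files <- qpdf::pdf_split(path, output = tempfile("page"))
  lapply(page_files, function(page_file) {
    tryCatch(
      # pdf_data() returns a list with one data frame per page; take the only one.
      pdftools::pdf_data(page_file, font_info = font_info)[[1]],
      # On error, skip this page instead of aborting the whole document.
      error = function(e) NULL
    )
  })
}
```

For example, `pages <- pdf_data_safe("~/Desktop/adrianadantas.pdf")` followed by `which(vapply(pages, is.null, logical(1)))` would list the pages that could not be parsed.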

kimonkrenz, Sep 30 '22

Do you have an example PDF file and some code so that I can reproduce this?

jeroen, Oct 01 '22

Hi @jeroen, please find below a reproducible example using the following two PDF files:

adrianadantas.pdf
adrianadantas_altered.pdf

```r
library(pdftools)

# 1: load PDF including a blank page, without font_info
pdf_1 <- pdftools::pdf_data(pdf = "~/Desktop/adrianadantas.pdf")

# 2: load PDF including a blank page, with font_info
pdf_2 <- pdftools::pdf_data(pdf = "~/Desktop/adrianadantas.pdf", font_info = TRUE)

# 3: load PDF with the blank page removed, with font_info
pdf_3 <- pdftools::pdf_data(pdf = "~/Desktop/adrianadantas_altered.pdf", font_info = TRUE)

View(pdf_3[[1]])
```

Examples 1 and 3 work as expected; example 2 throws the following error:

```
Warning: Column sizes are not equal in DataFrame::push_back, object degrading to List
Warning: Column sizes are not equal in DataFrame::push_back, object degrading to List
Error in FUN(X[[i]], ...) : is.data.frame(df) is not TRUE
```

kimonkrenz, Oct 03 '22

I think this is fixed. Can you try installing the new version:

```r
install.packages("pdftools", repos = "https://ropensci.r-universe.dev")
```

jeroen, Oct 04 '22

Installed, tested and works perfectly. Many thanks, @jeroen!

kimonkrenz, Oct 04 '22

Thanks, I have sent it to CRAN.

jeroen, Oct 04 '22