pdf_data(pdf, font_info = TRUE) throws error when page is fullscreen image
When running pdf_data() with font_info = TRUE, it breaks when a page is a fullscreen image.
Error: Error in FUN(X[[i]], ...) : is.data.frame(df) is not TRUE (same behaviour as described here: https://github.com/ropensci/pdftools/issues/88).
Probably a good idea to catch this case.
I am observing the same problem. Performing pdf_data() with font_info = TRUE on a PDF that includes either a full image or blank page throws the same error:
Warning: Column sizes are not equal in DataFrame::push_back, object degrading to List
Error in FUN(X[[i]], ...) : is.data.frame(df) is not TRUE
Unfortunately, this is a common case for PDFs. Is there a straightforward workaround for this?
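One possible stopgap, assuming it is acceptable to lose the font columns for an affected file, is to catch the error and retry without font_info. This is only a sketch: `pdf_data_safe` and its `reader` argument are hypothetical names introduced here, not part of pdftools. The fallback drops font information for the whole document, not only for the blank or image-only page, and the `DataFrame::push_back` warning may still be printed.

```r
# Hypothetical helper (not part of pdftools): try font_info = TRUE first and
# fall back to a plain pdf_data() call if that fails.
# `reader` defaults to pdftools::pdf_data; it is a parameter only so the
# fallback logic can be exercised without a real PDF.
pdf_data_safe <- function(path, reader = pdftools::pdf_data) {
  tryCatch(
    reader(path, font_info = TRUE),
    error = function(e) {
      warning(
        "font_info = TRUE failed (", conditionMessage(e),
        "); retrying without font info",
        call. = FALSE
      )
      reader(path)
    }
  )
}
```

Calling `pdf_data_safe("~/Desktop/adrianadantas.pdf")` would then return the per-page data frames without font columns instead of stopping with the error.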
Do you have an example PDF file and some code so that I can reproduce this?
Hi @jeroen, please find below a reproducible example using the following two pdf files:
adrianadantas.pdf adrianadantas_altered.pdf
library(pdftools)
# 1 load pdf incl. blank page without using font_info
pdf_1 <- pdftools::pdf_data(pdf = "~/Desktop/adrianadantas.pdf")
# 2 load pdf incl. blank page using font_info
pdf_2 <- pdftools::pdf_data(pdf = "~/Desktop/adrianadantas.pdf", font_info = TRUE)
# 3 load pdf with removed blank page using font_info
pdf_3 <- pdftools::pdf_data(pdf = "~/Desktop/adrianadantas_altered.pdf", font_info = TRUE)
View(pdf_3[[1]])
1 and 3 work as expected, 2 throws the following error:
Warning: Column sizes are not equal in DataFrame::push_back, object degrading to List
Warning: Column sizes are not equal in DataFrame::push_back, object degrading to List
Error in FUN(X[[i]], ...) : is.data.frame(df) is not TRUE
I think it is fixed. Can you try to install the new version:
install.packages("pdftools", repos = 'https://ropensci.r-universe.dev')
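After installing, the fix could be checked by re-running the failing call from the example above and confirming that every page comes back as a data frame. A minimal sketch; `all_pages_are_data_frames` is a hypothetical helper introduced here, not a pdftools function:

```r
# Hypothetical helper: TRUE if every element of a pdf_data() result
# is a data frame.
all_pages_are_data_frames <- function(pages) {
  all(vapply(pages, is.data.frame, logical(1)))
}

# Usage with the file from the reproducible example (path as given there):
# pages <- pdftools::pdf_data("~/Desktop/adrianadantas.pdf", font_info = TRUE)
# all_pages_are_data_frames(pages)
```

If the call no longer errors and the helper returns TRUE, the blank page is being handled as a data frame like the others.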
Installed, tested and works perfectly. Many thanks, @jeroen!
Thanks, I sent it to CRAN