pdf2json
Passages with spaces joined by periods rather than split into separate words
We have noticed an issue where, in some cases, pieces of our text are joined by periods into one massive word rather than split on spaces into individual array members, e.g.:
[1027,54,538,27,38,"Churches.set.up.Christian.schools.in.the.early.1800s..Some.Indigenous.peoples.were."]
Not sure if you have any idea what might cause this – if it’s an issue in our PDFs or something that pdf2json is getting wrong for some reason?
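In case it helps anyone seeing the same thing, here is a rough post-processing sketch we could apply to the text member while the root cause is unknown. It does not touch pdf2json itself, and the function names `looksPeriodJoined` and `repairPeriodJoined` are hypothetical helpers of ours, not part of the library. It assumes, based only on the example above, that a single period stands in for a space and a double period is a real period followed by a space.

```typescript
// Heuristic only: flags strings with no whitespace at all but several periods,
// which is the pattern seen in the example above.
function looksPeriodJoined(text: string): boolean {
  const periodCount = (text.match(/\./g) ?? []).length;
  return !/\s/.test(text) && periodCount >= 3;
}

// Assumption: "." replaced a space, and ".." is a genuine period plus a space.
// Strings that do not match the pattern are returned unchanged.
function repairPeriodJoined(text: string): string {
  if (!looksPeriodJoined(text)) {
    return text;
  }
  return text
    .replace(/\.(\.?)/g, (_match: string, second: string) => (second ? ". " : " "))
    .trimEnd();
}

const sample =
  "Churches.set.up.Christian.schools.in.the.early.1800s..Some.Indigenous.peoples.were.";
const repaired = repairPeriodJoined(sample);
const words = repaired.split(/\s+/);
console.log(repaired);
console.log(words.length);
```

This would also mangle strings that legitimately contain several periods and no spaces (version numbers, domain names), so it is only a stopgap and not a fix for whatever is producing the periods.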