Passages with spaces joined by periods rather than split into separate words

Open gordonbisnor opened this issue 5 years ago • 0 comments

We have noticed an issue where, in some cases, pieces of our text are joined by periods into one massive word rather than split on spaces into individual array members, e.g.:

[1027,54,538,27,38,"Churches.set.up.Christian.schools.in.the.early.1800s..Some.Indigenous.peoples.were."]

Do you have any idea what might cause this? We are not sure whether it’s an issue in our PDFs or something that pdf2json is getting wrong for some reason.
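
In case it helps anyone hitting the same thing, here is a rough post-processing sketch in Python that could serve as a stopgap until the cause is known. It is not part of pdf2json, and the names `looks_period_joined` and `repair_period_joined` are made up for illustration. It assumes the text is the last element of each row (as in the example above), that an affected string contains no whitespace at all, and that a double period stands for a real full stop followed by a space. It cannot tell whether a trailing period was originally a space, so it leaves that one alone.

```python
import re


def looks_period_joined(text: str, min_periods: int = 3) -> bool:
    """Heuristic (an assumption): no whitespace anywhere, but several periods."""
    has_whitespace = any(ch.isspace() for ch in text)
    return not has_whitespace and text.count(".") >= min_periods


def repair_period_joined(text: str) -> str:
    """Turn period-joined text back into space-separated words.

    Strings that do not match the heuristic are returned unchanged.
    """
    if not looks_period_joined(text):
        return text
    # Assume ".." was a real full stop followed by a space.
    repaired = re.sub(r"\.{2}", ". ", text)
    # Any remaining period directly followed by a non-space character
    # is assumed to be standing in for a space.
    repaired = re.sub(r"\.(?=\S)", " ", repaired)
    return repaired


row = [1027, 54, 538, 27, 38,
       "Churches.set.up.Christian.schools.in.the.early.1800s..Some.Indigenous.peoples.were."]
fixed_row = row[:-1] + [repair_period_joined(row[-1])]
words = fixed_row[-1].split()
```

This will misfire on legitimately dotted strings with three or more periods and no spaces (long version numbers, dotted hostnames), so it probably wants to be applied only to rows already known to be affected.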

gordonbisnor, May 24 '20 11:05