
FUNSD dataset for LayoutLMv3

wandering-walrus opened this issue · 4 comments

How was the FUNSD dataset generated for LayoutLMv3 in the tutorial?

It looks like it uses segment-level position features instead of word-level position embeddings. Is that right?

wandering-walrus · Jul 25 '22

Looks like I submitted this too soon. It seems the word-level positions were programmatically adjusted to segment-level positions based on the labels. At inference time, is the recommendation to obtain these segment positions using a LayoutLMv3 model fine-tuned on the PubLayNet dataset?
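
For anyone else landing here, a minimal sketch of that adjustment: every word's box gets replaced by the bounding box of the segment it belongs to. The `box` and `segment_id` field names are placeholders of mine, not the tutorial's actual preprocessing code.

```python
from collections import defaultdict

def to_segment_boxes(words):
    # Collect all word boxes per segment.
    segments = defaultdict(list)
    for word in words:
        segments[word["segment_id"]].append(word["box"])

    # The segment box is the union (min/max) of its word boxes.
    segment_box = {
        seg_id: (
            min(b[0] for b in boxes),
            min(b[1] for b in boxes),
            max(b[2] for b in boxes),
            max(b[3] for b in boxes),
        )
        for seg_id, boxes in segments.items()
    }

    # Every word inherits its segment's box instead of its own.
    return [{**word, "box": segment_box[word["segment_id"]]} for word in words]

words = [
    {"text": "Date:", "box": (10, 10, 50, 20), "segment_id": 0},
    {"text": "July", "box": (55, 10, 85, 20), "segment_id": 0},
]
print(to_segment_boxes(words))  # both words now share the box (10, 10, 85, 20)
```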

wandering-walrus · Jul 25 '22

Hi,

Thanks for your interest in LayoutLMv3. The model indeed leverages segment position embeddings instead of per-word position embeddings, and this seems to greatly improve performance.

Indeed, at inference time you would need a model that first recognizes the segments, so that you can group the words per segment. There may also be OCR engines out there that recognize segments directly.
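
To illustrate the second option: Tesseract already groups words into blocks, paragraphs, and lines, so segment boxes can be derived from its output. A hedged sketch, assuming pytesseract and Pillow are installed and using a placeholder image path; this is not part of the tutorial itself:

```python
from collections import defaultdict

import pytesseract
from PIL import Image

image = Image.open("document.png")  # placeholder path
data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)

# Group words by (block, paragraph, line) and take the union of their
# boxes as the segment box; every word on a line then shares that box.
segments = defaultdict(list)
for i, text in enumerate(data["text"]):
    if text.strip():
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        x, y, w, h = data["left"][i], data["top"][i], data["width"][i], data["height"][i]
        segments[key].append((text, (x, y, x + w, y + h)))

for key, items in segments.items():
    boxes = [box for _, box in items]
    seg_box = (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )
    print([t for t, _ in items], seg_box)  # words in this line share seg_box
```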

NielsRogge · Jul 25 '22

Thanks, @NielsRogge. I haven't been able to find much on this topic. Do you think LayoutLMv3 fine-tuned on the PubLayNet dataset could be used for this?

wandering-walrus · Jul 25 '22

Hi @NielsRogge, I also have a question about how the FUNSD dataset was prepared for LayoutLMv3. Here I see how you retrieved the segment-level bounding boxes. But is there a reason why tokens are not labelled according to the usual BIOES scheme? It looks to me that you only added B- and I- labels to each token; could that also explain the final performance of the model?
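
For context on the difference: BIOES adds E- (end of entity) and S- (single-token entity) tags on top of B-/I-. A tiny converter, purely my own sketch to illustrate the two schemes, not the tutorial's code:

```python
def bio_to_bioes(tags):
    # Rewrite BIO tags as BIOES by peeking at the next tag.
    bioes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bioes.append(tag)
            continue
        prefix, entity = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == f"I-{entity}"
        if prefix == "B":
            bioes.append(f"B-{entity}" if continues else f"S-{entity}")
        else:  # prefix == "I"
            bioes.append(f"I-{entity}" if continues else f"E-{entity}")
    return bioes

print(bio_to_bioes(["B-HEADER", "I-HEADER", "O", "B-QUESTION"]))
# -> ['B-HEADER', 'E-HEADER', 'O', 'S-QUESTION']
```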

AleRosae · Aug 17 '22