Thibault Douzon
From what I understood (sadly, I couldn't make the debugger enter `render_prepare`), the plot contains duplicate scales for x and y, the second (incorrect, without labels) erasing the...
Hi @ArthurZucker, thanks for your investigations. This PR fixes the problem for LayoutLMv3, but I expect the problem to exist in other models using Fast BPE tokenization; I will take...
LayoutLMv2 uses WordPiece, not BPE. From what I saw, its vocabulary does not contain an empty token and thus cannot produce a (0, 0) offset_mapping when encoding.
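A quick way to verify that claim, assuming that checking the vocabulary for an empty-string entry is sufficient (my sketch, not part of the original comment):

```
from transformers import LayoutLMv2TokenizerFast

tokenizer = LayoutLMv2TokenizerFast.from_pretrained(
    "microsoft/layoutlmv2-base-uncased"
)

# WordPiece maps token strings to ids; without an empty-string entry,
# no encoded token can ever cover a zero-length (0, 0) span.
print("" in tokenizer.get_vocab())  # expected: False
```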
The same problem arises with all BPE-based tokenizers. Example with LayoutXLM:

```
import numpy as np
from transformers import LayoutXLMTokenizerFast

processor = LayoutXLMTokenizerFast.from_pretrained(
    "microsoft/layoutxlm-base", apply_ocr=False
)
words = ["pencil", ...
```
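The snippet above is cut off in the original comment; a minimal runnable reconstruction of the kind of check it describes could look like the following. The word list, the bounding boxes, and the final print are my assumptions for illustration, not the original code:

```
import numpy as np
from transformers import LayoutXLMTokenizerFast

tokenizer = LayoutXLMTokenizerFast.from_pretrained("microsoft/layoutxlm-base")

# Hypothetical word-level input with bounding boxes, as a document AI
# pipeline would provide them (values assumed, not from the original).
words = ["pencil", "eraser"]
boxes = [[10, 10, 50, 20], [60, 10, 120, 20]]

encoding = tokenizer(text=words, boxes=boxes, return_offsets_mapping=True)
offsets = np.array(encoding["offset_mapping"])

# A (0, 0) offset on a non-special token indicates an empty token emitted
# by the tokenizer, which breaks token-to-word (and box) alignment downstream.
print(offsets)
```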