language-models issues

'pdf' field missing in DocLayNet dataset on HF

Hi @piegu! Thanks for processing the DocLayNet dataset into smaller portions. It really helps with fast experimentation! It was especially useful to have the byte stream of the PDFs in...

Ulipenitz

seems not to be NE tag.'.format(chunk)) UserWarning: <Any Label> seems not to be NE tag.

Hi, I want to ask you about the title. I'm working on a similar problem using LayoutXLM and my own tokenizer, but I can't understand **seems not to be NE tag.'.format(chunk)) UserWarning:...

smertakcay
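This warning, and the seqeval error in the LiLT issue further down, look consistent with seqeval's tag check. As far as we know, seqeval 1.x (default prefix mode) warns `<tag> seems not to be NE tag` whenever a tag is neither a bare `O`/`B`/`I`/`E`/`S` nor prefixed with `B-`, `I-`, `E-` or `S-`. The minimal stdlib-only sketch below assumes plain per-token labels such as `TITLE` or `TEXT` and converts them to IOB2 before evaluation. `to_iob2` and `looks_like_ne_tag` are hypothetical helpers, not part of seqeval.

```python
from typing import List

# Prefixes seqeval accepts in its default (non-suffix) mode, to our knowledge.
VALID_PREFIXES = ("B-", "I-", "E-", "S-")


def to_iob2(labels: List[str], outside: str = "O") -> List[str]:
    """Convert plain per-token labels (e.g. "TITLE") into IOB2 tags.

    Assumption: consecutive tokens with the same label form one entity.
    """
    tags: List[str] = []
    prev = outside
    for label in labels:
        if label == outside:
            tags.append(outside)          # outside tokens stay "O"
        elif label == prev:
            tags.append(f"I-{label}")     # continuation of the current entity
        else:
            tags.append(f"B-{label}")     # first token of a new entity
        prev = label
    return tags


def looks_like_ne_tag(tag: str) -> bool:
    """Approximate the prefix check behind seqeval's warning (hypothetical helper)."""
    return tag in ("O", "B", "I", "E", "S") or tag.startswith(VALID_PREFIXES)


if __name__ == "__main__":
    raw = ["TITLE", "TITLE", "O", "TEXT", "TEXT", "TEXT", "TABLE"]
    iob = to_iob2(raw)
    print(iob)  # ['B-TITLE', 'I-TITLE', 'O', 'B-TEXT', 'I-TEXT', 'I-TEXT', 'B-TABLE']
    print(all(map(looks_like_ne_tag, raw)), all(map(looks_like_ne_tag, iob)))  # False True
```

Merging adjacent same-label tokens is a simplification: two separate paragraphs both labeled `TEXT` would become a single entity. If the data carries block or line IDs, starting a new `B-` tag at each block boundary is likely closer to the intended evaluation.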

Adding a license to the repository

Hi Pierre! Thanks for the amazing work you are doing! Could you kindly consider adding a license to the repository? "without a license, the default copyright laws apply, meaning...

legendawes

How did you create DocLayNet-small?

Hi @piegu, thank you for creating the DocLayNet datasets (small, base and large). They save a lot of time when fine-tuning models for downstream tasks. I have a question about bounding boxes. I checked...

mit1280
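The excerpt is truncated, so the exact bounding-box question is unknown. For context, LayoutLM-family models such as LayoutXLM and LiLT expect each box as `(x0, y0, x1, y1)` scaled to a 0-1000 grid relative to the page size. The sketch below shows that scaling, assuming absolute coordinates and a known page width and height; `normalize_bbox` is a hypothetical helper name, though a similar helper appears in the Hugging Face LayoutLM documentation.

```python
from typing import List, Sequence


def normalize_bbox(bbox: Sequence[float], width: float, height: float) -> List[int]:
    """Scale an absolute (x0, y0, x1, y1) box to the 0-1000 grid used by LayoutLM-style models."""
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / width),   # left
        int(1000 * y0 / height),  # top
        int(1000 * x1 / width),   # right
        int(1000 * y1 / height),  # bottom
    ]


if __name__ == "__main__":
    # Hypothetical US-letter page measured in PDF points (612 x 792).
    print(normalize_bbox((100, 50, 300, 80), width=612, height=792))  # [163, 63, 490, 101]
```

Boxes that extend slightly past the page edge would fall outside 0-1000 with this formula, so clamping may be needed in practice.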

docs: demo, experiments and live inference API on Tiyaro

Hello Pierre Guillou (@piegu) ! Thank you for your work on piegu/language-models. This GitHub project is interesting, and we think that it would be a great addition to make this...

ijonglin

Could you please share a list of the packages installed for this project, along with the required versions?

Hello, first of all, thank you very much for your work and for the time you spent training these models. I would like to use your notebook lm3-french-classifier-amazon.ipynb as inspiration to fine-tune a...

ombelinelage
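For reference, a package list like the one requested can be generated from the environment that ran the notebook, either with `pip freeze > requirements.txt` or programmatically. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); the `PACKAGES` list is a hypothetical placeholder, not the project's actual dependencies.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Hypothetical placeholder: replace with the packages the notebook actually imports.
PACKAGES = ["fastai", "torch", "pandas"]


def package_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return {distribution name: installed version, or None if not installed}."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


def as_requirements(versions: Dict[str, Optional[str]]) -> str:
    """Format installed packages as pinned requirement lines (name==version), sorted by name."""
    return "\n".join(f"{n}=={v}" for n, v in sorted(versions.items()) if v is not None)


if __name__ == "__main__":
    print(as_requirements(package_versions(PACKAGES)))
```

Pinning exact versions (`==`) makes the notebook environment easier to reproduce than listing package names alone.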

Is it possible to fine-tune the pretrained model on a new dataset for text generation?


Hello, thank you for sharing your code and the various pretrained models. I downloaded the Wikipedia corpus and the first model in order to run the notebook **lm-french-generation.ipynb**. Is it possible...

aquadzn

Getting the same value for all evaluation metrics


I'm working on fine-tuning a LiLT model on a custom dataset where the labels aren't exactly in IOB format. When using seqeval, I get an error telling me the tags aren't NER...

ameni-ayedi

Notebook or script required for training LiLT for document classification using Transformers

Hi Piegu, while looking for a way to train LiLT for sequence classification, i.e. document classification as in RVL-CDIP, I have not found any relevant notebook or script, though it has shown...

madhavi1102

Unable to write images to metadata

I have my setup as follows:

```
elements = partition_pdf(
    filename=pdf_path,
    strategy="hi_res",
    chunking_strategy="by_title",
    include_orig_elements=True,
    extract_images_in_pdf=True,
    extract_image_block_types=["Image", "Table"],
    extract_image_block_output_dir=str(self.dirs["images"]),  # Save images to disk
    extract_image_block_to_payload=False,  # Ensure base64 is not used
    include_page_breaks=True,
    ...
```

meltedhead
