Satheesh K

Results 10 comments of Satheesh K

Hi @vinayak-mehta , I have also thought of implementing this. [dramatiq](https://github.com/Bogdanp/dramatiq) or [celery](https://github.com/celery/celery) are my suggestions for asynchronous processing of pages.
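
For readers without a message broker set up, the page-level fan-out those libraries provide can be sketched with the standard library alone; `process_page` here is a hypothetical placeholder for the real per-page work, not code from the project.

```python
from concurrent.futures import ProcessPoolExecutor

def process_page(page_number):
    # Hypothetical stand-in for the real per-page work
    # (e.g. parsing tables out of one PDF page).
    return page_number * page_number

def process_pages(page_numbers):
    # Fan pages out across worker processes and collect results in the
    # original order, mirroring what a dramatiq/celery worker pool would
    # do with a task queue in between.
    with ProcessPoolExecutor(max_workers=4) as pool:
        return list(pool.map(process_page, page_numbers))

if __name__ == "__main__":
    print(process_pages([1, 2, 3, 4]))  # → [1, 4, 9, 16]
```

With dramatiq or celery the same shape holds, except `process_page` becomes a broker-backed task and the results are collected asynchronously.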

What prompt are you using at inference time? Most probably it is messing up the JSON output. If you have fine-tuned the base model for your task, try using ``...
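
When the prompt is wrong, the decoder often wraps or truncates the JSON it emits. A small helper like the hypothetical `extract_json` below (my own sketch, not part of Donut) makes that failure mode visible by pulling the first balanced JSON object out of raw model output:

```python
import json

def extract_json(text):
    # Scan for the first '{' and track brace depth until it closes,
    # then attempt to parse that span; return None if nothing parses.
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    for i, ch in enumerate(text[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                try:
                    return json.loads(text[start:i + 1])
                except json.JSONDecodeError:
                    return None
    return None

# Model output often carries task tokens around the JSON payload:
print(extract_json('<s_cord-v2>{"menu": {"nm": "coffee"}}</s>'))
# → {'menu': {'nm': 'coffee'}}
```

If this returns None on your outputs, the model is not producing well-formed JSON at all, which usually points back at the inference prompt.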

Did you fine-tune the model on your custom dataset, or are you trying to use the off-the-shelf `cord-v2` model?

I see some changes compared to the code in this repo, but I still suggest checking your prompt at inference time if the edit distance was very low during training...

@vinayak-mehta , have you tried pdftoppm (from poppler-utils) for converting PDF to PNG?
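
A minimal sketch of driving pdftoppm from Python, assuming poppler-utils is installed; only the argv list is assembled here, and the commented `subprocess.run` line is what would actually execute it:

```python
import subprocess  # needed once the commented run line is enabled

def pdftoppm_cmd(pdf_path, out_prefix, dpi=150):
    # Build the argv list for pdftoppm: -png selects PNG output,
    # -r sets the rendering resolution in DPI.
    return ["pdftoppm", "-png", "-r", str(dpi), pdf_path, out_prefix]

cmd = pdftoppm_cmd("input.pdf", "page")
# subprocess.run(cmd, check=True)  # writes page-1.png, page-2.png, ...
print(cmd)
```

Keeping the command construction separate from execution also makes it easy to hand off to a worker queue later.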

OK, in this [post](https://serverfault.com/questions/167573/fast-pdf-to-jpg-conversion-on-linux-wantedl), there was one more suggestion to do this with [MuPDF](https://mupdf.com/index.html).

Hi @WaterKnight1998 @mht-sharma , do you have an inference script for the Donut document-parsing model that uses the encoder and decoder ONNX models? Something similar to this [TrOCR gist](https://gist.github.com/mht-sharma/f38c670930ac7df413c07327e692ee39).

I have used the `convert_llama_hf_to_nemo.py` script to convert the Llama 2 70B model from Hugging Face format to NeMo format. Here is the exact command:

```
python3 -u /opt/NeMo/scripts/checkpoint_converters/convert_llama_hf_to_nemo.py \
    --input_name_or_path=/workspace/llama2_models \
    --output_path=/workspace/llama2_models/llama2-70b-base.nemo
```

I think it is straightforward to get text output using the donut-base model. Load `naver-clova-ix/donut-base` from Hugging Face and use `` as the prompt.

```
from donut import DonutModel
import torch
from PIL...
```