Chidhambararajan comments

Results: 76 comments of Chidhambararajan

Output doesn't make sense

You might have to finetune a model for your specific use case

Can the model be finetuned with 8 GB cuda memory?

There is no official code snippet in donut to facilitate this. Try incorporating the suggestions on gradient checkpointing from this link: https://spell.ml/blog/gradient-checkpointing-pytorch-YGypLBAAACEAefHs Let us know if it works.

Can the model be finetuned with 8 GB cuda memory?

960x640 works. I guess that the input dimensions should be a multiple of 320. Going below this resolution might give junk, as the PDF might become pixelated and fonts not...
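The multiple-of-320 observation above is a guess from the comment, not a documented constraint. Assuming it holds, a minimal sketch for snapping a target resolution down to the nearest such multiple could look like this. The function names and the `base` parameter are hypothetical and are not part of donut.

```python
def snap_to_multiple(size: int, base: int = 320) -> int:
    """Round one dimension down to the nearest positive multiple of `base`.

    Assumption: the model prefers dimensions that are multiples of 320,
    as guessed in the comment above. Sizes below `base` are raised to `base`.
    """
    if size < base:
        return base
    return (size // base) * base


def snap_input_size(dims, base: int = 320):
    """Apply snap_to_multiple to every dimension of a size tuple."""
    return tuple(snap_to_multiple(d, base) for d in dims)


if __name__ == "__main__":
    # A 1000x700 target is snapped down to the 960x640 size reported to work.
    print(snap_input_size((1000, 700)))
```

The sketch only computes the size. Whether a given resolution keeps fonts legible still has to be checked on the actual documents.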

Document Information Extraction in Production - How to know the confidence of each field?

This issue briefly discusses the same topic (https://github.com/clovaai/donut/issues/37); however, it gives a confidence score for the whole JSON, not for individual entities. This model predicts the whole JSON as...
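To illustrate the difference between a whole-JSON score and a per-entity score, here is a rough sketch. It assumes you already have one probability per generated token and know which token positions belong to which field. Both inputs are hypothetical, this is not a donut API, and it is not necessarily the approach taken in the linked issue.

```python
import math


def sequence_confidence(token_probs):
    """Geometric mean of token probabilities for one token sequence.

    Assumption: `token_probs` holds the probability the decoder assigned
    to each generated token (values in [0, 1]).
    """
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    if any(p <= 0.0 for p in token_probs):
        return 0.0
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(log_sum / len(token_probs))


def field_confidences(token_probs, field_spans):
    """Score each field from the tokens inside its (start, end) span.

    `field_spans` is a hypothetical mapping of field name to a half-open
    token index range; obtaining it from real output is left open here.
    """
    return {
        name: sequence_confidence(token_probs[start:end])
        for name, (start, end) in field_spans.items()
    }


if __name__ == "__main__":
    probs = [0.9, 0.9, 0.4, 0.4]
    print(sequence_confidence(probs))  # one score for the whole output
    print(field_confidences(probs, {"name": (0, 2), "total": (2, 4)}))
```

The geometric mean is one possible aggregation; a minimum or a plain product would penalize a single uncertain token more heavily.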

Document Information Extraction in Production - How to know the confidence of each field?

Thanks for updating your comment, it looks fine now 🙌 As of now, there is no direct method to extract confidence scores for specific fields. But each and every predicted...

Feature Request: Explanation in documentation about power of additional special tokens

Due to an NDA I cannot share my exact document classes, but donut's own internal training code performs this: https://github.com/clovaai/donut/blob/master/train.py#L66-L70 If you look at this line, this line is...

ASCII only output during training

https://github.com/clovaai/donut/blob/e6623ad56c0e9f12a426dab2d8b2d65a39d64689/donut/model.py#L159-L161 Can I change the pretrained tokenizer from "hyunwoongko/asian-bart-ecjk" to "hyunwoongko/asian-bart-en"? The latter is an English-only decoder from the same repo. Would that do the trick? Cause I...

ASCII only output during training

> https://github.com/clovaai/donut/blob/e6623ad56c0e9f12a426dab2d8b2d65a39d64689/donut/model.py#L159-L161 Can I change the pretrained tokenizer from "hyunwoongko/asian-bart-ecjk" to "hyunwoongko/asian-bart-en". The later one is an english only decoder from the same repo, would...

ASCII only output during training

Tried the above-mentioned change, but still observed characters from other languages in predictions during intermediate epochs: `Prediction: Examination Examination Generation: Generation: General Examination: GENERAL EXamination: GENERAL APPEANANACE normal, pleasant, pleasant,...

ASCII only output during training

Will try it out, thanks for the tip!