doctr Adding ViTSTR

Adding a Vision Transformer for scene text recognition: I am currently working on this (with a Hugging Face ViT backbone). Once I am done and have solid results, it would be a pleasure for me to add this model, if you are interested. :) Same for the new unilm/TrOCR model.

Sep 29 '21 21:09 felixdittrich92

Hi @felixdittrich92,

Thanks for your message, it would be a pleasure to have you contribute to the lib!

We already have a recognition model that includes a transformer decoder (MASTER), but we do not yet have full transformer architectures such as ViT or TrOCR. It is on the mid-term roadmap, and if you would like to propose your implementation, you are more than welcome to open a PR! :pray:

Please read the CONTRIBUTING section and feel free to look at the models already implemented in doctr :smiley:

Thank you and have a nice day :+1:

Sep 30 '21 14:09 charlesmindee

I will do, thanks :) :+1:

Sep 30 '21 15:09 felixdittrich92

Hi @felixdittrich92, do you still plan to implement this? If not, we may close this issue to avoid a huge stack of unaddressed ones!

Apr 28 '22 08:04 charlesmindee

Huhu @charlesmindee :wave:, yes of course (maybe a somewhat lighter version with MobileViT), but I think for the moment other things, like a fix for MASTER and SAR, are more important. So I would say let's put this on hold until 1.0.0, wdyt? :+1:

Apr 28 '22 11:04 felixdittrich92

ok

Apr 29 '22 09:04 charlesmindee

@felixdittrich92 Hi, are there any model weights available for ViTSTR that are compatible with doctr? :)

I saw these, but I suppose they are named differently: https://github.com/roatienza/deep-text-recognition-benchmark/releases

Jan 30 '23 22:01 chpatrick