docling Convert model weights to safetensors format

We want to move the weights of docling-ibm-models from pickled objects saved by torch or torch.jit to the safetensors format. This has several advantages, such as better security (loading a safetensors file does not execute code, unlike unpickling), and it is also a prerequisite for proper accelerator support across all models.
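To illustrate the security point, here is a minimal standard-library sketch. It is not the docling-ibm-models conversion code. It shows that unpickling invokes whatever callable the payload names, while the safetensors layout (an 8-byte little-endian header length, a JSON header, then raw tensor bytes) can be read with nothing more than JSON parsing and byte slicing. The names `Sneaky`, `write_safetensors` and `read_safetensors` are hypothetical, and the sketch assumes 1-D float32 tensors only.

```python
import json
import pickle
import struct


class Sneaky:
    # pickle calls whatever callable __reduce__ returns when the data is loaded.
    # Here it is the harmless int("42"); a hostile file could name os.system instead.
    def __reduce__(self):
        return (int, ("42",))


def write_safetensors(tensors):
    """Serialize {name: list of floats} as 1-D F32 tensors in the safetensors layout."""
    header, chunks, offset = {}, [], 0
    for name, values in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)
        header[name] = {
            "dtype": "F32",
            "shape": [len(values)],
            "data_offsets": [offset, offset + len(raw)],
        }
        chunks.append(raw)
        offset += len(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    # Layout: 8-byte little-endian header length, JSON header, raw tensor bytes.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + b"".join(chunks)


def read_safetensors(blob):
    """Parse the layout above using only JSON decoding and byte slicing."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    header = json.loads(blob[8 : 8 + header_len].decode("utf-8"))
    data = blob[8 + header_len :]
    out = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata entry, not a tensor
            continue
        begin, end = info["data_offsets"]
        # Simplification: this sketch assumes every tensor is F32 (4 bytes per value).
        out[name] = list(struct.unpack(f"<{(end - begin) // 4}f", data[begin:end]))
    return out


# Unpickling runs the callable chosen by the payload: the result is an int, not a Sneaky.
loaded = pickle.loads(pickle.dumps(Sneaky()))

# The safetensors round trip only ever interprets the file as data.
weights = read_safetensors(write_safetensors({"layer.weight": [0.5, -1.25, 2.0]}))
```

In practice the conversion would go through the `safetensors` library (for PyTorch, `save_file` and `load_file` in `safetensors.torch`) rather than hand-written parsing; the sketch is only meant to show why the format is safe to load.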

Nov 11 '24 13:11 cau-git

There is work in progress on this in the following PR: https://github.com/DS4SD/docling-ibm-models/pull/50

Nov 25 '24 09:11 cau-git

Looking forward to this one! I'm working on a txtai integration for docling and the biggest downside is speed. For some PDFs that are only a couple of pages long, extraction takes 14s vs 200ms with existing methods (Apache Tika). Obviously, the upside is that all the formatting is preserved. But even if 14s could go down to a couple of seconds, it would be a big win.

Dec 03 '24 11:12 davidmezzetti

Docling v2.12.0 has its models in safetensors format: https://github.com/DS4SD/docling/releases/tag/v2.12.0

Dec 15 '24 16:12 nikos-livathinos