Convert model weights to safetensors format
We want to move the weights of docling-ibm-models from pickled objects saved by torch or torch.jit to the safetensors format. This has several advantages, such as better security (loading safetensors files does not unpickle arbitrary objects), and it is also a prerequisite for proper accelerator support across all models.
There is work in progress in this PR: https://github.com/DS4SD/docling-ibm-models/pull/50
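To illustrate the security point, here is a minimal, standard-library-only sketch of the safetensors file layout: an 8-byte little-endian header length, a JSON header describing each tensor, then the raw tensor bytes. Reading it only parses JSON and slices bytes, so nothing is unpickled. This is not the code from the PR; the function names `write_safetensors`, `read_safetensors_header` and `read_tensor_bytes` are hypothetical, and the sketch skips the validation and header alignment the real library handles. In practice the conversion would more likely go through `safetensors.torch.save_file` on a model's state dict.

```python
import json
import struct


def write_safetensors(path, tensors):
    """Write {name: (dtype, shape, raw_bytes)} using the safetensors layout.

    Hypothetical helper for illustration; real code should use the
    safetensors library instead.
    """
    header = {}
    chunks = []
    offset = 0
    for name, (dtype, shape, data) in tensors.items():
        # Offsets are relative to the start of the byte buffer,
        # i.e. the first byte after the JSON header.
        header[name] = {
            "dtype": dtype,
            "shape": list(shape),
            "data_offsets": [offset, offset + len(data)],
        }
        chunks.append(data)
        offset += len(data)

    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte header size
        f.write(header_bytes)                          # JSON header
        for chunk in chunks:                           # raw tensor bytes
            f.write(chunk)


def read_safetensors_header(path):
    """Return the JSON header as a dict without touching tensor data."""
    with open(path, "rb") as f:
        (header_size,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_size))


def read_tensor_bytes(path, name):
    """Return the raw bytes of one tensor by seeking to its offsets."""
    with open(path, "rb") as f:
        (header_size,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_size))
        begin, end = header[name]["data_offsets"]
        f.seek(8 + header_size + begin)
        return f.read(end - begin)


if __name__ == "__main__":
    import os
    import tempfile

    # A 2x2 float32 tensor, packed by hand as little-endian bytes.
    weights = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.safetensors")
    write_safetensors(demo_path, {"layer.weight": ("F32", (2, 2), weights)})
    print(read_safetensors_header(demo_path))
    print(struct.unpack("<4f", read_tensor_bytes(demo_path, "layer.weight")))
```

Because the header is plain JSON, tensor names, dtypes and shapes can be inspected without loading any weights, which is also convenient when checking that a converted file matches the original state dict.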
Looking forward to this one! I'm working on a txtai integration for docling, and the biggest downside is speed. For some PDFs that are only a couple of pages long, extraction takes 14s vs. 200ms with existing methods (Apache Tika). Obviously, the upside is that all the formatting is preserved. But even if 14s could go down to a couple of seconds, it would be a big win.
Docling v2.12.0 has its models in safetensors format: https://github.com/DS4SD/docling/releases/tag/v2.12.0