
Investigate how to remove dependency of OpenCV

Open PeterStaar-IBM opened this issue 1 year ago • 4 comments

PeterStaar-IBM avatar Oct 03 '24 12:10 PeterStaar-IBM

OpenCV is a dependency of docling-ibm-models (https://github.com/DS4SD/docling-ibm-models), where it is used to load and resize images for the TableFormer model.

Here are the places where opencv is called:

Additionally, to correctly evaluate the effect of replacing OpenCV with another image library, we should measure the impact on:

  • TableFormer input tensors (image loading, normalization, resizing).
  • TableFormer output predictions.
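One way to quantify the first point is to run the same image through both preprocessing paths and compare the resulting arrays. The snippet below is only a minimal sketch: `diff_report` is a hypothetical helper name, not something from docling-ibm-models, and it assumes both paths produce arrays of the same shape.

```python
import numpy as np


def diff_report(reference: np.ndarray, candidate: np.ndarray) -> dict:
    """Summarize how far `candidate` deviates from `reference`.

    Hypothetical helper: `reference` would be the OpenCV-based result and
    `candidate` the result from the replacement library.
    """
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    # Cast to float64 first so uint8 inputs do not wrap around on subtraction.
    delta = np.abs(reference.astype(np.float64) - candidate.astype(np.float64))
    return {
        "max_abs_diff": float(delta.max()),
        "mean_abs_diff": float(delta.mean()),
        "identical": bool(np.array_equal(reference, candidate)),
    }
```

The same report could be applied to the model's raw outputs to cover the second point, provided they are available as arrays.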

nikos-livathinos avatar Oct 03 '24 14:10 nikos-livathinos

@nikos-livathinos I ran a quick test with Pillow. I wrote two functions to resize an image, one using Pillow and the other using OpenCV. Both were given the same input image as a numpy.ndarray.

I used Python's timeit module with 100 runs. Below are the performance numbers:

Avg time for OpenCV: 0.008613 seconds
Avg time for Pillow: 0.072365 seconds

If we instead pass the image file path as input and open the image with Pillow directly, performance improves slightly:

Avg time for Pillow: 0.067890 seconds
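For anyone who wants to reproduce a comparison of this kind, here is a rough sketch of such a timeit harness. It is not the exact script behind the numbers above: the target size, the bilinear interpolation mode, the synthetic input image, and all function names are assumptions made for illustration.

```python
import timeit

import numpy as np

try:
    import cv2
except ImportError:  # OpenCV not installed; that backend is skipped
    cv2 = None

try:
    from PIL import Image
except ImportError:  # Pillow not installed; that backend is skipped
    Image = None

TARGET_SIZE = (448, 448)  # (width, height); assumed value, not from the thread


def resize_opencv(img: np.ndarray) -> np.ndarray:
    return cv2.resize(img, TARGET_SIZE, interpolation=cv2.INTER_LINEAR)


def resize_pillow(img: np.ndarray) -> np.ndarray:
    # The ndarray -> PIL -> ndarray conversions are part of the measured cost.
    pil = Image.fromarray(img).resize(TARGET_SIZE, resample=Image.BILINEAR)
    return np.asarray(pil)


def avg_seconds(fn, arg, runs: int = 100) -> float:
    """Average wall-clock time of fn(arg) over `runs` calls."""
    return timeit.timeit(lambda: fn(arg), number=runs) / runs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a page image (height, width, channels).
    image = rng.integers(0, 256, size=(1024, 768, 3), dtype=np.uint8)
    backends = {"OpenCV": (cv2, resize_opencv), "Pillow": (Image, resize_pillow)}
    for name, (module, fn) in backends.items():
        if module is None:
            print(f"{name}: not installed, skipped")
            continue
        print(f"Avg time for {name}: {avg_seconds(fn, image):.6f} seconds")
```

Absolute timings will depend on the image size, the interpolation mode, and the machine, so only the relative ordering is likely to be comparable across setups.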

sgonsal avatar Nov 07 '24 20:11 sgonsal

@sgonsal could you also try the same with torchvision.transforms.functional.resize?

With a torch.Tensor input, it relies on torch.nn.functional.interpolate: https://github.com/pytorch/vision/blob/cb9fdbf11f884b0501d1c23a48af258ab4acb57f/torchvision/transforms/_functional_tensor.py#L467

@pavel-denisov-fraunhofer Done. I ran a test similar to the one in my comment above, this time with torchvision.transforms.functional.resize:

Avg time for torchvision: 0.071132 seconds

sgonsal avatar Dec 02 '24 21:12 sgonsal