
Onnx converted model has slower inference

Open · yogitavm opened this issue · 3 comments

I fine-tuned the GLiNER small v2.1 model and created an ONNX version of it using the convert_to_onnx.ipynb example code. When I compared the inference time of the two models, the ONNX version took 50% more time.

This is how I'm loading the model: `model = GLiNER.from_pretrained(model_path, load_onnx_model=True, load_tokenizer=True)`
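
For reference, a minimal timing sketch along these lines might look like the following (the model path, sample text, and label set are placeholders; the loading calls follow the snippet above):

```python
# Minimal sketch comparing PyTorch vs ONNX inference time on one input.
# The model path, sample text, and label set below are placeholders.
import time
from gliner import GLiNER

model_path = "path/to/finetuned-gliner-small-v2.1"
text = "Steve Jobs founded Apple in Cupertino, California."
labels = ["person", "organization", "location"]

def avg_ms(model, n_runs=20):
    model.predict_entities(text, labels)  # warm-up run, excluded from timing
    start = time.perf_counter()
    for _ in range(n_runs):
        model.predict_entities(text, labels)
    return (time.perf_counter() - start) / n_runs * 1000

torch_model = GLiNER.from_pretrained(model_path)
onnx_model = GLiNER.from_pretrained(model_path, load_onnx_model=True, load_tokenizer=True)

print(f"PyTorch: {avg_ms(torch_model):.1f} ms/call")
print(f"ONNX:    {avg_ms(onnx_model):.1f} ms/call")
```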

yogitavm · Sep 17 '24 09:09

From my experiments, ONNX models run faster for sequences shorter than 124 words. With longer input sequences, attention becomes the limiting factor and ONNX is not necessarily more efficient than PyTorch. The main purpose of ONNX is to make it easier to convert models between frameworks and to run them in other environments. If you need efficient inference on CPU, I would recommend trying GLiNER.cpp; it is consistently faster than PyTorch and enables up to 2x acceleration.
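
To find where that crossover sits on a given machine, one could sweep the input length with something like the sketch below (the model path and word counts are placeholders, and the repeated-sentence inputs are synthetic):

```python
# Sketch to probe how inference time scales with input length for the
# PyTorch and ONNX variants. Model path and word counts are placeholders.
import time
from gliner import GLiNER

labels = ["person", "organization", "location"]
sentence = "Steve Jobs founded Apple in Cupertino, California. "

torch_model = GLiNER.from_pretrained("path/to/model")
onnx_model = GLiNER.from_pretrained("path/to/model", load_onnx_model=True, load_tokenizer=True)

for n_words in (32, 128, 512):
    # Build a synthetic input of roughly n_words words by repetition.
    text = " ".join((sentence * n_words).split()[:n_words])
    for name, model in (("PyTorch", torch_model), ("ONNX", onnx_model)):
        model.predict_entities(text, labels)  # warm-up
        start = time.perf_counter()
        for _ in range(10):
            model.predict_entities(text, labels)
        avg = (time.perf_counter() - start) / 10 * 1000
        print(f"{n_words:4d} words  {name:7s} {avg:6.1f} ms")
```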

Ingvarstep · Sep 21 '24 19:09

Thanks @Ingvarstep. I was going through GLiNER.cpp and could not find license details. Is it Apache 2.0 or MIT licensed?

yogitavm · Oct 01 '24 12:10

It's Apache 2.0

Ingvarstep · Oct 01 '24 16:10