
convert_to_onnx.py script performs poorly with urchade/gliner_multi-v2.1 base model

Open · mikeg27 opened this issue 7 months ago · 1 comment

Hi, I've tried to quantize the base model urchade/gliner_multi-v2.1 using the convert_to_onnx.py script.
Unfortunately, the model prediction quality degraded significantly after quantization.
This does not happen for urchade/gliner_small-v2.1.


🧪 Environment:

  • Python: 3.10.18
  • GLiNER: 0.2.21
  • Torch: 2.7.0

✅ Models tested:

| Model Variant | Base Performance | Quantized Performance |
|---|---|---|
| urchade/gliner_small-v2.1 | ✅ good | ✅ decent |
| urchade/gliner_multi-v2.1 | ✅ good | ⚠️ poor |

gliner_small-v2.1 ONNX test

from gliner import GLiNER

# Load ONNX model
model_name = "urchade/gliner_small-v2.1"
model = GLiNER.from_pretrained(model_name, load_onnx_model=True, load_tokenizer=True)

text = """Daniel Evans has a passport number K01234567 and a permanent address
at 500 Pine Street, Seattle. His contact number is +1-206-555-0199."""

labels = ["name", "passport_number", "street_address", "phone_number"]
entities = model.predict_entities(text, labels, threshold=0.4)

for entity in entities:
    print(entity["text"], "=>", entity["label"], entity["score"])

Output:

Daniel Evans => name 0.9867521524429321  
K01234567 => passport_number 0.8854160904884338  
500 Pine Street => street_address 0.9752442240715027  
+1-206-555-0199 => phone_number 0.7612784504890442

⚠️ gliner_small-v2.1 quantized test

from gliner import GLiNER

# Load quantized ONNX model
model_name = "urchade/gliner_small-v2.1"
model_quant = GLiNER.from_pretrained(
    model_name,
    load_onnx_model=True,
    load_tokenizer=True,
    onnx_model_file="model_quantized.onnx"
)

text = """Daniel Evans has a passport number K01234567 and a permanent address
at 500 Pine Street, Seattle. His contact number is +1-206-555-0199."""

labels = ["name", "passport_number", "street_address", "phone_number"]
entities = model_quant.predict_entities(text, labels, threshold=0.1)

for entity in entities:
    print(entity["text"], "=>", entity["label"], entity["score"])

Output:

Daniel Evans => name 0.9830107688903809  
K01234567 => passport_number 0.9768995642662048  
500 Pine Street, Seattle => street_address 0.9961038065190253  
+1-206-555-0199 => phone_number 0.453078476495643

gliner_multi-v2.1 quantized test

from gliner import GLiNER

# Load quantized ONNX model
model_name = "urchade/gliner_multi-v2.1"
model_quant = GLiNER.from_pretrained(
    model_name,
    load_onnx_model=True,
    load_tokenizer=True,
    onnx_model_file="model_quantized.onnx"
)

text = """Daniel Evans has a passport number K01234567 and a permanent address
at 500 Pine Street, Seattle. His contact number is +1-206-555-0199."""

labels = ["name", "passport_number", "street_address", "phone_number"]
entities = model_quant.predict_entities(text, labels, threshold=0.1)

for entity in entities:
    print(entity["text"], "=>", entity["label"], entity["score"])

Output:

Daniel Evans => name 0.21395745873451233  
K01234567 => passport_number 0.15609435737133026

Would appreciate any guidance on how to maintain quality post-quantization for the multi variant.

mikeg27 · Jul 02 '25 10:07

Hi @mikeg27, post-quantization performance is largely determined by the model's training regime. Models that were not trained with quantization-aware methods often suffer from shifts in their activation and weight distributions once precision is reduced. Since none of these models were trained with QAT, post-quantization accuracy varies with each model's robustness and capacity. urchade/gliner_small-v2.1 is a smaller model with simpler internal representations, so it is less sensitive to quantization-induced perturbations and therefore degrades less when precision changes.
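The distribution-shift point can be illustrated with a toy sketch (pure Python, hypothetical weight values, not GLiNER's actual quantizer): with symmetric per-tensor int8 quantization, a single outlier weight inflates the scale and coarsens the representation of every other weight. A larger multilingual encoder is more likely to contain such wide or uneven weight distributions, which is one plausible reason the multi variant degrades more.

```python
def quantize_dequantize_int8(values):
    """Symmetric per-tensor int8 round trip: scale by the largest |value|."""
    scale = max(abs(v) for v in values) / 127.0
    quantized = [max(-128, min(127, round(v / scale))) for v in values]
    return [q * scale for q in quantized]

def max_roundtrip_error(values):
    """Worst-case absolute error introduced by the int8 round trip."""
    deq = quantize_dequantize_int8(values)
    return max(abs(a - b) for a, b in zip(values, deq))

# Weights clustered near zero: the scale is small, so error is tiny.
narrow = [0.01 * i for i in range(-50, 51)]
# Same weights plus one outlier: the outlier dictates the scale,
# so every small weight is now quantized much more coarsely.
wide = narrow + [8.0]

print(max_roundtrip_error(narrow))  # small
print(max_roundtrip_error(wide))    # roughly an order of magnitude larger
```

This is also why per-channel weight scales (one scale per output channel instead of one per tensor) often recover accuracy for larger models; whether the conversion script exposes such an option is not confirmed here.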

Ingvarstep · Nov 27 '25 09:11