
Returned entities don't provide information

Open Xiaomin-HUANG opened this issue 1 year ago • 1 comments

Model versions: "knowledgator/gliner-multitask-large-v0.5", "urchade/gliner_multi-v2.1"

Issue: I used these 2 models to detect ["name_surname", "email", "organization", "phone_number"], but some of the returned entities carry no useful information.

Examples :

              'phone_number': ['numéro', '75', '73', 'numéro de téléphone', 'numéro'] => (I only want the phone number itself, not these words or number fragments)
              'name_surname': ['madame', 'madame foucard', 'madame', 'mr' ....] => (I only want a person's name; "madame" and "mr" are forms of address used in conversation and carry none of the wanted info)
              'email': ['mail', 'mail'] => (I want the email address instead of the label name)

PS: These unwanted entities, which are similar to the label names, have a high confidence score (around 0.95). Is there any method to filter out these undesired entities? Thank you so much.

Xiaomin-HUANG, Jul 02 '24 13:07

@Xiaomin-HUANG, I think these are artifacts of the dataset on which these models were trained. The best way to fix this is to fine-tune your model. I would recommend this notebook; it contains Gradio interfaces to help you label your data. Given your task, the number of required examples should be small.
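
As a stopgap before fine-tuning, a post-processing filter over the predictions might also help. The sketch below is only an illustration, not part of GLiNER: it assumes each prediction is a dict with "text", "label" and "score" keys (the shape returned by `predict_entities`), and the `STOP_WORDS` list, the `VALIDATORS` patterns, and the `filter_entities` helper are all hypothetical names chosen for this example. A score threshold alone would not work here, since the unwanted entities score around 0.95.

```python
import re

# Hypothetical per-label validators; the patterns are illustrative, not exhaustive.
VALIDATORS = {
    # Something that looks like local@domain.tld
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    # Only digits and common separators, with at least 6 digits (assumed minimum).
    "phone_number": re.compile(r"^(?=(?:\D*\d){6,})[\d\s+().-]+$"),
}

# Hypothetical stop list: titles and label-like words taken from the examples above.
STOP_WORDS = {"madame", "monsieur", "mr", "mme", "mail", "numéro", "numéro de téléphone"}


def filter_entities(entities):
    """Drop predictions that are stop words or fail their label's validator."""
    kept = []
    for ent in entities:
        text = ent["text"].strip()
        # Reject spans that are exactly a title or a label-like word.
        if text.lower() in STOP_WORDS:
            continue
        # Labels without a validator (e.g. name_surname) are kept as they are.
        validator = VALIDATORS.get(ent["label"])
        if validator is not None and not validator.match(text):
            continue
        kept.append(ent)
    return kept
```

With this filter, 'numéro', '75', 'madame' and 'mail' would be dropped, while a span such as 'madame foucard' is kept unchanged; stripping the leading title from it would need an extra step.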

Ingvarstep avatar Jul 02 '24 16:07 Ingvarstep