
Fine-tuning forgetfulness

Open davidress-ILW opened this issue 1 year ago • 8 comments

I am working on fine-tuning a model and running into a "forgetful" situation I wanted to bring to your attention.

The two changes we made to the fine-tuning Jupyter notebook are:

  1. Converted it into a PyCharm Python script
  2. Changed the output to include prediction scores

model: urchade/gliner_small
json: sample_data.json
num_steps = 500
batch_size = 8
data_size = 57
num_batches = 7
num_epochs = 7

Before training results:

Cristiano Ronaldo > Person > 0.9846
Ballon d'Or > Award > 0.9413
UEFA Men's Player of the Year Awards > Award > 0.8620
European Golden Shoes > Award > 0.9594

After training, using final model:

Cristiano Ronaldo dos Santos Aveiro > Person > 0.9472
Ballon d'Or awards > Award > 0.8051
UEFA Men's Player of the Year Awards > Award > 0.9852
European Golden Shoes > Award > 0.9863
outfield player > Person > 0.8722

The model retained the original entities (although the scores changed) and even predicted a new entity, so I think the fine-tuning Jupyter file works fine for your sample data.

Our data set is composed of 72 records; after the 90% split there are 64 records in the training set and 8 in the test set. All records are for a single label, EntC.

num_steps = 500
batch_size = 8
data_size = 64
num_batches = 8
num_epochs = 62

Before training, results are:

EntA > OurLabel > 0.8799
EntA > OurLabel > 0.8288
EntB > OurLabel > 0.7210
EntA > OurLabel > 0.8052
EntA > OurLabel > 0.7026
EntC > OurLabel > 0.5243
EntA > OurLabel > 0.7475

After training, results are:

EntC > OurLabel > 1.0000

The model now finds EntC with a score of 1.0000, but it is as if the final model completely forgot all other entities except EntC. Any thoughts as to why this forgetfulness could be happening?

While I cannot disclose the entity names or the label, I can say that all entities are three characters long.

Any suggestions are appreciated, thank you.

davidress-ILW avatar Jul 28 '24 22:07 davidress-ILW

Hi @davidress-ILW

It seems like your model is experiencing catastrophic forgetting, where it heavily overfits to the new data (EntC) and forgets the previous entities. This is a common issue in continual learning and fine-tuning scenarios.

To mitigate this problem, you can use Experience Replay. This involves maintaining a buffer of original data (in this case the Pile-NER dataset) and periodically using these samples during training. By doing this, you can ensure that the model retains knowledge of the previously learned entities while learning new ones.

urchade avatar Aug 03 '24 13:08 urchade
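
A minimal sketch of that replay-buffer mixing, assuming both datasets are stored on disk in GLiNER's JSON fine-tuning format (a list of records with `tokenized_text` and `ner` fields); the file names and the mixing ratio are illustrative, not fixed recommendations:

```python
import json
import random

random.seed(42)

# Original (pre-training) data to replay, e.g. records drawn from Pile-NER.
with open("pile_ner.json") as f:        # illustrative path
    pile_ner = json.load(f)

# The new fine-tuning records (here: the custom single-label set).
with open("custom_data.json") as f:     # illustrative path
    custom = json.load(f)

# Draw a replay buffer from the original data; the ratio is a knob to tune.
ratio = 2
replay = random.sample(pile_ner, k=min(len(pile_ner), ratio * len(custom)))

# Mix and shuffle so old and new examples are interleaved during training.
train_data = custom + replay
random.shuffle(train_data)

with open("mixed_train.json", "w") as f:
    json.dump(train_data, f)
```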

Adding Pile-NER data to my training data fixed this issue.

@urchade What is the best ratio for mixing the Pile-NER dataset with our training dataset?

Pile-NER has 45K+ entries; my training data has only 200+ entries.

KUMBLE avatar Aug 06 '24 01:08 KUMBLE

@urchade Thank you. I really appreciate you sharing your knowledge with me and the broader community by answering these questions. I found the Pile-NER data, so as @KUMBLE mentioned, is there a preferred means of mixing the Pile-NER data with our custom data sets?

The software you have developed (GLiNER, GraphER, etc.) is simply fabulous.

davidress-ILW avatar Aug 07 '24 13:08 davidress-ILW

Hi @davidress-ILW. You can try this. Let:

  1. Sample A: Your new dataset
  2. Sample B: a dataset sampled from pile-ner (e.g., 2x the size of Sample A)

Then, mix Sample A and Sample B to create a new dataset for training. Optionally, draw a new Sample B after each epoch.

urchade avatar Aug 07 '24 13:08 urchade
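
A small sketch of that recipe, assuming Sample A and the full Pile-NER records are already loaded as Python lists; the names `custom_records`, `pile_ner_records`, and the training call are placeholders for whatever the fine-tuning notebook actually uses:

```python
import random

def epoch_data(sample_a, pile_pool, ratio=2, rng=random):
    """Mix Sample A with a freshly drawn Sample B of ratio * len(sample_a) records."""
    sample_b = rng.sample(pile_pool, k=min(len(pile_pool), ratio * len(sample_a)))
    mixed = list(sample_a) + sample_b
    rng.shuffle(mixed)
    return mixed

# Optionally redraw Sample B at the start of every epoch:
# for epoch in range(num_epochs):
#     train_one_epoch(model, epoch_data(custom_records, pile_ner_records, ratio=2))
```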

Hello @urchade

Thank you for the reply on mixing training data with Pile-NER data.

For my testing, I found Sample B needed to be 5x the size of Sample A.

I then mixed Sample A and Sample B (shuffled) to randomize the data.

I say 5x because that ratio enabled GLiNER, using the "best" model found during fine-tuning, to predict at high scores everything it found before fine-tuning, as well as entities that had previously been missed. So, the fine-tuning appeared to work.

However, I noticed that the eval_loss metric was always between 220 and 270 (regardless of the mix, i.e., 2x, 3x, 4x, and 5x), which I do not understand. Is there a way to extract all the training metrics from a fine-tuning run? Should I be concerned about the high eval_loss values?

Thank you again for the efforts you and your team have put into GLiNER. So much easier to fine-tune than other NER models. I also appreciate the support.

davidress-ILW avatar Aug 28 '24 14:08 davidress-ILW

Hi @davidress-ILW, I recommend focusing more on metrics like the F1-score rather than relying heavily on the loss metric. The loss value is influenced by several factors, and a value of 200 might be close to the lower bound, especially since the loss reduction is set to sum by default. Additionally, the number of candidate spans in an input is L*K, where L is the sequence length and K is the maximum span size, so the summed loss grows with input length; for example, an input of L = 50 tokens with K = 12 already contributes 600 span terms to the sum.

urchade avatar Aug 28 '24 16:08 urchade
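
For anyone who wants to track F1 instead of loss, here is a rough sketch of a span-level F1 evaluation built on the public `predict_entities` API; the evaluation-record layout (raw text plus gold spans as (start, end, label) character offsets) is an assumption about how your test split is stored, not something from this thread:

```python
from gliner import GLiNER

def span_f1(model, eval_records, labels, threshold=0.5):
    """Micro-averaged precision/recall/F1 over exact (start, end, label) span matches."""
    tp = fp = fn = 0
    for rec in eval_records:
        preds = model.predict_entities(rec["text"], labels, threshold=threshold)
        pred_spans = {(p["start"], p["end"], p["label"]) for p in preds}
        gold_spans = set(rec["gold_spans"])  # assumed format: {(start, end, label), ...}
        tp += len(pred_spans & gold_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Example usage (checkpoint path and records are placeholders):
# model = GLiNER.from_pretrained("path/to/finetuned_checkpoint")
# print(span_f1(model, eval_records, labels=["OurLabel"]))
```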

Hi dear @davidress, as part of my master thesis I am currently fine-tuning GLiNER, and I wanted to ask whether you extended the fine-tuning script to calculate the F1 score during training and to use it as the metric for saving the best model. I am running into errors when I try that, and it would help me so much. Best regards, Christina

ChristinaPetschnig avatar Apr 18 '25 17:04 ChristinaPetschnig

Hi @ChristinaPetschnig Did you get the script to calculate the F1 score while fine-tuning?

@urchade It would be great if you could modify the training script to include an F1 score metric, as the original scripts report only training loss and validation loss.

Priyabrata017 avatar May 02 '25 16:05 Priyabrata017