Regarding the issue of fine-tuning on a specific domain

Open QuangTQV opened this issue 2 years ago • 8 comments

Dear author, does examples/finetune.ipynb already include negative entity sampling? If not, how can we adjust it to incorporate negative entity sampling?

QuangTQV avatar Apr 13 '24 12:04 QuangTQV

It already includes in-batch negative sampling.
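To illustrate what in-batch negative sampling generally means, here is a minimal sketch. It is not GLiNER's actual implementation: the function name `labels_with_in_batch_negatives` and the `max_negatives` parameter are hypothetical, and the `tokenized_text` / `ner` record layout is assumed to match the fine-tuning notebook. The idea is that entity types appearing elsewhere in the batch, but not in the current example, serve as negative labels for that example.

```python
import random


def labels_with_in_batch_negatives(batch, max_negatives=None, seed=0):
    """For each example, pair its own entity types (positives) with
    types that occur only in other examples of the batch (negatives)."""
    rng = random.Random(seed)
    # Every entity type that occurs anywhere in the batch.
    batch_types = sorted({label for ex in batch for _, _, label in ex["ner"]})
    result = []
    for ex in batch:
        positives = sorted({label for _, _, label in ex["ner"]})
        negatives = [t for t in batch_types if t not in positives]
        # Optionally cap the number of negatives per example.
        if max_negatives is not None and len(negatives) > max_negatives:
            negatives = rng.sample(negatives, max_negatives)
        result.append({"positives": positives, "negatives": sorted(negatives)})
    return result


batch = [
    {"tokenized_text": ["Bitcoin", "price", "today"],
     "ner": [[0, 0, "cryptocurrency"]]},
    {"tokenized_text": ["Harry", "Potter", "by", "Rowling"],
     "ner": [[0, 1, "book"], [3, 3, "author"]]},
]
prompts = labels_with_in_batch_negatives(batch)
```

In this toy batch, the first example gets "author" and "book" as negatives, and the second gets "cryptocurrency".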

urchade avatar Apr 13 '24 12:04 urchade

thanks ^^

QuangTQV avatar Apr 13 '24 16:04 QuangTQV

How can I make the GLiNER model biased towards my specific domain data? My domain is easily confused with other domains. For example, "harryporter price" is a question about a cryptocurrency price, but the model could mistakenly interpret it as being about a book or something else.

QuangTQV avatar Apr 13 '24 17:04 QuangTQV

The solution is to fine-tune the model on your specialized domain. You can, for instance, generate synthetic data for that.
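A rough sketch of template-based synthetic data generation for the cryptocurrency case is below. The templates, token list, and helper names (`make_example`, `generate`) are made up for illustration, and the record layout (`tokenized_text` plus `ner` spans with inclusive token indices) is an assumption about what the fine-tuning notebook expects, so check it against the notebook's sample data before training.

```python
import random

# Hypothetical templates and token names, for illustration only.
TEMPLATES = [
    "what is the {token} price today",
    "{token} price",
    "how much is {token} worth",
]
TOKENS = ["harryporter", "bitcoin", "ethereum"]


def make_example(template, token_name, label="cryptocurrency"):
    """Fill one template and record the entity span as
    [start_token, end_token, label] with inclusive indices."""
    tokenized, spans = [], []
    for word in template.split():
        if word == "{token}":
            start = len(tokenized)
            tokenized.extend(token_name.split())
            spans.append([start, len(tokenized) - 1, label])
        else:
            tokenized.append(word)
    return {"tokenized_text": tokenized, "ner": spans}


def generate(n, seed=0):
    """Draw n random (template, token) combinations."""
    rng = random.Random(seed)
    return [make_example(rng.choice(TEMPLATES), rng.choice(TOKENS))
            for _ in range(n)]


example = make_example("{token} price", "harryporter")
# {'tokenized_text': ['harryporter', 'price'], 'ner': [[0, 0, 'cryptocurrency']]}
```

Templates like these produce fairly uniform text; mixing them with real queries from the target domain is likely to generalize better.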

urchade avatar Apr 13 '24 18:04 urchade

I know I should fine-tune on my domain-specific data, but my dataset is very small compared to the pre-trained model's data. I'm afraid the model won't become biased towards my data. Do you have any suggestions for a good fine-tuning approach? My data consists of entities within the blockchain domain.

QuangTQV avatar Apr 13 '24 18:04 QuangTQV

Even with a small dataset it should work. How many samples do you have exactly? I have read about someone fine-tuning with 20-30 samples and getting strong performance in their domain.

urchade avatar Apr 13 '24 18:04 urchade

I have 500 samples for each entity, and I need to extract about 8 entities.

QuangTQV avatar Apr 13 '24 18:04 QuangTQV

I did some testing last week and generated 70 synthetic examples to bias the model towards classifying different kinds of labels associated with bird nesting and dietary habits. It works quite well. If your real-world data is fairly consistent, this helps too. You will want to adjust the number of steps in the fine-tune notebook accordingly.
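One simple way to pick the number of steps is to derive it from the dataset size. The sketch below is plain arithmetic, not part of the notebook; the helper name `training_steps` and the batch size and epoch counts are illustrative assumptions.

```python
import math


def training_steps(num_examples, batch_size, epochs):
    """Optimizer steps needed to pass over the data `epochs` times."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


# 70 synthetic examples, assumed batch size 8, assumed 10 epochs
small_run = training_steps(70, batch_size=8, epochs=10)    # 90

# Up to 8 entity types x 500 samples, assumed batch size 8, assumed 3 epochs
large_run = training_steps(4000, batch_size=8, epochs=3)   # 1500
```

With a small dataset, a step count left at a value tuned for much larger data means many repeated passes over the same examples, so it is worth watching evaluation loss for overfitting.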

wjbmattingly avatar Apr 14 '24 00:04 wjbmattingly