GLiNER
How do I tokenize my data to prepare for finetuning 'urchade/gliner_multi-v2.1'
@urchade Hello everyone, I want to fine-tune the multi-v2.1 version on my data. As shown in the example finetune.ipynb, the data.json file that is read in is already tokenized. I'd like to know how I can tokenize my own data for fine-tuning, and whether there is an example file for that. For context, my data is a mixed bag of IDs, alphanumeric items, customer names, punctuation marks, etc.
Appreciate any help.
Thanks
If you have the raw texts, you can use the `WordsSplitter` class: `from gliner.data_processing.tokenizer import WordsSplitter`.
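To illustrate the overall idea without depending on the library, here is a minimal, self-contained sketch of turning raw text plus character-level entity annotations into the `{"tokenized_text", "ner"}` records seen in data.json. The regex splitter below is a hypothetical stand-in for `WordsSplitter` (it keeps punctuation as separate tokens, which suits mixed IDs and names), and the assumed target format uses inclusive token indices in the `ner` spans — check your data.json to confirm the exact convention.

```python
import re

# Hypothetical stand-in for gliner's WordsSplitter: split into word
# tokens and single punctuation tokens, e.g. "A-123" -> "A", "-", "123".
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def to_gliner_example(text, entities):
    """Convert raw text + (start_char, end_char, label) spans into a
    {"tokenized_text": [...], "ner": [[start_tok, end_tok, label], ...]}
    record (assumed format; end_tok is inclusive here)."""
    tokens, offsets = [], []
    for m in TOKEN_RE.finditer(text):
        tokens.append(m.group())
        offsets.append((m.start(), m.end()))

    ner = []
    for start, end, label in entities:
        # Collect token indices fully covered by the character span.
        idxs = [i for i, (s, e) in enumerate(offsets) if s >= start and e <= end]
        if idxs:
            ner.append([idxs[0], idxs[-1], label])
    return {"tokenized_text": tokens, "ner": ner}

example = to_gliner_example(
    "Order A-123 shipped to John Smith.",
    [(6, 11, "id"), (23, 33, "customer name")],
)
# example["tokenized_text"] -> ["Order", "A", "-", "123", "shipped", "to", "John", "Smith", "."]
# example["ner"]            -> [[1, 3, "id"], [6, 7, "customer name"]]
```

If you go with the library's own splitter instead, swap the regex loop for `WordsSplitter` so your training tokens match what GLiNER uses at inference time.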