Using GroNLP/bert-base-dutch-cased for embeddings
Your question
Good morning,
First of all, thanks for building this awesome repository!
I was wondering if you could help me with a problem I'm facing. I'm quite new to this, so I might be doing it entirely wrong.
I want to create embeddings for calculating text similarity. The embedding model has to support Dutch, so I wanted to use this BERT model:
https://huggingface.co/GroNLP/bert-base-dutch-cased
This is my code:
use function Codewithkyrian\Transformers\Pipelines\pipeline;

$extractor = pipeline('embeddings', 'GroNLP/bert-base-dutch-cased');
$embeddings = $extractor($value, normalize: true, pooling: 'mean');
This is the error I'm getting:
"Error 0 occurred while trying to load file from https://huggingface.co/GroNLP/bert-base-dutch-cased/resolve/main/onnx/model_quantized.onnx"
Is there another way I can use this model for creating embeddings? Thanks in advance for your time!
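Once the model loads, comparing two texts comes down to comparing their vectors. Because the call above passes normalize: true, each embedding should have unit length, so cosine similarity reduces to a dot product. The sketch below is a hypothetical helper (cosineSimilarity is not part of TransformersPHP); it still divides by the norms so it stays correct for unnormalized vectors. It assumes you have already extracted two flat arrays of floats from the pipeline output; depending on the shape the pipeline returns, you may need to index into $embeddings first (var_dump it to check).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: cosine similarity between two embedding vectors.
 * For L2-normalized vectors (normalize: true) this equals the dot product;
 * dividing by the norms keeps it correct for unnormalized input too.
 *
 * @param float[] $a
 * @param float[] $b
 */
function cosineSimilarity(array $a, array $b): float
{
    $a = array_values($a);
    $b = array_values($b);
    $n = count($a);

    if ($n === 0 || $n !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $dot += $a[$i] * $b[$i];
        $normA += $a[$i] * $a[$i];
        $normB += $b[$i] * $b[$i];
    }

    // Avoid division by zero for all-zero vectors.
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0;
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

// Toy vectors for illustration; real embeddings have hundreds of dimensions.
printf("%.4f\n", cosineSimilarity([0.6, 0.8], [0.8, 0.6]));
```

Scores close to 1 indicate very similar texts; scores near 0 indicate unrelated ones.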
Hi,
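If you check the GroNLP/bert-base-dutch-cased repository on Hugging Face, you'll see that it has no onnx directory and no file named model.onnx or model_quantized.onnx. That is why the download fails: as the error message shows, the library is looking for onnx/model_quantized.onnx.
If a repository does contain an ONNX file under a different name, you can specify that name with the --model-filename option of the download command (see the TransformersPHP documentation). For example, Lajavaness/sentence-camembert-large ships its ONNX file as model_O2: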
vendor/bin/transformers download Lajavaness/sentence-camembert-large --model-filename=model_O2
Then pass the same name through the modelFilename parameter when you create the pipeline (also covered in the TransformersPHP documentation):
$extractor = pipeline('embeddings', 'Lajavaness/sentence-camembert-large', modelFilename: 'model_O2');
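In your case, the repository contains no ONNX file at all, so you'll need to convert the model to ONNX yourself using Hugging Face Optimum (see the Optimum documentation) and then follow the TransformersPHP instructions for using the converted model.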