Using GroNLP/bert-base-dutch-cased for embeddings

Open · florismeininger opened this issue · 1 comment

Your question

Good morning,

First of all, thanks for building this awesome repository!

I was wondering if you could help me out with a problem I'm facing. I'm quite new to this, so I might be doing it entirely wrong.

I want to create embeddings to calculate text similarity. The embedding model has to support Dutch, so I wanted to use this BERT model:

https://huggingface.co/GroNLP/bert-base-dutch-cased

This is my code:

use function Codewithkyrian\Transformers\Pipelines\pipeline;

$extractor = pipeline('embeddings', 'GroNLP/bert-base-dutch-cased');
$embeddings = $extractor($value, normalize: true, pooling: 'mean');

This is the error I'm getting:

"Error 0 occurred while trying to load file from https://huggingface.co/GroNLP/bert-base-dutch-cased/resolve/main/onnx/model_quantized.onnx"

Is there another way I can use this model for creating embeddings? Thanks in advance for your time!
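Because the goal is text similarity, it may help to see what the pooling: 'mean' and normalize: true options compute conceptually. The following is a plain-PHP sketch of the general technique, not TransformersPHP's internal implementation. Mean pooling averages the token vectors and uses an attention mask to skip padding. L2 normalization scales the result to unit length. Cosine similarity then compares two sentence vectors. The function names meanPool, l2Norm, l2Normalize and cosineSimilarity are hypothetical, and the sketch assumes the vectors are available as plain PHP float arrays.

```php
<?php
declare(strict_types=1);

/**
 * Mean pooling (hypothetical helper): average the token vectors whose
 * attention-mask entry is 1, so padding does not dilute the sentence embedding.
 *
 * @param list<list<float>> $tokens One vector per token.
 * @param list<int>         $mask   1 for real tokens, 0 for padding.
 */
function meanPool(array $tokens, array $mask): array
{
    if ($tokens === []) {
        return [];
    }
    $sum = array_fill(0, count($tokens[0]), 0.0);
    $count = 0;
    foreach ($tokens as $i => $vector) {
        if ((int) $mask[$i] === 0) {
            continue; // skip padding tokens
        }
        foreach ($vector as $d => $value) {
            $sum[$d] += $value;
        }
        $count++;
    }
    // max($count, 1) avoids division by zero for an all-padding input.
    return array_map(fn (float $s): float => $s / max($count, 1), $sum);
}

/** Euclidean (L2) length of a vector. */
function l2Norm(array $v): float
{
    return sqrt(array_sum(array_map(fn (float $x): float => $x * $x, $v)));
}

/** Scale a vector to unit length; a zero vector is returned unchanged. */
function l2Normalize(array $v): array
{
    $norm = l2Norm($v);
    return $norm > 0.0 ? array_map(fn (float $x): float => $x / $norm, $v) : $v;
}

/** Cosine similarity in [-1, 1]; returns 0.0 if either vector is all zeros. */
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    foreach ($a as $i => $x) {
        $dot += $x * $b[$i];
    }
    $denominator = l2Norm($a) * l2Norm($b);
    return $denominator > 0.0 ? $dot / $denominator : 0.0;
}
```

Because normalized vectors have unit length, the cosine similarity between them equals their dot product, so when embeddings are already produced with normalize: true, a plain dot product is enough to compare two texts.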


Dec 06 '24 08:12 florismeininger

Hi,

If you check the GroNLP/bert-base-dutch-cased repository on Hugging Face, you’ll see that it contains no file named model.onnx or model_quantized.onnx, and no onnx directory, so there is nothing for the pipeline to download.
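One way to run this check programmatically is sketched below. It relies on the Hugging Face Hub returning repository metadata as JSON from https://huggingface.co/api/models/{repo_id}, where the siblings array lists each file as an rfilename entry. The helper names onnxFiles and fetchModelInfo are hypothetical, and fetchModelInfo needs network access with allow_url_fopen enabled.

```php
<?php
declare(strict_types=1);

/**
 * Return the paths of all .onnx files listed in a Hub model-info response
 * (the decoded JSON from /api/models/{repo_id}). Hypothetical helper.
 */
function onnxFiles(array $modelInfo): array
{
    $files = [];
    foreach ($modelInfo['siblings'] ?? [] as $sibling) {
        $name = $sibling['rfilename'] ?? '';
        if (str_ends_with($name, '.onnx')) {
            $files[] = $name;
        }
    }
    return $files;
}

/** Fetch model metadata from the Hub (needs network access and allow_url_fopen). */
function fetchModelInfo(string $repoId): array
{
    $json = file_get_contents('https://huggingface.co/api/models/' . $repoId);
    if ($json === false) {
        throw new RuntimeException("Could not fetch metadata for $repoId");
    }
    return json_decode($json, true, 512, JSON_THROW_ON_ERROR);
}
```

For example, onnxFiles(fetchModelInfo('GroNLP/bert-base-dutch-cased')) returns the ONNX paths in that repository; an empty array means the model has to be converted before the pipeline can load it.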

If a repository contains an ONNX file under a different name, you can specify it with the --model-filename option (see the documentation). For example:

vendor/bin/transformers download Lajavaness/sentence-camembert-large --model-filename=model_O2

Then, when you load the model, pass the same name through the modelFilename parameter (see the documentation):

$extractor = pipeline('embeddings', 'Lajavaness/sentence-camembert-large', modelFilename: 'model_O2');

In your case, the repository has no ONNX file at all, so you’ll need to convert the model to ONNX yourself using Hugging Face Optimum (see the Optimum documentation) and then follow the instructions provided in the documentation.

Mar 12 '25 10:03 bastienrossi