ONNX export for CUDA does not work
System Info
$ lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal
$ python --version
Python 3.10.10
$ pip list
Package Version
------------------------- --------------
absl-py 2.1.0
aiohttp 3.9.5
aiosignal 1.3.1
annotated-types 0.6.0
anyio 4.3.0
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
arrow 1.3.0
asttokens 2.4.1
async-lru 2.0.4
async-timeout 4.0.3
attrs 23.2.0
awscrt 0.20.9
Babel 2.14.0
backoff 2.2.1
beautifulsoup4 4.12.3
bleach 6.1.0
boto3 1.34.96
botocore 1.34.96
cachetools 5.3.3
certifi 2024.2.2
cffi 1.16.0
charset-normalizer 3.3.2
click 8.1.7
coloredlogs 15.0.1
comm 0.2.2
contourpy 1.2.1
cycler 0.12.1
datasets 2.19.1
debugpy 1.8.1
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.8
evaluate 0.4.2
exceptiongroup 1.2.1
executing 2.0.1
fastapi 0.110.0
fastjsonschema 2.19.1
filelock 3.14.0
fire 0.6.0
flatbuffers 24.3.25
fonttools 4.51.0
fqdn 1.5.1
frozenlist 1.4.1
fsspec 2024.3.1
google-auth 2.29.0
google-auth-oauthlib 1.2.0
grpcio 1.63.0
h11 0.14.0
huggingface-hub 0.23.0
humanfriendly 10.0
idna 3.7
ipykernel 6.26.0
ipython 8.17.2
ipywidgets 8.1.1
isoduration 20.11.0
jedi 0.19.1
Jinja2 3.1.3
jmespath 1.0.1
joblib 1.4.0
json5 0.9.25
jsonpointer 2.4
jsonschema 4.22.0
jsonschema-specifications 2023.12.1
jupyter_client 8.6.1
jupyter_core 5.7.2
jupyter-events 0.10.0
jupyter-lsp 2.2.5
jupyter_server 2.14.0
jupyter_server_terminals 0.5.3
jupyterlab 4.0.6
jupyterlab_pygments 0.3.0
jupyterlab_server 2.27.1
jupyterlab_widgets 3.0.10
kiwisolver 1.4.5
lightning 2.2.4
lightning-cloud 0.5.64
lightning-remote-profiler 0.0.6
lightning_sdk 0.1.7
lightning-utilities 0.10.1
litdata 0.2.2
Markdown 3.6
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.8.2
matplotlib-inline 0.1.7
mdurl 0.1.2
mistune 3.0.2
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
nbclient 0.10.0
nbconvert 7.16.4
nbformat 5.10.4
nest-asyncio 1.6.0
networkx 3.3
ninja 1.11.1.1
notebook_shim 0.2.4
numpy 1.26.2
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.19.3
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.1.105
oauthlib 3.2.2
objprint 0.2.3
onnx 1.16.0
onnxconverter-common 1.14.0
onnxruntime-gpu 1.17.1
optimum 1.19.2
overrides 7.7.0
packaging 24.0
pandas 2.1.4
pandocfilters 1.5.1
parso 0.8.4
pexpect 4.9.0
pillow 10.3.0
pip 24.0
platformdirs 4.2.1
prometheus_client 0.20.0
prompt-toolkit 3.0.43
protobuf 3.20.2
psutil 5.9.8
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 16.0.0
pyarrow-hotfix 0.6
pyasn1 0.6.0
pyasn1_modules 0.4.0
pycparser 2.22
pydantic 2.7.1
pydantic_core 2.18.2
Pygments 2.17.2
PyJWT 2.8.0
pyparsing 3.1.2
python-dateutil 2.9.0.post0
python-json-logger 2.0.7
python-multipart 0.0.9
pytorch-lightning 2.2.4
pytz 2024.1
PyYAML 6.0.1
pyzmq 26.0.3
referencing 0.35.1
regex 2024.5.15
requests 2.31.0
requests-oauthlib 2.0.0
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.7.1
rpds-py 0.18.0
rsa 4.9
s3transfer 0.10.1
safetensors 0.4.3
scikit-learn 1.3.2
scipy 1.11.4
Send2Trash 1.8.3
sentence-transformers 2.7.0
sentencepiece 0.2.0
setfit 1.0.3
setuptools 68.2.2
simple-term-menu 1.6.4
six 1.16.0
skl2onnx 1.16.0
sniffio 1.3.1
soupsieve 2.5
stack-data 0.6.3
starlette 0.36.3
sympy 1.12
tensorboard 2.15.1
tensorboard-data-server 0.7.2
termcolor 2.4.0
terminado 0.18.1
threadpoolctl 3.5.0
tinycss2 1.3.0
tokenizers 0.19.1
tomli 2.0.1
torch 2.2.1+cu121
torchmetrics 1.3.1
torchvision 0.17.1+cu121
tornado 6.4
tqdm 4.66.2
traitlets 5.14.3
transformers 4.40.2
triton 2.2.0
types-python-dateutil 2.9.0.20240316
typing_extensions 4.11.0
tzdata 2024.1
uri-template 1.3.0
urllib3 2.2.1
uvicorn 0.29.0
viztracer 0.16.2
wcwidth 0.2.13
webcolors 1.13
webencodings 0.5.1
websocket-client 1.8.0
Werkzeug 3.0.2
wheel 0.41.2
widgetsnbextension 4.0.10
xxhash 3.4.1
yarl 1.9.4
$ nvidia-smi
Fri May 17 04:59:05 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08 Driver Version: 535.161.08 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 26C P8 8W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
Who can help?
@michaelbenayoun @JingyaHuang @echarlaix @simoninithomas @amyeroberts
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction (minimal, reproducible, runnable)
Hi,
I trained a SetFit model on a small dataset and tried to export it to ONNX for CUDA, but the conversion appears to fail. Can someone show me how to export it?
Which Hugging Face transformer did I use and train?
sentence-transformers/all-MiniLM-L6-v2
see: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
Here is all the info about the trained model:
from setfit import SetFitModel
model = SetFitModel.from_pretrained("setfit-test-model")
print("model.model_head:", model.model_head)
print("model.model_body:", model.model_body)
print("model.model_body[0].auto_model:", model.model_body[0].auto_model)
print("model.model_body[0].auto_model.config:", model.model_body[0].auto_model.config)
output:
model.model_head: LogisticRegression()
model.model_body: SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
model.model_body[0].auto_model: BertModel(
(embeddings): BertEmbeddings(
(word_embeddings): Embedding(30522, 384, padding_idx=0)
(position_embeddings): Embedding(512, 384)
(token_type_embeddings): Embedding(2, 384)
(LayerNorm): LayerNorm((384,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(encoder): BertEncoder(
(layer): ModuleList(
(0-5): 6 x BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=384, out_features=384, bias=True)
(key): Linear(in_features=384, out_features=384, bias=True)
(value): Linear(in_features=384, out_features=384, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=384, out_features=384, bias=True)
(LayerNorm): LayerNorm((384,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=384, out_features=1536, bias=True)
(intermediate_act_fn): GELUActivation()
)
(output): BertOutput(
(dense): Linear(in_features=1536, out_features=384, bias=True)
(LayerNorm): LayerNorm((384,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
(pooler): BertPooler(
(dense): Linear(in_features=384, out_features=384, bias=True)
(activation): Tanh()
)
)
model.model_body[0].auto_model.config: BertConfig {
"_name_or_path": "setfit-test-model",
"architectures": [
"BertModel"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 384,
"initializer_range": 0.02,
"intermediate_size": 1536,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 6,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"torch_dtype": "float32",
"transformers_version": "4.40.2",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}
I tried to export an ONNX model for CUDA, which does not seem to work:
optimum-cli export onnx --model setfit-test-model --task feature-extraction --optimize O4 --device cuda setfit_auto_opt_O4
Framework not specified. Using pt to export the model.
Using the export variant default. Available variants are:
- default: The default ONNX variant.
***** Exporting submodel 1/1: SentenceTransformer *****
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
- use_cache -> False
2024-05-17 04:46:37.141067101 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 4 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-05-17 04:46:37.145604395 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 04:46:37.145622795 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
Overridding for_gpu=False to for_gpu=True as half precision is available only on GPU.
/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/optimum/onnxruntime/configuration.py:770: FutureWarning: disable_embed_layer_norm will be deprecated soon, use disable_embed_layer_norm_fusion instead, disable_embed_layer_norm_fusion is set to True.
warnings.warn(
Optimizing model...
2024-05-17 04:46:38.933947965 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 04:46:38.933972326 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
symbolic shape inference disabled or failed.
symbolic shape inference disabled or failed.
Configuration saved in setfit_auto_opt_O4/ort_config.json
Optimized model saved at: setfit_auto_opt_O4 (external data format: False; saved all tensor to one file: True)
Post-processing the exported models...
Weight deduplication check in the ONNX export requires accelerate. Please install accelerate to run it.
Validating models in subprocesses...
Validating ONNX model setfit_auto_opt_O4/model.onnx...
2024-05-17 04:46:43.989705868 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 04:46:43.989729585 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
-[✓] ONNX model output names match reference model (sentence_embedding, token_embeddings)
- Validating ONNX Model output "token_embeddings":
-[✓] (2, 16, 384) matches (2, 16, 384)
-[x] values not close enough, max diff: 2.658904552459717 (atol: 1e-05)
- Validating ONNX Model output "sentence_embedding":
-[✓] (2, 384) matches (2, 384)
-[x] values not close enough, max diff: 0.00038395076990127563 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- token_embeddings: max diff = 2.658904552459717
- sentence_embedding: max diff = 0.00038395076990127563.
The exported model was saved at: setfit_auto_opt_O4
⚡ ~ ls
checkpoints      setfit-test-model   setfit_onnx       sklearn_model.onnx  training.py
export_onnx.py   setfit_auto_opt_O4  setfit_onnx.onnx  train_dataset.csv   validate_onnx_model.py
⚡ ~ rm -rf setfit_auto_opt_O4
⚡ ~ optimum-cli export onnx --model setfit-test-model --task feature-extraction --optimize O4 --device cuda setfit_auto_opt_O4
Framework not specified. Using pt to export the model.
Using the export variant default. Available variants are:
- default: The default ONNX variant.
***** Exporting submodel 1/1: SentenceTransformer *****
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
- use_cache -> False
2024-05-17 04:47:06.392363834 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 4 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-05-17 04:47:06.396077224 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 04:47:06.396096399 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
Overridding for_gpu=False to for_gpu=True as half precision is available only on GPU.
/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/optimum/onnxruntime/configuration.py:770: FutureWarning: disable_embed_layer_norm will be deprecated soon, use disable_embed_layer_norm_fusion instead, disable_embed_layer_norm_fusion is set to True.
warnings.warn(
Optimizing model...
2024-05-17 04:47:08.066191297 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 04:47:08.066216170 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
symbolic shape inference disabled or failed.
symbolic shape inference disabled or failed.
Configuration saved in setfit_auto_opt_O4/ort_config.json
Optimized model saved at: setfit_auto_opt_O4 (external data format: False; saved all tensor to one file: True)
Post-processing the exported models...
Weight deduplication check in the ONNX export requires accelerate. Please install accelerate to run it.
Validating models in subprocesses...
Validating ONNX model setfit_auto_opt_O4/model.onnx...
2024-05-17 04:47:13.244194104 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 04:47:13.244218108 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
-[✓] ONNX model output names match reference model (sentence_embedding, token_embeddings)
- Validating ONNX Model output "token_embeddings":
-[✓] (2, 16, 384) matches (2, 16, 384)
-[x] values not close enough, max diff: 2.7059507369995117 (atol: 1e-05)
- Validating ONNX Model output "sentence_embedding":
-[✓] (2, 384) matches (2, 384)
-[x] values not close enough, max diff: 0.0004076659679412842 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- token_embeddings: max diff = 2.7059507369995117
- sentence_embedding: max diff = 0.0004076659679412842.
The exported model was saved at: setfit_auto_opt_O4
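A note on the "values not close enough" warnings above (my reading, not an official statement from optimum): `--optimize O4` converts the graph to float16, as the `for_gpu=True ... half precision` log line indicates, so the exporter's default validation tolerance of 1e-05 cannot be met. A small numpy sketch, using a random stand-in array, shows the magnitude of error that an fp16 round-trip alone introduces:

```python
import numpy as np

# Stand-in for the float32 reference output of shape (2, 384)
rng = np.random.default_rng(0)
ref = rng.standard_normal((2, 384)).astype(np.float32)
# Stand-in for the O4 (float16) ONNX output: cast down and back up
fp16_roundtrip = ref.astype(np.float16).astype(np.float32)

max_diff = np.abs(ref - fp16_roundtrip).max()
# fp16 precision is ~1e-3 relative, far looser than the exporter's atol=1e-05,
# so the default check fails while an fp16-appropriate check passes.
print(max_diff > 1e-5)                                          # True
print(np.allclose(ref, fp16_roundtrip, atol=1e-3, rtol=1e-2))   # True
```

Under this reading, the sentence_embedding diff of ~4e-4 is consistent with fp16 precision rather than a broken export.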
Here is the code I used to validate the generated ONNX model:
import onnxruntime
# Load the ONNX model
onnx_model_path = 'setfit_auto_opt_O4/model.onnx'
session = onnxruntime.InferenceSession(onnx_model_path)
# Check if CUDA execution provider is available
providers = session.get_providers()
print(providers)
The output:
['CPUExecutionProvider']
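For what it's worth, `onnxruntime-gpu` only uses CUDA when it is requested explicitly via the `providers` argument of `InferenceSession` (and recent ONNX Runtime versions may refuse to guess when several providers are built in). A sketch of a small helper that prefers CUDA with a CPU fallback; `pick_providers` is a hypothetical name, not an onnxruntime API:

```python
def pick_providers(available):
    """Prefer CUDAExecutionProvider when present, always keep CPU as fallback."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]

print(pick_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
# The result would then be passed explicitly, e.g.:
#   session = onnxruntime.InferenceSession(
#       "setfit_auto_opt_O4/model.onnx",
#       providers=pick_providers(onnxruntime.get_available_providers()),
#   )
```

If `onnxruntime.get_available_providers()` itself lists only `CPUExecutionProvider`, the likely cause is a plain `onnxruntime` install shadowing `onnxruntime-gpu` in the same environment.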
Expected behavior
SetFit can be exported to ONNX for CUDA.
I tried installing accelerate; same issue:
$ pip list
(output identical to the pip list above, with one addition: accelerate 0.30.1)
$ optimum-cli export onnx --model setfit-test-model --task feature-extraction --optimize O4 --device cuda setfit_auto_opt_O4
Framework not specified. Using pt to export the model.
Using the export variant default. Available variants are:
- default: The default ONNX variant.
***** Exporting submodel 1/1: SentenceTransformer *****
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
- use_cache -> False
2024-05-17 16:12:43.669923443 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 4 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-05-17 16:12:43.674687159 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 16:12:43.674710116 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
Overridding for_gpu=False to for_gpu=True as half precision is available only on GPU.
/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/optimum/onnxruntime/configuration.py:770: FutureWarning: disable_embed_layer_norm will be deprecated soon, use disable_embed_layer_norm_fusion instead, disable_embed_layer_norm_fusion is set to True.
warnings.warn(
Optimizing model...
2024-05-17 16:12:45.256902100 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 16:12:45.256924632 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
symbolic shape inference disabled or failed.
symbolic shape inference disabled or failed.
Configuration saved in setfit_auto_opt_O4/ort_config.json
Optimized model saved at: setfit_auto_opt_O4 (external data format: False; saved all tensor to one file: True)
Post-processing the exported models...
Deduplicating shared (tied) weights...
Validating models in subprocesses...
Validating ONNX model setfit_auto_opt_O4/model.onnx...
2024-05-17 16:12:51.207877203 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 16:12:51.207902300 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
-[✓] ONNX model output names match reference model (token_embeddings, sentence_embedding)
- Validating ONNX Model output "token_embeddings":
-[✓] (2, 16, 384) matches (2, 16, 384)
-[x] values not close enough, max diff: 2.0768027305603027 (atol: 1e-05)
- Validating ONNX Model output "sentence_embedding":
-[✓] (2, 384) matches (2, 384)
-[x] values not close enough, max diff: 0.0004524439573287964 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- token_embeddings: max diff = 2.0768027305603027
- sentence_embedding: max diff = 0.0004524439573287964.
The exported model was saved at: setfit_auto_opt_O4
Hi @geraldstanje, thanks for opening an issue! Transferring to the optimum library as they handle ONNX exports.
@amyeroberts are you sure that's the right place?
Here is the training script for the SetFit model:
import pandas as pd
from datasets import Dataset, DatasetDict
from sklearn.model_selection import train_test_split
from setfit import SetFitModel, Trainer, TrainingArguments, sample_dataset
# Load your CSV files into Pandas DataFrames
df = pd.read_csv("train_dataset.csv")
train_df = df.iloc[:120,:]
test_df = df.iloc[120:,:]
# Perform train-test split
print(train_df.shape, test_df.shape)
train_df, valid_df = train_test_split(train_df, test_size=0.1)
# Rename columns to match expected names and drop unnecessary columns
train_df = train_df[['text', 'categories']].rename(columns={'categories': 'label'}).drop(columns=['__index_level_0__'], errors='ignore')
valid_df = valid_df[['text', 'categories']].rename(columns={'categories': 'label'}).drop(columns=['__index_level_0__'], errors='ignore')
test_df = test_df[['text', 'categories']].rename(columns={'categories': 'label'}).drop(columns=['__index_level_0__'], errors='ignore')
# Convert to Dataset objects
train_data = Dataset.from_pandas(train_df)
valid_data = Dataset.from_pandas(valid_df)
test_data = Dataset.from_pandas(test_df)
dataset = DatasetDict({
'train': train_data,
'validation': valid_data,
'test': test_data
})
print("len(train_df):", len(train_df))
print("len(valid_df):", len(valid_df))
print("len(test_df):", len(test_df))
# Sample the dataset for few-shot learning
train_dataset = sample_dataset(dataset["train"], label_column="label", num_samples=20)
eval_dataset = dataset["validation"]
# Define your categories and SetFit model
categories = ["aws_iam","access_management", "DOC", "NONE"]
#model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2", labels=categories)
model = SetFitModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2", labels=categories)
# Training arguments
args = TrainingArguments(
batch_size=16,
num_epochs=1,
evaluation_strategy="epoch",
save_strategy="epoch",
load_best_model_at_end=True,
report_to="none"
)
# No column_mapping is needed here: after the renames above, the dataset
# already uses the "text" and "label" column names that SetFit expects.
# Trainer configuration
trainer = Trainer(
model=model,
args=args,
train_dataset=train_dataset,
eval_dataset=eval_dataset,
metric="accuracy"
)
# Remaining code for training and evaluation...
# Train the model
trainer.train()
# Evaluate the model on the validation and test datasets
validation_metrics = trainer.evaluate(eval_dataset)
print("Validation Metrics:", validation_metrics)
test_metrics = trainer.evaluate(test_data)
print("Test Metrics:", test_metrics)
# Save the trained model locally
model.save_pretrained("setfit-test-model")
Hi @geraldstanje, I don't think this is the right place; would you mind opening this issue in the huggingface/optimum repository?
Closing this one here as it's not related to the scope of optimum-nvidia.
@mfuntowicz I opened the ticket under the huggingface/optimum repo, but amyeroberts moved it here! Please transfer it back and don't close the ticket!
@amyeroberts can you please move it back and reopen it?
@geraldstanje Apologies for moving it to the wrong place (this came up in my GitHub notifications and, as I was tagged, I thought it was under transformers; I have nothing to do with optimum). I don't have permissions to reopen or move this issue from here. Could you create a new issue, please?