ONNX export for CUDA does not work
System Info
$ lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal
$ python --version
Python 3.10.10
$ pip list
Package Version
------------------------- --------------
absl-py 2.1.0
aiohttp 3.9.5
aiosignal 1.3.1
annotated-types 0.6.0
anyio 4.3.0
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
arrow 1.3.0
asttokens 2.4.1
async-lru 2.0.4
async-timeout 4.0.3
attrs 23.2.0
awscrt 0.20.9
Babel 2.14.0
backoff 2.2.1
beautifulsoup4 4.12.3
bleach 6.1.0
boto3 1.34.96
botocore 1.34.96
cachetools 5.3.3
certifi 2024.2.2
cffi 1.16.0
charset-normalizer 3.3.2
click 8.1.7
coloredlogs 15.0.1
comm 0.2.2
contourpy 1.2.1
cycler 0.12.1
datasets 2.19.1
debugpy 1.8.1
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.8
evaluate 0.4.2
exceptiongroup 1.2.1
executing 2.0.1
fastapi 0.110.0
fastjsonschema 2.19.1
filelock 3.14.0
fire 0.6.0
flatbuffers 24.3.25
fonttools 4.51.0
fqdn 1.5.1
frozenlist 1.4.1
fsspec 2024.3.1
google-auth 2.29.0
google-auth-oauthlib 1.2.0
grpcio 1.63.0
h11 0.14.0
huggingface-hub 0.23.0
humanfriendly 10.0
idna 3.7
ipykernel 6.26.0
ipython 8.17.2
ipywidgets 8.1.1
isoduration 20.11.0
jedi 0.19.1
Jinja2 3.1.3
jmespath 1.0.1
joblib 1.4.0
json5 0.9.25
jsonpointer 2.4
jsonschema 4.22.0
jsonschema-specifications 2023.12.1
jupyter_client 8.6.1
jupyter_core 5.7.2
jupyter-events 0.10.0
jupyter-lsp 2.2.5
jupyter_server 2.14.0
jupyter_server_terminals 0.5.3
jupyterlab 4.0.6
jupyterlab_pygments 0.3.0
jupyterlab_server 2.27.1
jupyterlab_widgets 3.0.10
kiwisolver 1.4.5
lightning 2.2.4
lightning-cloud 0.5.64
lightning-remote-profiler 0.0.6
lightning_sdk 0.1.7
lightning-utilities 0.10.1
litdata 0.2.2
Markdown 3.6
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.8.2
matplotlib-inline 0.1.7
mdurl 0.1.2
mistune 3.0.2
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
nbclient 0.10.0
nbconvert 7.16.4
nbformat 5.10.4
nest-asyncio 1.6.0
networkx 3.3
ninja 1.11.1.1
notebook_shim 0.2.4
numpy 1.26.2
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.19.3
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.1.105
oauthlib 3.2.2
objprint 0.2.3
onnx 1.16.0
onnxconverter-common 1.14.0
onnxruntime-gpu 1.17.1
optimum 1.19.2
overrides 7.7.0
packaging 24.0
pandas 2.1.4
pandocfilters 1.5.1
parso 0.8.4
pexpect 4.9.0
pillow 10.3.0
pip 24.0
platformdirs 4.2.1
prometheus_client 0.20.0
prompt-toolkit 3.0.43
protobuf 3.20.2
psutil 5.9.8
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 16.0.0
pyarrow-hotfix 0.6
pyasn1 0.6.0
pyasn1_modules 0.4.0
pycparser 2.22
pydantic 2.7.1
pydantic_core 2.18.2
Pygments 2.17.2
PyJWT 2.8.0
pyparsing 3.1.2
python-dateutil 2.9.0.post0
python-json-logger 2.0.7
python-multipart 0.0.9
pytorch-lightning 2.2.4
pytz 2024.1
PyYAML 6.0.1
pyzmq 26.0.3
referencing 0.35.1
regex 2024.5.15
requests 2.31.0
requests-oauthlib 2.0.0
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.7.1
rpds-py 0.18.0
rsa 4.9
s3transfer 0.10.1
safetensors 0.4.3
scikit-learn 1.3.2
scipy 1.11.4
Send2Trash 1.8.3
sentence-transformers 2.7.0
sentencepiece 0.2.0
setfit 1.0.3
setuptools 68.2.2
simple-term-menu 1.6.4
six 1.16.0
skl2onnx 1.16.0
sniffio 1.3.1
soupsieve 2.5
stack-data 0.6.3
starlette 0.36.3
sympy 1.12
tensorboard 2.15.1
tensorboard-data-server 0.7.2
termcolor 2.4.0
terminado 0.18.1
threadpoolctl 3.5.0
tinycss2 1.3.0
tokenizers 0.19.1
tomli 2.0.1
torch 2.2.1+cu121
torchmetrics 1.3.1
torchvision 0.17.1+cu121
tornado 6.4
tqdm 4.66.2
traitlets 5.14.3
transformers 4.40.2
triton 2.2.0
types-python-dateutil 2.9.0.20240316
typing_extensions 4.11.0
tzdata 2024.1
uri-template 1.3.0
urllib3 2.2.1
uvicorn 0.29.0
viztracer 0.16.2
wcwidth 0.2.13
webcolors 1.13
webencodings 0.5.1
websocket-client 1.8.0
Werkzeug 3.0.2
wheel 0.41.2
widgetsnbextension 4.0.10
xxhash 3.4.1
yarl 1.9.4
$ nvidia-smi
Fri May 17 04:59:05 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08 Driver Version: 535.161.08 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 26C P8 8W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
Who can help?
@michaelbenayoun @JingyaHuang @echarlaix @simoninithomas @amyeroberts
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction (minimal, reproducible, runnable)
Hi,
I trained a SetFit model on a small dataset and tried to export it to ONNX for CUDA, but the conversion appears to fail. Can someone show me how to export it?
Which Hugging Face transformer did I use and train?
sentence-transformers/all-MiniLM-L6-v2
see: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
Here is all the info about the trained model:
from setfit import SetFitModel
model = SetFitModel.from_pretrained("setfit-test-model")
print("model.model_head:", model.model_head)
print("model.model_body:", model.model_body)
print("model.model_body[0].auto_model:", model.model_body[0].auto_model)
print("model.model_body[0].auto_model.config:", model.model_body[0].auto_model.config)
output:
model.model_head: LogisticRegression()
model.model_body: SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
model.model_body[0].auto_model: BertModel(
(embeddings): BertEmbeddings(
(word_embeddings): Embedding(30522, 384, padding_idx=0)
(position_embeddings): Embedding(512, 384)
(token_type_embeddings): Embedding(2, 384)
(LayerNorm): LayerNorm((384,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(encoder): BertEncoder(
(layer): ModuleList(
(0-5): 6 x BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=384, out_features=384, bias=True)
(key): Linear(in_features=384, out_features=384, bias=True)
(value): Linear(in_features=384, out_features=384, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=384, out_features=384, bias=True)
(LayerNorm): LayerNorm((384,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=384, out_features=1536, bias=True)
(intermediate_act_fn): GELUActivation()
)
(output): BertOutput(
(dense): Linear(in_features=1536, out_features=384, bias=True)
(LayerNorm): LayerNorm((384,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
(pooler): BertPooler(
(dense): Linear(in_features=384, out_features=384, bias=True)
(activation): Tanh()
)
)
model.model_body[0].auto_model.config: BertConfig {
"_name_or_path": "setfit-test-model",
"architectures": [
"BertModel"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 384,
"initializer_range": 0.02,
"intermediate_size": 1536,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 6,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"torch_dtype": "float32",
"transformers_version": "4.40.2",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}
I tried to export an ONNX model for CUDA, which does not seem to work:
optimum-cli export onnx --model setfit-test-model --task feature-extraction --optimize O4 --device cuda setfit_auto_opt_O4
Framework not specified. Using pt to export the model.
Using the export variant default. Available variants are:
- default: The default ONNX variant.
***** Exporting submodel 1/1: SentenceTransformer *****
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
- use_cache -> False
2024-05-17 04:46:37.141067101 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 4 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-05-17 04:46:37.145604395 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 04:46:37.145622795 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
Overridding for_gpu=False to for_gpu=True as half precision is available only on GPU.
/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/optimum/onnxruntime/configuration.py:770: FutureWarning: disable_embed_layer_norm will be deprecated soon, use disable_embed_layer_norm_fusion instead, disable_embed_layer_norm_fusion is set to True.
warnings.warn(
Optimizing model...
2024-05-17 04:46:38.933947965 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 04:46:38.933972326 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
symbolic shape inference disabled or failed.
symbolic shape inference disabled or failed.
Configuration saved in setfit_auto_opt_O4/ort_config.json
Optimized model saved at: setfit_auto_opt_O4 (external data format: False; saved all tensor to one file: True)
Post-processing the exported models...
Weight deduplication check in the ONNX export requires accelerate. Please install accelerate to run it.
Validating models in subprocesses...
Validating ONNX model setfit_auto_opt_O4/model.onnx...
2024-05-17 04:46:43.989705868 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 04:46:43.989729585 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
-[✓] ONNX model output names match reference model (sentence_embedding, token_embeddings)
- Validating ONNX Model output "token_embeddings":
-[✓] (2, 16, 384) matches (2, 16, 384)
-[x] values not close enough, max diff: 2.658904552459717 (atol: 1e-05)
- Validating ONNX Model output "sentence_embedding":
-[✓] (2, 384) matches (2, 384)
-[x] values not close enough, max diff: 0.00038395076990127563 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- token_embeddings: max diff = 2.658904552459717
- sentence_embedding: max diff = 0.00038395076990127563.
The exported model was saved at: setfit_auto_opt_O4
⚡ ~ ls
checkpoints      setfit-test-model   setfit_onnx       sklearn_model.onnx  training.py
export_onnx.py   setfit_auto_opt_O4  setfit_onnx.onnx  train_dataset.csv   validate_onnx_model.py
⚡ ~ rm -rf setfit_auto_opt_O4
⚡ ~ optimum-cli export onnx --model setfit-test-model --task feature-extraction --optimize O4 --device cuda setfit_auto_opt_O4
Framework not specified. Using pt to export the model.
Using the export variant default. Available variants are:
- default: The default ONNX variant.
***** Exporting submodel 1/1: SentenceTransformer *****
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
- use_cache -> False
2024-05-17 04:47:06.392363834 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 4 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-05-17 04:47:06.396077224 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 04:47:06.396096399 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
Overridding for_gpu=False to for_gpu=True as half precision is available only on GPU.
/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/optimum/onnxruntime/configuration.py:770: FutureWarning: disable_embed_layer_norm will be deprecated soon, use disable_embed_layer_norm_fusion instead, disable_embed_layer_norm_fusion is set to True.
warnings.warn(
Optimizing model...
2024-05-17 04:47:08.066191297 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 04:47:08.066216170 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
symbolic shape inference disabled or failed.
symbolic shape inference disabled or failed.
Configuration saved in setfit_auto_opt_O4/ort_config.json
Optimized model saved at: setfit_auto_opt_O4 (external data format: False; saved all tensor to one file: True)
Post-processing the exported models...
Weight deduplication check in the ONNX export requires accelerate. Please install accelerate to run it.
Validating models in subprocesses...
Validating ONNX model setfit_auto_opt_O4/model.onnx...
2024-05-17 04:47:13.244194104 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 04:47:13.244218108 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
-[✓] ONNX model output names match reference model (sentence_embedding, token_embeddings)
- Validating ONNX Model output "token_embeddings":
-[✓] (2, 16, 384) matches (2, 16, 384)
-[x] values not close enough, max diff: 2.7059507369995117 (atol: 1e-05)
- Validating ONNX Model output "sentence_embedding":
-[✓] (2, 384) matches (2, 384)
-[x] values not close enough, max diff: 0.0004076659679412842 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- token_embeddings: max diff = 2.7059507369995117
- sentence_embedding: max diff = 0.0004076659679412842.
The exported model was saved at: setfit_auto_opt_O4
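A note on the "values not close enough" warnings above (my reading, not an official statement from optimum): `--optimize O4` converts the graph to float16, as the `for_gpu=True ... half precision` log line indicates, so the exporter's default validation tolerance of 1e-05 cannot be met. A small numpy sketch, using a random stand-in array, shows the magnitude of error that an fp16 round-trip alone introduces:

```python
import numpy as np

# Stand-in for the float32 reference output of shape (2, 384)
rng = np.random.default_rng(0)
ref = rng.standard_normal((2, 384)).astype(np.float32)
# Stand-in for the O4 (float16) ONNX output: cast down and back up
fp16_roundtrip = ref.astype(np.float16).astype(np.float32)

max_diff = np.abs(ref - fp16_roundtrip).max()
# fp16 precision is ~1e-3 relative, far looser than the exporter's atol=1e-05,
# so the default check fails while an fp16-appropriate check passes.
print(max_diff > 1e-5)                                          # True
print(np.allclose(ref, fp16_roundtrip, atol=1e-3, rtol=1e-2))   # True
```

Under this reading, the sentence_embedding diff of ~4e-4 is consistent with fp16 precision rather than a broken export.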
Here is the code I used to validate the generated ONNX model:
import onnxruntime
# Load the ONNX model
onnx_model_path = 'setfit_auto_opt_O4/model.onnx'
session = onnxruntime.InferenceSession(onnx_model_path)
# Check if CUDA execution provider is available
providers = session.get_providers()
print(providers)
The output:
['CPUExecutionProvider']
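For what it's worth, `onnxruntime-gpu` only uses CUDA when it is requested explicitly via the `providers` argument of `InferenceSession` (and recent ONNX Runtime versions may refuse to guess when several providers are built in). A sketch of a small helper that prefers CUDA with a CPU fallback; `pick_providers` is a hypothetical name, not an onnxruntime API:

```python
def pick_providers(available):
    """Prefer CUDAExecutionProvider when present, always keep CPU as fallback."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]

print(pick_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
# The result would then be passed explicitly, e.g.:
#   session = onnxruntime.InferenceSession(
#       "setfit_auto_opt_O4/model.onnx",
#       providers=pick_providers(onnxruntime.get_available_providers()),
#   )
```

If `onnxruntime.get_available_providers()` itself lists only `CPUExecutionProvider`, the likely cause is a plain `onnxruntime` install shadowing `onnxruntime-gpu` in the same environment.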
Expected behavior
SetFit can be exported to ONNX for CUDA.
I tried installing accelerate; same issue:
$ pip list
(output identical to the pip list above, with one addition: accelerate 0.30.1)
$ optimum-cli export onnx --model setfit-test-model --task feature-extraction --optimize O4 --device cuda setfit_auto_opt_O4
Framework not specified. Using pt to export the model.
Using the export variant default. Available variants are:
- default: The default ONNX variant.
***** Exporting submodel 1/1: SentenceTransformer *****
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
- use_cache -> False
2024-05-17 16:12:43.669923443 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 4 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-05-17 16:12:43.674687159 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 16:12:43.674710116 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
Overridding for_gpu=False to for_gpu=True as half precision is available only on GPU.
/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/optimum/onnxruntime/configuration.py:770: FutureWarning: disable_embed_layer_norm will be deprecated soon, use disable_embed_layer_norm_fusion instead, disable_embed_layer_norm_fusion is set to True.
warnings.warn(
Optimizing model...
2024-05-17 16:12:45.256902100 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 16:12:45.256924632 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
symbolic shape inference disabled or failed.
symbolic shape inference disabled or failed.
Configuration saved in setfit_auto_opt_O4/ort_config.json
Optimized model saved at: setfit_auto_opt_O4 (external data format: False; saved all tensor to one file: True)
Post-processing the exported models...
Deduplicating shared (tied) weights...
Validating models in subprocesses...
Validating ONNX model setfit_auto_opt_O4/model.onnx...
2024-05-17 16:12:51.207877203 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 16:12:51.207902300 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
-[✓] ONNX model output names match reference model (token_embeddings, sentence_embedding)
- Validating ONNX Model output "token_embeddings":
-[✓] (2, 16, 384) matches (2, 16, 384)
-[x] values not close enough, max diff: 2.0768027305603027 (atol: 1e-05)
- Validating ONNX Model output "sentence_embedding":
-[✓] (2, 384) matches (2, 384)
-[x] values not close enough, max diff: 0.0004524439573287964 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- token_embeddings: max diff = 2.0768027305603027
- sentence_embedding: max diff = 0.0004524439573287964.
The exported model was saved at: setfit_auto_opt_O4
Hi @geraldstanje, thanks for opening an issue! Transferring to the optimum library as they handle ONNX exports.
@amyeroberts are you sure that's the right place?
Here is the training script for the SetFit model:
import pandas as pd
from datasets import Dataset, DatasetDict
from sklearn.model_selection import train_test_split
from setfit import SetFitModel, Trainer, TrainingArguments, sample_dataset
# Load your CSV files into Pandas DataFrames
df = pd.read_csv("train_dataset.csv")
train_df = df.iloc[:120,:]
test_df = df.iloc[120:,:]
# Perform train-test split
print(train_df.shape, test_df.shape)
train_df, valid_df = train_test_split(train_df, test_size=0.1)
# Rename columns to match expected names and drop unnecessary columns
train_df = train_df[['text', 'categories']].rename(columns={'categories': 'label'}).drop(columns=['__index_level_0__'], errors='ignore')
valid_df = valid_df[['text', 'categories']].rename(columns={'categories': 'label'}).drop(columns=['__index_level_0__'], errors='ignore')
test_df = test_df[['text', 'categories']].rename(columns={'categories': 'label'}).drop(columns=['__index_level_0__'], errors='ignore')
# Convert to Dataset objects
train_data = Dataset.from_pandas(train_df)
valid_data = Dataset.from_pandas(valid_df)
test_data = Dataset.from_pandas(test_df)
dataset = DatasetDict({
'train': train_data,
'validation': valid_data,
'test': test_data
})
print("len(train_df):", len(train_df))
print("len(valid_df):", len(valid_df))
print("len(test_df):", len(test_df))
# Sample the dataset for few-shot learning
train_dataset = sample_dataset(dataset["train"], label_column="label", num_samples=20)
eval_dataset = dataset["validation"]
# Define your categories and SetFit model
categories = ["aws_iam","access_management", "DOC", "NONE"]
#model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2", labels=categories)
model = SetFitModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2", labels=categories)
# Training arguments
args = TrainingArguments(
batch_size=16,
num_epochs=1,
evaluation_strategy="epoch",
save_strategy="epoch",
load_best_model_at_end=True,
report_to="none"
)
# No column_mapping is needed here: after the renames above, the dataset
# already uses the "text" and "label" column names that SetFit expects.
# Trainer configuration
trainer = Trainer(
model=model,
args=args,
train_dataset=train_dataset,
eval_dataset=eval_dataset,
metric="accuracy"
)
# Remaining code for training and evaluation...
# Train the model
trainer.train()
# Evaluate the model on the validation and test datasets
validation_metrics = trainer.evaluate(eval_dataset)
print("Validation Metrics:", validation_metrics)
test_metrics = trainer.evaluate(test_data)
print("Test Metrics:", test_metrics)
# Save the trained model locally
model.save_pretrained("setfit-test-model")
Hi @geraldstanje, I don't think this is the right place; would you mind opening this issue in the huggingface/optimum repository?
Closing this one here as it's not related to the scope of optimum-nvidia.
@mfuntowicz I opened the ticket under the huggingface/optimum repo, but amyeroberts moved it here! Please transfer it back and don't close the ticket!
@amyeroberts can you please move it back and reopen it?
@geraldstanje Apologies for moving it to the wrong place (this came up in my GitHub notifications and, as I was tagged, I thought it was under transformers; I have nothing to do with optimum). I don't have permissions to reopen or move this issue from here. Could you create a new issue, please?