Docs: HuggingFace (NLP) Migration Guide
Add guidance on how to use (NLP) models from HuggingFace
- Tokenizers
- TorchSharp / ONNX
- Tensors
Install dependencies
pip install transformers torch torchvision torchaudio onnxruntime
from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn as nn
import onnxruntime as ort
# TorchSharp is a .NET (NuGet) library, so it is not installed or imported from Python.
1. Tokenization
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text = "Hugging Face is great!"
tokens = tokenizer(text, padding=True, truncation=True, return_tensors="pt")
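To make clearer what `padding=True` and `truncation=True` produce, here is a minimal plain-Python sketch of the idea. The helper name `pad_and_truncate` and the token ids are illustrative assumptions, not part of any library, and the real tokenizer truncates more carefully (for example, it keeps the trailing special token).

```python
from typing import Dict, List


def pad_and_truncate(
    batch: List[List[int]], max_length: int, pad_id: int = 0
) -> Dict[str, List[List[int]]]:
    """Truncate each sequence to max_length, then right-pad to the longest one."""
    truncated = [ids[:max_length] for ids in batch]
    width = max((len(ids) for ids in truncated), default=0)
    input_ids = [ids + [pad_id] * (width - len(ids)) for ids in truncated]
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [[1] * len(ids) + [0] * (width - len(ids)) for ids in truncated]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Illustrative ids only; a real tokenizer would produce them from text.
encoded = pad_and_truncate([[101, 7592, 102], [101, 102]], max_length=8)
print(encoded["input_ids"])       # [[101, 7592, 102], [101, 102, 0]]
print(encoded["attention_mask"])  # [[1, 1, 1], [1, 1, 0]]
```

Whatever tokenizer is used on the C# side would need to produce the same two arrays for the model to see equivalent input.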
2. Load Model (Torch)
model = AutoModel.from_pretrained("bert-base-uncased")
with torch.no_grad():
    output = model(**tokens)
3. Convert PyTorch model to ONNX
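torch.onnx.export(
    model,  # Model
    (tokens["input_ids"], tokens["attention_mask"]),  # Inputs
    "bert_model.onnx",  # Output file
    input_names=["input_ids", "attention_mask"],
    output_names=["output"],
    # Mark the sequence axis as dynamic too; otherwise the exported model
    # only accepts inputs with the same token count as the example text.
    dynamic_axes={
        "input_ids": {0: "batch_size", 1: "sequence_length"},
        "attention_mask": {0: "batch_size", 1: "sequence_length"},
    },
    opset_version=11,
)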
4. Run ONNX Model
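ort_session = ort.InferenceSession("bert_model.onnx")
# Feed only the inputs the exported graph declares (the tokenizer also returns token_type_ids).
input_names = [i.name for i in ort_session.get_inputs()]
onnx_inputs = {name: tokens[name].cpu().numpy() for name in input_names}
onnx_output = ort_session.run(None, onnx_inputs)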
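One detail worth spelling out: a BERT-style tokenizer typically also returns `token_type_ids`, which the export in step 3 does not declare as an input, and ONNX Runtime rejects feed names the graph does not know. Below is a minimal sketch of the filtering idea in plain Python. `select_feeds` is a hypothetical helper; in real code the names could come from `ort_session.get_inputs()` and the values would be NumPy arrays.

```python
from typing import Dict, Iterable, TypeVar

T = TypeVar("T")


def select_feeds(encoded: Dict[str, T], input_names: Iterable[str]) -> Dict[str, T]:
    """Keep only the entries the exported graph declares as inputs."""
    names = list(input_names)
    missing = [name for name in names if name not in encoded]
    if missing:
        raise KeyError(f"tokenizer output lacks required inputs: {missing}")
    return {name: encoded[name] for name in names}


# Stand-in for the tokenizer output; real values would be NumPy arrays.
encoded = {
    "input_ids": [[101, 7592, 102]],
    "token_type_ids": [[0, 0, 0]],
    "attention_mask": [[1, 1, 1]],
}
feeds = select_feeds(encoded, ["input_ids", "attention_mask"])
print(sorted(feeds))  # ['attention_mask', 'input_ids']
```

The same check applies on the .NET side: the named inputs passed to the session have to match what the graph declares.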
5. Convert Output to Tensor (PyTorch here; TorchSharp is the .NET counterpart)
output_tensor = torch.tensor(onnx_output[0])
print(output_tensor.shape)
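The first output has shape (batch, sequence length, hidden size). A common next step, which is an assumption here and not something the steps above do, is to mean-pool the token vectors into one vector per sentence while skipping padded positions. A minimal plain-Python sketch with a hypothetical `mean_pool` helper:

```python
from typing import List


def mean_pool(hidden: List[List[List[float]]], mask: List[List[int]]) -> List[List[float]]:
    """Average token vectors per sequence, skipping padded positions."""
    pooled = []
    for token_vectors, token_mask in zip(hidden, mask):
        kept = [vec for vec, keep in zip(token_vectors, token_mask) if keep]
        count = max(len(kept), 1)  # avoid dividing by zero for an all-padding row
        dim = len(token_vectors[0])
        pooled.append([sum(vec[d] for vec in kept) / count for d in range(dim)])
    return pooled


hidden = [[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]]  # (batch=1, seq_len=3, hidden=2)
mask = [[1, 1, 0]]                                # last position is padding
print(mean_pool(hidden, mask))  # [[2.0, 3.0]]
```

The loop is written without tensor operations on purpose, so it maps directly onto whichever tensor type is used after migration.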
Use this for better results.
@tarun111111 This ticket is for documenting migration from the Hugging Face Python world to the C# world of:
- System.Numerics.Tensors
- Microsoft.ML.Tokenizers
- ONNX / TorchSharp / ML.NET
The ONNX conversion is great documentation, but more is needed for this story to be complete:
- Tokenizers: how to migrate from Hugging Face tokenizers to the new tokenizers from .NET 9
- Pipelines: how to migrate the pipeline from Hugging Face to C#
And much more.
I'd be content if there was a Tokenizer.FromPretrained("tokenizer.json") factory method for my particular scenario :)
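For a BERT-style `tokenizer.json`, such a factory would roughly need to read the vocabulary from the file's `model` section and apply WordPiece's greedy longest-match split. The sketch below is an illustration in Python under those assumptions, not the API of any library. The helper names and the tiny vocabulary are made up, and a real tokenizer also applies the normalizer, pre-tokenizer, and post-processor described in the same file.

```python
import json
from typing import Dict, List


def load_wordpiece_vocab(tokenizer_json: str) -> Dict[str, int]:
    """Read the token-to-id table from the text of a tokenizer.json file."""
    model = json.loads(tokenizer_json)["model"]
    if model.get("type") != "WordPiece":
        raise ValueError(f"unsupported model type: {model.get('type')}")
    return model["vocab"]


def wordpiece(word: str, vocab: Dict[str, int], unk: str = "[UNK]", prefix: str = "##") -> List[int]:
    """Greedy longest-match-first split of one lowercased word into vocab ids."""
    ids, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end] if start == 0 else prefix + word[start:end]
            if piece in vocab:
                ids.append(vocab[piece])
                break
            end -= 1
        else:
            return [vocab[unk]]  # no piece matched: the whole word is unknown
        start = end
    return ids


# Made-up miniature file contents, for illustration only.
sample = json.dumps(
    {"model": {"type": "WordPiece", "vocab": {"[UNK]": 0, "hug": 1, "##ging": 2, "face": 3}}}
)
vocab = load_wordpiece_vocab(sample)
print([wordpiece(w, vocab) for w in "hugging face zebra".split()])  # [[1, 2], [3], [0]]
```

Comparing ids like these against the Python tokenizer's output for the same text is a cheap way to check that a migrated tokenizer behaves identically.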