
Docs: HuggingFace (NLP) Migration Guide

Open luisquintanilla opened this issue 11 months ago • 3 comments

Add guidance on how to use (NLP) models from HuggingFace

  • Tokenizers
  • TorchSharp / ONNX
  • Tensors

luisquintanilla · Feb 11 '25

Install dependencies (TorchSharp is a .NET library distributed on NuGet, so it is not installed with pip):

```shell
pip install transformers torch onnxruntime
```

```python
from transformers import AutoTokenizer, AutoModel
import torch
import onnxruntime as ort
```

1. Tokenization

```python
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text = "Hugging Face is great!"
tokens = tokenizer(text, padding=True, truncation=True, return_tensors="pt")
```
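To check that a C# tokenizer reproduces these ids, it helps to know what `bert-base-uncased` does to each word: WordPiece, a greedy longest-match-first split against a fixed vocabulary. The sketch below is a simplified illustration, not the transformers implementation. The vocabulary `TOY_VOCAB` and the helpers `wordpiece` and `encode` are hypothetical, pre-tokenization is reduced to lowercasing plus whitespace splitting, and the `[CLS]`/`[SEP]` special tokens the real tokenizer adds are left out.

```python
# Hypothetical toy vocabulary; the real bert-base-uncased vocab has ~30k entries.
TOY_VOCAB = {"[UNK]": 0, "hug": 1, "##ging": 2, "face": 3, "is": 4, "great": 5, "!": 6}

def wordpiece(word, vocab, unk="[UNK]", prefix="##", max_chars=100):
    """Greedy longest-match-first split of one already pre-tokenized word."""
    if len(word) > max_chars:
        return [unk]
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = prefix + piece  # continuation pieces carry the "##" prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no known piece: the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces

def encode(text, vocab):
    """Lowercase + whitespace split only (BERT also splits punctuation, strips accents)."""
    tokens = [p for w in text.lower().split() for p in wordpiece(w, vocab)]
    return tokens, [vocab[t] for t in tokens]

print(encode("Hugging Face is great !", TOY_VOCAB))
# (['hug', '##ging', 'face', 'is', 'great', '!'], [1, 2, 3, 4, 5, 6])
```

Because punctuation splitting is omitted, `great!` written without a space would come out as `[UNK]` here; the real BERT pre-tokenizer splits it into `great` and `!` before WordPiece runs.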

2. Load Model (Torch)

```python
model = AutoModel.from_pretrained("bert-base-uncased")
with torch.no_grad():
    output = model(**tokens)
```

3. Convert PyTorch model to ONNX

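```python
torch.onnx.export(
    model,                                            # model to export
    (tokens["input_ids"], tokens["attention_mask"]),  # example inputs
    "bert_model.onnx",                                # output file
    input_names=["input_ids", "attention_mask"],
    output_names=["output"],                          # last_hidden_state
    # Make both batch and sequence dimensions dynamic so inputs of other
    # lengths are accepted, not just the length of the example sentence.
    dynamic_axes={
        "input_ids": {0: "batch_size", 1: "sequence_length"},
        "attention_mask": {0: "batch_size", 1: "sequence_length"},
        "output": {0: "batch_size", 1: "sequence_length"},
    },
    opset_version=11,
)
```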

4. Run ONNX Model

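```python
ort_session = ort.InferenceSession("bert_model.onnx", providers=["CPUExecutionProvider"])
# Feed only the inputs the exported graph declares: the BERT tokenizer also
# returns token_type_ids, which this export does not accept.
onnx_inputs = {i.name: tokens[i.name].cpu().numpy() for i in ort_session.get_inputs()}
onnx_output = ort_session.run(None, onnx_inputs)
```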

5. Convert Output to Tensor (PyTorch)

```python
# onnx_output[0] is last_hidden_state: (batch_size, sequence_length, 768)
output_tensor = torch.tensor(onnx_output[0])
print(output_tensor.shape)
```
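For `bert-base-uncased`, `onnx_output[0]` holds one 768-dimensional vector per token. If the C# side needs a single vector per input (for example, for similarity search), one common reduction is masked mean pooling; that choice is an assumption here, not something this thread prescribes. The sketch below uses a hypothetical helper `mean_pool` written with plain NumPy array operations, so the same arithmetic can be ported to C# tensors.

```python
import numpy as np

def mean_pool(last_hidden_state, attention_mask):
    """Average the token vectors of each sequence, skipping padding.

    last_hidden_state: float array, shape (batch, sequence_length, hidden)
    attention_mask:    int array, shape (batch, sequence_length); 1 = real token
    returns:           float array, shape (batch, hidden)
    """
    mask = attention_mask[..., np.newaxis].astype(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(axis=1)  # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # (batch, 1); avoids division by zero
    return summed / counts

# Toy check: the padded third position (mask 0) must not affect the result.
states = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]], dtype=np.float32)
mask = np.array([[1, 1, 0]])
print(mean_pool(states, mask))  # [[2. 3.]]
```

With the variables from the steps above, the call would be `mean_pool(onnx_output[0], tokens["attention_mask"].numpy())`.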

Use this for better results.

tarun111111 · Feb 16 '25

@tarun111111 This ticket is for documenting the migration from the Hugging Face Python world to the C# world of:

  • System.Numerics.Tensors
  • Microsoft.ML.Tokenizers
  • ONNX / TorchSharp / ML.NET
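On the C# side, model outputs typically arrive as a contiguous buffer plus a list of dimensions. ONNX tensors are row-major, the same layout as NumPy's default C order, so a port has to reproduce `array[b, s, h]` indexing with offset arithmetic. A minimal sketch, where `flat_index` is a hypothetical helper and the `(2, 3, 4)` array stands in for a `(batch, sequence_length, hidden)` output:

```python
import numpy as np

def flat_index(shape, index):
    """Row-major (C-order) offset of `index` in a contiguous buffer of `shape`."""
    if len(shape) != len(index):
        raise ValueError("index rank must match shape rank")
    offset = 0
    for dim, i in zip(shape, index):
        if not 0 <= i < dim:
            raise IndexError(f"index {i} out of range for dimension of size {dim}")
        offset = offset * dim + i  # Horner-style accumulation over dimensions
    return offset

# Stand-in for a (batch=2, sequence_length=3, hidden=4) model output.
hidden = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)
flat = hidden.ravel(order="C")  # the raw buffer a C# consumer would see

b, s, h = 1, 2, 3
print(flat[flat_index(hidden.shape, (b, s, h))], hidden[b, s, h])  # 23.0 23.0
```

The same loop translates directly to C#, which makes it a convenient cross-check when comparing values read from a flat output buffer against the Python results.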

The ONNX conversion is great documentation, but more is needed for this story to be complete:

  • Tokenizers: how to migrate from Hugging Face tokenizers to the new tokenizers in .NET 9
  • Pipelines: how to migrate a Hugging Face pipeline to C#

And much more.
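For BERT-style models, one way to bridge a Hugging Face `tokenizer.json` to a .NET tokenizer that expects a plain `vocab.txt` (one token per line, line number = token id) is to extract the WordPiece vocabulary. A minimal sketch, assuming the file's `model` section has `"type": "WordPiece"` and a `vocab` object mapping tokens to ids, which is the layout the Hugging Face `tokenizers` library writes for BERT. The helper names are hypothetical. Normalization settings such as lowercasing, and any `added_tokens`, are not carried over and would need to be configured separately on the .NET side.

```python
import json

def vocab_from_tokenizer_json(path):
    """Return the WordPiece tokens of a Hugging Face tokenizer.json, ordered by id."""
    with open(path, encoding="utf-8") as f:
        model = json.load(f)["model"]
    if model.get("type") != "WordPiece":
        raise ValueError(f"expected a WordPiece model, got {model.get('type')!r}")
    ordered = sorted(model["vocab"].items(), key=lambda item: item[1])  # (token, id)
    # vocab.txt encodes ids as line numbers, so ids must be exactly 0..N-1.
    if [i for _, i in ordered] != list(range(len(ordered))):
        raise ValueError("vocabulary ids are not contiguous from 0")
    return [token for token, _ in ordered]

def write_vocab_txt(tokens, path):
    """Write one token per line; line k (0-based) holds the token with id k."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(tokens) + "\n")

# Hypothetical usage:
# write_vocab_txt(vocab_from_tokenizer_json("tokenizer.json"), "vocab.txt")
```

Checking contiguity up front is deliberate: a gap in the ids would silently shift every later token by one line and produce wrong ids on the .NET side.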

tjwald · Feb 16 '25

I'd be content if there was a Tokenizer.FromPretrained("tokenizer.json") factory method for my particular scenario :)

kzu · Mar 13 '25