
Running AI inference of Phi-3 and other LLMs from C# using NPU + GPU in upcoming processors?

Open agonzalezm opened this issue 1 year ago • 3 comments

Intel, AMD, Qualcomm, etc. are shipping powerful NPUs (40+ TOPS) for inference.

Is there any plan to include functionality in ML.NET to run inference with these models easily from C#, offloading to the NPU, the GPU, or both? The next Intel processors will have a 40 TOPS NPU and around 60 TOPS from the CPU/GPU.

How can we easily make the most of all these TOPS from the NPU + GPU and run inference from C#?

All the samples I see about this require Python; it would be great to have all of this available directly in .NET/C#.

Maybe by including a C# wrapper around https://github.com/intel/intel-npu-acceleration-library, but what about AMD and Qualcomm?

agonzalezm avatar May 28 '24 12:05 agonzalezm

Hi @agonzalezm, have you taken a look at the https://github.com/SciSharp/LLamaSharp project? It allows running inference for plenty of LLM models on consumer-level GPUs.
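
As a rough sketch, GPU-offloaded inference with LLamaSharp from C# looks roughly like the following. The model path is a placeholder, and property names may differ slightly between LLamaSharp versions:

```csharp
using System;
using LLama;
using LLama.Common;

// Placeholder path to any GGUF model supported by llama.cpp (e.g. a Phi-3 build).
var parameters = new ModelParams("models/phi-3-mini.gguf")
{
    ContextSize = 2048,
    GpuLayerCount = 32   // number of transformer layers to offload to the GPU
};

using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);
var executor = new InteractiveExecutor(context);

// Stream generated tokens back to the console.
await foreach (var token in executor.InferAsync(
    "What is an NPU?",
    new InferenceParams { MaxTokens = 128 }))
{
    Console.Write(token);
}
```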

asmirnov82 avatar May 29 '24 09:05 asmirnov82

I am not asking about inference on GPUs, but on the new NPUs. DirectML says it will support Intel NPUs, but for now not AMD's. What I'm asking for is an easy way to do this from C#.

agonzalezm avatar Jun 01 '24 13:06 agonzalezm

Generally, for ONNX / TorchSharp models, ML.NET depends on the hardware support provided by the respective frameworks.

In the ONNX case, what you'd be looking to use is the DirectML execution provider or the respective hardware vendor's execution provider.

Here's an example running an image classification model in ML.NET using the DirectML execution provider.
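
For reference, a minimal sketch of selecting the DirectML execution provider through the ONNX Runtime C# API, which is what ML.NET delegates to under the hood for ONNX models. This assumes the Microsoft.ML.OnnxRuntime.DirectML package; the model path, input name, and tensor shape are placeholders:

```csharp
using System;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

var options = new SessionOptions();
options.AppendExecutionProvider_DML(0);   // DirectML EP on device 0; unsupported ops fall back to the CPU EP

// Placeholder model and input name for a typical image classifier.
using var session = new InferenceSession("resnet50.onnx", options);

var input = new DenseTensor<float>(new[] { 1, 3, 224, 224 });
var inputs = new[] { NamedOnnxValue.CreateFromTensor("input", input) };

using var results = session.Run(inputs);
float[] scores = results.First().AsEnumerable<float>().ToArray();
Console.WriteLine($"Top score: {scores.Max()}");
```

The same SessionOptions mechanism is how vendor-specific execution providers (for example OpenVINO for Intel or QNN for Qualcomm hardware) are selected, each shipped in its own ONNX Runtime package.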

luisquintanilla avatar Aug 27 '24 20:08 luisquintanilla