
torch.deleters performance issue on .NET framework cpu target

Open hglee opened this issue 1 year ago • 3 comments

I discovered this performance issue while using the MNIST dataset.

It only happens with the combination of .NET Framework and the CPU target. It does not happen on modern .NET or with CUDA.

Here is a minimal reproduction:

            // match to MNIST dataset size
            var size = 70000;

            var tensors = new List<torch.Tensor>(size);

            var dev = new torch.Device("cpu");

            for (int i = 0; i < size; ++i)
            {
                tensors.Add(torch.tensor(new[] { 1.0f }, device: dev));
            }

            Console.WriteLine(tensors.Count);

            foreach (var tensor in tensors)
            {
                tensor.Dispose();
            }

            tensors.Clear();

            Console.WriteLine(tensors.Count);

The profiler points to ConcurrentDictionary.TryAdd() and ConcurrentDictionary.TryRemove() as the hot spots, but the underlying cost appears to be MulticastDelegate.Equals().
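
To illustrate why delegate keys could produce this pattern, here is a minimal sketch that is independent of TorchSharp. It assumes the deleters are kept in a ConcurrentDictionary keyed by delegate instances, which is an inference from the profile and not something verified here; the DelegateKeyDemo class is a hypothetical name. If the runtime returns the same hash code for many delegates of one type, the keys collide and each TryAdd()/TryRemove() has to fall back to Equals() across a long chain.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical demo, not TorchSharp code.
static class DelegateKeyDemo
{
    static void Main()
    {
        // match to MNIST dataset size
        const int size = 70000;

        var map = new ConcurrentDictionary<Action, Action>();
        var keys = new List<Action>(size);
        var distinctHashes = new HashSet<int>();

        var sw = Stopwatch.StartNew();

        for (int i = 0; i < size; ++i)
        {
            // A fresh closure object per iteration, so every delegate
            // has a different target and the keys are all unequal.
            int captured = i;
            Action key = () => GC.KeepAlive(captured);

            keys.Add(key);
            distinctHashes.Add(key.GetHashCode());
            map.TryAdd(key, key);
        }

        foreach (var key in keys)
        {
            Action removed;
            map.TryRemove(key, out removed);
        }

        sw.Stop();

        // Few distinct hash codes together with a long elapsed time
        // would point at hash collisions between delegate keys.
        Console.WriteLine("distinct hash codes: " + distinctHashes.Count);
        Console.WriteLine("elapsed ms: " + sw.ElapsedMilliseconds);
    }
}
```

Building the same file for .NET Framework and for modern .NET and comparing the two printed values should show whether the runtimes hash delegates differently.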

Image

Image

With the .NET Framework and CUDA combination, the removal seems to happen directly in _tensor_generic.

Image

hglee, Jan 10 '25 10:01

This may be related to https://github.com/dotnet/coreclr/pull/11019, which does not seem to have been ported to the .NET Framework runtime.

hglee, Jan 13 '25 03:01

Hey @hglee, thanks for bringing up the issue.

Would it be possible to share which profiler you're using for the benchmarking?

ghost, Jan 27 '25 15:01

Hi, it's JetBrains dotTrace (https://www.jetbrains.com/profiler/).

The upper screenshots were measured in timeline mode, and the bottom one (CUDA) was measured in tracing mode.

hglee, Jan 27 '25 19:01