torch.deleters performance issue on .NET Framework CPU target
I discovered this performance issue while using the MNIST dataset.
This only happens with the combination of .NET Framework and the CPU target. It does not happen on the .NET (Core) target or with CUDA.
Here is a minimal reproduction:
// match the MNIST dataset size
var size = 70000;
var tensors = new List<torch.Tensor>(size);
var dev = new torch.Device("cpu");
for (int i = 0; i < size; ++i)
{
tensors.Add(torch.tensor(new[] { 1.0f }, device: dev));
}
Console.WriteLine(tensors.Count);
foreach (var tensor in tensors)
{
tensor.Dispose();
}
tensors.Clear();
Console.WriteLine(tensors.Count);
The profiler points to ConcurrentDictionary.TryAdd() and ConcurrentDictionary.TryRemove(), but the actual hotspot looks like MulticastDelegate.Equals().
For the .NET Framework and CUDA combination, the entry seems to be removed directly in _tensor_generic instead.
This may be related to https://github.com/dotnet/coreclr/pull/11019, which apparently was never ported to the .NET Framework runtime.
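For context, the suspected hot path can be sketched like this (hypothetical names, not the actual TorchSharp source; the assumption is that each tensor's native "deleter" delegate is tracked in a shared ConcurrentDictionary keyed by the delegate itself):

```csharp
using System;
using System.Collections.Concurrent;

// Sketch of the pattern the profile suggests: every tensor registers a
// native-memory "deleter" delegate in a shared ConcurrentDictionary and
// removes it again on Dispose. Because the delegate itself is the key,
// TryAdd/TryRemove compare candidate keys with MulticastDelegate.Equals.
// If delegate hash codes collide heavily (as they reportedly do on the
// .NET Framework runtime), every lookup walks a long bucket and those
// Equals calls dominate the profile.
public delegate void Deleter(IntPtr handle);

public static class DeleterRegistry
{
    private static readonly ConcurrentDictionary<Deleter, byte> _deleters =
        new ConcurrentDictionary<Deleter, byte>();

    public static Deleter Register()
    {
        // each tensor gets a fresh delegate instance
        Deleter d = handle => { /* release native storage here */ };
        _deleters.TryAdd(d, 0);            // hashes, then Equals-compares keys
        return d;
    }

    public static bool Unregister(Deleter d)
        => _deleters.TryRemove(d, out _);  // same bucket walk on removal
}
```

With 70,000 tensors of the same delegate type all hashing into few buckets, both TryAdd and TryRemove degrade toward linear scans, which would match the MulticastDelegate.Equals() time seen in the trace.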
Hey @hglee , thanks for bringing up the issue.
Would it be possible to share which profiler you're using for the benchmarking?
Hi, it's JetBrains dotTrace (https://www.jetbrains.com/profiler/).
The upper screenshot was captured in timeline mode and the bottom (CUDA) one in tracing mode.