rhbackshiftdict
rhbackshiftdict copied to clipboard
Collapse of performance around ~2M items
I have been toying a bit with the Robin Hood dictionary, and the correct implementation seems to suffer from an odd collapse of performance around 2 million items. Here are the results that I get with 2110000 items:
[Dictionary] Mean 226.1293 Max 226.1293 Min => 226.1293
[RobinHood] Mean 1927.3044 Max 1927.3044 Min => 1927.3044
While with 2000000 items I get:
[Dictionary] Mean 226.1166 Max 226.1166 Min => 226.1166
[RobinHood] Mean 217.3498 Max 217.3498 Min => 217.3498
The benchmarking code is given below. I suspect there is a flaw in the implementation. It might explain why the Robin Hood approach did not seem competitive to Dictionary. I would very curious to see how it performs if this flaw is fixed.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using robinhood;
namespace rhsandbox
{
public class Program
{
public static void Main(string[] args)
{
var dicCount = 2110000;
var repeats = 1;
var rand = new Random(42);
var keys = new int[dicCount];
var values = new double[dicCount];
for (var i = 0; i < dicCount; i++)
{
keys[i] = i;
values[i] = rand.NextDouble();
}
for (var i = 0; i < dicCount; i++)
{
var tmp = keys[i];
var j = rand.Next(dicCount);
keys[i] = keys[j];
keys[j] = tmp;
}
var times_RH = RunBenchmark(() => RunRH(keys, values), repeats).ToList();
var times_dict = RunBenchmark(() => RunDict(keys, values), repeats).ToList();
WriteTimes("Dictionary", times_dict);
WriteTimes("RobinHood", times_RH);
Console.ReadKey();
}
private static void WriteTimes(string name, IEnumerable<TimeSpan> times)
{
Console.WriteLine($"[{name}] Mean {times.Average(ts => ts.TotalMilliseconds)} Max {times.Max(ts => ts.TotalMilliseconds)} Min => {times.Min(ts => ts.TotalMilliseconds)}");
}
private static TimeSpan RunRH(int[] keys, double[] values)
{
var dict = new RobinHoodDictionary<int, double>();
var sw = Stopwatch.StartNew();
for (var i = 0; i < keys.Length; i++)
{
dict.Add(keys[i], values[i]);
}
sw.Stop();
return sw.Elapsed;
}
private static TimeSpan RunDict(int[] keys, double[] values)
{
var dict = new Dictionary<int, double>();
var sw = Stopwatch.StartNew();
for (var i = 0; i < keys.Length; i++)
{
dict.Add(keys[i], values[i]);
}
sw.Stop();
return sw.Elapsed;
}
private static IEnumerable<TimeSpan> RunBenchmark(Func<TimeSpan> method, int repeats)
{
for (int i = 0; i < repeats; i++)
{
method();
yield return method();
method();
}
}
}
}