rhbackshiftdict icon indicating copy to clipboard operation
rhbackshiftdict copied to clipboard

Collapse of performance around ~2M items

Open vermorel opened this issue 7 years ago • 1 comments

I have been toying a bit with the Robin Hood dictionary, and the correct implementation seems to suffer from an odd collapse of performance around 2 million items. Here are the results that I get with 2110000 items:

[Dictionary] Mean 226.1293 Max 226.1293 Min => 226.1293
[RobinHood] Mean 1927.3044 Max 1927.3044 Min => 1927.3044

While with 2000000 items I get:

[Dictionary] Mean 226.1166 Max 226.1166 Min => 226.1166
[RobinHood] Mean 217.3498 Max 217.3498 Min => 217.3498

The benchmarking code is given below. I suspect there is a flaw in the implementation. It might explain why the Robin Hood approach did not seem competitive to Dictionary. I would very curious to see how it performs if this flaw is fixed.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using robinhood;

namespace rhsandbox
{
    public class Program
    {
        public static void Main(string[] args)
        {
            var dicCount = 2110000;
            var repeats = 1;

            var rand = new Random(42);
            var keys = new int[dicCount];
            var values = new double[dicCount];
            for (var i = 0; i < dicCount; i++)
            {
                keys[i] = i;
                values[i] = rand.NextDouble();
            }
            for (var i = 0; i < dicCount; i++)
            {
                var tmp = keys[i];
                var j = rand.Next(dicCount);
                keys[i] = keys[j];
                keys[j] = tmp;
            }

            var times_RH = RunBenchmark(() => RunRH(keys, values), repeats).ToList();
            var times_dict = RunBenchmark(() => RunDict(keys, values), repeats).ToList();

            WriteTimes("Dictionary", times_dict);
            WriteTimes("RobinHood", times_RH);
            Console.ReadKey();
        }

        private static void WriteTimes(string name, IEnumerable<TimeSpan> times)
        {
            Console.WriteLine($"[{name}] Mean {times.Average(ts => ts.TotalMilliseconds)} Max {times.Max(ts => ts.TotalMilliseconds)} Min => {times.Min(ts => ts.TotalMilliseconds)}");
        }

        private static TimeSpan RunRH(int[] keys, double[] values)
        {
            var dict = new RobinHoodDictionary<int, double>();
 
            var sw = Stopwatch.StartNew();
            for (var i = 0; i < keys.Length; i++)
            {
                dict.Add(keys[i], values[i]);
            }
            sw.Stop();

            return sw.Elapsed;
        }

        private static TimeSpan RunDict(int[] keys, double[] values)
        {
            var dict = new Dictionary<int, double>();

            var sw = Stopwatch.StartNew();
            for (var i = 0; i < keys.Length; i++)
            {
                dict.Add(keys[i], values[i]);
            }
            sw.Stop();

            return sw.Elapsed;
        }

        private static IEnumerable<TimeSpan> RunBenchmark(Func<TimeSpan> method, int repeats)
        {
            for (int i = 0; i < repeats; i++)
            {
                method();
                yield return method();
                method();
            }
        }
    }
}

vermorel avatar Jun 02 '18 15:06 vermorel