Map: improve removal speed for large maps with a high percentage of elements removed
This PR tries to deal with the high execution time when removing a large number of entries from a large map. Performance is still not great, but it is a clear improvement.
The cause of the problem, as far as I understand it: when removing a large number of entries, the map's freelist (the list containing all empty indexes) grows, and manipulating this list becomes costly when the map's memory is heavily fragmented.
The proposed solution defragments the map when the freelist grows beyond a threshold. Defragmentation works by moving elements within the map to compact its storage. The threshold was chosen experimentally so that it does not increase execution time when removing 10% or 100% of the elements. It is a very high threshold and only affects performance once the map size reaches around 500,000.
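The actual change is in the native map implementation; purely to illustrate the idea, here is a minimal hypothetical sketch in Haxe (the class, field names, and threshold value are made up for the example, and a real map would also rewrite its hash-to-slot table after compacting):

```haxe
// Hypothetical sketch of freelist-triggered defragmentation (not the PR code).
// Live entries sit in a storage array with holes left by remove(); the holes
// are tracked in a freelist. Past a threshold, compact the storage instead of
// letting operations crawl over increasingly fragmented memory.
class DefragSketch {
    static inline var FREELIST_THRESHOLD = 65536; // placeholder; the PR tunes this experimentally

    var values : Array<Int> = [];    // entry storage, may contain holes
    var occupied : Array<Bool> = []; // which slots hold live entries
    var freelist : Array<Int> = [];  // indexes of the holes

    public function new() {}

    public function removeAt(slot : Int) {
        occupied[slot] = false;
        freelist.push(slot);
        if (freelist.length > FREELIST_THRESHOLD)
            defragment();
    }

    function defragment() {
        // Move every live entry down into the first free position, leaving
        // the storage dense again. A real implementation would also fix up
        // the hash->slot table so existing keys resolve to the moved entries.
        var write = 0;
        for (read in 0...values.length) {
            if (!occupied[read]) continue;
            values[write] = values[read];
            occupied[write] = true;
            write++;
        }
        values = values.slice(0, write);
        occupied = occupied.slice(0, write);
        freelist = []; // no holes left
    }
}
```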
Performance comparison (before => after):
- Nearly no change below N=100,000. Times for N=100,000:
  - 10% remove: 0.0068 sec
  - 100% remove: 0.2151 sec
- N=500,000:
  - 10% remove: 0.1685 sec => 0.0727 sec
  - 100% remove: 4.3643 sec => 0.7718 sec
- N=1,000,000:
  - 10% remove: 0.6612 sec => 0.1569 sec
  - 100% remove: 19.4425 sec => 1.200 sec
- N=5,000,000:
  - 10% remove: (unknown) => 1.9806 sec
  - 100% remove: (unknown) => 19.0631 sec
Test code:
```haxe
static function mapTest(N : Int, repeat : Int = 10) {
    trace("N", N, "repeat", repeat);
    var timeSet : Float = 0;
    var timeGet : Float = 0;
    var timeRemove1 : Float = 0;
    var timeAdd1 : Float = 0;
    var timeValid : Float = 0;
    var timeRemoveAll : Float = 0;
    for (r in 0...repeat) {
        // Prepare key array
        var arr : Array<String> = [];
        for (i in 0...N) {
            arr.push(i + String.fromCharCode(65 + Std.random(26)));
        }
        // Prepare map
        var map : Map<String, Int> = [];
        var t0 = haxe.Timer.stamp();
        // Set
        for (i in 0...N) {
            map.set(arr[i], i);
        }
        var t1 = haxe.Timer.stamp();
        // Get
        var keys = map.keys();
        for (key in keys) {
            var a = map.get(key);
            a++;
        }
        var t2 = haxe.Timer.stamp();
        // Remove 1/10
        var N10 : Int = Std.int(N / 10);
        var N5 : Int = Std.int(N / 5);
        for (i in N10...N5) {
            map.remove(arr[i]);
        }
        var t3 = haxe.Timer.stamp();
        // Add 1/10
        for (i in N10...N5) {
            map.set(arr[i], i);
        }
        var t4 = haxe.Timer.stamp();
        // Validate
        for (i in 0...N) {
            var a = map.get(arr[i]);
            if (a != i) trace("assert", a, i);
        }
        var t5 = haxe.Timer.stamp();
        // Remove all
        for (i in 0...N) {
            var b = map.remove(arr[i]);
            if (!b) trace("assert remove do not exist", arr[i], i);
        }
        var t6 = haxe.Timer.stamp();
        timeSet += t1 - t0;
        timeGet += t2 - t1;
        timeRemove1 += t3 - t2;
        timeAdd1 += t4 - t3;
        timeValid += t5 - t4;
        timeRemoveAll += t6 - t5;
    }
    trace("timeSet :", timeSet / repeat);
    trace("timeGet :", timeGet / repeat);
    trace("timeRemove1 :", timeRemove1 / repeat);
    trace("timeAdd1 :", timeAdd1 / repeat);
    trace("timeValid :", timeValid / repeat);
    trace("timeRemoveAll:", timeRemoveAll / repeat);
}
```
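For reference, a minimal harness to run the benchmark (hypothetical, not part of the PR; it assumes `mapTest` is pasted into the same class, and uses the HashLink target as an example):

```haxe
// Hypothetical harness. Compile and run with, e.g.:
//   haxe -main MapBench -hl bench.hl && hl bench.hl
class MapBench {
    static function main() {
        // Sizes used in the measurements above; repeat defaults to 10 runs.
        for (n in [100000, 500000, 1000000])
            mapTest(n);
    }

    // mapTest from the snippet above goes here.
}
```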
I just implemented Robin Hood hashing for the hashmap. (References: https://www.youtube.com/watch?v=4cTKiTQtS_k&t=893s , https://programming.guide/robin-hood-hashing.html) Tested with the script above, it seems to greatly reduce remove time without sacrificing set/get time; however, it occupies about 50% more memory. A sketch of the insertion idea follows the numbers below.
- Removing all elements one by one for N = 500,000 takes only 0.029 sec on my machine (compared to 0.041 sec on the JS target, 4.38 sec with the existing implementation, and 0.772 sec with the previous optimization).
- Removing all elements one by one for N = 5,000,000 takes only 0.453 sec.
- Get/set time seems to be reduced by 5% to 20% in most cases.
Overall performance is now comparable to the JavaScript target, and even faster in some situations.
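For reference, here is a minimal sketch of the Robin Hood insertion step as described in the linked references (hypothetical Haxe illustrating the idea; the actual patch lives in the native map code, and this sketch omits lookup, deletion, and resizing):

```haxe
// Hypothetical sketch of Robin Hood insertion over open addressing
// (not the actual patch). On collision, the entry closer to its home
// slot ("richer") yields to the one that has probed further ("poorer"),
// keeping probe lengths short and uniform.
class RobinHoodSketch {
    var keys : Array<String>;
    var vals : Array<Int>;
    var used : Array<Bool>;
    var size : Int;

    public function new(capacity : Int) {
        size = capacity; // must stay above the number of stored entries
        keys = [for (i in 0...size) null];
        vals = [for (i in 0...size) 0];
        used = [for (i in 0...size) false];
    }

    inline function home(key : String) : Int {
        // Cheap polynomial string hash, good enough for a sketch.
        var h = 0;
        for (i in 0...key.length)
            h = h * 31 + StringTools.fastCodeAt(key, i);
        return ((h % size) + size) % size; // stay in range even if h wrapped negative
    }

    public function set(key : String, value : Int) {
        var slot = home(key);
        var dist = 0; // how far the incoming entry has probed past its home slot
        while (true) {
            if (!used[slot]) { // empty slot: claim it
                keys[slot] = key;
                vals[slot] = value;
                used[slot] = true;
                return;
            }
            if (keys[slot] == key) { // same key: overwrite
                vals[slot] = value;
                return;
            }
            // Probe distance of the entry currently occupying this slot.
            var resident = (slot - home(keys[slot]) + size) % size;
            if (resident < dist) {
                // Robin Hood step: the resident is "richer" (closer to its
                // home), so it yields the slot and continues probing itself.
                var k = keys[slot]; keys[slot] = key; key = k;
                var v = vals[slot]; vals[slot] = value; value = v;
                dist = resident;
            }
            slot = (slot + 1) % size;
            dist++;
        }
    }
}
```

Remove typically benefits from Robin Hood's companion backward-shift deletion (no tombstones), which would explain the numbers above; the extra per-slot bookkeeping presumably accounts for the higher memory use.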
TODO: once we are confident in this change, we will need memdump + debugger support.