The original, unoptimized stable B-Tree (with new metering) (c.f. #97)
Sames as #97 but using unoptimized version of stable btree (to see perf improvements relative to #97.
Note Diffing the performance result against the published result from main branch. Unchanged benchmarks are omitted.
Map
| binary_size | generate 1m | max mem | batch_get 50 | batch_put 50 | batch_remove 50 | upgrade | |
|---|---|---|---|---|---|---|---|
| hashmap | 160_146 ($\textcolor{green}{-0.05\%}$) | 8_415_853_637 ($\textcolor{red}{20.50\%}$) | 61_987_852 | 344_668 ($\textcolor{red}{19.40\%}$) | 6_554_314_393 ($\textcolor{red}{18.38\%}$) | 372_027 ($\textcolor{red}{19.93\%}$) | 11_086_889_804 ($\textcolor{red}{21.45\%}$) |
| triemap | 163_317 ($\textcolor{green}{-0.10\%}$) | 14_814_581_233 ($\textcolor{red}{29.23\%}$) | 74_216_172 | 291_744 ($\textcolor{red}{30.87\%}$) | 714_312 ($\textcolor{red}{30.01\%}$) | 700_163 ($\textcolor{red}{29.61\%}$) | 16_814_045_622 ($\textcolor{red}{28.60\%}$) |
| rbtree | 157_922 ($\textcolor{green}{-0.14\%}$) | 7_517_002_376 ($\textcolor{red}{25.72\%}$) | 57_996_060 | 113_955 ($\textcolor{red}{28.18\%}$) | 339_364 ($\textcolor{red}{26.36\%}$) | 362_212 ($\textcolor{red}{30.13\%}$) | 7_125_606_793 ($\textcolor{red}{23.45\%}$) |
| splay | 159_759 ($\textcolor{green}{-0.12\%}$) | 15_011_765_466 ($\textcolor{red}{29.77\%}$) | 53_995_996 | 719_957 ($\textcolor{red}{30.42\%}$) | 758_215 ($\textcolor{red}{30.33\%}$) | 1_051_199 ($\textcolor{red}{29.73\%}$) | 4_732_822_678 ($\textcolor{red}{27.14\%}$) |
| btree | 187_666 ($\textcolor{green}{-0.12\%}$) | 10_705_101_825 ($\textcolor{red}{30.17\%}$) | 31_104_012 | 365_187 ($\textcolor{red}{31.58\%}$) | 503_327 ($\textcolor{red}{31.02\%}$) | 559_436 ($\textcolor{red}{30.39\%}$) | 3_120_410_218 ($\textcolor{red}{23.93\%}$) |
| zhenya_hashmap | 160_516 ($\textcolor{red}{0.00\%}$) | 2_776_814_274 ($\textcolor{red}{26.13\%}$) | 22_773_100 | 64_709 ($\textcolor{red}{33.07\%}$) | 82_032 ($\textcolor{red}{32.65\%}$) | 94_090 ($\textcolor{red}{32.76\%}$) | 3_319_787_035 ($\textcolor{red}{23.16\%}$) |
| btreemap_rs | 477_037 ($\textcolor{green}{-0.12\%}$) | 1_793_694_183 ($\textcolor{red}{8.60\%}$) | 27_590_656 | 75_427 ($\textcolor{red}{12.81\%}$) | 125_068 ($\textcolor{red}{11.19\%}$) | 86_003 ($\textcolor{red}{12.81\%}$) | 2_927_540_279 ($\textcolor{red}{10.02\%}$) |
| imrc_hashmap_rs | 479_206 ($\textcolor{green}{-0.12\%}$) | 2_580_421_602 ($\textcolor{red}{7.84\%}$) | 244_973_568 | 38_236 ($\textcolor{red}{16.70\%}$) | 178_903 ($\textcolor{red}{9.59\%}$) | 114_251 ($\textcolor{red}{16.12\%}$) | 5_754_488_547 ($\textcolor{red}{10.84\%}$) |
| hashmap_rs | 467_420 ($\textcolor{green}{-0.12\%}$) | 432_765_549 ($\textcolor{red}{7.31\%}$) | 73_138_176 | 21_704 ($\textcolor{red}{28.80\%}$) | 26_758 ($\textcolor{red}{23.42\%}$) | 25_073 ($\textcolor{red}{23.74\%}$) | 1_280_477_959 ($\textcolor{red}{11.85\%}$) |
Priority queue
| binary_size | heapify 1m | max mem | pop_min 50 | put 50 | pop_min 50.1 | upgrade | |
|---|---|---|---|---|---|---|---|
| heap | 147_633 ($\textcolor{green}{-0.00\%}$) | 6_206_169_472 ($\textcolor{red}{32.48\%}$) | 29_995_956 | 680_841 ($\textcolor{red}{33.11\%}$) | 250_925 ($\textcolor{red}{34.57\%}$) | 648_731 ($\textcolor{red}{33.15\%}$) | 3_293_282_643 ($\textcolor{red}{24.01\%}$) |
| heap_rs | 463_292 ($\textcolor{green}{-0.12\%}$) | 138_669_631 ($\textcolor{red}{14.04\%}$) | 18_284_544 | 57_362 ($\textcolor{red}{11.04\%}$) | 23_196 ($\textcolor{red}{27.14\%}$) | 57_496 ($\textcolor{red}{10.99\%}$) | 505_960_753 ($\textcolor{red}{14.80\%}$) |
Growable array
| binary_size | generate 5k | max mem | batch_get 500 | batch_put 500 | batch_remove 500 | upgrade | |
|---|---|---|---|---|---|---|---|
| buffer | 150_978 ($\textcolor{green}{-0.02\%}$) | 2_718_410 ($\textcolor{red}{30.53\%}$) | 65_644 | 104_997 ($\textcolor{red}{43.65\%}$) | 858_376 ($\textcolor{red}{27.83\%}$) | 172_497 ($\textcolor{red}{35.19\%}$) | 3_141_451 ($\textcolor{red}{26.95\%}$) |
| vector | 152_598 ($\textcolor{red}{0.03\%}$) | 2_131_660 ($\textcolor{red}{34.21\%}$) | 24_580 | 141_137 ($\textcolor{red}{34.17\%}$) | 203_207 ($\textcolor{red}{35.53\%}$) | 195_769 ($\textcolor{red}{32.19\%}$) | 4_756_048 ($\textcolor{red}{23.71\%}$) |
| vec_rs | 459_107 ($\textcolor{green}{-0.12\%}$) | 288_898 ($\textcolor{red}{8.74\%}$) | 1_310_720 | 17_148 ($\textcolor{red}{31.77\%}$) | 30_473 ($\textcolor{red}{20.15\%}$) | 25_609 ($\textcolor{red}{20.53\%}$) | 3_141_188 ($\textcolor{red}{14.48\%}$) |
Warning Skip table 3 ## Stable structures from _out/collections/README.md, due to table shape mismatches from main branch.
Statistics
- binary_size: -0.08% [-0.11%, -0.05%]
- max_mem: no change
- cycles: 24.15% [22.42%, 25.89%]
Overall Statistics
- binary_size: -0.08% [-0.11%, -0.05%]
- max_mem: no change
- cycles: 24.15% [22.42%, 25.89%]
Note The flamegraph link only works after you merge. Unchanged benchmarks are omitted.
Collection libraries
Measure different collection libraries written in both Motoko and Rust.
The library names with _rs suffix are written in Rust; the rest are written in Motoko.
The _stable and _stable_rs suffix represents that the library directly writes the state to stable memory using Region in Motoko and ic-stable-stuctures in Rust.
We use the same random number generator with fixed seed to ensure that all collections contain the same elements, and the queries are exactly the same. Below we explain the measurements of each column in the table:
- generate 1m. Insert 1m Nat64 integers into the collection. For Motoko collections, it usually triggers the GC; the rest of the column are not likely to trigger GC.
- max mem. For Motoko, it reports
rts_max_heap_sizeaftergeneratecall; For Rust, it reports the Wasm's memory page * 64Kb; For stable benchmarks, it reports the region size of the stable memory storing the map. - batch_get 50. Find 50 elements from the collection.
- batch_put 50. Insert 50 elements to the collection.
- batch_remove 50. Remove 50 elements from the collection.
- upgrade. Upgrade the canister with the same Wasm module. For non-stable benchmarks, the map state is persisted by serializing and deserializing states into stable memory. For stable benchmarks, the upgrade only needs to initialize the metadata, as the state is already in the stable memory.
💎 Takeaways
- The platform only charges for instruction count. Data structures which make use of caching and locality have no impact on the cost.
- We have a limit on the maximal cycles per round. This means asymptotic behavior doesn't matter much. We care more about the performance up to a fixed N. In the extreme cases, you may see an $O(10000 n\log n)$ algorithm hitting the limit, while an $O(n^2)$ algorithm runs just fine.
- Amortized algorithms/GC may need to be more eager to avoid hitting the cycle limit on a particular round.
- Rust costs more cycles to process complicated Candid data, but it is more efficient in performing core computations.
Note
- The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.
- Due to the instrumentation overhead and cycle limit, we cannot profile computations with very large collections.
- The
upgradecolumn uses Candid for serializing stable data. In Rust, you may get better cycle cost by using a different serialization format. Another slowdown in Rust is thatic-stable-structurestends to be slower than the region memory in Motoko.- Different library has different ways for persisting data during upgrades, there are mainly three categories:
- Use stable variable directly in Motoko:
zhenya_hashmap,btree,vector- Expose and serialize external state (
share/unsharein Motoko,candid::Encodein Rust):rbtree,heap,btreemap_rs,hashmap_rs,heap_rs,vector_rs- Use pre/post-upgrade hooks to convert data into an array:
hashmap,splay,triemap,buffer,imrc_hashmap_rs- The stable benchmarks are much more expensive than their non-stable counterpart, because the stable memory API is much more expensive. The benefit is that they get fast upgrade. The upgrade still needs to parse the metadata when initializing the upgraded Wasm module.
hashmapuses amortized data structure. When the initial capacity is reached, it has to copy the whole array, thus the cost ofbatch_put 50is much higher than other data structures.btreecomes from mops.one/stableheapbtreemap.btree_stablecomes from github.com/sardariuss.zhenya_hashmapcomes from mops.one/map.vectorcomes from mops.one/vector. Compare withbuffer,puthas better worst case time and space complexity ($O(\sqrt{n})$ vs $O(n)$);gethas a slightly larger constant overhead.hashmap_rsuses thefxhashcrate, which is the same asstd::collections::HashMap, but with a deterministic hasher. This ensures reproducible result.imrc_hashmap_rsuses theim-rccrate, which is the immutable version hashmap in Rust.
Map
| binary_size | generate 1m | max mem | batch_get 50 | batch_put 50 | batch_remove 50 | upgrade | |
|---|---|---|---|---|---|---|---|
| hashmap | 160_146 | 8_415_853_637 | 61_987_852 | 344_668 | 6_554_314_393 | 372_027 | 11_086_889_804 |
| triemap | 163_317 | 14_814_581_233 | 74_216_172 | 291_744 | 714_312 | 700_163 | 16_814_045_622 |
| rbtree | 157_922 | 7_517_002_376 | 57_996_060 | 113_955 | 339_364 | 362_212 | 7_125_606_793 |
| splay | 159_759 | 15_011_765_466 | 53_995_996 | 719_957 | 758_215 | 1_051_199 | 4_732_822_678 |
| btree | 187_666 | 10_705_101_825 | 31_104_012 | 365_187 | 503_327 | 559_436 | 3_120_410_218 |
| zhenya_hashmap | 160_516 | 2_776_814_274 | 22_773_100 | 64_709 | 82_032 | 94_090 | 3_319_787_035 |
| btreemap_rs | 477_037 | 1_793_694_183 | 27_590_656 | 75_427 | 125_068 | 86_003 | 2_927_540_279 |
| imrc_hashmap_rs | 479_206 | 2_580_421_602 | 244_973_568 | 38_236 | 178_903 | 114_251 | 5_754_488_547 |
| hashmap_rs | 467_420 | 432_765_549 | 73_138_176 | 21_704 | 26_758 | 25_073 | 1_280_477_959 |
Priority queue
| binary_size | heapify 1m | max mem | pop_min 50 | put 50 | pop_min 50 | upgrade | |
|---|---|---|---|---|---|---|---|
| heap | 147_633 | 6_206_169_472 | 29_995_956 | 680_841 | 250_925 | 648_731 | 3_293_282_643 |
| heap_rs | 463_292 | 138_669_631 | 18_284_544 | 57_362 | 23_196 | 57_496 | 505_960_753 |
Growable array
| binary_size | generate 5k | max mem | batch_get 500 | batch_put 500 | batch_remove 500 | upgrade | |
|---|---|---|---|---|---|---|---|
| buffer | 150_978 | 2_718_410 | 65_644 | 104_997 | 858_376 | 172_497 | 3_141_451 |
| vector | 152_598 | 2_131_660 | 24_580 | 141_137 | 203_207 | 195_769 | 4_756_048 |
| vec_rs | 459_107 | 288_898 | 1_310_720 | 17_148 | 30_473 | 25_609 | 3_141_188 |
Stable structures
| binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 | upgrade | |
|---|---|---|---|---|---|---|---|
| btree | 187_666 | 457_750_863 | 1_554_152 | 289_410 | 442_003 | 479_799 | 155_920_331 |
| btree_stable | 205_452 | 17_160_920_654 | 2_621_440 | 13_928_890 | 18_985_993 | 37_332_002 | 40_832 |
| btreemap_rs | 477_037 | 76_218_029 | 2_555_904 | 64_985 | 97_003 | 84_994 | 125_790_335 |
| btreemap_stable_rs | 478_103 | 4_751_143_101 | 2_621_440 | 2_844_949 | 5_181_254 | 8_809_593 | 731_069 |
| heap_rs | 463_292 | 7_001_559 | 2_293_760 | 49_871 | 23_444 | 49_845 | 26_519_303 |
| heap_stable_rs | 450_480 | 317_932_953 | 458_752 | 2_667_588 | 276_814 | 2_647_531 | 731_182 |
| vec_rs | 459_107 | 3_079_223 | 2_228_224 | 17_148 | 18_323 | 17_850 | 24_471_685 |
| vec_stable_rs | 445_471 | 74_445_158 | 458_752 | 69_489 | 90_685 | 92_681 | 731_195 |
Environment
- dfx 0.15.1
- Motoko compiler 0.10.0 (source a3ywvw0a-p5a03qy6-vscbl9j8-qxszbxa6)
- rustc 1.73.0 (cc66ad468 2023-10-03)
- ic-repl 0.5.1
- ic-wasm 0.6.0