Optimization: add type guards to functions?
Apparently, adding type guards to numerical functions can let the BEAM optimizer generate better code; Wings3D uses this heavily. If you're interested, I can try it out and benchmark how much it actually helps.
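For anyone following along, here is a minimal sketch of what "adding type guards" could look like. The module name `GuardSketch.Vec3` and the function names are made up for illustration and are not graphmath's actual code; the only assumption carried over is that a vector is a 3-tuple. The guarded clause promises that every component is a float, which is the kind of type information the compiler may be able to exploit.

```elixir
defmodule GuardSketch.Vec3 do
  # Hypothetical module for illustration only; not part of graphmath.

  # Unguarded: the components can be any term, so each `+` has to be
  # prepared for integers, floats, or a badarith error at runtime.
  def add({x, y, z}, {u, v, w}), do: {x + u, y + v, z + w}

  # Guarded: this clause only matches when all six components are floats.
  def add_guarded({x, y, z}, {u, v, w})
      when is_float(x) and is_float(y) and is_float(z) and
             is_float(u) and is_float(v) and is_float(w) do
    {x + u, y + v, z + w}
  end
end
```

Both functions return the same result for float input; whether the guarded one is actually faster is exactly what needs benchmarking.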
That would be lovely!
My main concern is whether the guard will still work for both integers and floats.
I'd love to use is_number, if that would have similar benefits.
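To make the integer/float concern concrete, here is a small sketch of how the two guards behave. `GuardSketch.Scale` and its functions are hypothetical names used only for this example. `is_float/1` rejects integers, so a function guarded only by `is_float/1` raises `FunctionClauseError` on integer input, while `is_number/1` accepts both.

```elixir
defmodule GuardSketch.Scale do
  # Hypothetical module for illustration only; not part of graphmath.

  # Matches floats only; calling this with an integer raises FunctionClauseError.
  def float_only(x) when is_float(x), do: x * 2.0

  # Matches integers and floats alike.
  def any_number(x) when is_number(x), do: x * 2

  # Returns true if float_only/1 accepts the value, false if no clause matches.
  def float_only_accepts?(x) do
    float_only(x)
    true
  rescue
    FunctionClauseError -> false
  end
end
```

With this sketch, `GuardSketch.Scale.any_number(3)` returns the integer `6`, `GuardSketch.Scale.any_number(1.5)` returns `3.0`, and `GuardSketch.Scale.float_only_accepts?(3)` returns `false`.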
I dunno, good question! I'm used to assuming that everything in a vector is a float, but I'll try out a few variations and see how it goes.
I have some results! Confusing ones. Altered graphmath to add guards to a few functions in this fork: https://github.com/icefoxen/graphmath
Benchmarked with this set of tests: https://hg.sr.ht/~icefox/graphmath_bench
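If you want a rough sanity check without pulling in the benchmark repo, something like the following could work. This is only a sketch using Erlang's `:timer.tc/1`, not the linked benchmark suite; `GuardSketch.Timing` and `mean_ns/2` are hypothetical names. It reports a mean only, so it cannot show the deviation, median, or percentile columns in the results.

```elixir
defmodule GuardSketch.Timing do
  # Hypothetical helper for illustration only; not part of graphmath
  # or of the linked benchmark suite.

  # Runs `fun` `iterations` times and returns the mean time per call
  # in nanoseconds (as a float). :timer.tc/1 reports microseconds.
  def mean_ns(fun, iterations)
      when is_function(fun, 0) and is_integer(iterations) and iterations > 0 do
    {micros, _result} =
      :timer.tc(fn ->
        Enum.each(1..iterations, fn _ -> fun.() end)
      end)

    micros * 1000 / iterations
  end
end
```

A call would look like `GuardSketch.Timing.mean_ns(fn -> some_function(a, b) end, 1_000_000)`, where `some_function` stands for whichever variant is being compared. The closure call adds its own overhead, so treat the numbers as relative, not absolute.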
Unaltered code:

| Name | ips | average | deviation | median | 99th % |
| --- | --- | --- | --- | --- | --- |
| graphmath_vec3_add | 13.01 M | 76.84 ns | ±9159.72% | 57 ns | 210 ns |
| graphmath_vec3_dot | 9.90 M | 101.03 ns | ±18801.93% | 76 ns | 274 ns |
| graphmath_mat4_transp | 8.12 M | 123.14 ns | ±37117.69% | 32 ns | 125 ns |
| graphmath_vec3_cross | 6.52 M | 153.28 ns | ±12000.18% | 116 ns | 422 ns |
| graphmath_vm4_mul | 4.81 M | 207.93 ns | ±1560.15% | 174 ns | 632 ns |
| graphmath_mat4_add | 4.65 M | 215.15 ns | ±2762.27% | 170 ns | 655 ns |
| graphmath_vec3_rotate | 2.66 M | 376.20 ns | ±6629.31% | 301 ns | 1092 ns |
| graphmath_mat4_mul | 0.77 M | 1305.93 ns | ±397.82% | 1093 ns | 4081 ns |

| Name | Memory usage |
| --- | --- |
| graphmath_vec3_add | 32 B |
| graphmath_vec3_dot | 0 B |
| graphmath_mat4_transp | 136 B |
| graphmath_vec3_cross | 32 B |
| graphmath_vm4_mul | 32 B |
| graphmath_mat4_add | 136 B |
| graphmath_vec3_rotate | 208 B |
| graphmath_mat4_mul | 136 B |

Type guards using is_float():

| Name | ips | average | deviation | median | 99th % |
| --- | --- | --- | --- | --- | --- |
| graphmath_vec3_dot | 14.82 M | 67.46 ns | ±37823.92% | 38 ns | 138 ns |
| graphmath_vec3_cross | 9.06 M | 110.41 ns | ±32179.37% | 49 ns | 169 ns |
| graphmath_vec3_add | 8.97 M | 111.48 ns | ±46094.33% | 41 ns | 146 ns |
| graphmath_mat4_transp | 8.42 M | 118.78 ns | ±43366.05% | 32 ns | 140 ns |
| graphmath_vm4_mul | 7.91 M | 126.39 ns | ±30059.67% | 64 ns | 229 ns |
| graphmath_vec3_rotate | 6.86 M | 145.75 ns | ±23640.86% | 91 ns | 330 ns |
| graphmath_mat4_add | 4.00 M | 250.14 ns | ±24711.68% | 93 ns | 357 ns |
| graphmath_mat4_mul | 2.77 M | 360.52 ns | ±15763.40% | 223 ns | 812 ns |

| Name | Memory usage |
| --- | --- |
| graphmath_vec3_dot | 16 B |
| graphmath_vec3_cross | 80 B |
| graphmath_vec3_add | 80 B |
| graphmath_mat4_transp | 136 B |
| graphmath_vm4_mul | 80 B |
| graphmath_vec3_rotate | 80 B |
| graphmath_mat4_add | 392 B |
| graphmath_mat4_mul | 392 B |

Type guards using is_float(), with a fallback using no guards:

| Name | ips | average | deviation | median | 99th % |
| --- | --- | --- | --- | --- | --- |
| graphmath_vec3_dot | 14.18 M | 70.50 ns | ±38031.32% | 40 ns | 143 ns |
| graphmath_vec3_add | 8.89 M | 112.45 ns | ±46384.95% | 42 ns | 150 ns |
| graphmath_mat4_transp | 8.80 M | 113.65 ns | ±33097.61% | 32 ns | 127 ns |
| graphmath_vm4_mul | 8.65 M | 115.57 ns | ±25127.65% | 64 ns | 225 ns |
| graphmath_vec3_cross | 6.80 M | 147.05 ns | ±2449.46% | 114 ns | 420 ns |
| graphmath_vec3_rotate | 5.97 M | 167.44 ns | ±23624.68% | 95 ns | 335 ns |
| graphmath_mat4_add | 4.99 M | 200.23 ns | ±22949.23% | 89 ns | 325 ns |
| graphmath_mat4_mul | 2.73 M | 365.69 ns | ±13068.49% | 231 ns | 834 ns |

| Name | Memory usage |
| --- | --- |
| graphmath_vec3_dot | 16 B |
| graphmath_vec3_add | 80 B |
| graphmath_mat4_transp | 136 B |
| graphmath_vm4_mul | 80 B |
| graphmath_vec3_cross | 32 B |
| graphmath_vec3_rotate | 80 B |
| graphmath_mat4_add | 392 B |
| graphmath_mat4_mul | 392 B |

Type guards using is_number(), with a fallback using no guards:

| Name | ips | average | deviation | median | 99th % |
| --- | --- | --- | --- | --- | --- |
| graphmath_vec3_add | 12.51 M | 79.96 ns | ±5947.86% | 60 ns | 225 ns |
| graphmath_vec3_dot | 9.92 M | 100.79 ns | ±15170.89% | 80 ns | 246 ns |
| graphmath_mat4_transp | 9.41 M | 106.27 ns | ±28274.27% | 32 ns | 128 ns |
| graphmath_vec3_cross | 6.29 M | 158.88 ns | ±12414.54% | 115 ns | 420 ns |
| graphmath_vm4_mul | 4.27 M | 234.30 ns | ±1652.00% | 194 ns | 705 ns |
| graphmath_mat4_add | 4.24 M | 235.83 ns | ±2705.20% | 197 ns | 726 ns |
| graphmath_vec3_rotate | 2.54 M | 394.43 ns | ±7408.05% | 309 ns | 1125 ns |
| graphmath_mat4_mul | 0.80 M | 1251.89 ns | ±397.93% | 1107 ns | 3982 ns |

| Name | Memory usage |
| --- | --- |
| graphmath_vec3_add | 32 B |
| graphmath_vec3_dot | 0 B |
| graphmath_mat4_transp | 136 B |
| graphmath_vec3_cross | 32 B |
| graphmath_vm4_mul | 32 B |
| graphmath_mat4_add | 136 B |
| graphmath_vec3_rotate | 208 B |
| graphmath_mat4_mul | 136 B |

Sooooo yeah, the results are... confusing, and probably very noisy, though they seem repeatable enough. My half-assed conclusions so far:
- Type guards definitely make a difference, but not always the difference you expect
- is_float() is usually better than is_number()
- Having a fallback function with no type guards is not free, but it is very cheap
- I have no idea what's up with memory usage
That's wild, thank you.
I think this is enough for me to be comfortable switching to is_float, with an unguarded fallback for the weird cases.
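As a sketch of that approach (hypothetical module name `GuardSketch.Dot`, vectors assumed to be 3-tuples, not graphmath's actual implementation): the first clause is the float-only fast path, and the second clause has the same body with no guards, so integer or mixed input still works.

```elixir
defmodule GuardSketch.Dot do
  # Hypothetical module for illustration only; not part of graphmath.

  # Fast path: taken only when every component is a float.
  def dot({x, y, z}, {u, v, w})
      when is_float(x) and is_float(y) and is_float(z) and
             is_float(u) and is_float(v) and is_float(w) do
    x * u + y * v + z * w
  end

  # Fallback: same arithmetic, no guards. Reached when at least one
  # component is not a float (for example, an integer).
  def dot({x, y, z}, {u, v, w}) do
    x * u + y * v + z * w
  end
end
```

Here `GuardSketch.Dot.dot({1.0, 2.0, 3.0}, {4.0, 5.0, 6.0})` returns `32.0` via the guarded clause, and `GuardSketch.Dot.dot({1, 2, 3}, {4, 5, 6})` returns the integer `32` via the fallback. Clause order matters: the guarded clause has to come first, or it would never be reached.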
Yeah, that seems like the way to go. The only case where adding type guards mysteriously makes life slower is Vec3.add/Mat44.add, for some reason. I'd love to understand why, but in practical terms I don't care that much.
@icefoxen just a heads-up...haven't forgotten about your good work here, just been a little swamped (literally--hit by Hurricane Beryl).
@icefoxen sorry to be a bother--can you check if the Mat44 and Mat33 work in #45 help with the performance numbers you noticed?