Apply Pixman WASM SIMD Patch
The problem is that WASM SIMD support is still a bit experimental, and most browsers only got it in 2022. So imo it is risky to roll this out, as it will likely break on some devices.
There is this patch by LibreOffice: https://cgit.freedesktop.org/libreoffice/core/commit/?id=d5f5f0984510d6c1b453e31c1ad58fb29fed278b
And here are some benchmarks:
~~lol. nope. Better luck next time~~ The Node provided by emscripten is too old. System Node works >.>
RuntimeError: Aborted(CompileError: WebAssembly.instantiate():
Compiling function #41284:"_pixman_compute_composite_region32" failed:
invalid simd opcode @+7672059)
I marked the interesting ones with a <-. Top: No SIMD. Bottom: SIMD
Bitmap:
Unfortunately the normal Blit, which is the typical operation, does not become faster :/
------------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------------
BM_FindFormatSingle 6.22 ns 6.22 ns 111794665
BM_FindFormat 23.7 ns 23.7 ns 29602334
BM_ComputeImageOpacity 25185 ns 25185 ns 27740 <-
BM_ComputeImageOpacityChipset 62613 ns 62613 ns 11134 <-
BM_Create 12401 ns 12401 ns 56128
BM_Blit 43057 ns 43057 ns 16243
BM_BlitFast 23854 ns 23854 ns 29167
BM_TiledBlit 364 ns 364 ns 1922473
BM_TiledBlitOffset 6786 ns 6786 ns 102803
BM_StretchBlit 62384 ns 62384 ns 11198
BM_StretchBlitRect 62219 ns 62219 ns 11224
BM_FlipBlit 62320 ns 62320 ns 11207
BM_ZoomOpacityBlit 3820 ns 3820 ns 183017
BM_RotateZoomOpacityBlit 4078 ns 4078 ns 171257
BM_WaverBlit 155964 ns 155964 ns 4485 <-
BM_Fill 25066 ns 25066 ns 27915 <-
BM_FillRect 293335 ns 293335 ns 2382 <-
BM_Clear 12309 ns 12309 ns 56928
BM_ClearRect 25110 ns 25110 ns 27890 <-
BM_HueChangeBlit 751022 ns 751022 ns 935 <-
BM_ToneBlit 125215 ns 125215 ns 5590
BM_BlendBlit 431912 ns 431912 ns 1614 <-
BM_Flip 66692 ns 66692 ns 10486
BM_MaskedBlit 492941 ns 492941 ns 1420 <-
BM_MaskedColorBlit 388289 ns 388289 ns 1800 <-
BM_Blit2x 38527 ns 38527 ns 18163
BM_TransformRectangle 86.0 ns 86.0 ns 8136947
BM_EffectsBlit 150537 ns 150537 ns 4368 <-
------------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------------
BM_FindFormatSingle 6.16 ns 6.16 ns 113345317
BM_FindFormat 23.6 ns 23.6 ns 29618854
BM_ComputeImageOpacity 7474 ns 7474 ns 94677 <-
BM_ComputeImageOpacityChipset 35757 ns 35757 ns 19544 <-
BM_Create 12144 ns 12144 ns 57417
BM_Blit 43108 ns 43108 ns 16242
BM_BlitFast 23532 ns 23532 ns 29834
BM_TiledBlit 364 ns 364 ns 1920530
BM_TiledBlitOffset 6776 ns 6776 ns 103035
BM_StretchBlit 62228 ns 62228 ns 11196
BM_StretchBlitRect 62181 ns 62181 ns 11252
BM_FlipBlit 62331 ns 62331 ns 11226
BM_ZoomOpacityBlit 3804 ns 3804 ns 183853
BM_RotateZoomOpacityBlit 4085 ns 4085 ns 171303
BM_WaverBlit 144637 ns 144637 ns 4838 <-
BM_Fill 9979 ns 9979 ns 70038 <-
BM_FillRect 120635 ns 120635 ns 5785 <-
BM_Clear 12000 ns 12000 ns 58375
BM_ClearRect 9965 ns 9965 ns 69936 <-
BM_HueChangeBlit 426308 ns 426308 ns 1665 <-
BM_ToneBlit 124925 ns 124926 ns 5599
BM_BlendBlit 188520 ns 188520 ns 3716 <-
BM_Flip 66193 ns 66193 ns 10576
BM_MaskedBlit 181776 ns 181776 ns 3975 <-
BM_MaskedColorBlit 144807 ns 144807 ns 4824 <-
BM_Blit2x 38458 ns 38458 ns 18189
BM_TransformRectangle 86.1 ns 86.1 ns 8135229
BM_EffectsBlit 144725 ns 144725 ns 4600 <-
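To make the top/bottom comparison easier to read, the two runs can be reduced to one speedup factor per benchmark. The following is only a minimal sketch: the `parse` and `speedups` helpers are hypothetical names, and it assumes the rows follow the Google Benchmark console layout shown above (`name`, time in ns, CPU in ns, iterations). Only a handful of rows are copied in for illustration.

```python
import re

# Matches one result row of the console output, e.g.
# "BM_Fill 25066 ns 25066 ns 27915"; a trailing " <-" marker is ignored.
ROW = re.compile(r"^(BM_\w+)\s+([\d.]+) ns\s+([\d.]+) ns\s+(\d+)")


def parse(report: str) -> dict[str, float]:
    """Return {benchmark name: wall time in ns} for every result row."""
    times = {}
    for line in report.splitlines():
        match = ROW.match(line.strip())
        if match:
            times[match.group(1)] = float(match.group(2))
    return times


def speedups(before: str, after: str) -> dict[str, float]:
    """Speedup factor (before / after) for benchmarks present in both runs."""
    old, new = parse(before), parse(after)
    return {name: old[name] / new[name] for name in old if name in new}


# A few rows copied from the two bitmap runs above.
NO_SIMD = """
BM_ComputeImageOpacity 25185 ns 25185 ns 27740 <-
BM_Blit 43057 ns 43057 ns 16243
BM_Fill 25066 ns 25066 ns 27915 <-
BM_BlendBlit 431912 ns 431912 ns 1614 <-
"""
SIMD = """
BM_ComputeImageOpacity 7474 ns 7474 ns 94677 <-
BM_Blit 43108 ns 43108 ns 16242
BM_Fill 9979 ns 9979 ns 70038 <-
BM_BlendBlit 188520 ns 188520 ns 3716 <-
"""

if __name__ == "__main__":
    for name, factor in speedups(NO_SIMD, SIMD).items():
        print(f"{name:<28} {factor:5.2f}x")
```

For these rows, BM_Fill comes out at roughly 2.5x while BM_Blit stays at about 1.0x, which matches the observation that the plain blit does not benefit.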
Draw:
(Crashes with out of memory in both builds)
Font:
----------------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------------
BM_FontSizeStr 3000 ns 3000 ns 232883
BM_FontSizeChar 62.3 ns 62.3 ns 11231302
BM_vRender 195 ns 195 ns 3584303
BM_Render 8427 ns 8427 ns 79988 <-
----------------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------------
BM_FontSizeStr 3006 ns 3006 ns 232460
BM_FontSizeChar 62.3 ns 62.3 ns 11227807
BM_vRender 188 ns 188 ns 3728400
BM_Render 7214 ns 7214 ns 96501 <-
Pixel Format:
Interestingly, ARGB and ABGR become about twice as fast, but we already figured out years ago that these formats are slow and use RGBA and BGRA by default.
--------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------
BM_BlitBGRA_a 43159 ns 43159 ns 16222
BM_BlitRGBA_a 43076 ns 43076 ns 16249
BM_BlitABGR_a 208881 ns 208881 ns 3366 <-
BM_BlitARGB_a 410677 ns 410677 ns 1706 <-
BM_BlitBGRA_n 23190 ns 23190 ns 30179
BM_BlitRGBA_n 23203 ns 23203 ns 30199
BM_BlitABGR_n 157621 ns 157622 ns 4454
BM_BlitARGB_n 23233 ns 23233 ns 29990
--------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------
BM_BlitBGRA_a 43074 ns 43074 ns 16249
BM_BlitRGBA_a 43125 ns 43125 ns 16234
BM_BlitABGR_a 130252 ns 130252 ns 5308 <-
BM_BlitARGB_a 148160 ns 148160 ns 4688 <-
BM_BlitBGRA_n 23426 ns 23426 ns 29832
BM_BlitRGBA_n 23398 ns 23398 ns 29877
BM_BlitABGR_n 60078 ns 60078 ns 11441
BM_BlitARGB_n 23456 ns 23456 ns 29836
Text:
------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------
BM_TextDrawStrSystem 334333 ns 334334 ns 2085 <-
BM_TextDrawStrColor 146541 ns 146541 ns 4754 <-
BM_TextDrawCharSystem 8454 ns 8454 ns 83365 <-
BM_TextDrawCharSystemEx 14.5 ns 14.5 ns 48445341
BM_TextDrawCharColor 3506 ns 3506 ns 200249 <-
BM_TextDrawCharColorEx 8.70 ns 8.70 ns 80582626
------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------
BM_TextDrawStrSystem 285243 ns 285243 ns 2423 <-
BM_TextDrawStrColor 128538 ns 128538 ns 5430 <-
BM_TextDrawCharSystem 7254 ns 7254 ns 96237 <-
BM_TextDrawCharSystemEx 14.5 ns 14.5 ns 48438472
BM_TextDrawCharColor 3189 ns 3189 ns 226936 <-
BM_TextDrawCharColorEx 8.69 ns 8.69 ns 80346327
About the state of SIMD: This patch has been rolled out at ynoproject for a while, and for months they have been getting complaints that the Player fails to start.
Though the issue is not SIMD support in the browser, but old CPUs from 10 years ago that lack certain SIMD instructions. :sweat_smile:
Apologies if this is off-topic, but recently I've been hearing reports of iOS players not being able to run the SIMD-less version of ynoproject's Player, due to the presence of SIMD code. Is it something fixable in the foreseeable future, or does the team need more feedback from players? Thanks!
We only accept bug reports when they are reproducible in the official web player: https://easyrpg.org/play/master
I observed that pixman actually doesn't have a lot of fast paths for SSSE3, but rather for SSE2, so I opted to patch for SSE2 instead. BM_Blit and some other benchmarks saw great improvements, at the cost of others. It is also noteworthy that enabling SSSE3 with SSE2 as a fallback seems to have a negative effect, perhaps due to branching code.
Here are some very unscientific measurements:
bench/bitmap.cpp (baseline)
------------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------------
BM_FindFormatSingle 3.49 ns 3.49 ns 200809868
BM_FindFormat 13.0 ns 13.0 ns 53818665
BM_ComputeImageOpacity 4294 ns 4294 ns 162646
BM_ComputeImageOpacityChipset 23996 ns 23996 ns 29018
BM_Create 6081 ns 6081 ns 117449
BM_Blit 34395 ns 34395 ns 20275
BM_BlitFast 5765 ns 5765 ns 118172
BM_TiledBlit 214 ns 214 ns 3201599
BM_TiledBlitOffset 3575 ns 3575 ns 195864
BM_StretchBlit 42705 ns 42705 ns 16307
BM_StretchBlitRect 43079 ns 43079 ns 16319
BM_FlipBlit 42844 ns 42844 ns 15974
BM_ZoomOpacityBlit 2241 ns 2241 ns 311206
BM_RotateZoomOpacityBlit 2422 ns 2422 ns 294152
BM_WaverBlit 88215 ns 88215 ns 7954
BM_Fill 8565 ns 8565 ns 81945
BM_FillRect 171672 ns 171672 ns 4047
BM_Clear 5845 ns 5845 ns 117351
BM_ClearRect 8613 ns 8613 ns 81666
BM_HueChangeBlit 474084 ns 474084 ns 1477
BM_ToneBlit 85586 ns 85586 ns 8126
BM_BlendBlit 277076 ns 277077 ns 2548
BM_Flip 101741 ns 101741 ns 6750
BM_MaskedBlit 295763 ns 295764 ns 2358
BM_MaskedColorBlit 239980 ns 239980 ns 2894
BM_Blit2x 26050 ns 26051 ns 26904
BM_TransformRectangle 44.9 ns 44.9 ns 15608992
BM_EffectsBlit 89569 ns 89569 ns 7856
bench/bitmap.cpp (SSE2)
------------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------------
BM_FindFormatSingle 3.60 ns 3.60 ns 194575457
BM_FindFormat 13.3 ns 13.3 ns 52517737
BM_ComputeImageOpacity 4500 ns 4500 ns 156938
BM_ComputeImageOpacityChipset 24523 ns 24523 ns 28682
BM_Create 6130 ns 6130 ns 118493
BM_Blit 11356 ns 11356 ns 62753 (faster)
BM_BlitFast 5917 ns 5917 ns 117949
BM_TiledBlit 168 ns 168 ns 4167409
BM_TiledBlitOffset 2182 ns 2182 ns 314725 (faster)
BM_StretchBlit 61337 ns 61337 ns 11449 (slower)
BM_StretchBlitRect 61469 ns 61469 ns 11535 (slower)
BM_FlipBlit 61309 ns 61309 ns 11476 (slower)
BM_ZoomOpacityBlit 3222 ns 3222 ns 217607 (slower)
BM_RotateZoomOpacityBlit 3355 ns 3355 ns 208259 (slower)
BM_WaverBlit 107832 ns 107832 ns 6333 (slower)
BM_Fill 4205 ns 4205 ns 166465 (faster)
BM_FillRect 46137 ns 46137 ns 15269 (faster)
BM_Clear 6035 ns 6035 ns 116866
BM_ClearRect 4194 ns 4194 ns 166409 (faster)
BM_HueChangeBlit 207792 ns 207792 ns 3342 (faster)
BM_ToneBlit 62345 ns 62346 ns 11146 (faster)
BM_BlendBlit 67146 ns 67146 ns 10455 (faster)
BM_Flip 103784 ns 103784 ns 6603
BM_MaskedBlit 71068 ns 71068 ns 9418 (faster)
BM_MaskedColorBlit 59258 ns 59258 ns 12253 (faster)
BM_Blit2x 28626 ns 28626 ns 24228
BM_TransformRectangle 49.1 ns 49.1 ns 14092031
BM_EffectsBlit 116290 ns 116290 ns 6097 (slower)
bench/font.cpp (baseline)
----------------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------------
BM_FontSizeStr 1772 ns 1772 ns 402582
BM_FontSizeChar 40.6 ns 40.6 ns 17183091
BM_vRender 97.4 ns 97.4 ns 7058369
BM_Render 3464 ns 3464 ns 198550
bench/font.cpp (SSE2)
----------------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------------
BM_FontSizeStr 1745 ns 1745 ns 405199
BM_FontSizeChar 39.0 ns 39.0 ns 17993729
BM_vRender 95.3 ns 95.3 ns 7136918
BM_Render 2676 ns 2676 ns 261644 (faster)
bench/pixel_format.cpp (baseline)
--------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------
BM_BlitBGRA_a 36895 ns 36895 ns 18855
BM_BlitRGBA_a 37173 ns 37173 ns 18761
BM_BlitABGR_a 141954 ns 141954 ns 4850
BM_BlitARGB_a 266456 ns 266456 ns 2640
BM_BlitBGRA_n 6639 ns 6639 ns 109780
BM_BlitRGBA_n 6704 ns 6704 ns 101545
BM_BlitABGR_n 66521 ns 66521 ns 10104
BM_BlitARGB_n 6722 ns 6722 ns 102618
bench/pixel_format.cpp (SSE2)
--------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------
BM_BlitBGRA_a 11941 ns 11941 ns 59098 (faster)
BM_BlitRGBA_a 12101 ns 12101 ns 58016 (faster)
BM_BlitABGR_a 46923 ns 46923 ns 15021 (faster)
BM_BlitARGB_a 53954 ns 53954 ns 12877 (faster)
BM_BlitBGRA_n 7065 ns 7065 ns 97536 (slower)
BM_BlitRGBA_n 7016 ns 7016 ns 95469 (slower)
BM_BlitABGR_n 27180 ns 27180 ns 25899 (faster)
BM_BlitARGB_n 6612 ns 6612 ns 102371
bench/text.cpp (baseline)
------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------
BM_TextDrawStrSystem 139300 ns 139300 ns 4959
BM_TextDrawStrColor 62378 ns 62378 ns 10875
BM_TextDrawCharSystem 3538 ns 3538 ns 196086
BM_TextDrawCharSystemEx 7.31 ns 7.31 ns 94806000
BM_TextDrawCharColor 1564 ns 1564 ns 446296
BM_TextDrawCharColorEx 5.56 ns 5.56 ns 125895293
bench/text.cpp (SSE2)
------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------
BM_TextDrawStrSystem 120893 ns 120893 ns 5727 (faster)
BM_TextDrawStrColor 57627 ns 57627 ns 11817 (faster)
BM_TextDrawCharSystem 3004 ns 3004 ns 234003 (faster)
BM_TextDrawCharSystemEx 7.37 ns 7.37 ns 93764693
BM_TextDrawCharColor 1427 ns 1427 ns 484388
BM_TextDrawCharColorEx 5.59 ns 5.59 ns 117193273
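The (faster)/(slower) labels can also be derived mechanically, which helps separate real changes from run-to-run noise. This is a rough sketch, not how the labels above were produced: the `label` helper and the 5% noise tolerance are assumptions chosen for illustration, and only a few (baseline, SSE2) pairs from bench/bitmap.cpp are copied in.

```python
def label(baseline_ns: float, candidate_ns: float, noise: float = 0.05) -> str:
    """Classify a timing change; differences within `noise` stay unlabeled.

    `noise` is an assumed 5% tolerance, not a value taken from the runs above.
    """
    change = (candidate_ns - baseline_ns) / baseline_ns
    if change <= -noise:
        return "faster"
    if change >= noise:
        return "slower"
    return ""


# (baseline ns, SSE2 ns) pairs copied from the bench/bitmap.cpp runs above.
RESULTS = {
    "BM_Blit": (34395, 11356),
    "BM_Clear": (5845, 6035),
    "BM_StretchBlit": (42705, 61337),
    "BM_Flip": (101741, 103784),
}

if __name__ == "__main__":
    for name, (base, sse2) in RESULTS.items():
        change = (sse2 - base) / base * 100
        print(f"{name:<16} {change:+6.1f}%  {label(base, sse2)}")
```

With this tolerance, BM_Blit is labeled faster and BM_StretchBlit slower, while BM_Clear and BM_Flip fall inside the noise band and get no label.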
Cool. The normal Blit operation is the most common one, so a speedup there helps a lot :).