Slow for.zs

Open jespa007 opened this issue 2 years ago • 3 comments

Description

'for.zs' is a performance test of 1000000 iterations that exercises .push and add. It is significantly slower in ZetScript than in other languages. For example, one run takes 0.07s in Lua, whereas in ZetScript it takes ~64s.

Lua is 64s/0.07s = ~x914 faster than ZetScript.

Wren takes 0.18s.

Wren is 64s/0.18s = ~x355 faster than ZetScript.

It has been found that the slow parts of the execution are push and load:

elapsed push: 56.667000s
elapsed load: 61.035000s

Test code

Slow code 1 (60s)

var list=[];
for (var i=0; i < 1000000; i++) {
  list.push(i)
}

Byte code

[0000| 1|01]    NEW_ARRAY
[0001| 1|02]    PUSH_STK_LOCAL          list
[0002|-1|00]    STORE                   n:1 [RST]
[0003| 0|00]    PUSH_SCOPE
[0004| 1|01]    LOAD_INT                0
[0005| 1|02]    PUSH_STK_LOCAL          i
[0006|-1|00]    STORE                   n:1 [RST]
[0007| 1|01]    LT                      Local['i'],1000000
[0008|-1|00]    JNT                     019 (ins+11) 
[0009| 0|00]    PUSH_SCOPE
[0010| 1|01]    LOAD_LOCAL              list
[0011| 0|01]    LOAD_OBJ@ITEM           push 
[0012| 1|02]    PUSH_STK_LOCAL          i
[0013|-1|00]    MEMBER_CALL             arg:1 ret:1 [RST]
[0014| 0|00]    POP_SCOPE
[0015| 1|01]    PUSH_STK_LOCAL          i
[0016| 0|00]    POST_INC                [RST]
[0017| 0|00]    JMP                     007 (ins-10) 
[0018| 0|00]    POP_SCOPE
[0019| 0|00]    POP_SCOPE

Slow code 2 (60s)

var list=[];
var sum = 0
for (var i in list) {
        sum = sum + i
}

Byte code

[0000| 1|01]    NEW_ARRAY
[0001| 1|02]    PUSH_STK_LOCAL          list
[0002|-1|00]    STORE                   n:1 [RST]
[0003| 1|01]    LOAD_INT                0
[0004| 1|02]    PUSH_STK_LOCAL          sum
[0005|-1|00]    STORE                   n:1 [RST]
[0006| 0|00]    PUSH_SCOPE
[0007| 1|01]    LOAD_LOCAL              list
[0008| 1|02]    PUSH_STK_LOCAL          @_iter_0
[0009| 0|00]    IT_INIT                 [RST]
[0010| 1|01]    LOAD_LOCAL              @_iter_0
[0011| 0|01]    LOAD_OBJ@ITEM           _end 
[0012| 0|01]    MEMBER_CALL             arg:0 ret:1 
[0013|-1|00]    JT                      029 (ins+16) 
[0014| 0|00]    PUSH_SCOPE
[0015| 1|01]    LOAD_LOCAL              @_iter_0
[0016| 0|01]    LOAD_OBJ@ITEM           _get 
[0017| 0|01]    MEMBER_CALL             arg:0 ret:1 
[0018| 1|02]    PUSH_STK_LOCAL          i
[0019|-1|00]    STORE                   n:1 [RST]
[0020| 1|01]    ADD                     Local['sum'],Local['i']
[0021| 1|02]    PUSH_STK_LOCAL          sum
[0022|-1|00]    STORE                   n:1 [RST]
[0023| 0|00]    POP_SCOPE
[0024| 1|01]    LOAD_LOCAL              @_iter_0
[0025| 0|01]    LOAD_OBJ@ITEM           _next 
[0026|-1|00]    MEMBER_CALL             arg:0 ret:0 [RST]
[0027| 0|00]    JMP                     010 (ins-17) 
[0028| 0|00]    POP_SCOPE
[0029| 0|00]    POP_SCOPE

Fast code (77ms)

var sum=0;
for (var i=0; i < 1000000; i++) {
  sum=sum+i
}

Byte code

[0000| 1|01]    LOAD_INT                0
[0001| 1|02]    PUSH_STK_LOCAL          sum
[0002|-1|00]    STORE                   n:1 [RST]
[0003| 0|00]    PUSH_SCOPE
[0004| 1|01]    LOAD_INT                0
[0005| 1|02]    PUSH_STK_LOCAL          i
[0006|-1|00]    STORE                   n:1 [RST]
[0007| 1|01]    LT                      Local['i'],1000000
[0008|-1|00]    JNT                     018 (ins+10) 
[0009| 0|00]    PUSH_SCOPE
[0010| 1|01]    ADD                     Local['sum'],Local['i']
[0011| 1|02]    PUSH_STK_LOCAL          sum
[0012|-1|00]    STORE                   n:1 [RST]
[0013| 0|00]    POP_SCOPE
[0014| 1|01]    PUSH_STK_LOCAL          i
[0015| 0|00]    POST_INC                [RST]
[0016| 0|00]    JMP                     007 (ins-9) 
[0017| 0|00]    POP_SCOPE
[0018| 0|00]    POP_SCOPE

jespa007 • Sep 10 '23 09:09

list.push has been improved by x20 by doubling the array size when it reaches capacity. There are still performance issues:

  1. vm_find_native_function (2.69% overhead of calls): it searches for the C++ function every time before executing the call.
  2. vm_load_field (34%): it does expensive operations like 'getSymbolMemberFunction' and 'ZS_NEW_OBJECT_MEMBER_FUNCTION'.

jespa007 • Sep 11 '23 10:09

Tests show that if we don't use vm_find_native_function, execution speeds up by x2.69. So with this improvement,

Lua is 0.9/0.04 = ~x22 faster than ZetScript.

jespa007 • Sep 15 '23 13:09

The zs_string constructor has been modified for faster creation. Furthermore, "vm_load_field" had a CPU overhead when searching for the member symbol. Because the instruction's value_op2 is not used, it now stores the last symbol found for that instruction. Overall, performance has increased by x4.

So in terms of metrics, ZetScript now takes 0.6 seconds. Overall, it has improved by

64s/0.6s = ~x106

And now,


Lua
-----

Lua is 0.6s/0.07s = ~x8 faster than ZetScript.

Wren
-------

Wren is 0.6s/0.17s = ~x3.5 faster than ZetScript.

jespa007 • Sep 26 '23 11:09