Slow for.zs
Description
The 'for.zs' performance test runs 1000000 iterations that exercise .push and addition, and it is much slower in ZetScript than in other languages. For example, one run takes 0.07s in Lua, whereas in ZetScript it takes ~64s.
Lua is 64/0.07 = x914 times faster than ZetScript
Wren takes 0.18s
Wren is 64/0.18 = x355 times faster than ZetScript
Profiling shows that the slow parts of the execution are push and load:
elapsed push: 56.667000s
elapsed load: 61.035000s
Test code
Slow code 1 (60s)
var list=[];
for (var i=0; i < 1000000; i++) {
    list.push(i)
}
Byte code
[0000| 1|01] NEW_ARRAY
[0001| 1|02] PUSH_STK_LOCAL list
[0002|-1|00] STORE n:1 [RST]
[0003| 0|00] PUSH_SCOPE
[0004| 1|01] LOAD_INT 0
[0005| 1|02] PUSH_STK_LOCAL i
[0006|-1|00] STORE n:1 [RST]
[0007| 1|01] LT Local['i'],1000000
[0008|-1|00] JNT 019 (ins+11)
[0009| 0|00] PUSH_SCOPE
[0010| 1|01] LOAD_LOCAL list
[0011| 0|01] LOAD_OBJ@ITEM push
[0012| 1|02] PUSH_STK_LOCAL i
[0013|-1|00] MEMBER_CALL arg:1 ret:1 [RST]
[0014| 0|00] POP_SCOPE
[0015| 1|01] PUSH_STK_LOCAL i
[0016| 0|00] POST_INC [RST]
[0017| 0|00] JMP 007 (ins-10)
[0018| 0|00] POP_SCOPE
[0019| 0|00] POP_SCOPE
Slow code 2 (60s)
var list=[];
var sum = 0
for (var i in list) {
    sum = sum + i
}
Byte code
[0000| 1|01] NEW_ARRAY
[0001| 1|02] PUSH_STK_LOCAL list
[0002|-1|00] STORE n:1 [RST]
[0003| 1|01] LOAD_INT 0
[0004| 1|02] PUSH_STK_LOCAL sum
[0005|-1|00] STORE n:1 [RST]
[0006| 0|00] PUSH_SCOPE
[0007| 1|01] LOAD_LOCAL list
[0008| 1|02] PUSH_STK_LOCAL @_iter_0
[0009| 0|00] IT_INIT [RST]
[0010| 1|01] LOAD_LOCAL @_iter_0
[0011| 0|01] LOAD_OBJ@ITEM _end
[0012| 0|01] MEMBER_CALL arg:0 ret:1
[0013|-1|00] JT 029 (ins+16)
[0014| 0|00] PUSH_SCOPE
[0015| 1|01] LOAD_LOCAL @_iter_0
[0016| 0|01] LOAD_OBJ@ITEM _get
[0017| 0|01] MEMBER_CALL arg:0 ret:1
[0018| 1|02] PUSH_STK_LOCAL i
[0019|-1|00] STORE n:1 [RST]
[0020| 1|01] ADD Local['sum'],Local['i']
[0021| 1|02] PUSH_STK_LOCAL sum
[0022|-1|00] STORE n:1 [RST]
[0023| 0|00] POP_SCOPE
[0024| 1|01] LOAD_LOCAL @_iter_0
[0025| 0|01] LOAD_OBJ@ITEM _next
[0026|-1|00] MEMBER_CALL arg:0 ret:0 [RST]
[0027| 0|00] JMP 010 (ins-17)
[0028| 0|00] POP_SCOPE
[0029| 0|00] POP_SCOPE
Fast code (77ms)
var sum=0;
for (var i=0; i < 1000000; i++) {
    sum=sum+i
}
Byte code
[0000| 1|01] LOAD_INT 0
[0001| 1|02] PUSH_STK_LOCAL sum
[0002|-1|00] STORE n:1 [RST]
[0003| 0|00] PUSH_SCOPE
[0004| 1|01] LOAD_INT 0
[0005| 1|02] PUSH_STK_LOCAL i
[0006|-1|00] STORE n:1 [RST]
[0007| 1|01] LT Local['i'],1000000
[0008|-1|00] JNT 018 (ins+10)
[0009| 0|00] PUSH_SCOPE
[0010| 1|01] ADD Local['sum'],Local['i']
[0011| 1|02] PUSH_STK_LOCAL sum
[0012|-1|00] STORE n:1 [RST]
[0013| 0|00] POP_SCOPE
[0014| 1|01] PUSH_STK_LOCAL i
[0015| 0|00] POST_INC [RST]
[0016| 0|00] JMP 007 (ins-9)
[0017| 0|00] POP_SCOPE
[0018| 0|00] POP_SCOPE
list.push has improved x20 by doubling the array capacity whenever it is reached. There are still performance issues:
- vm_find_native_function (2.69% overhead of calls): it searches for the C++ function every time before executing the call.
- vm_load_field (34%): it performs expensive operations like 'getSymbolMemberFunction' and 'ZS_NEW_OBJECT_MEMBER_FUNCTION'.
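To illustrate why doubling the capacity helps list.push so much, here is a minimal C++ sketch that counts buffer reallocations under two growth policies. It is not ZetScript's actual container: `IntArray`, `array_push`, `count_reallocations`, the initial capacity of 16 and the fixed step of 16 are all assumptions made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical growable int array; not ZetScript's real array object.
struct IntArray {
    int *data = nullptr;
    std::size_t size = 0;
    std::size_t capacity = 0;
    std::size_t reallocations = 0; // how many times the buffer was resized
};

// Appends a value. When the buffer is full it grows either by doubling
// (doubling == true) or by a fixed step of 16 elements (doubling == false).
void array_push(IntArray &a, int value, bool doubling) {
    if (a.size == a.capacity) {
        std::size_t new_capacity;
        if (doubling) {
            new_capacity = (a.capacity == 0) ? 16 : a.capacity * 2;
        } else {
            new_capacity = a.capacity + 16;
        }
        int *grown = static_cast<int *>(
            std::realloc(a.data, new_capacity * sizeof(int)));
        assert(grown != nullptr); // the sketch does not recover from OOM
        a.data = grown;
        a.capacity = new_capacity;
        a.reallocations++;
    }
    a.data[a.size++] = value;
}

// Pushes n values and returns the number of reallocations that were needed.
std::size_t count_reallocations(std::size_t n, bool doubling) {
    IntArray a;
    for (std::size_t i = 0; i < n; i++) {
        array_push(a, static_cast<int>(i), doubling);
    }
    std::size_t result = a.reallocations;
    std::free(a.data);
    return result;
}
```

With these assumed parameters, `count_reallocations(1000000, true)` needs 17 reallocations, while `count_reallocations(1000000, false)` needs 62500, and each reallocation may copy the whole buffer.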
Testing shows that skipping vm_find_native_function speeds execution up x2.69. So with this improvement,
Lua is 0.9/0.04 = x22 times faster than ZetScript
The zs_string constructor has been modified for fast creation. Furthermore, "vm_load_field" spent a lot of CPU time searching for the member symbol. Because the instruction's value_op2 is not used, it now stores the last symbol found by that instruction. Overall, performance has increased by x4.
So ZetScript now takes 0.6 seconds. Overall it has improved by,
64s/0.6s = x106 times faster
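The idea of saving the last symbol found in the instruction itself can be sketched in C++ as follows. This is only an illustration under assumptions, not the real VM code: `Symbol`, `ScriptType`, `Instruction`, `load_field` and `run_loop` are hypothetical names, and the extra `cached_type` guard is an addition of this sketch (the change described above only mentions reusing value_op2).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for VM structures; names and layout are assumptions.
struct Symbol {
    int value;
};

struct ScriptType {
    std::unordered_map<std::string, Symbol> members;
    std::size_t lookups = 0; // counts slow-path searches by name

    // Slow path: search the member symbol by name.
    Symbol *find_member(const std::string &name) {
        lookups++;
        auto it = members.find(name);
        return it == members.end() ? nullptr : &it->second;
    }
};

struct Instruction {
    std::string member_name;
    // Spare operand slots reused as a cache of the last lookup result.
    const ScriptType *cached_type = nullptr;
    Symbol *cached_symbol = nullptr;
};

// Loads a member symbol, searching by name only when the cache misses.
Symbol *load_field(Instruction &ins, ScriptType &type) {
    if (ins.cached_type == &type && ins.cached_symbol != nullptr) {
        return ins.cached_symbol; // fast path: reuse the last symbol found
    }
    Symbol *found = type.find_member(ins.member_name);
    ins.cached_type = &type;
    ins.cached_symbol = found;
    return found;
}

// Executes the same load instruction repeatedly, as a loop body would,
// and returns how many slow-path searches were performed.
std::size_t run_loop(std::size_t iterations) {
    ScriptType array_type;
    array_type.members["push"] = Symbol{1};
    Instruction ins;
    ins.member_name = "push";
    for (std::size_t i = 0; i < iterations; i++) {
        Symbol *s = load_field(ins, array_type);
        assert(s != nullptr && s->value == 1);
    }
    return array_type.lookups;
}
```

In this sketch `run_loop(1000000)` performs a single search by name; every later iteration takes the fast path. A real VM would also have to invalidate the cache if the members of the type can change at run time.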
And now,
Lua
-----
Lua is 0.6s/0.07s = ~x8.6 times faster than ZetScript
Wren
-------
Wren is 0.6s/0.17s = x3.5 times faster than ZetScript