[Spike] Run opcode benchmark, re-calculate opcode costs
Objective: To run opcode benchmarks and recalculate opcode costs on the Casper Network to ensure optimal performance and accurate gas costs.
To maintain the efficiency and fairness of gas consumption on the Casper Network, we need to run benchmarks for all supported opcodes and recalculate their associated costs. This process will ensure that the gas fees accurately reflect the computational resources used by each opcode, improving network performance and reducing the likelihood of under- or overcharging for operations.
Hi @mpapierski, as part of the backlog clean-up on the 13th of Sep, Ed advised me to assign this ticket to you.
Sidebar for tomorrow.
Results of the most basic benchmark (executing 1000 × 7000 nops), compared across configurations:
| Gas injector | Stack limiter | Time | Throughput | Comment |
|---|---|---|---|---|
| false | false | 71.655 µs | 90.024 Gelem/s | baseline, nops only |
| false | true | 81.540 µs | 85.848 Gelem/s | compared to baseline: time +8%, thrpt −7.9% |
| true | false | 224.49 µs | 31.181 Gelem/s | compared to baseline: time +148.02%, thrpt −59.680% |
| true | true | 212.79 µs | 32.896 Gelem/s | time +135.09%, thrpt −57.463%; change within noise compared to row above |
Observation: the stack height limiter changes the outcome much less in this benchmark because of the nature of the modification: it instruments function calls, not blocks of code, whereas the gas injector instruments blocks of code, which makes its overhead far more noticeable here.
I think the issue with the stack limiter is that it is hard to benchmark: its cost is amortized over the number of executed opcodes, but when running on real Wasm it becomes more noticeable due to the depth of calls.
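To make the amortization point concrete, here is a toy cost model (the unit costs and workload counts are entirely made-up illustrative numbers, not measurements): gas injection pays a cost per executed basic block, while the stack limiter pays a cost per function call, so a nop-heavy microbenchmark with almost no calls exaggerates the injector's overhead and hides the limiter's.

```python
# Toy instrumentation-overhead model (hypothetical numbers, for intuition only).
# Gas injection adds work per basic block executed; the stack height limiter
# adds work per function call entered.

def overhead(blocks_executed, calls_executed,
             per_block_cost=1.0, per_call_cost=1.0):
    """Total instrumentation overhead in arbitrary time units."""
    return blocks_executed * per_block_cost + calls_executed * per_call_cost

# Nop microbenchmark: millions of blocks, essentially one call,
# so the gas injector's per-block cost dominates completely.
nop_bench = overhead(blocks_executed=7_000_000, calls_executed=1)

# Real Wasm with deep call trees: far more calls relative to blocks,
# so the stack limiter's share of the total overhead grows.
deep_calls = overhead(blocks_executed=100_000, calls_executed=50_000)

print(nop_bench, deep_calls)
```

Under this model the limiter contributes a negligible fraction of `nop_bench` but a third of `deep_calls`, which matches the intuition that the nop benchmark understates its real-world cost.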
The code has been moved into a separate repo so it can be reproduced easily and tweaked further. It also includes calculated opcode costs for both the wasmi and wasmer VMs. The validation logic checks the set of CPU-intensive Wasms for run time and gas cost (3.3k CSPR + 16384 ms); the run times are below the target time, but they do not hit the target exactly as expected (i.e. 15s != 16.3s), etc. @AlexanderLimonov @fizyk20 we should meet and review the methodology together.
@mpapierski I should generally be available for meetings anytime during regular US Central working hours next week, with some exceptions
After meeting with @fizyk20
- Zug should handle slower validator nodes just fine
- We should calculate opcode costs for a 16 s budget. Consensus has an additional timeout for proposals of ~30 s.
- If the concern is that the benchmark is imperfect and the pessimistic validation Wasms run for different times (i.e. 10s, 15s, etc.), we should measure gas per second, also for happy paths (i.e. fib(1), fib(2), ..., fib(50)), and analyze the distribution. Measured and calculated opcodes should be fine as long as the median of the gas-per-second metric is stable.
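The stability check described in the last point can be sketched as follows. The gas and timing values below are placeholder numbers invented for illustration (not real measurements), and the 10% tolerance is an assumed threshold for "stable":

```python
import statistics

# Hypothetical (gas charged, wall-clock seconds) pairs for a mix of
# happy-path runs (fib(n)) and pessimistic validation Wasms.
# These values are made up for illustration only.
runs = [
    (3_300_000, 1.02),
    (6_500_000, 2.01),
    (13_000_000, 3.95),
    (26_000_000, 8.10),
]

gas_per_second = [gas / secs for gas, secs in runs]
median = statistics.median(gas_per_second)

# "Stable" here means every run's gas/s lies within 10% of the median;
# the real analysis would look at the full distribution.
stable = all(abs(g - median) / median < 0.10 for g in gas_per_second)

print(round(median), stable)
```

If `stable` holds across both happy paths and pessimistic workloads, the calculated opcode costs translate consistently between gas and wall-clock time.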
Closing this spike story as the objective is achieved. A follow-up user story #4951 has been created to proceed with the analysis and design based on the findings.