feat: optimize to_radix
When converting ToRadix, the items are outputted with field type. Knowing that, brillig_gen codegens a loop of casts of the items to the desired type. Noir expects ToRadix to output bytes, and ToBits to output bits. So it sometimes codegens casts to u8, and other times to u1. The problem is that for a u1 cast the transpiler transpiles it to another ToRadix call, so in the case of ToBits, we are doing N extra decompositions (due to the u1 casting), where N is the number of bits. For example, for a decomposition of a Field, we'd have 255 toradix calls executed
This PR follows this approach:
- We can change the ToRadix gadget/blackbox to emit u8 limbs instead of fields
- We modify the toradix blackbox in brillig with an output_bits flag, to emit u1 limbs
- No casting is needed in either case (u8 or u1) saving some emitted brillig opcodes
- The AVM transpiler, then ignores the output_bits flag, since it'll output u8s which is what the AVM expects for bits
Benchmark results
Metrics with a significant change:
- avm_simulation_time_ms (Token:mint_public): 45.6 (-34%)
- avm_simulation_time_ms (Token:assert_minter_and_mint): 65.9 (+52%)
- avm_simulation_time_ms (Token:transfer_public): 27.4 (+35%)
Detailed results
All benchmarks are run on txs on the Benchmarking contract on the repository. Each tx consists of a batch call to create_note and increment_balance, which guarantees that each tx has a private call, a nested private call, a public call, and a nested public call, as well as an emitted private note, an unencrypted log, and public storage read and write.
This benchmark source data is available in JSON format on S3 here.
Proof generation
Each column represents the number of threads used in proof generation.
| Metric | 1 threads | 4 threads | 16 threads | 32 threads | 64 threads |
|---|---|---|---|---|---|
| proof_construction_time_sha256_ms | 5,786 | 1,568 | 713 | 774 (+3%) | 781 |
| proof_construction_time_sha256_30_ms | 11,621 (+1%) | 3,097 (-1%) | 1,380 | 1,430 | 1,467 |
| proof_construction_time_sha256_100_ms | 44,259 | 11,863 | 5,429 | 5,552 (+2%) | 5,974 (+4%) |
| proof_construction_time_poseidon_hash_ms | 79.0 (+1%) | 34.0 | 34.0 | 58.0 (+2%) | 87.0 (-1%) |
| proof_construction_time_poseidon_hash_30_ms | 1,534 | 423 | 202 | 230 (+1%) | 274 (+1%) |
| proof_construction_time_poseidon_hash_100_ms | 5,675 | 1,525 (+1%) | 681 (+1%) | 742 | 744 (-1%) |
L2 block published to L1
Each column represents the number of txs on an L2 block published to L1.
| Metric | 4 txs | 8 txs | 16 txs |
|---|---|---|---|
| l1_rollup_calldata_size_in_bytes | 4,356 | 7,876 | 14,884 |
| l1_rollup_calldata_gas | 50,208 | 92,930 | 177,964 |
| l1_rollup_execution_gas | 1,396,459 | 2,130,077 | 3,915,167 |
| l2_block_processing_time_in_ms | 257 (+1%) | 450 (+2%) | 813 (+1%) |
| l2_block_building_time_in_ms | 11,337 (-1%) | 22,116 (-2%) | 44,056 (-1%) |
| l2_block_rollup_simulation_time_in_ms | 11,336 (-1%) | 22,116 (-2%) | 44,056 (-1%) |
| l2_block_public_tx_process_time_in_ms | 9,609 (-1%) | 20,324 (-1%) | 42,226 (-1%) |
L2 chain processing
Each column represents the number of blocks on the L2 chain where each block has 8 txs.
| Metric | 3 blocks | 5 blocks |
|---|---|---|
| node_history_sync_time_in_ms | 2,999 (+1%) | 3,886 (+2%) |
| node_database_size_in_bytes | 12,636,240 | 16,658,512 |
| pxe_database_size_in_bytes | 16,254 | 26,813 |
Circuits stats
Stats on running time and I/O sizes collected for every kernel circuit run across all benchmarks.
| Circuit | simulation_time_in_ms | witness_generation_time_in_ms | input_size_in_bytes | output_size_in_bytes | proving_time_in_ms |
|---|---|---|---|---|---|
| private-kernel-init | 90.5 | 377 | 21,755 | 44,860 | N/A |
| private-kernel-inner | 167 (-1%) | 690 (+3%) | 72,566 | 45,007 | N/A |
| private-kernel-reset-tiny | 307 | 686 (+1%) | 65,590 | 44,846 | N/A |
| private-kernel-tail | 163 (+1%) | 135 (+5%) | 50,643 | 52,257 | N/A |
| base-parity | 5.59 (+1%) | N/A | 160 | 96.0 | N/A |
| root-parity | 35.6 | N/A | 73,948 | 96.0 | N/A |
| base-rollup | 3,433 (-2%) | N/A | 189,136 | 664 | N/A |
| block-root-rollup | 41.3 | N/A | 58,205 | 2,448 | N/A |
| public-kernel-setup | 82.5 (-1%) | N/A | 105,085 | 71,222 | N/A |
| public-kernel-app-logic | 95.1 | N/A | 104,911 | 71,222 | N/A |
| public-kernel-tail | 853 (-2%) | N/A | 390,582 | 16,414 | N/A |
| private-kernel-reset-small | 304 | N/A | 66,341 | 45,629 | N/A |
| private-kernel-tail-to-public | 1,355 (-1%) | 613 (+5%) | 455,020 | 1,825 | N/A |
| public-kernel-teardown | 82.2 (+1%) | N/A | 105,349 | 71,222 | N/A |
| merge-rollup | 20.2 (-3%) | N/A | 38,174 | 664 | N/A |
| undefined | N/A | N/A | N/A | N/A | 83,676 (-1%) |
Stats on running time collected for app circuits
| Function | input_size_in_bytes | output_size_in_bytes | witness_generation_time_in_ms |
|---|---|---|---|
| ContractClassRegisterer:register | 1,344 | 11,731 | 347 (+2%) |
| ContractInstanceDeployer:deploy | 1,408 | 11,731 | 18.3 (+1%) |
| MultiCallEntrypoint:entrypoint | 1,920 | 11,731 | 406 (+1%) |
| FeeJuice:deploy | 1,376 | 11,731 | 398 (+4%) |
| SchnorrAccount:constructor | 1,312 | 11,731 | 73.1 (-1%) |
| SchnorrAccount:entrypoint | 2,304 | 11,731 | 397 |
| Token:privately_mint_private_note | 1,280 | 11,731 | 101 (-2%) |
| FPC:fee_entrypoint_public | 1,344 | 11,731 | 25.3 (-5%) |
| Token:transfer | 1,312 | 11,731 | 237 (+4%) |
| Benchmarking:create_note | 1,344 | 11,731 | 86.5 (-2%) |
| SchnorrAccount:verify_private_authwit | 1,280 | 11,731 | 27.3 (-13%) |
| Token:unshield | 1,376 | 11,731 | 528 |
| FPC:fee_entrypoint_private | 1,376 | 11,731 | 743 (-1%) |
AVM Simulation
Time to simulate various public functions in the AVM.
| Function | time_ms | bytecode_size_in_bytes |
|---|---|---|
| FeeJuice:_increase_public_balance | 56.8 (+4%) | 7,739 |
| FeeJuice:set_portal | 9.56 (-7%) | 2,354 |
| Token:constructor | 94.8 (+18%) | 26,051 |
| FPC:constructor | 55.5 (+2%) | 18,001 |
| FeeJuice:mint_public | 39.4 (+1%) | 5,877 |
| Token:mint_public | :warning: 45.6 (-34%) | 10,917 |
| Token:assert_minter_and_mint | :warning: 65.9 (+52%) | 7,512 |
| AuthRegistry:set_authorized | 46.3 | 4,391 |
| FPC:prepare_fee | 226 (-3%) | 7,043 |
| Token:transfer_public | :warning: 27.4 (+35%) | 39,302 |
| FPC:pay_refund | 57.5 | 9,986 (-2%) |
| Benchmarking:increment_balance | 1,190 (-2%) | 6,563 |
| Token:_increase_public_balance | 40.3 (-1%) | 8,433 |
| FPC:pay_refund_with_shielded_rebate | 60.8 (-8%) | 10,535 (-2%) |
Public DB Access
Time to access various public DBs.
| Function | time_ms |
|---|---|
| get-nullifier-index | 0.165 (-1%) |
Tree insertion stats
The duration to insert a fixed batch of leaves into each tree type.
| Metric | 1 leaves | 16 leaves | 64 leaves | 128 leaves | 256 leaves | 512 leaves | 1024 leaves |
|---|---|---|---|---|---|---|---|
| batch_insert_into_append_only_tree_16_depth_ms | 2.18 | 3.88 (+1%) | N/A | N/A | N/A | N/A | N/A |
| batch_insert_into_append_only_tree_16_depth_hash_count | 16.8 | 31.7 | N/A | N/A | N/A | N/A | N/A |
| batch_insert_into_append_only_tree_16_depth_hash_ms | 0.112 (-1%) | 0.109 (+1%) | N/A | N/A | N/A | N/A | N/A |
| batch_insert_into_append_only_tree_32_depth_ms | N/A | N/A | 11.5 (+2%) | 17.8 (-1%) | 31.0 (+1%) | 60.6 (+1%) | 114 (+1%) |
| batch_insert_into_append_only_tree_32_depth_hash_count | N/A | N/A | 95.9 | 159 | 287 | 543 | 1,055 |
| batch_insert_into_append_only_tree_32_depth_hash_ms | N/A | N/A | 0.110 (+2%) | 0.103 (-1%) | 0.101 (+1%) | 0.104 (+1%) | 0.102 (+1%) |
| batch_insert_into_indexed_tree_20_depth_ms | N/A | N/A | 14.3 (+1%) | 26.3 (+4%) | 43.7 (+1%) | 84.4 (+2%) | 164 (+2%) |
| batch_insert_into_indexed_tree_20_depth_hash_count | N/A | N/A | 109 | 207 | 355 | 691 | 1,363 |
| batch_insert_into_indexed_tree_20_depth_hash_ms | N/A | N/A | 0.108 (+1%) | 0.106 (+3%) | 0.106 (+1%) | 0.104 (+1%) | 0.104 (+3%) |
| batch_insert_into_indexed_tree_40_depth_ms | N/A | N/A | 16.8 (+1%) | N/A | N/A | N/A | N/A |
| batch_insert_into_indexed_tree_40_depth_hash_count | N/A | N/A | 132 | N/A | N/A | N/A | N/A |
| batch_insert_into_indexed_tree_40_depth_hash_ms | N/A | N/A | 0.107 (+1%) | N/A | N/A | N/A | N/A |
Miscellaneous
Transaction sizes based on how many contract classes are registered in the tx.
| Metric | 0 registered classes | 1 registered classes |
|---|---|---|
| tx_size_in_bytes | 64,779 | 668,997 |
Transaction size based on fee payment method
| Metric | | | - | |