ARROW-4709: [C++] Optimize for ordered JSON fields
Addresses ARROW-4709.
Replaces RawArrayBuilder<Kind::kObject>'s vector of field builders with a vector of structs containing the builder and name as a string_view. This makes field name indexing random access, enabling checks for ordered fields before hitting the hash map.
Added benefits:
- Removes the expensive unordered_map -> vector transform in RawArrayBuilder::Finish
- Key names passed to HandlerBase::SetFieldBuilder don't need to be converted to std::string unless a map lookup is necessary
The prediction logic isn't particularly advanced... more could be done to detect worst-case input and skip to the hash map for subsequent keys. Might be worth it, but I'd need to add benchmarks.
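For readers skimming the thread, the idea above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `FieldIndexer`, `GetOrAdd`, `StartRow` and `fast_path_hits` are hypothetical names, a plain index stands in for the field builder, and names are stored as owned `std::string` rather than `string_view` to keep the example self-contained.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the per-object field table (not Arrow's class).
class FieldIndexer {
 public:
  // Call at the start of each JSON object so prediction restarts at field 0.
  void StartRow() { next_ = 0; }

  // Returns the index of `name`, appending a new field if it is unseen.
  std::size_t GetOrAdd(std::string_view name) {
    // Fast path: the key is the one we predicted, so there is no hashing
    // and no std::string conversion.
    if (next_ < names_.size() && names_[next_] == name) {
      ++fast_path_hits_;
      return next_++;
    }
    // Slow path: fall back to the hash map, which needs a std::string key.
    auto it = index_.find(std::string(name));
    if (it == index_.end()) {
      names_.emplace_back(name);
      it = index_.emplace(names_.back(), names_.size() - 1).first;
    }
    // Predict that the following key is the field right after this one.
    next_ = it->second + 1;
    return it->second;
  }

  std::size_t num_fields() const { return names_.size(); }
  std::size_t fast_path_hits() const { return fast_path_hits_; }

 private:
  std::vector<std::string> names_;  // random access by field index
  std::unordered_map<std::string, std::size_t> index_;
  std::size_t next_ = 0;            // predicted index of the next key
  std::size_t fast_path_hits_ = 0;  // only here to make the sketch observable
};
```

With this shape, rows whose keys arrive in the same order as earlier rows never touch the map, while out-of-order or sparse rows pay one failed comparison before the lookup.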
https://issues.apache.org/jira/browse/ARROW-4709
Note there are a couple basic benchmarks currently in src/arrow/json/parser_benchmark.cc. You can build them by passing -DARROW_BUILD_BENCHMARKS=ON to CMake (be sure to build in release mode for optimizations). It will create an executable arrow-json-parser-benchmark in the build directory.
(but you may want to write more benchmarks that vary the fields from line to line, for example)
Thanks @benibus. I think it would be nice to first submit the benchmark changes as a separate PR. That way, we'll be able to better compare the performance changes after the optimization lands.
(also, you should rebase/merge from latest git master)
For the record, I think the benchmarks should have another parameter: the proportion of "present" fields. Currently it's implicitly 1.0, i.e. all fields are present in every row; but we should benchmark for a couple different values, e.g. 1.0, 0.9 and 0.1 (i.e. some JSON files will be very "sparse").
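To make the suggestion concrete, here is one way such input could be generated. It is only a sketch: `MakeSparseRows`, its parameters, and the `f<N>` key naming are hypothetical and are not taken from parser_benchmark.cc. Each key is kept in a row with probability `present`, so `present = 1.0` reproduces the current "all fields in every row" case.

```cpp
#include <cstddef>
#include <random>
#include <string>

// Hypothetical generator for line-delimited JSON with sparse rows.
// Each of `num_fields` keys appears in a given row with probability `present`;
// the value is simply the field index.
std::string MakeSparseRows(std::size_t num_rows, std::size_t num_fields,
                           double present, unsigned seed = 42) {
  std::mt19937 rng(seed);  // fixed seed so benchmark input is reproducible
  std::bernoulli_distribution keep(present);
  std::string out;
  for (std::size_t r = 0; r < num_rows; ++r) {
    out += '{';
    bool first = true;
    for (std::size_t f = 0; f < num_fields; ++f) {
      if (!keep(rng)) continue;  // field is absent from this row
      if (!first) out += ',';
      first = false;
      out += "\"f" + std::to_string(f) + "\":" + std::to_string(f);
    }
    out += "}\n";
  }
  return out;
}
```

Keys are emitted in ascending order here; an unordered variant would additionally shuffle the kept fields within each row.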
> I think it would be nice to first submit the benchmark changes as a separate PR.
Will do. Should I keep this one open or convert it to a draft until the separate PR is merged?
Also, I reverted the benchmarks in a new commit rather than drop the old commits ...not sure if that matters. I have a patch for it in any case.
> Will do. Should I keep this one open or convert it to a draft until the separate PR is merged?
Converting to draft sounds fine.
> Also, I reverted the benchmarks in a new commit rather than drop the old commits ...not sure if that matters. I have a patch for it in any case.
I don't think it matters. In any case you'll have to rebase/merge once the benchmarks are merged :-)
The benchmark PR is now up: https://github.com/apache/arrow/pull/14552.
Rebased on the latest changes.
I compared against master locally and I'm seeing roughly +12/16/25% bytes/sec for 10/100/1000 fields (in the ordered/non-sparse cases).
Right, it seems the speedup is relatively minor. I get these results:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (39)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
benchmark baseline contender change % counters
ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:1000 141.204 MiB/sec 163.772 MiB/sec 15.983 {'family_index': 5, 'per_family_instance_index': 24, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 23, 'json_size': 4544425.0}
ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:1000 137.627 MiB/sec 158.607 MiB/sec 15.244 {'family_index': 5, 'per_family_instance_index': 26, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 22, 'json_size': 4544425.0}
ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:100 149.699 MiB/sec 167.002 MiB/sec 11.559 {'family_index': 5, 'per_family_instance_index': 12, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 259, 'json_size': 424102.0}
ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:100 146.855 MiB/sec 163.307 MiB/sec 11.202 {'family_index': 5, 'per_family_instance_index': 14, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 253, 'json_size': 424102.0}
ChunkJSONLineDelimited 97.748902 104.282934 6.685 {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'ChunkJSONLineDelimited', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 7071295, 'json_size': 150361.0}
ParseJSONBlockWithSchema 136.226 MiB/sec 138.375 MiB/sec 1.577 {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'ParseJSONBlockWithSchema', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 664, 'json_size': 150361.0}
ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:10 193.421 MiB/sec 195.560 MiB/sec 1.106 {'family_index': 5, 'per_family_instance_index': 0, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 291, 'json_size': 483895.0}
ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:10 192.316 MiB/sec 194.350 MiB/sec 1.058 {'family_index': 5, 'per_family_instance_index': 2, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 287, 'json_size': 483895.0}
ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:10 179.680 MiB/sec 180.962 MiB/sec 0.714 {'family_index': 5, 'per_family_instance_index': 1, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 272, 'json_size': 484344.0}
ReadJSONBlockWithSchemaSingleThread 117.169 MiB/sec 117.609 MiB/sec 0.376 {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'ReadJSONBlockWithSchemaSingleThread', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 6, 'json_size': 15026882.0}
ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:100 143.278 MiB/sec 141.908 MiB/sec -0.957 {'family_index': 5, 'per_family_instance_index': 13, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 248, 'json_size': 424088.0}
ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:100 144.507 MiB/sec 142.969 MiB/sec -1.064 {'family_index': 5, 'per_family_instance_index': 16, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 248, 'json_size': 425955.0}
ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:100 139.161 MiB/sec 137.485 MiB/sec -1.204 {'family_index': 5, 'per_family_instance_index': 17, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 241, 'json_size': 422790.0}
ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:10 184.568 MiB/sec 182.084 MiB/sec -1.346 {'family_index': 5, 'per_family_instance_index': 6, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 281, 'json_size': 482610.0}
ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:100 135.707 MiB/sec 133.874 MiB/sec -1.351 {'family_index': 5, 'per_family_instance_index': 19, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 236, 'json_size': 422790.0}
ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:10 190.419 MiB/sec 187.804 MiB/sec -1.373 {'family_index': 5, 'per_family_instance_index': 4, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 289, 'json_size': 482610.0}
ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:10 63.273 MiB/sec 62.386 MiB/sec -1.402 {'family_index': 5, 'per_family_instance_index': 9, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 87, 'json_size': 530883.0}
ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:10 62.917 MiB/sec 62.032 MiB/sec -1.406 {'family_index': 5, 'per_family_instance_index': 8, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 88, 'json_size': 524228.0}
ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:100 141.608 MiB/sec 139.480 MiB/sec -1.503 {'family_index': 5, 'per_family_instance_index': 18, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 244, 'json_size': 425955.0}
ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:100 140.454 MiB/sec 138.317 MiB/sec -1.521 {'family_index': 5, 'per_family_instance_index': 15, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 244, 'json_size': 424088.0}
ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:10 177.692 MiB/sec 174.088 MiB/sec -2.028 {'family_index': 5, 'per_family_instance_index': 5, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 268, 'json_size': 485740.0}
ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:1000 135.271 MiB/sec 132.217 MiB/sec -2.258 {'family_index': 5, 'per_family_instance_index': 28, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 24, 'json_size': 4085536.0}
ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:10 180.369 MiB/sec 176.281 MiB/sec -2.267 {'family_index': 5, 'per_family_instance_index': 3, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 274, 'json_size': 484344.0}
ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:100 48.815 MiB/sec 47.630 MiB/sec -2.427 {'family_index': 5, 'per_family_instance_index': 23, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 84, 'json_size': 425534.0}
ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:10 173.820 MiB/sec 169.515 MiB/sec -2.477 {'family_index': 5, 'per_family_instance_index': 7, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 262, 'json_size': 485740.0}
ChunkJSONPrettyPrinted 305.981 MiB/sec 298.223 MiB/sec -2.536 {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'ChunkJSONPrettyPrinted', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1033, 'json_size': 215361.0}
ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:100 49.710 MiB/sec 48.375 MiB/sec -2.685 {'family_index': 5, 'per_family_instance_index': 20, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 84, 'json_size': 430278.0}
ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:100 49.334 MiB/sec 47.915 MiB/sec -2.878 {'family_index': 5, 'per_family_instance_index': 21, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 85, 'json_size': 425534.0}
ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:1000 130.316 MiB/sec 126.470 MiB/sec -2.951 {'family_index': 5, 'per_family_instance_index': 29, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 23, 'json_size': 4088946.0}
ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:1000 134.779 MiB/sec 130.799 MiB/sec -2.953 {'family_index': 5, 'per_family_instance_index': 25, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 22, 'json_size': 4546025.0}
ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:100 49.652 MiB/sec 48.147 MiB/sec -3.031 {'family_index': 5, 'per_family_instance_index': 22, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 85, 'json_size': 430278.0}
ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:10 62.658 MiB/sec 60.611 MiB/sec -3.267 {'family_index': 5, 'per_family_instance_index': 11, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 86, 'json_size': 530883.0}
ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:1000 131.295 MiB/sec 126.847 MiB/sec -3.388 {'family_index': 5, 'per_family_instance_index': 30, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 24, 'json_size': 4085536.0}
ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:1000 126.625 MiB/sec 121.335 MiB/sec -4.178 {'family_index': 5, 'per_family_instance_index': 31, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 23, 'json_size': 4088946.0}
ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:1000 43.432 MiB/sec 41.614 MiB/sec -4.187 {'family_index': 5, 'per_family_instance_index': 34, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 70, 'json_size': 457089.0}
ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:1000 131.544 MiB/sec 125.765 MiB/sec -4.393 {'family_index': 5, 'per_family_instance_index': 27, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 21, 'json_size': 4546025.0}
ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:1000 43.140 MiB/sec 41.204 MiB/sec -4.488 {'family_index': 5, 'per_family_instance_index': 33, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 70, 'json_size': 454665.0}
ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:1000 43.505 MiB/sec 41.523 MiB/sec -4.556 {'family_index': 5, 'per_family_instance_index': 32, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 70, 'json_size': 457089.0}
ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:1000 43.536 MiB/sec 41.527 MiB/sec -4.615 {'family_index': 5, 'per_family_instance_index': 35, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 70, 'json_size': 454665.0}
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Regressions: (2)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
benchmark baseline contender change % counters
ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:10 63.104 MiB/sec 60.157 MiB/sec -4.670 {'family_index': 5, 'per_family_instance_index': 10, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 88, 'json_size': 524228.0}
ReadJSONBlockWithSchemaMultiThread/real_time 752.152 MiB/sec 695.624 MiB/sec -7.515 {'family_index': 4, 'per_family_instance_index': 0, 'run_name': 'ReadJSONBlockWithSchemaMultiThread/real_time', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 36, 'json_size': 15026882.0}
You shouldn't even need a vector since the indices are already implicitly sequential, just a tracked index.
In retrospect, the list was an artifact from a time before BuildContext, where I was trying to store strings inline with their index - or something to that effect.
Thanks. Here are the updated benchmark numbers that I get:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (40)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
benchmark baseline contender change % counters
ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:1000 137.627 MiB/sec 166.086 MiB/sec 20.679 {'family_index': 5, 'per_family_instance_index': 26, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 22, 'json_size': 4544425.0}
ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:1000 141.204 MiB/sec 167.385 MiB/sec 18.542 {'family_index': 5, 'per_family_instance_index': 24, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 23, 'json_size': 4544425.0}
ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:100 146.855 MiB/sec 168.656 MiB/sec 14.845 {'family_index': 5, 'per_family_instance_index': 14, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 253, 'json_size': 424102.0}
ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:100 149.699 MiB/sec 170.197 MiB/sec 13.693 {'family_index': 5, 'per_family_instance_index': 12, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 259, 'json_size': 424102.0}
ChunkJSONLineDelimited 97.748902 102.645837 5.010 {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'ChunkJSONLineDelimited', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 7071295, 'json_size': 150361.0}
ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:10 192.316 MiB/sec 197.609 MiB/sec 2.752 {'family_index': 5, 'per_family_instance_index': 2, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 287, 'json_size': 483895.0}
ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:10 193.421 MiB/sec 198.712 MiB/sec 2.736 {'family_index': 5, 'per_family_instance_index': 0, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 291, 'json_size': 483895.0}
ParseJSONBlockWithSchema 136.226 MiB/sec 137.839 MiB/sec 1.184 {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'ParseJSONBlockWithSchema', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 664, 'json_size': 150361.0}
ChunkJSONPrettyPrinted 305.981 MiB/sec 308.826 MiB/sec 0.930 {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'ChunkJSONPrettyPrinted', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1033, 'json_size': 215361.0}
ReadJSONBlockWithSchemaMultiThread/real_time 752.152 MiB/sec 758.075 MiB/sec 0.787 {'family_index': 4, 'per_family_instance_index': 0, 'run_name': 'ReadJSONBlockWithSchemaMultiThread/real_time', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 36, 'json_size': 15026882.0}
ReadJSONBlockWithSchemaSingleThread 117.169 MiB/sec 117.971 MiB/sec 0.685 {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'ReadJSONBlockWithSchemaSingleThread', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 6, 'json_size': 15026882.0}
ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:10 184.568 MiB/sec 184.871 MiB/sec 0.164 {'family_index': 5, 'per_family_instance_index': 6, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 281, 'json_size': 482610.0}
ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:100 135.707 MiB/sec 135.288 MiB/sec -0.309 {'family_index': 5, 'per_family_instance_index': 19, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 236, 'json_size': 422790.0}
ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:10 179.680 MiB/sec 179.073 MiB/sec -0.338 {'family_index': 5, 'per_family_instance_index': 1, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 272, 'json_size': 484344.0}
ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:10 173.820 MiB/sec 172.981 MiB/sec -0.483 {'family_index': 5, 'per_family_instance_index': 7, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 262, 'json_size': 485740.0}
ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:10 180.369 MiB/sec 179.352 MiB/sec -0.564 {'family_index': 5, 'per_family_instance_index': 3, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 274, 'json_size': 484344.0}
ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:100 140.454 MiB/sec 139.630 MiB/sec -0.587 {'family_index': 5, 'per_family_instance_index': 15, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 244, 'json_size': 424088.0}
ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:100 141.608 MiB/sec 140.558 MiB/sec -0.741 {'family_index': 5, 'per_family_instance_index': 18, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 244, 'json_size': 425955.0}
ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:1000 131.295 MiB/sec 129.469 MiB/sec -1.390 {'family_index': 5, 'per_family_instance_index': 30, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 24, 'json_size': 4085536.0}
ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:1000 126.625 MiB/sec 124.593 MiB/sec -1.605 {'family_index': 5, 'per_family_instance_index': 31, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 23, 'json_size': 4088946.0}
ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:10 62.658 MiB/sec 61.562 MiB/sec -1.750 {'family_index': 5, 'per_family_instance_index': 11, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 86, 'json_size': 530883.0}
ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:10 63.104 MiB/sec 61.941 MiB/sec -1.842 {'family_index': 5, 'per_family_instance_index': 10, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 88, 'json_size': 524228.0}
ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:1000 131.544 MiB/sec 128.895 MiB/sec -2.014 {'family_index': 5, 'per_family_instance_index': 27, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 21, 'json_size': 4546025.0}
ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:10 177.692 MiB/sec 174.002 MiB/sec -2.076 {'family_index': 5, 'per_family_instance_index': 5, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 268, 'json_size': 485740.0}
ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:10 62.917 MiB/sec 61.534 MiB/sec -2.198 {'family_index': 5, 'per_family_instance_index': 8, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 88, 'json_size': 524228.0}
ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:100 144.507 MiB/sec 141.261 MiB/sec -2.246 {'family_index': 5, 'per_family_instance_index': 16, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 248, 'json_size': 425955.0}
ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:10 190.419 MiB/sec 186.112 MiB/sec -2.262 {'family_index': 5, 'per_family_instance_index': 4, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 289, 'json_size': 482610.0}
ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:10 63.273 MiB/sec 61.837 MiB/sec -2.268 {'family_index': 5, 'per_family_instance_index': 9, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 87, 'json_size': 530883.0}
ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:100 48.815 MiB/sec 47.622 MiB/sec -2.443 {'family_index': 5, 'per_family_instance_index': 23, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 84, 'json_size': 425534.0}
ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:100 143.278 MiB/sec 139.759 MiB/sec -2.456 {'family_index': 5, 'per_family_instance_index': 13, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 248, 'json_size': 424088.0}
ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:1000 134.779 MiB/sec 131.456 MiB/sec -2.465 {'family_index': 5, 'per_family_instance_index': 25, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 22, 'json_size': 4546025.0}
ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:100 139.161 MiB/sec 135.641 MiB/sec -2.529 {'family_index': 5, 'per_family_instance_index': 17, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 241, 'json_size': 422790.0}
ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:100 49.652 MiB/sec 48.377 MiB/sec -2.568 {'family_index': 5, 'per_family_instance_index': 22, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 85, 'json_size': 430278.0}
ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:1000 135.271 MiB/sec 131.763 MiB/sec -2.593 {'family_index': 5, 'per_family_instance_index': 28, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 24, 'json_size': 4085536.0}
ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:1000 130.316 MiB/sec 126.775 MiB/sec -2.717 {'family_index': 5, 'per_family_instance_index': 29, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 23, 'json_size': 4088946.0}
ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:100 49.710 MiB/sec 48.254 MiB/sec -2.928 {'family_index': 5, 'per_family_instance_index': 20, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 84, 'json_size': 430278.0}
ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:100 49.334 MiB/sec 47.731 MiB/sec -3.250 {'family_index': 5, 'per_family_instance_index': 21, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 85, 'json_size': 425534.0}
ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:1000 43.432 MiB/sec 41.727 MiB/sec -3.926 {'family_index': 5, 'per_family_instance_index': 34, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 70, 'json_size': 457089.0}
ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:1000 43.536 MiB/sec 41.661 MiB/sec -4.306 {'family_index': 5, 'per_family_instance_index': 35, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 70, 'json_size': 454665.0}
ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:1000 43.505 MiB/sec 41.598 MiB/sec -4.383 {'family_index': 5, 'per_family_instance_index': 32, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 70, 'json_size': 457089.0}
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Regressions: (1)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
benchmark baseline contender change % counters
ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:1000 43.140 MiB/sec 41.202 MiB/sec -4.492 {'family_index': 5, 'per_family_instance_index': 33, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 70, 'json_size': 454665.0}
The AWS-related test failures are unrelated to this PR.
Benchmark runs are scheduled for baseline = 1a9b1e859126dc12e69eaf8852c6bd103b421ea5 and contender = 21309eaaeb6b2e7f4d4987830df23d5d711ee409. 21309eaaeb6b2e7f4d4987830df23d5d711ee409 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] ec2-t3-xlarge-us-east-2
[Finished :arrow_down:0.37% :arrow_up:0.64%] test-mac-arm
[Finished :arrow_down:0.0% :arrow_up:0.0%] ursa-i9-9960x
[Finished :arrow_down:0.24% :arrow_up:0.31%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 21309eaa ec2-t3-xlarge-us-east-2
[Finished] 21309eaa test-mac-arm
[Finished] 21309eaa ursa-i9-9960x
[Finished] 21309eaa ursa-thinkcentre-m75q
[Finished] 1a9b1e85 ec2-t3-xlarge-us-east-2
[Finished] 1a9b1e85 test-mac-arm
[Finished] 1a9b1e85 ursa-i9-9960x
[Finished] 1a9b1e85 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java