
ARROW-4709: [C++] Optimize for ordered JSON fields

Open benibus opened this issue 3 years ago • 6 comments

Addresses ARROW-4709.

Replaces RawArrayBuilder<Kind::kObject>'s vector of field builders with a vector of structs, each holding the builder and its field name as a string_view. Field names can then be looked up by index, which allows checking whether fields arrive in the expected order before falling back to the hash map.

Added benefits:

  • Removes the expensive unordered_map -> vector transform in RawArrayBuilder::Finish
  • Key names passed to HandlerBase::SetFieldBuilder don't need to be converted to std::string unless a map lookup is necessary

The prediction logic isn't particularly advanced... more could be done to detect worst-case input and skip to the hash map for subsequent keys. Might be worth it, but I'd need to add benchmarks.
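To make the idea concrete, here is a minimal sketch of an "expect the next field, else fall back to the hash map" lookup. It is an illustration only, not the code in this PR: `FieldTable`, `StartObject`, `Lookup` and `fast_hits` are hypothetical names, and the sketch stores owned `std::string` names in the vector rather than `string_view`s to keep lifetimes simple.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the object builder's field table
// (an assumption for illustration, not Arrow's actual class).
class FieldTable {
 public:
  // Registers a new field name and returns its index.
  std::size_t Add(std::string name) {
    const std::size_t index = names_.size();
    by_name_.emplace(name, index);
    names_.push_back(std::move(name));
    return index;
  }

  // Resets the prediction at the start of each JSON object.
  void StartObject() { next_ = 0; }

  // Returns the index of `key`, or std::nullopt if the field is unknown.
  std::optional<std::size_t> Lookup(std::string_view key) {
    // Fast path: the key is the field we predicted, so no std::string
    // is constructed and the hash map is not touched.
    if (next_ < names_.size() && std::string_view(names_[next_]) == key) {
      ++fast_hits_;
      return next_++;
    }
    // Slow path: fall back to the hash map, then resume predicting
    // from the field that follows the one we found.
    const auto it = by_name_.find(std::string(key));
    if (it == by_name_.end()) return std::nullopt;
    assert(it->second < names_.size());
    next_ = it->second + 1;
    return it->second;
  }

  // Number of lookups served by the fast path.
  std::size_t fast_hits() const { return fast_hits_; }

 private:
  std::vector<std::string> names_;
  std::unordered_map<std::string, std::size_t> by_name_;
  std::size_t next_ = 0;
  std::size_t fast_hits_ = 0;
};
```

With rows whose keys always arrive in the same order, every lookup after the first row takes the fast path. With shuffled keys, each miss costs one extra string comparison before the map lookup, which is the worst case mentioned above.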

benibus avatar Sep 12 '22 18:09 benibus

https://issues.apache.org/jira/browse/ARROW-4709

github-actions[bot] avatar Sep 12 '22 18:09 github-actions[bot]

Note that there are a couple of basic benchmarks currently in src/arrow/json/parser_benchmark.cc. You can build them by passing -DARROW_BUILD_BENCHMARKS=ON to CMake (be sure to build in release mode for optimizations). It will create an executable arrow-json-parser-benchmark in the build directory.

(but you may want to write more benchmarks that vary the fields from line to line, for example)

pitrou avatar Sep 14 '22 15:09 pitrou

Thanks @benibus . I think it would be nice to first submit the benchmark changes as a separate PR. That way, we'll be able to better compare the performance changes after the optimization lands.

(also, you should rebase/merge from latest git master)

pitrou avatar Oct 27 '22 11:10 pitrou

For the record, I think the benchmarks should have another parameter: the proportion of "present" fields. Currently it's implicitly 1.0, i.e. all fields are present in every row; but we should benchmark for a couple different values, e.g. 1.0, 0.9 and 0.1 (i.e. some JSON files will be very "sparse").
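As a rough illustration of what such a parameter could drive, the sketch below generates one JSON row in which each field is kept with a given probability. The helper name `MakeSparseRow`, the `f0, f1, ...` field names and the integer values are assumptions made for this example; it is not the generator used by the Arrow benchmarks. The benchmark names that appear in the results further down (`sparsity:0`, `sparsity:10`, `sparsity:90`) seem to express the same idea as a percentage of absent fields.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>

// Hypothetical helper: builds one JSON object with up to `num_fields`
// integer fields named f0, f1, ...; each field is kept with probability
// `present_ratio` (1.0 means every field is present in the row).
std::string MakeSparseRow(std::size_t num_fields, double present_ratio,
                          std::mt19937& rng) {
  assert(present_ratio >= 0.0 && present_ratio <= 1.0);
  std::bernoulli_distribution keep(present_ratio);
  std::string row = "{";
  bool first = true;
  for (std::size_t i = 0; i < num_fields; ++i) {
    if (!keep(rng)) continue;  // field is absent from this row
    if (!first) row += ",";
    first = false;
    row += "\"f" + std::to_string(i) + "\":" + std::to_string(i);
  }
  row += "}";
  return row;
}
```

Fields are emitted in index order here, so the rows stay "ordered" even when sparse. An unordered variant would shuffle the kept fields before writing them out.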

pitrou avatar Oct 27 '22 11:10 pitrou

> I think it would be nice to first submit the benchmark changes as a separate PR.

Will do. Should I keep this one open or convert it to a draft until the separate PR is merged?

Also, I reverted the benchmarks in a new commit rather than drop the old commits ...not sure if that matters. I have a patch for it in any case.

benibus avatar Oct 27 '22 18:10 benibus

> Will do. Should I keep this one open or convert it to a draft until the separate PR is merged?

Converting to draft sounds fine.

> Also, I reverted the benchmarks in a new commit rather than drop the old commits ...not sure if that matters. I have a patch for it in any case.

I don't think it matters. In any case you'll have to rebase/merge once the benchmarks are merged :-)

pitrou avatar Oct 27 '22 19:10 pitrou

The benchmark PR is now up: https://github.com/apache/arrow/pull/14552.

benibus avatar Nov 01 '22 17:11 benibus

Rebased on the latest changes.

I compared against master locally and I'm seeing roughly +12/16/25% bytes/sec for 10/100/1000 fields (in the ordered/non-sparse cases).

benibus avatar Nov 14 '22 23:11 benibus

Right, it seems the speedup is relatively minor. I get these results:

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (39)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                     benchmark        baseline       contender  change %                                                                                                                                                                                                                            counters
 ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:1000 141.204 MiB/sec 163.772 MiB/sec    15.983  {'family_index': 5, 'per_family_instance_index': 24, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 23, 'json_size': 4544425.0}
 ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:1000 137.627 MiB/sec 158.607 MiB/sec    15.244  {'family_index': 5, 'per_family_instance_index': 26, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 22, 'json_size': 4544425.0}
  ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:100 149.699 MiB/sec 167.002 MiB/sec    11.559   {'family_index': 5, 'per_family_instance_index': 12, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 259, 'json_size': 424102.0}
  ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:100 146.855 MiB/sec 163.307 MiB/sec    11.202   {'family_index': 5, 'per_family_instance_index': 14, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 253, 'json_size': 424102.0}
                                        ChunkJSONLineDelimited       97.748902      104.282934     6.685                                      {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'ChunkJSONLineDelimited', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 7071295, 'json_size': 150361.0}
                                      ParseJSONBlockWithSchema 136.226 MiB/sec 138.375 MiB/sec     1.577                                        {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'ParseJSONBlockWithSchema', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 664, 'json_size': 150361.0}
   ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:10 193.421 MiB/sec 195.560 MiB/sec     1.106     {'family_index': 5, 'per_family_instance_index': 0, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 291, 'json_size': 483895.0}
   ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:10 192.316 MiB/sec 194.350 MiB/sec     1.058     {'family_index': 5, 'per_family_instance_index': 2, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 287, 'json_size': 483895.0}
   ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:10 179.680 MiB/sec 180.962 MiB/sec     0.714     {'family_index': 5, 'per_family_instance_index': 1, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 272, 'json_size': 484344.0}
                           ReadJSONBlockWithSchemaSingleThread 117.169 MiB/sec 117.609 MiB/sec     0.376                             {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'ReadJSONBlockWithSchemaSingleThread', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 6, 'json_size': 15026882.0}
  ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:100 143.278 MiB/sec 141.908 MiB/sec    -0.957   {'family_index': 5, 'per_family_instance_index': 13, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 248, 'json_size': 424088.0}
 ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:100 144.507 MiB/sec 142.969 MiB/sec    -1.064  {'family_index': 5, 'per_family_instance_index': 16, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 248, 'json_size': 425955.0}
 ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:100 139.161 MiB/sec 137.485 MiB/sec    -1.204  {'family_index': 5, 'per_family_instance_index': 17, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 241, 'json_size': 422790.0}
  ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:10 184.568 MiB/sec 182.084 MiB/sec    -1.346    {'family_index': 5, 'per_family_instance_index': 6, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 281, 'json_size': 482610.0}
 ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:100 135.707 MiB/sec 133.874 MiB/sec    -1.351  {'family_index': 5, 'per_family_instance_index': 19, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 236, 'json_size': 422790.0}
  ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:10 190.419 MiB/sec 187.804 MiB/sec    -1.373    {'family_index': 5, 'per_family_instance_index': 4, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 289, 'json_size': 482610.0}
  ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:10  63.273 MiB/sec  62.386 MiB/sec    -1.402     {'family_index': 5, 'per_family_instance_index': 9, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 87, 'json_size': 530883.0}
  ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:10  62.917 MiB/sec  62.032 MiB/sec    -1.406     {'family_index': 5, 'per_family_instance_index': 8, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 88, 'json_size': 524228.0}
 ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:100 141.608 MiB/sec 139.480 MiB/sec    -1.503  {'family_index': 5, 'per_family_instance_index': 18, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 244, 'json_size': 425955.0}
  ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:100 140.454 MiB/sec 138.317 MiB/sec    -1.521   {'family_index': 5, 'per_family_instance_index': 15, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 244, 'json_size': 424088.0}
  ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:10 177.692 MiB/sec 174.088 MiB/sec    -2.028    {'family_index': 5, 'per_family_instance_index': 5, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 268, 'json_size': 485740.0}
ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:1000 135.271 MiB/sec 132.217 MiB/sec    -2.258 {'family_index': 5, 'per_family_instance_index': 28, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 24, 'json_size': 4085536.0}
   ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:10 180.369 MiB/sec 176.281 MiB/sec    -2.267     {'family_index': 5, 'per_family_instance_index': 3, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 274, 'json_size': 484344.0}
 ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:100  48.815 MiB/sec  47.630 MiB/sec    -2.427   {'family_index': 5, 'per_family_instance_index': 23, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 84, 'json_size': 425534.0}
  ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:10 173.820 MiB/sec 169.515 MiB/sec    -2.477    {'family_index': 5, 'per_family_instance_index': 7, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 262, 'json_size': 485740.0}
                                        ChunkJSONPrettyPrinted 305.981 MiB/sec 298.223 MiB/sec    -2.536                                         {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'ChunkJSONPrettyPrinted', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1033, 'json_size': 215361.0}
 ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:100  49.710 MiB/sec  48.375 MiB/sec    -2.685   {'family_index': 5, 'per_family_instance_index': 20, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 84, 'json_size': 430278.0}
 ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:100  49.334 MiB/sec  47.915 MiB/sec    -2.878   {'family_index': 5, 'per_family_instance_index': 21, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 85, 'json_size': 425534.0}
ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:1000 130.316 MiB/sec 126.470 MiB/sec    -2.951 {'family_index': 5, 'per_family_instance_index': 29, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 23, 'json_size': 4088946.0}
 ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:1000 134.779 MiB/sec 130.799 MiB/sec    -2.953  {'family_index': 5, 'per_family_instance_index': 25, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 22, 'json_size': 4546025.0}
 ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:100  49.652 MiB/sec  48.147 MiB/sec    -3.031   {'family_index': 5, 'per_family_instance_index': 22, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 85, 'json_size': 430278.0}
  ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:10  62.658 MiB/sec  60.611 MiB/sec    -3.267    {'family_index': 5, 'per_family_instance_index': 11, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 86, 'json_size': 530883.0}
ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:1000 131.295 MiB/sec 126.847 MiB/sec    -3.388 {'family_index': 5, 'per_family_instance_index': 30, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 24, 'json_size': 4085536.0}
ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:1000 126.625 MiB/sec 121.335 MiB/sec    -4.178 {'family_index': 5, 'per_family_instance_index': 31, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 23, 'json_size': 4088946.0}
ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:1000  43.432 MiB/sec  41.614 MiB/sec    -4.187  {'family_index': 5, 'per_family_instance_index': 34, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 70, 'json_size': 457089.0}
 ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:1000 131.544 MiB/sec 125.765 MiB/sec    -4.393  {'family_index': 5, 'per_family_instance_index': 27, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 21, 'json_size': 4546025.0}
ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:1000  43.140 MiB/sec  41.204 MiB/sec    -4.488  {'family_index': 5, 'per_family_instance_index': 33, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 70, 'json_size': 454665.0}
ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:1000  43.505 MiB/sec  41.523 MiB/sec    -4.556  {'family_index': 5, 'per_family_instance_index': 32, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 70, 'json_size': 457089.0}
ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:1000  43.536 MiB/sec  41.527 MiB/sec    -4.615  {'family_index': 5, 'per_family_instance_index': 35, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 70, 'json_size': 454665.0}

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Regressions: (2)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                   benchmark        baseline       contender  change %                                                                                                                                                                                                                         counters
ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:10  63.104 MiB/sec  60.157 MiB/sec    -4.670 {'family_index': 5, 'per_family_instance_index': 10, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 88, 'json_size': 524228.0}
                ReadJSONBlockWithSchemaMultiThread/real_time 752.152 MiB/sec 695.624 MiB/sec    -7.515                {'family_index': 4, 'per_family_instance_index': 0, 'run_name': 'ReadJSONBlockWithSchemaMultiThread/real_time', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 36, 'json_size': 15026882.0}

pitrou avatar Nov 15 '22 16:11 pitrou

You shouldn't even need a vector since the indices are already implicitly sequential, just a tracked index.

In retrospect, the list was an artifact from a time before BuildContext, where I was trying to store strings inline with their index - or something to that effect.

benibus avatar Nov 16 '22 18:11 benibus

Thanks. Here are the updated benchmark numbers that I get:

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (40)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                     benchmark        baseline       contender  change %                                                                                                                                                                                                                            counters
 ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:1000 137.627 MiB/sec 166.086 MiB/sec    20.679  {'family_index': 5, 'per_family_instance_index': 26, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 22, 'json_size': 4544425.0}
 ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:1000 141.204 MiB/sec 167.385 MiB/sec    18.542  {'family_index': 5, 'per_family_instance_index': 24, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 23, 'json_size': 4544425.0}
  ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:100 146.855 MiB/sec 168.656 MiB/sec    14.845   {'family_index': 5, 'per_family_instance_index': 14, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 253, 'json_size': 424102.0}
  ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:100 149.699 MiB/sec 170.197 MiB/sec    13.693   {'family_index': 5, 'per_family_instance_index': 12, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 259, 'json_size': 424102.0}
                                        ChunkJSONLineDelimited       97.748902      102.645837     5.010                                      {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'ChunkJSONLineDelimited', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 7071295, 'json_size': 150361.0}
   ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:10 192.316 MiB/sec 197.609 MiB/sec     2.752     {'family_index': 5, 'per_family_instance_index': 2, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 287, 'json_size': 483895.0}
   ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:10 193.421 MiB/sec 198.712 MiB/sec     2.736     {'family_index': 5, 'per_family_instance_index': 0, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 291, 'json_size': 483895.0}
                                      ParseJSONBlockWithSchema 136.226 MiB/sec 137.839 MiB/sec     1.184                                        {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'ParseJSONBlockWithSchema', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 664, 'json_size': 150361.0}
                                        ChunkJSONPrettyPrinted 305.981 MiB/sec 308.826 MiB/sec     0.930                                         {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'ChunkJSONPrettyPrinted', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1033, 'json_size': 215361.0}
                  ReadJSONBlockWithSchemaMultiThread/real_time 752.152 MiB/sec 758.075 MiB/sec     0.787                   {'family_index': 4, 'per_family_instance_index': 0, 'run_name': 'ReadJSONBlockWithSchemaMultiThread/real_time', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 36, 'json_size': 15026882.0}
                           ReadJSONBlockWithSchemaSingleThread 117.169 MiB/sec 117.971 MiB/sec     0.685                             {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'ReadJSONBlockWithSchemaSingleThread', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 6, 'json_size': 15026882.0}
  ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:10 184.568 MiB/sec 184.871 MiB/sec     0.164    {'family_index': 5, 'per_family_instance_index': 6, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 281, 'json_size': 482610.0}
 ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:100 135.707 MiB/sec 135.288 MiB/sec    -0.309  {'family_index': 5, 'per_family_instance_index': 19, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 236, 'json_size': 422790.0}
   ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:10 179.680 MiB/sec 179.073 MiB/sec    -0.338     {'family_index': 5, 'per_family_instance_index': 1, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 272, 'json_size': 484344.0}
  ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:10 173.820 MiB/sec 172.981 MiB/sec    -0.483    {'family_index': 5, 'per_family_instance_index': 7, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 262, 'json_size': 485740.0}
   ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:10 180.369 MiB/sec 179.352 MiB/sec    -0.564     {'family_index': 5, 'per_family_instance_index': 3, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 274, 'json_size': 484344.0}
  ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:100 140.454 MiB/sec 139.630 MiB/sec    -0.587   {'family_index': 5, 'per_family_instance_index': 15, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 244, 'json_size': 424088.0}
 ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:100 141.608 MiB/sec 140.558 MiB/sec    -0.741  {'family_index': 5, 'per_family_instance_index': 18, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 244, 'json_size': 425955.0}
ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:1000 131.295 MiB/sec 129.469 MiB/sec    -1.390 {'family_index': 5, 'per_family_instance_index': 30, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 24, 'json_size': 4085536.0}
ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:1000 126.625 MiB/sec 124.593 MiB/sec    -1.605 {'family_index': 5, 'per_family_instance_index': 31, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 23, 'json_size': 4088946.0}
  ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:10  62.658 MiB/sec  61.562 MiB/sec    -1.750    {'family_index': 5, 'per_family_instance_index': 11, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 86, 'json_size': 530883.0}
  ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:10  63.104 MiB/sec  61.941 MiB/sec    -1.842    {'family_index': 5, 'per_family_instance_index': 10, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 88, 'json_size': 524228.0}
 ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:1000 131.544 MiB/sec 128.895 MiB/sec    -2.014  {'family_index': 5, 'per_family_instance_index': 27, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 21, 'json_size': 4546025.0}
  ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:10 177.692 MiB/sec 174.002 MiB/sec    -2.076    {'family_index': 5, 'per_family_instance_index': 5, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 268, 'json_size': 485740.0}
  ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:10  62.917 MiB/sec  61.534 MiB/sec    -2.198     {'family_index': 5, 'per_family_instance_index': 8, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 88, 'json_size': 524228.0}
 ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:100 144.507 MiB/sec 141.261 MiB/sec    -2.246  {'family_index': 5, 'per_family_instance_index': 16, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 248, 'json_size': 425955.0}
  ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:10 190.419 MiB/sec 186.112 MiB/sec    -2.262    {'family_index': 5, 'per_family_instance_index': 4, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 289, 'json_size': 482610.0}
  ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:10  63.273 MiB/sec  61.837 MiB/sec    -2.268     {'family_index': 5, 'per_family_instance_index': 9, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 87, 'json_size': 530883.0}
 ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:100  48.815 MiB/sec  47.622 MiB/sec    -2.443   {'family_index': 5, 'per_family_instance_index': 23, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 84, 'json_size': 425534.0}
  ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:100 143.278 MiB/sec 139.759 MiB/sec    -2.456   {'family_index': 5, 'per_family_instance_index': 13, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 248, 'json_size': 424088.0}
 ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:1000 134.779 MiB/sec 131.456 MiB/sec    -2.465  {'family_index': 5, 'per_family_instance_index': 25, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 22, 'json_size': 4546025.0}
 ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:100 139.161 MiB/sec 135.641 MiB/sec    -2.529  {'family_index': 5, 'per_family_instance_index': 17, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 241, 'json_size': 422790.0}
 ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:100  49.652 MiB/sec  48.377 MiB/sec    -2.568   {'family_index': 5, 'per_family_instance_index': 22, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 85, 'json_size': 430278.0}
ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:1000 135.271 MiB/sec 131.763 MiB/sec    -2.593 {'family_index': 5, 'per_family_instance_index': 28, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 24, 'json_size': 4085536.0}
ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:1000 130.316 MiB/sec 126.775 MiB/sec    -2.717 {'family_index': 5, 'per_family_instance_index': 29, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 23, 'json_size': 4088946.0}
 ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:100  49.710 MiB/sec  48.254 MiB/sec    -2.928   {'family_index': 5, 'per_family_instance_index': 20, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 84, 'json_size': 430278.0}
 ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:100  49.334 MiB/sec  47.731 MiB/sec    -3.250   {'family_index': 5, 'per_family_instance_index': 21, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 85, 'json_size': 425534.0}
ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:1000  43.432 MiB/sec  41.727 MiB/sec    -3.926  {'family_index': 5, 'per_family_instance_index': 34, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 70, 'json_size': 457089.0}
ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:1000  43.536 MiB/sec  41.661 MiB/sec    -4.306  {'family_index': 5, 'per_family_instance_index': 35, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 70, 'json_size': 454665.0}
ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:1000  43.505 MiB/sec  41.598 MiB/sec    -4.383  {'family_index': 5, 'per_family_instance_index': 32, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 70, 'json_size': 457089.0}

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Regressions: (1)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                     benchmark       baseline      contender  change %                                                                                                                                                                                                                           counters
ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:1000 43.140 MiB/sec 41.202 MiB/sec    -4.492 {'family_index': 5, 'per_family_instance_index': 33, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 70, 'json_size': 454665.0}
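
For reference, the `change %` column appears to be the relative difference between contender and baseline throughput. A minimal sketch under that assumption (the helper name `percent_change` is hypothetical and not part of any benchmark tooling), applied to the regression row above:

```python
def percent_change(baseline: float, contender: float) -> float:
    """Relative change of contender vs. baseline, in percent (negative = slower)."""
    return (contender - baseline) / baseline * 100.0


# Throughputs (MiB/sec) copied from the single regression row above.
baseline_mib_s = 43.140
contender_mib_s = 41.202
print(f"{percent_change(baseline_mib_s, contender_mib_s):.3f}")  # about -4.492
```

Since the table prints rounded throughputs, recomputing from the displayed values may differ from the reported figure in the last digit.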

pitrou avatar Nov 17 '22 13:11 pitrou

The AWS-related test failures are unrelated to this PR.

pitrou avatar Nov 22 '22 15:11 pitrou

Benchmark runs are scheduled for baseline = 1a9b1e859126dc12e69eaf8852c6bd103b421ea5 and contender = 21309eaaeb6b2e7f4d4987830df23d5d711ee409. 21309eaaeb6b2e7f4d4987830df23d5d711ee409 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.

Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] ec2-t3-xlarge-us-east-2
[Finished :arrow_down:0.37% :arrow_up:0.64%] test-mac-arm
[Finished :arrow_down:0.0% :arrow_up:0.0%] ursa-i9-9960x
[Finished :arrow_down:0.24% :arrow_up:0.31%] ursa-thinkcentre-m75q

Buildkite builds:
[Finished] 21309eaa ec2-t3-xlarge-us-east-2
[Finished] 21309eaa test-mac-arm
[Finished] 21309eaa ursa-i9-9960x
[Finished] 21309eaa ursa-thinkcentre-m75q
[Finished] 1a9b1e85 ec2-t3-xlarge-us-east-2
[Finished] 1a9b1e85 test-mac-arm
[Finished] 1a9b1e85 ursa-i9-9960x
[Finished] 1a9b1e85 ursa-thinkcentre-m75q

Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ursabot avatar Nov 24 '22 13:11 ursabot