Improve search performance for numeric sort queries

Open arjunkumargiri opened this issue 2 years ago • 25 comments

Numeric sorting is one of the key query mechanisms used across multiple OpenSearch clusters. It is critical to understand the performance characteristics of numeric sort queries and to identify ways to reduce query latency and performance overhead.

To understand the query characteristics of numeric sorting, a simple performance test was run with the settings below:

  • Benchmark tool: opensearch-benchmark
  • Workload: geonames
  • Task: desc_sort_population
  • Nodes: 1 node
  • JVM size: 4 GB

Benchmark result:

| Metric | Value | Unit |
|---|---|---|
| Min Throughput | 70.09 | ops/s |
| Mean Throughput | 74.27 | ops/s |
| Median Throughput | 74.59 | ops/s |
| Max Throughput | 74.79 | ops/s |
| 50th percentile latency | 6.32264 | ms |
| 90th percentile latency | 6.81483 | ms |
| 99th percentile latency | 7.45008 | ms |
| 99.9th percentile latency | 15.4041 | ms |
| 100th percentile latency | 16.5634 | ms |
| 50th percentile service time | 5.55759 | ms |
| 90th percentile service time | 5.73615 | ms |
| 99th percentile service time | 6.43827 | ms |
| 99.9th percentile service time | 14.8174 | ms |
| 100th percentile service time | 15.4417 | ms |
| error rate | 0 | % |

CPU profile:

Numeric sorting CPU profile

As expected, most CPU cycles for numeric sorting are spent in the Long comparator performing the sort. The CPU cycles are equally distributed between the PointValues operations estimatePointCount and intersect.

Opening this issue to brainstorm and identify potential improvements to numeric sorting.

arjunkumargiri avatar Oct 23 '23 17:10 arjunkumargiri

I was brainstorming with @harshavamsi on this one briefly last week.

I think there might be some trickery that we can do especially for the special case where a segment has no deletes.

Specifically, I'm wondering if we can inspect the BKD tree to find the leftmost/rightmost (depending on sort order) smallest range with at least N hits, where N is the size parameter (or the track_total_hits limit). Then we could implicitly attach a range query filter.

I don't know if it would ultimately help, or if it's essentially what happens in the PointValues estimate/intersect methods anyway.
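
A minimal sketch of the idea above, for the descending-sort case. It does not use the Lucene API: `Leaf` and `lower_bound_for_top_n` are hypothetical names, and the leaves stand in for the per-leaf bounds and point counts of a one-dimensional BKD tree. It assumes a segment with no deletes, a single value per document, and a query that matches every document, so leaf point counts equal hit counts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    """Hypothetical stand-in for a 1-D BKD leaf: value bounds plus point count."""
    min_value: int
    max_value: int
    count: int


def lower_bound_for_top_n(leaves: List[Leaf], n: int) -> Optional[int]:
    """Walk leaves from the rightmost (largest values) and stop once they
    hold at least n points. The returned value v is a lower bound such that
    the range [v, +inf) is guaranteed to contain at least n points.
    Returns None when the segment holds fewer than n points.
    """
    seen = 0
    for leaf in reversed(leaves):  # leaves are assumed sorted by value, ascending
        seen += leaf.count
        if seen >= n:
            return leaf.min_value
    return None


# Three leaves of 100 points each, covering values 0..29.
leaves = [Leaf(0, 9, 100), Leaf(10, 19, 100), Leaf(20, 29, 100)]
print(lower_bound_for_top_n(leaves, 10))   # only the rightmost leaf is needed
print(lower_bound_for_top_n(leaves, 150))  # needs the two rightmost leaves
```

For an ascending sort the walk would run from the leftmost leaf and return an upper bound instead. Whether this saves work over what estimatePointCount and intersect already do is exactly the open question.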

msfroh avatar Oct 23 '23 23:10 msfroh

@msfroh thanks for the inputs. Tagging @rishabhmaurya here as well.

@rishabhmaurya had the idea of helping match_all queries that use a descending sort on a numeric field. Rather than going through the entire BKD tree as you mentioned, we could look at the min/max values, pick the node that makes the most sense for us, and then attach a range filter on that node, assuming other attributes like the number of hits and the number of docs to be returned are all taken care of first.

I don't think I did a great job of explaining, but I will put up an RFC with my thought process and how we could prune the tree.

harshavamsi avatar Oct 24 '23 01:10 harshavamsi

Thanks @harshavamsi for working on it.

I have a working version of it in Lucene, and the details of the optimization are described here - https://github.com/apache/lucene/issues/12534 and PR https://github.com/rishabhmaurya/lucene/pull/2. I had a discussion about it with @msfroh and we agreed on its utility. We can take early feedback from @nknize, as he understands this part of the code very well.

I started making changes in OpenSearch as well because the Lucene community may not accept it, as it works for cases with MatchAllQuery with a desc sort on a numeric field and no deletions. You can find the OpenSearch changes here; it's still a work in progress - https://github.com/rishabhmaurya/OpenSearch/commit/f261cb380dce345c4ce3671a814a12e67258fff5

rishabhmaurya avatar Oct 24 '23 03:10 rishabhmaurya

Should we pull in https://github.com/rishabhmaurya/OpenSearch/commit/f261cb380dce345c4ce3671a814a12e67258fff5 and run a benchmark along with a profile to identify the early improvements? cc: @sandeshkr419

getsaurabh02 avatar Oct 30 '23 17:10 getsaurabh02

https://github.com/rishabhmaurya/OpenSearch/commit/f261cb380dce345c4ce3671a814a12e67258fff5 is still a work in progress, so it can't be used directly. However, we can build a custom Lucene jar using https://github.com/apache/lucene/issues/12534, where I have the changes working, and check the estimated gains. We may have to tweak the entry condition here - https://github.com/rishabhmaurya/lucene/pull/2/files#diff-79c6a57519ecd1ef504629e62e13d17859a4ffedc58f4602e583ce758a15adc8R294 to find the sweet spot for this optimization.

rishabhmaurya avatar Oct 30 '23 17:10 rishabhmaurya

Current steps on this:

  • Build a custom Lucene jar with the https://github.com/apache/lucene/issues/12534 changes
  • Run a custom benchmark on a match_all query with a numeric desc sort and compare the performance of the current and new implementations to get a baseline on the perf improvement

harshavamsi avatar Oct 31 '23 17:10 harshavamsi

Preliminary benchmarking results:

Without optimization

| Metric | Value | Unit |
|---|---|---|
| Min Throughput | 1.5 | ops/s |
| Mean Throughput | 1.51 | ops/s |
| Median Throughput | 1.51 | ops/s |
| Max Throughput | 1.51 | ops/s |
| 50th percentile latency | 6.23599 | ms |
| 90th percentile latency | 6.81445 | ms |
| 99th percentile latency | 7.21335 | ms |
| 100th percentile latency | 7.22365 | ms |
| 50th percentile service time | 4.63105 | ms |
| 90th percentile service time | 5.02198 | ms |
| 99th percentile service time | 5.20355 | ms |
| 100th percentile service time | 5.24069 | ms |
| error rate | 0 | % |

With optimization

| Metric | Value | Unit |
|---|---|---|
| Min Throughput | 1.5 | ops/s |
| Mean Throughput | 1.5 | ops/s |
| Median Throughput | 1.5 | ops/s |
| Max Throughput | 1.5 | ops/s |
| 50th percentile latency | 8.20805 | ms |
| 90th percentile latency | 8.61225 | ms |
| 99th percentile latency | 8.91156 | ms |
| 100th percentile latency | 9.02062 | ms |
| 50th percentile service time | 6.5675 | ms |
| 90th percentile service time | 6.76763 | ms |
| 99th percentile service time | 7.00944 | ms |
| 100th percentile service time | 7.10608 | ms |
| error rate | 0 | % |
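
To make the two runs easier to compare, here is a small sketch that computes the relative change for a few of the metrics reported above. The helper name `relative_change` is only illustrative; the numbers are copied from the two result sets in this comment.

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Percentage change of candidate relative to baseline (positive = slower)."""
    return (candidate - baseline) / baseline * 100.0


# Values in ms, copied from the "Without optimization" and "With optimization" runs.
without_opt = {
    "50th percentile latency": 6.23599,
    "90th percentile latency": 6.81445,
    "50th percentile service time": 4.63105,
}
with_opt = {
    "50th percentile latency": 8.20805,
    "90th percentile latency": 8.61225,
    "50th percentile service time": 6.5675,
}

for metric, base in without_opt.items():
    change = relative_change(base, with_opt[metric])
    print(f"{metric}: {change:+.1f}%")
```

In this preliminary run, median latency is roughly 32% higher and median service time roughly 42% higher with the patched build than without it.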

harshavamsi avatar Nov 01 '23 02:11 harshavamsi

Thanks @harshavamsi for running the benchmark. Could you provide more details on the workload and queries you ran?

rishabhmaurya avatar Nov 01 '23 15:11 rishabhmaurya

@rishabhmaurya

I ran this workload and this task:

Workload: geonames Task: desc_sort_population

I used an r5.2xlarge cluster for both benchmarks. The non-optimized run was on a regular cluster I had set up for keyword benchmarking. The optimized cluster was running a custom build of OpenSearch with a patched Lucene version that included the optimization.

This is the query:

    {
      "name": "desc_sort_population",
      "operation-type": "search",
      "body": {
        "query": {
          "match_all": {}
        },
        "sort" : [
          {"population" : "desc"}
        ]
      }
    },
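
For illustration only, the sketch below shows what the same search body would look like if the range filter discussed earlier in this thread were attached explicitly. The optimization under discussion would do this implicitly inside the engine, not in the request; the helper `with_lower_bound` and the bound of 1000000 are hypothetical and not taken from the geonames workload.

```python
import copy
import json


def with_lower_bound(search_body: dict, field: str, lower_bound: int) -> dict:
    """Return a copy of search_body whose query is wrapped in a bool query
    with an added range filter (field >= lower_bound). The sort is kept as is.
    """
    rewritten = copy.deepcopy(search_body)
    original_query = rewritten["query"]
    rewritten["query"] = {
        "bool": {
            "must": original_query,
            "filter": {"range": {field: {"gte": lower_bound}}},
        }
    }
    return rewritten


body = {
    "query": {"match_all": {}},
    "sort": [{"population": "desc"}],
}

# 1000000 is a made-up bound; in practice it would come from the BKD tree.
print(json.dumps(with_lower_bound(body, "population", 1000000), indent=2))
```

The rewritten query returns the same top hits only if the bound is low enough that at least as many documents as the requested size fall inside the range.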

harshavamsi avatar Nov 01 '23 17:11 harshavamsi

Re-running the benchmark on the optimized cluster:

|                                                 Min Throughput | desc_sort_population |         1.5 |  ops/s |
|                                                Mean Throughput | desc_sort_population |         1.5 |  ops/s |
|                                              Median Throughput | desc_sort_population |         1.5 |  ops/s |
|                                                 Max Throughput | desc_sort_population |         1.5 |  ops/s |
|                                        50th percentile latency | desc_sort_population |     6.71526 |     ms |
|                                        90th percentile latency | desc_sort_population |     7.17203 |     ms |
|                                        99th percentile latency | desc_sort_population |     7.40734 |     ms |
|                                       100th percentile latency | desc_sort_population |     7.46786 |     ms |
|                                   50th percentile service time | desc_sort_population |     5.15482 |     ms |
|                                   90th percentile service time | desc_sort_population |     5.38515 |     ms |
|                                   99th percentile service time | desc_sort_population |     5.79006 |     ms |
|                                  100th percentile service time | desc_sort_population |     5.89911 |     ms |
|                                                     error rate | desc_sort_population |           0 |      % |

harshavamsi avatar Nov 01 '23 18:11 harshavamsi

Can you also post the segment stats and the overall index size here? Given that the latency is already pretty low, this may not be the right workload to test against.

rishabhmaurya avatar Nov 02 '23 00:11 rishabhmaurya

@rishabhmaurya here are the segment stats:

{
    "_shards": {
        "total": 7,
        "successful": 6,
        "failed": 0
    },
    "indices": {
        "geonames": {
            "shards": {
                "0": [
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "modFRW6URWi6fqeFypw2fg"
                        },
                        "num_committed_segments": 16,
                        "num_search_segments": 16,
                        "segments": {
                            "_0": {
                                "generation": 0,
                                "num_docs": 10535,
                                "deleted_docs": 0,
                                "size_in_bytes": 3746696,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_1": {
                                "generation": 1,
                                "num_docs": 10070,
                                "deleted_docs": 0,
                                "size_in_bytes": 3459472,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_2": {
                                "generation": 2,
                                "num_docs": 47250,
                                "deleted_docs": 0,
                                "size_in_bytes": 15707588,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_3": {
                                "generation": 3,
                                "num_docs": 23435,
                                "deleted_docs": 0,
                                "size_in_bytes": 8124605,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_4": {
                                "generation": 4,
                                "num_docs": 111695,
                                "deleted_docs": 0,
                                "size_in_bytes": 31826890,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_5": {
                                "generation": 5,
                                "num_docs": 59986,
                                "deleted_docs": 0,
                                "size_in_bytes": 18805519,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_6": {
                                "generation": 6,
                                "num_docs": 18956,
                                "deleted_docs": 0,
                                "size_in_bytes": 6143059,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_7": {
                                "generation": 7,
                                "num_docs": 488811,
                                "deleted_docs": 0,
                                "size_in_bytes": 127081465,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_8": {
                                "generation": 8,
                                "num_docs": 545075,
                                "deleted_docs": 0,
                                "size_in_bytes": 139838084,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_9": {
                                "generation": 9,
                                "num_docs": 175324,
                                "deleted_docs": 0,
                                "size_in_bytes": 48162652,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_a": {
                                "generation": 10,
                                "num_docs": 10813,
                                "deleted_docs": 0,
                                "size_in_bytes": 2925647,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_b": {
                                "generation": 11,
                                "num_docs": 161960,
                                "deleted_docs": 0,
                                "size_in_bytes": 38022913,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_c": {
                                "generation": 12,
                                "num_docs": 52153,
                                "deleted_docs": 0,
                                "size_in_bytes": 13055539,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_d": {
                                "generation": 13,
                                "num_docs": 223779,
                                "deleted_docs": 0,
                                "size_in_bytes": 55202061,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_e": {
                                "generation": 14,
                                "num_docs": 272247,
                                "deleted_docs": 0,
                                "size_in_bytes": 66167512,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_f": {
                                "generation": 15,
                                "num_docs": 66286,
                                "deleted_docs": 0,
                                "size_in_bytes": 17067734,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            }
                        }
                    }
                ],
                "1": [
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "modFRW6URWi6fqeFypw2fg"
                        },
                        "num_committed_segments": 17,
                        "num_search_segments": 17,
                        "segments": {
                            "_0": {
                                "generation": 0,
                                "num_docs": 16775,
                                "deleted_docs": 0,
                                "size_in_bytes": 5516801,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_1": {
                                "generation": 1,
                                "num_docs": 7823,
                                "deleted_docs": 0,
                                "size_in_bytes": 2994373,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_2": {
                                "generation": 2,
                                "num_docs": 1479,
                                "deleted_docs": 0,
                                "size_in_bytes": 612107,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_3": {
                                "generation": 3,
                                "num_docs": 42308,
                                "deleted_docs": 0,
                                "size_in_bytes": 13547068,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_4": {
                                "generation": 4,
                                "num_docs": 40185,
                                "deleted_docs": 0,
                                "size_in_bytes": 13661412,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_5": {
                                "generation": 5,
                                "num_docs": 6463,
                                "deleted_docs": 0,
                                "size_in_bytes": 2599313,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_6": {
                                "generation": 6,
                                "num_docs": 73610,
                                "deleted_docs": 0,
                                "size_in_bytes": 21328234,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_7": {
                                "generation": 7,
                                "num_docs": 120275,
                                "deleted_docs": 0,
                                "size_in_bytes": 34799549,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_8": {
                                "generation": 8,
                                "num_docs": 23483,
                                "deleted_docs": 0,
                                "size_in_bytes": 7484546,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_9": {
                                "generation": 9,
                                "num_docs": 496505,
                                "deleted_docs": 0,
                                "size_in_bytes": 129677362,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_a": {
                                "generation": 10,
                                "num_docs": 431367,
                                "deleted_docs": 0,
                                "size_in_bytes": 112317590,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_b": {
                                "generation": 11,
                                "num_docs": 153711,
                                "deleted_docs": 0,
                                "size_in_bytes": 42394841,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_c": {
                                "generation": 12,
                                "num_docs": 64727,
                                "deleted_docs": 0,
                                "size_in_bytes": 15216055,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_d": {
                                "generation": 13,
                                "num_docs": 3895,
                                "deleted_docs": 0,
                                "size_in_bytes": 1048412,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_e": {
                                "generation": 14,
                                "num_docs": 214024,
                                "deleted_docs": 0,
                                "size_in_bytes": 53305902,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_f": {
                                "generation": 15,
                                "num_docs": 500718,
                                "deleted_docs": 0,
                                "size_in_bytes": 118285309,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_g": {
                                "generation": 16,
                                "num_docs": 84258,
                                "deleted_docs": 0,
                                "size_in_bytes": 21819490,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            }
                        }
                    }
                ],
                "2": [
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "modFRW6URWi6fqeFypw2fg"
                        },
                        "num_committed_segments": 17,
                        "num_search_segments": 17,
                        "segments": {
                            "_0": {
                                "generation": 0,
                                "num_docs": 18219,
                                "deleted_docs": 0,
                                "size_in_bytes": 6406307,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_1": {
                                "generation": 1,
                                "num_docs": 14097,
                                "deleted_docs": 0,
                                "size_in_bytes": 4702834,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_2": {
                                "generation": 2,
                                "num_docs": 1766,
                                "deleted_docs": 0,
                                "size_in_bytes": 801728,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_3": {
                                "generation": 3,
                                "num_docs": 24677,
                                "deleted_docs": 0,
                                "size_in_bytes": 8677890,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_4": {
                                "generation": 4,
                                "num_docs": 66197,
                                "deleted_docs": 0,
                                "size_in_bytes": 20670999,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_5": {
                                "generation": 5,
                                "num_docs": 8773,
                                "deleted_docs": 0,
                                "size_in_bytes": 3353062,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_6": {
                                "generation": 6,
                                "num_docs": 140084,
                                "deleted_docs": 0,
                                "size_in_bytes": 38727079,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_7": {
                                "generation": 7,
                                "num_docs": 102668,
                                "deleted_docs": 0,
                                "size_in_bytes": 29354176,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_8": {
                                "generation": 8,
                                "num_docs": 11886,
                                "deleted_docs": 0,
                                "size_in_bytes": 3646252,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_9": {
                                "generation": 9,
                                "num_docs": 481359,
                                "deleted_docs": 0,
                                "size_in_bytes": 124498298,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_a": {
                                "generation": 10,
                                "num_docs": 420947,
                                "deleted_docs": 0,
                                "size_in_bytes": 110980771,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_b": {
                                "generation": 11,
                                "num_docs": 122864,
                                "deleted_docs": 0,
                                "size_in_bytes": 33941196,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_c": {
                                "generation": 12,
                                "num_docs": 55618,
                                "deleted_docs": 0,
                                "size_in_bytes": 13156027,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_d": {
                                "generation": 13,
                                "num_docs": 28840,
                                "deleted_docs": 0,
                                "size_in_bytes": 7174693,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_e": {
                                "generation": 14,
                                "num_docs": 493797,
                                "deleted_docs": 0,
                                "size_in_bytes": 116969201,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_f": {
                                "generation": 15,
                                "num_docs": 237488,
                                "deleted_docs": 0,
                                "size_in_bytes": 58809773,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_g": {
                                "generation": 16,
                                "num_docs": 47355,
                                "deleted_docs": 0,
                                "size_in_bytes": 12706968,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            }
                        }
                    }
                ],
                "3": [
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "modFRW6URWi6fqeFypw2fg"
                        },
                        "num_committed_segments": 14,
                        "num_search_segments": 14,
                        "segments": {
                            "_0": {
                                "generation": 0,
                                "num_docs": 24750,
                                "deleted_docs": 0,
                                "size_in_bytes": 7997829,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_1": {
                                "generation": 1,
                                "num_docs": 15401,
                                "deleted_docs": 0,
                                "size_in_bytes": 5526373,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_2": {
                                "generation": 2,
                                "num_docs": 4274,
                                "deleted_docs": 0,
                                "size_in_bytes": 1670168,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_3": {
                                "generation": 3,
                                "num_docs": 74714,
                                "deleted_docs": 0,
                                "size_in_bytes": 23282221,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_4": {
                                "generation": 4,
                                "num_docs": 49504,
                                "deleted_docs": 0,
                                "size_in_bytes": 16640256,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_5": {
                                "generation": 5,
                                "num_docs": 1152,
                                "deleted_docs": 0,
                                "size_in_bytes": 425884,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_6": {
                                "generation": 6,
                                "num_docs": 723774,
                                "deleted_docs": 0,
                                "size_in_bytes": 185696573,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_7": {
                                "generation": 7,
                                "num_docs": 374910,
                                "deleted_docs": 0,
                                "size_in_bytes": 102114378,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_8": {
                                "generation": 8,
                                "num_docs": 144763,
                                "deleted_docs": 0,
                                "size_in_bytes": 40681188,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_9": {
                                "generation": 9,
                                "num_docs": 72038,
                                "deleted_docs": 0,
                                "size_in_bytes": 16996726,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_a": {
                                "generation": 10,
                                "num_docs": 60729,
                                "deleted_docs": 0,
                                "size_in_bytes": 14683267,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_b": {
                                "generation": 11,
                                "num_docs": 434765,
                                "deleted_docs": 0,
                                "size_in_bytes": 102456131,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_c": {
                                "generation": 12,
                                "num_docs": 87455,
                                "deleted_docs": 0,
                                "size_in_bytes": 23013397,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_d": {
                                "generation": 13,
                                "num_docs": 211170,
                                "deleted_docs": 0,
                                "size_in_bytes": 53235342,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            }
                        }
                    }
                ],
                "4": [
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "modFRW6URWi6fqeFypw2fg"
                        },
                        "num_committed_segments": 13,
                        "num_search_segments": 13,
                        "segments": {
                            "_0": {
                                "generation": 0,
                                "num_docs": 29157,
                                "deleted_docs": 0,
                                "size_in_bytes": 9596750,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_1": {
                                "generation": 1,
                                "num_docs": 25289,
                                "deleted_docs": 0,
                                "size_in_bytes": 9046435,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_2": {
                                "generation": 2,
                                "num_docs": 101070,
                                "deleted_docs": 0,
                                "size_in_bytes": 30725411,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_3": {
                                "generation": 3,
                                "num_docs": 36789,
                                "deleted_docs": 0,
                                "size_in_bytes": 12505981,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_4": {
                                "generation": 4,
                                "num_docs": 16360,
                                "deleted_docs": 0,
                                "size_in_bytes": 5759995,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_5": {
                                "generation": 5,
                                "num_docs": 627072,
                                "deleted_docs": 0,
                                "size_in_bytes": 161563718,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_6": {
                                "generation": 6,
                                "num_docs": 447053,
                                "deleted_docs": 0,
                                "size_in_bytes": 118268120,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_7": {
                                "generation": 7,
                                "num_docs": 131989,
                                "deleted_docs": 0,
                                "size_in_bytes": 37526334,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_8": {
                                "generation": 8,
                                "num_docs": 50138,
                                "deleted_docs": 0,
                                "size_in_bytes": 12468992,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_9": {
                                "generation": 9,
                                "num_docs": 54350,
                                "deleted_docs": 0,
                                "size_in_bytes": 12764422,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_a": {
                                "generation": 10,
                                "num_docs": 245263,
                                "deleted_docs": 0,
                                "size_in_bytes": 62303574,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_b": {
                                "generation": 11,
                                "num_docs": 449658,
                                "deleted_docs": 0,
                                "size_in_bytes": 105290863,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_c": {
                                "generation": 12,
                                "num_docs": 66300,
                                "deleted_docs": 0,
                                "size_in_bytes": 17125599,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            }
                        }
                    }
                ]
            }
        }
    }
}

Index size:

"store": {
    "size_in_bytes": 2975896540,
    "reserved_in_bytes": 0
},
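To put the per-segment numbers in perspective, here is a minimal sketch that totals the four segments visible in this excerpt (`_9`, `_a`, `_b`, `_c`). The values are copied from the stats above. The `summarize` helper is a hypothetical name used only for illustration, and the totals cover just these four segments, not the whole index.

```python
# Segment stats copied from the four segments shown in this excerpt.
segments = {
    "_9": {"num_docs": 54350, "size_in_bytes": 12764422},
    "_a": {"num_docs": 245263, "size_in_bytes": 62303574},
    "_b": {"num_docs": 449658, "size_in_bytes": 105290863},
    "_c": {"num_docs": 66300, "size_in_bytes": 17125599},
}


def summarize(segs):
    """Return (total docs, total bytes, name of the largest segment by size)."""
    total_docs = sum(s["num_docs"] for s in segs.values())
    total_bytes = sum(s["size_in_bytes"] for s in segs.values())
    largest = max(segs, key=lambda name: segs[name]["size_in_bytes"])
    return total_docs, total_bytes, largest


total_docs, total_bytes, largest = summarize(segments)
print(f"{len(segments)} segments, {total_docs} docs, "
      f"{total_bytes / 2**20:.1f} MiB, largest: {largest}")
```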

harshavamsi avatar Nov 02 '23 23:11 harshavamsi

The segment sizes are too small to show any noticeable difference. I can work with you on it next week.

rishabhmaurya avatar Nov 03 '23 01:11 rishabhmaurya

@rishabhmaurya The POC you tried would only work for MatchAllQuery. I tried exactly the same thing a couple of months back, but a plain match-all query combined with sorting is rarely used, IMO, so I skipped prototyping it.

gashutos avatar Nov 06 '23 06:11 gashutos

+1 on @gashutos's point. @rishabhmaurya - Do you have a specific use case where this will be useful?

backslasht avatar Nov 06 '23 07:11 backslasht

@gashutos thanks for looking. Yes, I mentioned in the POC that it is meant to work only for MatchAllQuery with no deleted docs. It will help in two cases:

  1. Desc numeric sort on any numeric field - This should make iteration over bigger segments faster, assuming there is no index sort on this numeric field and the Lucene index size is significant (in GBs). Since such queries usually span all segments, it should, in theory, speed things up. I think this is a common use case, and we capture this query type in most benchmarks.
  2. Desc sort on the @timestamp field with LogByteSize as the merge policy - After a force merge, the smallest segment could be big enough to make the desc sort query slow. This will help in such cases too.

Can you point me to your POC/issue, and explain why you think it's a rare case? Thank you.
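For readers following along, the match-all, no-deletes idea discussed in this thread can be sketched roughly as follows. This is a toy model, not the POC code and not Lucene's API: BKD leaves are represented as a plain list sorted by value, and `Leaf` and `lower_bound_for_top_n_desc` are hypothetical names. For a desc sort, it walks leaves from the highest values down until at least `n` docs are covered, then returns a lower bound that could be attached as an implicit range filter.

```python
from typing import List, NamedTuple, Optional


class Leaf(NamedTuple):
    """Toy stand-in for a 1D BKD leaf: value range plus number of docs."""
    min_value: int
    max_value: int
    doc_count: int


def lower_bound_for_top_n_desc(leaves: List[Leaf], n: int) -> Optional[int]:
    """Return a value v such that at least n docs have value >= v.

    Assumes leaves are sorted by ascending value and the segment has no
    deleted docs, so every counted doc is a live hit for a match-all query.
    Returns None when the segment holds fewer than n docs (no bound helps).
    """
    seen = 0
    for leaf in reversed(leaves):  # start from the highest values
        seen += leaf.doc_count
        if seen >= n:
            return leaf.min_value
    return None


leaves = [Leaf(0, 9, 512), Leaf(10, 99, 512), Leaf(100, 999, 512)]
print(lower_bound_for_top_n_desc(leaves, 10))    # bound from the last leaf
print(lower_bound_for_top_n_desc(leaves, 600))   # needs the last two leaves
```

With deleted docs the leaf counts would overestimate the live hits, which is why the sketch is limited to the no-deletes case.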

rishabhmaurya avatar Nov 06 '23 16:11 rishabhmaurya

@rishabhmaurya The question of why desc order sort is slower than asc order sort can be divided into two parts.

  1. For time-series indices, documents are nearly sorted in asc order (which will also be the case with the LogByteSize merge policy). RFC in Lucene -> https://github.com/apache/lucene/issues/12448

  2. For non-time-series workloads, our docId-based disjoint iterator with the BKD-based competitive iterator works only in asc order of docIds. Reverse BKD-based iteration -> https://github.com/opensearch-project/OpenSearch/issues/7680

We think it is a rare scenario because, in production, we generally don't see a sort on a single field without a filtering clause wrapping it. Again, this observation is based on the user use cases I have seen.
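The first point can be illustrated with a toy top-k collector. This is a simplified model under stated assumptions, not Lucene's comparator or competitive iterator: values arrive in doc ID order, and `count_competitive` (a hypothetical helper) counts how many docs actually enter the top-k heap. When values are already ascending in doc ID order, an asc sort stops finding competitive docs after the first `k`, while a desc sort finds every doc competitive.

```python
import heapq
from typing import Iterable, List


def count_competitive(values: Iterable[int], k: int, descending: bool) -> int:
    """Count docs that enter the top-k heap when visited in doc ID order."""
    heap: List[int] = []  # min-heap of keys; the root is the weakest current hit
    competitive = 0
    for v in values:
        key = v if descending else -v  # a larger key means a better hit
        if len(heap) < k:
            heapq.heappush(heap, key)
            competitive += 1
        elif key > heap[0]:
            heapq.heapreplace(heap, key)  # evict the weakest hit
            competitive += 1
    return competitive


timestamps = range(1000)  # values already ascending in doc ID order
asc_hits = count_competitive(timestamps, k=10, descending=False)
desc_hits = count_competitive(timestamps, k=10, descending=True)
print(f"asc: {asc_hits} competitive docs, desc: {desc_hits} competitive docs")
```

In the asc case everything after the first `k` docs could be skipped, which is the property a doc-ID-ordered competitive iterator can exploit. In the desc case nothing can be skipped when iterating in the same order.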

gashutos avatar Nov 06 '23 18:11 gashutos

Posting some more numbers here: same workload and instance, but this time force merged into 1 large segment and running on a single primary shard, to see whether that has any impact:

Non optimized cluster:

|                                        50th percentile latency |     desc_sort_population |     9.38135 |     ms |
|                                        90th percentile latency |     desc_sort_population |     10.1048 |     ms |
|                                        99th percentile latency |     desc_sort_population |     10.3617 |     ms |
|                                       100th percentile latency |     desc_sort_population |     10.7949 |     ms |
|                                   50th percentile service time |     desc_sort_population |     7.83975 |     ms |
|                                   90th percentile service time |     desc_sort_population |     8.14815 |     ms |
|                                   99th percentile service time |     desc_sort_population |     8.64486 |     ms |
|                                  100th percentile service time |     desc_sort_population |     8.80505 |     ms |
|                                                     error rate |     desc_sort_population |           0 |      % |

Optimized cluster:

|                                        50th percentile latency |     desc_sort_population |     13.4777 |     ms |
|                                        90th percentile latency |     desc_sort_population |     14.0544 |     ms |
|                                        99th percentile latency |     desc_sort_population |      14.372 |     ms |
|                                       100th percentile latency |     desc_sort_population |     15.1146 |     ms |
|                                   50th percentile service time |     desc_sort_population |     11.8186 |     ms |
|                                   90th percentile service time |     desc_sort_population |      12.006 |     ms |
|                                   99th percentile service time |     desc_sort_population |     12.4779 |     ms |
|                                  100th percentile service time |     desc_sort_population |       12.48 |     ms |
|                                                     error rate |     desc_sort_population |           0 |      % |

I will dive into the Lucene code path to understand where we're spending time when running this workload.
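To make the two result sets above easier to compare, here is a small sketch that computes the relative change per metric. The numbers are copied from the tables in this comment. `percent_change` is a hypothetical helper, and a positive value means the run labelled "Optimized cluster" took longer than the one labelled "Non optimized cluster".

```python
# Values in ms, copied from the two desc_sort_population tables above.
non_optimized = {
    "50th percentile latency": 9.38135,
    "90th percentile latency": 10.1048,
    "99th percentile latency": 10.3617,
    "100th percentile latency": 10.7949,
    "50th percentile service time": 7.83975,
    "90th percentile service time": 8.14815,
    "99th percentile service time": 8.64486,
    "100th percentile service time": 8.80505,
}
optimized = {
    "50th percentile latency": 13.4777,
    "90th percentile latency": 14.0544,
    "99th percentile latency": 14.372,
    "100th percentile latency": 15.1146,
    "50th percentile service time": 11.8186,
    "90th percentile service time": 12.006,
    "99th percentile service time": 12.4779,
    "100th percentile service time": 12.48,
}


def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


changes = {m: percent_change(non_optimized[m], optimized[m]) for m in non_optimized}
for metric, change in changes.items():
    print(f"{metric:32s} {change:+.1f}%")
```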

harshavamsi avatar Nov 07 '23 01:11 harshavamsi

Hi @harshavamsi - will documentation be required for this feature in 2.12?

hdhalter avatar Dec 11 '23 23:12 hdhalter

will documentation be required for this feature in 2.12?

This is purely an internal optimization task. It should not require any documentation.

msfroh avatar Dec 12 '23 01:12 msfroh

Hi, are we on track for this to be released in 2.12?

kiranprakash154 avatar Jan 19 '24 00:01 kiranprakash154

Pushing this out to v2.13, since this optimization is still in the investigation stage. Although the benchmark numbers look promising, it requires a deeper dive into the Lucene code path to understand where we're spending time and to come up with improvement opportunities.

getsaurabh02 avatar Jan 29 '24 22:01 getsaurabh02

Moved it to 2.14.0 as per the discussion with @harshavamsi

bbarani avatar Mar 04 '24 22:03 bbarani

We should try benchmarking numeric sort queries with https://github.com/apache/lucene/pull/13149.

Based on the explanation at https://blunders.io/posts/es-benchmark-4-inlining, we may see a significant improvement to numeric sorting.

msfroh avatar Mar 04 '24 22:03 msfroh

Tagging @opensearch-project/benchmark-core team

bbarani avatar Mar 05 '24 22:03 bbarani