[FEATURE] - Support pagination for PPL and SQL query
Is your feature request related to a problem?
- The new engine fetches a default size of index from OpenSearch set by this setting, the default value is 200. It is a blocker for ML plugin to pull more data by using PPL.
- Cursor only support in V1 engine. . We should support it in V2 engine.
What solution would you like?
- [ ] add pagination support for operator which will unblock ML plugin use case.
- [ ] add cursor support for V2 engine.
[Draft] Design Doc
1 Overview
Support pagination requests for PPL and SQL query
2 Problem Statements
The V2 engine limits the response size of a query. This is a blocker for ML plugin to pull more data by using Piped Processing Language (PPL). While the V1 SQL engine supports pagination using cursor scrolling, PPL uses the V2 engine explicitly. Therefore, we need to migrate cursor support to the V2 engine.
The legacy engine took a stateless approach: the cursor encodes the necessary information for re-constructing the query: https://github.com/opensearch-project/sql/blob/main/docs/dev/Pagination.md This worked well for SQL, but as PPL allows for extra calculations on the data rows returned from index scanning, we might not be able to rebuild context on the fly.
3 Requirements
3.1 Use Cases
3.1.1 Scrolling through web interface
SQL cursor Sample requests:
POST /_plugin/_sql
{
"fetch_size": 5,
"query": "SELECT firstname, lastname FROM accounts WHERE age > 20 ORDER BY state"
}
{
"schema": [...],
"cursor": "eyJhIjp7fSwicyI6IkRYRjFaWEo1UVc1a1JtVjBZMmdCQUFBQUFBQUFBQU1XZWpkdFRFRkZUMlpTZEZkeFdsWnJkRlZoYnpaeVVRPT0iLCJjIjpbeyJuYW1lIjoiZmlyc3RuYW1lIiwidHlwZSI6InRleHQifSx7Im5hbWUiOiJsYXN0bmFtZSIsInR5cGUiOiJ0ZXh0In1dLCJmIjo1LCJpIjoiYWNjb3VudHMiLCJsIjo5NTF9",
"total": 956,
"datarows": [...],
"size": 5,
"status": 200
}
POST /_plugin/_sql
{
"cursor": "eyJhIjp7fSwicyI6IkRYRjFaWEo1UVc1a1JtVjBZMmdCQUFBQUFBQUFBQU1XZWpkdFRFRkZUMlpTZEZkeFdsWnJkRlZoYnpaeVVRPT0iLCJjIjpbeyJuYW1lIjoiZmlyc3RuYW1lIiwidHlwZSI6InRleHQifSx7Im5hbWUiOiJsYXN0bmFtZSIsInR5cGUiOiJ0ZXh0In1dLCJmIjo1LCJpIjoiYWNjb3VudHMiLCJsIjo5NTF9"
}
{
"schema": [...],
"cursor": <next cursor>,
"total": 956,
"datarows": [...],
"size": 5,
"status": 200
}
POST /_plugin/_ppl
{
"fetch_size": 5,
"query": "source=accounts | where age > 20 | sort state | fields firstname,lastname"
}
{
"schema": [...],
"cursor": <first cursor>,
"total": 956,
"datarows": [...],
"size": 5,
"status": 200
}
POST /_plugin/_ppl
{
"cursor": <first cursor>
}
{
"schema": [...],
"cursor": <next cursor>,
"total": 956,
"datarows": [...],
"size": 5,
"status": 200
}
3.1.2 Interacting with ML command in PPL to load full result set
Sample request:
POST /_plugin/_ppl
{
"query": "source=iris_data | fields sepal_length_in_cm, sepal_width_in_cm, petal_length_in_cm, petal_width_in_cm | kmeans 3"
}
Users do not need to add the fetch_size parameter. The ml command implicitly invokes search and scroll to fetch all dataset.
3.1.3 Extending query size limit
#703 This allows for querying and processing large data sets. With this, the 3.1.2 use case can be resolved to some extent. Sample request:
POST /_plugin/_ppl
{
"query": "source=iris_data | head 100000 | fields sepal_length_in_cm, sepal_width_in_cm, petal_length_in_cm, petal_width_in_cm | kmeans 3"
}
3.2 Functional Requirements
- Scrolling support for V2 query backend engine
- Scrolling support for SQL and PPL requests
4 Measure of Success
- Be able to run sample queries in Section 3.1.1
- #703
5 Design
5.1 Background
5.1.1 Current execution flow

Query request
POST /_plugin/_sql
{
"query": "SELECT * FROM table"
}
execution path: (1a) → (2a) → (3)
Scroll query request
POST /_plugin/_sql
{
"fetch_size": 5,
"query": "SELECT * FROM table"
}
execution path: (1a) → (2b) → (3) → legacy OpenSearch request
Scroll cursor request
POST /_plugin/_sql
{
"cursor": <cursor>
}
execution path: (1b) → legacy cursor executor
5.2 Design considerations
5.2.1 Physical plan generation
Physical plan for index scan does not solely depend on the logical plan for the query anymore. The presence/absence of fetch_size parameter will decide whether to make a regular query request or a scroll request to OpenSearch. Therefore, the request handler should pass this info down to the bottom where physical plan is generated.
Option 1
Let the Planner generate different physical plans for query and scroll requests.
- plan(logical plan, is not scroll) => physical index scan with query request
- plan(logical plan, is scroll) => physical index scan with scroll request
Option 2
Let the Analyzer generate different logical plans for query and scroll requests.
- analyze(unresolved plan, is not scroll) => logical index scan for query request
- analyze(unresolved plan, is scroll) => logical index scan for scroll request
5.2.2 Scrolling interaction with limit/offset
Limit
#703
Offset
OpenSearch doesn't allow scroll requests to have offset != 0.
To bypass this restriction, we can set the offset of a scroll request to be 0 and skip the first few results for the user before returning. However, it is suggested that search_after be used if we need to page through more than index.max_result_window hits.
5.2.3 Physical plan retention
Extra calculations are required for some queries, such as the parse, eval, dedup commands in ppl.
When handling pagination for source=index | other commands, the scroll request for OpenSearch handles only the first part source=index, but not the rest | other commands. We should create context and keep it alive for each scroll request.
Context lookup
Where does such context live? In local memory? Shared storage?
5.2.4 Backward compatibility with legacy cursor format
https://github.com/opensearch-project/sql/blob/main/legacy/src/main/java/org/opensearch/sql/legacy/cursor/DefaultCursor.java This cursor was designed for stateless requests.
5.3 Design Overview
5.3.1 Journey of Request

5.3.2 Backend: OpenSearch Request
Interface to the OpenSearch engine
- OpenSearchQueryRequest: for regular query request
- OpenSearchScrollRequest: for scroll request
- Stateful to maintain scroll ID between calls to client search method
5.3.3 Backend: Physical Plan Generation
We add a new request builder to the OpenSearchIndexScan plan. Currently, physical plan depends solely on the (optimized) logical plan. However, the same logical plans can lead to different physical plans because:
- A regular query request and a scroll query request generate the exact same logical plan, but they need different physical plans to invoke different OpenSearch requests.
- The presence/absence of ML commands can determine how we want to fetch data.
- #703 other commands could also require scrolling depending on the parameter passed in
We add a PlanContext component to solve the problem. The PlanContext can be set during request handling and query analyzing. When building the physical plan, the planner uses it to decide which index scan mode to use.
5.3.4 Frontend: Request Handling
If the request is a scroll query request, set PlanContext so that this information is passed to the planner later. This same logic will apply to both SQL and PPL request handling.
5.3.5 Frontend: Response Formatting
5.3.6 Cursor generation and mapping
In the second cursor request can there be a scenario with some extra calculations along with push down. how do you retain entire physical plan?
In the second cursor request can there be a scenario with some extra calculations along with push down. how do you retain entire physical plan?
Thanks for pointing this out. I've come up with a design idea that takes this into consideration. It also avoids many of the problems faced in the PoC PR #693
- The design in #693 uses the the scroll ID returned from
OpenSearchScrollRequestas thecursorresponse for the sql/ppl request. New design hides it and exposes a new cursor we generated, so we no longer need to breakPhysicalPlaninterface to get the scroll ID. - retains the physical plan for extra calculations
relate to https://github.com/opensearch-project/sql/issues/947
What's the status of this?
WIP, you can track it in https://github.com/Bit-Quill/opensearch-project-sql/pull/226
https://user-images.githubusercontent.com/88679692/224208630-8d38d833-abf8-4035-8d15-d5fb4382deca.mp4
https://github.com/opensearch-project/sql/assets/88679692/f21754d0-c228-4959-b38a-736b36c4a2c2
- [x] - Pagination basement - support
select * from <table>#1666 - [x] - Support
WHEREclause - #1500 - [x] - Support
ORDER BYclause - #1599 - [ ] - Support
LIMITclause - WIP - [ ] - Support system queries - tracked by #1712
- [ ] - Support
OFFESTclause - [ ] - Support aggregation and
GROUP BYclause - [ ] - Support
NESTEDfunction - tracked by #1718 - [ ] - Support pagination in PPL
Superseded by #1759