dce Query Parameters for API Pagination

The prior art in the repo has us:

- Return a Link header, which includes the URL for the next page of results
- The URL includes query parameters for the "next" primary key in DynDB, e.g.: GET /leases?principalId=&nextPrincipalId=&nextAccountId=&
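As a minimal sketch of that prior art, the Link header can be assembled from the LastEvaluatedKey that DynamoDB returns. This assumes the low-level client's attribute-value format (`{"S": ...}`) and key attributes named `PrincipalId` and `AccountId`; `build_next_link` and the example host are hypothetical.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Assumed mapping from DynamoDB key attributes to the "next" query params.
KEY_TO_PARAM = {"PrincipalId": "nextPrincipalId", "AccountId": "nextAccountId"}


def build_next_link(base_url: str, params: Dict[str, str],
                    last_evaluated_key: Optional[dict]) -> Optional[str]:
    """Return a Link header value for the next page, or None on the last page."""
    # DynamoDB omits LastEvaluatedKey when there is nothing left to read.
    if not last_evaluated_key:
        return None
    query = dict(params)  # keep the caller's filters, e.g. principalId
    for attr, param in KEY_TO_PARAM.items():
        # Low-level client values look like {"S": "jdoe1234"}.
        query[param] = last_evaluated_key[attr]["S"]
    return f'<{base_url}?{urlencode(query)}>; rel="next"'
```

For example, `build_next_link("https://api.example.com/leases", {"principalId": "jdoe1234"}, {"PrincipalId": {"S": "jdoe1234"}, "AccountId": {"S": "123456789012"}})` returns `<https://api.example.com/leases?principalId=jdoe1234&nextPrincipalId=jdoe1234&nextAccountId=123456789012>; rel="next"`.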

That URL would look a little nicer if we could treat Lease IDs as a primary key (GET /leases?principalId=&nextId=), though we may actually need to migrate to using Lease.ID as the primary key to get that to work (today it's a GSI, for historical reasons).

One current pain point with pagination queries is that if your DB query uses a FilterExpression, DynamoDB may return empty pages of results in between pages of actual results. This is because DynDB reads a page of items first and only then applies the filter, so a page can end up with no matching items at all. This is not a great experience for end users. I'd like to either:

1. Only support API query params that can be queried against an index (no FilterExpressions)
2. Hide this behavior from end users (e.g. continue to paginate server-side until we get the requested number of records)
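To make the trade-off concrete, here is an in-memory simulation (no DynamoDB calls; `scan_page` and `fetch_filled_page` are hypothetical names). `scan_page` mimics a Scan with a FilterExpression: it reads a fixed number of items before filtering, so it can return an empty page that still has a next cursor. `fetch_filled_page` sketches option 2 by looping server-side until it has `limit` matches.

```python
from typing import Callable, List, Optional, Tuple

Item = dict
Row = Tuple[int, Item]  # (position, item); position stands in for the primary key


def scan_page(table: List[Item], start: int, page_size: int,
              keep: Callable[[Item], bool]) -> Tuple[List[Row], Optional[int]]:
    """Simulated Scan: read up to page_size items, THEN apply the filter."""
    end = start + page_size
    matches = [(i, table[i]) for i in range(start, min(end, len(table))) if keep(table[i])]
    next_start = end if end < len(table) else None  # plays the role of LastEvaluatedKey
    return matches, next_start


def fetch_filled_page(table: List[Item], start: int, limit: int,
                      keep: Callable[[Item], bool],
                      read_size: int) -> Tuple[List[Item], Optional[int]]:
    """Option 2: keep scanning server-side until `limit` matches or the table ends."""
    collected: List[Row] = []
    cursor: Optional[int] = start
    while cursor is not None and len(collected) < limit:
        matches, cursor = scan_page(table, cursor, read_size, keep)
        collected.extend(matches)
    if len(collected) > limit:
        # We over-read: resume right after the last item actually returned,
        # much as you would pass that item's key as the next ExclusiveStartKey.
        collected = collected[:limit]
        cursor = collected[-1][0] + 1
    return [item for _, item in collected], cursor
```

With 10 items where only positions 0, 7, 8 and 9 are Active, `scan_page(table, 3, 3, keep)` returns `([], 6)`, an empty page with a live cursor, while `fetch_filled_page(table, 0, 2, keep, 3)` returns items 0 and 7 with cursor 8. The cost of option 2 is that one API request can fan out into many reads when the filter is selective.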

Option 1 would be much simpler, if we're OK with it. For example, the GET /leases?status= request should be using our GSI on LeaseStatus (currently it does a full table scan).
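A sketch of the request that option 1 implies for GET /leases?status=, built as keyword arguments for boto3's low-level `client.query()`. The table name `Leases`, the index name `LeaseStatus`, the key attribute name, and `lease_status_query` itself are assumptions; check them against the actual table definition.

```python
from typing import Any, Dict, Optional


def lease_status_query(status: str, limit: int,
                       start_key: Optional[dict] = None) -> Dict[str, Any]:
    """kwargs for client.query() against an assumed LeaseStatus GSI (no Scan, no filter)."""
    params: Dict[str, Any] = {
        "TableName": "Leases",        # assumed table name
        "IndexName": "LeaseStatus",   # assumed GSI name
        # The status is part of the index key, so it goes in the key condition;
        # with no FilterExpression, every item read is returned.
        "KeyConditionExpression": "LeaseStatus = :status",
        "ExpressionAttributeValues": {":status": {"S": status}},
        "Limit": limit,
    }
    if start_key:
        # For a GSI, LastEvaluatedKey carries both index and table key
        # attributes; pass it back unchanged.
        params["ExclusiveStartKey"] = start_key
    return params
```

Usage would look like `boto3.client("dynamodb").query(**lease_status_query("Active", 25))`.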

Nov 04 '19 17:11 eschwartz

Here I would focus on the experience of the consumers of the RESTful API and work backwards from there, unless we absolutely cannot because of how DDB works. As an example, GET /leases?leaseStatus=active&page=2&size=5&sort=principalId is a common-ish format for REST APIs that support paging and sorting (I've also seen limit or count instead of size; I think we just pick one and make it the standard).
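A minimal sketch of parsing that query-string format with the standard library. The defaults (page 1, size 25), the cap of 100, and the names `PageRequest` and `parse_page_request` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs

DEFAULT_SIZE = 25  # assumed default page size
MAX_SIZE = 100     # assumed upper bound to protect the table


@dataclass
class PageRequest:
    lease_status: Optional[str]
    page: int
    size: int
    sort: Optional[str]


def parse_page_request(query: str) -> PageRequest:
    """Parse e.g. 'leaseStatus=active&page=2&size=5&sort=principalId'."""
    qs = parse_qs(query)

    def first(name: str) -> Optional[str]:
        values = qs.get(name)
        return values[0] if values else None

    # int() raises ValueError on non-numeric input, like the range check below.
    page = int(first("page") or 1)
    size = int(first("size") or DEFAULT_SIZE)
    if page < 1 or not 1 <= size <= MAX_SIZE:
        raise ValueError(f"invalid paging params: page={page}, size={size}")
    return PageRequest(first("leaseStatus"), page, size, first("sort"))
```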

That seems more intuitive to me than the next... params.

Nov 04 '19 17:11 nathanagood

@nathanagood I think I generally agree with you.

For context, the ?nextPrincipalId=&nextAccountId= syntax is a direct reflection of the DynamoDB API: for pagination, it gives you the primary key to resume reading from (LastEvaluatedKey), which in the case of leases is a compound primary key of principal/account.
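Going the other way, a sketch of turning those params back into the ExclusiveStartKey DynamoDB expects. The attribute names `PrincipalId` and `AccountId` and the function name are assumptions. `parse_qs` drops blank values by default, so the empty `nextPrincipalId=&nextAccountId=` form is treated as a first-page request.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs


def exclusive_start_key(query: str) -> Optional[Dict[str, Dict[str, str]]]:
    """Map ?nextPrincipalId=&nextAccountId= onto a compound ExclusiveStartKey."""
    qs = parse_qs(query)  # blank values are dropped by default
    principal = qs.get("nextPrincipalId", [None])[0]
    account = qs.get("nextAccountId", [None])[0]
    if principal is None and account is None:
        return None  # first page: start from the beginning
    if principal is None or account is None:
        # Both halves of the compound key are required to resume a read.
        raise ValueError("nextPrincipalId and nextAccountId must be sent together")
    return {"PrincipalId": {"S": principal}, "AccountId": {"S": account}}
```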

We also currently support a ?limit= param. The page param is a little trickier to translate into a DynDB query, though.
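One reason `page` is tricky: DynamoDB can only resume from a key (ExclusiveStartKey), not jump to an offset, so serving ?page=N&size=S means reading the N-1 preceding pages first. A sketch against a hypothetical `fetch(start_key, limit)` function, with an in-memory stand-in for the table (`make_fake_fetch`, also hypothetical):

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: fetch(start_key, limit) -> (items, last_evaluated_key)
Fetch = Callable[[Optional[int], int], Tuple[List[dict], Optional[int]]]


def fetch_page_by_number(fetch: Fetch, page: int, size: int) -> Tuple[List[dict], int]:
    """Serve ?page=&size= on top of key-based reads; returns (items, reads used)."""
    start_key: Optional[int] = None
    reads = 0
    for _ in range(page - 1):
        _, start_key = fetch(start_key, size)  # read a page only to learn its last key
        reads += 1
        if start_key is None:
            return [], reads  # the requested page is past the end
    items, _ = fetch(start_key, size)
    return items, reads + 1


def make_fake_fetch(data: List[dict]) -> Fetch:
    """In-memory stand-in for a Query: keys are the items' positional ids."""
    def fetch(start_key: Optional[int], limit: int) -> Tuple[List[dict], Optional[int]]:
        begin = 0 if start_key is None else start_key + 1  # exclusive start
        chunk = data[begin:begin + limit]
        more = begin + limit < len(data)
        return chunk, (chunk[-1]["id"] if more else None)
    return fetch
```

The read count grows linearly with the page number, and page boundaries can shift if items are written between requests, which is part of why key-based cursors are the native DynamoDB model.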

Another idea I'd had was to convert the next<PrimaryKey> params into a single "pagination token" (e.g. just an encoded representation of the "next" keys). We could have our API respond with the pagination token in a response header (so we don't have to restructure our JSON objects), and then clients can pass the token into subsequent requests:

GET /leases?paginationToken=bmV4dFByaW5jaXBhbElkPWpkb2UxMjM0Jm5leHRBY2NvdW50SWQ9MTIzNDU2Nzg5MDEy

...or you could take it in a request header, too.
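The example token above is plain base64 of `nextPrincipalId=jdoe1234&nextAccountId=123456789012`. A sketch of encoding and decoding tokens in that style, using URL-safe base64 so the token can sit in a query string without escaping; the function names are hypothetical. Base64 is an encoding, not protection: anyone can decode or hand-craft a token, so the server should still validate what it decodes.

```python
import base64
from typing import Dict
from urllib.parse import parse_qsl, urlencode


def encode_pagination_token(next_keys: Dict[str, str]) -> str:
    """Pack the "next" keys into one opaque-looking, URL-safe string."""
    query = urlencode(next_keys)
    # Strip "=" padding so the token needs no escaping in a URL.
    return base64.urlsafe_b64encode(query.encode("utf-8")).decode("ascii").rstrip("=")


def decode_pagination_token(token: str) -> Dict[str, str]:
    """Reverse encode_pagination_token, restoring any stripped padding."""
    padded = token + "=" * (-len(token) % 4)
    query = base64.urlsafe_b64decode(padded.encode("ascii")).decode("utf-8")
    return dict(parse_qsl(query))
```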

This would do a couple of things:

- Combine the principalId/accountId into a single param for /leases endpoints
- Create a common interface for paginating different endpoints (usable with the same syntax for /accounts, /usages, etc.)
- It may be a more familiar interface for general users (I've seen pagination tokens before, but never a "nextId" param)
- IMO it's a little less work for clients than having to parse the Link header and regenerate the query string.

For example, here's the code we're currently using for our UI portal to paginate leases:

        # Imports used below (normally at the top of the module)
        import re
        from urllib.parse import urlparse

        # If there's no link header
        # then we're done paginating
        if 'Link' not in res.headers:
            return

        # Inspect the `Link` response header
        # to get the next page of results.
        # `[^>]+` stops at the first `>`, so other rels in the header can't leak in.
        link_search = re.search(
            r'<(http[^>]+)>; rel="next"',
            res.headers['Link'],
        )
        if link_search is None:
            raise Exception(
                f"Failed to paginate request to {method} {endpoint}: "
                f"Link header is malformed: {res.headers['Link']}")
        link_url = link_search.group(1)

        # Parse the Link URL
        # so we can pass the values to our next request
        parsed_url = urlparse(link_url)

        # We need to do some funky reconstruction of our URL here,
        # because we already have a `request()` method abstraction around requests
        # that uses a configured hostname and endpoint prefix.
        # `netloc` (unlike `hostname`) keeps any port, so the prefix still matches.
        endpoint = f"{parsed_url.scheme}://{parsed_url.netloc}{parsed_url.path}" \
            .replace(self._base_url, "")
        # Keep the query string: it carries the "next" keys for the following page.
        if parsed_url.query:
            endpoint = f"{endpoint}?{parsed_url.query}"

        self.request("GET", endpoint)
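For comparison, a client loop under the pagination-token proposal needs no Link parsing or URL reconstruction. This sketch assumes a hypothetical `fetch(token)` that sends the token (or omits it on the first call) and returns the page's items plus the next token read from a response header, or None when there are no more pages.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical: fetch(token) -> (items, next_token); next_token is None on the last page.
FetchPage = Callable[[Optional[str]], Tuple[List[dict], Optional[str]]]


def iter_all(fetch: FetchPage) -> Iterator[dict]:
    """Yield every item across pages by echoing the token back to the API."""
    token: Optional[str] = None
    while True:
        items, token = fetch(token)
        yield from items  # a page may be empty; keep going while a token remains
        if token is None:
            return
```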

Nov 05 '19 15:11 eschwartz

The issue I find with the pagination token is that it does not allow end consumers to generate API URLs in a predictable manner through simple string interpolation. Finding a way to get closer to a page/size or offset/limit scheme is the answer here, I think.

Nov 12 '19 19:11 marinatedpork