syncstorage-rs icon indicating copy to clipboard operation
syncstorage-rs copied to clipboard

Investigate pagination improvements

Open pjenvey opened this issue 5 years ago • 3 comments

Our basic pagination using LIMIT + OFFSET causes performance issues under Spanner. We should look into implementing alternatives, e.g.:

https://medium.com/swlh/why-you-shouldnt-use-offset-and-limit-for-your-pagination-4440e421ba87

Could the previous offset optimization improve performance? Or could we overload the extra metadata it uses via the offset parameter (e.g. <value>:offset) for implementing cursor based pagination?

pjenvey avatar Oct 23 '20 23:10 pjenvey

I think if we use the previous offset optimization, except have every timestamp always be unique and have : always be 0, this will fix the problem and avoid the unnecessary complicated logic for dealing with records that have identical timestamps.

fzzzy avatar Oct 26 '20 23:10 fzzzy

So, we have a few things going on here that are of note.

  1. posts have a "modification_time" assigned to them when they're loaded in. For any BSO loaded in batch, they get the same timestamp. This leads to a lot of duplication as noted above.

  2. spanner has a [commit_timestamp()](https://cloud.google.com/spanner/docs/commit-timestamp), which is great. Apparently we can add it to an existing TIMESTAMP column, I presume it become effective after it is applied, which may be a problem for the TB or so of existing data. MySQL does not have a similar feature.

  3. of all the known BSO data, BSO_ID is known to be unique and could be used to disambiguate items or provide some order when processing records to avoid breaks.

It should be possible, as @fzzzy pointed out, to effectively batch on modified with a sub-index of the id. This would remove most of the table scan, but will require some additional index work on the remainder. I don't know how that may impact performance at this time.

jrconlin avatar Nov 04 '20 23:11 jrconlin

Pending #911 for tests fixes.

jrconlin avatar Nov 13 '20 00:11 jrconlin