6900: Add activity stream pagination to all entities
Fixes #6900
Proposed fixes:
The goal of this PR is to implement the same pagination logic we have for datasets on the activities for Organization, Group and Users.
I will also take some time to clean some code and propose some improvements.
Notes:
- I moved forgotten webassets to the activity extension: f7d50a6d65d5fe6c3baac9487fd24ff23d96b86a
- I'm simplifying some one-line functions that make the code hard to follow: ba85b0f98db445e5cdd56167ccefc3ab8781966b
- Rename a template to avoid confusion: aade8d2ebeeeb24ddeb295801d18f2b1d6efc8f1
- Created new snippets for reusability 93afe20661b0519df59870d1a61b05b9cbf286d1
- Make some functions more explicit and avoid calling so many nested functions: f79904758e1e4993365d7c36065b15e49b195f22
Features:
- [X] includes tests covering changes
- [ ] includes updated documentation
- [X] includes user-visible changes
- [ ] includes API changes
- [ ] includes bugfix for possible backport
Please [X] all the boxes above that apply
While working on this PR I have been doing some benchmark.
While filtering by an indexed datetime field is faster that offset, it's worth noticing that the performance of SQLAlchemy's offset/limit is not bad:
In [29]: with app._wsgi_app.app_context():
...: %timeit helpers.call_action('package_activity_list',{}, id='api-package-1', offset=9500)
...:
23.7 ms ± 202 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [30]: before = datetime.fromisoformat('2022-06-16T14:14:00.627446').timestamp()
In [31]: with app._wsgi_app.app_context():
...: %timeit helpers.call_action('package_activity_list',{}, id='api-package-1', before=before)
...:
2.26 ms ± 13.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
It takes only a couple of ms to get 20 activities with an offset of 9500 (in a package containing 10k activities). Navigating back and forward using offset is quite quick and it does not add a lot of overhead. Rendering the activity page using offsets takes an average of 850ms which is inside the normal parameters of CKAN rendering performance.
It is also worth noticing that the complexity of implementing pagination with offset/limit is way more simpler that implementing it with date time indexes.
Also, flask-sqlalchemy implements it's pagination feature using offset too: https://github.com/pallets-eco/flask-sqlalchemy/blob/e5bf821072de0d55ff92891432004322864a7265/src/flask_sqlalchemy/init.py#L542
@pdelboca I get that you prefer to go back to using offset and limit, but I don't understand the complexity argument.
In addition to worse database performance, offsets aren't stable so links to activity pages using offsets can't be shared and reused.
Could you explain the difficulties you have when using before and after? It shouldn't be any more complicated than grabbing the newest activity date and oldest activity date from the list returned from the API right?
I'm gonna give it a try @wardi ! I'm gonna work on this a little bit more and try to come out with more clear thoughts about the solution.
At this point I think it's just a matter of personal preference given the time I can spend working on this. I don't see that much benefit on coding and maintaining two pagination logic when one of them is just good enough :P
offset and limit are kept just for the APIs to maintain backwards compatibility, but it turned out that in #6798 it could be useful to combine limit with before or after, so keeping it makes sense.
Some thoughs on the complexities I'm finding while developing the blueprint logic:
- We are already imposing a hard limit on the amount of activities to show via configs. Using both
beforeandafterdoesn't make any sense. What if in betweenbeforeandafterare more thanckan.activity_list_limit? What should be the values ofbeforeandafterfor this scenario?
Using before + limit makes sense. But it has a complexity: it's a simple linked list where navigation is only forward. I don't have any data pointing to the previous pagination. Even if I save a came_from parameter, I no longer know where I came from when I navigate backwards once. This problem doesn't appear when using offset since I'm always moving forward/backwards in a fixed block.
In fact, I'm not sure how you can navigate backwards a Linked List.
Gonna try to explain.
Action 1: Navigate to Activity Stream
- The first time I load the page (aka, no before param) I load
beforenow andlimit✅ - Value of
prev_pageactivities isNonebecause we do not have abeforeparam. ✅ - Value of
next_pageis the timestamp of the last element of the list. ✅
Action 2: The user clicks on Older Activities
- I receive the
beforeparam so I query forbefore+limit. ✅ - Value of
prev_pageis...? Based on what? How I calculate the timestamp that will take me to the previous page I just came from? ❓ - Value of
next_pageis again the timestamp of the last element of the list. ✅
So, I don't have issues by calculating the before argument (aka, navigation forward), since it will always be the timestamp of the last element of the list. But I cannot calculate what to put in prev_page with the data I have in the request. That forces me to send and extra param in the request. (I guess that's the intention of after in the current implementation?).
So let's add a came_from argument to the parameters to keep track of where are we coming from while we take a look at older activities. (so I know where to go back to)
Action 1: Navigate to Activity Stream
- The first time I load the page (aka, no before param) I load
beforenow andlimit✅ - Value of
prev_pageactivities isNonebecause we do not have abeforeparam. ✅ - Value of
next_pageis the timestamp of the last element of the list. ✅ - I send an extra parameter
came_fromto persist the currentbefore. (So I know where to come back) ✅
Action 2: The user clicks on Older Activities
- I receive the
beforeparam so I query forbefore+limit. ✅ - Value of
prev_pagebefore param is based oncame_from. Now I know where to navigate to go back. ✅ - Value of
next_pageis again the timestamp of the last element of the list. ✅
Action 3: The user clicks on Newer Activities:
- As I have a
came_fromparam, I know we are navigating backwards so I use it for the newbefore+limit✅ - Value of
next_pageis again the timestamp of the last element of the list. ✅ - Value of
prev_pageis....? Again I'm stucked. I don't have acame_fromto use. I don't know where to go when navigating backwards.
So my basic problem is that I don't know how to nagivate backwards only using before + limit params. I don't know how to properly persist or calculate the limits of the pagination in between calls while navigating backwards.
Maybe I'm drowning in a glass of water here but I'm not sure how to proceed.
For previous page you take the oldest activity currently displayed and use its date as an "after" parameter. This will give you the next page of activities after that date.
Thank you for your work on this @pdelboca