Refactoring points calculation code
(From #880, delayed feedback)
There are several different places that find the final submission (best/last/marked as final) for one or many students. We should try to find a unified way of doing this, or at least reduce the number of different implementations. Especially now that the "delayed feedback" and "mark as final" updates have made the logic of finding the final submission much more complicated.
Here are the alternative implementations:
annotate_submitter_points
https://github.com/apluslms/a-plus/blob/8a852f917e815876892ffa82eeca3e08d82d4882/exercise/submission_models.py#L31
- considers delayed feedback
- considers grading mode
- considers mark as final
- works with multiple users at a time
- works with multiple exercises at a time
- not cached
UserExerciseSummary
https://github.com/apluslms/a-plus/blob/8a852f917e815876892ffa82eeca3e08d82d4882/exercise/exercise_summary.py#L13
- does not consider delayed feedback (caller is required to filter points)
- considers grading mode
- considers mark as final
- works with one user at a time
- works with one exercise at a time
- not cached
CachedPoints
https://github.com/apluslms/a-plus/blob/8a852f917e815876892ffa82eeca3e08d82d4882/exercise/cache/points.py#L37
- considers delayed feedback
- considers grading mode
- considers mark as final
- works with one user at a time
- works with multiple exercises at a time
- cached
filter_best_submissions
https://github.com/apluslms/a-plus/blob/8a852f917e815876892ffa82eeca3e08d82d4882/exercise/api/csv/submission_sheet.py#L10
- considers delayed feedback
- considers grading mode
- considers mark as final
- works with multiple users at a time
- works with multiple exercises at a time
- not cached
CachedPoints is used for these things (mostly the first one):
- get user's points/submissions/best submission for a specific exercise (API and otherwise)
- get user's points/submissions/best submission for all modules and exercises in a course instance (in API and instance views)
- get user's points/submissions/best submission for some set of exercises speified by query params (in API)
- get user's points for whole course instance
UserExerciseSummary is used for these things (mostly the first one):
- get user's best submission for an exercise (or a set of exercises using a loop)
- get all user's submissions for an exercise
- get user's points for an exercise (or a set of exercises using a loop)
annotate_submitter_points just annotates the total points for an exercise (i.e. finds the best submission) and is used for the following:
- submitter list view
- all results page
-
CourseAggregateDataViewSet
filter_best_submissions is only used once in CourseSubmissionDataViewSet.
Based on these, I propose that we store each exercise's data separately in the cache instead of having a single dump of the whole instance. The instance would also be cached but it would only contain the ids of the exercises instead of the actual data. If one needs both the exercises of an instance, they can just fetch twice: first to get the instance and then to get the exercises. Doing two fetches seems to be a normal thing for storing collections in memcached and should only have a trivial effect on performance.
This method would make UserExerciseSummary trivial to consolidate in to the cache and works well with how CachedPoints is used currently. Whether CourseSubmissionDataViewSet could be implemented using the cache instead of filter_best_submissions requires a bit more thinking but at the very least we should be able to make a single best_submissions function that can be used by both the cache and CourseSubmissionDataViewSet. annotate_submitter_points would be harder to replace as it works in SQL but I'd take a look at implementing the three use cases using the cache instead. It might even make all results page faster and simpler.
Also, we wouldn't need to invalidate everything after every small change. E.g. a submission would only invalidate the one exercise and the instance instead of all exercises and the instance.
@lainets Sounds good! To elaborate, does the course instance points cache item include any user-specific data or just the exercise ids? Does one exercise points cache item include one user's points in one exercise?
Querying the exercise points cache is easy with the exercise id and the user id, right?
@lainets Sounds good! To elaborate, does the course instance points cache item include any user-specific data or just the exercise ids? Does one exercise points cache item include one user's points in one exercise?
The current code calculates some overall stats for the user that are stored at the instance level. Only caching the exercises would make the resulting architecture simpler but we'll have to test if calculating the aforementioned stats on demand is too slow.
Querying the exercise points cache is easy with the exercise id and the user id, right?
Yes
The current code calculates some overall stats for the user that are stored at the instance level. Only caching the exercises would make the resulting architecture simpler but we'll have to test if calculating the aforementioned stats on demand is too slow.
That's OK. I didn't first realize that there are course-level user stats stored in the cache. I am not against having user-specific course-level points cache. Your suggestion sounds good (both the original suggestion and the later).
Indeed, it is best to test the performance in practice to find the fastest solution.
#1231 fixed some parts of this issue. See the pull request description.