[WIP] Implementing `WorkPackage.at_timestamp`
In the context of the baseline-comparison feature, this pull request adds convenience methods to retrieve historic data of journable objects like work packages.
Retrieve historic state of one journable
WorkPackage.find(1).at_timestamp(1.year.ago)
This historic state of the work package is still a WorkPackage object, but it contains data from the work_package_journals table. It knows that it represents historic data and, therefore, cannot be modified.
work_package = WorkPackage.find(1)
historic_work_package = work_package.at_timestamp(1.year.ago)
historic_work_package.id # => 1 (same as work_package.id)
historic_work_package.historic? # => true
historic_work_package.save # raises `ActiveRecord::ReadOnlyRecord`
Rollback to historic state of one journable
WorkPackage.find(1).at_timestamp(1.year.ago).rollback!
This updates the work-package record with the attributes from the historic record, which does not remove journal entries, but, in fact, creates a new journal entry.
WorkPackage.find(1).update! WorkPackage.find(1).at_timestamp(1.year.ago).attributes
Retrieve historic state of multiple journables
WorkPackage.at_timestamp(1.year.ago)
This returns an ActiveRecord::Relation that collects WorkPackage objects, which contain data from the work_package_journals table. These objects know that they represent historic data and, therefore, cannot be modified.
Filter on historic data
WorkPackage.at_timestamp(1.year.ago).where(assigned_to_id: 1)
# or, equivalently:
WorkPackage.where(assigned_to_id: 1).at_timestamp(1.year.ago)
This returns an ActiveRecord::Relation that represents all work packages that have been assigned to User.find(1) one year ago, regardless of whom they are assigned now.
Plucking on historic data
WorkPackage.at_timestamp(1.year.ago).where(assigned_to_id: 1).pluck :id
This returns the IDs of all WorkPackage records that have been assigned to User.find(1) one year ago.
These IDs are the IDs of the work-package records, not the work-package-journal records. Therefore, when fetching the work-package records from the database using these IDs, this will fetch the records in their current state, not the historic state.
work_package_ids = WorkPackage.at_timestamp(1.year.ago).where(assigned_to_id: 1).pluck :id
work_packages = WorkPackage.where(id: work_package_ids)
work_packages.first.historic? # => false
Sub queries
# Does not work, yet:
sub_query = WorkPackage.at_timestamp(1.year.ago).where(assigned_to_id: 1)
WorkPackage.where(id: sub_query)
This returns the IDs of all WorkPackage records that have been assigned to User.find(1) one year ago. As this is using a sub query, which is an ActiveRecord::Relation, this only hits the database once.
-
[ ] FIXME: This does not work, yet, because the wrapper tries to use the
work_package_journals.idcolumn rather than thework_package.idcolumn or thejournals.journable_idcolumn.As a workaround, you may use this less performant variant, which hits the database twice and uses more memory.
work_package_ids = WorkPackage.at_timestamp(1.year.ago).where(assigned_to_id: 1) WorkPackage.where(id: work_package_ids)
Advances queries
WorkPackage.at_timestamp(1.year.ago).maximum(:estimated_hours)
This finds largest estimated_hours considering all work packages in their states one year ago.
WorkPackage.at_timestamp(1.year.ago).group(:assigned_to_id).count # => {1 => 15, ...}
This counts how many work packages have been assigned to each user one year ago.
WorkPackage.at_timestamp(1.year.ago).group(:assigned_to_id).having('estimated_hours > 3').count # => {1 => 12, ...}
This counts how many work packages have been assigned to each user one year ago, but only considers work packages with hour estimates over 3 hours.
Cavests
Filtering on associated tables
When adding where conditions on associated tables, one might expect to also filter on historic data.
# Does not work:
WorkPackage.at_timestamp(1.year.ago).joins(:time_entries).where(time_entries: {hours: 1.0})
Because WorkPackage.at_timestamp only searches the journal of the work packages, not the journal of the time entries, this does not return work packages that had time entries of 1 hour one year ago, but instead returns the state of the work packages of one year ago that do now have time entries of 1 hour.
Manual selects
Watch out when using the id column in manually specified parts of the query.
The wrapper introduced with at_timestamp internally modifies the active-record relation to query the work-package-journals table rather than the work-packages table. So, for example, when running
WorkPackage.at_timestamp(1.day.ago).select("work_pacakges.id")
# raises `ActiveRecord::StatementInvalid: PG::UndefinedTable`
you would get an error because the work_packges table is not know to that query.
As a workaround, you may use the column journables.id, which is induced by the wrapper:
WorkPackage.at_timestamp(1.day.ago).select("journables.id")
Baseline comparison
Consider this pseudo API call to GET /api/v3/work_packages with the following parameters:
-
author_id: 1specifies the author user of the work packges -
offset: 2specifies the page number of the paginated result -
sortBy: [["status", "asc"]]specifies the sort order to the result -
timestamps: ["2021-09-04T13:21:14.452+00:00", "2022-09-04T13:21:14.452+00:00"]specifies the timestamps at which historic data should be searched
The result should contain
- the work packages that match the filter
author_idat any of the giventimestamps - the work-package data at all given
timestampsin order for the user interface to show what has changed comparing the data at thosetimestamps.
Using the convenience methods introduced by this pull request, this is still a several-steps process:
# given: params[:author_id], params[:offset], params[:sort_by], params[:timestamps]
# Apply filters
work_packages_relation = WorkPackage \
.where(author_id: params[:author_id])
# Get IDs of work packages matching the filters at any of the given timestamps.
# TODO: Use sub queries in this step, which does not work, yet.
work_package_ids = params[:timestamps].collect do |timestamp|
work_packages_relation.at_timestamp(timestamp).pluck(:id)
end.flatten.uniq
# Order the result
ordered_work_packages_relation = WorkPackage \
.where(id: work_package_ids) \
.order(params[:sort_by])
# Paginate the result
paginated_work_packages_relation = \
ordered_work_packages_relation \
.paginate(page: params[:offset])
# Of all the matching work packages on that page, retrieve the historic attributes
# at all of the given timestamps.
# TODO: Use sub queries in this step, which does not work, yet.
paginated_work_packages_by_timestamp = {}
params[:timestamps].each do |timestamp|
paginated_work_packages_by_timestamp[timestamp] = WorkPackage \
.where(id: paginated_work_packages_relation.pluck(:ids)) \
.at_timestamp(timestamp)
end
Remarks:
- I think, we need to retrieve the IDs of the matching work packages at all timestamps and order and paginate them together in order to achieve consistent pagination results.
- FIXME: Using sub queries is more efficient, but does not work, yet.
-
TODO: Check whether retrieving records for all timestamps in one single query, i.e.
WorkPackage.where(...).at_any_of_these_timestamps(params[:timestamps]), and splitting the result up later in ruby would improve performance.
See also
- Implementation Task: https://community.openproject.org/projects/openproject/work_packages/43483
- Epic: https://community.openproject.org/projects/openproject/work_packages/26448