openproject icon indicating copy to clipboard operation
openproject copied to clipboard

[WIP] Implementing `WorkPackage.at_timestamp`

Open fiedl opened this issue 3 years ago • 0 comments

In the context of the baseline-comparison feature, this pull request adds convenience methods to retrieve historic data of journable objects like work packages.

Retrieve historic state of one journable

WorkPackage.find(1).at_timestamp(1.year.ago)

This historic state of the work package is still a WorkPackage object, but it contains data from the work_package_journals table. It knows that it represents historic data and, therefore, cannot be modified.

work_package = WorkPackage.find(1)
historic_work_package = work_package.at_timestamp(1.year.ago)

historic_work_package.id         # => 1  (same as work_package.id)
historic_work_package.historic?  # => true
historic_work_package.save       # raises `ActiveRecord::ReadOnlyRecord`

Rollback to historic state of one journable

WorkPackage.find(1).at_timestamp(1.year.ago).rollback!

This updates the work-package record with the attributes from the historic record, which does not remove journal entries, but, in fact, creates a new journal entry.

WorkPackage.find(1).update! WorkPackage.find(1).at_timestamp(1.year.ago).attributes

Retrieve historic state of multiple journables

WorkPackage.at_timestamp(1.year.ago)

This returns an ActiveRecord::Relation that collects WorkPackage objects, which contain data from the work_package_journals table. These objects know that they represent historic data and, therefore, cannot be modified.

Filter on historic data

WorkPackage.at_timestamp(1.year.ago).where(assigned_to_id: 1)

# or, equivalently:
WorkPackage.where(assigned_to_id: 1).at_timestamp(1.year.ago)

This returns an ActiveRecord::Relation that represents all work packages that have been assigned to User.find(1) one year ago, regardless of whom they are assigned now.

Plucking on historic data

WorkPackage.at_timestamp(1.year.ago).where(assigned_to_id: 1).pluck :id

This returns the IDs of all WorkPackage records that have been assigned to User.find(1) one year ago.

These IDs are the IDs of the work-package records, not the work-package-journal records. Therefore, when fetching the work-package records from the database using these IDs, this will fetch the records in their current state, not the historic state.

work_package_ids = WorkPackage.at_timestamp(1.year.ago).where(assigned_to_id: 1).pluck :id
work_packages = WorkPackage.where(id: work_package_ids)

work_packages.first.historic?  # => false

Sub queries

# Does not work, yet:
sub_query = WorkPackage.at_timestamp(1.year.ago).where(assigned_to_id: 1)
WorkPackage.where(id: sub_query)

This returns the IDs of all WorkPackage records that have been assigned to User.find(1) one year ago. As this is using a sub query, which is an ActiveRecord::Relation, this only hits the database once.

  • [ ] FIXME: This does not work, yet, because the wrapper tries to use the work_package_journals.id column rather than the work_package.id column or the journals.journable_id column.

    As a workaround, you may use this less performant variant, which hits the database twice and uses more memory.

    work_package_ids = WorkPackage.at_timestamp(1.year.ago).where(assigned_to_id: 1)
    WorkPackage.where(id: work_package_ids)
    

Advances queries

WorkPackage.at_timestamp(1.year.ago).maximum(:estimated_hours)

This finds largest estimated_hours considering all work packages in their states one year ago.

WorkPackage.at_timestamp(1.year.ago).group(:assigned_to_id).count  # => {1 => 15, ...}

This counts how many work packages have been assigned to each user one year ago.

WorkPackage.at_timestamp(1.year.ago).group(:assigned_to_id).having('estimated_hours > 3').count  # => {1 => 12, ...}

This counts how many work packages have been assigned to each user one year ago, but only considers work packages with hour estimates over 3 hours.

Cavests

Filtering on associated tables

When adding where conditions on associated tables, one might expect to also filter on historic data.

# Does not work:
WorkPackage.at_timestamp(1.year.ago).joins(:time_entries).where(time_entries: {hours: 1.0})

Because WorkPackage.at_timestamp only searches the journal of the work packages, not the journal of the time entries, this does not return work packages that had time entries of 1 hour one year ago, but instead returns the state of the work packages of one year ago that do now have time entries of 1 hour.

Manual selects

Watch out when using the id column in manually specified parts of the query.

The wrapper introduced with at_timestamp internally modifies the active-record relation to query the work-package-journals table rather than the work-packages table. So, for example, when running

WorkPackage.at_timestamp(1.day.ago).select("work_pacakges.id")
# raises `ActiveRecord::StatementInvalid: PG::UndefinedTable`

you would get an error because the work_packges table is not know to that query.

As a workaround, you may use the column journables.id, which is induced by the wrapper:

WorkPackage.at_timestamp(1.day.ago).select("journables.id")

Baseline comparison

Consider this pseudo API call to GET /api/v3/work_packages with the following parameters:

  • author_id: 1 specifies the author user of the work packges
  • offset: 2 specifies the page number of the paginated result
  • sortBy: [["status", "asc"]] specifies the sort order to the result
  • timestamps: ["2021-09-04T13:21:14.452+00:00", "2022-09-04T13:21:14.452+00:00"] specifies the timestamps at which historic data should be searched

The result should contain

  • the work packages that match the filter author_id at any of the given timestamps
  • the work-package data at all given timestamps in order for the user interface to show what has changed comparing the data at those timestamps.

Using the convenience methods introduced by this pull request, this is still a several-steps process:

# given: params[:author_id], params[:offset], params[:sort_by], params[:timestamps]

# Apply filters
work_packages_relation = WorkPackage \
    .where(author_id: params[:author_id])

# Get IDs of work packages matching the filters at any of the given timestamps.
# TODO: Use sub queries in this step, which does not work, yet.
work_package_ids = params[:timestamps].collect do |timestamp|
  work_packages_relation.at_timestamp(timestamp).pluck(:id)
end.flatten.uniq

# Order the result
ordered_work_packages_relation = WorkPackage \
    .where(id: work_package_ids) \
    .order(params[:sort_by])

# Paginate the result
paginated_work_packages_relation = \
    ordered_work_packages_relation \
    .paginate(page: params[:offset])
    
# Of all the matching work packages on that page, retrieve the historic attributes
# at all of the given timestamps.
# TODO: Use sub queries in this step, which does not work, yet.
paginated_work_packages_by_timestamp = {}
params[:timestamps].each do |timestamp|
  paginated_work_packages_by_timestamp[timestamp] = WorkPackage \
      .where(id: paginated_work_packages_relation.pluck(:ids)) \
      .at_timestamp(timestamp)
end

Remarks:

  • I think, we need to retrieve the IDs of the matching work packages at all timestamps and order and paginate them together in order to achieve consistent pagination results.
  • FIXME: Using sub queries is more efficient, but does not work, yet.
  • TODO: Check whether retrieving records for all timestamps in one single query, i.e. WorkPackage.where(...).at_any_of_these_timestamps(params[:timestamps]), and splitting the result up later in ruby would improve performance.

See also

  • Implementation Task: https://community.openproject.org/projects/openproject/work_packages/43483
  • Epic: https://community.openproject.org/projects/openproject/work_packages/26448

fiedl avatar Sep 04 '22 10:09 fiedl