
records: provide a CLI command to change or remove the INSPIRE ID of a record

Open GraemeWatt opened this issue 5 years ago • 4 comments

Occasional requests are made to change the INSPIRE ID of a record, for example, from a preliminary note (e.g. 1643435) to the corresponding final publication (e.g. 1664330). So far I've dealt with these requests manually by executing a combination of SQL, Python, and CLI commands. This is a cumbersome procedure and it would be better to provide a single CLI command that performs the following steps to replace an old_inspire_id with a new_inspire_id for a given publication_recid.

  1. Replace inspire_id with new_inspire_id in the HEPSubmission object (or objects, if there is more than one version) for the given publication_recid.
  2. Update last_updated in the HEPSubmission object to the current timestamp (e.g. timezone('utc', now())::timestamp(4)) so that the record will be picked up by the nightly INSPIRE harvesting job.
  3. Replace publication_inspire_id with new_inspire_id in the DataSubmission objects with the same publication_recid.
  4. Get a list of associated_recid values for all the DataSubmission objects with the same publication_recid.
  5. Update the INSPIRE ID and last updated timestamp in the record metadata, i.e. something like:
```python
from datetime import datetime

from invenio_db import db

from hepdata.modules.records.utils.common import get_record_by_id
from hepdata.modules.submission.api import get_latest_hepsubmission

# Reuse the timestamp already set on the latest HEPSubmission (step 2).
hepsubmission = get_latest_hepsubmission(publication_recid=publication_recid)
last_updated = datetime.strftime(hepsubmission.last_updated, '%Y-%m-%d %H:%M:%S')

# recids, publication_recid and new_inspire_id are assumed to be defined already.
for recid in recids:
    record = get_record_by_id(recid)
    record['inspire_id'] = int(new_inspire_id)
    record['last_updated'] = last_updated
    record.commit()
db.session.commit()  # persist all record changes together
```

Here recids is a list of all publication_recid values of the HEPSubmission objects and all associated_recid values of the DataSubmission objects.

  6. Call update_record_info from hepdata.modules.records.utils.records_update_utils with new_inspire_id as the first argument and an optional second argument to send an email to the original submission participants.
  7. Delete any cached files in the converted directory with old_inspire_id in the filenames.
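
The final step above (deleting cached files) could be sketched roughly as follows. This is a minimal, standard-library-only illustration, not existing HEPData code: the function name delete_cached_conversions, the dry_run option, and the recursive search are all assumptions. It matches the ID only when it is not part of a longer number, so that 1643435 does not also match 16434350.

```python
import re
from pathlib import Path


def delete_cached_conversions(converted_dir, old_inspire_id, dry_run=False):
    """Delete files under converted_dir whose names contain old_inspire_id.

    Hypothetical helper: the directory layout and recursive search are
    assumptions. Returns the list of paths that matched.
    """
    # Digit boundaries on both sides: the ID must not be embedded in a
    # longer run of digits.
    pattern = re.compile(rf"(?<!\d){int(old_inspire_id)}(?!\d)")
    matched = []
    # sorted() materialises the listing before anything is deleted.
    for path in sorted(Path(converted_dir).rglob("*")):
        if path.is_file() and pattern.search(path.name):
            matched.append(path)
            if not dry_run:
                path.unlink()
    return matched
```

Calling it first with dry_run=True lists what would be removed without touching anything, which seems a sensible default for a maintenance command of this kind.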

GraemeWatt, Jan 07 '21 21:01

After changing the INSPIRE ID of a HEPData record, the new INSPIRE record will subsequently display a link to the HEPData record with the new INSPIRE ID after the nightly harvesting. However, the old INSPIRE record will still display a broken link ("datasets") to the HEPData record with the old INSPIRE ID. Currently, this broken link needs to be removed manually by making a request to the INSPIRE curation team (Paulina Baranowska).

GraemeWatt, Feb 02 '21 11:02

The implementation of the CLI command should also be checked against unfinished records and against records with multiple versions where the latest version is unfinished; see #403.

GraemeWatt, Oct 05 '21 18:10

Also useful would be a command to remove the INSPIRE ID of an unfinished record in case it has been added by mistake but the correct INSPIRE ID is not yet known (because the corresponding publication is not yet on the arXiv). In this case, the inspire_id of the HEPSubmission object should be set to null and record['inspire_id'] set to None. Publication information associated with the wrongly attached INSPIRE ID should be set back to the default values, perhaps with the title given as a command-line argument. This situation arose today for a CMS record in preparation.
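
The removal case can be illustrated with a plain dictionary standing in for the record metadata. This is only a sketch: the field names and default values in DEFAULT_PUBLICATION_INFO are hypothetical placeholders, not the actual HEPData defaults, and clear_inspire_id is not an existing function.

```python
from copy import deepcopy

# Hypothetical defaults: the real field names and default values used by
# HEPData may differ.
DEFAULT_PUBLICATION_INFO = {
    "doi": None,
    "arxiv_id": None,
    "abstract": None,
    "authors": [],
    "journal_info": None,
}


def clear_inspire_id(record, title):
    """Return a copy of a record-like dict with the INSPIRE ID removed.

    Publication fields are reset to the placeholder defaults above and the
    title is replaced by the one supplied (e.g. from the command line).
    """
    cleaned = deepcopy(dict(record))
    cleaned["inspire_id"] = None
    cleaned.update(deepcopy(DEFAULT_PUBLICATION_INFO))
    cleaned["title"] = title
    return cleaned
```

A real command would additionally need to set inspire_id to null on the HEPSubmission object and commit both changes, as described above.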

GraemeWatt, Jan 07 '22 18:01

A complication I just encountered: when the old and new INSPIRE IDs have the same publication information, update_record_info returns 'No update needed', and the last part of update_record_info (indexing and updating the metadata stored in DataCite) is unintentionally skipped.
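
One possible way for the CLI command to handle this case is sketched below. The two callables are stand-ins supplied by the caller: update_metadata would wrap the call to update_record_info, and reindex would wrap whatever indexing and DataCite update is normally done at the end of it. The names update_and_reindex, update_metadata, and reindex are hypothetical.

```python
NO_UPDATE_NEEDED = "No update needed"  # status string reported in this thread


def update_and_reindex(update_metadata, reindex):
    """Run the metadata update, forcing the reindex step if it was skipped.

    Returns (status, forced), where forced is True if reindex() had to be
    called explicitly because the update reported nothing to change.
    """
    status = update_metadata()
    forced = status == NO_UPDATE_NEEDED
    if forced:
        # Publication information was unchanged, but the INSPIRE ID itself
        # changed, so indexing and the DataCite update are still needed.
        reindex()
    return status, forced
```

The design choice here is to leave update_record_info untouched and compensate in the caller; an alternative would be to give update_record_info itself an option to always run its final part.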

GraemeWatt, Apr 10 '23 22:04