records: provide a CLI command to change or remove the INSPIRE ID of a record
Occasional requests are made to change the INSPIRE ID of a record, for example, from a preliminary note (e.g. 1643435) to the corresponding final publication (e.g. 1664330). So far I've dealt with these requests manually by executing a combination of SQL, Python, and CLI commands. This is a cumbersome procedure and it would be better to provide a single CLI command that performs the following steps to replace an old_inspire_id with a new_inspire_id for a given publication_recid.
- Replace inspire_id in the HEPSubmission object (or objects, if there is more than one version) that has the given publication_recid.
- Update last_updated in the same HEPSubmission object to the current timestamp (e.g. timezone('utc', now())::timestamp(4)) so that the record will be picked up by the nightly INSPIRE harvesting job.
- Replace publication_inspire_id in the DataSubmission objects with the same publication_recid.
- Get a list of associated_recid values for all the DataSubmission objects with the same publication_recid.
- Update the INSPIRE ID and last updated timestamp in the record metadata, i.e. something like:
from invenio_db import db

from hepdata.modules.records.utils.common import get_record_by_id
from hepdata.modules.submission.api import get_latest_hepsubmission

# publication_recid, new_inspire_id and recids are assumed to be defined already.
hepsubmission = get_latest_hepsubmission(publication_recid=publication_recid)
last_updated = hepsubmission.last_updated.strftime('%Y-%m-%d %H:%M:%S')

for recid in recids:
    record = get_record_by_id(recid)
    record['inspire_id'] = int(new_inspire_id)
    record['last_updated'] = last_updated
    record.commit()

# Commit once after all records have been updated.
db.session.commit()
Here recids is a list of all publication_recid values of the HEPSubmission objects and all associated_recid values of the DataSubmission objects.
- Call update_record_info from hepdata.modules.records.utils.records_update_utils with new_inspire_id as the first argument and an optional second argument to send an email to the original submission participants.
- Delete any cached files in the converted directory with old_inspire_id in the filenames.
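Two of the steps above need no database access and could be sketched in isolation: building the recids list and deleting the cached files. This is only a rough sketch under assumptions. SubmissionStub and DataSubmissionStub are hypothetical stand-ins for the real HEPSubmission and DataSubmission objects, and collect_recids and delete_cached_files are hypothetical helper names, not existing HEPData functions. The filename match is a plain substring test, so a stricter pattern may be needed if one INSPIRE ID can be a substring of another.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List


@dataclass
class SubmissionStub:
    """Hypothetical stand-in for a HEPSubmission object."""
    publication_recid: int


@dataclass
class DataSubmissionStub:
    """Hypothetical stand-in for a DataSubmission object."""
    associated_recid: int


def collect_recids(hepsubmissions: Iterable[SubmissionStub],
                   datasubmissions: Iterable[DataSubmissionStub]) -> List[int]:
    """Return publication and associated recids, without duplicates, in order."""
    candidates = [s.publication_recid for s in hepsubmissions]
    candidates += [d.associated_recid for d in datasubmissions]
    recids: List[int] = []
    for recid in candidates:
        if recid not in recids:
            recids.append(recid)
    return recids


def delete_cached_files(converted_dir: Path, old_inspire_id: str) -> List[Path]:
    """Delete files below converted_dir whose names contain old_inspire_id."""
    removed: List[Path] = []
    # Materialise the matches first so that files are not deleted mid-iteration.
    for path in sorted(converted_dir.rglob(f"*{old_inspire_id}*")):
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed
```

Returning the list of removed paths would let the CLI command report what it deleted, which seems useful for an operation that is run by hand.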
After changing the INSPIRE ID of a HEPData record, the new INSPIRE record will subsequently display a link to the HEPData record with the new INSPIRE ID after the nightly harvesting. However, the old INSPIRE record will still display a broken link ("datasets") to the HEPData record with the old INSPIRE ID. Currently, this broken link needs to be removed manually by making a request to the INSPIRE curation team (Paulina Baranowska).
It should be checked that the implementation of the CLI command also works for unfinished records or for multiple versions of a record (with the latest version being unfinished); see #403.
Also useful would be a command to remove the INSPIRE ID of an unfinished record in case it has been added by mistake but the correct INSPIRE ID is not yet known (because the corresponding publication is not yet on the arXiv). In this case, the inspire_id of the HEPSubmission object should be set to null and record['inspire_id'] set to None. Publication information associated with the wrongly attached INSPIRE ID should be set back to the default values, perhaps with the title given as a command-line argument. This situation arose today for a CMS record in preparation.
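The metadata part of such a removal command might look roughly like the sketch below. It works on a plain dict rather than a real record object, and clear_inspire_id is a hypothetical helper name. The default publication fields are passed in by the caller, because this issue does not say what the default values are.

```python
from typing import Any, Dict, Optional


def clear_inspire_id(record: Dict[str, Any],
                     default_metadata: Dict[str, Any],
                     title: Optional[str] = None) -> Dict[str, Any]:
    """Return a copy of record with inspire_id unset and publication info reset.

    default_metadata holds the default publication fields (an assumption:
    the caller knows them). title, if given, overrides the default title.
    """
    updated = dict(record)
    updated['inspire_id'] = None
    # Overwrite the publication information taken from the wrong INSPIRE ID.
    updated.update(default_metadata)
    if title is not None:
        updated['title'] = title
    return updated
```

The same command would also need to set inspire_id to null in the HEPSubmission object, which is not shown here.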
A complication I just encountered: when the old and new INSPIRE IDs have the same publication information, update_record_info returns 'No update needed', and the last part of update_record_info (indexing and updating the metadata stored in DataCite) is unintentionally skipped.
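One possible way for the CLI command to handle this case is to check the return value and run the skipped part explicitly. The sketch below only shows that control flow. update_info stands in for update_record_info, and reindex is a hypothetical callable for the indexing and DataCite update. It assumes that update_record_info already performs that part itself whenever it returns anything other than 'No update needed'.

```python
from typing import Callable

NO_UPDATE_NEEDED = 'No update needed'


def update_and_always_reindex(new_inspire_id: str,
                              update_info: Callable[[str], str],
                              reindex: Callable[[str], None]) -> str:
    """Call update_info, then run reindex only if the update was skipped."""
    status = update_info(new_inspire_id)
    if status == NO_UPDATE_NEEDED:
        # Publication information was unchanged, so the final indexing and
        # DataCite step did not run; trigger it here instead.
        reindex(new_inspire_id)
    return status
```

An alternative would be to change update_record_info itself so that the final step can be forced, but that would affect its other callers.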