Create harvest source preprocess function in harvest.py

Open rshewitt opened this issue 1 year ago • 1 comment

User Story

In order to process a harvest source, the harvest runner needs a function that fetches harvest source information from the harvest database by job id.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

  • [ ] GIVEN harvest.py
    WHEN the HarvestSource is instantiated
    THEN harvest source information will be fetched and assigned as attributes to the instance

Background

  • the harvest runner is given only the job id of what to process
  • the runner will need to fetch the harvest source information required to process the harvest source (e.g. url, type, organization)

Security Considerations (required)

[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]

Sketch

  • refactor the HarvestSource class to require only the job_id
    • update all HarvestSource instantiations in the repo accordingly
  • create a preprocess function that reads the harvest source record
  • invoke it inside the post_init function
  • create a new exception class inheriting from the critical exception class and raise it where appropriate during preprocess
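
A minimal sketch of how the steps above might fit together, not the actual harvest.py implementation. It assumes HarvestSource is a dataclass whose post_init hook is `__post_init__`. The names `HarvestCriticalException`, `HarvestSourceLookupException`, `FAKE_DB`, and the field `source_type` are hypothetical stand-ins, and an in-memory dict takes the place of the harvest database, whose real interface is not shown in this issue.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


class HarvestCriticalException(Exception):
    """Stand-in for the repo's critical exception class (name assumed)."""


class HarvestSourceLookupException(HarvestCriticalException):
    """Hypothetical exception raised when preprocess cannot load the source."""


# In-memory stand-in for the harvest database (assumed layout, for illustration).
FAKE_DB: Dict[str, Dict[str, Any]] = {
    "jobs": {"job-1": {"harvest_source_id": "src-1"}},
    "sources": {
        "src-1": {
            "url": "https://example.com/data.json",
            "source_type": "dcatus",
            "organization": "example-org",
        }
    },
}


@dataclass
class HarvestSource:
    # The only value the caller has to supply.
    job_id: str
    # Filled in by preprocess(); excluded from __init__.
    url: Optional[str] = field(default=None, init=False)
    source_type: Optional[str] = field(default=None, init=False)
    organization: Optional[str] = field(default=None, init=False)

    def __post_init__(self) -> None:
        # Fetch the source record as soon as the instance is created.
        self.preprocess()

    def preprocess(self, db: Dict[str, Dict[str, Any]] = FAKE_DB) -> None:
        """Look up the job, then its harvest source, and copy fields onto self."""
        job = db["jobs"].get(self.job_id)
        if job is None:
            raise HarvestSourceLookupException(
                f"no harvest job with id {self.job_id!r}"
            )
        source = db["sources"].get(job["harvest_source_id"])
        if source is None:
            raise HarvestSourceLookupException(
                f"no harvest source for job {self.job_id!r}"
            )
        for name, value in source.items():
            setattr(self, name, value)
```

With this shape, `HarvestSource("job-1")` ends up with `url`, `source_type`, and `organization` set from the record, while an unknown job id raises the critical-exception subclass during instantiation, which matches the acceptance criterion above.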

rshewitt, Apr 26 '24 20:04

Updated part of the sketch. This ticket involves refactoring the HarvestSource class and all tests that reference it, so some additional time will be needed.

rshewitt, Apr 29 '24 16:04