scrapi
A data processing pipeline that schedules and runs content harvesters, normalizes their data, and outputs that normalized data to a variety of output streams. This is part of the SHARE project, and wi...
Adds preliminary support for a `documentType` field for `crossref` & `plos` harvesters
This is the new commit for the nist harvester fix, with only the proper files in the commit.
New feature: keeps track of when sources were most recently harvested. A new database model called `LastHarvest` keeps records of each source and the most recent date when it was...
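As a rough illustration of the idea (not the project's actual model), the sketch below keeps the most recent harvest time per source in memory. The class name `LastHarvestRegistry` and its methods are hypothetical; the real `LastHarvest` is described as a database model, whose fields are not shown here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class LastHarvestRegistry:
    """Hypothetical in-memory stand-in for a LastHarvest-style table."""

    _records: Dict[str, datetime] = field(default_factory=dict)

    def record(self, source: str, when: Optional[datetime] = None) -> datetime:
        """Store a harvest time for `source`, keeping only the latest one."""
        when = when or datetime.now(timezone.utc)
        previous = self._records.get(source)
        if previous is None or when > previous:
            self._records[source] = when
        return self._records[source]

    def last_harvest(self, source: str) -> Optional[datetime]:
        """Return the most recent harvest time, or None if never harvested."""
        return self._records.get(source)
```

Calling `record("plos")` after each harvest run and `last_harvest("plos")` before the next one would be one way to decide how far back a harvester needs to look.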
Here is the changed URL query by provider update time. Instead of creating a new URL pattern, I changed it to use an existing one.
Allow users to query by sources in web API as a query parameter
http://api.openaire.eu/
- [ ] Go through API docs and see which would be the best fit
- [x] Add harvester
OAI base URL: http://dspace.univ-bouira.dz:8080/oai/request
Create using autooai: https://github.com/erinspace/autooai
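For context, an OAI-PMH harvester typically requests records from a base URL like the one above with the standard `ListRecords` verb. The helper below is a minimal sketch of building such a request URL; the function name `build_list_records_url` is hypothetical and this is not the code that autooai generates.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, until_date=None):
    """Build an OAI-PMH ListRecords request URL (illustrative helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # `from` and `until` are optional OAI-PMH date filters (e.g. YYYY-MM-DD).
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    return base_url + "?" + urlencode(params)


url = build_list_records_url(
    "http://dspace.univ-bouira.dz:8080/oai/request",
    from_date="2015-01-01",
)
```

Whether this repository supports the `oai_dc` prefix and date-based selective harvesting is an assumption; the provider's `Identify` and `ListMetadataFormats` responses would confirm it.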
Add tasks and a process for querying the status of URLs and for gathering information about contributors in scrapi normalized documents. Addresses [#SHARE-105] for contributor gathering, and improves URL processing by saving...
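A URL status check of this kind could look roughly like the sketch below, which uses only the Python standard library. The function name `check_url_status`, the returned dictionary shape, and the injectable `opener` parameter are assumptions made for illustration; they are not taken from the scrapi task code.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def check_url_status(url, opener=urlopen, timeout=10):
    """Send a HEAD request and report the HTTP status (illustrative only)."""
    request = Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return {"url": url, "status": response.status, "ok": True}
    except HTTPError as err:
        # The server answered, but with an error code such as 404 or 500.
        return {"url": url, "status": err.code, "ok": False}
    except URLError as err:
        # No HTTP response at all (DNS failure, refused connection, ...).
        return {"url": url, "status": None, "ok": False,
                "error": str(err.reason)}
```

Passing the opener as a parameter keeps the function testable without network access; some servers reject HEAD requests, so a real task might fall back to GET.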