Feature/276 harvesting metadata from a provided repository url
Added a `--url` option to the `hermes harvest` command. Now, `hermes harvest` harvests the metadata from the local repository, while `hermes harvest --url <URL>` harvests metadata from the provided URL, with support for GitHub and GitLab repositories.
(e.g., hermes harvest --url https://github.com/NFDI4Energy/SMECS)
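For reference, the two invocations described above:

```console
# Harvest metadata from the local repository (existing behaviour):
hermes harvest

# Harvest metadata from a remote GitHub or GitLab repository:
hermes harvest --url https://github.com/NFDI4Energy/SMECS
```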
@sferenz Could you please take a look at this pull request and share your feedback?
Harvesting metadata from the provided URL (GitHub/GitLab). Command: hermes harvest --path <URL>
Thanks for the nice code! Please have a look at the comments :)
@sferenz Thank you for the comments.
@sdruskat This pull request is ready to merge; could you please assign us a reviewer?
Thanks for your work!
I had a first look and I would like to suggest a slightly different approach. I think it would be beneficial to have the `--url` argument that you had (as indicated by your PR description). This would allow us to do the following:
- Create a temporary directory
- Download the remote repository given by `--url` to this directory
- Overwrite `args.path` with the temporary directory path
- Run the normal harvesting step
- Delete the temporary directory
In this case there is no need to change anything in any of the plugins (I think). Only the base harvest command needs to worry about downloading and then deleting the files.
What do you think?
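A minimal sketch of those steps, assuming GitPython for the download; `run_harvest` is a hypothetical stand-in for the existing local harvesting step, not actual HERMES API:

```python
import shutil
import tempfile
from pathlib import Path

from git import Repo  # GitPython, assumed to be available


def harvest_from_url(url: str, run_harvest) -> None:
    """Download a remote repository, harvest it locally, then clean up."""
    # Create a temporary directory.
    tmp_dir = Path(tempfile.mkdtemp(prefix="hermes-harvest-"))
    try:
        # Download the remote repository given by --url into it.
        Repo.clone_from(url, tmp_dir)
        # Overwrite args.path with the temporary directory path and run
        # the normal harvesting step against it.
        run_harvest(path=tmp_dir)
    finally:
        # Delete the temporary directory, whether harvesting succeeded or not.
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

With this shape, none of the plugins need to know whether the path they receive came from the local checkout or a downloaded copy.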
Thanks! I’ve implemented this approach and am testing it with a few different repositories. Quick note: cloning large repositories takes too long; it would be good to replace the full clone with a shallow clone, since we only need to check for CITATION.cff or codemeta.json later.
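For example, with GitPython (assuming that is what the clone ends up using), the full clone from the sketch above could become a shallow one; extra keyword arguments are forwarded to `git clone`:

```python
# Only fetch the latest commit instead of the full history; GitPython
# forwards depth=1 to `git clone` as --depth 1.
Repo.clone_from(url, tmp_dir, depth=1)
```

A sparse or partial clone could narrow the download further, but a depth-1 clone is probably the simplest first step.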