Feature/276 harvesting metadata from a provided repository url

Open Aidajafarbigloo opened this issue 1 year ago • 6 comments

Added a --url option to the hermes harvest command. Now hermes harvest harvests the metadata from the local repository, and hermes harvest --url <URL> harvests metadata from the provided URL, with support for GitHub and GitLab repositories.

(e.g., hermes harvest --url https://github.com/NFDI4Energy/SMECS)

Aidajafarbigloo avatar Oct 29 '24 08:10 Aidajafarbigloo

@sferenz Could you please take a look at this pull request and share your feedback?

Aidajafarbigloo avatar Oct 29 '24 08:10 Aidajafarbigloo

Harvesting metadata from the provided URL (GitHub/GitLab). Command: hermes harvest --path <URL>

Aidajafarbigloo avatar Jan 31 '25 10:01 Aidajafarbigloo

@sferenz Could you please take a look at this pull request and share your feedback?

Aidajafarbigloo avatar Jan 31 '25 10:01 Aidajafarbigloo

Thanks for the nice code! Please have a look at the comments :)

@sferenz Thank you for the comments.

Aidajafarbigloo avatar Feb 03 '25 10:02 Aidajafarbigloo

@sdruskat This pull request is ready to merge. Could you please assign us a reviewer?

sferenz avatar Apr 14 '25 14:04 sferenz

Thanks for your work!

I had a first look and I would like to suggest a slightly different approach. I think it would be beneficial to have the --url argument that you had (as indicated by your PR description). This would allow us to do the following:

  1. Create a temporary directory
  2. Download the remote repository given by --url to this directory
  3. Overwrite args.path with the temporary directory path
  4. Run the normal harvesting step
  5. Delete the temporary directory

In this case there is no need to change anything in any of the plugins (I think). Only the base harvest command needs to worry about downloading and then deleting the files.
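The five steps above could be sketched roughly as follows. This is only an illustration, not the actual HERMES code: `harvest_with_url`, `clone_repository`, and the `run_harvest` callable are hypothetical names, and the sketch assumes the parsed arguments carry `url` and `path` attributes and that the `git` command-line tool is available.

```python
import argparse
import subprocess
import tempfile
from pathlib import Path


def clone_repository(url: str, target: Path) -> None:
    """Download the remote repository into `target` using the git CLI."""
    subprocess.run(["git", "clone", url, str(target)], check=True)


def harvest_with_url(args, run_harvest, fetch=clone_repository):
    """Run the normal harvest step, against a temporary checkout if --url is set.

    `run_harvest` stands in for the existing harvesting step (hypothetical,
    not the real HERMES API). `fetch` can be swapped out, e.g. for testing.
    """
    if getattr(args, "url", None) is None:
        # No URL given: harvest the local repository as before.
        return run_harvest(args)

    # Steps 1 and 5: the context manager creates the directory and removes
    # it again on exit, even if cloning or harvesting raises an error.
    with tempfile.TemporaryDirectory(prefix="hermes-harvest-") as tmp:
        target = Path(tmp) / "repo"
        fetch(args.url, target)    # step 2: download the remote repository
        args.path = target         # step 3: overwrite args.path
        return run_harvest(args)   # step 4: run the normal harvesting step
```

Because only `args.path` changes, the plugins would keep reading from a local directory and would not need to know where the files came from. Note that `args.path` points at a deleted directory once the function returns, so anything that needs the files has to run inside the harvesting step.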

What do you think?

Thanks! I’ve implemented this approach and am testing it with a few different repositories. Quick note: cloning large repositories takes too long. It would be good to replace the full clone with a shallow clone later, since we only need to check for CITATION.cff or codemeta.json.
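A shallow clone along these lines might look like the sketch below. It is an assumption about how this could be done, not what the PR currently implements: the helper names are made up for illustration, and it relies on git's `--depth 1` option, which fetches only the most recent commit instead of the full history.

```python
import subprocess
from pathlib import Path

# The two metadata files mentioned above.
METADATA_FILES = ("CITATION.cff", "codemeta.json")


def shallow_clone_command(url: str, target: Path) -> list[str]:
    """Build a git command that fetches only the latest commit."""
    return ["git", "clone", "--depth", "1", url, str(target)]


def shallow_clone(url: str, target: Path) -> None:
    """Shallow-clone `url` into `target`; raises if git exits with an error."""
    subprocess.run(shallow_clone_command(url, target), check=True)


def find_metadata_files(repo_dir: Path) -> list[Path]:
    """Return the known metadata files present in the repository root."""
    return [repo_dir / name for name in METADATA_FILES if (repo_dir / name).is_file()]
```

A shallow clone still downloads every file of the latest commit, so very large working trees would remain slow. It also drops the commit history, which would matter for any harvester that reads from the git log.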

Aidajafarbigloo avatar Oct 28 '25 12:10 Aidajafarbigloo