Add "Code Retrieval"
The README says:
Code Retrieval: retrieve and store git repositories as a dataset.
However, borges is not currently integrated, and the current commands do not deliver on this promise.
I request the borges integration.
I was browsing the issues to check whether exactly this request had been logged before.
I second Vadim on this one: e.g. for the use case of trying out the engine on all repositories of my org, which may not be in PGA yet, I would really love for the engine to do the cloning part for me as well.
So instead of
Before
$ curl -v "https://api.github.com/orgs/<org-name>/repos?per_page=100" | jq -r '.[].git_url' > org-repos.txt
# and then manually repeating this for <https://api.github.com/organizations/15128793/repos?page=2>; rel="next"
$ borges pack --root-repositories-dir=./siva org-repos.txt
$ srcd init ./siva
$ srcd sql
one could get by with something like
After
$ srcd init github.com/<org-name>/*
$ srcd sql
It would be really nice to have a progress indicator in the CLI, and I guess the feature in general depends on the availability of the following:
- https://github.com/src-d/okrs/#refactor-to-go-borges-library-to-reuse-borges-logic-in-other-projects-ie-lookout-dr
- I believe it's in https://github.com/src-d/go-borges
As a user who is not deeply familiar with all the source{d} projects, and who has no knowledge of Go or of hardcore software engineering, I don't feel comfortable using borges myself, so I had to collect the src-d repos using tutorials available online, specifically https://medium.com/@kevinsimper/how-to-clone-all-repositories-in-a-github-organization-8ccc6c4bd9df
My final command line solution to clone all repos from an organization was
$ curl -H "Authorization: token AUTH_TOKEN" -s https://$GITHUB_AT:@api.github.com/orgs/src-d/repos\?page\=1\&per_page\=100\&type\=all | jq '.[].ssh_url' | xargs -n 1 git clone
It would be really useful to have the option to clone the repos from an organization using the engine itself.
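Until the engine can do this itself, the pagination step that the commands in this thread leave manual could be sketched roughly as below. This is only a sketch under assumptions: `next_link` and `list_org_repos` are hypothetical helper names, `GITHUB_TOKEN` is an assumed (optional) environment variable, `curl` and `jq` are assumed to be installed, and the GitHub API is assumed to return its usual `Link` header with a `rel="next"` entry on every page but the last.

```shell
# Print the rel="next" URL found in a Link header line.
# Prints nothing when there is no next page.
next_link() {
  printf '%s\n' "$1" | tr ',' '\n' |
    sed -n 's/.*<\([^>]*\)>;[[:space:]]*rel="next".*/\1/p'
}

# Print one clone URL per line for every repository of organization $1,
# following pagination until no rel="next" link is left.
# The body runs in a subshell so its variables stay private.
list_org_repos() (
  url="https://api.github.com/orgs/$1/repos?per_page=100"
  headers="$(mktemp)"
  while [ -n "$url" ]; do
    : > "$headers"   # drop headers left over from the previous page
    # GITHUB_TOKEN is a hypothetical variable; the header is sent only if it is set.
    body="$(curl -fsS -D "$headers" \
      ${GITHUB_TOKEN:+-H "Authorization: token $GITHUB_TOKEN"} "$url")" || break
    printf '%s\n' "$body" | jq -r '.[].clone_url'
    url="$(next_link "$(grep -i '^link:' "$headers" | tr -d '\r')")"
  done
  rm -f "$headers"
)
```

With these helpers defined, `list_org_repos <org-name> > org-repos.txt` would produce the input file for the `borges pack` step shown earlier, and `list_org_repos src-d | xargs -n 1 git clone` would clone everything directly.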