
Include the refs belonging to each siva file in the PGA index

Open · vmarkovtsev opened this issue · 4 comments

The current PGA index format does not make it possible to tell under which references a given URL is stored. For example, tensorflow/tensorflow belongs to 2 siva files: the first has two heads and the second has tens. I need the full collection of references in the second CSV column, e.g.

"c19e4a1b8c7f458fa4d6b0978e2a14ef8c2a2ff2.siva[refs/heads/<uuid>,refs/whatever/<uuid>],f41959ccb2d9d4c722fe8fc3351401d53bcf4900.siva[refs/heads/<uuid>,...]"
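A minimal sketch of how the proposed column value could be produced and read back, assuming the layout shown above (`<hash>.siva[ref,ref,...]` entries joined by commas). The helper names `format_siva_refs` and `parse_siva_refs` are hypothetical. Because commas separate both siva entries and the refs inside the brackets, parsing uses a regex over whole bracketed entries instead of a plain split; this assumes ref names contain neither `,` nor `]`, which holds for refs like `refs/heads/<uuid>`.

```python
import re
from typing import Dict, List

# One "<hash>.siva[ref,ref,...]" entry of the proposed column format.
# Assumption: ref names contain neither "," nor "]".
_ENTRY = re.compile(r"([0-9a-f]+\.siva)\[([^\]]*)\]")


def format_siva_refs(mapping: Dict[str, List[str]]) -> str:
    """Serialize {siva file: [refs]} into the proposed second-column value."""
    return ",".join(f"{siva}[{','.join(refs)}]" for siva, refs in mapping.items())


def parse_siva_refs(value: str) -> Dict[str, List[str]]:
    """Inverse of format_siva_refs: recover {siva file: [refs]} from the column."""
    return {
        siva: refs.split(",") if refs else []
        for siva, refs in _ENTRY.findall(value)
    }
```

When the value is written with Python's `csv` module, the default minimal quoting wraps it in double quotes because it contains commas, which matches the quoted example above.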

vmarkovtsev · Aug 09 '19 13:08

The UUID of each repository in a siva file is stored as remote data: each remote is named after the repository's UUID, which you can use to filter repositories, and the remote's endpoint tells you which repository it is. The references belonging to a repository can then be filtered with the regexp `.*\/<uuid>$`.
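A minimal sketch of the filtering described above, assuming the remotes (as `(name, endpoint)` pairs) and the ref names have already been read out of a siva file by whatever tool you use; no siva or git library API is implied. The function names `refs_for_uuid` and `refs_by_endpoint` are hypothetical. The regex is the one from the comment; in Python the `/` needs no escaping, and the UUID is passed through `re.escape`.

```python
import re
from typing import Dict, Iterable, List, Tuple


def refs_for_uuid(refs: Iterable[str], uuid: str) -> List[str]:
    """Keep only the refs matching .*/<uuid>$, i.e. ending in "/<uuid>"."""
    pattern = re.compile(r".*/" + re.escape(uuid) + r"$")
    return [ref for ref in refs if pattern.match(ref)]


def refs_by_endpoint(
    remotes: Iterable[Tuple[str, str]], refs: List[str]
) -> Dict[str, List[str]]:
    """Map each remote endpoint (repository URL) to the refs of its UUID.

    `remotes` holds (remote name, endpoint) pairs; the remote name is the
    repository UUID, so it selects that repository's refs.
    """
    return {endpoint: refs_for_uuid(refs, uuid) for uuid, endpoint in remotes}
```

`ref.endswith("/" + uuid)` would be an equivalent, regex-free check; the regex form is kept here to mirror the comment.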

jfontan · Aug 09 '19 14:08

I will collect the mapping and include it in the dataset because this is a common issue for the team.

vmarkovtsev · Aug 09 '19 17:08

Done: heads.csv.gz

cc @r0mainK

vmarkovtsev · Aug 11 '19 08:08

I need to update the dataset on Monday.

vmarkovtsev · Aug 11 '19 08:08