Adding script to generate json indexes for remote use.

Open ianhelle opened this issue 3 years ago • 0 comments

Contains a script (scripts/misc/create_json_index.py) that creates a consolidated index from the YAML metadata. This lets users pull the metadata from the repo in a single request. By default, the script creates an uncompressed JSON file plus zipped and gzipped versions. The indexes are created in ./data/.index. Also adding initial index files to ./data/.index.
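To make the three output formats concrete, here is a minimal sketch of the write and read sides. It is not the code in scripts/misc/create_json_index.py: the function names are hypothetical, the YAML parsing step is left out (the records are assumed to be already loaded as plain dicts), and the file name `dsets-index.json` is only inferred from the grep pattern in the workflow sketch further down.

```python
import gzip
import json
import zipfile
from pathlib import Path

# Assumed file name, inferred from the grep pattern in the workflow sketch.
INDEX_NAME = "dsets-index.json"


def write_indexes(records, out_dir):
    """Write records as plain, zipped, and gzipped JSON; return the three paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Serialize once so all three files carry identical content.
    payload = json.dumps(records, indent=2, sort_keys=True).encode("utf-8")

    json_path = out / INDEX_NAME
    json_path.write_bytes(payload)

    zip_path = out / (INDEX_NAME + ".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(INDEX_NAME, payload)

    gz_path = out / (INDEX_NAME + ".gz")
    with gzip.open(gz_path, "wb") as gz:
        gz.write(payload)

    return [json_path, zip_path, gz_path]


def read_gzipped_index(raw):
    """Decode gzipped index bytes, e.g. the body of a single HTTP response."""
    return json.loads(gzip.decompress(raw).decode("utf-8"))
```

A consumer would fetch the `.gz` file once and pass the response bytes to `read_gzipped_index`, instead of requesting each YAML metadata file separately.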

I'm thinking that we could add a GitHub Action to build new index files, triggered by future PRs. This could auto-create a PR, but we'd likely need to add one or two custom actions, e.g. https://github.com/marketplace/actions/create-pull-request

Something like this (but this would not work with forks, since the workflow would not have permission to push to the remote):

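on:
  pull_request:
    branches: [master]

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10"]
    env:
      OUT_PATH: "./datasets/.index"
      IN_PATH: "./datasets"
    steps:
      # Check out the PR branch itself (not the detached merge commit) so the push below has a branch to update
      - uses: actions/checkout@v3
        with:
          ref: ${{ github.head_ref }}
      # Use the Python version declared in the matrix
      - uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Build indexes
        run: python -m scripts.misc.create_json_index --input-path ${{env.IN_PATH}} --output-path ${{env.OUT_PATH}} --formats all
      - name: Check if there are changes
        id: changes
        uses: UnicornGlobal/[email protected]
      - name: Add output files to current PR
        run: |
          # grep exits non-zero when nothing matches; "|| true" keeps the step from failing in that case
          index_updated=$( git status --short --untracked-files=no | grep "dsets-index\.json$" || true )
          if [ -n "$index_updated" ]
          then
            git config user.name Auto-update-index
            git config user.email "<>"
            git add ${{env.OUT_PATH}}/*
            git commit -m "Security datasets auto-updated index files."
            git push
          fi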
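
The change check in the last step could also be written in Python, which avoids the shell quoting and exit-code pitfalls. The following is only a sketch under assumptions: `changed_index_files` is a hypothetical helper, it expects the text output of `git status --short --untracked-files=no`, and like the grep pattern it matches only the uncompressed `dsets-index.json` file. Paths that git quotes because of special characters are not handled.

```python
def changed_index_files(status_output, suffix="dsets-index.json"):
    """Return paths from `git status --short` output that end with `suffix`."""
    changed = []
    for line in status_output.splitlines():
        # Short format is two status characters, a space, then the path.
        if len(line) < 4:
            continue
        path = line[3:].strip()
        # Renames are shown as "old -> new"; keep the new path.
        if " -> " in path:
            path = path.split(" -> ", 1)[1]
        if path.endswith(suffix):
            changed.append(path)
    return changed
```

The workflow step would capture the `git status` output (for example with `subprocess.run(..., capture_output=True, text=True)`) and commit only when the returned list is non-empty.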

ianhelle, Aug 23 '22 00:08