hecat

A generic automation tool around data stored as plaintext YAML files.

Status: experimental

This program uses YAML files to store data about various kinds of items (bookmarks, software projects, ...). It can import data from various input formats, perform processing tasks (enrich data, run consistency checks, ...), and export data to other formats.

Modules

  • importers/markdown_awesome: import data from the awesome-selfhosted markdown format
  • importers/shaarli_api: import data from a Shaarli instance using the API
  • processors/github_metadata: enrich software project metadata from GitHub API (stars, last commit date...)
  • processors/awesome_lint: check data against awesome-selfhosted consistency/completeness guidelines
  • processors/download_media: download video/audio files using yt-dlp for bookmarks imported from Shaarli
  • exporters/markdown_singlepage: export data from the awesome-selfhosted-data format to a single markdown document

Installation

# install requirements
sudo apt install python3-venv python3-pip
# create a python virtualenv
python3 -m venv ~/.venv
# activate the virtualenv
source ~/.venv/bin/activate
# install the program
pip3 install git+https://gitlab.com/nodiscc/hecat.git

To install from a local copy instead:

# grab a copy
git clone https://gitlab.com/nodiscc/hecat.git
# install the python package
cd hecat && python3 setup.py install

Usage

$ hecat --help
usage: hecat [-h] [--config CONFIG_FILE]

optional arguments:
  -h, --help            show this help message and exit
  --config CONFIG_FILE  configuration file

If no configuration file is specified, configuration is read from .hecat.yml in the current directory.
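
The fallback described above can be sketched with Python's argparse. This is a minimal illustration, not the actual hecat source: the function name parse_args and the attribute name config_file are assumptions, and only the --config flag and the .hecat.yml default come from the help text above.

```python
import argparse

# Default location stated in the usage notes above
DEFAULT_CONFIG = ".hecat.yml"


def parse_args(argv=None):
    """Parse the command line; hypothetical re-creation of the CLI shown above."""
    parser = argparse.ArgumentParser(prog="hecat")
    parser.add_argument(
        "--config",
        dest="config_file",      # assumed attribute name
        metavar="CONFIG_FILE",
        default=DEFAULT_CONFIG,  # used when --config is not given
        help="configuration file",
    )
    return parser.parse_args(argv)


# No flag: fall back to .hecat.yml in the current directory
print(parse_args([]).config_file)
# Explicit flag: use the given path
print(parse_args(["--config", ".hecat.update_metadata.yml"]).config_file)
```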

Configuration

hecat executes all steps defined in the configuration file. Each step has the following structure:

steps:
  - name: example step # arbitrary name for this step
    module: processors/example # the module to use, see list of modules above
    module_options: # a dict of options specific to the module, see list of modules above
      option1: True
      option2: some_value
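
A quick way to catch malformed steps before running them is to check the parsed configuration for the three keys shown above. The sketch below works on the dict a YAML parser would return. The function validate_steps is hypothetical and is not part of hecat, and treating all three keys as mandatory is an assumption.

```python
# Keys every step is assumed to need, based on the structure shown above
REQUIRED_KEYS = ("name", "module", "module_options")


def validate_steps(config):
    """Return a list of human-readable problems found in a parsed configuration."""
    steps = config.get("steps")
    if not isinstance(steps, list):
        return ["top-level 'steps' key must be a list"]
    problems = []
    for index, step in enumerate(steps):
        if not isinstance(step, dict):
            problems.append(f"step {index}: expected a mapping")
            continue
        for key in REQUIRED_KEYS:
            if key not in step:
                problems.append(f"step {index}: missing '{key}'")
        if "module_options" in step and not isinstance(step["module_options"], dict):
            problems.append(f"step {index}: 'module_options' must be a dict")
    return problems


# The dict a YAML parser would produce for a step like the one above
example = {
    "steps": [
        {
            "name": "example step",
            "module": "processors/example",
            "module_options": {"option1": True, "option2": "some_value"},
        }
    ]
}
print(validate_steps(example))
```

An empty list means no problems were found. A step that uses a misspelled key in place of name would be reported as missing 'name'.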

Examples

Import data from awesome-selfhosted, apply processing steps, and export to single-page markdown again:

# .hecat.yml
# $ git clone https://github.com/awesome-selfhosted/awesome-selfhosted
# $ git clone https://github.com/awesome-selfhosted/awesome-selfhosted-data
steps:
  - name: import awesome-selfhosted README.md to YAML
    module: importers/markdown_awesome
    module_options:
      source_file: awesome-selfhosted/README.md
      output_directory: ./
      output_licenses_file: licenses.yml # optional, default licenses.yml
      overwrite_tags: False # optional, default False

  - name: update github projects metadata
    module: processors/github_metadata
    module_options:
      source_directory: awesome-selfhosted-data
      gh_metadata_only_missing: True # optional, default False

  - name: check data against awesome-selfhosted guidelines
    module: processors/awesome_lint
    module_options:
      source_directory: awesome-selfhosted-data

  - name: export YAML data to single-page markdown
    module: exporters/markdown_singlepage
    module_options:
      source_directory: awesome-selfhosted-data
      output_directory: awesome-selfhosted
      output_file: README.md
      authors_file: AUTHORS.md # optional, default no authors file
      exclude_licenses: # optional, default []
        - 'CC-BY-NC-4.0'
        - '⊘ Proprietary'
        - 'SSPL-1.0'
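
To illustrate what an option like exclude_licenses implies, here is a minimal sketch of license-based filtering. It assumes each software entry is a dict with a licenses list, and that an entry is dropped when any of its licenses is excluded. Both points are assumptions made for illustration. The exporter's real matching rule may differ, and filter_by_license is a hypothetical helper.

```python
def filter_by_license(items, exclude_licenses):
    """Keep only the items that carry none of the excluded licenses."""
    excluded = set(exclude_licenses)
    # isdisjoint() is True when the item shares no license with the excluded set
    return [item for item in items if excluded.isdisjoint(item.get("licenses", []))]


# Made-up entries, for illustration only
software = [
    {"name": "Example Wiki", "licenses": ["MIT"]},
    {"name": "Example DB", "licenses": ["SSPL-1.0"]},
    {"name": "Example Suite", "licenses": ["AGPL-3.0", "⊘ Proprietary"]},
]

kept = filter_by_license(software, ["CC-BY-NC-4.0", "⊘ Proprietary", "SSPL-1.0"])
print([item["name"] for item in kept])
```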

Import data from a Shaarli instance, download video/audio files identified by specific tags:

# .hecat.yml
# $ python3 -m venv .venv && source .venv/bin/activate && pip3 install shaarli-client
# $ mkdir -p ~/.config/shaarli/ && nano ~/.config/shaarli/client.ini
# $ shaarli get-links --limit=all >| tests/shaarli.json
steps:
  - name: import data from Shaarli API JSON
    module: importers/shaarli_api
    module_options:
      source_file: tests/shaarli.json
      output_file: tests/shaarli.yml

  - name: download video files
    module: processors/download_media
    module_options:
      data_file: tests/shaarli.yml
      only_tags: ['video']
      exclude_tags: ['nodl'] # optional, don't download items tagged with any of these tags
      output_directory: 'tests/video'
      download_playlists: False # optional, default False
      skip_when_filename_present: False # optional, default False
      retry_items_with_error: True # optional, default True

  - name: download audio files
    module: processors/download_media
    module_options:
      data_file: tests/shaarli.yml
      only_tags: ['music']
      exclude_tags: ['nodl']
      output_directory: 'tests/audio'
      only_audio: True
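
The interplay of only_tags and exclude_tags can be sketched as a simple set test. This is an illustration under stated assumptions, not the module's actual code: each bookmark is assumed to be a dict with a tags list, an item is assumed to qualify when it has at least one of the only_tags, and select_items is a hypothetical helper. The exclusion rule (any matching tag excludes the item) follows the comment in the configuration above.

```python
def select_items(items, only_tags, exclude_tags=()):
    """Return items with at least one wanted tag and no excluded tag."""
    wanted = set(only_tags)
    unwanted = set(exclude_tags)
    selected = []
    for item in items:
        tags = set(item.get("tags", []))
        # keep the item if it shares a tag with `wanted` and none with `unwanted`
        if tags & wanted and not tags & unwanted:
            selected.append(item)
    return selected


# Made-up bookmarks, for illustration only
bookmarks = [
    {"url": "https://example.org/a", "tags": ["video"]},
    {"url": "https://example.org/b", "tags": ["video", "nodl"]},
    {"url": "https://example.org/c", "tags": ["music"]},
    {"url": "https://example.org/d", "tags": []},
]

videos = select_items(bookmarks, only_tags=["video"], exclude_tags=["nodl"])
audio = select_items(bookmarks, only_tags=["music"], exclude_tags=["nodl"])
print([item["url"] for item in videos])
print([item["url"] for item in audio])
```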

Schedule an automatic metadata update every hour from GitHub Actions:

# .github/workflows/update-metadata.yml
on:
  schedule:
    - cron: '22 * * * *'

env:
  GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}

jobs:
  test_schedule:
    runs-on: ubuntu-latest
    steps:
      - name: checkout
        uses: actions/checkout@v3
      - name: install hecat
        run: |
          python3 -m venv .venv
          source .venv/bin/activate
          pip3 install wheel
          pip3 install --force git+https://github.com/nodiscc/hecat.git@master
      - name: update all metadata from GitHub API
        run: source .venv/bin/activate && hecat --config .hecat.update_metadata.yml
      - name: commit and push changes
        run: |
          git config user.name awesome-selfhosted-bot
          git config user.email [email protected]
          git add software/ tags/ platforms/ licenses*.yml
          git diff-index --quiet HEAD || git commit -m "[bot] update projects metadata"
          git push

# .hecat.update_metadata.yml
steps:
  - name: update all metadata from GitHub API
    module: processors/github_metadata
    module_options:
      source_directory: ./
      gh_metadata_only_missing: False
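
The cron expression '22 * * * *' in the workflow above means "at minute 22 of every hour". As a rough sketch of what that implies, the hypothetical helper below computes the next trigger time after a given moment. It only models this one pattern (a fixed minute, every hour) and is not a general cron parser. GitHub evaluates schedules in UTC and may start scheduled runs late.

```python
from datetime import datetime, timedelta


def next_hourly_run(now, minute=22):
    """Return the first time strictly after `now` whose minute equals `minute`."""
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # this hour's slot has already passed, so move to the next hour
        candidate += timedelta(hours=1)
    return candidate


print(next_hourly_run(datetime(2024, 1, 1, 10, 5)))
print(next_hourly_run(datetime(2024, 1, 1, 23, 30)))
```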

Support

Please submit any questions to https://gitlab.com/nodiscc/hecat/-/issues or https://github.com/nodiscc/hecat/issues

Contributing

This program is at a very early stage of development. Code cleanup, documentation, unit tests, improvements, and support for other input/output formats are very welcome at https://gitlab.com/nodiscc/hecat/-/merge_requests or https://github.com/nodiscc/hecat/pulls

Testing

# install venv, pip and make
sudo apt install python3-pip python3-venv make
# run automated tests
make clean test_run

License

GNU GPLv3