
Validate metadata.json on startup

Open simonw opened this issue 7 years ago • 7 comments

It's easy to misspell the name of a database or table and then be puzzled when the metadata settings silently fail.

To avoid this, let's sanity check the provided metadata.json on startup and quit with a useful error message if we find any obvious mistakes.

simonw avatar May 15 '18 13:05 simonw

This came up in #588 - it would be helpful if this would spot things like "queries" defined against the tables block when they should be defined against a database.
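A minimal sketch of the kind of check described here, using only the standard library. The function name `find_metadata_problems` and the `known_tables` argument (a mapping from each attached database name to its set of table names) are assumptions for illustration, not part of Datasette's API.

```python
def find_metadata_problems(metadata, known_tables):
    """Return a list of human-readable problems found in a metadata dict.

    known_tables maps database name -> set of table names.
    Hypothetical helper for illustration; not Datasette API.
    """
    problems = []
    databases = metadata.get("databases") or {}
    for db_name, db_meta in databases.items():
        if db_name not in known_tables:
            problems.append(f"Unknown database: {db_name!r}")
            continue
        tables = (db_meta or {}).get("tables") or {}
        for table_name, table_meta in tables.items():
            if table_name not in known_tables[db_name]:
                problems.append(f"Unknown table {table_name!r} in database {db_name!r}")
            # "queries" belongs on the database block, not on a table
            if "queries" in (table_meta or {}):
                problems.append(
                    f"'queries' is defined on table {table_name!r}; "
                    f"it should be defined on database {db_name!r}"
                )
    return problems
```

On startup, a caller could collect the returned list and, if it is not empty, print each entry and exit with a non-zero status.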

simonw avatar Oct 21 '19 01:10 simonw

Is there already functionality that can be used to validate the metadata.json file? Is there a JSON Schema that defines it? Or a validation that's available via datasette with Python? We're working on automatically building the metadata in CI and when we deploy to Cloud Run, and it would be nice to be able to check in our tests whether the metadata we're outputting is valid.

zaneselvans avatar Feb 26 '22 02:02 zaneselvans

Interesting example of why this would be valuable here:

  • https://github.com/simonw/datasette/issues/1798

This YAML file:

title: Some title
description_html: |-
  <p>This is an experiment.</p>
databases:
  off:
    tables:
      products_from_owners:
        title: products_from_owners*

Was loaded as equivalent to this JSON:

{
    "title": "Some title",
    "description_html": "<p>This is an experiment.</p>",
    "databases": {
        "false": {
            "tables": {
                "products_from_owners": {
                    "title": "products_from_owners*"
                }
            }
        }
    }
}

Validation that caught this would have been useful.
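One way such a check might look, as a sketch: walk the loaded metadata and report any mapping key that is not a string. The `loaded` dict below stands in for what a YAML loader returned for the file above (with the `off` key read as boolean `False`), and `find_non_string_keys` is a hypothetical helper name.

```python
import json


def find_non_string_keys(value, path=()):
    """Return the path to every dict key that is not a string."""
    bad = []
    if isinstance(value, dict):
        for key, child in value.items():
            if not isinstance(key, str):
                bad.append(path + (key,))
            bad.extend(find_non_string_keys(child, path + (key,)))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            bad.extend(find_non_string_keys(child, path + (index,)))
    return bad


# Stand-in for the parsed YAML: the database name "off" became False
loaded = {
    "title": "Some title",
    "databases": {
        False: {
            "tables": {
                "products_from_owners": {"title": "products_from_owners*"}
            }
        }
    },
}

bad_keys = find_non_string_keys(loaded)
# json.dumps writes the boolean key as the string "false"
as_json = json.dumps(loaded)
```

Here `bad_keys` contains the path `("databases", False)`, which is enough to produce an error message suggesting that the database name be quoted in the YAML file.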

simonw avatar Sep 02 '22 00:09 simonw

I'm inclined to consider Pydantic for this, since it is widely used now and can generate really good error messages.

simonw avatar Sep 02 '22 00:09 simonw

@zschira is working with Pydantic while converting between and validating JSON frictionless datapackage descriptors that annotate an SQLite DB (extracted from FERC's XBRL data) and the Datasette YAML metadata so we can publish them with Datasette. Maybe there's some overlap? We've been loving Pydantic.

zaneselvans avatar Sep 02 '22 05:09 zaneselvans

Did some related research work in this issue:

  • https://github.com/simonw/shot-scraper/issues/28

simonw avatar Sep 02 '22 18:09 simonw

Another example of confusion from this today: https://discord.com/channels/823971286308356157/823971286941302908/1121042411238457374

See also https://gist.github.com/BinomeDeNewton/651ac8b50dd5420f8e54d1682eee5fed?permalink_comment_id=4605982#gistcomment-4605982

simonw avatar Jun 21 '23 12:06 simonw