
Ability to validate local data files against remote datapackage descriptor?

Open • roll opened this issue 2 years ago • 1 comment

Overview

See this comment - https://github.com/frictionlessdata/framework/issues/1416#issuecomment-1431546353.

In v5, the basepath logic was unified, so this is no longer possible:

frictionless validate --basepath local/path/to/csvs \
https://raw.githubusercontent.com/mjacqu/glenglat/main/contribute/datapackage.yaml

The basepath will simply be ignored, because the data package is remote and its basepath resolves to https://raw.githubusercontent.com/mjacqu/glenglat/main/contribute/.
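To illustrate why --basepath has no effect here, the following is a simplified sketch (not frictionless's actual implementation) of how a basepath can be derived from a remote descriptor URL and then used to resolve relative resource paths. The helpers remote_basepath and resolve are hypothetical, and the sketch assumes resource paths are relative.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def remote_basepath(descriptor_url: str) -> str:
    """Return the directory of a remote descriptor URL (hypothetical helper)."""
    parts = urlsplit(descriptor_url)
    # Drop the descriptor file name from the URL path, keeping its directory.
    directory = posixpath.dirname(parts.path)
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


def resolve(basepath: str, path: str) -> str:
    """Join a relative resource path onto a URL basepath (hypothetical helper)."""
    return basepath.rstrip("/") + "/" + path


url = "https://raw.githubusercontent.com/mjacqu/glenglat/main/contribute/datapackage.yaml"
print(resolve(remote_basepath(url), "borehole.csv"))
```

Because the basepath comes from the descriptor's own location, every relative resource path points back at the remote repository, regardless of where the CSVs actually live locally.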

In Python, it's possible to change this behavior (not tested for resources whose dialect or schema is given as a string):

from frictionless import Package

package = Package('https://raw.githubusercontent.com/mjacqu/glenglat/main/contribute/datapackage.yaml')
for resource in package.resources:
    # Override the basepath inherited from the remote descriptor
    resource.basepath = 'local/path/to/csvs'
    print(resource.normpath)
    # local/path/to/csvs/borehole.csv
    # local/path/to/csvs/measurement.csv

Using this approach, we could add an option like force-basepath to actions/program.validate. It's better to do this after v6, once the CLI logic is rebased onto classes instead of actions.
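A minimal sketch of how the proposed option might behave, assuming the remote descriptor has already been loaded into a dict (for example, parsed from the remote YAML). Only the option name follows the proposal; the argparse parser, its prog name, and the force_basepath helper are hypothetical and are not part of the frictionless CLI.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI: "--force-basepath" is the proposed option, not an existing frictionless flag.
    parser = argparse.ArgumentParser(prog="validate-sketch")
    parser.add_argument("source", help="path or URL of the data package descriptor")
    parser.add_argument(
        "--force-basepath",
        default=None,
        help="resolve resource paths against this directory instead of the descriptor's location",
    )
    return parser


def force_basepath(descriptor: dict, basepath: str) -> dict:
    """Return a copy of the descriptor with relative resource paths rebased onto basepath."""
    rebased = dict(descriptor)
    resources = []
    for resource in descriptor.get("resources", []):
        resource = dict(resource)  # copy so the input descriptor is not mutated
        path = resource.get("path")
        # Only rebase relative string paths; URLs, absolute paths, and lists are left untouched.
        if isinstance(path, str) and "://" not in path and not os.path.isabs(path):
            resource["path"] = os.path.join(basepath, path)
        resources.append(resource)
    rebased["resources"] = resources
    return rebased
```

The rebased descriptor would then be handed to the validator in place of the original. Rewriting paths up front keeps the rest of the validation logic unaware of the override.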

roll • Feb 21 '23 10:02

In essence, I'm hoping for something like what we already have for validating a single table against a remote Table Schema, but for a group of tables (related by foreign keys) against a remote Tabular Data Package. This can be achieved with custom code using the Python API, but it would be beneficial for this functionality to be accessible from the CLI or, better yet, in the Frictionless Application.
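What package-level validation adds over validating each table separately is checking the references between tables. The following stdlib-only sketch shows a single-field foreign key check over CSV text; it is not how frictionless implements it, and the table and field names (measurement.borehole_id referencing borehole.id) are hypothetical.

```python
import csv
import io


def foreign_key_errors(child_csv: str, child_field: str, parent_csv: str, parent_field: str):
    """Return (row_number, value) pairs whose key is missing from the parent table."""
    parent_keys = {row[parent_field] for row in csv.DictReader(io.StringIO(parent_csv))}
    errors = []
    # Row numbers count the header as row 1, so the first data row is row 2.
    for number, row in enumerate(csv.DictReader(io.StringIO(child_csv)), start=2):
        value = row[child_field]
        # Empty values are treated as missing (null) and are not checked.
        if value != "" and value not in parent_keys:
            errors.append((number, value))
    return errors


borehole = "id,name\n1,A\n2,B\n"
measurement = "borehole_id,depth\n1,10\n3,5\n2,7\n"
print(foreign_key_errors(measurement, "borehole_id", borehole, "id"))
```

Running the single-table check on each CSV would not catch the dangling reference to borehole 3; only a check that sees both tables at once can.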

Why? I am involved in several projects where many different people need to format their data into a predefined multi-table format (i.e. a remote Tabular Data Package descriptor) and need to validate their data against this schema.

ezwelty • Mar 13 '23 10:03