datacontract-cli

Model field name is converted to "checks for {model name}"

Open mlyonlebo opened this issue 9 months ago • 8 comments

Hi, I have been trying to implement a basic contract for a BigQuery table, but I have been unable to test it because of a perplexing error:

From the logs: ERROR Invalid section header "checks for {model name}"

The contract is pretty boilerplate:

dataContractSpecification: 1.1.0
id: my-data-contract-id
info:
  title: My Data Contract
  version: 0.0.1
  description: This is a data contract for orders
servers:
  production:
    type: bigquery
    project: this-project-id
    dataset: cool_dataset
models:
  cool_table_name:
    type: table
    fields:
      date:
        type: date
        description: The date of the event
        required: false
      country:
        type: string
        description: The country of the event
        required: false
      size:
        type: float
        description: The size of the event
        required: false
      event_name:
        type: string
        description: The name of the event
        required: false

I was poking around in the source code for clues as to what I might be doing wrong, and I found this function, which is likely responsible for converting the section header from 'cool_table_name' to 'checks for cool_table_name'.

Thank you for your help!

mlyonlebo avatar Apr 07 '25 23:04 mlyonlebo

Hi, the server element should include dataset: {model} to tokenize the model name cool_table_name

dmaresma avatar Apr 08 '25 02:04 dmaresma

Is the model not supposed to correspond to the table name?

From the docs:

servers:
  production:
    type: bigquery
    project: datameshexample-product
    dataset: datacontract_cli_test_dataset
models:
  datacontract_cli_test_table: # corresponds to a BigQuery table
    type: table
    fields: ...

mlyonlebo avatar Apr 10 '25 16:04 mlyonlebo

From a search in the code: yes, {model} is the token for model_name, a.k.a. the table name.

dmaresma avatar Apr 10 '25 17:04 dmaresma

Thanks, so it seems that the contract is correct in this regard. Apologies, but I don't understand your original comment, or how to revise the contract to avoid this error:

ERROR:soda.scan:[09:31:53] Invalid section header "checks for this_cool_model_name"

The transformation seems to be happening here: https://github.com/datacontract/datacontract-cli/blob/becc253a12285bb76a255ed9ceec0f7ce5ccd78c/datacontract/engines/data_contract_checks.py#L73

mlyonlebo avatar Apr 10 '25 17:04 mlyonlebo

The CLI internally converts checks to Soda Checks Language to execute tests for BigQuery. Can you confirm that your table in BigQuery is named this_cool_model_name?

jochenchrist avatar Apr 19 '25 15:04 jochenchrist
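
To picture the conversion described above, here is a minimal sketch of how a model name could end up in a SodaCL section header of the form "checks for <model>". This is not the datacontract-cli implementation: the function name `sodacl_section` and the choice of a single schema check are assumptions made only for illustration.

```python
def sodacl_section(model_name: str, field_names: list[str]) -> str:
    """Render one SodaCL section for a model (illustrative, hypothetical helper)."""
    columns = ", ".join(field_names)
    lines = [
        # The model name from the contract becomes the dataset name in the header.
        f"checks for {model_name}:",
        "  - schema:",
        "      fail:",
        f"        when required column missing: [{columns}]",
    ]
    return "\n".join(lines)


# Model and field names taken from the contract in the original post.
section = sodacl_section(
    "cool_table_name", ["date", "country", "size", "event_name"]
)
print(section)
```

Under this reading, a header such as "checks for cool_table_name" is the expected result of the conversion, so the question becomes why Soda rejects that header rather than why the CLI produces it.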

@jochenchrist, yes: the model/table name is the same

mlyonlebo avatar Apr 21 '25 16:04 mlyonlebo

Quoting @dmaresma: "Hi, the server element should include dataset: {model} to tokenize the model name cool_table_name"

What exactly does this mean?

Should the server look like this?

servers:
  [my-project-id]/[my dataset name]:
    type: bigquery
    project: [my-project-id]
    dataset: [my-dataset]

jack-haus avatar Apr 22 '25 20:04 jack-haus

Quoting @dmaresma: "Hi, the server element should include dataset: {model} to tokenize the model name cool_table_name"

Not sure what @dmaresma meant here. Your definition of the server looks ok.

Do you have any special characters in the table name or in the project/dataset names that would require special quoting or character escaping?

jochenchrist avatar Jul 05 '25 20:07 jochenchrist
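
One quick way to answer the question about special characters is to test each name against a conservative "plain identifier" pattern. The sketch below is only a screening aid, with assumptions stated up front: the helper `suspicious_names` is hypothetical, and the pattern (letters, digits, and underscores, not starting with a digit) is deliberately stricter than what BigQuery or Soda may actually accept.

```python
import re

# Conservative assumption: only letters, digits, and underscores,
# and the name must not start with a digit.
PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def suspicious_names(names: dict[str, str]) -> dict[str, str]:
    """Return the entries whose value is not a plain identifier (hypothetical helper)."""
    return {
        label: value
        for label, value in names.items()
        if not PLAIN_IDENTIFIER.fullmatch(value)
    }


# Names taken from the contract in the original post.
flagged = suspicious_names(
    {
        "project": "this-project-id",
        "dataset": "cool_dataset",
        "table": "cool_table_name",
    }
)
print(flagged)  # {'project': 'this-project-id'}
```

Hyphens are normal in Google Cloud project IDs, so a flagged name is a prompt to look more closely at quoting, not a diagnosis of the "Invalid section header" error.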