datacontract-cli icon indicating copy to clipboard operation
datacontract-cli copied to clipboard

big query quality check query does not include project and dataset in the FROM clause

Open jack-haus opened this issue 9 months ago • 10 comments

Testing my-contract-name.yaml
ERROR:soda.scan:[13:13:27] Query execution error in 10.bigquery.[my-table-name].aggregation[0]: 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/[my-project-id]/queries?prettyPrint=false: Syntax error: Unexpected "[my-table-name]" at [9:6]
SELECT 
  COUNT(CASE WHEN date IS NULL THEN 1 END),
  COUNT(CASE WHEN field-name-1 IS NULL THEN 1 END),
  COUNT(CASE WHEN NOT (field-name-1 IS NULL) AND NOT (LENGTH(country_code) >= 2) THEN 1 END),
  COUNT(CASE WHEN NOT (field-name-1 IS NULL) AND NOT (LENGTH(country_code) <= 2) THEN 1 END),
  COUNT(CASE WHEN field-name-2 IS NULL THEN 1 END),
  COUNT(CASE WHEN reporting_time IS NULL THEN 1 END),
  COUNT(CASE WHEN field-name-3 IS NULL THEN 1 END) 
FROM [my-table-name]

the query is missing the project-id and dataset name in the table's qualified name.

my contract looks like the following

dataContractSpecification: 1.1.0
kind: DataContract
id: my-contract-id
info:
  title: my-contract-title
  version: 0.0.1

# Servers
servers: 
  [my-project-id]/[my-dataset-name]:
      type: bigquery
      project: [my-project-id]
      dataset: [my-dataset]

models:
  [my-table-name]:
    type: table
    fields:
      date: 
        type: date
        required: true
      field-name-1:
        type: string
        required: true
        maxLength: 2
        minLength: 2
      field-name-2:
        type: float
        required: true
      reporting_time:
        type: date
        required: true
      field-name-3:
        type: string
        required: true
       ```

jack-haus avatar Apr 22 '25 20:04 jack-haus

Which version of CLI are you using?

jochenchrist avatar Apr 22 '25 20:04 jochenchrist

@jochenchrist datacontract-cli==0.10.24

jack-haus avatar Apr 22 '25 22:04 jack-haus

Sorry, I cannot reproduce this.

This works for me:

datacontract.yaml

dataContractSpecification: 1.1.0
id: bigquery
info:
  title: bigquery
  version: 0.0.1
  owner: my-domain-team
servers:
  my-dataproduct/bigquery:
    type: bigquery
    project: datameshexample-product
    dataset: datacontract_cli
models:
  complex_table:
    type: table
    fields:
      some_string:
        type: string
        required: true
        maxLength: 2
        minLength: 2
      some_record:
        type: record
        fields:
          some_field_1:
            type: string
          some_field_2:
            type: string
      some_array_of_strings:
        type: array
        items:
          type: string
      some_array_of_records:
        type: array
        items:
          type: record
          fields:
            some_other_field_1:
              type: string
            some_other_field_2:
              type: string
      some_json:
        type: text
        config:
          bigqueryType: json
      some_range_of_timestamp:
        type: record
        config:
          bigqueryType: RANGE<TIMESTAMP>

Command

export DATACONTRACT_BIGQUERY_ACCOUNT_INFO_JSON_PATH=~/Temp/bigquery/datameshexample-product-b74999d7c78e.json
uvx --from 'datacontract-cli[bigquery]==0.10.24' datacontract test datacontract.yaml --logs

jochenchrist avatar Apr 24 '25 07:04 jochenchrist

Do you have any special characters in your table or project name?

jochenchrist avatar Apr 24 '25 07:04 jochenchrist

Do you have any special characters in your table or project name?

no, we just have dashes and numbers.

jack-haus avatar Apr 24 '25 16:04 jack-haus

Next:

  • Please double-check that your service account (as defined in the DATACONTRACT_BIGQUERY_ACCOUNT_INFO_JSON_PATH) has the correct permissions to access the tables
  • Please run the soda command from the error message above directly in the BigQuery web console:
SELECT 
  COUNT(CASE WHEN date IS NULL THEN 1 END),
  COUNT(CASE WHEN field-name-1 IS NULL THEN 1 END),
  COUNT(CASE WHEN NOT (field-name-1 IS NULL) AND NOT (LENGTH(country_code) >= 2) THEN 1 END),
  COUNT(CASE WHEN NOT (field-name-1 IS NULL) AND NOT (LENGTH(country_code) <= 2) THEN 1 END),
  COUNT(CASE WHEN field-name-2 IS NULL THEN 1 END),
  COUNT(CASE WHEN reporting_time IS NULL THEN 1 END),
  COUNT(CASE WHEN field-name-3 IS NULL THEN 1 END) 
FROM [my-table-name]

jochenchrist avatar Apr 24 '25 17:04 jochenchrist

@jochenchrist so here is what I did

i ran the following command:

datacontract import --format bigquery --bigquery-project data-sandbox-354716 --bigquery-dataset hausdemo --bigquery-table [MY-TABLE-NAME] --dialect bigquery

it returned the following

dataContractSpecification: 1.1.0
id: my-data-contract-id
info:
  title: My Data Contract
  version: 0.0.1
models:
  [my-table-name]:
    type: table
    fields:
      [date-field-1]:
        type: date
        required: false
      country_code:
        type: string
        required: false
      [numeric-field-name]:
        type: float
        required: false
      [date-field-2]:
        type: date
        required: false
      [string-field-1]:
        type: string
        required: false

I changed the required properties from false to true, and added the server block to it, and saved it to a file called my_table_name_minimal.yaml

dataContractSpecification: 1.1.0
id: my-data-contract-id
info:
  title: My Data Contract
  version: 0.0.1
servers:
  data-sandbox-354716/hausdemo:
      type: bigquery
      project: data-sandbox-354716
      dataset: hausdemo
models:
  [my-table-name]:
    type: table
    fields:
      [date-field-1]:
        type: date
        required: false
      [string-field-1]:
        type: string
        required: true
      [numeric-field-name]:
        type: float
        required: true
      [date-field-2]:
        type: date
        required: true
      [string-field-2]:
        type: string
        required: true

then I got the following when I ran datacontract test my_table_name_minimal.yaml

(venv) jack@Jack-Enloe-MacBook-Pro contracts % datacontract test my_table_name_minimal.yaml  
Testing my_table_name_minimal.yaml
ERROR:soda.scan:[10:17:47] Query execution error in 2.bigquery.my_table_name.aggregation[0]: 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/data-sandbox-354716/queries?prettyPrint=false: Syntax error: Unexpected "my_table_name" at [7:6]
SELECT 
  COUNT(CASE WHEN date IS NULL THEN 1 END),
  COUNT(CASE WHEN string_col_1 IS NULL THEN 1 END),
  COUNT(CASE WHEN number_col_1 IS NULL THEN 1 END),
  COUNT(CASE WHEN reporting_time IS NULL THEN 1 END),
  COUNT(CASE WHEN string_col_1 IS NULL THEN 1 END) 
FROM my_table_name
ERROR:soda.scan:  | 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/data-sandbox-354716/queries?prettyPrint=false: Syntax error: Unexpected "my_table_name" at [7:6]
ERROR:soda.scan:[10:17:48] Metrics 'missing_count' were not computed for check 'missing_count(date) = 0'
ERROR:soda.scan:[10:17:48] Metrics 'missing_count' were not computed for check 'missing_count(string_col_1) = 0'
ERROR:soda.scan:[10:17:48] Metrics 'missing_count' were not computed for check 'missing_count(number_col_1) = 0'
ERROR:soda.scan:[10:17:48] Metrics 'missing_count' were not computed for check 'missing_count(reporting_time) = 0'
ERROR:soda.scan:[10:17:48] Metrics 'missing_count' were not computed for check 'missing_count(string_col_1) = 0'
ERROR:soda.scan:[10:17:48] Query execution error in 2.bigquery.my_table_name.aggregation[0]: 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/data-sandbox-354716/queries?prettyPrint=false: Syntax error: Unexpected "my_table_name" at [7:6]
SELECT 
  COUNT(CASE WHEN date IS NULL THEN 1 END),
  COUNT(CASE WHEN string_col_1 IS NULL THEN 1 END),
  COUNT(CASE WHEN number_col_1 IS NULL THEN 1 END),
  COUNT(CASE WHEN reporting_time IS NULL THEN 1 END),
  COUNT(CASE WHEN string_col_1 IS NULL THEN 1 END) 
FROM my_table_name
ERROR:soda.scan:  | 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/data-sandbox-354716/queries?prettyPrint=false: Syntax error: Unexpected "my_table_name" at [7:6]
ERROR:soda.scan:[10:17:48] Metrics 'missing_count' were not computed for check 'missing_count(date) = 0'
ERROR:soda.scan:[10:17:48] Metrics 'missing_count' were not computed for check 'missing_count(string_col_1) = 0'
ERROR:soda.scan:[10:17:48] Metrics 'missing_count' were not computed for check 'missing_count(number_col_1) = 0'
ERROR:soda.scan:[10:17:48] Metrics 'missing_count' were not computed for check 'missing_count(reporting_time) = 0'
ERROR:soda.scan:[10:17:48] Metrics 'missing_count' were not computed for check 'missing_count(string_col_1) = 0'
WARNING:root:Engine soda-core has errors. See the logs for details.
╭─────────┬───────────────────────────────────────────────────────┬──────────────────────────────┬────────────────────────────────────────────────────╮
│ Result  │ Check                                                 │ Field                        │ Details                                            │
├─────────┼───────────────────────────────────────────────────────┼──────────────────────────────┼────────────────────────────────────────────────────┤
│         │ Check that field string_col_1 has no missing values   │ my_table_name.string_col_1   │                                                    │
│         │ Check that field string_col_1 has no missing values   │ my_table_name.string_col_1   │                                                    │
│         │ Check that field date has no missing values           │ my_table_name.date           │                                                    │
│         │ Check that field reporting_time has no missing values │ my_table_name.reporting_time │                                                    │
│         │ Check that field number_col_1 has no missing values   │ my_table_name.number_col_1   │                                                    │
│ passed  │ Check that field 'string_col_1' is present            │ my_table_name.string_col_1   │                                                    │
│ passed  │ Check that field string_col_1 has type STRING         │ my_table_name.string_col_1   │                                                    │
│ passed  │ Check that field 'string_col_1' is present            │ my_table_name.string_col_1   │                                                    │
│ passed  │ Check that field string_col_1 has type STRING         │ my_table_name.string_col_1   │                                                    │
│ passed  │ Check that field 'date' is present                    │ my_table_name.date           │                                                    │
│ passed  │ Check that field date has type DATE                   │ my_table_name.date           │                                                    │
│ passed  │ Check that field 'reporting_time' is present          │ my_table_name.reporting_time │                                                    │
│ passed  │ Check that field reporting_time has type DATE         │ my_table_name.reporting_time │                                                    │
│ passed  │ Check that field 'number_col_1' is present            │ my_table_name.number_col_1   │                                                    │
│ passed  │ Check that field number_col_1 has type FLOAT64        │ my_table_name.number_col_1   │                                                    │
│ warning │ Data Contract Tests                                   │                              │ Engine soda-core has errors. See the logs for      │
│         │                                                       │                              │ details.                                           │
╰─────────┴───────────────────────────────────────────────────────┴──────────────────────────────┴────────────────────────────────────────────────────╯
🟠 data contract has warnings. Found the following warnings:
1) my_table_name.date Check that field date has no missing values: None
2) my_table_name.string_col_1 Check that field string_col_1 has no missing values: None
3) my_table_name.number_col_1 Check that field number_col_1 has no missing values: None
4) my_table_name.reporting_time Check that field reporting_time has no missing values: None
5) my_table_name.string_col_1 Check that field string_col_1 has no missing values: None
6) Data Contract Tests: Engine soda-core has errors. See the logs for details.

Should I try a previous version maybe?

my project requirements are the following


(venv) jack@Jack-Enloe-MacBook-Pro data-contracts-prototype % pip freeze          
absl-py==2.2.2
aenum==3.1.15
agate==1.9.1
aiobotocore==2.19.0
aiofiles==24.1.0
aiohappyeyeballs==2.4.0
aiohttp==3.10.5
aioitertools==0.12.0
aiosignal==1.3.1
alembic==1.13.2
altair==4.2.2
annotated-types==0.7.0
ansicolors==1.1.8
antlr4-python3-runtime==4.11.1
anyio==4.4.0
appnope==0.1.4
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
arviz==0.21.0
asana==5.0.10
asn1crypto==1.5.1
astroid==3.3.9
asttokens==2.4.1
astunparse==1.6.3
async-lru==2.0.4
async-timeout==4.0.3
attrs==24.2.0
Authlib==1.3.2
avro==1.12.0
azure-core==1.33.0
azure-identity==1.17.1
babel==2.17.0
backcall==0.2.0
backports.tarfile==1.2.0
beautifulsoup4==4.13.3
bleach==6.2.0
blinker==1.9.0
boto3==1.36.3
botocore==1.36.3
build==1.2.2.post1
CacheControl==0.14.0
cachetools==5.5.0
certifi==2024.8.30
cffi==1.17.0
cfgv==3.4.0
charset-normalizer==3.3.2
cleo==2.1.0
click==8.1.7
click-default-group==1.2.4
cloud-sql-python-connector==1.15.0
cloudevents==1.11.0
cloudpickle==2.2.1
cmdstanpy==1.2.4
colorama==0.4.6
comm==0.2.2
cons==0.4.6
contourpy==1.3.0
convertdate==2.4.0
coverage==7.6.10
cramjam==2.10.0
crashtest==0.4.1
crcmod==1.7
cryptography==43.0.1
curlify==2.2.1
customerlift==0.1.47
cycler==0.12.1
Cython==3.0.11
daff==1.3.46
databricks-sdk==0.49.0
databricks-sql-connector==4.0.2
datacontract-cli==0.10.24
db-dtypes==1.4.2
dbt-adapters==1.14.6
dbt-common==1.22.0
dbt-core==1.9.4
dbt-extractor==0.6.0
dbt-semantic-interfaces==0.7.4
debugpy==1.8.12
decorator==5.1.1
deepdiff==7.0.1
defusedxml==0.7.1
Deprecated==1.2.14
deprecation==2.1.0
dill==0.3.9
distlib==0.3.9
dnspython==2.6.1
docopt==0.6.2
docstring_parser==0.16
duckdb==1.0.0
dulwich==0.21.7
durationpy==0.9
dynaconf==3.2.7
email_validator==2.2.0
entrypoints==0.4
ephem==4.1.5
estimation==0.4.6
et_xmlfile==2.0.0
etuples==0.3.9
exceptiongroup==1.2.2
executing==2.1.0
facebook-business==10.0.1
fastapi==0.115.12
fastjsonschema==2.20.0
fastparquet==2024.11.0
filelock==3.15.4
firebase-admin==5.4.0
Flask==3.1.0
Flask-Cors==3.0.10
Flask-Migrate==4.0.7
Flask-SQLAlchemy==3.1.1
flask-talisman==1.1.0
flatbuffers==25.2.10
fonttools==4.53.1
forecasting==0.1.19
formulaic==1.0.2
fpdf==1.7.2
fqdn==1.5.1
frozenlist==1.4.1
fsspec==2025.2.0
functions-framework==3.8.2
gast==0.6.0
gcloud==0.18.3
gcsfs==2024.10.0
geolift-metadata-store==1.3.21
google-api-core==2.19.2
google-api-python-client==2.143.0
google-auth==2.39.0
google-auth-httplib2==0.2.0
google-auth-oauthlib==1.2.1
google-cloud==0.34.0
google-cloud-aiplatform==1.64.0
google-cloud-appengine-logging==1.4.5
google-cloud-audit-log==0.3.0
google-cloud-bigquery==3.31.0
google-cloud-bigquery-datatransfer==3.17.1
google-cloud-bigquery-storage==2.25.0
google-cloud-core==2.4.3
google-cloud-firestore==2.18.0
google-cloud-kms==2.24.2
google-cloud-logging==3.11.4
google-cloud-pubsub==2.26.1
google-cloud-resource-manager==1.12.5
google-cloud-secret-manager==2.22.1
google-cloud-storage==3.1.0
google-cloud-tasks==2.19.2
google-crc32c==1.6.0
google-pasta==0.2.0
google-resumable-media==2.7.2
googleapis-common-protos==1.65.0
great-expectations==0.18.22
grpc-google-iam-v1==0.14.2
grpcio==1.71.0
grpcio-status==1.62.3
grpcio-tools==1.71.0
gunicorn==23.0.0
h11==0.14.0
h5netcdf==1.3.0
h5py==3.11.0
haus-notifications==1.0.12
hijri-converter==2.3.1
holidays==0.18
httpcore==1.0.5
httplib2==0.22.0
httpx==0.27.2
hypothesis==6.125.2
identify==2.6.3
idna==3.8
importlib-metadata==6.11.0
inflect==7.5.0
iniconfig==2.0.0
installer==0.7.0
interface-meta==1.3.0
ip3country==0.3.0
ipykernel==6.29.5
ipython==8.12.3
ipywidgets==8.1.5
iso3166==2.1.1
isodate==0.6.1
isoduration==20.11.0
isort==6.0.1
itsdangerous==2.2.0
jaraco.classes==3.4.0
jaraco.context==6.0.1
jaraco.functools==4.1.0
jedi==0.19.1
Jinja2==3.1.5
jinja_partials==0.2.1
jmespath==1.0.1
joblib==1.4.2
json5==0.10.0
jsonpatch==1.33
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
jupyter-events==0.12.0
jupyter-lsp==2.2.5
jupyter_client==8.6.2
jupyter_core==5.7.2
jupyter_server==2.15.0
jupyter_server_terminals==0.5.3
jupyterlab==4.4.0
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.3
jupyterlab_widgets==3.0.13
keras==3.9.2
keyring==25.6.0
keyrings.google-artifactregistry-auth==1.1.2
kiwisolver==1.4.6
korean-lunar-calendar==0.3.1
kubernetes==32.0.1
leather==0.4.0
libclang==18.1.1
lightgbm==4.6.0
linearmodels==5.4
logical-unification==0.4.6
LunarCalendar==0.0.9
lz4==4.4.4
mailchimp-transactional==1.0.56
makefun==1.15.6
Mako==1.3.5
Markdown==3.8
markdown-it-py==3.0.0
MarkupSafe==2.1.5
marshmallow==3.26.1
mashumaro==3.14
matplotlib==3.10.1
matplotlib-inline==0.1.7
mccabe==0.7.0
mdurl==0.1.2
metaflow==2.15.7
metaflow-netflixext==1.2.3
miniKanren==1.0.3
mistune==3.1.1
ml_dtypes==0.5.1
mmh3==5.1.0
mockito==1.5.4
more-itertools==10.5.0
mpmath==1.3.0
msal==1.32.0
msal-extensions==1.3.1
msgpack==1.0.8
multidict==6.4.3
multimethod==1.10
multipledispatch==1.0.0
mypy-extensions==1.0.0
mysqlclient==2.2.4
namex==0.0.8
narwhals==1.35.0
nbclient==0.10.0
nbconvert==7.16.6
nbformat==5.10.4
nest-asyncio==1.6.0
networkx==3.4.2
nodeenv==1.9.1
notebook==7.4.0
notebook_shim==0.2.4
notion-client==2.2.1
numpy==1.26.4
oauth2client==4.1.3
oauthlib==3.2.2
openpyxl==3.1.5
opentelemetry-api==1.32.1
opentelemetry-exporter-otlp-proto-common==1.32.1
opentelemetry-exporter-otlp-proto-http==1.32.1
opentelemetry-proto==1.32.1
opentelemetry-sdk==1.32.1
opentelemetry-semantic-conventions==0.53b1
opt_einsum==3.4.0
optree==0.15.0
ordered-set==4.1.0
overrides==7.7.0
packaging==24.2
pandas==2.2.3
pandas-gbq==0.19.2
pandera==0.18.3
pandocfilters==1.5.1
papermill==2.6.0
parsedatetime==2.6
parso==0.8.4
pathspec==0.12.1
patsy==0.5.6
pexpect==4.9.0
pickleshare==0.7.5
pillow==10.4.0
pipreqs==0.5.0
pkginfo==1.12.0
platformdirs==4.2.2
plotly==6.0.1
pluggy==1.5.0
poetry==1.8.5
poetry-core==1.9.1
poetry-plugin-export==1.8.0
pre_commit==4.0.1
prometheus_client==0.21.1
prompt_toolkit==3.0.47
propcache==0.2.1
prophet==1.1.1
proto-plus==1.24.0
protobuf==5.29.3
psutil==6.1.1
psycopg2-binary==2.9.10
ptyprocess==0.7.0
pure_eval==0.2.3
py4j==0.10.9.7
pyactiveresource==2.2.2
pyarrow==19.0.1
pyasn1==0.6.0
pyasn1_modules==0.4.0
pycountry==24.6.1
pycparser==2.22
pydantic==2.10.6
pydantic_core==2.27.2
pydata-google-auth==1.8.2
pydbml==1.2.0
Pygments==2.18.0
pyhaus-messaging==0.3.2
pyhdfe==0.2.0
pyiceberg==0.8.1
PyJWT==2.9.0
pylint==3.3.6
pymc==5.22.0
pymc-marketing==0.11.1
PyMeeus==0.5.12
PyMySQL==1.1.1
pyodbc==5.2.0
pyOpenSSL==24.3.0
pyparsing==3.1.4
pyproject_hooks==1.2.0
pyprojroot==0.3.0
pyspark==3.5.5
pytensor==2.30.3
pytest==8.3.2
pytest-cov==6.0.0
pytest-flask==1.3.0
pytest-mock==3.14.0
pytest-reverse==1.8.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-json-logger==3.2.1
python-multipart==0.0.20
python-slugify==8.0.4
pytimeparse==1.1.8
pytz==2024.1
PyYAML==6.0.2
pyzmq==26.2.0
RapidFuzz==3.10.1
rdflib==7.0.0
redis==4.6.0
referencing==0.35.1
regiontest==15.10.18
requests==2.32.3
requests-mock==1.12.1
requests-oauthlib==2.0.0
requests-toolbelt==1.0.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.8.0
rpds-py==0.20.0
rsa==4.9
ruamel.yaml==0.17.40
ruamel.yaml.clib==0.2.12
s3fs==2025.2.0
s3transfer==0.11.3
scikit-learn==1.6.1
scipy==1.11.4
seaborn==0.13.2
Send2Trash==1.8.3
sentry-sdk==2.20.0
setuptools-git==1.2
setuptools-scm==8.1.0
shapely==2.0.6
shellingham==1.5.4
ShopifyAPI==12.6.0
six==1.16.0
sniffio==1.3.1
snowflake-connector-python==3.12.1
snowplow-tracker==1.1.0
soda-core==3.5.2
soda-core-bigquery==3.5.2
soda-core-contracts==3.5.2
soda-core-duckdb==3.5.2
soda-core-postgres==3.5.2
soda-core-snowflake==3.5.2
soda-core-spark==3.5.2
soda-core-spark-df==3.5.2
soda-core-sqlserver==3.5.2
soda-core-trino==3.5.2
sortedcontainers==2.4.0
soupsieve==2.6
SQLAlchemy==2.0.33
sqlglot==26.15.0
sqlparse==0.5.3
stack-data==0.6.3
stanio==0.5.1
starlette==0.46.2
statsig==0.41.0
statsmodels==0.14.4
strictyaml==1.7.3
sympy==1.13.1
tabulate==0.9.0
tenacity==8.5.0
tensorboard==2.19.0
tensorboard-data-server==0.7.2
tensorflow==2.19.0
tensorflow-io-gcs-filesystem==0.37.1
termcolor==2.4.0
terminado==0.18.1
text-unidecode==1.3
threadpoolctl==3.5.0
thrift==0.20.0
tinycss2==1.4.0
tomli==2.0.1
tomlkit==0.13.2
toolz==0.12.1
torch==2.6.0
tornado==6.4.1
tqdm==4.66.5
traitlets==5.14.3
trino==0.333.0
trove-classifiers==2024.10.21.16
typeguard==4.3.0
typer==0.15.2
types-python-dateutil==2.9.0.20241206
typing-inspect==0.9.0
typing_extensions==4.12.2
tzdata==2024.1
tzlocal==5.2
ua-parser==0.18.0
ua-parser-builtins==0.18.0.post1
uri-template==1.3.0
uritemplate==4.1.1
urllib3==2.2.2
uvicorn==0.34.0
virtualenv==20.28.0
watchdog==6.0.0
wcwidth==0.2.13
webcolors==24.11.1
webencodings==0.5.1
websocket-client==1.8.0
Werkzeug==3.1.3
widgetsnbextension==4.0.13
wrapt==1.16.0
xarray==2025.3.1
xarray-einstats==0.7.0
xattr==1.1.0
xgboost==3.0.0
yarg==0.1.9
yarl==1.9.7
zipp==3.20.1
zstandard==0.23.0
(venv) jack@Jack-Enloe-MacBook-Pro data-contracts-prototype % 

jack-haus avatar Apr 24 '25 17:04 jack-haus

@jochenchrist

this query, from soda will not work

SELECT 
  COUNT(CASE WHEN date IS NULL THEN 1 END),
  COUNT(CASE WHEN field-name-1 IS NULL THEN 1 END),
  COUNT(CASE WHEN NOT (field-name-1 IS NULL) AND NOT (LENGTH(country_code) >= 2) THEN 1 END),
  COUNT(CASE WHEN NOT (field-name-1 IS NULL) AND NOT (LENGTH(country_code) <= 2) THEN 1 END),
  COUNT(CASE WHEN field-name-2 IS NULL THEN 1 END),
  COUNT(CASE WHEN reporting_time IS NULL THEN 1 END),
  COUNT(CASE WHEN field-name-3 IS NULL THEN 1 END) 
FROM [my-table-name]

in order to get soda checks to work directly using soda, I needed to have a checks.yml of the following

checks for data-sandbox-354716.hausdemo.my_table_name:
  - row_count > 0
  - missing_count(date) = 0
  - missing_count(string_col_1) = 0
  - missing_count(number_col_1) = 0
  - missing_count(reporting_time) = 0
  - missing_count(string_col_2) = 0
  - min(number_col_1) >= 0
  - max_length(string_col_1) = 2
  - min_length(string_col_1) = 2

the query produced by datacontract-cl will only work when I add the project, and dataset to the table name

SELECT 
  COUNT(CASE WHEN date IS NULL THEN 1 END),
  COUNT(CASE WHEN field-name-1 IS NULL THEN 1 END),
  COUNT(CASE WHEN NOT (field-name-1 IS NULL) AND NOT (LENGTH(country_code) >= 2) THEN 1 END),
  COUNT(CASE WHEN NOT (field-name-1 IS NULL) AND NOT (LENGTH(country_code) <= 2) THEN 1 END),
  COUNT(CASE WHEN field-name-2 IS NULL THEN 1 END),
  COUNT(CASE WHEN reporting_time IS NULL THEN 1 END),
  COUNT(CASE WHEN field-name-3 IS NULL THEN 1 END) 
FROM data-sandbox-354716.hausdemo.[my-table-name]

jack-haus avatar Apr 24 '25 17:04 jack-haus

the service account referenced by DATACONTRACT_BIGQUERY_ACCOUNT_INFO_JSON_PATH is a bigquery admin

jack-haus avatar Apr 24 '25 17:04 jack-haus

For reference: The issue was related to a table name, starting with a numeric value, e.g. "1abced", which must be escaped according to https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical

Solution would be to enclose all BigQuery identifiers with backticks.

jochenchrist avatar Apr 25 '25 06:04 jochenchrist