
Obsolete control flow interruption `if server.format == "json" and server.type != "kafka":`

Open dmaresma opened this issue 10 months ago • 4 comments

https://github.com/datacontract/datacontract-cli/blob/252c4e17b7ffb99e7a823d42015a2a08fa595482/datacontract/engines/data_contract_test.py#L53

`check_jsonschema` doesn't support Azure or GCP, and it duplicates `check_soda_execute`, which already handles JSON and works with Azure, S3, etc. This doesn't respect DRY.

dmaresma, Apr 01 '25 12:04

The JSON Schema check is meant for complex JSON structures (arrays and nested fields). `check_soda_execute` only works on top-level fields.
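To illustrate why a top-level check is not enough for nested JSON, here is a minimal stdlib-only sketch. The helpers `top_level_fields_present` and `nested_fields_present` are hypothetical and are not datacontract-cli functions, and the tiny schema shape is a simplified assumption, not full JSON Schema.

```python
from typing import Any

# Hypothetical helpers for illustration only; these are not datacontract-cli functions.


def top_level_fields_present(record: dict[str, Any], required: list[str]) -> bool:
    """Column-style check: only looks at top-level keys."""
    return all(name in record for name in required)


def nested_fields_present(value: Any, schema: dict[str, Any]) -> bool:
    """Recursive check over a simplified schema shape (assumption, not full JSON Schema):
    {"type": "object", "required": [...], "properties": {...}}
    {"type": "array", "items": {...}}
    Any other schema accepts the value."""
    kind = schema.get("type")
    if kind == "object":
        if not isinstance(value, dict):
            return False
        if any(name not in value for name in schema.get("required", [])):
            return False
        # Recurse into every declared property that is present
        return all(
            nested_fields_present(value[name], sub)
            for name, sub in schema.get("properties", {}).items()
            if name in value
        )
    if kind == "array":
        if not isinstance(value, list):
            return False
        item_schema = schema.get("items", {})
        # Every array element must satisfy the item schema
        return all(nested_fields_present(item, item_schema) for item in value)
    return True


# The second order line is missing "qty"
order = {"id": 1, "lines": [{"sku": "A1", "qty": 2}, {"sku": "B2"}]}
schema = {
    "type": "object",
    "required": ["id", "lines"],
    "properties": {
        "lines": {
            "type": "array",
            "items": {"type": "object", "required": ["sku", "qty"]},
        }
    },
}

print(top_level_fields_present(order, ["id", "lines"]))  # True: the nested gap is invisible
print(nested_fields_present(order, schema))              # False: the missing "qty" is caught
```

The top-level check passes because `id` and `lines` exist, while the recursive walk descends into the array and finds the incomplete element.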

Would you be interested in contributing better Azure and GCP support here?

jochenchrist, Apr 06 '25 13:04

OK, I need to check whether newline-delimited JSON Lines (`.jsonl`) with gzip compression (e.g. `mypayload.jsonl.gz`) is supported. I'll come back soon. I also see there is a PR to add Azure Storage Account support to the JSON Schema check too.
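For reference, reading a gzip-compressed JSON Lines file such as `mypayload.jsonl.gz` takes only the Python standard library. This is a minimal local-file sketch, assuming each non-blank line holds one JSON object; `read_jsonl_gz` is a hypothetical helper, and the sketch says nothing about how datacontract-cli or DuckDB open remote files on S3 or Azure.

```python
import gzip
import json
import os
import tempfile
from collections.abc import Iterator
from typing import Any


def read_jsonl_gz(path: str) -> Iterator[dict[str, Any]]:
    """Hypothetical helper: yield one parsed object per non-blank line of a .jsonl.gz file."""
    # "rt" decompresses and decodes, so each line arrives as str
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc.msg})") from exc


# Usage: write a small sample file, then read it back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mypayload.jsonl.gz")
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        fh.write('{"id": 1}\n{"id": 2}\n\n')
    records = list(read_jsonl_gz(path))

print(records)  # [{'id': 1}, {'id': 2}]
```

Streaming line by line keeps memory flat for large files, which matters more than convenience once the payloads are partitioned by day as in the path later in this thread.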

dmaresma, Apr 06 '25 14:04

@dmaresma any update?

jochenchrist, Aug 02 '25 19:08

@jochenchrist with the DuckDB version currently used (1.0.0), DuckDB connectivity to Azure fails. The following code

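```python
import duckdb

# tenant_id, client_id, client_secret and storage_account are assumed to be defined elsewhere
con = duckdb.connect()

# Register Azure credentials for a service principal
con.sql(f"""
        CREATE SECRET azure_spn (
            TYPE AZURE,
            PROVIDER service_principal,
            TENANT_ID '{tenant_id}',
            CLIENT_ID '{client_id}',
            CLIENT_SECRET '{client_secret}',
            ACCOUNT_NAME '{storage_account}'
        );
        """)

# Expose the gzip-compressed JSON Lines files as a view
ddl_query = """CREATE VIEW "product_dim" AS SELECT * FROM read_json('abfss://landing@<azurestorageaccountname>.dfs.core.windows.net/entity=products_uat/year=2025/month=06/day=10/*.jsonl.gz');"""

con.sql(ddl_query)

con.sql("SELECT * FROM product_dim")
```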

returns the following error: `InvalidInputException: Invalid Input Error: Secret provider 'service_principal' not found for type 'azure'`

I worked around the issue by manually forcing a DuckDB upgrade, with no regressions.

If the DuckDB version could be upgraded, then yes, the `if server.format == "json"` branch is obsolete: it only supports S3 and not Azure (there is a PR for that, but it has not been approved). With DuckDB at 1.0.0, there is no support for JSON on an Azure Storage Account, so https://github.com/datacontract/datacontract-cli/pull/667 should be considered.
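One way the control flow could look if the JSON Schema check stopped acting as the gate, sketched under assumptions: `plan_checks` and `JSON_SCHEMA_STORAGE` are hypothetical names, the storage set reflects this thread (S3 only until Azure support lands), and this is not the actual logic in `data_contract_test.py`.

```python
# Hypothetical: storage types the JSON Schema check can read today.
# Per this thread that is S3 only, until Azure support (e.g. PR #667) is merged.
JSON_SCHEMA_STORAGE: frozenset[str] = frozenset({"s3"})


def plan_checks(
    server_format: str,
    server_type: str,
    json_schema_storage: frozenset[str] = JSON_SCHEMA_STORAGE,
) -> list[str]:
    """Hypothetical sketch: return the names of the checks to run, in order."""
    checks: list[str] = []
    # Run the JSON Schema check only where it can actually read the data
    # (this also excludes kafka, since it is not in the storage set).
    if server_format == "json" and server_type in json_schema_storage:
        checks.append("json_schema")  # nested structures: arrays, nested objects
    # The Soda/DuckDB path always runs, so storage support lives in one place.
    checks.append("soda")
    return checks


print(plan_checks("json", "s3"))     # ['json_schema', 'soda']
print(plan_checks("json", "azure"))  # ['soda']
print(plan_checks("json", "kafka"))  # ['soda']
```

The design choice is that storage connectivity is handled once, in the Soda/DuckDB path, and the JSON Schema check becomes an extra validation for nested structures rather than a gate on format and server type. Once Azure support lands, only the storage set would need to change.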

dmaresma, Aug 03 '25 20:08