dataform icon indicating copy to clipboard operation
dataform copied to clipboard

Support for BigQuery User-Defined Functions (UDFs)

Open dirkjonker opened this issue 4 years ago • 7 comments

It would be useful to be able to create User-Defined Functions (UDF) from dataform, and being able to reference them like you reference a table. Right now I create UDFs in the pre_operations section, but I don't have syntax highlighting or ability to test it. Also calling the UDFs from a query is a bit hacky now, as you need to fully qualify the function name (project_name.dataset.function_name) in order to ensure you are calling the right function.

Bonus for being able to unit-test Javascript UDFs!

dirkjonker avatar Nov 25 '21 15:11 dirkjonker

You can kind of do this already:

// my_udf.sqlx
config {
  type: "operations",
  hasOutput: true,
}

create function ${self()}(x INT64) ...

And then you can reference it:

// my_table.sqlx
config { type: "table" }
select ${ref("my_udf")}(column) from ...

This is a bit of a workaround, but it does mean the UDFs are part of the graph, by storing them as permanent functions inside BigQuery.

This doesn't address your points on unit-testing however!

lewish avatar Nov 26 '21 09:11 lewish

@lewish thanks, that's really useful!

Now I'm just missing the syntax highlighting and automatic formatting that respects the Javascript in the UDF string ;)

dirkjonker avatar Nov 30 '21 07:11 dirkjonker

I do get query validation warnings: "Some dependencies don't yet exist in the warehouse". It seems like Dataform doesn't know that the functions exist, even though it can successfully create them. It's just a warning though, so it doesn't block any dependent resources from being created, but it's a false warning which is inconvenient.

dirkjonker avatar Nov 30 '21 11:11 dirkjonker

Just tried this, but dataform did not recognize the function until it was executed and persisted in the database.

sanimesa avatar Dec 20 '22 04:12 sanimesa

I was looking into this as a data engineer at a company using dbt, who helped develop our dbt tooling and maintain it; this would be a somewhat compelling reason to make a switch from dbt, since dbt also lacks this feature. We use UDFs to share functionality across models. If dataform had first class UDF support and a proper dry run feature (dry running incremental models, for example) it'd be an easy sell to make the switch.

davidsr2r avatar Feb 17 '23 00:02 davidsr2r

To expand on this, I would love to see some examples of where the functionality of Javascript UDFs can't be achieved without them.

Having logic computed at runtime in BigQuery removes a lot of the benefits of the determinism of a compiled Dataform graph - so I'm trying to understand the motivations of them better.

Ekrekr avatar Mar 27 '24 10:03 Ekrekr