Support User-defined Functions
Tracking issue for UDFs. Please include examples what you'd like to accomplish to help guide us!
A simple example is a median-- there is no support for it in postgres without window function of that i am aware of. One might want to implement more complex predictive functions later on.
Just a heads up, supporting user-defined aggregates is a much more substantial project than supporting user-defined scalar functions.
We're definitely not going to be supporting the exact CREATE FUNCTION or CREATE AGGREGATE statements that PostgreSQL supports, but we will need to do something similar in spirit. I bring this up to point out that creating an aggregate in PostgreSQL is a lot more complicated, since you need to specify a couple different pieces to describe the initial state of the reduction, how to incorporate each new row into the reduction, and the final state of the reduction.
The streaming nature of Materialize makes generalizing aggregate functions efficiently even harder. A UDF for median is quite likely to be very slow, possibly unacceptably slow.
I discussed the specific case of median with @frankmcsherry just now, and if your requirements would instead allow for an approximate CDF (rather than the exact median), that's something we can look into supporting in Materialize proper.
More, if you are comfortable with an approximate cdf (I use HdrHistograms a lot myself, for example) then you can implement this today in user code. We do something similar with our internal measurements, where we track latencies by the counts for each power-of-two latency bucket. You would just bucket-ize the value of interest, and maintain counts for each bucket.
Happy to see how you are doing this with today's capabilities. That being said, I am thinking ahead and at some point we ll end up with a lot of SQL queries all over the place and I would like to have some code reuse strategy such as module import for example. How do you handle that today ? Especially if you start parameterizing these said queries.
It would be really great to be able to write transform/reduce functions using WASM.
Just an idea and I'm not sure whether it's practical: There could an opportunity for Materialize to provide some kind of composition-based SDK, that would allow for further optimization in some cases.
I've written up the plan of attack for supporting a limited form of SQL UDFs here: #5112.
Request from Hack Day for tracking: https://materializecommunity.slack.com/archives/C031905K62Y/p1645126112621199
Any updates on what the future holds for this feature?
It's still on the roadmap, but likely not anytime soon, sorry!