datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

add `udtf` to `FunctionRegistry

Open universalmind303 opened this issue 2 years ago • 7 comments

Is your feature request related to a problem or challenge?

currently only the session context has udtf function. I think that this should instead be part of the FunctionRegistry like the rest of the udf variants

Describe the solution you'd like

add register_udtf and udtf to FunctionRegistry

Describe alternatives you've considered

No response

Additional context

No response

universalmind303 avatar Feb 12 '24 14:02 universalmind303

I can do this one

Lordworms avatar Feb 12 '24 15:02 Lordworms

I vaguely remember there was some reason table functions were somewhat different, but if the PR works, I think this would be a good idea. It should be straightforward to add

alamb avatar Feb 12 '24 22:02 alamb

As it came up on https://github.com/apache/arrow-datafusion/pull/9226, here is what I think this ticket requires:

  1. Add a function to FunctionRegistry https://github.com/apache/arrow-datafusion/blob/85be1bc5538661b58ca90fb1e042d8d840c9f693/datafusion/execution/src/registry.rs#L26
    /// Register a table UDF with this context
    pub fn register_udtf(&self, name: &str, fun: Arc<dyn TableFunctionImpl>) -> Result<Arc<dyn TableFunctionImpl>> {
       // default return not implemented error 
    }

Implement that trait method for SessionContext

alamb avatar Feb 15 '24 14:02 alamb

I may know why this has not been realized before, since the TableFunctionImpl is in datafusion crate and if you want to add dependencies in execution, it would cause the cycle-depend problem. When we encounter such problem, how could we do? Make TableFunctionImpl an another crate? Hope you could give some hints, thanks! @alamb

Lordworms avatar Feb 18 '24 01:02 Lordworms

I may know why this has not been realized before, since the TableFunctionImpl is in datafusion crate and if you want to add dependencies in execution, it would cause the cycle-depend problem. When we encounter such problem, how could we do? Make TableFunctionImpl an another crate? Hope you could give some hints, thanks! @alamb

Yes, that was it! Thank you @Lordworms

Normally I think we would try and move TableFunctionImpl into another crate, as we did for ScalarUDF -- which is in datafusion_expr

I think TableFunctionImpl isn't hard to move, but TableProvider would be hard as it depends on ExecutionPlan which has a bunch of dependencies on the various physical expr, functions, etc. 🤔

I am not sure how best to proceed without a larger refactor to pull table provider / execution plan into some other crate...

alamb avatar Feb 19 '24 07:02 alamb

I may know why this has not been realized before, since the TableFunctionImpl is in datafusion crate and if you want to add dependencies in execution, it would cause the cycle-depend problem. When we encounter such problem, how could we do? Make TableFunctionImpl an another crate? Hope you could give some hints, thanks! @alamb

Yes, that was it! Thank you @Lordworms

Normally I think we would try and move TableFunctionImpl into another crate, as we did for ScalarUDF -- which is in datafusion_expr

I think TableFunctionImpl isn't hard to move, but TableProvider would be hard as it depends on ExecutionPlan which has a bunch of dependencies on the various physical expr, functions, etc. 🤔

I am not sure how best to proceed without a larger refactor to pull table provider / execution plan into some other crate...

maybe restructure the file in the future? Since I think right now it is hard to do this....

Lordworms avatar Feb 19 '24 14:02 Lordworms

maybe restructure the file in the future? Since I think right now it is hard to do this....

I agree -- this is not a small project. I would recommend maybe not proceeding with this ticket for now unless we have the time and ambition to figure out the new structure. Thank you for looking into it

alamb avatar Feb 20 '24 07:02 alamb