Feature request: Automatically cast merge keys

Open tibor-mach opened this issue 1 year ago • 1 comments

Description

Currently, the `.merge` method of DataChain expects both keys to be of the same type. This makes sense, but it would improve developers' quality of life a lot if, e.g., it automatically recast int to str or vice versa when one key is int and the other is str.

tibor-mach avatar Oct 14 '24 08:10 tibor-mach

@tibor-mach do you know if regular databases do this? How is the result column type defined?

shcheklein avatar Oct 14 '24 16:10 shcheklein

@shcheklein Not sure about databases, but in pandas this actually fails, just as in DataChain. I was a bit surprised by that, since I remembered it working, probably because this does work in Spark.

The resulting key is automatically recast as an integer there.

I am less sure this is so critical when pandas does not support this either. Spark has it, but since it is not a ubiquitous feature we can probably consider it "nice to have" for now.
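
For reference, here is a minimal sketch of the pandas behaviour described above, along with the explicit cast that works around it. The frames and column names (`left`, `right`, `id`, `name`, `score`) are made up for illustration. This only shows pandas; it is not DataChain code.

```python
import pandas as pd

# Hypothetical example data: the same logical key stored as int on one side
# and as str on the other.
left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": ["1", "2", "4"], "score": [0.1, 0.2, 0.4]})

# Merging directly on mismatched key types is expected to raise a ValueError
# in pandas rather than cast one side silently.
try:
    left.merge(right, on="id")
except ValueError as exc:
    print(f"merge failed: {exc}")

# Workaround: cast one key explicitly so both sides share a type, then merge.
merged = left.merge(right.astype({"id": "int64"}), on="id")
print(merged)
```

Casting the str key to int mirrors the Spark result mentioned above. Casting the int key to str instead would be equally valid when some keys are not numeric.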

tibor-mach avatar Oct 15 '24 13:10 tibor-mach

Let's close this for now then. I think it's probably better to have some features upstream that ensure the right types in the first place, tbh.

shcheklein avatar Oct 17 '24 18:10 shcheklein