Stefan Kandic
@cloud-fan added the info on min/max stats and pushdown. I'm not sure what you mean by the corresponding Parquet type for string with collation; AFAIK there is no such thing
@cloud-fan fixed the test failure, should be ready to merge now
> how about bucket columns? We generate the bucket id from the string value and assume all the semantically-same string values should generate the same bucket id, which isn't true...
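To make the bucketing concern quoted above concrete, here is a minimal sketch in plain Python, not Spark code. Everything in it is an assumption for illustration: `zlib.crc32` stands in for whatever hash the engine uses, `str.casefold` stands in for a case-insensitive collation, and `NUM_BUCKETS` and the function names are hypothetical.

```python
import zlib

NUM_BUCKETS = 8  # hypothetical bucket count


def bucket_id_raw(value: str) -> int:
    """Bucket id computed from the raw UTF-8 bytes (collation-unaware)."""
    return zlib.crc32(value.encode("utf-8")) % NUM_BUCKETS


def collation_key(value: str) -> str:
    """Stand-in for a case-insensitive collation key (assumption: casefold)."""
    return value.casefold()


def bucket_id_collation_aware(value: str) -> int:
    """Bucket id computed from the collation key instead of the raw bytes."""
    return zlib.crc32(collation_key(value).encode("utf-8")) % NUM_BUCKETS


if __name__ == "__main__":
    for s in ("Spark", "SPARK", "spark"):
        print(s, bucket_id_raw(s), bucket_id_collation_aware(s))
```

Under a case-insensitive collation the three strings compare equal, yet their raw bytes differ, so nothing guarantees that `bucket_id_raw` places them in the same bucket. Hashing the collation key is one way to restore that guarantee in this toy model.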
@cloud-fan please take a look when you have the time
@cloud-fan all checks passing, can we merge this?
@cloud-fan I looked into HMS code a bit, and it seems that we can't save StructField metadata there, so I guess we will still have to keep converting schema with...
Will we have to do the same for PySpark, since `StringType` there only supports the 4 initial collations?
@cloud-fan Please take a look when you find the time
@cloud-fan > For internal file source API, I think we can simply update FileFormat#supportDataType in certain formats such as CSV to return false for string with collation. So no new...
@cloud-fan I made some changes per our discussion, let me know what you think