Stefan Kandic

14 comments by Stefan Kandic

@cloud-fan I added the info on min/max stats and pushdown. I'm not sure what you mean by the corresponding Parquet type for string with collation; AFAIK there is no such thing.

@cloud-fan fixed the test failure, should be ready to merge now

> how about bucket columns? We generate the bucket id from the string value and assume all the semantically-same string values should generate the same bucket id, which isn't true...
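
To illustrate the bucketing concern in the quote above, here is a minimal sketch in plain Python, not Spark code. It assumes a hypothetical bucket count, uses CRC32 as a stand-in hash (not the hash Spark actually uses), and uses case folding as a stand-in for a case-insensitive collation. Hashing the raw string gives no guarantee that values equal under the collation share a bucket, while hashing a collation key does.

```python
import zlib

NUM_BUCKETS = 8  # hypothetical bucket count, chosen only for illustration


def bucket_id_raw(value: str) -> int:
    """Bucket id derived from the raw UTF-8 bytes (stand-in hash: CRC32)."""
    return zlib.crc32(value.encode("utf-8")) % NUM_BUCKETS


def collation_key(value: str) -> str:
    """Stand-in for a case-insensitive collation key (assumption: case folding)."""
    return value.casefold()


def bucket_id_collated(value: str) -> int:
    """Bucket id derived from the collation key instead of the raw value."""
    return bucket_id_raw(collation_key(value))


if __name__ == "__main__":
    for s in ("Spark", "SPARK", "spark"):
        print(s, bucket_id_raw(s), bucket_id_collated(s))
```

With the key-based variant, "Spark", "SPARK", and "spark" always land in the same bucket; with the raw variant they may or may not, which is the mismatch the quoted comment describes.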

@cloud-fan I looked into HMS code a bit, and it seems that we can't save StructField metadata there, so I guess we will still have to keep converting schema with...

Will we have to do the same for PySpark, since `StringType` there only supports the 4 initial collations?

@cloud-fan > For internal file source API, I think we can simply update FileFormat#supportDataType in certain formats such as CSV to return false for string with collation. So no new...

@cloud-fan I made some changes per our discussion, let me know what you think