paimon icon indicating copy to clipboard operation
paimon copied to clipboard

[Feature][core] Implement more type conversions for IcebergConversions

Open tsreaper opened this issue 1 year ago • 14 comments

Search before asking

  • [X] I searched in the issues and found nothing similar.

Motivation

In #3731 we've introduced IcebergCommitCallback to create Iceberg metadata after commit. We convert Paimon objects to Iceberg byte buffers in IcebergConversions, however it currently only supports a few types and does not support types like timestamp and time.

Solution

Support timestamp and time conversions in IcebergConversions.

Anything else?

No response

Are you willing to submit a PR?

  • [ ] I'm willing to submit a PR!

tsreaper avatar Jul 20 '24 06:07 tsreaper

I'm willing to submit a PR!

dbac avatar Jul 28 '24 10:07 dbac

Do this according to the following specifications? time | Stores microseconds from midnight in an 8-byte little-endian long timestamp | Stores microseconds from 1970-01-01 00:00:00.000000 in an 8-byte little-endian long

then,like this:

case TIME: TIMESTAMP: return ByteBuffer.allocate(8) .order(ByteOrder.LITTLE_ENDIAN) .putLong(0, (long) value);

dbac avatar Jul 28 '24 10:07 dbac

I'm willing to submit a PR!

Thanks for your interest. Please note that there are two types of timestamp in Paimon: TimestampType and LocalZonedTimestampType. TimestampType is timestamptz in Iceberg, while LocalZonedTimestampType is timestamp in Iceberg. You also need to be careful with the precesion of timestamp, because Iceberg has timestamp and timestamp_ns.

All in all you need to write tests carefully, especially for timestamp.

tsreaper avatar Jul 29 '24 03:07 tsreaper

I want to be simple Whether there are similar code references? @tsreaper

dbac avatar Jul 29 '24 13:07 dbac

image I don't know that Object value is string ? or Timestamp ? mabye Timestamp ? then change to buffer?

dbac avatar Jul 29 '24 16:07 dbac

The Paimon object for TimestampType and LocalZonedTimestampType is org.apache.paimon.data.Timestamp. It stores millisecond and nanoOfMillisecond in the object.

For TimestampType, this millisecond is the time starting from 1970-01-01 00:00:00 UTC, so it is timestamptz in Iceberg. For LocalZonedTimestampType, this millisecond is the time starting from 1970-01-01 00:00:00 local timezone, so it is timestamp in Iceberg.

Also you need to check if precision <= 6, if yes it is timestamp or timestamptz; Otherwise if precision > 6 its timestamp_ns or timestamptz_ns.

Don't forget to add complete tests for each timestamp type and different precisions.

tsreaper avatar Jul 30 '24 02:07 tsreaper

image It seems a bit complicated icerberg does not have local-timestamp-micros ,but paimon has local-timestamp-micros, Maybe I need to learn more about paimon image

dbac avatar Jul 30 '24 14:07 dbac

Just give me a little time. I have to fix it

dbac avatar Jul 30 '24 15:07 dbac

"local-timestamp-micros" is not a data type in Paimon. It is a data type in avro. Iceberg's avro reader has to deal with it. However for this issue, you only need to convert between Paimon types and Iceberg types, so you don't need to consider about avro types.

tsreaper avatar Jul 31 '24 02:07 tsreaper

image I have completed it, but nanoseconds cannot be tested, because the avro format in the test case cannot support precisions greater than 6 I don't know if my code idea is correct. Please review the code

dbac avatar Jul 31 '24 08:07 dbac

@tsreaper Can I submit PR?

dbac avatar Aug 01 '24 01:08 dbac

@tsreaper Can I submit PR?

Sure, feel free to submit.

tsreaper avatar Aug 01 '24 02:08 tsreaper

@tsreaper Please help to review my pr, thank you very much

dbac avatar Aug 02 '24 03:08 dbac

@tsreaper next step? With that done, I'm ready to continue contributing

dbac avatar Aug 05 '24 03:08 dbac