parquet-java icon indicating copy to clipboard operation
parquet-java copied to clipboard

PARQUET-2471: Add support for geometry logical type

Open wgtmac opened this issue 1 year ago • 5 comments

This is the PoC of https://github.com/apache/parquet-format/pull/240

Jira

  • [ ] My PR addresses the following Parquet Jira issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR"
    • https://issues.apache.org/jira/browse/PARQUET-XXX
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Tests

  • [ ] My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Style

  • [ ] My contribution adheres to the code style guidelines and Spotless passes.
    • To apply the necessary changes, run mvn spotless:apply -Pvector-plugins

Documentation

  • [ ] In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain Javadoc that explain what it does

wgtmac avatar Jun 19 '24 16:06 wgtmac

Thanks @paleolimbot for the quick review! I've addressed your feedback and also added page index support for geometry stats. BTW, it is a convention that we need PoC on another Parquet implementation to move forward the proposal. I plan to add similar things to parquet-cpp. Do you happen to know any JTS-parity on the C++ side? Or do you have a plan to implement the PoC? I guess there should be a lot of existing code on the GeoParquet side.

wgtmac avatar Jun 22 '24 15:06 wgtmac

@wgtmac The JTS-parity on C++ side is GEOS, which is a C++ port of JTS: https://github.com/libgeos/geos

Note that: GEOS is LGPL 2.1, which is NOT compatible with ASF

jiayuasu avatar Jun 23 '24 05:06 jiayuasu

Core sedona contributors @Kontinuation and @zhangfengcdt , can you also review this?

@wgtmac Thanks for putting together this PR. I guess you might need some help from the Sedona community regarding the Java geometry implementation together with JTS. We can work together to implement this POC.

jiayuasu avatar Jun 23 '24 05:06 jiayuasu

Thanks @jiayuasu for your valuable feedback! Yes, it would be great if we can get direct help from geospatial experts!

wgtmac avatar Jun 23 '24 14:06 wgtmac

Do you happen to know any JTS-parity on the C++ side?

As Jia mentioned, it's GEOS, but this is not something the C++ implementation can/should take on as a dependency.

I guess there should be a lot of existing code on the GeoParquet side.

GDAL has quite a lot to draw from ( https://github.com/OSGeo/gdal/tree/master/ogr/ogrsf_frmts/parquet ), although I think pretty much all we need to do is walk WKB for bounding boxes and geometry type ( https://github.com/OSGeo/gdal/blob/12dab86ca2d8b1a65c4c085e137c62294682ac1d/ogr/ogr_wkb.cpp#L387-L584 ).

Or do you have a plan to implement the PoC?

I am happy to give it a shot (I'll make a start this week).

paleolimbot avatar Jun 24 '24 01:06 paleolimbot

Close this in favor of https://github.com/apache/parquet-java/pull/2971

wgtmac avatar Sep 30 '24 02:09 wgtmac