[FEATURE]Add `iplocation` function to PPL for IP address geolocation
Description:
We propose adding a `geoip` function to OpenSearch's Piped Processing Language (PPL) and SQL to provide built-in IP address geolocation capabilities.
This feature would be similar to functionality used in OpenSearch's geospatial feature, enhancing PPL's ability to enrich log data with geographical information based on IP addresses.
Proposed Functionality:
- The 'geoip' function should take an IP address as input and return geographical information.
- It should support both IPv4 and IPv6 addresses.
- The function should return multiple fields including country, region, city, latitude, longitude, and others as available.
- It should allow users to specify which geolocation fields to include in the output.
- The function should use a regularly updated IP geolocation database for accuracy.
Example Usage:
... | eval geolocation = geoip(ip_field)
This would add a new field 'geolocation' with all available location information for the IP address in 'ip_field'.
... | eval country = geoip(ip_field, "country")
... | eval lat = geoip(ip_field, "lat"), lon = geoip(ip_field, "lon")
This would add new fields with specific geolocation information.
... | eval location_info = geoip(ip_field, "country,region,city,lat,lon")
This would add a new field 'location_info' with multiple pieces of geolocation data.
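The three call forms above can be modelled outside PPL to make the intended semantics concrete. The following Python sketch is illustrative only: `GEO_TABLE` is a hypothetical in-memory stand-in for a real geolocation database, its values are made up, and the choice to return a scalar for a single field and a map for several fields is an assumption, not something this proposal specifies.

```python
import ipaddress

# Hypothetical in-memory table standing in for a real geolocation database.
# The networks are documentation ranges; every value below is made up.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"),
     {"country": "Exampleland", "region": "North", "city": "Sampleville",
      "lat": 10.5, "lon": 20.25}),
    (ipaddress.ip_network("2001:db8::/32"),
     {"country": "Testonia", "region": "West", "city": "Mocktown",
      "lat": -5.0, "lon": 42.0}),
]


def geoip(ip, fields=None):
    """Model of the proposed function: look up `ip`, optionally select fields.

    `fields` is a comma-separated string such as "country,lat,lon".
    Returns None when the address matches no known network.
    """
    addr = ipaddress.ip_address(ip)  # accepts both IPv4 and IPv6 text
    record = None
    for network, data in GEO_TABLE:
        # Compare only within the same IP version, then test containment.
        if addr.version == network.version and addr in network:
            record = data
            break
    if record is None:
        return None
    if fields is None:
        return dict(record)  # all available fields
    names = [name.strip() for name in fields.split(",")]
    if len(names) == 1:
        return record.get(names[0])  # assumed: single field yields a scalar
    return {name: record.get(name) for name in names}
```

With this model, `geoip("192.0.2.7", "country")` yields a single value, while `geoip("2001:db8::1", "lat,lon")` yields a map containing only the requested keys.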
Additional considerations
- Allow registering a database table that resolves IP addresses to geolocation data
- Add a generic way to register an IP-to-geolocation resolving mechanism or service
- Support auth tokens for calling such a service
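One possible shape for these considerations is a small registry of named resolvers. The Python sketch below is only an assumption about how that could look: `register_resolver`, `resolve`, `make_table_resolver`, and `make_service_resolver` are hypothetical names, and `fetch` stands in for whatever client call a real service would need.

```python
from typing import Callable, Dict, Optional

# A resolver maps an IP address string to a geolocation record, or None.
Resolver = Callable[[str], Optional[dict]]

_RESOLVERS: Dict[str, Resolver] = {}


def register_resolver(name: str, resolver: Resolver) -> None:
    """Register a resolver under a name (hypothetical API)."""
    _RESOLVERS[name] = resolver


def resolve(name: str, ip: str) -> Optional[dict]:
    """Resolve `ip` using the resolver registered as `name`."""
    if name not in _RESOLVERS:
        raise KeyError(f"no resolver registered as {name!r}")
    return _RESOLVERS[name](ip)


def make_table_resolver(table: Dict[str, dict]) -> Resolver:
    """Resolver backed by a lookup table, here a dict keyed by exact IP string."""
    return lambda ip: table.get(ip)


def make_service_resolver(fetch: Callable[[str, str], Optional[dict]],
                          auth_token: str) -> Resolver:
    """Resolver backed by an external service.

    `fetch(ip, auth_token)` is a placeholder for the real client call;
    the token is captured once and passed on every lookup.
    """
    return lambda ip: fetch(ip, auth_token)
```

Keeping the token inside the resolver means the query itself never has to carry credentials; whether that fits the plugin's security model is an open question.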
Support for PPL iplocation function is required for both:
OpenSearch based PPL engine
- https://github.com/opensearch-project/sql/issues/3038
Spark based PPL engine
- https://github.com/opensearch-project/opensearch-spark/issues/672
Related resources
- https://github.com/opensearch-project/geospatial?tab=readme-ov-file
For community awareness: I'm working on this.
As a status update, the following items are planned, in order, to deliver this IP enrichment functionality via the Geospatial plugin:
- [x] 1. Create a client module on GeoSpatial side: https://github.com/opensearch-project/geospatial/pull/700
- [x] 2. Configure CI to publish the module: https://github.com/opensearch-project/geospatial/pull/706
- [ ] 3. Create the `geoip` command on this project: https://github.com/opensearch-project/sql/pull/3228
The first two items are complete, and I'm now working on the actual `geoip` implementation in the SQL repo.
@YANG-DB The actual implementation has yet to be merged, so we will need to re-open this issue. https://github.com/opensearch-project/sql/pull/3228