[FEATURE] Add `geoip` function to PPL for IP address geolocation
Description:
We propose adding a `geoip` function to OpenSearch's Piped Processing Language (PPL) and SQL to provide built-in IP address geolocation capabilities.
This feature would be similar to functionality used in OpenSearch's geospatial feature, enhancing PPL's ability to enrich log data with geographical information based on IP addresses.
Proposed Functionality:
- The 'geoip' function should take an IP address as input and return geographical information.
- It should support both IPv4 and IPv6 addresses.
- The function should return multiple fields including country, region, city, latitude, longitude, and others as available.
- It should allow users to specify which geolocation fields to include in the output.
- The function should use a regularly updated IP geolocation database for accuracy.
Example Usage:
... | eval geolocation = geoip(ip_field)
This would add a new field 'geolocation' with all available location information for the IP address in 'ip_field'.
... | eval country = geoip(ip_field, "country")
... | eval lat = geoip(ip_field, "lat"), lon = geoip(ip_field, "lon")
This would add new fields with specific geolocation information.
... | eval location_info = geoip(ip_field, "country,region,city,lat,lon")
This would add a new field 'location_info' with multiple pieces of geolocation data.
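The field-selection behaviour in the examples above could be sketched roughly as follows. This is only an illustration under assumptions: the class `GeoIpFieldSelector`, its `select` method, and the sample values are hypothetical and are not part of PPL or the geospatial plugin. It also assumes the result is always a map; whether a single requested field should come back as a scalar is left open by the proposal.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper illustrating geoip(ip_field, "a,b,c") field selection. */
final class GeoIpFieldSelector {

    /**
     * Returns only the requested fields from a full lookup result.
     * A null or blank selector returns every available field.
     * Requested fields that the lookup did not produce are skipped.
     */
    static Map<String, Object> select(Map<String, Object> fullResult, String selector) {
        if (selector == null || selector.trim().isEmpty()) {
            return new LinkedHashMap<>(fullResult);
        }
        Map<String, Object> selected = new LinkedHashMap<>();
        Arrays.stream(selector.split(","))
              .map(String::trim)
              .filter(name -> !name.isEmpty())
              .forEach(name -> {
                  if (fullResult.containsKey(name)) {
                      selected.put(name, fullResult.get(name));
                  }
              });
        return selected;
    }

    public static void main(String[] args) {
        // Made-up placeholder values, not real geolocation data.
        Map<String, Object> full = new LinkedHashMap<>();
        full.put("country", "ExampleCountry");
        full.put("region", "ExampleRegion");
        full.put("city", "ExampleCity");
        full.put("lat", 0.0);
        full.put("lon", 0.0);

        System.out.println(select(full, "country,lat"));
        System.out.println(select(full, null));
    }
}
```

Skipping unknown field names is one possible choice; rejecting them with an error at query analysis time would be an equally reasonable design.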
Additional considerations
- Allow using the OpenSearch `geospatial` plugin for IP-to-geo resolution
Related resources
- https://github.com/opensearch-project/geospatial?tab=readme-ov-file
Am in the process of implementing this
Hi @YANG-DB ,
What was the intended method of leveraging the geospatial plugin?
Following the example of the inclusion of the job-scheduler and ml-commons plugin, I have been trying to import it directly into the project but noticed that the published geospatial plugin on maven has no jar. As such it does not seem possible to directly import the plugin. Is this assumption correct?
If so, my current plan is to call the endpoint that the geospatial plugin exposes in OpenSearch (documented here) and communicate with it using the OpenSearchRestClient. Would this be a good path forward, or am I missing something that would make it possible to expose the geospatial plugin?
Thanks!
Hi @YANG-DB, after a few discovery and feasibility checks, we have updated our approach. Below is the high-level plan along with the proposed code changes. Can you have a look and advise?
High-level idea:
- Create a new `ActionType` with logic on the `Geo-Spatial` plugin to expose a new action which takes an IP string and returns the appropriate geo detail.
- Update the existing `SQL` plugin accordingly to invoke the call on `nodeClient` with the newly created `Geo-Spatial` action for the `geoip` function.
Proposed code changes:
GeoSpatial:
- Create a new `TransportAction` and register it accordingly:
  - Create a new `TransportAction` class, which is similar to `GetDatasourceTransportAction`. The sole purpose of this action is to process an incoming IP string with the given provider, then return the appropriate geospatial detail fields.
  - Update `GeospatialPlugin.getAction()` to register the newly created action.
- Create a new sub-module named `geo-spatial-client`, which has the `nodeClientWrapper` as the wrapper for the cross-plugin interaction interface, and a few `ActionType`s along with appropriate wrapper objects to form the API signature for the return type.
- Update the Gradle script to publish the `geo-spatial-client` module as a separate jar.
SQL module:
- Update the Gradle settings to import `geo-spatial-client` into the OpenSearch sub-module.
- Override the existing `EvalOperator` processing logic:
  - Create a new `OpenSearchEvalOperator` class which extends the existing `EvalOperator` with an additional class property `NodeClient`.
  - Update the `OpenSearchIndex` class to override the `visitEval()` method and return a new `OpenSearchEvalOperator` instance instead.
  - Update the `OpenSearchEvalOperator` to perform the following logic when processing the `geoip` function:
    - Read the incoming IP string
    - Invoke a call on `nodeClient` with appropriate arguments and timeout value
    - Marshal the response and update the `evalMap` accordingly
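The three steps for the `geoip` branch of the eval operator could look roughly like the sketch below. It is deliberately independent of the real OpenSearch classes: `GeoIpClient` is a hypothetical stand-in for the `nodeClient` wrapper, and `GeoIpEvalSketch` is an illustrative name, not the proposed `OpenSearchEvalOperator` itself. The actual action name, request and response types are still to be defined on the geospatial side.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the cross-plugin client; not the real geospatial API. */
interface GeoIpClient {
    Map<String, Object> lookup(String ip, Duration timeout);
}

/** Simplified sketch of how an eval operator might handle geoip for one row. */
final class GeoIpEvalSketch {
    private final GeoIpClient client;
    private final Duration timeout;

    GeoIpEvalSketch(GeoIpClient client, Duration timeout) {
        this.client = client;
        this.timeout = timeout;
    }

    /** Evaluates `target = geoip(ipField)` and returns the eval map to merge into the row. */
    Map<String, Object> eval(Map<String, Object> row, String ipField, String target) {
        Map<String, Object> evalMap = new LinkedHashMap<>();

        // 1. Read the incoming IP string from the row.
        Object raw = row.get(ipField);
        if (raw == null) {
            // Assumption: a missing IP yields null without calling the client.
            evalMap.put(target, null);
            return evalMap;
        }

        // 2. Invoke the lookup with the configured timeout.
        Map<String, Object> geo = client.lookup(raw.toString(), timeout);

        // 3. Marshal the response into the eval map under the target field.
        evalMap.put(target, geo);
        return evalMap;
    }
}
```

Injecting the client through the constructor mirrors the idea of passing `NodeClient` into the operator as a class property, and it lets the lookup be stubbed in unit tests.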
We don't even have a way to do basic IP address lookups. Why are you guys working on the next level before even having a basic way to query the ip field type?
Hi @kedbirhan, thanks for the feedback, and indeed that makes sense.
For now we are only proposing the high-level changes required for the functionality and have not yet reached the implementation phase.
I believe that by the time the design for this ticket is settled, https://github.com/opensearch-project/sql/issues/3145 should already be wrapped up, providing IP type support.
Thanks,
@andy-k-improving I really like this idea - can you please create an RFC for the Geospatial plugin suggesting this change? We would need their feedback
@YANG-DB see below for the RFC on Geo spatial side. https://github.com/opensearch-project/geospatial/issues/698 I will proceed to work on the implementation on GeoSpatial side.
The actual PR on SQL repo: https://github.com/opensearch-project/sql/pull/3228
resolved by https://github.com/opensearch-project/sql/pull/3604