
Support aggregations on scripted fields

Open stevedodson opened this issue 5 years ago • 3 comments

The following code exposed issues with scripted fields:

Assigning a new field based on the concatenation of two fields failed:

>       ed_flights['new_field'] = ed_flights['DestCountry'] + ed_flights['OriginCountry']
E       TypeError: 'DataFrame' object does not support item assignment

Aggregations on scripted fields:

        s = ed_flights['DestCountry'] + ed_flights['OriginCountry']
        print(s.nunique())

This prints 0 rather than the result of:

{
  "aggs": {
    "dc_DestCountry_OriginCountry": {
      "cardinality": {
        "script": {
          "source": "doc['DestCountry'].value+doc['OriginCountry'].value"
        }
      }
    }
  }
}
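As a minimal sketch in plain Python (no Elasticsearch client involved), the request body that s.nunique() would need can be built directly from the two field names, with the script placed inside the cardinality aggregation. The helper names concat_script and cardinality_of_concat are hypothetical and not part of eland's API.

```python
import json


def concat_script(left: str, right: str) -> str:
    """Painless source that concatenates two keyword fields, as in the issue."""
    return f"doc['{left}'].value+doc['{right}'].value"


def cardinality_of_concat(left: str, right: str) -> dict:
    """Hypothetical helper: cardinality agg with the script inlined.

    The script lives inside the aggregation itself, which is the form that
    returns a real distinct count rather than 0.
    """
    name = f"dc_{left}_{right}"
    return {
        "aggs": {
            name: {
                "cardinality": {
                    "script": {"source": concat_script(left, right)}
                }
            }
        }
    }


body = cardinality_of_concat("DestCountry", "OriginCountry")
print(json.dumps(body, indent=2))
```

The printed body has the same shape as the aggregation above, so eland would only need to generate this structure instead of referencing a scripted field by name.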

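This is probably because aggregations cannot use scripted fields unless the script is inside the aggregation itself. For example, the following request, which defines the script under script_fields and then aggregates on that name: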

GET flights/_search
{
  "script_fields": {
    "dc_DestCountry_OriginCountry": {
      "script": {
        "source": "doc['DestCountry'].value+doc['OriginCountry'].value"
      }
    }
  }, 
  "aggs": {
    "dc_DestCountry_OriginCountry": {
      "cardinality": {
        "field": "dc_DestCountry_OriginCountry"
      }
    }
  }
}

returns a cardinality of 0.
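The fix discussed later in the thread, moving scripts inside the aggregations, can be sketched as a request rewrite: any top-level aggregation whose "field" names an entry in script_fields gets that script inlined instead. This is a hypothetical helper (inline_script_fields is not an eland or Elasticsearch function), and it assumes only top-level metric aggregations need rewriting; sub-aggregations are left alone.

```python
import copy


def inline_script_fields(body: dict) -> dict:
    """Hypothetical rewrite: copy script_fields definitions into the aggs that use them.

    Aggregations cannot see script_fields, so an aggregation that refers to a
    script field by name is given the script itself. script_fields is left in
    place so the hits still carry the computed value. Only top-level
    aggregations are handled; nested "aggs" are not rewritten.
    """
    new_body = copy.deepcopy(body)  # never mutate the caller's request
    scripts = new_body.get("script_fields", {})
    for agg in new_body.get("aggs", {}).values():
        for params in agg.values():
            if not isinstance(params, dict):
                continue
            field = params.get("field")
            if field in scripts:
                del params["field"]
                params["script"] = copy.deepcopy(scripts[field]["script"])
    return new_body


request = {
    "script_fields": {
        "dc_DestCountry_OriginCountry": {
            "script": {"source": "doc['DestCountry'].value+doc['OriginCountry'].value"}
        }
    },
    "aggs": {
        "dc_DestCountry_OriginCountry": {
            "cardinality": {"field": "dc_DestCountry_OriginCountry"}
        }
    },
}
fixed = inline_script_fields(request)
```

After the rewrite, the cardinality aggregation in fixed carries the script directly, matching the form of the working request earlier in the issue.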

stevedodson, Aug 25 '20

@stevedodson / @sethmlarson I have some questions about this:

  1. For s = ed_flights['DestCountry'] + ed_flights['OriginCountry'], putting the script inside the aggregation works for cardinality, but other aggregations throw errors such as:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "aggregation_execution_exception",
        "reason" : "Unsupported script value [AUDE], expected a number, date, or boolean"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "flights",
        "node" : "iTVEH5vHR2mOijrij0Qc5g",
        "reason" : {
          "type" : "aggregation_execution_exception",
          "reason" : "Unsupported script value [AUDE], expected a number, date, or boolean"
        }
      }
    ]
  },
  "status" : 500
}
  • If certain aggregations are used on text fields, eland should raise an error saying the aggregation cannot be applied to (or is not supported on) text fields. Is this already handled in eland, or should it be implemented?
  2. If we aggregate on the sum of two numeric fields, x = ed_df['DistanceKilometers'] + ed_df['DistanceMiles']:
curl -X GET "localhost:9200/flights/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "cardinality_scripted": {
      "cardinality": {
        "script": {
          "source": "doc[\"DistanceKilometers\"].value+doc[\"DistanceMiles\"].value"
        }
      }
    },
    "sum_scripted": {
      "sum": {
        "script": {
          "source": "doc[\"DistanceKilometers\"].value+doc[\"DistanceMiles\"].value"
        }
      }
    }
  }
}
'

This works for all aggregations. Do we have to include the script in every aggregation that is performed? Is that efficient, or should we use a Scripted Metric Aggregation instead?
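Both questions above can be approached with a sketch under stated assumptions: a hypothetical helper (scripted_aggs, not part of eland) that repeats the Painless source inside every aggregation, as the curl request does, and rejects non-cardinality aggregations client-side when the script produces strings, instead of letting Elasticsearch fail with the 500 shown earlier. Only cardinality is treated as string-safe because it is the only type this thread confirms; whether the script's output is numeric is passed in by the caller rather than inferred from the index mapping.

```python
from typing import List

# Aggregations confirmed in this thread to accept a string-valued script.
# Other types may also work but would need test cases first.
STRING_SAFE_AGGS = {"cardinality"}


def scripted_aggs(source: str, agg_types: List[str], numeric: bool) -> dict:
    """Hypothetical helper: one request body, one copy of the script per aggregation.

    Elasticsearch needs the script inside each aggregation, so the same
    Painless source is repeated under every aggregation name. When the script
    yields strings (numeric=False), aggregations that expect a number, date,
    or boolean are rejected before any request is sent.
    """
    if not numeric:
        unsupported = [t for t in agg_types if t not in STRING_SAFE_AGGS]
        if unsupported:
            raise TypeError(
                f"aggregations {unsupported} are not supported on a string-valued script"
            )
    return {
        "size": 0,
        "aggs": {
            f"{agg}_scripted": {agg: {"script": {"source": source}}}
            for agg in agg_types
        },
    }


distance = "doc['DistanceKilometers'].value+doc['DistanceMiles'].value"
body = scripted_aggs(distance, ["cardinality", "sum"], numeric=True)

try:
    scripted_aggs(
        "doc['DestCountry'].value+doc['OriginCountry'].value", ["sum"], numeric=False
    )
except TypeError as exc:
    print(exc)
```

Repeating the source per aggregation keeps each one an ordinary metric aggregation. A scripted_metric aggregation would instead require hand-written init, map, combine, and reduce scripts for every statistic, so it is a heavier alternative rather than a drop-in replacement.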

Please share your thoughts 😄

V1NAY8, Oct 27 '20

@V1NAY8 For some scripts we probably can't aggregate (as you've found with text). We should add test cases for the scripted fields we currently support to make sure we keep supporting them. This is one area where we need more test coverage :)

Maybe you can start by adding some test cases and figuring out exactly what currently works and what doesn't, and then we can go from there?

sethmlarson, Oct 27 '20

Yes, I can do that. I'll add test cases and try to implement moving the scripts inside the aggregations. 😄

V1NAY8, Oct 27 '20