Support aggregations on scripted fields
The following code exposed issues with scripted fields:
Assigning a new field built by concatenating two fields fails:
> ed_flights['new_field'] = ed_flights['DestCountry'] + ed_flights['OriginCountry']
E TypeError: 'DataFrame' object does not support item assignment
Aggs on scripted fields:
s = ed_flights['DestCountry'] + ed_flights['OriginCountry']
print(s.nunique())
This returns 0 rather than the result of the following aggregation:
{
"aggs": {
"dc_DestCountry_OriginCountry": {
"cardinality": {
"script": {
"source": "doc['DestCountry'].value+doc['OriginCountry'].value"
}
}
}
}
}
This is probably because aggs do not work on scripted fields unless the script is inside the agg itself, i.e.
GET flights/_search
{
"script_fields": {
"dc_DestCountry_OriginCountry": {
"script": {
"source": "doc['DestCountry'].value+doc['OriginCountry'].value"
}
}
},
"aggs": {
"dc_DestCountry_OriginCountry": {
"cardinality": {
"field": "dc_DestCountry_OriginCountry"
}
}
}
}
returns 0 results.
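A minimal sketch of the workaround implied above, where the script is placed inside the aggregation instead of being referenced by its script field name. The helper names `concat_script` and `scripted_agg` are hypothetical and are not part of eland; the sketch only builds the request body and does not contact Elasticsearch.

```python
def concat_script(*fields):
    """Return a Painless source that joins the doc values of the given fields with '+'."""
    return "+".join(f"doc['{field}'].value" for field in fields)


def scripted_agg(name, agg_type, source):
    """Build a search body whose aggregation carries the script inline.

    Hypothetical helper for illustration, not eland's implementation.
    """
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "aggs": {name: {agg_type: {"script": {"source": source}}}},
    }


body = scripted_agg(
    "dc_DestCountry_OriginCountry",
    "cardinality",
    concat_script("DestCountry", "OriginCountry"),
)
```

The resulting `body` has the same `aggs` section as the cardinality request shown earlier, so it could be sent to `flights/_search` with any client.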
@stevedodson / @sethmlarson I have some queries on this
- With s = ed_flights['DestCountry'] + ed_flights['OriginCountry'], putting the script inside aggs works for cardinality, but other aggregations throw errors such as:
{
"error" : {
"root_cause" : [
{
"type" : "aggregation_execution_exception",
"reason" : "Unsupported script value [AUDE], expected a number, date, or boolean"
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "flights",
"node" : "iTVEH5vHR2mOijrij0Qc5g",
"reason" : {
"type" : "aggregation_execution_exception",
"reason" : "Unsupported script value [AUDE], expected a number, date, or boolean"
}
}
]
},
"status" : 500
}
- If we use certain aggregations on text fields, eland should raise an error saying that the aggregation can't be applied to a text field or is not supported. Is this already handled in eland, or should it be implemented?
- If we perform aggregations on a field built from two numeric fields:
x = ed_df['DistanceKilometers'] + ed_df['DistanceMiles']
curl -X GET "localhost:9200/flights/_search?pretty" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"cardinality_scripted": {
"cardinality": {
"script": {
"source": "doc[\"DistanceKilometers\"].value+doc[\"DistanceMiles\"].value"
}
}
},
"min_scripted": {
"sum": {
"script": {
"source": "doc[\"DistanceKilometers\"].value+doc[\"DistanceMiles\"].value"
}
}
}
}
}
'
This works for all aggs.
- Do we have to include the script in every aggregation that is performed? Is this efficient? Or should we use the Scripted Metric Aggregation?
Please give me some inputs 😄
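To make the questions above concrete, here is a rough sketch of repeating one script across several aggregations while rejecting aggregations that would fail on non-numeric script values. Everything here is an assumption for illustration: `NUMERIC_ONLY_AGGS`, `build_scripted_aggs` and the `script_is_numeric` flag are hypothetical names, not eland APIs, and the set of numeric-only aggregations is inferred from the "expected a number, date, or boolean" error rather than taken from an exhaustive list.

```python
# Assumption: these metric aggregations reject string script values,
# as suggested by the aggregation_execution_exception quoted above.
NUMERIC_ONLY_AGGS = {"sum", "avg", "min", "max"}


def build_scripted_aggs(source, agg_types, script_is_numeric):
    """Build one search body that repeats the same script in every aggregation.

    Raises TypeError up front instead of letting the server return a 500.
    """
    unsupported = sorted(
        agg for agg in agg_types
        if agg in NUMERIC_ONLY_AGGS and not script_is_numeric
    )
    if unsupported:
        raise TypeError(
            f"aggregations {unsupported} are not supported on non-numeric scripted fields"
        )
    return {
        "size": 0,
        "aggs": {
            f"{agg}_scripted": {agg: {"script": {"source": source}}}
            for agg in agg_types
        },
    }


numeric_source = "doc['DistanceKilometers'].value+doc['DistanceMiles'].value"
numeric_body = build_scripted_aggs(numeric_source, ["cardinality", "sum"], script_is_numeric=True)
```

Whether repeating the script this way is efficient enough, compared with a scripted metric aggregation, is still the open question; the sketch only shows what the request would look like.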
@V1NAY8 For some scripts we probably can't aggregate (like you've found with text). We should add test cases for the scripted fields we currently support to make sure we maintain support. This is one area where we need more test coverage :)
Maybe you can start by adding some test cases and figuring out exactly what currently works and what doesn't, then we can go from there?
Yes, I can do that. I'll add test cases and try to implement moving scripts inside aggs. 😄