Support aggregations on scripted fields #267

Open
stevedodson opened this issue Aug 25, 2020 · 3 comments · May be fixed by #383
Labels
enhancement New feature or request help wanted Solution is fleshed out and ready to be worked on topic:dataframe Issue or PR about eland.DataFrame
Milestone

Comments

@stevedodson
Contributor

The following code exposed two issues with scripted fields:

Assigning a new field based on the concatenation of two fields fails:

>       ed_flights['new_field'] = ed_flights['DestCountry'] + ed_flights['OriginCountry']
E       TypeError: 'DataFrame' object does not support item assignment

Aggs on scripted fields:

        s = ed_flights['DestCountry'] + ed_flights['OriginCountry']
        print(s.nunique())

This returns 0 rather than the results of:

{
  "aggs": {
    "dc_DestCountry_OriginCountry": {
      "cardinality": {
        "script": {
          "source": "doc['DestCountry'].value+doc['OriginCountry'].value"
        }
      }
    }
  }
}

This is probably because aggregations do not work on scripted fields unless the script is inside the aggregation itself, i.e.

GET flights/_search
{
  "script_fields": {
    "dc_DestCountry_OriginCountry": {
      "script": {
        "source": "doc['DestCountry'].value+doc['OriginCountry'].value"
      }
    }
  }, 
  "aggs": {
    "dc_DestCountry_OriginCountry": {
      "cardinality": {
        "field": "dc_DestCountry_OriginCountry"
      }
    }
  }
}

returns 0 results.
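
A minimal sketch of the workaround implied above, not eland's actual implementation: a plain-Python helper that takes a search body like the second request and copies each script from `script_fields` into any aggregation that references it by name. The helper name `inline_script_fields_into_aggs` is hypothetical, and it only handles top-level aggregations.

```python
import copy
import json


def inline_script_fields_into_aggs(body):
    """Return a copy of `body` in which top-level aggs that point at a
    script field via "field" carry that field's script instead.

    Hypothetical helper for illustration; nested sub-aggregations are
    not handled.
    """
    scripts = {
        name: spec["script"]
        for name, spec in body.get("script_fields", {}).items()
    }
    rewritten = copy.deepcopy(body)
    for agg in rewritten.get("aggs", {}).values():
        for params in agg.values():
            if not isinstance(params, dict):
                continue
            field = params.get("field")
            # Only rewrite when "field" names a known script field.
            if isinstance(field, str) and field in scripts:
                del params["field"]
                params["script"] = copy.deepcopy(scripts[field])
    return rewritten


# The failing request from this issue, expressed as a Python dict.
source = "doc['DestCountry'].value+doc['OriginCountry'].value"
request = {
    "script_fields": {
        "dc_DestCountry_OriginCountry": {"script": {"source": source}}
    },
    "aggs": {
        "dc_DestCountry_OriginCountry": {
            "cardinality": {"field": "dc_DestCountry_OriginCountry"}
        }
    },
}

print(json.dumps(inline_script_fields_into_aggs(request)["aggs"], indent=2))
```

The printed `aggs` section has the same shape as the first request above, with the script inside the `cardinality` aggregation.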

@sethmlarson sethmlarson added help wanted Solution is fleshed out and ready to be worked on topic:dataframe Issue or PR about eland.DataFrame enhancement New feature or request labels Aug 25, 2020
@V1NAY8
Contributor

V1NAY8 commented Oct 27, 2020

@stevedodson / @sethmlarson
I have some questions about this.

  1. With s = ed_flights['DestCountry'] + ed_flights['OriginCountry'], putting the script inside the aggregation works for cardinality. Other aggregations throw an error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "aggregation_execution_exception",
        "reason" : "Unsupported script value [AUDE], expected a number, date, or boolean"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "flights",
        "node" : "iTVEH5vHR2mOijrij0Qc5g",
        "reason" : {
          "type" : "aggregation_execution_exception",
          "reason" : "Unsupported script value [AUDE], expected a number, date, or boolean"
        }
      }
    ]
  },
  "status" : 500
}
  • If certain aggregations are used on text fields, eland should raise an error saying that the aggregation can't be applied to a text field or is not supported. Is this already handled in eland, or does it need to be implemented?
  2. If we perform aggregations on two numeric fields added together:
    x = ed_df['DistanceKilometers'] + ed_df['DistanceMiles']
curl -X GET "localhost:9200/flights/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "cardinality_scripted": {
      "cardinality": {
        "script": {
          "source": "doc[\"DistanceKilometers\"].value+doc[\"DistanceMiles\"].value"
        }
      }
    },
    "min_scripted": {
      "sum": {
        "script": {
          "source": "doc[\"DistanceKilometers\"].value+doc[\"DistanceMiles\"].value"
        }
      }
    }
  }
}
'

This works for all aggregations.
- Do we have to include the script in every aggregation that is performed? Is this efficient?
Or should we use the Scripted Metric Aggregation instead?

Please give me some input 😄
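
On the question of repeating the script: one way to keep it manageable on the client side is to generate the request body from a single script source. The sketch below is only an illustration under stated assumptions. `build_scripted_aggs`, `NUMERIC_ONLY_AGGS`, and the `script_returns_number` flag are hypothetical names, not part of eland, and the set of numeric-only aggregations is an assumption based on the error shown above. It says nothing about whether repeating the script is efficient on the Elasticsearch side.

```python
import json

# Assumption: these metric aggs reject string-valued scripts, as the
# "expected a number, date, or boolean" error above suggests.
NUMERIC_ONLY_AGGS = {"sum", "min", "max", "avg"}


def build_scripted_aggs(source, agg_types, script_returns_number=True):
    """Build a search body that repeats one script in several aggs.

    Raises TypeError up front when a numeric-only aggregation is
    requested for a script that the caller says returns a string.
    """
    if not script_returns_number:
        unsupported = sorted(set(agg_types) & NUMERIC_ONLY_AGGS)
        if unsupported:
            raise TypeError(
                f"aggregations {unsupported} are not supported on "
                "scripts that return text"
            )
    return {
        "size": 0,
        "aggs": {
            # Each agg gets its own copy of the script definition.
            f"{agg_type}_scripted": {agg_type: {"script": {"source": source}}}
            for agg_type in agg_types
        },
    }


numeric_source = 'doc["DistanceKilometers"].value+doc["DistanceMiles"].value'
print(json.dumps(build_scripted_aggs(numeric_source, ["cardinality", "sum"]), indent=2))

text_source = "doc['DestCountry'].value+doc['OriginCountry'].value"
try:
    build_scripted_aggs(text_source, ["cardinality", "sum"], script_returns_number=False)
except TypeError as err:
    print(f"rejected before sending: {err}")
```

Failing early like this would give a clearer message than the `search_phase_execution_exception` returned by the server, at the cost of eland having to know the script's return type.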

@sethmlarson
Contributor

@V1NAY8 For some scripts we probably can't aggregate (like you've found with text). We should add test cases for the scripted fields we currently support to make sure we maintain support. This is one area where we need more test coverage :)

Maybe you can start by adding some test cases and figuring out exactly what currently works and what doesn't, then we can go from there?

@V1NAY8
Contributor

V1NAY8 commented Oct 27, 2020

Yes, I can do that. I'll add test cases and try to implement moving scripts inside aggs. 😄

@sethmlarson sethmlarson added this to the Next milestone Jan 21, 2021

3 participants