ES|QL: Support aggregation commands on histogram fields #103060

dgieselaar · 2023-12-06T14:49:50Z

Description

Currently, running an ES|QL aggregation command on a histogram field results in an error. Aggregations should be supported on histogram fields, similar to how _search supports aggregations on histogram fields.

Use cases

In APM we use histograms to store latency distribution data (in transaction.duration.histogram). The aggregations we currently run on this field are: avg, pxx, sum, value_count.

elasticsearchmachine · 2023-12-06T14:50:15Z

Pinging @elastic/es-ql (Team:QL)

elasticsearchmachine · 2023-12-06T14:50:15Z

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

nik9000 · 2023-12-06T15:13:51Z

Do you have any examples of things? I can guess, but it'd be nice to have an example of the kinds of STATS you expect.

One problem with the STATS here is that ESQL allows a lot more slicing than _search does, so it'd be easier to put the query into a state where it wouldn't have the data. I'm kind of imagining something like FROM foo | WHERE hostname = 'blah' | STATS PERCENTILES(bytes_out) where hostname is a field that got removed in a downsampling operation. I suppose that thing's just not supported. I guess we'd get it for free by the field just not being there. Though maybe the error message should be different? I dunno.

dgieselaar · 2023-12-06T15:30:35Z

@nik9000 AVG, SUM, MIN, MAX, Pxx. I'm not sure if I follow your example?

elasticsearchmachine · 2024-01-02T19:50:14Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

luigidellaquila · 2024-01-03T15:00:38Z

I think first of all we'll have to support the histogram field type (read and output at least).
Since a histogram field is practically an object containing two arrays, I can imagine it being returned as JSON.
Supporting a new field type has a cost by itself and is not trivial.

After that, we can start defining the behavior of the individual agg functions, starting from min, max, count and avg.
I guess it won't be much different from how _search implements them, e.g. for the sample data reported here

PUT my-index-000001
{
  "mappings" : {
    "properties" : {
      "my_histogram" : {
        "type" : "histogram"
      },
      "my_text" : {
        "type" : "keyword"
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "my_text" : "histogram_1",
  "my_histogram" : {
      "values" : [0.1, 0.2, 0.3, 0.4, 0.5], 
      "counts" : [3, 7, 23, 12, 6] 
   }
}

PUT my-index-000001/_doc/2
{
  "my_text" : "histogram_2",
  "my_histogram" : {
      "values" : [0.1, 0.25, 0.35, 0.4, 0.45, 0.5], 
      "counts" : [8, 17, 8, 7, 6, 2] 
   }
}

I guess the ESQL usage will be something like:

from my-index* | stats max = max(my_histogram), count = count(my_histogram) by my_text;

my_text     | max    | count
histogram_1 | 0.5    | 51
histogram_2 | 0.5    | 48

where max(my_histogram) is calculated on the "values", while count(my_histogram) is the sum of the "counts".
We will have to define the behavior of each aggregation function, but at first glance it seems pretty natural at least for the basic aggs, and we can start from this as a guideline.
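
To make the proposed semantics concrete, here is a minimal Python sketch of the basic aggs over the two sample documents above. It is not ES|QL or Elasticsearch code; the Histogram class and the hist_* helper names are hypothetical, and the sum/avg definitions (values weighted by counts) are an assumption modeled on how _search treats histogram fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Histogram:
    """Hypothetical in-memory stand-in for a histogram field value."""
    values: List[float]
    counts: List[int]


def hist_min(h: Histogram) -> float:
    # min is taken over the bucket values, ignoring the counts
    return min(h.values)


def hist_max(h: Histogram) -> float:
    # max is taken over the bucket values, ignoring the counts
    return max(h.values)


def hist_count(h: Histogram) -> int:
    # count is the sum of the per-bucket counts
    return sum(h.counts)


def hist_sum(h: Histogram) -> float:
    # assumption: each value contributes once per observation in its bucket
    return sum(v * c for v, c in zip(h.values, h.counts))


def hist_avg(h: Histogram) -> float:
    # assumption: weighted average, i.e. weighted sum divided by total count
    return hist_sum(h) / hist_count(h)


# The two sample documents from this comment
histogram_1 = Histogram([0.1, 0.2, 0.3, 0.4, 0.5], [3, 7, 23, 12, 6])
histogram_2 = Histogram([0.1, 0.25, 0.35, 0.4, 0.45, 0.5], [8, 17, 8, 7, 6, 2])

for name, h in [("histogram_1", histogram_1), ("histogram_2", histogram_2)]:
    print(name, hist_max(h), hist_count(h))
```

The max and count columns reproduce the table above (0.5 / 51 and 0.5 / 48).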

Wondering if it makes sense to allow histogram fields in other commands apart from STATS. Maybe they can be used in EVAL for simple assignment (no manipulation, at least in a first phase) and KEEP/DROP, but it's hard for me to imagine how to use them in commands like SORT, ENRICH and so on.

elasticsearchmachine · 2024-01-24T09:49:00Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

not-napoleon · 2024-05-02T17:39:26Z

I suggest we wait to implement histogram support until we encode the algorithm in the field (see #108208). This will let us choose the appropriate sketch for percentiles against the histogram, at a minimum, and may influence the implementation of other aggregations.
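
For illustration only, a naive percentile over a single histogram can be sketched in Python as a nearest-rank walk over the cumulative counts. This is explicitly not one of the sketch algorithms referred to above, and it says nothing about merging histograms across documents; the function name hist_percentile and the nearest-rank rule are assumptions made for this example.

```python
import math
from typing import List


def hist_percentile(values: List[float], counts: List[int], p: float) -> float:
    """Naive nearest-rank percentile over one histogram (hypothetical helper).

    Returns the smallest bucket value whose cumulative count reaches the
    requested rank. No interpolation between buckets is attempted.
    """
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    total = sum(counts)
    # nearest-rank: the ceil(p% of total)-th observation, at least the first
    rank = max(math.ceil(p / 100 * total), 1)
    running = 0
    for value, count in sorted(zip(values, counts)):
        running += count
        if running >= rank:
            return value
    raise ValueError("histogram has no observations")


# Sample document 1 from the earlier comment
values = [0.1, 0.2, 0.3, 0.4, 0.5]
counts = [3, 7, 23, 12, 6]
print(hist_percentile(values, counts, 50))
print(hist_percentile(values, counts, 95))
```

With 51 observations, the 50th percentile is the 26th observation, which falls in the 0.3 bucket; a real implementation would depend on the sketch algorithm the histogram was built with.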

dgieselaar added the >enhancement, needs:triage, and :Analytics/ES|QL labels on Dec 6, 2023

elasticsearchmachine added the Team:QL (Deprecated) label and removed the needs:triage label on Dec 6, 2023

wchaparro added the Team:Analytics label on Jan 2, 2024

elasticsearchmachine removed the Team:QL (Deprecated) label on Jan 2, 2024
