[RFC] PPL Field Ordering

## 1. Summary

PPL allows specifying field order using names and wildcards.
Current Calcite PPL implementation preserve field order as row type, but it would not work when dynamic fields are introduced by https://github.com/opensearch-project/sql/issues/4984 since all the dynamic fields will be stored in single `_MAP` field.
This RFC proposes a method to produce query result with expected field order even when the result contains dynamic fields.

## 2. Motivation

Users expect deterministic output data layout, especially when exporting data.

We would like to replicate the Splunk behavior so that existing Splunk users can easily onboard OpenSearch PPL.

## 3. Expected behavior of field ordering

- Default: fields sorted UTF-8 lexicographically.
- `fields`: replaces ordering; affects later fields.
- `stats` / `timechart`: define their own output order.
- `join`: ignores right-side ordering.
- `rename`: modify field order (replace fields in field order)

## 4. Proposed Solution

1. Decide a field order based on the query.
  - Field order is a list of string, which contains specific field name and/or wild card notation (e.g. `abc`, `*`, `prefix*`, etc.)
  - The last command which emit field order will decide the query's field order. (The field order specified earlier command will be ignored.)
  - Only a few commands emit field order (`fields`, `stats`, `timechart`, `rename`)

2. Once query execution is done, reorganize the result by using the field order from step 1
  - Come up with the list of all the available fields in dynamic fields
  - Reorder fields based on the field order from step 1
  - Fields will be sorted lexicographically for wild card pattern and remaining fields

Examples:
```
source=idx | eval a=1, b=2 | fields a, * | eval c=3 | fields b, *, a
# Field order => [b, *, a]
# Result schema => [b, c, a]

source=idx | eval a=1, b=2, bc=3, bd=4 | fields a, bd, b*
# Field order => [a, bd, b*]
# Result schema => [a, bd, b, bc]

source=idx | eval a=1, b=2, bc=3, bd=4 | fields *, a
# Field order => [*, a]
# Result schema => [b, bc, bd, a]

source=idx | eval a=1, b=2, bc=3, bd=4 | fields b*, a | eval bx=5
# Field order => [b*, a]
# Result schema => [b, bc, bd, bx, a]

source=idx | eval a=1, b=2 | stats count as c by b, a 
# Field order => [b, a, c]
# Result schema => [b, a, c]

source=idx | eval a=1, b=2 | stats count as c by b, a | fields *
# Field order => [*]
# Result schema => [a, b, c]
```

## 5. Difference from current spec

This method causes discrepancy with current Calcite PPL logic where:
1. Current logic uses field order in the input schema as default
2. Current logic append new fields at the end when a command adds fields (some command specify fields)
3. Current logic applies wild card to the fields available when `fields` command is applied

Examples:
```
# suppose idx schema = [id, name, address]

source=idx | eval a=1
# current logic -> [id, name, address, a]
# new logic -> [a, address, id, name]

source=idx | fields a*, name, * | eval a=1
# current logic -> [address, name, id, a]  (`a` comes at the end as it was added after `fields` command)
# new logic -> [a, address, name, id] (`a` comes first as it matches `a*`)
```

**Option1**: Apply new logic to all the query
  - Consistent behavior among all the queries
  - It could change the result layout for existing query

**Option2(recommended)**: Apply new logic only when result contains dynamic fields
  - It only affect the queries executable with dynamic fields feature (No surprise to existing users).
  - Could be confusing as it change behavior when dynamic field is introduced.
  - Apply new logic to all the query in the next major version

## 6. Alternative Solutions considered

- Store field order information in each data record
  - Not efficient


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[RFC] PPL Field Ordering #5045

1. Summary

2. Motivation

3. Expected behavior of field ordering

4. Proposed Solution

5. Difference from current spec

6. Alternative Solutions considered

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[RFC] PPL Field Ordering #5045

Description

1. Summary

2. Motivation

3. Expected behavior of field ordering

4. Proposed Solution

5. Difference from current spec

6. Alternative Solutions considered

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions