Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add generic fallback implementation for synthetic source #108222

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

lkts
Copy link
Contributor

@lkts lkts commented May 2, 2024

This PR uses infrastructure from #107567 to implement a fallback implementation of synthetic source for field mappers that don't support it natively. In that case we will store source of such field as is in a separate stored field.

Pending work:

  • Tests for more fields that now support synthetic source
  • There are differences between native synthetic source that is usually generated using doc values and fallback. The fallback is "too precise", f.e. if field value is an empty array, there is no way to know that when using doc values. When source is stored as is, it will be precisely an empty array. That leads to assumptions done in synthetic source tests not matching reality. We may want to create a different set of tests or try to replicate the behavior.
  • Unification of copy_to handling. copy_to is currently not allowed on fields when synthetic source is used, should this be applied uniformly?
  • Documentation

Note that this PR does not cover nested as it is not a FieldMapper however it can be done using similar approach separately.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Copy link
Collaborator

Hi @lkts, I've created a changelog YAML for you.

*/
@Override
public SourceLoader.SyntheticFieldLoader syntheticFieldLoader() {
return SourceLoader.SyntheticFieldLoader.NOTHING;
Copy link
Contributor Author

@lkts lkts May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nothing because ObjectMapper already nicely adds all ignored source fields to the source.

@@ -219,7 +229,138 @@ protected Object generateRandomInputValue(MappedFieldType ft) {

@Override
protected SyntheticSourceSupport syntheticSourceSupport(boolean ignoreMalformed) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started adding tests here because it looks like the most complex type. I am not sure how should we proceed - if this PR contains tests for all fields that now have support, it will be quite big.


@Override
protected BlockReaderSupport getSupportedReaders(MapperService mapper, String loaderFieldName) {
// TODO are we okay that we don't support synthetic source here?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of block loader stuff is still quite weak, i am not sure what's expected here.

protected boolean supportsCopyTo() {
// TODO this is so that `testSyntheticSourceInvalid` does not complain
// in theory geo_shape could support copy_to
// should we fail to construct synthetic source in `FieldMapper` if there is copy_to ?
Copy link
Contributor Author

@lkts lkts May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question here.

@lkts
Copy link
Contributor Author

lkts commented May 3, 2024

Here is another interesting observation. Currently geo_shape mapper (all of them) defines a block loader that only works with _source. It does not look like that infrastructure supports synthetic source to me. So i think that means that while we can "seamlessly" support synthetic source for geo_shape we will end up breaking ESQL.

@lkts
Copy link
Contributor Author

lkts commented May 3, 2024

I am thinking we should opt-in into this one field at a time by setting supportsSyntheticSourceNatively (maybe should rename this too).

@lkts lkts requested a review from a team as a code owner May 3, 2024 22:00
@lkts lkts force-pushed the feature/synthetic_source_catch_all branch from cb1626e to d14b959 Compare May 3, 2024 22:02
@lkts lkts requested review from martijnvg and kkrik-es and removed request for a team May 3, 2024 22:02
* If a mapper opts in to use fallback synthetic source implementation.
* @return true or false
*/
protected boolean fallbackSyntheticSourceOptIn() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If implementations overwrite this method, why would these implementations implement synthetic source natively? It is the same as supportsSyntheticSourceNatively() returning false?

Maybe we can just have supportsSyntheticSourceNatively()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a temporary flag just to be able to introduce support on a per-field basis instead of one big PR. We will remove this when we enable synthetic source support for the last field.

@lkts
Copy link
Contributor Author

lkts commented May 13, 2024

@elasticmachine update branch

*
* @return {@link SyntheticSourceMode}
*/
protected SyntheticSourceMode syntheticSourceMode() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be temporary until all fields support synthetic source natively or via fallback mechanism? If so maybe include this in the java docs here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method will remain but maybe FALLBACK variant will be removed once we apply fallback everywhere. Just for context - we need this separate piece of information because syntheticFieldLoader throws exceptions and we can't call it to determine if mapper supports synthetic source natively. If we do that, that exception will be wrapped in document parsing exception which is different from how it's currently thrown.

@lkts lkts requested review from martijnvg and kkrik-es May 17, 2024 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants