Inconsistency between Eland mapping and Elastic Mapping when appending to an index with . in the column names #418

Open
Ashton-Sidhu opened this issue Dec 8, 2021 · 1 comment · May be fixed by #424
Labels
bug, help wanted

Comments

@Ashton-Sidhu
Contributor

Eland version: 7.14.1b1
Elasticsearch version: 7.15.1

Issue

If you have a pandas DataFrame with the columns file.hash.sha256, event.id, process.name, and label, and run:

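import eland as ed

# df, index and es (an Elasticsearch client) are assumed to be defined elsewhere
ed.pandas_to_eland(
    df,
    es_client=es,
    es_dest_index=index,
    es_if_exists="append",
    es_refresh=True,
    use_pandas_index_for_es_ids=False,
)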

it succeeds the first time. However, if you take the same DataFrame with the same data and try to append it a second time, you get the following error:

  File "/Users/sidhuas/protections-cloud/tools/artifacts/rapid-exception-list/rapid_exception_list.py", line 389, in add_shas_to_rapid_exception_list
    ed.pandas_to_eland(
  File "/Users/sidhuas/.pyenv/versions/3.9.1/envs/cloudprotection/lib/python3.9/site-packages/eland/etl.py", line 179, in pandas_to_eland
    verify_mapping_compatibility(
  File "/Users/sidhuas/.pyenv/versions/3.9.1/envs/cloudprotection/lib/python3.9/site-packages/eland/field_mappings.py", line 921, in verify_mapping_compatibility
    raise ValueError(
ValueError: DataFrame dtypes and Elasticsearch index mapping aren't compatible:
- 'event' is missing from DataFrame columns
- 'file' is missing from DataFrame columns
- 'process' is missing from DataFrame columns
- 'event.id' is missing from ES index mapping
- 'file.hash.sha256' is missing from ES index mapping
- 'process.name' is missing from ES index mapping
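
The six messages are consistent with a check that compares only the top-level keys of the two `properties` dictionaries. The following is a minimal sketch of such a check, not Eland's actual code: `top_level_diff` is a hypothetical helper, and the two dictionaries are built from the four columns and the Elasticsearch mapping reported in this issue.

```python
def top_level_diff(df_props, es_props):
    """Compare only the top-level keys of two mapping `properties` dicts."""
    problems = []
    for key in sorted(es_props):
        if key not in df_props:
            problems.append(f"- {key!r} is missing from DataFrame columns")
    for key in sorted(df_props):
        if key not in es_props:
            problems.append(f"- {key!r} is missing from ES index mapping")
    return problems


# Flat mapping derived from the DataFrame columns (dotted names kept as-is).
df_props = {
    "file.hash.sha256": {"type": "keyword"},
    "process.name": {"type": "keyword"},
    "event.id": {"type": "keyword"},
    "label": {"type": "double"},
}

# Nested mapping as stored by Elasticsearch after the first append.
es_props = {
    "event": {"properties": {"id": {"type": "keyword"}}},
    "file": {"properties": {"hash": {"properties": {"sha256": {"type": "keyword"}}}}},
    "label": {"type": "double"},
    "process": {"properties": {"name": {"type": "keyword"}}},
}

problems = top_level_diff(df_props, es_props)
print("\n".join(problems))
```

Only `label` appears under the same key on both sides, which is why it is the one column absent from the error.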

If you print the mapping Eland generates from the DataFrame next to the mapping stored in Elasticsearch, you get the following:

Eland:

{
   "mappings":{
      "properties":{
         "file.hash.sha256":{
            "type":"keyword"
         },
         "process.name":{
            "type":"keyword"
         },
         "event.id":{
            "type":"keyword"
         },
         "event.module":{
            "type":"keyword"
         },
         "label":{
            "type":"double"
         }
      }
   }
}

Elastic (created when Eland appends for the first time):

{
   "mappings":{
      "properties":{
         "event":{
            "properties":{
               "id":{
                  "type":"keyword"
               }
            }
         },
         "file":{
            "properties":{
               "hash":{
                  "properties":{
                     "sha256":{
                        "type":"keyword"
                     }
                  }
               }
            }
         },
         "label":{
            "type":"double"
         },
         "process":{
            "properties":{
               "name":{
                  "type":"keyword"
               }
            }
         }
      }
   }
}
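
The two mappings describe the same fields: Elasticsearch stores a dotted field name as a chain of object fields. A rough sketch of that expansion, assuming every dotted segment becomes an object level (`expand_dotted` is a hypothetical helper, not part of Eland or the Elasticsearch client):

```python
def expand_dotted(props):
    """Turn {"a.b.c": field} into {"a": {"properties": {"b": {"properties": {"c": field}}}}}."""
    nested = {}
    for name, field in props.items():
        *parents, leaf = name.split(".")
        node = nested
        for part in parents:
            # Walk (or create) one object level per dotted segment.
            node = node.setdefault(part, {}).setdefault("properties", {})
        node[leaf] = field
    return nested


flat_props = {
    "file.hash.sha256": {"type": "keyword"},
    "process.name": {"type": "keyword"},
    "event.id": {"type": "keyword"},
    "label": {"type": "double"},
}

nested_props = expand_dotted(flat_props)
print(nested_props["file"])
```

Applied to the four columns from this issue, the result has the same shape as the Elastic mapping shown above.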

This makes it hard to use Eland with the Elastic Common Schema (ECS), whose field names are dotted.

Expected Behaviour

The data should be appended to the index without issue.

@sethmlarson
Contributor

This looks like a bug to me, thanks for opening! Specifically I think we need to handle nested properties inside of eland.field_mappings.verify_mapping_compatibility().
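
One possible way to handle nested properties is to flatten both mappings to dotted field names before comparing them. This is only a sketch of the idea, not the fix that was eventually merged: `flatten_properties` is a hypothetical helper, and it deliberately ignores multi-fields (`fields`) and treats any entry that has `properties` as a plain object.

```python
def flatten_properties(props, prefix=""):
    """Flatten a nested `properties` dict into {dotted.name: leaf field definition}."""
    flat = {}
    for name, field in props.items():
        path = f"{prefix}{name}"
        if "properties" in field:
            # Object field: recurse into its children with an extended prefix.
            flat.update(flatten_properties(field["properties"], prefix=f"{path}."))
        else:
            flat[path] = field
    return flat


es_props = {
    "event": {"properties": {"id": {"type": "keyword"}}},
    "file": {"properties": {"hash": {"properties": {"sha256": {"type": "keyword"}}}}},
    "label": {"type": "double"},
    "process": {"properties": {"name": {"type": "keyword"}}},
}

flat_es = flatten_properties(es_props)
print(sorted(flat_es))
```

An already flat mapping passes through unchanged, so the same function could be applied to both sides before the key comparison.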

@sethmlarson added the bug and help wanted labels on Dec 9, 2021