
Document already exists (HTTP 409) cases: documentation improvements. #16117

Open · mashhurs opened this issue May 1, 2024 · 1 comment

mashhurs (Contributor) commented May 1, 2024

Tell us about the issue

Description:
There are various situations where Elasticsearch may reject an event because the document already exists. The purpose of this issue is to collect such cases and add short documentation (wherever it fits best: the troubleshooting docs, a support doc, or the elasticsearch output docs), since we are getting the same question over and over.

  • Ingestion from Elastic Agent
    Two cases I can think of right now:
    • Events are sent to a data stream through an integration, and the integration's ingest pipeline has a fingerprint processor that sets the document _id. For example, the tenable_sc integration may have logs-tenable_sc.vulnerability-{version} & logs-tenable_sc.plugin-{version} ingest pipelines whose fingerprint processor sets the _id (a minimal reproduction of the resulting 409 is sketched after this list):
    {
     "fingerprint": {
       "fields": [
         "json.lastSeen",
         "json.pluginID",
         "json.ip",
         "json.uuid",
         "json.firstSeen",
         "json.lastSeen",
         "json.exploitAvailable",
         "json.vulnPubDate",
         "json.patchPubDate",
         "json.pluginPubDate",
         "json.pluginModDate",
         "json.pluginText",
         "json.dnsName",
         "json.macAddress",
         "json.operatingSystem",
         "json.pluginInfo"
       ],
       "target_field": "_id",
       "ignore_missing": true
     }
    },
    

Example log when Logstash receives a rejected event:

 [2024-04-24T14:13:25,988][WARN ][logstash.outputs.elasticsearch][Elastic-Agent-to-Logstash].[6ef18a6008d3cea8f01e0cd409c22213845cff0829c5f76b02d18a30a22c024d] Failed action {:status=>409, :action=>["create", {:_id=>nil, :_index=>"metrics-windows.service-default", :routing=>nil}, {"host"=>{"mac"=>["...", "..."], "name"=>"redacted-name", "ip"=>["1", "2", "3"], "architecture"=>"x86_64", "id"=>"aae15", "os"=>{"name"=>"Windows Server 2016 Datacenter", "platform"=>"windows", "type"=>"windows", "kernel"=>"10.0.14393.6897 (rs1_release.240404-1613)", "family"=>"windows", "build"=>"14393.6897", "version"=>"10.0"}, "hostname"=>"redacted-host"}, "service"=>{"type"=>"windows"}, "elastic_agent"=>{"version"=>"8.11.3", "id"=>"48423d1c-5a87-46ef-b6a8-baf90b515e63", "snapshot"=>false}, "metricset"=>{"name"=>"service", "period"=>60000}, "event"=>{"duration"=>211345500, "module"=>"windows", "dataset"=>"windows.service"}, "cloud"=>{"service"=>{"name"=>"redacted-name"}, "instance"=>{"id"=>"redacted-id", "name"=>"redacted-name"}, "provider"=>"openstack", "machine"=>{"type"=>"t"}, "availability_zone"=>"zone"}, "@timestamp"=>2024-04-24T14:12:24.991Z, ...., "type"=>"metricbeat", "id"=>"48423d1c-5a87-46ef-b6a8-baf90b515e63", "ephemeral_id"=>"fdd26638-3865-4e63-961a-69d8f419538a", "version"=>"8.11.3"}, "@version"=>"1"}], :response=>{"create"=>{"status"=>409, "error"=>{"type"=>"version_conflict_engine_exception", "reason"=>"[abcd][{agent.id=my-agent-id, cloud.availability_zone=zone, cloud.instance.id=i-id, windows.service.pid=996, windows.service.state=Running}@2024-04-24T14:12:24.991Z]: version conflict, document already exists (current version [1])", "index_uuid"=>"wBkYL16ZSo2fJ7tAC3zbbw", "shard"=>"0", "index"=>".ds-metrics-windows.service-default-2024.04.22-000065"}}}}
  • Logstash is under backpressure and cannot acknowledge the events to the agent in time, so the agent times out and resends them. In reality, the events may already have been indexed in Elasticsearch, so the retried create hits the same 409 path as in the sketch below. A quick resolution is to extend the agent's timeout, but the right fix depends on the situation.

  • etc.
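
A minimal sketch of the 409 behind both agent cases above, assuming a throwaway index name (my-index) and _id (duplicated-id); only the create-only semantics matter, not the names:

    PUT my-index/_create/duplicated-id
    { "message": "first delivery - indexed, returns 201" }

    PUT my-index/_create/duplicated-id
    { "message": "second delivery reusing the same _id - rejected with 409 version_conflict_engine_exception" }

Data streams only accept create actions, so when the same event reaches Elasticsearch twice with the same _id (deterministic fingerprint, or an agent retry of an already-indexed event), the second bulk create is rejected and surfaces as the WARN log above instead of overwriting the existing document.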

URL:

Example: https://www.elastic.co/guide/en/logstash/current/introduction.html

Anything else?

robbavey (Member) commented May 3, 2024

There is another potential cause for these 409 conflicts.

When integrations write to a TSDS-enabled index, the document _id is defined as "a hash of the document’s dimensions and @timestamp".

The document's dimensions are defined in the integration; when events are sent at a rate of more than one per millisecond and the dimensions are insufficient to disambiguate those events, a version conflict will arise.
This has already been seen in the integrations for Elastic Agent and MySQL, and I suspect there are more that can cause the issue.
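
For the documentation, a sketch of how those collisions happen may help. The template and field names below are illustrative, not taken from any real integration; the point is that every field mapped with "time_series_dimension": true plus @timestamp feeds the generated _id of a TSDS document:

    PUT _index_template/metrics-example
    {
      "index_patterns": ["metrics-example-*"],
      "data_stream": {},
      "template": {
        "settings": {
          "index.mode": "time_series",
          "index.routing_path": ["host.name", "windows.service.name"]
        },
        "mappings": {
          "properties": {
            "@timestamp": { "type": "date" },
            "host": {
              "properties": {
                "name": { "type": "keyword", "time_series_dimension": true }
              }
            },
            "windows": {
              "properties": {
                "service": {
                  "properties": {
                    "name":  { "type": "keyword", "time_series_dimension": true },
                    "state": { "type": "keyword" }
                  }
                }
              }
            }
          }
        }
      }
    }

    POST metrics-example-default/_doc
    { "@timestamp": "2024-05-03T12:00:00.000Z", "host.name": "web-1", "windows.service.name": "w32time", "windows.service.state": "Running" }

    POST metrics-example-default/_doc
    { "@timestamp": "2024-05-03T12:00:00.000Z", "host.name": "web-1", "windows.service.name": "w32time", "windows.service.state": "Stopped" }

Both documents share the same dimension values and the same millisecond @timestamp, so they hash to the same _id and the second create is rejected with the 409 version conflict, even though a non-dimension field (windows.service.state) differs. (When trying this, the @timestamp has to fall inside the TSDS accepted time window, so use a current timestamp.)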
