Log raw events to a separate log file #4549

Open: belimawr wants to merge 27 commits into main from event-logger-for-process
Conversation


@belimawr belimawr commented Apr 9, 2024

What does this PR do?

This commit introduces a new logger core, used when collecting logs from sub-processes, that can be configured through `logging.event_data`. It is used for any log message that contains a whole event or could contain sensitive data; such messages are marked by adding `log.type: event` to the log entry. The logger core filters log entries on this field and directs them to the correct files.
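The routing described above can be sketched with a toy model (Python purely for illustration; the real logger core is Go, and names like `EventTypeFilter`, `agent.log`, and `agent-events.log` are invented for this sketch):

```python
import logging


class EventTypeFilter(logging.Filter):
    """Pass (or block) records tagged with a given log type."""

    def __init__(self, log_type, match):
        super().__init__()
        self.log_type = log_type
        self.match = match

    def filter(self, record):
        is_event = getattr(record, "log_type", None) == self.log_type
        return is_event if self.match else not is_event


def build_logger(normal_path, events_path):
    logger = logging.getLogger("agent")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False

    # Normal log file: everything except entries tagged as events.
    normal = logging.FileHandler(normal_path)
    normal.addFilter(EventTypeFilter("event", match=False))

    # Events log file: only entries tagged as events.
    events = logging.FileHandler(events_path)
    events.addFilter(EventTypeFilter("event", match=True))

    logger.addHandler(normal)
    logger.addHandler(events)
    return logger


logger = build_logger("agent.log", "agent-events.log")
logger.warning("normal diagnostic message")
logger.warning("Cannot index event ...", extra={"log_type": "event"})
```

Here the `log_type` extra plays the role of `log.type: event`: the filters inspect it and route each record to exactly one of the two files.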

Why is it important?

Some Beats outputs log raw event data on certain types of errors, and events can contain sensitive information that should not be present in the log files. This PR addresses the problem by logging event data to a separate log file.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Author's Checklist

The diagnostics command

The diagnostics collect command collects all events log files by default. There is a flag to opt out of this behaviour, which can be passed either via the CLI or through a Fleet action.
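The collection and opt-out behaviour can be modelled like this (a sketch only; `collect_logs` and `include_events` are invented names, and the real flag name is defined by the PR, not here). It assumes, as described later, that events log files live in a subfolder named `events`:

```python
import os
import zipfile


def collect_logs(logs_dir, zip_path, include_events=True):
    """Zip log files; events logs live under a subfolder named 'events'."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        for root, _dirs, files in os.walk(logs_dir):
            rel_root = os.path.relpath(root, logs_dir)
            # include_events=False models the opt-out flag mentioned above.
            if not include_events and (
                rel_root == "events" or rel_root.startswith("events" + os.sep)
            ):
                continue
            for name in files:
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, logs_dir))
```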

The logs command

The logs command can also read the events log files. It creates two streams for reading logs: one for the events logs and another for the normal logs. They share the same settings, but the line count is independent; i.e., if you set -n 10, each stream reads 10 lines.

When reading a fixed number of lines (not in follow mode), the two streams get mixed and the entries might not be fully ordered by time.

In follow mode, as new lines are added, the streams are correctly ordered.
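The non-follow behaviour above can be sketched as follows (illustrative Python, assuming ndjson files with an `@timestamp` field; the file names and function names are placeholders, not the real implementation):

```python
import json


def tail_entries(path, n):
    """Return the last n parsed ndjson entries from one log file."""
    with open(path) as f:
        lines = f.readlines()
    return [json.loads(line) for line in lines[-n:]]


def read_logs(normal_path, events_path, n):
    # Each stream independently contributes its last n lines; the combined
    # result is then ordered by timestamp, mirroring the best-effort ordering
    # described above for non-follow mode.
    entries = tail_entries(normal_path, n) + tail_entries(events_path, n)
    return sorted(entries, key=lambda e: e["@timestamp"])
```

With `-n 10` this yields up to 20 entries, 10 from each stream, which is why the output can look interleaved rather than being the 10 newest lines overall.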

How to test this PR locally

  1. Package the Elastic-Agent
  2. Replace the Filebeat binary with the binary built from Log raw events to a separate log file beats#38767
  3. Create /tmp/flog.log with a few lines, the data is not important
  4. Start the Elastic-Agent with the following configuration (adjust if needed)
outputs:
  default:
    type: elasticsearch
    hosts:
      - https://localhost:9200
    username: elastic
    password: changeme
    preset: balanced
    ssl.verification_mode: none

inputs:
  - type: filestream
    id: your-input-id
    streams:
      - id: your-filestream-stream-id
        data_stream:
          dataset: generic
        paths:
          - /tmp/flog.log

# Disable monitoring so fewer Beats are running and fewer logs are generated.
agent.monitoring:
  enabled: false
  logs: false
  metrics: false
  pprof.enabled: false
  use_output: default
  http: # Needed if you already have an Elastic-Agent running on your machine
    enabled: false
    port: 7002 

agent.grpc: # Needed if you already have an Elastic-Agent running on your machine
  address: localhost
  port: 7001

# This just reduces the amount of logs.
agent.logging.metrics.enabled: false

The easiest way to create ingest failures is to close the write index of the data stream. To do that, go to Kibana -> Dev Tools.

To get the backing index for a data stream:

GET /_data_stream/logs-generic-default

This will return something like:

{
  "data_streams": [
    {
      "name": "logs-generic-default",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-logs-generic-default-2024.01.22-000001",
          "index_uuid": "0pq-XIYfSjuUQhTxlJKJjQ",
          "prefer_ilm": true,
          "ilm_policy": "logs",
          "managed_by": "Index Lifecycle Management"
        }
      ]
    }
  ]
}

Take note of the index_name .ds-logs-generic-default-2024.01.22-000001.
Close this index:

POST .ds-logs-generic-default-2024.01.22-000001/_close
  5. Add more data to the file /tmp/flog.log
  6. In the folder where you're running the Elastic-Agent, look for a log file in data/elastic-agent-<hash>/logs/events. The file name is something like elastic-agent-events-data-20240125.ndjson. You should see a log entry like this one:
{
  "log.level": "warn",
  "@timestamp": "2024-01-25T14:48:51.115+0100",
  "message": "Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Date(2024, time.January, 25, 14, 48, 46, 614819591, time.Local), Meta:{\"input_id\":\"your-input-id\",\"raw_index\":\"logs-generic-default\",\"stream_id\":\"your-filestream-stream-id\"}, Fields:{\"agent\":{\"ephemeral_id\":\"a06806a9-f18d-4ffa-bee1-debcc15f7cf5\",\"id\":\"0ff4eb46-71e1-4c49-a921-3b984b303c0f\",\"name\":\"millennium-falcon\",\"type\":\"filebeat\",\"version\":\"8.13.0\"},\"data_stream\":{\"dataset\":\"generic\",\"namespace\":\"default\",\"type\":\"logs\"},\"ecs\":{\"version\":\"8.0.0\"},\"elastic_agent\":{\"id\":\"0ff4eb46-71e1-4c49-a921-3b984b303c0f\",\"snapshot\":false,\"version\":\"8.13.0\"},\"event\":{\"dataset\":\"generic\"},\"host\":{\"architecture\":\"x86_64\",\"containerized\":false,\"hostname\":\"millennium-falcon\",\"id\":\"851f339d77174301b29e417ecb2ec6a8\",\"ip\":[\"42.42.42.42\",,\"ec8a:fc90:d347:6316:116e:8a27:f731:08ff\"],\"mac\":[\"95-A2-37-0D-71-73\",],\"name\":\"millennium-falcon\",\"os\":{\"build\":\"rolling\",\"family\":\"arch\",\"kernel\":\"6.7.0-arch3-1\",\"name\":\"Arch Linux\",\"platform\":\"arch\",\"type\":\"linux\",\"version\":\"\"}},\"input\":{\"type\":\"filestream\"},\"log\":{\"file\":{\"device_id\":\"34\",\"inode\":\"172876\",\"path\":\"/tmp/flog.log\"},\"offset\":1061765},\"message\":\"154.68.172.7 - ritchie3302 [25/Jan/2024:14:10:52 +0100] \\\"HEAD /supply-chains/metrics/platforms HTTP/1.1\\\" 502 13383\"}, Private:(*input_logfile.updateOp)(0xc000fc6d20), TimeSeries:false}, Flags:0x1, Cache:publisher.EventCache{m:mapstr.M(nil)}} (status=400): {\"type\":\"index_closed_exception\",\"reason\":\"closed\",\"index_uuid\":\"0pq-XIYfSjuUQhTxlJKJjQ\",\"index\":\".ds-logs-generic-default-2024.01.22-000001\"}, dropping event!",
  "component": {
    "binary": "filebeat",
    "dataset": "elastic_agent.filebeat",
    "id": "filestream-default",
    "type": "filestream"
  },
  "log": {
    "source": "filestream-default"
  },
  "log.origin": {
    "file.line": 461,
    "file.name": "elasticsearch/client.go",
    "function": "github.com/elastic/beats/v7/libbeat/outputs/elasticsearch.(*Client).bulkCollectPublishFails"
  },
  "log.type": "event",
  "ecs.version": "1.6.0",
  "log.logger": "elasticsearch"
}

Note the "log.type": "event" and that this log entry is not present in other log files or the logs that go to stdout/stderr.
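One way to verify this locally (a hypothetical helper, not part of the PR) is to scan an ndjson log file for entries carrying the marker:

```python
import json


def event_entries(path):
    """Yield entries from an ndjson log file tagged with log.type: event."""
    with open(path) as f:
        for line in f:
            entry = json.loads(line)
            if entry.get("log.type") == "event":
                yield entry
```

Running it over the events file should yield entries like the one above; running it over the normal log files should yield nothing.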

Related issues



mergify bot commented Apr 9, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b event-logger-for-process upstream/event-logger-for-process
git merge upstream/main
git push upstream event-logger-for-process

mergify bot commented Apr 9, 2024

This pull request does not have a backport label. Could you fix it @belimawr? 🙏
To fix up this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip label Apr 9, 2024
@belimawr belimawr added skipped-test This change disables one or multiple tests Team:Elastic-Agent Label for the Agent team labels Apr 9, 2024
@belimawr belimawr changed the title Use event logger from logp Log raw events to a separate log file Apr 12, 2024
mergify bot commented Apr 15, 2024

This pull request is now in conflicts. Could you fix it? 🙏 (Same check-out instructions as in the Apr 9 comment above.)

mergify bot commented Apr 23, 2024

This pull request is now in conflicts. Could you fix it? 🙏 (Same check-out instructions as in the Apr 9 comment above.)

@belimawr belimawr force-pushed the event-logger-for-process branch 2 times, most recently from 4141bab to 5a51e88 on April 25, 2024 13:04
@belimawr belimawr marked this pull request as ready for review April 26, 2024 20:13
@belimawr belimawr requested review from a team as code owners April 26, 2024 20:13
@elasticmachine

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@pierrehilbert

@bturquet could we please have someone review here?

@pierrehilbert pierrehilbert requested review from blakerouse and removed request for michel-laterman April 29, 2024 18:00
@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Apr 29, 2024
belimawr and others added 27 commits May 29, 2024 08:47
ZipLogsWithPath needs to exclude the folder 'events', so we cannot
suffix it with '/'. Comments explaining the situation are also added.

zipLogsAndAssertFiles now calls t.Helper().
Stop calling `paths.SetTop()` in the diagnostics test. This refactoring passes values as arguments and fields instead of modifying global state, starting from the test in `internal/pkg/diagnostics/diagnostics_test.go` up to the cobra command.