feat(stdlib): Add `parse_auditd` vrl function by jorgehermo9 · Pull Request #1010 · vectordotdev/vrl

jorgehermo9 · 2024-08-30T18:06:27Z

Closes #66

I have a few things left (that I'm aware of):

Changelog
Documentation PR in Vector repo
Add this function to benchmarks
linux-auditd-parse crate should have a license compatible with Vector (see https://github.com/hillu/linux-audit-parser-rs/issues/1)
Better examples
Contribute to linux-auditd-parse for a few things that are left as TODOs

I added a few TODOs in the code for some things I had doubts. I'll appreciate feedback on those.

Currently, logs are parsed into something similar to

{
    "body": {
        "auid": 1000,
        "format": "enriched",
        "kernel": "6.10.4-arch2-1",
        "op": "start",
        "pid": 1240242,
        "res": "success",
        "ses": 2,
        "uid": 0,
        "ver": "4.0.2"
    },
    "enrichment": {
        "AUID": "vrl",
        "UID": "root"
    },
    "sequence": 6439,
    "timestamp": "2024-08-23T14:27:54.618Z",
    "type": "DAEMON_START"
}

Which is a bit more structured from what has been proposed in #66, but I'm open to any suggestion about this format

jszwedko

Awesome! I'm happy to see this work started as it is an oft-requested feature. I left a few comments on the TODOs. I also marked this as draft for now until it is ready for final review. It seems like the biggest blocker at the moment is the licensing of the linux-audit-parser-rs crate.

jszwedko · 2024-09-05T23:47:27Z

+fn parse_auditd(bytes: Value) -> Resolved {
+    let bytes = bytes.try_bytes()?;
+    // check if bytes ends with newline, otherwise append it
+    // TODO: make the parser accept bytes without newline in the linux_audit_parser crate (to-be-contributed)


Agreed, it seems like the parser should work without trailing newlines.

depends on https://github.com/hillu/linux-audit-parser-rs/pull/3

jszwedko · 2024-09-05T23:50:58Z

+    // Should we use UTC as the timezone? The logs are generated with the system's
+    // timezone... What is the correct behavior? Maybe the system where vector is running
+    // have a different timezone than the system that generated the logs... so it is
+    // not correct to assume that the current system's timezone is the correct one
+    // (with TimeZone::timestamp_millis_opt)
+    // TODO: we should document that the timestamp is parsed into UTC timezone.
+    // TODO: Maybe we should accept in a parameter a custom timezone to parse the timestamp?


I think we should rely on the timezone that is injected into the VRL runtime. See an example in the parse_syslog function:

vrl/src/stdlib/parse_syslog.rs

Lines 6 to 19 in c6f22a1

pub(crate) fn parse_syslog(value: Value, ctx: &Context) -> Resolved {

let message = value.try_bytes_utf8_lossy()?;

let timezone = match ctx.timezone() {

TimeZone::Local => None,

TimeZone::Named(tz) => Some(*tz),

};

let parsed = syslog_loose::parse_message_with_year_exact_tz(

&message,

resolve_year,

timezone,

Variant::Either,

)?;

Ok(message_to_value(parsed))

}

This timezone is configurable in Vector (e.g. on the remap transform: https://vector.dev/docs/reference/configuration/transforms/remap/#timezone).

jszwedko · 2024-09-05T23:51:35Z

+    let (enrichment, body): (ObjectMap, ObjectMap) = parsed
+        .body
+        .into_iter()
+        // TODO: improve this auditd crate with a IntoIter implementation for Body and not only


Agreed 👍

depends on https://github.com/hillu/linux-audit-parser-rs/pull/2

jszwedko · 2024-09-05T23:55:28Z

+            // TODO: should we store hexadecimals as its integer value or as an hexadecimal string?
+            // TODO: Uppercase hexa or lowercase hex format?


Good question. Do you happen to know how users would typically want to consume these values or what fields typically contain them? I think the answer to this question likely depends on what sorts of operations users need to do on these values (if any).

I don't really know...In the raw log lines this is shown as a0=55b26d44a6a0 so maybe it makes sense to display it as "a0": "0x55b26d44a6a0" in the output value in my opinion. But I would love to know what requesters think about this...

jorgehermo9 · 2024-10-18T17:14:16Z

Hi @jszwedko, I'm planning to address the review's comments now that the licensing issue was discussed.

How can we address the feature-flag thing to comply with the license as you depicted in https://github.com/hillu/linux-audit-parser-rs/issues/1#issuecomment-2386577606? I've added it as an optional dependency in this PR and added to the stdlib feature flag

vrl/Cargo.toml

Line 87 in 95d6951

"dep:linux-audit-parser",

but I don't think it is enough for what you said in that comment. What solution do you propose?

jorgehermo9 · 2024-10-21T11:54:25Z

+                        "grantors" => "pam_unix,pam_permit,pam_time",
+                        "acct" => "vrl",
+                        "exe" => "/usr/bin/sudo",
+                        "hostname" => Value::Null,


Fields that are missing in the audit log (for example, hostname="?") are parsed into Null values.

They are shown like this in vector:

{ "body": { "auid": 4294967295, "msg": { "addr": null, "comm": "systemd", "exe": "/usr/lib/systemd/systemd", "hostname": null, "res": "success", "terminal": null, "unit": "wpa_supplicant" }, "pid": 1, "ses": 4294967295, "uid": 0 }, "enrichment": { "AUID": "unset", "UID": "root" }, "sequence": 165, "timestamp": "2024-08-30T18:14:22.240Z", "type": "SERVICE_STOP" }

Should we filter out that fields, or leave the entry with a null value, so people know that it is present but with no value?

jorgehermo9 · 2024-10-21T12:00:16Z

+    log.insert("body".into(), Value::from(body));
+
+    // TODO: should we downcase the keys of the enrichment so that they match its body counterparts?
+    if !enrichment.is_empty() {


In auditd, the association between fields and its enriched value is done with capital letters (see this documentation).

But that limitation is because the log format is plain and it really seems like a workaround.

As we can have a more structured representation and the enrichment fields are separated in another map, maybe we can downcase the keys and have something like this instead:

(deleted some object keys to simplify the example)

{ "body": { "auid": 4294967295, "uid": 0 }, "enrichment": { "auid": "unset", "uid": "root" } }

so people can link the body fields and the enrichment fields easily, as this way, they enriched counterparts will exactly match the raw keys, instead of capitalizing the keys before looking them up in the enrichment.

What do you think? I don't really know how people would really consume this so I would appreciate some feedback from people who would actually use this (#66 requesters)

jorgehermo9 · 2024-10-21T12:02:56Z

hillu/linux-audit-parser-rs@8f39651 Added support for nested parsing the msg filed. Now the events are parsed into something similar to

{
    "body": {
        "auid": 4294967295,
        "msg": {
            "addr": null,
            "comm": "systemd",
            "exe": "/usr/lib/systemd/systemd",
            "hostname": null,
            "res": "success",
            "terminal": null,
            "unit": "systemd-update-utmp"
        },
        "pid": 1,
        "ses": 4294967295,
        "uid": 0
    },
    "enrichment": {
        "AUID": "unset",
        "UID": "root"
    },
    "sequence": 174,
    "timestamp": "2024-08-30T18:14:22.454Z",
    "type": "SERVICE_STOP"
}

note that the msg field is now more similar to what is requested in #66

Tested everything with this config

[sources.file]
type = "file"
include = ["/var/log/audit/audit.log"]
data_dir = "./data"

[transforms.auditd]
type = "remap"
inputs = ["file"]
source = """
. = parse_auditd!(.message)
"""

[sinks.console]
type = "console"
inputs = ["auditd"]
encoding.codec = "json"

And it parsed a bunch of auditd logs with no errors at all! (in order to test it, you would need to enable the auditd systemd service)

jorgehermo9 · 2024-10-21T15:47:42Z

+                    })
+                    .collect::<Result<ObjectMap, _>>()?,
+            ),
+            // There are a few values that `linux-audit-parser` does not return in its parsing


depends on https://github.com/hillu/linux-audit-parser-rs/pull/4

jszwedko · 2024-10-24T00:56:01Z

Hi @jszwedko, I'm planning to address the review's comments now that the licensing issue was discussed.

How can we address the feature-flag thing to comply with the license as you depicted in hillu/linux-audit-parser-rs#1 (comment)? I've added it as an optional dependency in this PR and added to the stdlib feature flag

vrl/Cargo.toml

Line 87 in 95d6951

"dep:linux-audit-parser",

but I don't think it is enough for what you said in that comment. What solution do you propose?

Hey! Thanks for raising this again. After thinking about it some more, I'd prefer to avoid including LGPL dependencies in Vector, even as optional, since they run the risk of someone accidentally including them without realizing it (or us expanding usage without realizing it). I'd (unfortunately 🙁) prefer to see us find another dependency to parse auditd logs. We could even just leverage the existing support for grok parsing by using specific patterns to parse auditd logs. Example: https://gist.github.com/artbikes/2313040

jszwedko · 2024-10-24T00:56:42Z

Apologies, I know this has been a bit of a long road to get to this point 😓 I'm curious what you think.

jorgehermo9 · 2024-10-24T18:57:17Z

Hi @jszwedko !

I'd prefer to avoid including LGPL dependencies in Vector, even as optional, since they run the risk of someone accidentally including them without realizing it

I totally agree with that. It also was one of my main concerns about this...

prefer to see us find another dependency to parse auditd logs

I don't think there is something in rust yet. If I have enough time, I may end up writing my own parser with a MPL-compatible license. Maybe I can inspire from https://github.com/elastic/go-libaudit/blob/main/auparse/auparse.go as It is licensed as Apache2.0 and I think rewriting it (or inspiring from it) in rust could be feasible.

However, this may delay things a little bit, but I'll give it a try when I find enough time.

Apologies, I know this has been a bit of a long road to get to this point 😓 I'm curious what you think.
Don't worry! Almost all VRL-related work is already done and it won't change a lot just by changing the underlying parser library. I think the same as you and I totally understand it.

Thanks! I may give a bump about this in a few weeks/month if I manage to implement the parser, but pausing it the moment.

jszwedko · 2024-10-24T20:26:03Z

@jorgehermo9 thanks for understanding. I do think we might be able to get by with grok parsing instead of a native Rust parser. It'll be less efficient, but might be easier to get going. Worth considering anyway :)

jorgehermo9 · 2025-04-17T12:57:38Z

Hi @jszwedko I've been working in a Apache2.0 OR MIT crate on my own, and will update this PR once I have my crate more polished (parsing auditd is much harder than it may seem at a first glance).

Just to be sure, would this be OK from your side?

jszwedko · 2025-04-17T15:00:48Z

Hi @jszwedko I've working in a Apache2.0 OR MIT crate on my own, and will update this PR once I have my crate more polished (parsing auditd is much harder than it may seem at a first glance).

Just to be sure, would this be OK from your side?

Nice! That would be welcome. 💯 🚀

edomaur · 2026-04-22T09:51:21Z

I'm wondering, what's the status on this ?

jorgehermo9 · 2026-04-22T11:58:10Z

I'm still developing https://github.com/jorgehermo9/auditd-parser. I had to stop contributing to open source for a few months due to personal life... But I'm planning to start working in my parser library soon, and then including it here! parsing auditd is harder that it might seem, and I had to deep dive in some linux kernel internals, that's why it was taking me that much time...

So, I think I'll ship a basic version of the parser first. Because having the parser 100% valid is very hard, and only target regular linux kernel, as for example, some flavours such as rhel have different auditd log syntax in some cases... So -> my main focus is to land an initial 0.1 implementation of the parsing, for basic/most common logs, and then request users to fill issues with logs that the parser fails to parse, and work iteratively to gradually improve log coverage. Expect more news in the next ~2 months

jorgehermo9 added 11 commits August 25, 2024 20:19

feat(stdlib): Initial implementation of parse_auditd function

2adda59

feat(stdlib): Initial implementation of auditd functiopn

091d59b

feat(stdlib): Initial implementation of auditd functiopn

987fcd2

feat(stdlib): Initial implementation of auditd functiopn

42e44f6

feat(stdlib): Initial implementation of auditd functiopn

890a9c3

feat(stdlib): Initial implementation of auditd functiopn

c86d208

docs: fix example

3c515a6

test(stdlib): improve test for parse_auditd

a64fe7e

Merge with main

4c9da9b

test(stdlib): remove comments from tests

f690472

chore: update LICENSE-3rdparty.csv

95d6951

jorgehermo9 marked this pull request as ready for review August 30, 2024 18:10

jszwedko marked this pull request as draft September 5, 2024 23:46

jszwedko reviewed Sep 5, 2024

View reviewed changes

jorgehermo9 mentioned this pull request Oct 18, 2024

Request to add a VRL function to parse logs generate by auditd vectordotdev/vector#21546

Closed

jorgehermo9 added 3 commits October 21, 2024 13:28

feat: update linux-audit-parser version

0bd04e2

style: add whitespace to test name

7f52c6a

Merge with main

5dd8351

jorgehermo9 commented Oct 21, 2024

View reviewed changes

pront added the meta: awaiting author Pull requests that are awaiting their author. label Mar 4, 2025

	pub(crate) fn parse_syslog(value: Value, ctx: &Context) -> Resolved {
	let message = value.try_bytes_utf8_lossy()?;
	let timezone = match ctx.timezone() {
	TimeZone::Local => None,
	TimeZone::Named(tz) => Some(*tz),
	};
	let parsed = syslog_loose::parse_message_with_year_exact_tz(
	&message,
	resolve_year,
	timezone,
	Variant::Either,
	)?;
	Ok(message_to_value(parsed))
	}

		// TODO: should we store hexadecimals as its integer value or as an hexadecimal string?
		// TODO: Uppercase hexa or lowercase hex format?

Conversation

jorgehermo9 commented Aug 30, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jszwedko left a comment

Choose a reason for hiding this comment

Uh oh!

jszwedko Sep 5, 2024

Choose a reason for hiding this comment

Uh oh!

jorgehermo9 Oct 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jszwedko Sep 5, 2024

Choose a reason for hiding this comment

Uh oh!

jszwedko Sep 5, 2024

Choose a reason for hiding this comment

Uh oh!

jorgehermo9 Oct 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jszwedko Sep 5, 2024

Choose a reason for hiding this comment

Uh oh!

jorgehermo9 Oct 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jorgehermo9 commented Oct 18, 2024

Uh oh!

jorgehermo9 Oct 21, 2024

Choose a reason for hiding this comment

Uh oh!

jorgehermo9 Oct 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jorgehermo9 commented Oct 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jorgehermo9 Oct 21, 2024

Choose a reason for hiding this comment

Uh oh!

jszwedko commented Oct 24, 2024

Uh oh!

jszwedko commented Oct 24, 2024

Uh oh!

jorgehermo9 commented Oct 24, 2024

Uh oh!

jszwedko commented Oct 24, 2024

Uh oh!

jorgehermo9 commented Apr 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jszwedko commented Apr 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

edomaur commented Apr 22, 2026

Uh oh!

jorgehermo9 commented Apr 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

jorgehermo9 commented Aug 30, 2024 •

edited

Loading

jorgehermo9 Oct 20, 2024 •

edited

Loading

jorgehermo9 Oct 20, 2024 •

edited

Loading

jorgehermo9 Oct 21, 2024 •

edited

Loading

jorgehermo9 Oct 21, 2024 •

edited

Loading

jorgehermo9 commented Oct 21, 2024 •

edited

Loading

jorgehermo9 commented Apr 17, 2025 •

edited

Loading

jszwedko commented Apr 17, 2025 •

edited

Loading