feat(stdlib): Add parse_auditd vrl function#1010
feat(stdlib): Add parse_auditd vrl function#1010jorgehermo9 wants to merge 14 commits intovectordotdev:mainfrom
parse_auditd vrl function#1010Conversation
jszwedko
left a comment
There was a problem hiding this comment.
Awesome! I'm happy to see this work started as it is an oft-requested feature. I left a few comments on the TODOs. I also marked this as draft for now until it is ready for final review. It seems like the biggest blocker at the moment is the licensing of the linux-audit-parser-rs crate.
| fn parse_auditd(bytes: Value) -> Resolved { | ||
| let bytes = bytes.try_bytes()?; | ||
| // check if bytes ends with newline, otherwise append it | ||
| // TODO: make the parser accept bytes without newline in the linux_audit_parser crate (to-be-contributed) |
There was a problem hiding this comment.
Agreed, it seems like the parser should work without trailing newlines.
There was a problem hiding this comment.
| // Should we use UTC as the timezone? The logs are generated with the system's | ||
| // timezone... What is the correct behavior? Maybe the system where vector is running | ||
| // have a different timezone than the system that generated the logs... so it is | ||
| // not correct to assume that the current system's timezone is the correct one | ||
| // (with TimeZone::timestamp_millis_opt) | ||
| // TODO: we should document that the timestamp is parsed into UTC timezone. | ||
| // TODO: Maybe we should accept in a parameter a custom timezone to parse the timestamp? |
There was a problem hiding this comment.
I think we should rely on the timezone that is injected into the VRL runtime. See an example in the parse_syslog function:
vrl/src/stdlib/parse_syslog.rs
Lines 6 to 19 in c6f22a1
This timezone is configurable in Vector (e.g. on the remap transform: https://vector.dev/docs/reference/configuration/transforms/remap/#timezone).
| let (enrichment, body): (ObjectMap, ObjectMap) = parsed | ||
| .body | ||
| .into_iter() | ||
| // TODO: improve this auditd crate with a IntoIter implementation for Body and not only |
There was a problem hiding this comment.
| // TODO: should we store hexadecimals as its integer value or as an hexadecimal string? | ||
| // TODO: Uppercase hexa or lowercase hex format? |
There was a problem hiding this comment.
Good question. Do you happen to know how users would typically want to consume these values or what fields typically contain them? I think the answer to this question likely depends on what sorts of operations users need to do on these values (if any).
There was a problem hiding this comment.
I don't really know...In the raw log lines this is shown as a0=55b26d44a6a0 so maybe it makes sense to display it as "a0": "0x55b26d44a6a0" in the output value in my opinion. But I would love to know what requesters think about this...
|
Hi @jszwedko, I'm planning to address the review's comments now that the licensing issue was discussed. How can we address the feature-flag thing to comply with the license as you depicted in https://github.com/hillu/linux-audit-parser-rs/issues/1#issuecomment-2386577606? I've added it as an optional dependency in this PR and added to the stdlib feature flag Line 87 in 95d6951 |
| "grantors" => "pam_unix,pam_permit,pam_time", | ||
| "acct" => "vrl", | ||
| "exe" => "/usr/bin/sudo", | ||
| "hostname" => Value::Null, |
There was a problem hiding this comment.
Fields that are missing in the audit log (for example, hostname="?") are parsed into Null values.
They are shown like this in vector:
{
"body": {
"auid": 4294967295,
"msg": {
"addr": null,
"comm": "systemd",
"exe": "/usr/lib/systemd/systemd",
"hostname": null,
"res": "success",
"terminal": null,
"unit": "wpa_supplicant"
},
"pid": 1,
"ses": 4294967295,
"uid": 0
},
"enrichment": {
"AUID": "unset",
"UID": "root"
},
"sequence": 165,
"timestamp": "2024-08-30T18:14:22.240Z",
"type": "SERVICE_STOP"
}Should we filter out that fields, or leave the entry with a null value, so people know that it is present but with no value?
| log.insert("body".into(), Value::from(body)); | ||
|
|
||
| // TODO: should we downcase the keys of the enrichment so that they match its body counterparts? | ||
| if !enrichment.is_empty() { |
There was a problem hiding this comment.
In auditd, the association between fields and its enriched value is done with capital letters (see this documentation).
But that limitation is because the log format is plain and it really seems like a workaround.
As we can have a more structured representation and the enrichment fields are separated in another map, maybe we can downcase the keys and have something like this instead:
(deleted some object keys to simplify the example)
{
"body": {
"auid": 4294967295,
"uid": 0
},
"enrichment": {
"auid": "unset",
"uid": "root"
}
}so people can link the body fields and the enrichment fields easily, as this way, they enriched counterparts will exactly match the raw keys, instead of capitalizing the keys before looking them up in the enrichment.
What do you think? I don't really know how people would really consume this so I would appreciate some feedback from people who would actually use this (#66 requesters)
|
hillu/linux-audit-parser-rs@8f39651 Added support for nested parsing the {
"body": {
"auid": 4294967295,
"msg": {
"addr": null,
"comm": "systemd",
"exe": "/usr/lib/systemd/systemd",
"hostname": null,
"res": "success",
"terminal": null,
"unit": "systemd-update-utmp"
},
"pid": 1,
"ses": 4294967295,
"uid": 0
},
"enrichment": {
"AUID": "unset",
"UID": "root"
},
"sequence": 174,
"timestamp": "2024-08-30T18:14:22.454Z",
"type": "SERVICE_STOP"
}note that the msg field is now more similar to what is requested in #66 Tested everything with this config [sources.file]
type = "file"
include = ["/var/log/audit/audit.log"]
data_dir = "./data"
[transforms.auditd]
type = "remap"
inputs = ["file"]
source = """
. = parse_auditd!(.message)
"""
[sinks.console]
type = "console"
inputs = ["auditd"]
encoding.codec = "json"And it parsed a bunch of auditd logs with no errors at all! (in order to test it, you would need to enable the |
| }) | ||
| .collect::<Result<ObjectMap, _>>()?, | ||
| ), | ||
| // There are a few values that `linux-audit-parser` does not return in its parsing |
There was a problem hiding this comment.
Hey! Thanks for raising this again. After thinking about it some more, I'd prefer to avoid including LGPL dependencies in Vector, even as optional, since they run the risk of someone accidentally including them without realizing it (or us expanding usage without realizing it). I'd (unfortunately 🙁) prefer to see us find another dependency to parse auditd logs. We could even just leverage the existing support for grok parsing by using specific patterns to parse auditd logs. Example: https://gist.github.com/artbikes/2313040 |
|
Apologies, I know this has been a bit of a long road to get to this point 😓 I'm curious what you think. |
|
Hi @jszwedko !
I totally agree with that. It also was one of my main concerns about this...
I don't think there is something in rust yet. If I have enough time, I may end up writing my own parser with a MPL-compatible license. Maybe I can inspire from https://github.com/elastic/go-libaudit/blob/main/auparse/auparse.go as It is licensed as Apache2.0 and I think rewriting it (or inspiring from it) in rust could be feasible. However, this may delay things a little bit, but I'll give it a try when I find enough time.
Thanks! I may give a bump about this in a few weeks/month if I manage to implement the parser, but pausing it the moment. |
|
@jorgehermo9 thanks for understanding. I do think we might be able to get by with grok parsing instead of a native Rust parser. It'll be less efficient, but might be easier to get going. Worth considering anyway :) |
|
Hi @jszwedko I've been working in a Just to be sure, would this be OK from your side? |
Nice! That would be welcome. 💯 🚀 |
|
I'm wondering, what's the status on this ? |
|
I'm still developing https://github.com/jorgehermo9/auditd-parser. I had to stop contributing to open source for a few months due to personal life... But I'm planning to start working in my parser library soon, and then including it here! parsing auditd is harder that it might seem, and I had to deep dive in some linux kernel internals, that's why it was taking me that much time... So, I think I'll ship a basic version of the parser first. Because having the parser 100% valid is very hard, and only target regular linux kernel, as for example, some flavours such as rhel have different auditd log syntax in some cases... So -> my main focus is to land an initial 0.1 implementation of the parsing, for basic/most common logs, and then request users to fill issues with logs that the parser fails to parse, and work iteratively to gradually improve log coverage. Expect more news in the next ~2 months |
Closes #66
I have a few things left (that I'm aware of):
linux-auditd-parsecrate should have a license compatible with Vector (see https://github.com/hillu/linux-audit-parser-rs/issues/1)linux-auditd-parsefor a few things that are left as TODOsI added a few TODOs in the code for some things I had doubts. I'll appreciate feedback on those.
Currently, logs are parsed into something similar to
{ "body": { "auid": 1000, "format": "enriched", "kernel": "6.10.4-arch2-1", "op": "start", "pid": 1240242, "res": "success", "ses": 2, "uid": 0, "ver": "4.0.2" }, "enrichment": { "AUID": "vrl", "UID": "root" }, "sequence": 6439, "timestamp": "2024-08-23T14:27:54.618Z", "type": "DAEMON_START" }Which is a bit more structured from what has been proposed in #66, but I'm open to any suggestion about this format