Updates needed for software framework. #3717

Open
JustinAzoff opened this issue May 2, 2024 · 0 comments
@JustinAzoff
Contributor

These are mostly notes to remind myself what to fix, but I've seen a few issues lately with the software framework, particularly related to HTTP.

Azure versions

We have ignored_user_agents for browsers, but nothing equivalent for servers. There is a Microsoft cloud proxy (Azure CDN, going by the link below) that sets the version to the region/instance id, like so:

ECAcc (mil/6C98)
ECAcc (mil/6C22)
ECAcc (mil/6C40)
ECAcc (mil/6C60)
ECAcc (mil/6C28)
ECAcc (mil/6C45)
ECAcc (mil/6C22)
ECAcc (mil/6C45)

See https://learn.microsoft.com/en-us/azure/cdn/cdn-verizon-http-headers

Example Via request header

Via: HTTP/1.1 ECD (dca/1A2B)

Because the instance id differs between responses, almost every one of these requests gets logged as a new HTTP::SERVER.
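A rough sketch of what a server-side ignore list could look like, modelled in Python rather than Zeek script so it can be run on its own. The name `is_ignored_server` and the pattern are assumptions built only from the sample values above (an `ECAcc` or `ECD` token followed by a parenthesised region/instance id); this is not an existing option in the software framework, and the real set of tokens may be larger.

```python
import re

# Hypothetical server-side counterpart to ignored_user_agents.
# Assumption: the noisy values look like "ECAcc (mil/6C98)" or "ECD (dca/1A2B)",
# i.e. a product token followed by "(region/instance-id)".
EDGE_SERVER = re.compile(r"^(ECAcc|ECD) \([a-z]+/[0-9A-Fa-f]+\)$")


def is_ignored_server(value: str) -> bool:
    """Return True if the server string matches the edge-proxy pattern."""
    return EDGE_SERVER.match(value.strip()) is not None


samples = [
    "ECAcc (mil/6C98)",
    "ECAcc (mil/6C22)",
    "ECD (dca/1A2B)",
    "nginx/1.24.0",
]

# Only strings that do not match the pattern would go on to be logged.
logged = [s for s in samples if not is_ignored_server(s)]
print(logged)
```

An alternative with the same pattern would be to strip the parenthesised part and log the bare token once, instead of dropping the entry entirely.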

Proxy load

In a change I made a while ago, I moved the version parsing to the proxies, which did reduce the worker load quite a bit, but the software framework's found function still sends every piece of found software up to the proxies. Something like this in found could help:

        # Skip software already reported for this host within the expiry window.
        if (info?$unparsed_version) {
            if ([info$host, info$unparsed_version] in found_cache)
                return T;
            add found_cache[info$host, info$unparsed_version];
        }

where found_cache is a set[addr, string] with &create_expire set to something reasonable. It would be great if that could stay in sync with the existing table:

global tracked: table[addr] of SoftwareSet &create_expire=1day;
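To make the intended behaviour concrete, here is a minimal Python model of that worker-side cache. It is only a sketch: `ExpiringSet` stands in for a Zeek set with &create_expire, `should_forward` is a hypothetical helper (it returns False where the Zeek snippet would return T early), and the one-hour TTL is an arbitrary placeholder. The clock is injectable so the expiry can be exercised without waiting.

```python
import time


class ExpiringSet:
    """Set whose members expire a fixed interval after insertion."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._added = {}  # key -> insertion time

    def __contains__(self, key):
        added = self._added.get(key)
        if added is None:
            return False
        if self.clock() - added >= self.ttl:
            del self._added[key]  # expired, forget it
            return False
        return True

    def add(self, key):
        # Expiry counts from creation, so do not refresh a live entry.
        if key not in self:
            self._added[key] = self.clock()


def should_forward(cache, host, unparsed_version):
    """Return True only the first time (host, version) is seen per TTL window."""
    key = (host, unparsed_version)
    if key in cache:
        return False
    cache.add(key)
    return True


now = [0.0]  # fake clock, in seconds
cache = ExpiringSet(ttl=3600, clock=lambda: now[0])

results = [should_forward(cache, "10.0.0.1", "nginx/1.24.0") for _ in range(3)]
now[0] = 3600.0  # advance past the TTL
after_expiry = should_forward(cache, "10.0.0.1", "nginx/1.24.0")
print(results, after_expiry)
```

With a TTL shorter than the expiry on tracked, the proxies would still see each entry at least once per window, which is one simple way to keep the two roughly in sync.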

Multiple browsers

The software framework assumes that for each software type, a host has one and only one version of that software type. This makes sense for things like an SSH server, but now with things like Electron apps and Chrome/Edge/Safari it's not uncommon for a single host to be making multiple concurrent HTTP requests with alternating user agents. Or a host could be running two different HTTP servers for two different API services. Every time the host flip-flops, it triggers new software log entries that don't actually contain new information.

Looking on one network, 70% of the last 1,000,000 software log entries are duplicates.
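The flip-flop effect can be illustrated with a small Python model. Both functions are hypothetical simplifications of the behaviour described above, not the framework's actual code: `log_single` keeps one version per (host, software type) and logs on every change, while `log_multi` keeps a set of versions and logs only ones it has not seen before. The host address and version strings are made up for the example.

```python
from collections import defaultdict


def log_single(events):
    """One version per (host, type): log whenever the version changes."""
    current = {}
    logged = []
    for host, sw_type, version in events:
        if current.get((host, sw_type)) != version:
            current[(host, sw_type)] = version
            logged.append((host, sw_type, version))
    return logged


def log_multi(events):
    """A set of versions per (host, type): log only unseen versions."""
    seen = defaultdict(set)
    logged = []
    for host, sw_type, version in events:
        if version not in seen[(host, sw_type)]:
            seen[(host, sw_type)].add(version)
            logged.append((host, sw_type, version))
    return logged


# One host alternating between two browsers, three requests each.
events = [
    ("10.0.0.5", "HTTP::BROWSER", ua)
    for ua in ["Chrome 124", "Edge 124"] * 3
]

single = log_single(events)
multi = log_multi(events)
print(len(single), len(multi))
```

In this toy input the single-version model logs all six requests, while the set-based model logs each browser once.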

@JustinAzoff JustinAzoff self-assigned this May 2, 2024