Capacity planning is missing from documentation #103

mkgvt · 2021-11-16T21:54:39Z

The topic of capacity planning is missing from the documentation. Or at least I cannot find it. Right after "getting started", the next thing someone needs to do is to decide what resources to purchase in order to process some rate of network bandwidth: "how many workers (cores) and how much RAM is needed to process 10 Gbps without dropped packets?". Also, what are the gotchas and best practices that should be followed (with some discussion of why so intuition can be quickly built up). I expect the information is out in the community but it isn't readily accessible. The documentation needs to be improved by those who have experience so that those of us without experience can avoid costly mistakes and successfully deploy.

0xxon · 2021-11-17T10:34:34Z

I certainly agree that this is a missing topic in the documentation. However, there is a reason why this topic is missing from the manual: it is very hard (or even impossible) to make generalized recommendations. There are a lot of moving parts that can heavily influence the hardware requirements.

For one, this depends on your network traffic and the protocol mix that you see on it. Some network traffic is easy to analyze; other network traffic requires complicated parsing for each packet (e.g. DNS). So if you have, for example, a DNS-heavy network, you might suddenly need significantly more hardware. If you have something like HTTP, average request size makes a difference. And, obviously, additional scripts that you might want to add have a performance impact. The amount of RAM that you require similarly has a couple of dependencies.

Then, different hardware has different performance characteristics, so you can't even just go by, say, CPU clock speed: older CPUs with the same clock speed can be significantly slower, enabled/disabled attack mitigations can impact this, etc. Similarly, the way in which you acquire your packets has some impact on speed, and not all ways of acquiring packets are suitable for all environments. The OS you are deploying on can also impact the speed.

On top of everything, with new hardware coming out, drivers being updated, etc., all of this changes all the time, so any recommendation would have to be updated regularly. Since we (as the developers of Zeek) mostly don't run huge deployments of Zeek ourselves, we are often unaware of the hardware requirements on modern hardware with "average" traffic, so it's hard for us to do that.

In the past, some people have posted performance numbers on our mailing list - however, I am not aware of that happening recently.

Actually @rsmmr - since we now have the testing subgroup - do you think it would be possible for them to anonymously post their hardware setups somewhere, to give people an indication of what is used in the real world? That we could point to from the documentation.

mkgvt · 2021-11-17T13:24:33Z

On Wed, Nov 17, 2021 at 5:34 AM Johanna wrote:

> Actually @rsmmr - since we now have the testing subgroup - do you think it would be possible for them to anonymously post their hardware setups somewhere, to give people an indication of what is used in the real world? That we could point to from the documentation.

Thanks for the thoughtful reply, Johanna.

A collection of rules of thumb and example cases would be really nice. Or deployment white papers describing various installations, the decision process used, and the outcome. A body of practical, rubber-meets-the-road use cases like that would help illuminate the decision process.

I struggled to specify hardware four years ago when we set up equipment at my institution, and I am struggling again now that I've been tasked with specifying equipment for deployment at sister institutions. Gross over-building is one obvious solution, but as always the budget is not infinite. Under-building is also bad. Seeing what worked (or not) for other deployments would be very useful.

I'd be glad to document what we did and why if there was a place to put it. Hopefully others will do the same so we can, over time, accrue the information that is currently lacking.

Mark

…

timwoj · 2022-02-22T19:59:29Z

I transferred this issue over to the zeek-docs repo, since it makes more sense over here.

0xxon added the Area: Documentation label Nov 29, 2021

timwoj transferred this issue from zeek/zeek Feb 22, 2022
