Multi-Node Firewall - LibreQoS and ECMP #458

Open
reinaldosaraiva opened this issue Feb 5, 2024 · 9 comments

@reinaldosaraiva

Greetings LibreQoS team!

In our lab we have a couple of servers, each with 4x 25 Gbps ports (dual-port NICs), with ECMP configured to load-balance traffic between the servers. I wonder if it's possible to use two interfaces for egress and the other two for ingress. I'm also scratching my head over how to handle policy setup in this scenario, where each server is independent and actively routing traffic.

Please advise.

@thebracket
Collaborator

LibreQoS (and most other shaping solutions) can't share state between two servers - so to use two servers, you need routing rules (note: LibreQoS is not a router, it's just a transparent bridge) to ensure that any given client always prefers the same route (and fails over to the other one if it is down). The reason is that traffic is extremely dynamic: by the time one server finished telling the other "Joe is using 100 Mbps right now", that might no longer be true. It literally changes microsecond to microsecond, and there's no practical way to sync data that fast.

The out-of-the-box setup doesn't do bonded interfaces. (You can't bind XDP to a bonded device, only to a physical device - that's a kernel limitation.)

What you can do is put a veth pair in the middle and have LibreQoS shape/bridge across it. We don't have a recipe for this (yet), but I know at least one user has pulled it off. Basically:

bond0 <-> bridge <-> LQOS on a veth pair <-> bridge <-> bond1

I'm not sure how that'll perform; it seems like it'd need some pretty hefty CPU.

@dtaht
Collaborator

dtaht commented Feb 5, 2024

I am not sure if we are talking past each other or not. Here are some scenarios where LibreQoS is useful in a DC environment. Say you want really low TCP latency with conventional congestion controls. LibreQoS adds about 100-200 µs on the path, but you can enable ECN and set LibreQoS's cake instances to an rtt target of about 5 ms to get a typical in-stream latency of under 500 µs (at least 10x better than what you will get through a typical switch) while still retaining full throughput without packet loss or retries. I have actually tried values as low as 2 ms - it is REALLY hard to measure below a millisecond, but I was pretty sure I was getting close to full throughput with no packet loss at about 200 µs of delay. This is for servers talking to each other within a datacenter.
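
For example, standalone (this is just a sketch of the idea, not LibreQoS's own setup - the interface name and rate are placeholders; cake honors ECN by default):

# hypothetical: shape at 25 Gbit with a 5 ms rtt target
tc qdisc replace dev eth0 root cake bandwidth 25gbit rtt 5ms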

I know of someone fooling with AI workloads in this way. Google has long configured ECN support into their fq_codel instances, for both RFC 3168 and L4S (DCTCP)-style ECN.

Another place where ECMP (how do you calculate the cost?) is indeed valuable through LibreQoS boxes is to load-balance flows when you are pushing that much data through, and also to be able to analyze the results. But unless you are trying to step individual flows down to another rate along the way (e.g. via ECN), just spraying packets through the switch saves that 100 µs.

It can also be configured as a hot spare, so if you lose one box it fails over to the other, but there would be a blip of, oh, 3 seconds before it starts to recover, since, per Herbert above, copying queue state over live is measured in msec, as is (best case) a BFD failover. We mostly just recommend routing around one box, but if you have two and want a hot spare, go for it.

@dtaht
Collaborator

dtaht commented Feb 5, 2024

@thebracket - we could get closer to a fast failover than we do by running off a mirrored port and sinkholing the output, then switching over immediately if the other port goes down... but we would still lose a significant number of packets during the switchover.

/me puts his feet up on his old tandem box...

@thebracket
Collaborator

> @thebracket - we could get closer to a fast failover than we do by running off a mirrored port and sinkholing the output, then switching over immediately if the other port goes down... but we would still lose a significant number of packets during the switchover.
>
> /me puts his feet up on his old tandem box...

That would require that we have some knowledge of the status of the other box?

@dtaht
Collaborator

dtaht commented Feb 5, 2024

Yes. You've got to lie about your MAC address too: https://en.wikipedia.org/wiki/Virtual_Router_Redundancy_Protocol
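
For a concrete picture of that approach (nothing LibreQoS ships - the interface, router ID, and address below are invented), a minimal keepalived VRRP instance that lets the standby box claim the shared virtual MAC/IP when the master disappears might look like:

# hypothetical keepalived.conf fragment; all values are placeholders
vrrp_instance SHAPER_PAIR {
    state MASTER            # the hot spare runs state BACKUP with a lower priority
    interface eth0
    virtual_router_id 51
    priority 150
    advert_int 1
    virtual_ipaddress {
        192.168.100.254/24
    }
}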

@reinaldosaraiva
Author

reinaldosaraiva commented Feb 5, 2024

Hi,
We are evaluating the integration of LibreQoS into our network infrastructure and would like to request additional information regarding the compatibility and implementation of your tool within our specific environment. Our network architecture is Layer 3-based with multiple edge firewalls operating in conjunction with spine and edge switches, utilizing ECMP for load balancing and redundancy.

Our current firewalls are equipped with BGP and MPT tables, and are performing QoS functions. Our intention is to integrate LibreQoS directly into these devices to manage quality of service efficiently, without the need for additional hardware.

Could you answer the following questions?

1. Can LibreQoS be integrated into an environment with multiple firewalls operating ECMP and BGP without compromising existing functionality or network performance?
2. Are there specific hardware requirements or limitations that we should consider before implementing LibreQoS on our firewalls?
3. Is there any specific guidance for configuring LibreQoS in an environment with multiple firewall connections to manage QoS effectively?
4. What would be the recommended procedure for testing LibreQoS in our environment prior to a full-scale implementation?

Hardware: 4 nodes - SR650 V3 (ThinkSystem), Type 7D76, 64 cores, 128 GB RAM
[Two images attached, including an L3 firewall topology diagram]

@dtaht
Collaborator

dtaht commented Feb 5, 2024

Thank you for sharing your topology. LibreQoS is presently optimized as a standalone eBPF bridge running on bare metal, where we easily achieve 25 Gbit/s for 10,000 subscribers in an ISP environment at 50% CPU across 16 Xeon cores. Many of our users also run under VMs. While we have produced non-eBPF versions internally, the performance hit prior to Linux 6.1 was pretty extreme on most platforms, as was leveraging veth. We get our performance by routing each subscriber to a core, which further emulates the underlying topology past the QoS point. Both of these techniques are somewhat incompatible with conventional firewalling on Linux today. How are you doing firewalling? We do not route, either - it's just a pass-through bridge - so in our preferred configuration there would be a separate box entirely behind each of your firewalls, and you would lose the ability to precisely control per-subscriber bandwidth.

To try to answer your questions more fully, perhaps a videoconference would help. We also take consulting dollars. There is plenty of demand for fully integrated solutions like this, but the eBPF dependency gets in the way.

To attempt to answer your questions:

> Can LibreQoS be integrated into an environment with multiple firewalls operating ECMP and BGP without compromising existing functionality or network performance?

We can push 25 Gbit easily on a separate $1500 box. Depending on your shaping needs, you might merely be able to apply cake standalone on the queues of your existing boxes without the need for anything else - not even shaped, just responding to BQL backpressure. Have you tried that? Cake can push about 10 Gbit/core/tx-ring in that case.
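
To illustrate that standalone option (a sketch, not a tuned config; eth0 is a placeholder):

# cake as the root qdisc with no shaper, just responding to BQL backpressure
tc qdisc replace dev eth0 root cake unlimited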

> Are there specific hardware requirements or limitations that we should consider before implementing LibreQoS on our firewalls?

Plenty!

> Is there any specific guidance for configuring LibreQoS in an environment with multiple firewall connections to manage QoS effectively?

Do you need per-subscriber shaping? If not, cake by itself is enough.

> What would be the recommended procedure for testing LibreQoS in our environment prior to a full-scale implementation?
> Hardware: 4 nodes - SR650 V3 (ThinkSystem), Type 7D76, 64 cores, 128 GB RAM

It only takes an hour to get an instance up and running. Stick one in behind all that stuff. The 64-core EPYC boxes we have been playing with can crack 60 Gbit/s of shaping.

@thebracket
Collaborator

Some questions:

  • Can you describe your basic setup? i.e. an ISP, Wireless ISP (WISP), data-center? Approx # of users and the largest plan they utilize?
  • What are you running on your firewalls? Hopefully Linux (with a recent kernel); we only directly support Ubuntu Server, but some very skilled Linux users have got it going on Debian also.
  • Are you running on bare-metal or VMs? Any container solutions to worry about?
  • What traffic levels are you looking at (both total and per box)?
  • How would you rate your Linux skills? As I mentioned before, LibreQoS is a pure bridge solution - so you're going to have quite a bit of work to integrate it into an existing setup.

If I'm reading the diagram correctly, your current setup passing through a box is roughly:

  1. Traffic arrives at bond0.254 (presumably steered by BGP)
  2. You apply NFTables firewall rules (possibly including NAT?)
  3. You route data out through n1-n4, with ECMP for load balancing

Now, LibreQoS is a pure, transparent bridge and doesn't work if you have IP addresses set on either of its interfaces. It also cannot bind to bond devices directly - the kernel doesn't support that.

So to make this work, your IP address, BGP session, and NFTables rules must not live on an interface that LibreQoS runs on - and LibreQoS must not run on the bond itself.

So if you're not using VMs, you wind up needing:

bond0.254 -> br_internal -> veth_internal (LibreQoS) -> veth_external -> br_external

You can create your veth pair and have LibreQoS mount that. Then bridge the "client facing/southbound" network between the bond and veth_internal, and put your IPs, BGP, etc. on br_external so there's an endpoint to do firewalling and routing.
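
An untested sketch of that plumbing, using the device names from the diagram above (LibreQoS itself would then be configured to bind the veth devices):

# create the veth pair that LibreQoS will sit on
ip link add veth_internal type veth peer name veth_external

# southbound bridge: joins the bond VLAN to one end of the veth pair
ip link add br_internal type bridge
ip link set bond0.254 master br_internal
ip link set veth_internal master br_internal

# northbound bridge: carries the routing/firewalling endpoint
ip link add br_external type bridge
ip link set veth_external master br_external

ip link set veth_internal up
ip link set veth_external up
ip link set br_internal up
ip link set br_external up

# IPs, BGP sessions and NFTables rules attach to br_external, never to the veths
ip addr add 192.0.2.1/24 dev br_external   # placeholder address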

Then you'll want some BGP weights to encourage the same client to go through the same firewall each time (which you need anyway, or your firewall state tables will never be right).
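
For illustration only (the ASNs, prefix list, and neighbor address are invented), one way to express that bias in FRR is a route-map that raises local-preference for a given customer block toward one firewall, leaving the other path as automatic fallback:

! hypothetical FRR fragment - prefer firewall 1 for customer block CUST-A
ip prefix-list CUST-A seq 5 permit 172.31.252.0/23
!
route-map VIA-FW1 permit 10
 match ip address prefix-list CUST-A
 set local-preference 200
route-map VIA-FW1 permit 20
 set local-preference 100
!
router bgp 65001
 neighbor 192.168.100.9 remote-as 65000
 address-family ipv4 unicast
  neighbor 192.168.100.9 route-map VIA-FW1 in
 exit-address-family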

@reinaldosaraiva
Author

Some questions:

  • Can you describe your basic setup? i.e. an ISP, Wireless ISP (WISP), data-center? Approx # of users and the largest plan they utilize?

A fairly simple data-center with some heavyweight customers hungry for bandwidth at an all-you-can-eat buffet.

  • What are you running on your firewalls? Hopefully Linux (with a recent kernel); we only directly support Ubuntu Server, but some very skilled Linux users have got it going on Debian also.

Ubuntu Jammy on a Lenovo SR650 V3 chassis with 2x Intel(R) Xeon(R) Gold 6448Y, 1 TB RAM, 2x dual-port 25 Gbps ConnectX-6 LX, a couple of SSDs for boot, and a few NVMes for local storage.

  • Are you running on bare-metal or VMs? Any container solutions to worry about?

Currently on bare-metal, still evaluating VMs with SR-IOV. No containers.

  • What traffic levels are you looking at (both total and per box)?

Currently 40 Gbps across a couple of small datacenters, with plans to run multiple 100 Gbps links on the horizon.

  • How would you rate your Linux skills? As I mentioned before, LibreQoS is a pure bridge solution - so you're going to have quite a bit of work to integrate it into an existing setup.


My team and I have a good few years of experience with Linux, so it's fairly safe to say you can talk to us freely. We'll try to keep up.

If I'm reading the diagram correctly, your current setup passing through a box is roughly:

  • Traffic arrives at bond0.254 (presumably steered by BGP)
  • You apply NFTables firewall rules (possibly including NAT?)
  • You route data out through n1-n4, with ECMP for load balancing

In our ECMP lab setup we have the following:

172.31.255.0/24 nhid 59 proto bgp metric 20
nexthop via 192.168.100.9 dev ens5f1np1 weight 1
nexthop via 192.168.100.13 dev ens5f0np0 weight 1

172.31.251.0/24 nhid 59 proto bgp metric 20
nexthop via 192.168.100.9 dev ens5f1np1 weight 1
nexthop via 192.168.100.13 dev ens5f0np0 weight 1

172.31.252.0/23 nhid 56 proto bgp metric 20
nexthop via 192.168.100.1 dev ens3f1np1.850 weight 1
nexthop via 192.168.100.5 dev ens3f0np0.851 weight 1

172.31.254.0/24 nhid 62 proto bgp metric 20
nexthop via 192.168.100.33 dev ens3f1np1.854 weight 1
nexthop via 192.168.100.37 dev ens3f0np0.855 weight 1

Traffic arrives from customer (1) via ens3f1np1.850 or ens3f0np0.851, or from customer (2) via ens3f1np1.854 or ens3f0np0.855, and then goes outside (3) via ens5f1np1 or ens5f0np0. Since we have a pair of devices configured exactly the same way (same ASN, different neighbors) with ECMP, traffic can arrive at one box and return through the other - the traffic is asymmetric.

We're using NFTables with a very simple stateless configuration. We're not using conntrack or doing NAT.
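
For a sense of the shape of it (simplified, with hypothetical values rather than our real rules):

# stateless nftables sketch: no conntrack, no NAT, plain forward filtering
table inet filter {
    chain forward {
        type filter hook forward priority 0; policy accept;
        tcp dport { 23, 2323 } drop comment "example stateless drop"
    }
}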
