tc nat can do the same job #9

Open
chenhaiq opened this issue Sep 25, 2019 · 5 comments

@chenhaiq

The DADDR iptables plugin, iptables -t mangle -A INPUT -m dscp --dscp 1 -j DADDR --set-daddr=192.168.0.2, can be replaced by tc, so no plugin is needed to use L3DSR:

tc qdisc add dev eth0 root handle 1: htb
tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: protocol ip prio 1 u32 match u32 0x00040000 0x00ff0000 at 0 action nat ingress 192.168.0.3 192.168.0.2

where the u32 match 0x00040000 0x00ff0000 at 0 selects a ToS byte of 0x04, which is DSCP 1.
192.168.0.3 is the real server IP, and 192.168.0.2 is the VIP.
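
The arithmetic behind that match can be sketched in shell. This is only an illustration, not part of the project: the function names are hypothetical, and the commands are printed rather than executed. It assumes the usual IPv4 layout, where the first 32-bit header word holds version/IHL (8 bits), the ToS byte (8 bits), and total length (16 bits), and the ToS byte is DSCP (upper 6 bits) followed by ECN (lower 2 bits).

```shell
#!/bin/sh
# Sketch only: hypothetical helper names, nothing here touches the network.

# ToS byte for a DSCP value: DSCP occupies the upper 6 bits of the byte.
dscp_to_tos() {
    printf '0x%02x\n' $(( ($1 & 0x3f) << 2 ))
}

# Value for "match u32 VALUE MASK at 0": the ToS byte sits in bits 16..23
# of the first 32-bit word of the IPv4 header.
dscp_to_u32_value() {
    printf '0x%08x\n' $(( (($1 & 0x3f) << 2) << 16 ))
}

# Print (do not run) the ingress filter from the post.
# Args: device, DSCP, real server IP, VIP, optional mask.
# The default mask 0x00ff0000 is the one used in the post and compares the
# whole ToS byte, so the ECN bits must be zero as well. A mask of 0x00fc0000
# would compare only the six DSCP bits.
u32_filter_cmd() {
    dev=$1 dscp=$2 real_ip=$3 vip=$4 mask=${5:-0x00ff0000}
    printf 'tc filter add dev %s parent ffff: protocol ip prio 1 u32 match u32 %s %s at 0 action nat ingress %s %s\n' \
        "$dev" "$(dscp_to_u32_value "$dscp")" "$mask" "$real_ip" "$vip"
}

# Reproduces the filter line above for DSCP 1.
u32_filter_cmd eth0 1 192.168.0.3 192.168.0.2
```

Since the iptables dscp match looks only at the DSCP bits, passing 0x00fc0000 as the fifth argument may be the closer equivalent when packets can arrive with ECN bits set; that has not been tested here.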

@qbarnes
Collaborator

qbarnes commented Sep 25, 2019

Thank you for the report!

Back when L3DSR was being implemented, we investigated the idea of using tc, but it was rejected. I no longer recall why, and in skimming my notes, I've been unable to find that reasoning. However, that was roughly ten years ago, back in the RHEL 4 days for us. That reasoning should be (re-)discovered, checked to see whether it still applies, and documented either way.

What test cases and use cases have you tried your approach with so far and with what kernels? Have you tried it in combination with other tc and iptables rules to see how it interacts?

@chenhaiq
Author

I tried on Ubuntu 18.04 with kernel 4.15. There are 3 combinations:

  1. iptables -t mangle -A INPUT -m dscp --dscp 1 -j DADDR --set-daddr=192.168.0.2 works;

  2. tc filter add dev eth0 parent ffff: protocol ip prio 1 u32 match u32 0x00040000 0x00ff0000 at 0 action nat ingress 192.168.0.3 192.168.0.2 works exactly the same as item 1;

  3. iptables -t nat -A PREROUTING -m dscp --dscp 1 -j DNAT --to-destination 192.168.0.2 does not work. I think this is why you wrote an iptables plugin.

Do you know why iptables NAT does not work in this case? I can see from the iptables log that the destination address was changed, but the application still responds from the real server IP address.

@qbarnes
Collaborator

qbarnes commented Sep 29, 2019

I feel like a software archeologist going back and digging into this old information!

The very first efforts to scope out the functionality required for implementing L3DSR were made by a different group that I wasn't part of. Before contacting me in May 2008, they had already concluded that the NAT approach was not the route to go because of concerns over it being too CPU and/or memory intensive for our Yahoo! production workloads, and it being a possible DoS (denial of service) vector. That's why they brought me in to do the iptables module work, because of my kernel experience.

As for why your item 3 doesn't work, I remember toying with the idea a decade ago, but NAT was a monstrosity with a lot of temperamental quirks when you tried to push it to do something the designers didn't intend. I didn't have that good a grasp of all it does to the networking stack to make it work, but I have vague recollections of it being "too smart" and helpful, holding on to and monitoring too much networking state information to be repurposed. Did you try testing your item 3 with just TCP traffic, or did you also try UDP or ICMP? If NAT works with the latter, then you know the problem is that NAT never sees the reverse TCP traffic, which went straight to the client, and assumes it got lost.

With item 2, have you done any latency, throughput, or load testing comparing 1 with 2 yet?

@chenhaiq
Author

I have tested item 3 with ICMP. iptables NAT does not work either.

PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data.
64 bytes from 192.168.0.3: icmp_seq=1 ttl=64 time=0.575 ms
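
The symptom in that output is that the reply comes from 192.168.0.3 although 192.168.0.2 was pinged. A rough way to automate the same check is sketched below. The helper names are hypothetical, and it assumes numeric ping output in the iputils format shown above (for example from ping -n).

```shell
#!/bin/sh
# Sketch only: hypothetical helpers that parse iputils-style ping output.

# Print the source address of every echo reply line read from stdin.
reply_sources() {
    sed -n 's/^[0-9]* bytes from \([0-9.]*\):.*/\1/p'
}

# Arg: the address that was pinged. Ping output is read from stdin.
# Returns 1 and reports the first reply whose source differs from the target.
# Returns 0 if every reply matches (or if there were no replies at all).
replies_match_target() {
    target=$1
    for src in $(reply_sources); do
        if [ "$src" != "$target" ]; then
            echo "reply from $src, expected $target"
            return 1
        fi
    done
    return 0
}

# Possible usage against the VIP from this thread:
#   ping -n -c 3 192.168.0.2 | replies_match_target 192.168.0.2
```

Fed the two output lines above, the check reports a reply from 192.168.0.3 where 192.168.0.2 was expected.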

I have not tested performance yet.
I actually learned the idea of L3DSR from the implementation in fd.io VPP. Using DSCP instead of an overlay tunnel is a very good idea.

@svootukuru21

Can you please explain how the second step works? A response would be appreciated.
