Does this still work? #35

nelg · 2021-02-16T22:53:04Z

Hi,

I've had issues with this not working, although it used to work.

It seems that when it deletes the default route:

# switch the default route to eth1
ip route del default dev eth0

The NAT instance then loses all internet connectivity.

Does this still work for you?

nelg · 2021-02-16T23:48:16Z

To get mine working, I ended up making the changes in nelg@e4a0b33

DesAWSume · 2021-05-01T22:57:49Z

Hi nelg

If I just want to set up an Amazon Linux 2 NAT instance, without using IaC to provision all the other infrastructure, which commands should I run to get it working?

Thanks in advance.

szromek · 2021-05-05T19:47:27Z

@nelg I am experiencing the same issue as you did and your fix seems to solve the problem. Could you provide @int128 with a PR that could be tested, merged and published to Terraform registry, so the whole module would be operational again?

nelg · 2021-05-29T06:46:05Z

@nelg I am experiencing the same issue as you did and your fix seems to solve the problem. Could you provide @int128 with a PR that could be tested, merged and published to Terraform registry, so the whole module would be operational again?

Sure, will do

nelg · 2021-05-29T06:53:09Z

Here is the PR #37

arjitj2 · 2022-01-22T16:15:17Z

This issue and your fix solved 5+ hours of debugging work for me. Thank you and I hope it gets merged soon.

int128 · 2022-01-23T04:41:14Z

It seems the NAT connection is lost after the NAT instance is rebooted.

The command ip route del default dev eth0 is needed to move the default route to eth1 and so pin the source IP, because the public IP of eth0 changes when the instance is recreated by the Auto Scaling Group.

I noticed the route table is broken after reboot as follows:

## When an instance is created

[ssm-user@ip-172-18-138-43 bin]$ ip ro
default via 172.18.128.1 dev eth1 metric 10001
169.254.169.254 dev eth0
172.18.128.0/20 dev eth0 proto kernel scope link src 172.18.138.43
172.18.128.0/20 dev eth1 proto kernel scope link src 172.18.132.145

[ssm-user@ip-172-18-138-43 bin]$ sudo reboot

## After reboot

[ssm-user@ip-172-18-138-43 bin]$ ip ro
default via 172.18.128.1 dev eth0
default via 172.18.128.1 dev eth1 metric 10001
169.254.169.254 dev eth0
172.18.128.0/20 dev eth0 proto kernel scope link src 172.18.138.43
172.18.128.0/20 dev eth1 proto kernel scope link src 172.18.132.145

Finally, I was able to fix this problem by removing the eth0 config:

sudo rm /etc/sysconfig/network-scripts/ifcfg-eth0

I will add it to the script.
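
The two tables above differ in which default route wins: a route printed without a metric has metric 0, and the lowest metric is preferred, so after the reboot traffic leaves via eth0 again. A minimal sketch of that selection rule follows. The helper name preferred_default_dev is hypothetical and not part of the module; it only parses text in the ip route format from stdin.

```shell
#!/bin/sh
# Hypothetical helper (not part of the module): print the device of the
# preferred default route, given "ip route" output on stdin.
preferred_default_dev() {
  awk '
    $1 == "default" {
      dev = ""; metric = 0   # a route printed without "metric" has metric 0
      for (i = 2; i < NF; i++) {
        if ($i == "dev")    dev = $(i + 1)
        if ($i == "metric") metric = $(i + 1) + 0
      }
      # Keep the default route with the lowest metric seen so far.
      if (dev != "" && (best == "" || metric < best_metric)) {
        best = dev; best_metric = metric
      }
    }
    END { if (best != "") print best }
  '
}

# On a live instance this could be used as (not run here):
#   ip route | preferred_default_dev
```

Fed the "after reboot" table it should report eth0, and fed the "when an instance is created" table it should report eth1, which matches the loss of the pinned source IP described above.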

int128 · 2022-01-29T00:00:36Z

I think #42 resolved the issue. Please let me know if the issue still occurs.

nelg · 2022-04-06T23:59:29Z

I have tested the 2.0.1 release from the Terraform registry, and it doesn't work. The instance still has eth0 as the default route, so it can't send traffic to the internet.

Which version should I test?

nelg · 2022-04-12T23:28:39Z

I'm quite keen to get a working version of this published on the registry. Rather than me publishing a copy of yours, can we work together to get it working, if you have time sometime in the next couple of weeks?

My solution is working for us, but it's not perfect: it ends up with two default routes and two interfaces in the same subnet.
Of the two ENIs attached, one has a public IP and a private IP, and the other has just a private IP. We have to route out through the one that has a public IP to get to the internet.

JulianCBC · 2022-07-22T14:53:29Z

Yeah, this latest fix is bogus.

I built this module from the example in README.md and this is my NAT instance's networking details after a reboot:

sh-4.2$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 06:e8:a4:c9:de:f6 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 06:33:c7:03:41:92 brd ff:ff:ff:ff:ff:ff
    inet 10.0.128.88/24 brd 10.0.128.255 scope global dynamic eth1
       valid_lft 3401sec preferred_lft 3401sec
    inet6 fe80::433:c7ff:fe03:4192/64 scope link
       valid_lft forever preferred_lft forever
sh-4.2$ ip route
default via 10.0.128.1 dev eth1 metric 10001
10.0.128.0/24 dev eth1 proto kernel scope link src 10.0.128.88
sh-4.2$ sudo iptables -t nat -L
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
MASQUERADE  all  --  anywhere             anywhere

Long story short, it appears this module is broken. I tried downgrading to 2.0.0, but after that I couldn't even connect to the EC2 instance via SSM to debug this.

nelg · 2022-07-22T21:18:22Z

When you tried using it, did you have an EIP assigned to the NAT instance? It has to be created externally to the module, then passed in. I had problems when I didn't assign an EIP.

JulianCBC · 2022-07-23T13:53:26Z

Yep, had to reorganise stuff so I could, but I did have an EIP on the NAT instance when I did my first round of testing with version 2.0.1.

My initial testing of this module failed to produce a working internet connection on the NAT instance or an instance on a private subnet, so it looks like something's misconfigured or missing. For the record, it's possible that the "something missing" is entirely my fault.

My understanding is that NAT gateways work like this: private host -> network interface -> NAT -> route table -> internet. So how we get to the internet shouldn't matter, which makes deleting the eth0 configuration script, and thereby leaving that interface unconfigured after a reboot, seem bogus. That said, all my previous hacking has used separate interfaces for the input and output sides of the NAT gateway, so it's quite possible it'll all work on one interface and leaving eth0 unconfigured is correct.

I suspect that I've made a mistake somewhere here, but I also know that the NAT gateway should have had internet access in my testing, and the fact that it doesn't is concerning. I'm going to try a couple of other options then maybe return to this depending on the outcome. fck-nat seems promising if I can figure out a simple way to Terraformise its setup.

(Another thing that stood out is that the ENI handling needs to be smarter: we should be able to detect whether it's already connected or somehow still in-use (e.g. after an instance is terminated) and respond appropriately.)

JulianCBC · 2022-07-25T02:02:35Z

I've been thinking about this over the past couple of days and worked out why deconfiguring eth0 and requiring an EIP felt so wrong to me, and what I did wrong to break my instance of this module.

Essentially, the bit I was missing is that we need a public IP address to send traffic through an internet gateway, and the floating ENI (eth1) doesn't get one by default. So we need to assign an EIP to eth1 to give it a public IP so it can connect out; otherwise we kill our internet connection when we deconfigure eth0.

This makes sense with the current use cases:

Port forwarding: as we need a static public IP, we need an EIP.
NAT with an EIP: as eth1 has a public IP, all our connectivity can be done on eth1, so we don't need eth0.

It wasn't working for me initially because, if the EIP isn't available before the EC2 instance starts, the instance doesn't get the routes it needs and is therefore cut off from the internet.

I'd really like this module to work without an EIP, so I'm going to hack together a patch to always use eth0 for output which should make this more reliable and drop the EIP requirement unless people are doing DNAT. (DNAT should still work even if we're using eth0 for our default route.)
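
One way the startup ordering problem above could be tolerated is to poll for the EIP before switching routes. This is only a sketch of that idea, not what the module does: wait_for is a hypothetical helper, and the check command it runs (for example a script that asks whether eth1 has a public IP yet) is assumed to exist and is not shown.

```shell
#!/bin/sh
# Hypothetical helper (not part of the module): run a check command up to
# $1 times, sleeping $2 seconds after each failed attempt. Returns 0 as soon
# as the check succeeds, or 1 if every attempt fails.
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  n=0
  while [ "$n" -lt "$attempts" ]; do
    "$@" && return 0
    n=$((n + 1))
    sleep "$delay"
  done
  return 1
}

# Possible use before touching the routes (eth1_has_public_ip is an assumed,
# user-supplied check; not run here):
#   wait_for 30 2 eth1_has_public_ip && ip route del default dev eth0
```

With this shape the eth0 default route is only removed once the check passes, so an instance without an EIP keeps its original connectivity instead of being cut off.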

JulianCBC · 2022-07-26T02:02:57Z

Ok:

Fix for this not working at all: Fix NAT not working 2022-07 #51
Changes to use eth0 for the upstream connection: Use eth0 for output #52 (Note that this is on top of the previous change)

@int128 these changes are probably overkill and I haven't tested DNAT, but they Work For Me so they should be mergeable.

int128 · 2022-07-27T00:49:04Z

This module uses eth1 with the EIP to pin the source IP address.
If eth0 is used, the source IP address may fluctuate.

I think your change breaks the fixed IP feature. What do you think?

nelg · 2022-07-27T01:31:57Z

This is what I have been using, which seems to be OK, or at least well enough that I haven't had problems.

module "nat" {
  source = "github.com/int128/terraform-aws-nat-instance?ref=5a3d3f41568d8af145e291067f1e6e9d71fb36fd"

  # Create the NAT instance only when the managed NAT gateway is disabled.
  enabled                     = !var.nat_gw
  name                        = "natgw"
  vpc_id                      = module.vpc.vpc_id
  public_subnet               = module.vpc.public_subnets[0]
  private_subnets_cidr_blocks = module.vpc.private_subnets_cidr_blocks
  private_route_table_ids     = var.nat_gw ? [] : module.vpc.private_route_table_ids
}

resource "aws_eip" "nat" {
  network_interface = module.nat.eni_id
  tags = {
    "Name" = "nat-instance-main"
  }
}

JulianCBC · 2022-07-27T01:45:36Z

This module uses eth1 with the EIP to pin the source IP address. If eth0 is used, the source IP address may fluctuate.

I think your change breaks the fixed IP feature. What do you think?

I guess it depends on your use case.

If you need all your NATed traffic to come from a constant IP, then yeah, this breaks that, but this should be a pretty niche use-case and NAT instances should be pretty long-running and therefore have a relatively constant IP address, just not one known in advance.

If DNAT port forwarding is enabled, it should still work as long as the services inside the private subnet aren't expecting to be able to tell remote services something like "hey, connect to whatever my IP is, but on port 1234", where port 1234 has previously been opened using DNAT. Again, this should be a pretty niche use-case and I think that most common services that do this, e.g. active FTP, already have special case handling in Linux.

I guess that in my opinion, a constant source IP address isn't required for well over 90% of use cases, so this will be fine and removes the need for an EIP, reducing costs and resource usage.

But yeah, we can't ignore those niche cases, so maybe this should be switchable then? No EIP required for the common use cases, and tell the module it'll have an EIP if you absolutely need certainty about the source IP.
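
A rough sketch of what "switchable" could look like on the instance side. Everything here is an assumption for illustration: the egress_iface helper and the idea of passing it a true/false flag are hypothetical, and the module's real scripts may be structured differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of the module): pick the interface used for
# outbound NAT traffic.
#   $1 = "true"    -> an EIP is attached to eth1 and a fixed source IP is wanted
#   anything else  -> no EIP, fall back to eth0 and its auto-assigned public IP
egress_iface() {
  if [ "${1:-}" = "true" ]; then
    echo eth1
  else
    echo eth0
  fi
}

# Possible use when setting up masquerading (USE_EIP is an assumed variable;
# not run here):
#   iptables -t nat -A POSTROUTING -o "$(egress_iface "$USE_EIP")" -j MASQUERADE
```

Defaulting to eth0 keeps the common case free of an EIP, while setting the flag keeps the pinned source IP for the whitelisting case.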

nelg · 2022-07-27T01:51:09Z

If you need all your NATed traffic to come from a constant IP, then yeah, this breaks that, but this should be a pretty niche use-case and NAT instances should be pretty long-running and therefore have a relatively constant IP address, just not one known in advance.

I think this case needs to be supported; it's not that uncommon to have a whitelisted external IP.

I guess that in my opinion, a constant source IP address isn't required for well over 90% of use cases, so this will be fine and removes the need for an EIP, reducing costs and resource usage.

As per the AWS docs:
An Elastic IP address doesn’t incur charges as long as all the following conditions are true:

The Elastic IP address is associated with an EC2 instance.
The instance associated with the Elastic IP address is running.
The instance has only one Elastic IP address attached to it.
The Elastic IP address is associated with an attached network interface. For more information, see Network interface basics.

So I don't think having the Elastic IP adds any cost, because the NAT instance exists all of the time.

JulianCBC · 2022-07-27T04:19:23Z

I think this case needs to be supported; it's not that uncommon to have a whitelisted external IP.

I agree that there are situations where it's needed, so I'll make it configurable.

As per the AWS docs: An Elastic IP address doesn’t incur charges as long as all the following conditions are true:

The Elastic IP address is associated with an EC2 instance.

The instance associated with the Elastic IP address is running.

The instance has only one Elastic IP address attached to it.

The Elastic IP address is associated with an attached network interface. For more information, see Network interface basics.

So I don't think having the Elastic IP adds any cost, because the NAT instance exists all of the time.

True, but you're limited to 5 of them without jumping through hoops with AWS support. I had to change how I was doing stuff in my VPC because I was using all 5 before I deployed this. So for people whose allocation is mostly used up by things they can't change, or who want more than 5 VPCs with NAT instances, it'd be nice not to require one.

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html#using-instance-addressing-limit

Udit-Sharma2020 · 2022-11-17T14:41:13Z

I am using this module, but the EIP is not attaching to the NAT instance and the snat service is failing.

When I analyzed the repo, I found the following:

This NAT module has a runonce.sh script and a snat.sh script.
When the launch template is created, the user data section contains a command to execute the runonce.sh script.

The runonce.sh script is responsible for attaching the ENI to the NAT instance and then starting the snat service,
which in turn calls the /opt/nat/snat.sh script that configures NAT.

But this is not working as expected:
runonce.sh is not getting executed.

DragonStuff mentioned this issue Sep 15, 2021

fix(nat-instance): merge in fix from branch to mainline TableCheck-Labs/terraform-aws-nat-instance#1

Merged

int128 mentioned this issue Jan 23, 2022

Prevent setting default route to eth0 after reboot #42

Merged

JulianCBC mentioned this issue Jul 28, 2022

Remove EIP requirement for HA mode AndrewGuenther/fck-nat#13

Merged
