Troubleshooting AWS Network Connectivity: Security Groups and NACLs
AWS Training Architect II in Content
Troubleshooting basic network connectivity issues is an important skill. This troubleshooting scenario is an opportunity to assess skills in this area.
Our scenario is this: a junior administrator has deployed a VPC and instances but there are a few things wrong.
Instance3 is not able to connect to the internet and, and the junior admin can't determine why.
Being a senior administrator, it's your responsibility to troubleshoot the issue and ensure that the instance has connectivity to the internet, so that you can ping and log into the instance with SSH.
Troubleshooting AWS Network Connectivity: Scenario 1
Welcome aboard! There's no time to waste in this lab. We've got an NACL instance (
Instance3) that won't connect to the internet, and we need it to. But we don't even know yet what's wrong with it, so a bit of troubleshooting is in order before we can get to fixing the problem.
We need to make sure we are in the
us-east-1 region, and we've got to use the login credentials provided on the hands-on lab page.
We're going to start by looking at
Instance3. Think of it as the center of an onion. Once we're done there, we'll work our way out through the other layers of AWS services and products until we've figured out what's broken.
Troubleshooting the instance itself
Let's get into EC2 from the AWS dashboard, and look at the details of
Instance3. Once we're in there looking, we'll notice that it's only got a private IP address. It needs a public IP address, so we've got to assign one.
Up in the Actions menu, go to Networking > Manage IP Addresses. In this pop-up window, we can click on Assign new IP, but that's going to give it another private IP address. So instead, we've got to click Assign an Elastic IP, so that the address we get is public. In the next window, click the blue Allocate button, then Close. We're not done though. Yes, we have a public IP allocated, but it's not assigned to
Back up in the Actions menu, head into Associate Address. Pick
Instance3 from the Instance dropdown, click the blue Associate button, then click the incoming blue Close button.
Now if we look at the EC2 instance details, we'll see it's got a public IP. But if we ping that public IP from a terminal, none of our packets are going through. Something is still discombobulated. Since we know the instance itself is fine though, let's leave
ping command running in this terminal, then step out a little and look at the next onion layer.
Instance3 details, we can see that it's associated with a security group that has
EC2SecurityGroup3 in its name. Let's navigate to Security Groups in the left-hand menu and find the one named something like
EC2SecurityGroup3. Highlight it so that the details show up on the bottom of the screen. In the Inbound tab, we can see that SSH and ICMP (ping) traffic is allowed. In the Outbound tab, all traffic is allowed. This means that our Security Group rule is fine, and isn't what broke our
Instance3 connection to the internet.
Let's move out to the next onion layer. Be careful though. We don't want to create those tear-inducing vapors we get with real onions...
Moving along, let's take a peek at how we've got this instance's subnet configured. Back on the screen showing our instances, we can see that
Instance3 has a private IP. We actually saw this before, remember?
In a new browser tab, let's head over to the VPC Dashboard, and get into our subnets. Look for the one that
Instance3 is in. For instance, if the private IP of
Instance3 is 10.3.1.49, then we're going to click on the subnet with a CIDR of 10.3.1.0/24. Let's say that this is called
PublicSubnet4, just so we have a name we can refer back to.
Once we click on that subnet, its details show up down in the lower part of the screen.
In that section of the subnet page (in the Description tab) we can see the NACL this is assigned to. Let's click on that and have a gander at the rules.
The Inbound tab, once we get in will show... Whoah! That's a problem. All inbound traffic is set to DENY. We see the same thing in the Outbound tab. This means that the NACL is blocking all traffic in and out of the instance, despite what the security group says.
We've got a few options here. We can edit this NACL to allow the traffic, make a new NACL, or kick this subnet over so that it uses a different NACL (one with the rules we want already set up) altogether. In a production environment, we'd probably want to go with the first or second option. Here in this lab though, just to get things up and running quicker, we're going with Plan C, swapping the subnet over to an existing NACL.
If we look through the others, we'll find one that allows SSH, All ICMP (ping), and ephemeral ports. The outbound rules are pretty much the same (except that they block SSH), and this is close enough for what we want. Remember the name (like,
Public-3-NACL) and head back into where we can edit the
PublicSubnet4 subnet's properties.
In the Network ACL tab (in the lower half of the screen), click on the Edit network ACL association button. There's a *Change to dropdown here. Find and select
Public-3-NACL in the list, click the blue Save button, then click the blue Close** button.
This should be working now. Let's peek at the ping command we left running in that terminal earlier.
This is terrible. We're not done yet. Our pings are still timing out.
Well, we've still got a couple other layers of the onion we can look at...
We should still have the subnet window open for
PublicSubnet4. Let's get into the Route Table tab and see what's shaking there.
Aha! Another problem has reared its ugly head. We've got a route table that doesn't allow any traffic to the internet. We've got to either add a route to this table, or add a different route table that has the appropriate paths specified in it.
In the real world, editing this route table may have some far-reaching repercussions. What if there's another instance using it that we don't want having internet access? Let's not edit this one. We'll go looking for for one that works already.
Click on Edit route table association, then find one with access to the public. Something like
Public3-RT. We'll see, once we select it, whether or not it's routing to the public and our private subnet. Click on the blue Save button once we've got the right route table selected, then the blue Close button.
Take a quick peek back at the pinging terminal. What do we see? Responses! Kill the
ping command, and try SSH with the lab's username and password. Success!
We did it! This EC2 instance couldn't find the internet, even if it had GPS and a copilot, until we figured out what ailed it and fixed the various connectivity obstacles. This is the kind of thing AWS admins run into a lot out in the wild, and learning how to troubleshoot problems is a big step along the way to a certification. Congratulations!