Troubleshooting AWS Network Connectivity: Security Groups and NACLs

Hands-On Lab

 

Tia Williams

AWS Training Architect II in Content

Length

01:00:00

Difficulty

Intermediate

Troubleshooting basic network connectivity issues is an important skill, and this troubleshooting scenario is an opportunity to assess skills in this area. Our scenario is this: a junior administrator has deployed a VPC and instances, but a few things are wrong. Instance3 is not able to connect to the internet, and the junior admin can't determine why. Being a senior administrator, it's your responsibility to troubleshoot the issue and ensure the instance has connectivity to the internet, so that you can ping it and log into it with SSH.

What are Hands-On Labs?

Hands-On Labs are scenario-based learning environments where learners can practice without consequences. Don't compromise a system or waste money on expensive downloads. Practice real-world skills without the real-world risk, no assembly required.

Troubleshooting AWS Network Connectivity: Scenario 1

Welcome aboard! There's no time to waste in this lab. We've got an EC2 instance (Instance3) that won't connect to the internet, and we need it to. But we don't even know yet what's wrong with it, so a bit of troubleshooting is in order before we can get to fixing the problem.

Prerequisites

We need to make sure we are in the us-east-1 region, and we've got to use the login credentials provided on the hands-on lab page.

We're going to start by looking at Instance3. Think of it as the center of an onion. Once we're done there, we'll work our way out through the other layers of AWS services and products until we've figured out what's broken.

Troubleshooting the instance itself

Let's get into EC2 from the AWS dashboard, and look at the details of Instance3. Once we're in there looking, we'll notice that it's only got a private IP address. It needs a public IP address, so we've got to assign one.

Up in the Actions menu, go to Networking > Manage IP Addresses. In this pop-up window, we can click on Assign new IP, but that's going to give it another private IP address. So instead, we've got to click Assign an Elastic IP, so that the address we get is public. In the next window, click the blue Allocate button, then Close. We're not done though. Yes, we have a public IP allocated, but it's not assigned to Instance3 yet.

Back up in the Actions menu, head into Associate Address. Pick Instance3 from the Instance dropdown, click the blue Associate button, then click the blue Close button.
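If we'd rather do this step from a script, here's a minimal boto3 sketch of the same allocate-and-associate flow. The instance ID is a placeholder, so grab Instance3's real ID from the console first:

```python
# Minimal sketch: allocate an Elastic IP and associate it with Instance3.
# The instance ID below is hypothetical -- substitute the real one.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
INSTANCE_ID = "i-0123456789abcdef0"  # placeholder for Instance3

# Allocate a new Elastic IP in the VPC scope
allocation = ec2.allocate_address(Domain="vpc")
print("Allocated public IP:", allocation["PublicIp"])

# Associate the new address with the instance
association = ec2.associate_address(
    AllocationId=allocation["AllocationId"],
    InstanceId=INSTANCE_ID,
)
print("Association ID:", association["AssociationId"])

# Confirm the instance now shows a public IP
reservations = ec2.describe_instances(InstanceIds=[INSTANCE_ID])["Reservations"]
instance = reservations[0]["Instances"][0]
print("Instance public IP:", instance.get("PublicIpAddress"))
```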

Now, if we look at the EC2 instance details, we'll see it's got a public IP. But if we ping that public IP from a terminal, none of our packets are going through. Something is still discombobulated. Since we know the instance itself is fine, though, let's leave the ping command running in this terminal, then step out a little and look at the next onion layer.

Security Groups

In the Instance3 details, we can see that it's associated with a security group that has EC2SecurityGroup3 in its name. Let's navigate to Security Groups in the left-hand menu and find the one named something like EC2SecurityGroup3. Highlight it so that the details show up at the bottom of the screen. In the Inbound tab, we can see that SSH and ICMP (ping) traffic is allowed. In the Outbound tab, all traffic is allowed. This means that our security group rules are fine and aren't what broke Instance3's connection to the internet.
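For the scripting-inclined, the same check with boto3 looks something like this (the group ID is a placeholder for EC2SecurityGroup3's real ID):

```python
# Sketch: list a security group's inbound and outbound rules.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
GROUP_ID = "sg-0123456789abcdef0"  # placeholder for EC2SecurityGroup3

group = ec2.describe_security_groups(GroupIds=[GROUP_ID])["SecurityGroups"][0]

print("Inbound rules:")
for rule in group["IpPermissions"]:
    # "-1" means all protocols; FromPort/ToPort are absent for all-traffic rules
    print(" ", rule.get("IpProtocol"), rule.get("FromPort"), rule.get("ToPort"))

print("Outbound rules:")
for rule in group["IpPermissionsEgress"]:
    print(" ", rule.get("IpProtocol"), rule.get("FromPort"), rule.get("ToPort"))
```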

Let's move out to the next onion layer. Be careful though. We don't want to create those tear-inducing vapors we get with real onions...

Subnet Configuration

Moving along, let's take a peek at how we've got this instance's subnet configured. Back on the screen showing our instances, we can see that Instance3 has a private IP. We actually saw this before, remember?

In a new browser tab, let's head over to the VPC Dashboard, and get into our subnets. Look for the one that Instance3 is in. For instance, if the private IP of Instance3 is 10.3.1.49, then we're going to click on the subnet with a CIDR of 10.3.1.0/24. Let's say that this is called PublicSubnet4, just so we have a name we can refer back to.

Once we click on that subnet, its details show up down in the lower part of the screen.
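If we'd rather confirm which subnet that is from code, a quick boto3 lookup by CIDR block does the same thing (a sketch, using the example CIDR from above):

```python
# Sketch: find the subnet whose CIDR covers Instance3's private IP.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

subnets = ec2.describe_subnets(
    Filters=[{"Name": "cidr-block", "Values": ["10.3.1.0/24"]}]
)["Subnets"]

for subnet in subnets:
    print(subnet["SubnetId"], subnet["CidrBlock"], subnet["AvailabilityZone"])
```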

NACL

In that section of the subnet page (in the Description tab), we can see the NACL this subnet is assigned to. Let's click on that and have a gander at the rules.

The Inbound tab, once we get in, will show... Whoa! That's a problem. All inbound traffic is set to DENY. We see the same thing in the Outbound tab. This means that the NACL is blocking all traffic in and out of the instance, regardless of what the security group allows.
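Here's a hedged boto3 sketch of the same inspection, pulling the NACL associated with the subnet and dumping its rules (the subnet ID is a placeholder for PublicSubnet4):

```python
# Sketch: print the rules of the NACL attached to a given subnet.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
SUBNET_ID = "subnet-0123456789abcdef0"  # placeholder for PublicSubnet4

acl = ec2.describe_network_acls(
    Filters=[{"Name": "association.subnet-id", "Values": [SUBNET_ID]}]
)["NetworkAcls"][0]

for entry in acl["Entries"]:
    direction = "egress" if entry["Egress"] else "ingress"
    print(direction, entry["RuleNumber"], entry["Protocol"],
          entry["RuleAction"], entry.get("CidrBlock"))
```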

We've got a few options here. We can edit this NACL to allow the traffic, make a new NACL, or kick this subnet over so that it uses a different NACL (one with the rules we want already set up) altogether. In a production environment, we'd probably want to go with the first or second option. Here in this lab though, just to get things up and running quicker, we're going with Plan C, swapping the subnet over to an existing NACL.

If we look through the others, we'll find one that allows SSH, All ICMP (ping), and ephemeral ports. The outbound rules are pretty much the same (except that they block SSH), and this is close enough for what we want. Remember the name (like, Public-3-NACL) and head back into where we can edit the PublicSubnet4 subnet's properties.

In the Network ACL tab (in the lower half of the screen), click on the Edit network ACL association button. There's a Change to dropdown here. Find and select Public-3-NACL in the list, click the blue Save button, then click the blue Close button.
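The same swap can be scripted too. This is just a sketch with placeholder IDs for PublicSubnet4 and Public-3-NACL; the key API call is replace_network_acl_association:

```python
# Sketch: re-point a subnet's NACL association at a more permissive NACL.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
SUBNET_ID = "subnet-0123456789abcdef0"    # placeholder for PublicSubnet4
TARGET_NACL_ID = "acl-0123456789abcdef0"  # placeholder for Public-3-NACL

# The association ID lives on the NACL currently attached to the subnet
current = ec2.describe_network_acls(
    Filters=[{"Name": "association.subnet-id", "Values": [SUBNET_ID]}]
)["NetworkAcls"][0]
assoc_id = next(
    a["NetworkAclAssociationId"]
    for a in current["Associations"]
    if a["SubnetId"] == SUBNET_ID
)

# Swap the subnet over to the permissive NACL
result = ec2.replace_network_acl_association(
    AssociationId=assoc_id, NetworkAclId=TARGET_NACL_ID
)
print("New association:", result["NewAssociationId"])
```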

This should be working now. Let's peek at the ping command we left running in that terminal earlier.

This is terrible. We're not done yet. Our pings are still timing out.

Well, we've still got a couple other layers of the onion we can look at...

Route Table

We should still have the subnet window open for PublicSubnet4. Let's get into the Route Table tab and see what's shaking there.

Aha! Another problem has reared its ugly head. We've got a route table that doesn't allow any traffic to the internet. We've got to either add a route to this table, or associate the subnet with a different route table that already has the appropriate routes.
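Just to make the diagnosis concrete: a working public route table needs a 0.0.0.0/0 route pointing at an internet gateway, alongside the local VPC route. Here's a hedged boto3 peek at what the subnet currently has (placeholder subnet ID again):

```python
# Sketch: show the routes in the route table associated with the subnet.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
SUBNET_ID = "subnet-0123456789abcdef0"  # placeholder for PublicSubnet4

tables = ec2.describe_route_tables(
    Filters=[{"Name": "association.subnet-id", "Values": [SUBNET_ID]}]
)["RouteTables"]

# Note: if nothing comes back, the subnet has no explicit association and is
# implicitly using the VPC's main route table.
for table in tables:
    print(table["RouteTableId"])
    for route in table["Routes"]:
        print("  ", route.get("DestinationCidrBlock"), "->",
              route.get("GatewayId") or route.get("NatGatewayId"))
```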

In the real world, editing this route table may have some far-reaching repercussions. What if there's another instance using it that we don't want to have internet access? Let's not edit this one. We'll go looking for one that works already.

Click on Edit route table association, then find one with a route to the internet, something like Public3-RT. Once we select it, we'll see whether it routes both out to the internet and locally within our VPC. Click the blue Save button once we've got the right route table selected, then the blue Close button.
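Scripted, that swap looks roughly like this (placeholder IDs again; which API call we need depends on whether the subnet already has an explicit route table association):

```python
# Sketch: associate the subnet with the public route table.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
SUBNET_ID = "subnet-0123456789abcdef0"  # placeholder for PublicSubnet4
PUBLIC_RT_ID = "rtb-0123456789abcdef0"  # placeholder for Public3-RT

tables = ec2.describe_route_tables(
    Filters=[{"Name": "association.subnet-id", "Values": [SUBNET_ID]}]
)["RouteTables"]

if tables:
    # Explicit association exists: replace it with the public route table
    assoc_id = next(
        a["RouteTableAssociationId"]
        for a in tables[0]["Associations"]
        if a.get("SubnetId") == SUBNET_ID
    )
    ec2.replace_route_table_association(
        AssociationId=assoc_id, RouteTableId=PUBLIC_RT_ID
    )
else:
    # Subnet was only using the main table implicitly: associate it explicitly
    ec2.associate_route_table(RouteTableId=PUBLIC_RT_ID, SubnetId=SUBNET_ID)
```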

Take a quick peek back at the pinging terminal. What do we see? Responses! Kill the ping command, and try SSH with the lab's username and password. Success!
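If we want one more sanity check before opening an SSH session, a quick TCP probe of port 22 from Python works too (the IP is a placeholder for the Elastic IP we allocated earlier):

```python
# Sketch: confirm TCP/22 is reachable on the instance's new public IP.
import socket

PUBLIC_IP = "203.0.113.10"  # placeholder: use the Elastic IP from earlier

with socket.create_connection((PUBLIC_IP, 22), timeout=5) as sock:
    # SSH servers send a banner (e.g. "SSH-2.0-...") as soon as we connect
    print("Port 22 is reachable:", sock.recv(64).decode(errors="replace").strip())
```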

Conclusion

We did it! This EC2 instance couldn't find the internet, even if it had GPS and a copilot, until we figured out what ailed it and fixed the various connectivity obstacles. This is the kind of thing AWS admins run into a lot out in the wild, and learning how to troubleshoot problems is a big step along the way to a certification. Congratulations!