
Create a Load Balanced VM Scale Set in Azure

Hands-On Lab


Welcome to this Azure learning activity, where we will be creating and configuring Load Balancing for a Virtual Machine Scale Set (VMSS). The goal of this lesson is to gain knowledge and experience with:

  • Dynamic and elastic compute using a VMSS
  • High availability using a Load Balancer

Good luck and enjoy the Learning Activity!


Create a Load-Balanced VM Scale Set in Azure


Welcome to the lab, where we'll get into Azure and create a load-balanced VM scale set. All of the login credentials are provided on the Linux Academy lab page.

Note: If we want to connect to the VMSS instances for test purposes, we can use azureuser as the username. For the password, use the one provided in the credentials section with 123 added to the end, so that it meets the minimum password length.

Create a VM Scale Set

Once we're logged into the Azure Portal, click on the green + sign to create a new resource. Search for VM Scale Set, and click on Virtual machine scale set from the list of search results. Now we can click Create.

On the next screen, use these settings:

  • Virtual machine scale set name: vmscaleset01
  • Operating system disk image: Ubuntu 16.04 LTS
  • Subscription: The default is already selected
  • Resource group: The default is already selected/existing
  • Location: South Central US
  • Username: azureuser
  • Password: Use the password provided in the credentials section, with 123 added to the end (see the note above)
  • Instance count: 0 (we will change this later)
  • Instance size: A0 (You may need to turn off all filters in order to see this size)
  • Use managed disks: Yes
  • Autoscale: Disabled (we will change this later)
  • Load balancing option: None (we will change this later)
  • Virtual network: VNET1
  • Subnet: subnet1
  • Public IP address per instance: On

We can click Create when we've got everything set, and then we'll have to wait a bit for it to spin up.

Configure the VM Extension

We want every server that autoscaling spins up to deal with a load to be identical to the rest of the servers. We can do that by running a script on each one as it comes alive.

On the All Resources page, click on vmscaleset01, the scale set we just created, and then navigate to Extensions in the left-hand menu. On this page, click Add, then choose Custom Script For Linux from the left-hand list. Let's click on Create down at the bottom of the screen.

We need to provide a script (containing the commands we want run). We can grab that file here:, and then browse for it by clicking the folder icon next to the Script files text box. Once we click OK, our scale set will start updating to include the new extension we set up. The script itself really just updates the OS and installs the Apache web server.

We've got no instances, since we set the count to zero when we configured the scale set initially. Let's set our instance count to 1, just to see what happens, before we go autoscaling things. Navigate to Scaling (in the left-hand menu under Settings), and drag the Instance count slider we find there over from 0 to 1. Click Save in the upper left of the screen, and in a few minutes we'll be able to see something sitting in the Instances screen. Remember though, it may take a little while for the instance to boot and run our shell script.

Once it's up and running, let's check to see if our extension worked. Click on our new instance from the Instances screen, grab its public IP address, and paste it into a web browser. We should see Ubuntu's default Apache server page.
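
If we'd rather check from a terminal than a browser, the same test can be sketched in a few lines of Python. This is only an illustration: the helper name serves_http is made up for this example, and it treats any 200 response as success rather than looking for the Apache page itself.

```python
import urllib.error
import urllib.request


def serves_http(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET to `url` answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

To use it, call serves_http with "http://" followed by the instance's public IP from the Instances screen. A result of False right after boot may just mean the script hasn't finished installing Apache yet.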

Configure Autoscale

Back in the Scaling window, we see our slider set to 1. Let's click Enable autoscaling to get things really moving. Here is how we want to configure these options:

  • Autoscale setting name: autoscalesetting01
  • Resource group: Should already be populated

In the Default section of this page, we're going to set Scale mode to Scale based on a metric, then click Add a rule to set up our scaling out procedure. Configure the web form on the right-hand side of the screen like this:

  • Scale rule
    • Metric source: Current resource (vmscaleset01)
  • Criteria
    • Time aggregation: Average
    • Metric name: Percentage CPU
    • Time grain statistic: Average
    • Operator: Greater than
    • Threshold: 70
    • Duration: 5
  • Action
    • Operation: Increase count by
    • Instance count: 1
    • Cool down (minutes): 5

At the bottom of this options section, we can click Add. We've got things set so that when our load increases, we can fire up more servers to deal with it. But now we've got to set things so that our server count decreases when we don't really need them any more.

So again, let's click Add a rule. We'll configure these settings almost exactly like the first batch, but we're going to set the Operator to Less than, the Threshold to 10, and the Operation to Decrease count by. Once we click Add again, we're almost done.

This setup could scale out all day, and end up costing lots of money, so let's stick a limit on the total number of instances allowed. Down under the Add a rule link, set the Instance limits: Minimum will be 1, Maximum will be 5, and the Default will be 1.
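
To see how the two rules and the instance limits interact, here is a rough sketch of the decision in Python. The names ScaleRule and next_instance_count are invented for this illustration, and the sketch deliberately ignores the 5-minute duration window and the cool-down period, so it is a simplified model rather than how Azure's autoscale engine is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleRule:
    operator: str     # "gt" for Greater than, "lt" for Less than
    threshold: float  # average Percentage CPU
    change: int       # +1 for Increase count by 1, -1 for Decrease count by 1

    def matches(self, avg_cpu: float) -> bool:
        if self.operator == "gt":
            return avg_cpu > self.threshold
        if self.operator == "lt":
            return avg_cpu < self.threshold
        raise ValueError(f"unknown operator: {self.operator}")


# Values taken from the two rules and the instance limits in this lab.
RULES = [ScaleRule("gt", 70, +1), ScaleRule("lt", 10, -1)]
MINIMUM, MAXIMUM = 1, 5


def next_instance_count(current: int, avg_cpu: float) -> int:
    """Apply the first matching rule, then clamp to the instance limits."""
    for rule in RULES:
        if rule.matches(avg_cpu):
            current += rule.change
            break
    return max(MINIMUM, min(MAXIMUM, current))


print(next_instance_count(1, 95.0))  # 2: scale out
print(next_instance_count(5, 95.0))  # 5: already at the maximum
print(next_instance_count(1, 5.0))   # 1: already at the minimum
```

The clamp at the end is the part that keeps a busy scale set from growing (and billing) without bound.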

Now we can click Save in the upper part of the screen.

Stress Testing

We want to see if we can make this scale set actually scale. Using Instant Terminal in Linux Academy, or our own terminal program, let's log into the existing instance with SSH, as azureuser, and stress it out a bit. Remember that when we're prompted for a password we're using the regular lab password, but with 123 tacked onto the end.

Once we're in, we can run stress --cpu 1. We can watch CPU usage spike in the stats of our scale set. It will take a few minutes though.

If we get into our Instances screen again, once the stress test has been running a while, we'll see that our rule for scaling out has been triggered, and there are more instances being spun up. And back in the stats, we'll see that the average CPU usage, which is the figure for the scale set as a whole, is going down.

If we edit our rules so that the maximum number of servers is 2, our scale set will start deleting instances, and average CPU usage will go up.
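
The arithmetic behind those two observations is just an average over the instances. Here is a small worked example in Python, assuming (purely for illustration) that the stressed instance sits near 100% CPU while freshly created instances sit near 0%:

```python
def average_cpu(per_instance_cpu: list[float]) -> float:
    """Scale-set-wide average of the per-instance CPU percentages."""
    return sum(per_instance_cpu) / len(per_instance_cpu)


# Assumed readings: one stressed instance, the rest idle.
print(average_cpu([100.0]))                 # 100.0
print(average_cpu([100.0, 0.0, 0.0, 0.0]))  # 25.0
print(average_cpu([100.0, 0.0]))            # 50.0
```

Adding idle instances pulls the average down, and deleting them pushes it back up, even though the stressed instance is just as busy the whole time.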

Configure Load Balancing

A scale set is great, but we can't expect users to know which server they should be aiming for when others are under a load. Now we've got to create a load balancer, so that users can just aim for one spot. The load balancer will automatically spread their requests across the instances that pass its health probe (in our case, the ones with a healthy Apache instance).

Create a Load Balancer

Click on the + in the main Azure dashboard menu, and look for "load balancer" in the search box. Press Enter, then click on the Load Balancer published by Microsoft at the top of the list. Then click the Create button that shows up at the bottom of the screen. We'll land at a web form to fill out, one that will create the load balancer. Let's set the options like this:

  • Name: publb01
  • Type: Public
  • SKU: Basic
  • Public IP: Use existing
    • publb-1-pip should show up in the dropdown below, automatically
  • The Subscription, Resource group, and Location are already set to what we need.

We can click Create.

That will take a little while to deploy, and when it's done we can get into it by clicking Go to resource in the "Deployment succeeded" dialog that pops up, or we can go to All Resources in the main menu and find our load balancer, publb01, in that screen.

Create a Backend Pool

Now that we're in publb01, click on Backend Pools in the left-hand menu, then click Add. In the configuration screen, give it a Name of backendpool01, leave IPv4 selected in the IP version section, and select Virtual machine scale set in the Associated to dropdown. Ours will show up in the list below that, so select it. Then let's click OK at the bottom of the screen to continue along.

Create a Health Probe

Once our backend pool has finished deploying, which may take a few minutes, we need to set up a health probe. We'll find it in the left-hand menu, right under Backend pools. Once we're in here, we can click Add and start configuring:

  • Name: healthprobe01
  • Protocol: HTTP (This will test if our instance is responding to HTTP requests)
  • Port: 80
  • Path: / (forward slash)
  • Interval: 5
  • Unhealthy threshold: 2
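
To get a feel for what Interval and Unhealthy threshold mean together, here is a small Python sketch. It assumes the interval is in seconds and that the threshold counts consecutive failed probes. The ProbeState class is invented for this example, and its rule that a single success restores health is a simplification, not a statement about Azure's exact recovery behavior.

```python
class ProbeState:
    """Tracks consecutive probe failures for one backend instance."""

    def __init__(self, unhealthy_threshold: int = 2) -> None:
        self.unhealthy_threshold = unhealthy_threshold
        self.consecutive_failures = 0

    def record(self, probe_succeeded: bool) -> None:
        # A success resets the streak; a failure extends it.
        if probe_succeeded:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1

    @property
    def healthy(self) -> bool:
        return self.consecutive_failures < self.unhealthy_threshold


INTERVAL_SECONDS = 5
state = ProbeState(unhealthy_threshold=2)
for result in (True, False, False):
    state.record(result)

print(state.healthy)                                 # False
print(INTERVAL_SECONDS * state.unhealthy_threshold)  # 10
```

With these settings, a dead web server would be taken out of rotation roughly 10 seconds after it stops answering.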


Once the health probe is done deploying, click on Load balancing rules in the left-hand menu, and then click Add. We'll fill this form out like so:

  • Name: loadbalance01
  • IP Version: IPv4
  • Frontend IP address: LoadBalancerFrontEnd
  • Protocol: TCP
  • Port: 80
  • Backend port: 80
  • Backend pool: backendpool01
  • Health probe: healthprobe01
  • Session persistence: None
  • Idle timeout: 4
  • Floating IP: Disabled

We can click OK, wait for it to deploy, then move on to another test.
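
With Session persistence set to None, Azure's documentation describes the distribution as a hash over the connection's five-tuple (source IP, source port, destination IP, destination port, protocol). The Python sketch below illustrates that idea only. The pick_backend function and the instance names are made up for this example, and SHA-256 is a stand-in, not the hash Azure actually uses.

```python
import hashlib


def pick_backend(flow: tuple, healthy_backends: list[str]) -> str:
    """Map a flow's five-tuple onto one of the healthy backends."""
    if not healthy_backends:
        raise ValueError("no healthy backends available")
    digest = hashlib.sha256(repr(flow).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(healthy_backends)
    return healthy_backends[index]


# (source IP, source port, destination IP, destination port, protocol)
# The addresses are documentation-only placeholders.
flow = ("198.51.100.7", 50123, "203.0.113.10", 80, "TCP")
backends = ["instance_0", "instance_1", "instance_2"]

# The same flow always lands on the same backend.
print(pick_backend(flow, backends) == pick_backend(flow, backends))  # True
```

Because the source port is part of the tuple, a new connection from the same client can land on a different instance, which is what we'd expect with no session persistence.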


In order to test, we need to know the public IP of our public load balancer. But the load balancer can't serve our site yet, because the running instance hasn't picked up the new configuration. We need to take another step first: forcing the scale set instance to use the configuration we made with the load balancer, backend pool, and health probe. Back in our All resources section, find our scale set, and get into Instances. Select our single running instance, then click on the Upgrade button near the top of the screen.

Now, back in All resources, we can click on our publb01-pip, and on the next screen we should see our public IP address. Let's copy it, paste it into a web browser, and then bask in the glory of the default Ubuntu Apache server page that loads.


We've done it! We created a scale set and put it behind a public load balancer so that when users come to our website in droves, new web servers will be created to help handle the load. They'll shut down when they're no longer needed, and all the while our load balancer will be directing user requests to healthy server instances. Congratulations!