Whether you are researching load balancers for your own needs or for your employer’s, you will inevitably come across HAProxy. You may then be asking what it is and how it can benefit you or your company.
When researching load balancers, you will find that your options usually fall into one of two categories: hardware-based or software-based. In the hardware realm, you will find options such as F5’s BIG-IP, Citrix NetScaler and Kemp Technologies, which are dedicated appliances running proprietary software.
On the other side are software-based solutions, which let you use commodity hardware that fits your needs independently of the load balancing software being used. In this realm, you will find solutions such as NGINX and HAProxy, with the latter being the focus of this guide.
The goal of the team behind HAProxy is to provide a “free, very fast and reliable solution” for load balancing TCP and HTTP-based applications. HAProxy is considered by many to be the de facto standard for software-based load balancing and is used by sites such as GitHub, Reddit, Twitter and Tumblr, to name a few.
It has been designed to run on Linux, Solaris, FreeBSD, OpenBSD and AIX. While it runs on most x86-64 hardware with limited resources, it performs best on enterprise-grade hardware such as 10 Gbps or faster NICs and Xeon-class CPUs.
Below are a few of the key terms and concepts you should understand when working with HAProxy. Most of them apply to load balancers in general.
A frontend, within the context of HAProxy, dictates where and how incoming traffic is routed to the machines behind HAProxy. Frontends allow you to set up rules (ACLs) that “watch” for specific URL patterns in the traffic passing through the load balancer and intelligently route a user’s traffic as needed.
Furthermore, the frontend is where you configure the IP addresses and ports on which HAProxy listens for traffic, as well as HTTPS on those ports.
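A minimal sketch of such a frontend is shown here. The frontend name http-in is a hypothetical label chosen for illustration; the port and backend name follow the description in this guide.

```haproxy
# "http-in" is an assumed name; any label can be used.
frontend http-in
    # Listen for plain HTTP on port 80 on all addresses.
    bind *:80
    # Send every request to this backend unless a rule says otherwise.
    default_backend appX-backend
```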
This frontend example shows that we are only opening port 80 for incoming requests and sending all traffic to the default backend called appX-backend.
As the name suggests, a backend within the context of HAProxy is a group of resources that is home to your data or applications. These backend resources are where traffic will get routed by the rules you have configured in your frontends.
Backend resource groups can consist of a single server or multiple servers, but in the context of load balancers it is assumed that a minimum of two are being used. The more backend resources you add to a group, the lower the load on each individual resource and the more users you can serve at any given time.
Lastly, just as with the frontend, you will also configure the IP and ports that your backend resources are listening for requests on.
backend appX-backend
    balance roundrobin
    server appX_01 192.168.2.2:8080 check
    server appX_02 192.168.2.3:8080 check
In this example, the backend is called appX-backend and it contains two servers that are being accessed using roundrobin which is explained later in this guide.
ACL (Access Control List)
In the context of HAProxy, ACLs are the backbone of more in-depth and complex configurations that contain multiple frontends as well as multiple backends and need very precise routing.
With ACLs, you can inspect requests and take a multitude of different actions, such as rewriting and redirecting requests as needed. Because HAProxy can load balance at Layer 4 or Layer 7 of the OSI model, you can configure it to handle a number of different uses at the same time with multiple frontends and backends.
acl url_appX path_beg -i /appX/
use_backend appX-backend if url_appX
default_backend appZ-backend
This example is rather simple: it evaluates the path of the incoming request, which is the part of the URL that begins at the first / after the host name. The path_beg match succeeds when that path begins with /appX/, and the -i flag makes the comparison case-insensitive. In this scenario, a request for http://example.com/appX/ is sent to the backend called appX-backend, whereas all other requests, including http://example.com/appX without the trailing slash, default to appZ-backend.
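Because path_beg -i /appX/ only matches paths that start with /appX/, a request for /appX with no trailing slash falls through to the default backend. If both forms should reach appX-backend, one option is to declare the ACL twice, since HAProxy combines repeated declarations of the same ACL name with a logical OR. The following is a sketch under that assumption; the frontend name http-in is hypothetical.

```haproxy
# "http-in" is an assumed frontend name.
frontend http-in
    bind *:80
    # Exact match on /appX (case-insensitive) ...
    acl url_appX path -i /appX
    # ... or any path under /appX/. Same ACL name, so the two are ORed.
    acl url_appX path_beg -i /appX/
    use_backend appX-backend if url_appX
    default_backend appZ-backend
```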
HAProxy comes with a fairly large number of options when it comes to choosing the method in which you want requests to be served to your backend resources. Below are a few of the more common ones as well as a short description of how each will work.
With roundrobin, each server is used in turn, starting with the first one listed in a given backend until the end of the list is reached, at which point the next request goes back to the first server. HAProxy uses this algorithm by default if none is specified when building a backend.
With leastconn, each resource in a given backend is evaluated to determine which one has the fewest active connections, and that resource receives the next request. The developers of HAProxy recommend this algorithm for sessions that are expected to last a long time, such as LDAP and SQL, and note that it is not well suited to protocols with short sessions such as HTTP.
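As an illustration of selecting an algorithm, the sketch below uses leastconn for a long-lived TCP service. The backend name, server names, addresses and port are all hypothetical and do not come from the examples elsewhere in this guide.

```haproxy
# Hypothetical backend for a SQL service with long-lived connections.
backend sql-backend
    # Layer 4 balancing; no HTTP inspection.
    mode tcp
    # Pick the server with the fewest active connections.
    balance leastconn
    server sql_01 192.168.2.6:3306 check
    server sql_02 192.168.2.7:3306 check
```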
Files & Folders
Below are a number of the files and directories you can expect to find in a default HAProxy installation on a Debian-based OS. Regardless of the host OS, the files serve essentially the same purposes. While a number of files and directories are listed, the one that matters most is the configuration file at /etc/haproxy/haproxy.cfg.
- Default location of the binary.
- Built-in documentation.
- init script used to control the HAProxy process/service.
- File that is sourced by the init script to set the haproxy.cfg file location.
- Default location of the haproxy.cfg file which determines all functions of HAProxy.
HAProxy’s behavior is completely controlled by the haproxy.cfg file. This file is where you build your frontends and backends, along with the other settings described below. Here is an example of a basic haproxy.cfg file.
log 127.0.0.1 local0
timeout connect 5000ms
timeout client 50000ms
timeout server 50000ms
rspadd X-Forwarded-Host:\ http:\\\\example.com
server appZ_01 IP_OF_MACHINE_01:8080 check
server appZ_02 IP_OF_MACHINE_02:8080 check
server appZ_03 IP_OF_MACHINE_03:8080 check
server appZ_04 IP_OF_MACHINE_04:8080 check
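Laid out with section headers, a file built from the lines above might look like the following sketch. The frontend name http-in, the bind and default_backend lines, the backend name appZ-backend, the placement of each directive, and the use of option http-server-close for the server-side close behavior are assumptions made for illustration. The IP_OF_MACHINE placeholders must be replaced with real addresses, and the rspadd line, copied unchanged, is only accepted by HAProxy releases older than 2.1.

```haproxy
global
    log 127.0.0.1 local0

defaults
    mode http
    option forwardfor
    # Assumed option name for closing server-side connections.
    option http-server-close
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

# "http-in" and the bind line are assumptions.
frontend http-in
    bind *:80
    rspadd X-Forwarded-Host:\ http:\\\\example.com
    default_backend appZ-backend

backend appZ-backend
    balance roundrobin
    server appZ_01 IP_OF_MACHINE_01:8080 check
    server appZ_02 IP_OF_MACHINE_02:8080 check
    server appZ_03 IP_OF_MACHINE_03:8080 check
    server appZ_04 IP_OF_MACHINE_04:8080 check
```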
With this example, you can see how the different pieces discussed so far are brought together to create a working configuration. While the global and defaults sections are new, their syntax is similar enough to the frontend and backend sections that you should be able to determine what they do. With that said, below are explanations of a few of the new lines that may not be immediately obvious if you have never seen them before.
As mentioned previously, HAProxy can load balance at Layer 4 or Layer 7 of the OSI model. Setting the mode to http lets HAProxy inspect the HTTP headers of every request and modify or redirect each request individually. The other option is to set the mode to tcp, in which case HAProxy routes by IP address and port and ignores all HTTP headers.
HAProxy lets you specify timeouts for various phases of a TCP or HTTP exchange. The example configuration sets three of them: connect (the time to wait for a connection attempt to succeed), client (the maximum inactivity time on the client side) and server (the maximum inactivity time on the server side).
By setting timeouts, you let HAProxy release the resources held by stalled or idle connections so that they can be used for new connections. Because a large number of timeouts are available, the documentation provides a much more thorough overview of what is possible.
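As a sketch of what a fuller set of timeouts could look like, the block below keeps the three values from the example and adds three more keywords from the HAProxy documentation. The values chosen for the added lines are assumptions for illustration, not recommendations.

```haproxy
defaults
    # Values from the example configuration.
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms
    # Assumed values: time allowed for a client to send a complete request,
    # idle time allowed between requests on a kept-alive connection,
    # and time a request may wait in the queue for a free server slot.
    timeout http-request 10s
    timeout http-keep-alive 10s
    timeout queue 30s
```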
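The forwardfor option makes HAProxy add an X-Forwarded-For header to each request it passes to a backend server, so that the server can see the client’s original IP address.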
This option allows you to configure how HAProxy handles connections on the server side. By default, HAProxy runs in keep-alive mode, which means that connections are kept open in an idle state. By using this option, we can force HAProxy to close these connections so that they do not consume resources.
The rspadd line adds a header to every response that HAProxy sends back to the client; in the example, it adds an X-Forwarded-Host header that names example.com. One of the more common uses of this is when hosting various applications behind HAProxy and you want to ensure that your end users only see responses coming from HAProxy and not from the machines behind it, which can occasionally occur with asynchronous processes.
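On HAProxy 2.1 and later, where rspadd is no longer accepted, an http-response rule can add a response header instead. The sketch below assumes a frontend named http-in and uses the plain host name example.com as the header value; both are illustrative choices rather than a translation of the original line.

```haproxy
# "http-in" is an assumed frontend name.
frontend http-in
    bind *:80
    # Add the header to every response returned to the client.
    http-response add-header X-Forwarded-Host example.com
    default_backend appZ-backend
```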
To put the above information into context, below are a few basic example configurations that illustrate how to set up HAProxy for different scenarios.
This configuration can be used when you need to route all traffic to one machine until it goes offline, at which time HAProxy will automatically begin routing traffic to the second machine.
server appZ_01 192.168.2.2:8080 check
server appZ_02 192.168.2.3:8080 check backup
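How quickly HAProxy switches to the backup depends on the health check settings of the primary server. The sketch below adds the inter, fall and rise parameters to the same two servers; the backend name appZ-backend and the chosen values are assumptions for illustration.

```haproxy
# Assumed backend name; check values are illustrative.
backend appZ-backend
    # Check every second; mark down after 2 failed checks,
    # mark up again after 3 successful checks.
    server appZ_01 192.168.2.2:8080 check inter 1s fall 2 rise 3
    # Only receives traffic while appZ_01 is down.
    server appZ_02 192.168.2.3:8080 check inter 1s fall 2 rise 3 backup
```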
With this configuration, you can route incoming requests to different backends as needed. If you run multiple applications behind HAProxy, this allows you to add backend resources to each individual application as needed instead of scaling both at the same time.
rspadd X-Forwarded-Host:\ http:\\\\example.com
acl url_appY path_beg -i /appY/
use_backend appY-backend if url_appY
server appY_01 192.168.2.2:8080 check
server appY_02 192.168.2.3:8080 check
server appZ_01 192.168.2.4:8080 check
server appZ_02 192.168.2.5:8080 check backup
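Grouped into sections, the routing lines above might be arranged as in the following sketch. The frontend name http-in, the bind line, the default_backend line and the appZ-backend name are assumptions; the rspadd line is left out here for brevity.

```haproxy
# "http-in" and the bind line are assumptions.
frontend http-in
    bind *:80
    # Requests whose path begins with /appY/ go to appY-backend.
    acl url_appY path_beg -i /appY/
    use_backend appY-backend if url_appY
    # Everything else goes to the assumed default backend.
    default_backend appZ-backend

backend appY-backend
    server appY_01 192.168.2.2:8080 check
    server appY_02 192.168.2.3:8080 check

backend appZ-backend
    server appZ_01 192.168.2.4:8080 check
    # Used only if appZ_01 is down.
    server appZ_02 192.168.2.5:8080 check backup
```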
The following resources may help with additional understanding of the ways in which HAProxy can be configured and used.
HAProxy Documentation - http://www.haproxy.org/#docs
HAProxy References - http://www.haproxy.org/they-use-it.html