Introduction to HAProxy

Introduction

Whether you are researching load balancers for your own needs or the needs of your employer, you will inevitably come across HAProxy. As such, you may be asking what it is and how it can benefit you or your company.

When researching load balancers, you will find your options usually fall into one of two categories: hardware-based and software-based. In the hardware realm you will find vendors such as F5 (BIG-IP), Citrix (NetScaler), and Kemp Technologies, which offer dedicated appliances running proprietary software.

On the other side you have software-based solutions, where you can use whatever commodity hardware fits your needs, independent of the load-balancing software being used. In this realm you will find solutions such as NGINX and HAProxy, the latter being the focus of this guide.

The goal of the team behind HAProxy is to provide a “free, very fast and reliable solution” for load balancing TCP- and HTTP-based applications. Because of this, HAProxy is considered by many to be the de facto standard for software-based load balancing and is currently used by sites such as GitHub, Reddit, Twitter, and Tumblr, to name a few.

It has been designed to run on Linux, Solaris, FreeBSD, OpenBSD, and AIX. While it will run on most x86-64 hardware, even with limited resources, it performs best on enterprise-grade hardware such as 10 Gb (or faster) NICs and Xeon-class CPUs or similar.


Terminology

Below are a few of the key terms and concepts you should understand when working with HAProxy. These concepts apply to virtually all load-balancing solutions, not just HAProxy.


Frontend

A frontend, within the context of HAProxy, defines where and how incoming traffic is routed to the machines behind HAProxy. Frontends allow you to set up rules (ACLs) that watch for specific URL patterns in traffic entering and leaving the load balancer and intelligently route a user's traffic as needed.

Furthermore, the frontend is where you configure the IP addresses and ports HAProxy listens on, as well as HTTPS for those ports (an HTTPS variant is sketched after the example below).

Example frontend:

frontend http-in
    bind *:80
    default_backend appX-backend 

This frontend example shows that we are only listening on port 80 for incoming requests and sending all traffic to the default backend called appX-backend.
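
For HTTPS, the bind line takes an ssl keyword and a certificate, assuming HAProxy was built with SSL support. The sketch below is an illustration only; the frontend name and certificate path are hypothetical, and the file referenced by crt must be a PEM bundle containing the certificate and its private key.

frontend https-in
    bind *:443 ssl crt /etc/haproxy/certs/example.com.pem
    default_backend appX-backend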


Backend 

As the name suggests, a backend within the context of HAProxy is a group of resources that is home to your data or applications. These backend resources are where traffic gets routed by the rules you have configured in your frontends.

Backend resource groups can consist of a single server or multiple servers, but in the context of load balancing it is assumed that at least two are being used. The more backend resources you add to a group, the lower the load on each individual resource will be, while the number of users you can serve at any given time increases.

Lastly, just as with the frontend, you will also configure the IP addresses and ports on which your backend resources listen for requests.

Example backend:

backend appX-backend
    balance roundrobin
    server appX_01 192.168.2.2:8080 check
    server appX_02 192.168.2.3:8080 check

In this example, the backend is called appX-backend and contains two servers that are accessed using round robin, which is explained later in this guide.
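
Each server line also accepts optional per-server settings; the sketch below is an illustration only, with hypothetical values. Here, check inter sets how often health checks run (in milliseconds), fall and rise set how many failed or successful checks mark a server down or back up, and weight skews how much traffic a server receives relative to its peers.

backend appX-backend
    balance roundrobin
    server appX_01 192.168.2.2:8080 check inter 2000 fall 3 rise 2 weight 100
    server appX_02 192.168.2.3:8080 check inter 2000 fall 3 rise 2 weight 50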


ACL (Access Control List)

In the context of HAProxy, ACLs are the backbone of more in-depth and complex configurations that contain multiple frontends as well as multiple backends and need very precise routing.

With ACLs, you have the ability to parse requests and take a multitude of different actions, such as rewriting or redirecting traffic as needed. Because HAProxy can load balance at Layer 4 or Layer 7 of the OSI model, you can effectively configure it to handle a number of different uses at the same time with multiple frontends and backends.

Example ACL:

frontend http-in
    bind *:80
    acl url_appX path_beg -i /appX/
    use_backend appX-backend if url_appX
    default_backend appZ-backend 

This example is rather simple: it evaluates the path of the incoming request, which is the portion of the URL immediately after the host, such as /appX/ in http://example.com/appX/. In this scenario, if the incoming request path begins with /appX/, the request is sent to the backend called appX-backend, whereas all other requests default to appZ-backend.
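
ACLs can match on more than the request path. As a further sketch (the hostname and backend names are hypothetical), the hdr(host) fetch routes requests based on the Host header instead:

frontend http-in
    bind *:80
    acl host_appY hdr(host) -i appy.example.com
    use_backend appY-backend if host_appY
    default_backend appZ-backend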


Algorithms 

HAProxy offers a fairly large number of algorithms for choosing how requests are distributed to your backend resources. Below are a few of the more common ones along with a short description of how each works.


Round Robin 

Each server is used in turn, starting with the first one listed in a given backend; once the end of the list is reached, the next request goes back to the first server again. HAProxy uses this algorithm by default if none is specified when building a backend.


Least Connection

Each resource in a given backend is evaluated to determine which one has the least number of active connections. The resource with the lowest number will receive the next request. The developers of HAProxy state that this algorithm is a great option for connections that are expected to last a long time such as LDAP and SQL but not for HTTP.       
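
Selecting this algorithm is simply a matter of changing the balance line in a backend. Below is a minimal sketch for a hypothetical LDAP backend; the names, IPs, and port are illustrative only:

backend ldap-backend
    mode tcp
    balance leastconn
    server ldap_01 192.168.2.10:389 check
    server ldap_02 192.168.2.11:389 check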


Structure

Files & Folders 

Below are a number of the files and directories you can expect to find in a default HAProxy installation on a Debian-based OS; regardless of the host OS, the files are essentially the same in terms of what they do. While there are a number of files and directories listed, the one that matters most is the haproxy.cfg file located at /etc/haproxy/haproxy.cfg (a quick way to check that file for errors is shown after the list below).

1. /usr/sbin/haproxy

  • Default location of the binary.

2. /usr/share/doc/haproxy

  • Built-in documentation.

3. /etc/init.d/haproxy

  • Init script used to control the HAProxy process/service.

4. /etc/default/haproxy

  • File sourced by the init script to set startup options, such as the location of the haproxy.cfg file.

5. /etc/haproxy

  • Default location of the haproxy.cfg file, which determines all functions of HAProxy.
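
After editing haproxy.cfg, it is worth checking the file for syntax errors before reloading the service. The commands below are a sketch that assumes the default Debian paths listed above:

    haproxy -c -f /etc/haproxy/haproxy.cfg
    /etc/init.d/haproxy reload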


Configuration File

HAProxy's behavior is completely controlled by the haproxy.cfg file. This file is where you will build your frontends and backends as well as configure the various other settings described below. Here is an example of a basic haproxy.cfg file.


global
    log 127.0.0.1 local0
    daemon
    maxconn 256

defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms
    option forwardfor
    option http-server-close

frontend http-in
    bind *:80
    rspadd X-Forwarded-Host:\ http://example.com
    default_backend appZ-backend

backend appZ-backend
    balance roundrobin
    server appZ_01 IP_OF_MACHINE_01:8080 check
    server appZ_02 IP_OF_MACHINE_02:8080 check
    server appZ_03 IP_OF_MACHINE_03:8080 check
    server appZ_04 IP_OF_MACHINE_04:8080 check

This example shows how the different pieces discussed above are brought together to create a working configuration. While the global and defaults sections are new, their syntax should be similar enough to the frontend and backend sections that you can work out what they do. With that said, below are explanations for a few of the new lines that may not seem immediately obvious if you have never seen them before.

mode http

As mentioned previously, HAProxy can load balance at Layer 4 or Layer 7 of the OSI model. By setting the mode to http, HAProxy inspects the HTTP headers of every request and can modify or redirect each request as needed. The other option is tcp mode, which makes HAProxy route purely by IP and port, ignoring all HTTP headers.

Reference: https://cbonte.github.io/haproxy-dconv/1.6/configuration.html#mode 
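
For example, a Layer 4 (tcp mode) setup for a non-HTTP service might look like the sketch below; the frontend name, backend name, IPs, and port are hypothetical:

frontend mysql-in
    mode tcp
    bind *:3306
    default_backend mysql-backend

backend mysql-backend
    mode tcp
    balance leastconn
    server db_01 192.168.2.20:3306 check
    server db_02 192.168.2.21:3306 check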


timeouts

HAProxy lets you specify timeouts for various aspects of standard HTTP/TCP connections. In the example, the configuration file limits the connect (time to wait for a successful connection attempt to a server), client (maximum client-side inactivity time), and server (maximum server-side inactivity time) time frames.

By setting appropriate timeouts, HAProxy can reclaim the resources held by dead or idle connections and free them for new ones. Because there are a large number of timeouts available, referencing the documentation will provide a much more thorough overview of what is possible.

Reference: https://cbonte.github.io/haproxy-dconv/1.6/configuration.html#timeout 
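
Beyond the three shown in the example, other commonly tuned timeouts include http-request (time allowed for a client to send a complete request) and queue (time a request may wait for a free server slot). The values below are purely illustrative:

defaults
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms
    timeout http-request 10s
    timeout queue 30s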


option forwardfor

Adds an X-Forwarded-For header containing the client's source IP address to requests passed to the backend servers, so that the servers can see the original client address rather than HAProxy's.
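
The directive also accepts arguments. For instance, an except parameter skips adding the header for requests arriving from a given network, which can be useful if HAProxy itself sits behind another local proxy. A hedged sketch:

defaults
    mode http
    option forwardfor except 127.0.0.0/8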


option http-server-close

This option controls how HAProxy handles connections on the server side. By default, HAProxy runs in keep-alive mode, meaning connections are kept open in an idle state between requests. With http-server-close, HAProxy closes the server-side connection after each response (while still allowing client-side keep-alive) so that idle server connections are not left consuming resources.

Reference: https://cbonte.github.io/haproxy-dconv/1.6/configuration.html#option%20http-server-close
 


rspadd X-Forwarded-Host:\

rspadd appends a header to every HTTP response that passes back through HAProxy; in the example configuration above, it adds an X-Forwarded-Host header pointing at the load balancer. One of the more common uses of this is when hosting various applications behind HAProxy and you want to ensure that your end users only see responses coming from HAProxy and not the machines behind it, which can occasionally occur with asynchronous processes.
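
A related directive, reqadd, adds a header to the request before it is forwarded to the backend. As a hedged sketch (the frontend name and certificate path are hypothetical), an HTTPS frontend could use it to tell backend applications that the original request arrived over TLS:

frontend https-in
    bind *:443 ssl crt /etc/haproxy/certs/example.com.pem
    reqadd X-Forwarded-Proto:\ https
    default_backend appZ-backend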


Usage

To bring the above information into further context, below are a few basic example configurations that illustrate how you might set up HAProxy for different scenarios.


Active/Passive 

This configuration can be used when you need to route all traffic to one machine until it goes offline, at which point HAProxy will automatically begin routing traffic to the second machine.

backend appZ-backend
    server appZ_01 192.168.2.2:8080 check
    server appZ_02 192.168.2.3:8080 check backup 


ACL Request

With this configuration, you can route incoming requests to different backends as needed. If you run multiple applications behind HAProxy, this allows you to add backend resources to each individual application as needed instead of scaling all of them at the same time.

frontend http-in
    bind *:80
    rspadd X-Forwarded-Host:\ http://example.com
    acl url_appY path_beg -i /appY/
    use_backend appY-backend if url_appY
    default_backend appZ-backend
   
backend appY-backend
    balance roundrobin
    server appY_01 192.168.2.2:8080 check
    server appY_02 192.168.2.3:8080 check
 
backend appZ-backend
    server appZ_01 192.168.2.4:8080 check
    server appZ_02 192.168.2.5:8080 check backup 


Additional Resources 

The following resources may help with additional understanding of the ways in which HAProxy can be configured and used.


HAProxy Documentation - http://www.haproxy.org/#docs

HAProxy References - http://www.haproxy.org/they-use-it.html

