[Infra] HTTP load balancing using HAProxy

Anyone who has spent many hours working with servers and networks is probably familiar with the term load balancing. Putting load balancing into practice, however, is never easy, because it requires solid knowledge of systems and network traffic. This post introduces HAProxy, a tool for load balancing HTTP traffic, and walks through a small lab built on OS X and VirtualBox.

A glance at HAProxy

HAProxy, which stands for High Availability Proxy, is open source software built for load balancing at Layer 4 (TCP) and Layer 7 (HTTP). It runs mainly on Linux, Solaris and FreeBSD. Its main role is to improve the performance of a distributed system by spreading requests across multiple servers. Many sites use HAProxy as a load balancer today, including GitHub, Imgur, Instagram and Twitter.

HAProxy has many useful features, but I want to concentrate on the three key concepts that form its foundation:

Access Control List (ACL)

An Access Control List defines rules for choosing a server based on characteristics of the incoming traffic. ACLs can help split a main application that bundles many functions into different kinds of services, each served by separate servers. An ACL has two crucial parts:

  • A criterion, together with a set of values to match against

  • Actions to perform when the criterion matches

To write an ACL, we follow this syntax:

acl <aclname> <criterion> [flags] [operator] <value> ...
  • acl: the keyword that starts an access control list

  • <aclname>: the name of the ACL; names are case-sensitive, so each one must be distinct

  • <criterion>: the part of the request or response to match against

  • [flags]: options that adjust how matching works: (I) -i ignores case during matching, (II) -f loads patterns from a file and (III) -- forces the end of flags, which is useful when a string looks like one of the flags

  • [operator]: a comparison operator; which ones are available depends on the type of match (integer, string, regex, IP address or network). For integers, e.g., eq is true if the tested value equals at least one value, and ge is true if the tested value is greater than or equal to at least one value

  • <value>: the value or values to compare against
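
As a rough illustration of the syntax above, the fragment below declares three ACLs inside a frontend. The section name, the ACL names, the /api prefix, the threshold of 100 connections and the pattern file path are all made up for this sketch; path_beg, fe_conn and hdr_sub are standard HAProxy criteria.

```haproxy
frontend acl-demo
    bind *:80

    # String match: true when the URL path starts with /api (-i ignores case)
    acl is_api      path_beg -i /api

    # Integer match with an operator: true when this frontend currently
    # has 100 or more established connections
    acl is_busy     fe_conn ge 100

    # Patterns loaded from a file, one per line (hypothetical path)
    acl bad_agent   hdr_sub(User-Agent) -i -f /etc/haproxy/bad-agents.lst
```

An ACL on its own does nothing. It only takes effect once a rule such as use_backend refers to it by name.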

For more details about Access Control Lists, see the HAProxy documentation.

Frontend

A frontend is the point that receives incoming requests and forwards them to suitable backends. In the HAProxy configuration, a frontend contains three components:

  • a set of IP addresses and ports (e.g. 10.1.2.3:80, *:22, etc.)

  • ACLs

  • use_backend rules, which select a backend when an ACL condition matches, and a default_backend rule for all remaining cases

A frontend can be configured to handle various kinds of network traffic, from Layer 4 to Layer 7 of the OSI model.
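
Putting the three components together, a frontend might look like the sketch below. The names http-in, api-backend and web-backend and the /api prefix are assumptions chosen for illustration, and both backends would have to be defined elsewhere in the same file.

```haproxy
frontend http-in
    # 1. Address and port to listen on
    bind *:80

    # 2. ACL: requests whose path starts with /api
    acl is_api path_beg -i /api

    # 3. Routing: matching requests go to api-backend, the rest to web-backend
    use_backend api-backend if is_api
    default_backend web-backend
```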

Backend

A backend contains the list of servers that receive forwarded requests. Fundamentally, a backend is defined by:

  • A load balancing algorithm

  • A list of servers and ports

For example, here is a sample backend configuration:

backend web-backend
   balance roundrobin
   server web-1 web1domain.com:80 check
   server web-2 web2domain.com:80 check

As for algorithms, the most commonly used ones are:

  • Round Robin: the most common one. Servers take turns handling forwarded requests, in proportion to their weights. Suitable for HTTP

  • Least Connections: the server with the lowest number of connections is chosen. Recommended for long sessions such as LDAP, SQL, TSE...

  • Source: the server is chosen from a hash of the client's source IP address, so the same client keeps reaching the same server as long as no server goes up or down

  • URI/URL: similar to Source, but the hash input is the request URI or a URL pattern
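
To make the list above concrete, here is a minimal sketch with one backend per algorithm. The backend names and the weights are invented for illustration, and the server addresses are borrowed from the lab later in this post. Only the balance line changes between the sections.

```haproxy
# Round robin: web-1 has twice the weight, so it receives about twice the requests
backend rr-backend
    balance roundrobin
    server web-1 192.168.56.102:80 weight 2 check
    server web-2 192.168.56.103:80 weight 1 check

# Least connections: the server with the fewest connections is picked
backend lc-backend
    balance leastconn
    server web-1 192.168.56.102:80 check
    server web-2 192.168.56.103:80 check

# Source: the client's IP address is hashed to pick a server
backend src-backend
    balance source
    server web-1 192.168.56.102:80 check
    server web-2 192.168.56.103:80 check

# URI: the request URI is hashed, so the same URI goes to the same server
backend uri-backend
    balance uri
    server web-1 192.168.56.102:80 check
    server web-2 192.168.56.103:80 check
```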

Laboratory

Experimenting with infrastructure is never easy, especially with high-performance machines such as servers. In this part I will show how to build a small HAProxy lab using VirtualBox on OS X El Capitan.

Setup environment

To set up the HAProxy system, I need at least 3 servers: 2 HTTP servers and 1 HAProxy server, connected as shown in this diagram: HAProxy.jpg

All of the servers here run Ubuntu 14.04 Server, and their IP addresses should be configured statically as in the diagram above. For anyone not familiar with VirtualBox on OS X, I recommend giving each server at least 2 network interfaces:

  • An interface running NAT: connects to the Internet, for installing packages

  • An interface attached to an internal or host-only network: connects to the 192.168.56.0/24 network, used only for HAProxy's private traffic. For more details on setting up a VirtualBox internal network, refer to this link

Web servers

Assign the IP address in /etc/network/interfaces

auto eth0
iface eth0 inet dhcp

# The host-only network interface
auto eth2
iface eth2 inet static
# Use 192.168.56.103 on server 2
address 192.168.56.102
netmask 255.255.255.0

Install Apache2

sudo apt-get install apache2

To tell the two web servers apart, make a small modification to /var/www/html/index.html:

  <body>
    <div class="main_page">
      <div class="page_header floating_element">
        <img src="/icons/ubuntu-logo.png" alt="Ubuntu Logo" class="floating_element"/>
        <span class="floating_element">
          Web server 1(2)
        </span>
      </div>
      <div class="content_section floating_element">
      </div>
    </div>
  </body>

This HTML is the content of the page served by each server's HTTP service, so we mark each page to show which server it came from.

HAProxy server

Installing HAProxy is as simple as:

sudo apt-get install haproxy

To enable the service, open /etc/default/haproxy and set ENABLED to 1.

Configuration

The crucial part of HAProxy lives in /etc/haproxy/haproxy.cfg, which defines its behavior. This is the main configuration:

global
  log     127.0.0.1 local2
  chroot /var/lib/haproxy
  stats socket /run/haproxy/admin.sock mode 660 level admin
  stats timeout 30s
  user haproxy
  group haproxy
  daemon

  # Default SSL material locations
  ca-base /etc/ssl/certs
  crt-base /etc/ssl/private

  # Default ciphers to use on SSL-enabled listening sockets.
  # For more information, see ciphers(1SSL)
  ssl-default-bind-ciphers kEECDH+aRSA+AES:kRSA+AES:+AES256:RC4-SHA:!kEDH:!LOW:!EXP:!MD5:!aNULL:!eNULL

defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    # Timeout values without a unit suffix are in milliseconds
    timeout http-request    20
    timeout queue           86400
    timeout connect         86400
    timeout client          86400
    timeout server          86400
    timeout http-keep-alive 30
    timeout check           20
    maxconn                 50000

frontend LB
   bind 192.168.56.101:80
   reqadd X-Forwarded-Proto:\ http
   default_backend LB

backend LB
   mode http
   stats enable
   stats hide-version
   stats uri /stats
   stats realm Haproxy\ Statistics
   stats auth haproxy:admin    # Credentials for HAProxy Statistic report page.
   balance roundrobin     # Load balancing will work in round-robin process.
   option httpchk
   option httpclose
   option forwardfor
   cookie LB insert
   server web1-srv 192.168.56.102:80 cookie web1-srv check   # backend server.
   server web2-srv 192.168.56.103:80 cookie web2-srv check   # backend server.

The global and defaults sections contain basic configuration for incoming traffic and requests. We can easily recognize the frontend and backend rules mentioned above. HAProxy also offers another tool for monitoring and tracking the system: statistics. Here stats are enabled and served as a web page at the /stats URI, protected by the credentials haproxy/admin. The page shows detailed information about requests arriving and being forwarded to the backend.
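
In the configuration above, the statistics page is served from inside the LB backend. One possible alternative, shown here only as a sketch, is to give it a dedicated listen section so that it stays separate from the application traffic. The section name stats, port 8404 and the 10 second refresh are assumptions for illustration; the remaining directives are copied from the backend above.

```haproxy
# Hypothetical dedicated statistics listener (not part of the lab configuration)
listen stats
    bind 192.168.56.101:8404          # arbitrary port chosen for this sketch
    mode http
    stats enable
    stats hide-version
    stats uri /stats
    stats realm Haproxy\ Statistics
    stats auth haproxy:admin          # same credentials as in the lab
    stats refresh 10s                 # reload the page every 10 seconds
```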

[Screenshot: HAProxy statistics page]

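Now everything is done. Let's test by opening the frontend IP address 192.168.56.101 in different browsers, and we can see the web pages coming from different servers. Different browsers are needed because the backend inserts an LB cookie, so each browser keeps returning to the server it was first assigned.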

[Screenshot: Safari]

[Screenshot: Firefox]

The bottom line

It is hard to compare HAProxy with other methods, because the result depends on the skills of the administrators as well as the capacity of the system. Still, HAProxy softens the challenges of scaling a large application, and we can forget the nightmares of reconfiguring servers whenever one is added or removed.