HAProxy Load Balancer. Part 1: basic terms and levels

This is the first article in a series about HAProxy. Here I'll explain the basic terms and the layers at which load balancing can work.

Summary

When the load grows, you have two options: add resources to the server, or add more servers and join them into a cluster. Both can solve the problem, but the first one hurts high availability, because even the most powerful server is useless when it's down. A cluster makes life easier here: if one node goes down, the others take over its load.

Once servers are joined into a cluster, the load has to be balanced between them, and for that we need a load balancer. At D2C we use HAProxy, so I'm going to show you how to configure it for different tasks. With examples, of course.

Artem Zaytsev

Evil marketer

What is HAProxy and why do you need it

HAProxy is a load balancer that can distribute and proxy requests over TCP and HTTP. It has a lot of advantages:

  1. It's free.
  2. It's lightweight and fast: it can handle 20 thousand or more requests per second.
  3. It's good for HTTP traffic.
  4. It's easy to add to an already running project.
  5. Its configuration is simple, so you can quickly tune it for your needs.

Also, Airbnb, Alibaba, GitHub, Instagram, Vimeo and more than 40 other well-known high-load projects use HAProxy.

Basic terms

HAProxy configuration usually revolves around five main sections: global, defaults, frontend, backend and listen.

The global section defines the general configuration.

The defaults section sets default parameters for the other sections.

The listen section combines the descriptions of a frontend and a backend in a single block, giving a complete proxy definition. It is useful for TCP traffic.

The frontend section determines how incoming requests are routed to backends, depending on the type of the client's request.

The backend section contains a list of servers and defines the balancing algorithm.

I won't go into the global, listen and defaults sections here. Instead, here is the standard configuration from D2C, which you may use:
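
To make the listen section more concrete, here is a minimal sketch of a complete TCP proxy defined in one block. The proxy name, addresses and port are hypothetical and are not part of the D2C configuration:

listen mysql_cluster
  # hypothetical name and addresses, for illustration only
  mode tcp
  bind *:3306
  balance roundrobin
  server db1 10.0.0.11:3306 check
  server db2 10.0.0.12:3306 check

The same thing could be written as a separate frontend and backend pair; listen just keeps both halves together.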

global
  log 127.0.0.1 local0
  log 127.0.0.1 local1 notice
  maxconn 4096
  tune.ssl.default-dh-param 2048

defaults
  log global
  mode http
  option httplog
  option dontlognull
  option http-server-close
  option http-pretend-keepalive
  option forwardfor
  option originalto
  retries 3
  option redispatch
  maxconn 2000
  # timeouts without a unit are in milliseconds
  timeout connect 5000
  timeout client 50000
  timeout server 50000
  default-server init-addr last,libc,none

The frontend and backend sections deserve much more attention, though. They are what lets us configure the balancer for our tasks.

Frontend section and the layers

The frontend defines the layer on which the load is balanced and which backend to choose, according to Access Control Lists (ACLs).

The transport layer or the layer 4

The transport layer is the simplest. In this mode HAProxy chooses a backend depending on the port of the incoming connection. For example, a request to http://yoursite.com will be processed by the backend responsible for port 80.

A frontend configuration example for layer 4:

frontend haproxy
  mode tcp
  bind *:80
  default_backend http-backend

  • mode. The mode of operation. Here we work on layer 4, which is why we specify tcp. If you specify http, HAProxy will work on the application layer (layer 7).
  • bind. The IP address and port that HAProxy should listen on.
  • default_backend. The name of the group of servers that should process the request. In this case, requests will be processed by the backend named "http-backend".

The backend must specify "mode tcp" as well. Its configuration could look like this:

backend http-backend
  mode tcp
  balance roundrobin
    server srv1 luna-1:80 check
    server srv2 luna-2:80 check

HAProxy lets you use several modes at the same time in one configuration file. For example, port 80 can use HTTP mode and the application layer, while port 443 uses TCP mode and the transport layer.

This is useful with SSL. You can redirect requests from port 80 to port 443:

frontend http_frontend
  mode http
  bind *:80
  redirect scheme https code 301 if !{ ssl_fc }

then route requests on the transport layer:

frontend https_frontend_ssl_pass
  mode tcp
  bind *:443
  default_backend https-backend

and then the servers on the backend will deal with SSL, not HAProxy 🙂

In this scheme HAProxy forwards HTTPS requests on the transport layer, and SSL termination occurs on one of the backend servers.
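
The https-backend that this frontend points to is not shown above. A minimal sketch of it could look like this, assuming the same luna-1 and luna-2 hosts also accept HTTPS on port 443 and terminate SSL themselves:

backend https-backend
  mode tcp
  balance roundrobin
  # assumption: luna-1 and luna-2 serve HTTPS on port 443
  server srv1 luna-1:443 check
  server srv2 luna-2:443 check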

The application layer or the layer 7

The application layer is more complicated. However, it lets us run multiple groups of application servers under the same domain and distribute requests between them based on the content of the HTTP request.

Load balancing at the application layer is useful, for example, when a website consists of several different web applications.

Here's an example. Let's say we use WordPress for the blog, a self-made PHP CMS for the main website, and Node.js for the frontend of the user panel. And all of this is maintained by different teams of developers working in several git repositories.

In this scenario, you'd want to separate the different parts physically, and Access Control Lists will help us here. To solve the problem, we should split the traffic by URL path:

  • if a request goes to the main website, http://yoursite.com, it is processed by backend-1 with several NGINX servers;
  • if a request goes to the blog under the path /blog, it is forwarded to backend-2 with several Varnish servers that cache our blog;
  • if a request goes to the panel under the path /cabinet, it is forwarded to backend-3 with several Node.js servers.

Layer 7 routing example

This is how the scheme looks in the configuration:

frontend http
  bind *:80
  mode http

  acl url_blog path_beg /blog
  use_backend backend-2 if url_blog

  acl url_cabinet path_beg /cabinet
  use_backend backend-3 if url_cabinet

  default_backend backend-1

  • mode http is needed to work on layer 7 over the HTTP protocol;
  • the instruction acl url_blog path_beg /blog defines a rule named url_blog, which is triggered when the request path begins with /blog;
  • the instruction use_backend backend-2 if url_blog defines the backend that has to process the request when the rule "url_blog" is triggered;
  • default_backend backend-1 defines the backend used if no other rule has been triggered.
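
The path is not the only part of a request an ACL can look at. As a sketch, the same blog could be selected by the Host header instead, assuming it lived on a hypothetical blog.yoursite.com subdomain:

frontend http
  bind *:80
  mode http

  # hypothetical subdomain, for illustration only
  acl host_blog hdr(host) -i blog.yoursite.com
  use_backend backend-2 if host_blog

  default_backend backend-1

Here hdr(host) reads the Host header of the request, and -i makes the comparison case-insensitive.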

Also, we should define mode http in the backend section: 

backend backend-1
  mode http
  balance roundrobin
    server srv1 luna-1:80 check
    server srv2 luna-2:80 check
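
The frontend above also refers to backend-2 and backend-3. They follow the same pattern; here is a sketch in which the host names and ports are assumptions made up for illustration:

backend backend-2
  mode http
  balance roundrobin
  # hypothetical Varnish hosts and port
  server varnish1 varnish-1:6081 check
  server varnish2 varnish-2:6081 check

backend backend-3
  mode http
  balance roundrobin
  # hypothetical Node.js hosts and port
  server node1 node-1:3000 check
  server node2 node-2:3000 check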

That's all. In the next article in this series, I'll cover the backend section and the different balancing algorithms.

In short

  • HAProxy is a free and fast TCP/HTTP load balancer.
  • Configuration usually revolves around five sections: global, defaults, frontend, backend and listen. The first two define general and default parameters, and the main configuration takes place in the last three.
  • HAProxy can work on the transport layer (layer 4) and the application layer (layer 7).
  • To balance load on the transport layer, specify "mode tcp"; on the application layer, "mode http".
  • Several modes can be used in the same HAProxy configuration file.
  • For flexible configuration, define ACL rules in the frontend section.
