Scalable WordPress with NGINX and smart Varnish caching

This article explains how to build a fast and scalable WordPress stack using a Varnish reverse-proxy cache, NGINX, and a load balancer.

Summary

The WordPress architecture was made flexible for a huge developer community. That is why it is so easy to extend, yet so weak when it comes to performance: to render a single page, WordPress has to run through thousands of lines of code and make multiple SQL queries.

WordPress is usually deployed with the following configuration: Apache + PHP + MySQL + some caching plugin inside the CMS. It is a popular configuration, but it cannot be considered the fastest. It is suitable for an almost vanilla WordPress with little traffic; for a loaded website, such a stack will not perform well.

The good news is that WordPress is mostly used to produce static pages. In that case, there is no sense in wasting resources on re-rendering a page for every client request. That is why I suggest using a server-side cache and, for the requests that cannot be cached, load balancing.

For reverse caching, I took Varnish 4.0. It is fast, flexible, and solves such tasks perfectly.

Architecture

I took an NGINX server for the frontend, Varnish for caching, and a stack of several containers with NGINX + PHP-FPM for load balancing. The infrastructure I expected to get looks like this:

The WordPress+Varnish+Cluster stack design

How the stack works

NGINX on the frontend accepts requests from users and proxies them to Varnish. Varnish then checks its cache: if it has a cached page for the request, it sends that page back; if not, it proxies the request to the HAProxy load balancer, which distributes the load between several nodes.

On the nodes, NGINX is deployed in FastCGI mode with a cluster configuration that lets NGINX pass requests to the nearest PHP-FPM. MariaDB is used as the database.

To build such a stack inside D2C you need:

Services

  • Nginx
  • Custom Docker service for deploying Varnish
  • Haproxy
  • NginxCluster
  • Php-FPM
  • MariaDB

On the WordPress side

  • Varnish HTTP Purge plugin to clear the cache when posts are added or updated (a command-line alternative is sketched after this list).
  • Fake Press plugin: I wanted to test the stack with lots of data rather than an empty WordPress, and this plugin helped me generate several hundred posts.
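
If you prefer the command line over the admin panel, the purge plugin can also be installed with WP-CLI from inside the PHP-FPM container; the slug below (varnish-http-purge) is the plugin's directory slug, so adjust it if your setup differs:

# Install and activate the cache-purging plugin (assumes WP-CLI is available in the container)
wp plugin install varnish-http-purge --activate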

Let's get started with creating the services

1. Create hosts. For this example, I created three demo hosts on AWS. You may use your own.

2. Choose an SQL database. I suggest using the MariaDB service. In this case, MariaDB ran without master/slave replication (for the purity of the experiment) and with default configs, so I took a single empty host for the database.

Don't forget to set a root password and create a new database for your project.
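
If you create the database by hand rather than through the D2C panel, the equivalent SQL would look roughly like this; the database name, user, and password are placeholders, not values from the original setup:

-- Create a dedicated database and user for WordPress
CREATE DATABASE wordpress CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE USER 'wp_user'@'%' IDENTIFIED BY 'strong-password-here';
GRANT ALL PRIVILEGES ON wordpress.* TO 'wp_user'@'%';
FLUSH PRIVILEGES;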

3. Create a PHP-FPM service. At this stage, you will need the WordPress app itself. You can deploy it in three different ways: from Git, via an HTTP/FTP link, or from a local .zip/.tar archive. I used the official WordPress Git repository.
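
Whichever way you deploy the code, WordPress needs to know where the database lives. The wp-config.php constants would look roughly like this; the credentials and the "mariadb-alias" host are placeholders matching the sketch above, not the original values:

// Database settings in wp-config.php; DB_HOST points at the MariaDB service alias
define( 'DB_NAME', 'wordpress' );
define( 'DB_USER', 'wp_user' );
define( 'DB_PASSWORD', 'strong-password-here' );
define( 'DB_HOST', 'mariadb-alias' );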

4. Install NginxCluster. For this stack, you need Nginx Cluster in FastCGI mode. It is pre-configured to work with PHP-FPM, so just create it with the default settings.
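
For reference, the heart of such a configuration is a FastCGI location that hands PHP requests to PHP-FPM; a minimal sketch, where "php-fpm-alias" is an assumed placeholder for your PHP-FPM service alias:

# Pass PHP scripts to PHP-FPM over FastCGI
location ~ \.php$ {
    include fastcgi_params;
    fastcgi_index index.php;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass php-fpm-alias:9000;
}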

5. Add a load balancer for the NGINX instances. In D2C it will be HAProxy with a simple round-robin algorithm.
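
A minimal haproxy.cfg for this role would look roughly like the sketch below; the node names and aliases are placeholders, since D2C generates the real configuration for you:

defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend http-in
    bind *:80
    default_backend wordpress_nodes

backend wordpress_nodes
    # Distribute requests between the NGINX cluster nodes in turn
    balance roundrobin
    server node1 nginx-cluster-1:80 check
    server node2 nginx-cluster-2:80 check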

6. Get Varnish from a Docker image. For now, there is no pre-configured Varnish service inside D2C, so you need to deploy it from a Docker image. In D2C you can do this using the "Docker" service.

Specify the Docker image as "debian", version "jessie", and fill in the install commands:

apt-get update && apt-get install -y wget
wget -qO- https://packagecloud.io/install/repositories/varnishcache/varnish51/script.deb.sh | bash
apt-get install -y varnish

 

Then fill in "Start Command":

varnishd -j unix,user=vcache -F -f /etc/varnish/default.vcl -s malloc,100m -a 0.0.0.0:80

This command starts Varnish under the "vcache" user, allocates 100 MB of RAM for the cache, and specifies the path to the config file and the interface to listen on.

Then create a custom config at the path "/etc/varnish/default.vcl" and fill it with the following:

vcl 4.0;
backend default { 
    .host = "web alias";  
    .port = "80";
}

"Web alias" is the alias of a web server. In our stack, Varnish should proxy requests to the load balancer, so specify the HAProxy alias here.

This is a basic Varnish config. After creating all the needed services, we will edit it to suit our needs.

7. Create a simple NGINX service for the frontend. It will serve SSL. In this service, just add a service config and choose Varnish as the upstream. Then generate an HTTPS configuration for SSL.
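
The core of that frontend config is an HTTPS server block that terminates SSL and proxies everything to Varnish. A minimal sketch, assuming "varnish-alias" is your Varnish service alias and the certificate paths are placeholders:

server {
    listen 443 ssl;
    server_name example.com;

    ssl_certificate     /etc/nginx/ssl/example.com.crt;
    ssl_certificate_key /etc/nginx/ssl/example.com.key;

    location / {
        # Hand every request to Varnish, preserving the original host and client IP
        proxy_pass http://varnish-alias:80;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
    }
}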

After creating all the services, you should see a picture like this in the panel:

Now it's time to tune the infrastructure.

Let's tune Varnish

The configuration we wrote above does not yet match our expectations, so now it is time to tune Varnish for our needs. Let's see what we can do to cache better.

First, you should set the hosts from which Varnish will be allowed to execute cache-purging requests. Add this to the Varnish config:

acl purge {
    "host-alias1";
    "host2-alias2";
    "host3-alias3";
    ...
}

In place of the hosts, fill in the aliases of the servers closest to Varnish. In this stack, I specified the NGINX and HAProxy names.

After that, we should tune our caching algorithm:

# Receiving a request from the client
sub vcl_recv {
    # Allow the cache to be purged by the range of hosts listed above
    if (req.method == "PURGE") {
        # If the request is not from the list
        if (!client.ip ~ purge) {
            return (synth(405, "This IP is not allowed to send PURGE requests."));
        }
        return (purge);
    }

    # Skip POST requests and requests from a page with Basic authorization
    if (req.http.Authorization || req.method == "POST") {
        return (pass);
    }

    # Skip the admin panel
    if (req.url ~ "wp-(login|admin)" || req.url ~ "preview=true") {
        return (pass);
    }

    # Skip the sitemap and robots.txt
    if (req.url ~ "sitemap" || req.url ~ "robots") {
        return (pass);
    }

    # Delete the "has_js" and "__*" cookies added by CloudFlare and Google Analytics,
    # because Varnish does not cache requests that carry cookies
    set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(_[_a-z]+|has_js)=[^;]*", "");

    # Remove the ";" prefix from cookies
    set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");

    # Remove Quant Capital cookies (added by some plugins)
    set req.http.Cookie = regsuball(req.http.Cookie, "__qc.=[^;]+(; )?", "");

    # Remove the wp-settings-1 cookie
    set req.http.Cookie = regsuball(req.http.Cookie, "wp-settings-1=[^;]+(; )?", "");

    # Remove the wp-settings-time-1 cookie
    set req.http.Cookie = regsuball(req.http.Cookie, "wp-settings-time-1=[^;]+(; )?", "");

    # Remove the WordPress test cookie
    set req.http.Cookie = regsuball(req.http.Cookie, "wordpress_test_cookie=[^;]+(; )?", "");

    # Delete a cookie header consisting only of whitespace (or completely empty)
    if (req.http.cookie ~ "^ *$") {
        unset req.http.cookie;
    }

    # For static files, delete all cookies so they can be cached
    if (req.url ~ "\.(css|js|png|gif|jp(e)?g|swf|ico|woff|svg|htm|html)") {
        unset req.http.cookie;
    }

    # If a "wordpress_" or "comment_" cookie is set, go straight to the backend
    if (req.http.Cookie ~ "wordpress_" || req.http.Cookie ~ "comment_") {
        return (pass);
    }

    # If no cookies are left, remove the header from the incoming request
    if (!req.http.cookie) {
        unset req.http.cookie;
    }

    # Do not cache requests that still have cookies set; they are not related to WordPress
    if (req.http.Authorization || req.http.Cookie) {
        # Not cacheable by default
        return (pass);
    }

    # Cache everything else
    return (hash);
}

sub vcl_pass {
    return (fetch);
}

sub vcl_hash {
    hash_data(req.url);

    return (lookup);
}

# Receiving a response from the backend
sub vcl_backend_response {
    # Remove unneeded headers
    unset beresp.http.Server;
    unset beresp.http.X-Powered-By;

    # Do not store robots.txt and the sitemap in the cache
    if (bereq.url ~ "sitemap" || bereq.url ~ "robots") {
        set beresp.uncacheable = true;
        set beresp.ttl = 30s;
        return (deliver);
    }

    # For static files from the backend
    if (bereq.url ~ "\.(css|js|png|gif|jp(e)?g|swf|ico|woff|svg|htm|html)") {
        # Delete cookies
        unset beresp.http.cookie;
        # Keep them in the cache for a week
        set beresp.ttl = 7d;
        # Set the Cache-Control and Expires headers, telling the browser to keep the file
        # in its own cache and not hit our server more than necessary
        set beresp.http.Cache-Control = "public, max-age=604800";
        set beresp.http.Expires = now + beresp.ttl;
    }

    # Do not cache the admin and login pages
    if (bereq.url ~ "wp-(login|admin)" || bereq.url ~ "preview=true") {
        set beresp.uncacheable = true;
        set beresp.ttl = 30s;
        return (deliver);
    }

    # Allow cookies to be set only on these routes; everywhere else they are stripped
    if (!(bereq.url ~ "(wp-login|wp-admin|preview=true)")) {
        unset beresp.http.set-cookie;
    }

    # Do not cache responses to POST requests or requests with Basic authorization
    if (bereq.method == "POST" || bereq.http.Authorization) {
        set beresp.uncacheable = true;
        set beresp.ttl = 120s;
        return (deliver);
    }

    # Do not cache search results
    if (bereq.url ~ "\?s=") {
        set beresp.uncacheable = true;
        set beresp.ttl = 120s;
        return (deliver);
    }

    # Do not cache error pages; only the right stuff goes into the cache!
    if (beresp.status != 200) {
        set beresp.uncacheable = true;
        set beresp.ttl = 120s;
        return (deliver);
    }

    # Cache everything else for one day
    set beresp.ttl = 1d;
    # How long a cached object may still be served after its TTL expires
    set beresp.grace = 30s;

    return (deliver);
}

# Actions before returning the result to the user
sub vcl_deliver {
    # Delete some headers
    unset resp.http.X-Powered-By;
    unset resp.http.Server;
    unset resp.http.Via;
    unset resp.http.X-Varnish;

    return (deliver);
}
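
Since varnishd runs in the foreground with -F and reads the VCL only at start-up, the simplest way to apply the edited default.vcl is to restart the Varnish container. If you would rather keep the cache warm, you can load the new VCL through the admin interface from inside the container instead; a sketch, assuming varnishadm can reach the admin socket (the label "updated_wp" is arbitrary):

# Compile the edited config and switch to it without restarting varnishd
varnishadm vcl.load updated_wp /etc/varnish/default.vcl
varnishadm vcl.use updated_wp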

Now almost everything is finished. All that remains is to install the Varnish HTTP Purge plugin to clear the cache when posts are added, and our stack is complete.
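
To check that purging actually works, you can send a PURGE request by hand from one of the hosts allowed in the acl above; "varnish-alias" and the post path are placeholders:

# From an allowed host Varnish purges the object; from anywhere else it answers with the 405 synthetic response
curl -X PURGE http://varnish-alias/sample-post/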

Benchmark tests

Finally, a few ApacheBench tests against our two t2.micro Amazon EC2 instances. I set a concurrency level of 100 with 1000 requests.
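
For reference, the command behind these numbers looks roughly like this (the URL is a placeholder for the site's front page):

# 1000 requests, 100 concurrent, against the frontend
ab -n 1000 -c 100 https://example.com/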

Without caching

With Varnish

As you can see, Varnish gives us, to put it mildly, quite a performance boost 🙂

Useful links