Speed up Ansible

Under the hood of the d2c.io service we use Ansible a lot, from creating and provisioning cloud VMs to orchestrating Docker containers and user apps.

Ansible is a convenient tool that doesn’t require complex setup because of its agentless nature. You don’t need to preinstall any software (agents) on managed hosts; in most cases, you configure servers over an SSH connection. One of the downsides of this simplicity is speed. Depending on your environment and playbook workflow, Ansible can be alarmingly slow: it does all the logic locally, generates a task “package”, sends it to the remote host, executes it, waits for the result, reads and analyzes it, and only then moves on to the next task. In this article we describe several ways to increase that speed.

Test methodology

If you can’t measure it, you can’t improve it. So we are going to write a small script that measures execution time.

Test playbook test.yml:

- hosts: all
  # gather_facts: no
  tasks:
    - name: Create directory
      file:
        path: /tmp/ansible_speed
        state: directory
    - name: Create file
      copy:
        content: SPEED
        dest: /tmp/ansible_speed/speed
    - name: Remove directory
      file:
        path: /tmp/ansible_speed
        state: absent

Time measurement script time_test.sh:

#!/bin/bash
# calculate the mean average of wall clock time from multiple /usr/bin/time results.
# credits to https://stackoverflow.com/a/8216082/2795592

# Start with an empty log file.
cat /dev/null > time.log

# Run the given command 10 times, appending each timing to time.log.
for i in `seq 1 10`; do
    echo "Iteration $i: $*"
    /usr/bin/time -p -a -o time.log "$@"
    # Drop SSH control sockets so that every run starts with a fresh connection.
    rm -rf /home/ubuntu/.ansible/cp/*
done

file=time.log
cnt=0

if [ ${#file} -lt 1 ]; then
    echo "you must specify a file containing output of /usr/bin/time results"
    exit 1
elif [ ${#file} -gt 1 ]; then
    # Collect the "real" (wall clock) values from the log.
    samples=(`grep --color=never real ${file} | awk '{print $2}' | cut -dm -f2 | cut -ds -f1`)

    # Sum up all samples.
    for sample in "${samples[@]}"; do
        cnt=$(echo ${cnt}+${sample} | bc -l)
    done

    # Calculate the 'Mean' average (sum / samples).
    mean_avg=$(echo ${cnt}/${#samples[@]} | bc -l)
    mean_avg=$(echo ${mean_avg} | cut -b1-6)

    printf "\tSamples:\t%s \n\tMean Avg:\t%s\n\n" ${#samples[@]} ${mean_avg}

    grep --color=never real ${file}
fi

So we execute our playbook 10 times and take the mean execution time.

Konstantin Suvorov

Ansible ninja

SSH multiplexing

LAN connection: before 7.68s, after 2.38s
WAN connection: before 26.64s, after 10.85s

The first thing to check is whether SSH multiplexing is enabled and used. It gives a tremendous speed boost because Ansible can reuse open SSH sessions instead of negotiating a new one (actually more than one) for every task. Ansible has this setting turned on by default. It can be set explicitly in the [ssh_connection] section of the configuration file as follows:
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
Be careful when overriding ssh_args: if you don’t set ControlMaster and ControlPersist in your override, Ansible will “forget” to use them. To check whether SSH multiplexing is used, start Ansible with the -vvvv option:

ansible test -vvvv -m ping
You should see the required settings in the output:

SSH: EXEC ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s ... -o ControlPath=/home/ubuntu/.ansible/cp/7c223265ce
This is followed by the setup of the multiplex master socket:

Trying existing master
Control socket "/home/ubuntu/.ansible/cp/7c223265ce" does not exist
setting up multiplex master socket
You can also check that the socket file is present in ControlPath for 60 seconds after the connection (in our example: /home/ubuntu/.ansible/cp/7c223265ce).

Warning: if you work with several identical environments from one Ansible control host (for example, blue/green or stage/prod), be careful not to shoot yourself in the foot. Suppose you have checked something on production servers (e.g. executed configuration steps in check mode). You now have open master sockets that point to the production servers. Then you decide to update your staging environment (which has the same host names as production), and boom… your production is blown up. To prevent this, always close or clean up master sessions when switching environments, or set a unique ControlPath for each environment.
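
One way to keep the sockets apart is to give every environment its own socket directory. The following is a minimal sketch, not a definitive recipe: it assumes the ANSIBLE_SSH_CONTROL_PATH_DIR environment variable (the counterpart of the control_path_dir setting of the ssh connection plugin) is honored by your Ansible version, and the function names and the cp-<environment> directory layout are hypothetical.

```shell
# Build a control-socket directory that is unique per environment.
# The "cp-<environment>" naming is an assumption made for this sketch.
control_path_dir_for() {
    env_name=$1
    echo "${HOME}/.ansible/cp-${env_name}"
}

# Run any command with Ansible pointed at that environment's socket directory.
with_env_sockets() {
    env_name=$1
    shift
    ANSIBLE_SSH_CONTROL_PATH_DIR=$(control_path_dir_for "$env_name") "$@"
}
```

For example, with_env_sockets staging ansible-playbook -i staging.ini site.yml (inventory and playbook names are placeholders) would never reuse a master socket opened during a production run.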


Pipelining

LAN: before 2.38s, after 1.96s
WAN: before 10.85s, after 5.23s

Here is the default workflow of module execution:

  • Generate Python-file with module and its parameters for remote execution
  • Connect via SSH to detect remote user home directory
  • Connect via SSH to create temporary work directory
  • Connect via SSH to upload Python-file via SFTP
  • Connect via SSH to execute Python-file and cleanup temp dir
  • Get module’s result from SSH standard output

Now multiply this by the number of tasks and loop iterations in your playbook to imagine the overhead. This approach reliably works for different types of modules on a variety of target systems. However, if you use only native Ansible modules and modern target boxes, you can enable pipelining mode. Here’s the parameter (it goes into the [ssh_connection] section of ansible.cfg):

pipelining = true

Here is the workflow with pipelining mode enabled:

  • Generate Python-file with module and its parameters for remote execution
  • Connect via SSH to execute Python interpreter
  • Send Python-file content to interpreter’s standard input
  • Get module’s result from standard output

As a result: one SSH connection instead of four! The speed boost is significant, especially over WAN connections.

To check whether pipelining is in use, call ansible with verbose output, for example:

ansible test -vvv -m ping

If you see several ssh calls:

SSH: EXEC ssh ...
SSH: EXEC ssh ...
SSH: EXEC sftp ...
SSH: EXEC ssh ... python ... ping.py

Then pipelining is NOT in use. If there is a single ssh call:

SSH: EXEC ssh ... python && sleep 0

Then pipelining is working.
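
This check can be scripted by counting the SSH: EXEC lines. The sketch below assumes the verbose output of a single module run against a single host, with one SSH: EXEC line per connection as in the samples above; the function names are hypothetical.

```shell
# Count the "SSH: EXEC" lines in Ansible's verbose output (read from stdin).
# Assumption: every ssh/sftp invocation is logged on its own line.
count_ssh_exec() {
    grep -c 'SSH: EXEC' || true
}

# Report whether a single module run used one connection or several.
pipelining_status() {
    n=$(count_ssh_exec)
    if [ "$n" -eq 1 ]; then
        echo "pipelining in use"
    elif [ "$n" -gt 1 ]; then
        echo "pipelining NOT in use ($n SSH calls)"
    else
        echo "no SSH calls found"
    fi
}
```

Usage: ansible test -vvv -m ping | pipelining_status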

By default, this setting is turned off in Ansible because of a possible conflict with the requiretty setting for sudo. At the time of writing, requiretty is disabled on recent Ubuntu and RHEL images in the Amazon EC2 cloud, so you can safely enable pipelining on these distributions.

PreferredAuthentications and UseDNS

LAN: before 1.96s, after 1.92s
WAN: before 5.23s, after 4.92s


UseDNS is an SSH server setting (in the /etc/ssh/sshd_config file) that forces the server to check the client’s PTR record upon connection. It may cause connection delays, especially with slow DNS servers on the server side. In modern Linux distributions this setting is turned off by default, which is correct.


PreferredAuthentications is an SSH client setting that tells the server which authentication methods the client prefers. By default Ansible uses:

-o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey

So if GSSAPIAuthentication is enabled on the server (at the time of writing it is turned on in the RHEL EC2 AMI), it will be tried first, forcing the client and server to make PTR record lookups. In most cases, though, we want to use only public key authentication. We can force Ansible to do so by changing ansible.cfg:

ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o PreferredAuthentications=publickey

This eliminates unnecessary steps and speeds up the initial ssh master connection.

Facts gathering

LAN: before 1.96s, after 1.47s
WAN: before 4.92s, after 4.77s

At the start of playbook execution, Ansible collects facts about the remote system (this is the default behaviour for ansible-playbook, but it does not apply to ansible ad-hoc commands). It is similar to calling the setup module and thus requires another ssh communication step. If you don’t need any facts in your playbook (as in our test playbook), you can disable fact gathering:

gather_facts: no

If you often run playbooks that depend on facts, but fact gathering slows your runs down, consider setting up an external fact-caching backend. For example, you can define a Redis backend, collect facts with an hourly cron job, and disable fact gathering in your playbooks in favor of the cached facts.
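
As an illustration, the helper below prints an ansible.cfg fragment for such a Redis backend. It is a sketch under assumptions: Redis is taken to listen on localhost:6379 (database 0), facts are considered valid for one hour, and the function name is hypothetical. The Python redis package must also be installed on the control host, and newer Ansible releases may expect a different name for the cache plugin. The gathering = smart line is an alternative to disabling gathering outright: it skips gathering for hosts whose facts are already cached.

```shell
# Print an ansible.cfg fragment that enables Redis-backed fact caching.
# Assumptions: Redis on localhost:6379, db 0, cache entries valid for 3600 s.
fact_cache_cfg() {
    cat <<'EOF'
[defaults]
gathering = smart
fact_caching = redis
fact_caching_connection = localhost:6379:0
fact_caching_timeout = 3600
EOF
}
```

Run fact_cache_cfg and merge the printed keys into the existing [defaults] section of your ansible.cfg.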


Control host location

Before 4.77s, after 1.47s

Your WAN connection may have good bandwidth and latency, but a LAN connection is better. If you manage a multitude of hosts in, say, the Amazon EC2 eu-west-1 region, you can expect a significant speed boost if your Ansible control machine is also in that region. The rule of thumb is to move the control host closer to the managed systems.


Pull mode

Before 1.47s, after 1.25s

Need even more speed? Execute playbooks locally on the remote servers. There is the ansible-pull tool for that; you can read about it in the official Ansible docs. It works as follows:

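  • Clones the specified repo into a local subdirectory
  • Executes the specified playbook with a local connection (-c local option)
  • If the playbook name is omitted, tries to execute, in order:
    • <fqdn>.yml (named after the host’s fully qualified domain name)
    • <hostname>.yml (named after the host’s short name)
    • local.yml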

One possible workflow is to run ansible-pull --only-if-changed as a cron job: it monitors the target repository and executes the playbook whenever there is a change.


Forks

So far we have discussed how to speed up playbook execution on a given remote host. But if you run a playbook against tens or hundreds of hosts, Ansible’s internal performance becomes a bottleneck. For example, there’s a preconfigured number of forks: the number of hosts that Ansible can interact with simultaneously. You can change this value in the ansible.cfg file:

forks = 20

The default value is 5, which is quite conservative. You can experiment with this setting depending on your local CPU and network bandwidth resources.

Another thing about forks: if you have a lot of servers to work with and a low number of available forks, your master ssh sessions may expire between tasks. Ansible uses the linear strategy by default, which executes one task on every host before proceeding to the next task. So if the time between executing a task on the first server and on the last one is greater than ControlPersist, the master socket will have expired by the time Ansible starts the following task on the first server, and a new ssh connection will be required.
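
A rough back-of-the-envelope check makes this concrete. The sketch below uses a deliberately simplified model, which is an assumption and not how Ansible schedules work exactly: hosts are processed in batches of size forks, and every batch takes about the same whole number of seconds per task. The function names are hypothetical.

```shell
# Estimated gap (seconds) between two consecutive tasks on the same host
# under the linear strategy: ceil(hosts / forks) batches per task,
# each batch taking about task_seconds.
seconds_between_tasks() {
    hosts=$1
    forks=$2
    task_seconds=$3
    batches=$(( (hosts + forks - 1) / forks ))
    echo $(( batches * task_seconds ))
}

# Succeeds when the master socket is expected to outlive that gap.
# Arguments: hosts, forks, task_seconds, ControlPersist (seconds).
socket_survives() {
    gap=$(seconds_between_tasks "$1" "$2" "$3")
    [ "$gap" -lt "$4" ]
}
```

With 200 hosts, 5 forks and about 2 seconds per task, the estimated gap is 40 × 2 = 80 seconds, which exceeds ControlPersist=60s; raising forks to 20 brings it down to 10 × 2 = 20 seconds.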

Poll Interval

When a module is executed on a remote host, Ansible starts to poll for its result. The shorter the interval between poll attempts, the higher the CPU load on the Ansible control host. But we want to keep CPU available for a greater number of forks (see above). You can tweak the poll interval in ansible.cfg:

internal_poll_interval = 0.001

If you run “slow” jobs (like backups) on multiple hosts, you may want to increase the interval to 0.05 to use less CPU.

PS: We hope this helps you speed up your setup. It seems there are no more items left on the environment checklist, so further speed gains are only possible by optimizing your playbook code.
