Last week I took part in an experiment testing the scalability of Docker Swarm.

The goal was to hit 2000 nodes and run containers on them. In the end it was somewhere around 2600 nodes and ~97k containers.

This was very interesting for me since 2,000 "hosts" and tens of thousands of containers is what you have to expect at any real enterprise once it starts putting workloads into Docker. I'm always curious how lifecycle and all the other maintenance aspects will look in a scenario of this size - or larger, since things tend to grow :-)

The experiment started with a warmup phase, and then they just kept adding more nodes and launching more containers. Many people hit various issues at their providers when starting this many systems, e.g. API call throttling, account quotas and even some bugs.

I started preparing one day before the official start and joined with 15 virtual machines which were hosted at Scaleway.

I'd have loved to use my own OpenNebula-based hardware and add another 1000 virtual machines (there's even a docker-machine driver), but the experiment still required IPv4 connectivity for each node. And I sure as hell don't have 1000 spare IPv4 addresses.

The good thing was that I already had a script to create systems at Scaleway using the API client, plus a configuration management node there, based on Rudder.

So, the workflow was like this:

  • Use the script to create all the VMs
  • Using the script, configure them to ring up the configuration management node
  • Allow them to access configuration management
  • Let them be configured automatically
  • Make sure the automatic policy is doing a good job, i.e. applying updates overnight etc.
  • Let them automatically join the largest swarm ever made

 

It worked well, and below is how.

 

Creating the nodes at Scaleway

 

Below is the script I used to create the nodes at Scaleway.

I could have made an image instead, but I'm just not there yet. An image would have made creation a lot faster (as it was, it took about 30 minutes for the 15 nodes, whereas an image can be deployed in parallel).

But it's a script, and the good side of that is that it's easily modified and can be run in steps until everything works. In the future I imagine this script will be the starting point, and once things "look good" I'll redo the steps in image-builder.

 

The script iterates over a number of systems and creates them via the Scaleway API.

The server type is the smallest available, and the OS image is Debian Jessie because I had already created autopatch configs etc. for Debian.

One important thing it does is reuse the Scaleway server UUID as the node's Rudder UUID. It also adds the GPG keys for Rudder and Docker.

In the future I could deploy the Docker key from within Rudder, but I was too tired to extract the GPG key.

Either way, I felt it was important not to make a big mess of the system's security right at the start.

 

#!/usr/bin/env ksh

servers_running=$(scw ps -a)
datadef="tswarm VC1S"

name_server()
{
  _sname=${_app}-${idx}
  export _sname
}

test_server()
{
  echo "$servers_running" | grep -q $1
  return $? 
}


create_server()
{
  # create an empty, not-running server
  ID=$(scw create --commercial-type=$_size --name=$1 Debian_Jessie)
  export ID
}

start_server()
{
  # scw _patch is used to set the bootscript that locks in a Docker-ready kernel
  # with AUFS support
  # its ID can be obtained if you first manually switch the kernel of one system
  # and then run "scw inspect $ID"
  scw _patch $ID bootscript="aa9f03c9-5d0e-42bb-82b1-0a73e29501a0"
  scw start $ID
  return $?
}

exec_server()
{
  # wait for the server to come up, then run the base config on it
  # replace RUDDER_MASTER_IP below with the address of your Rudder master :-)
  scw exec --wait $1 "echo server is now up"
  scw exec $1 mkdir -p /opt/rudder/etc /var/rudder/cfengine-community
  scw exec $1 "echo $ID > /opt/rudder/etc/uuid.hive"
  scw exec $1 "echo RUDDER_MASTER_IP > /var/rudder/cfengine-community/policy_server.dat"
  scw exec $1 apt-get update
  scw exec $1 apt-get -o Dpkg::Options::="--force-confold" --force-yes -y install apt-transport-https ca-certificates
  scw exec $1 'apt-key adv --recv-keys --keyserver keyserver.ubuntu.com 474A19E8'
  scw exec $1 'apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D'
  scw exec $1 'echo "deb http://www.rudder-project.org/apt-3.2/ jessie main" > /etc/apt/sources.list.d/rudder.list'
  scw exec $1 apt-get update
  scw exec $1 apt-get -o Dpkg::Options::="--force-confold" --force-yes -y install rudder-agent xfsprogs
  return $?
}

idx=1
echo "$datadef" | \
while read _app _size ; do
  while [ $idx -lt 15 ]; do
    idx=$(( $idx + 1 ))
    echo $_app $_size
    name_server
    test_server $_sname
    if [ $? != 0 ]; then
      echo "$_sname will be created"
      create_server  $_sname &&
      start_server   $_sname &&
      exec_server    $_sname
    fi
  done
done
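
For reference, this is roughly how the bootscript UUID used in start_server() can be looked up: switch the kernel of one system by hand in the Scaleway console, then inspect that server with the scw client and pick out the bootscript identifier. The server name here is just an example, and the exact JSON field names depend on the scw version, so treat this as a sketch:

# Manually switch one server to the Docker-ready (AUFS) kernel in the Scaleway
# console first, then dump its metadata and look for the bootscript identifier:
scw inspect tswarm-2 | grep -i -A 3 bootscript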

 

 

The configuration in Rudder

 

Rules

Global Configuration for All Nodes

First, there's a rule named Global Configuration for All Nodes which handles all the other stuff you want to have by default.

In my case for non-personalized lab systems, that's some add-on utilities, sudo rules and the combo of etckeeper+unattended-upgrades to automatically protect the system.
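
For reference, on a plain Debian system that combination boils down to roughly the following (a minimal sketch, not my actual Rudder directives; the apt.conf.d switches are the standard Debian ones):

# Put /etc under version control and pull in automatic upgrades
apt-get -y install etckeeper unattended-upgrades

# Standard Debian switches to run unattended-upgrades periodically
cat > /etc/apt/apt.conf.d/20auto-upgrades <<'EOF'
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
EOF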

 

app-DOCKERDOCKERDOCKER

(Screenshot: the status of my Rudder rule app-DOCKERDOCKERDOCKER.)

 

This rule links a number of policy directives to a group that matches all nodes meant to be part of the experiment.

 

The interesting part here is the *group*, which is dynamically generated from all nodes whose name matches "tswarm-.*".

Note that dynamic means I do not have to do anything other than "accept" a node on the Rudder master.

The policies then just kick in, because that's their whole point.

(In a prod environment the "accept" part is also automated, so you do exactly nothing.)

(Screenshot: the components of this rule on both axes - directives, aka settings, on one and nodes on the other.)

 

Settings in detail

That's all the basics; here's what things are actually configured to do:

 

Docker setup

The first part does all the installation; this is as easy as it gets.
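
Done by hand, that part would look something like this (a sketch assuming the docker-engine package and apt repository that were current for Jessie at the time; the creation script above already takes care of the GPG key):

# Add the Docker apt repository and install the engine
echo "deb https://apt.dockerproject.org/repo debian-jessie main" > /etc/apt/sources.list.d/docker.list
apt-get update
apt-get -y install docker-engine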

But then a bad thing happened:

The install did not work! The reason was that the default kernel at Scaleway didn't work for Docker.

So I switched the kernel at Scaleway and temporarily put a cronjob directive in place that would reboot the systems at night.

I woke up to systems with running Docker daemons, so point proven.

Later I found out how to set the bootscript automatically; that modification is already included in the example script above.

Now, one more thing about dynamic groups. Imagine I had some servers with the new bootscript and a few with the old one.

One way to bring them over would be to update the bootscript everywhere and add a "node property" (a key-value setting) that reboots a system if its bootscript has already been switched but its running kernel does not match - as sketched below.
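
A minimal sketch of the check behind such a directive could look like this (the kernel version string and the way it reaches the node via a node property are made up for illustration):

# Reboot only if the bootscript has been switched but the old kernel is still running
wanted_kernel="4.5.7-docker-1"   # hypothetical value, injected e.g. via a node property
if [ "$(uname -r)" != "$wanted_kernel" ]; then
  echo "kernel mismatch, scheduling reboot"
  shutdown -r +5 "rebooting into the docker-ready kernel"
fi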

 

Swarm join

The next and final one is a double step:

Track your configured swarm herder in a file in /etc - and if that file changes, leave and then join the swarm.

 

Just to be clear, the file is filled automatically, and the other two "command" sections follow suit.

I didn't configure the master / pw using a variable since I was really eager to be done at some point :-)
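
Conceptually, the two "command" parts do no more than the following (a sketch assuming the Docker 1.12 swarm mode CLI; the file name, token and port are placeholders, and in reality Rudder does the change detection and fills in the values):

# /etc/swarm_manager is the tracked file; when Rudder rewrites it,
# leave the old swarm and join the new one
MANAGER=$(cat /etc/swarm_manager)            # placeholder path
docker swarm leave --force || true
docker swarm join --token "$SWARM_TOKEN" "$MANAGER:2377"   # SWARM_TOKEN is a placeholder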

That's all there was!

 

Time

 

| What I spent time on | Time spent | Fun factor | Did it work? |
| --- | --- | --- | --- |
| Using existing policies for base setup | Close to nothing, I didn't even check, it just worked, from keys to updates | none | yes |
| Modifying the Scaleway script to set up for Docker | Very little | Great! | yes |
| Trying to fix Scaleway issues and configure the bootscript (kernel with Docker AUFS support) | About 80% of the time, multiple hours and no success | SUCKED | partially |
| Maintenance of rules (turning off old ones) | Quite a bit, since there's no multiselect | SUCKED | yes |
| Maintenance of failed deployments | Quite a bit, but I was stupid - I forgot I keyed by UUID - I could've had a full workflow for this | reluctance | yes |
| Creating the policy for Docker + Swarm | Very, very little | Great! | yes |
| Finding out how to automatically set the bootscript | Very little (once I was told how) | Great! | yes |