Last week I took part in an experiment testing the scalability of Docker Swarm.
The goal was to hit 2000 nodes and run containers on them. In the end it was somewhere around 2600 nodes and ~97k containers.
This was very interesting for me, since 2000 "hosts" and tens of thousands of containers is what you have to expect in any real enterprise once it starts putting workloads into Docker. I'm always curious how lifecycle and all the other maintenance aspects will look in a scenario of this size - or larger, since things tend to grow :-)
The experiment started with a warmup phase, and then they just kept adding more nodes and launching more containers. Many people hit various issues at their providers when starting this many systems, e.g. API call throttles, account quotas and even some bugs.
I started preparing one day before the official start and joined with 15 virtual machines which were hosted at Scaleway.
I'd have loved to use my own OpenNebula-based hardware and add another 1000 virtual machines (there's even a docker-machine driver), but the experiment required IPv4 connectivity for each node. And I sure as hell don't have 1000 spare IPv4 addresses.
The good thing was I already had a script to create systems at Scaleway using the API client, and a config management node there, based on Rudder.
So, the workflow was like this:
- Use the script to create all the VMs
- Using the script, configure them to ring up the configuration management node
- Allow them to access configuration management
- Let them automatically be configured
- Make sure the automatic policy is doing a good job, i.e. applying updates overnight etc.
- Let them automatically join the largest swarm ever made
It worked well, and below is how.
Creating the nodes at Scaleway
This is the script I used to create the nodes at Scaleway:
I could have made an image, but I'm just not there yet. An image would have made the creation a lot faster (as it was, it took about 30 minutes for 15 nodes; an image can be deployed in parallel).
But it's a script - the good side of this is that it's easily modified and can be run in steps until everything works. For the future I imagine this script will be the starting point, and once things "look good" I'll re-do the steps in image-builder.
The script iterates over a number of systems and creates them via the Scaleway API.
The server type is the smallest possible, and the OS image used is Debian Jessie, because I had already created autopatch configs etc. for Debian.
One important thing it does is use the Scaleway UUID for Rudder. It also adds the GPG keys for Rudder and Docker.
In the future I could distribute the Docker key from within Rudder, but I was too tired to extract the GPG key.
Either way, I felt it's important not to make a big mess of the system's security right at the start.
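Roughly, the creation loop looks like the sketch below. It is not the original script: it assumes the classic scw CLI (flag names, image name and commercial type are from memory and may differ), and bootstrap-node.sh is a hypothetical helper standing in for the per-node setup described above.

```bash
#!/bin/bash
# Sketch of the creation loop - not the original script.
# Assumes the classic "scw" CLI; flags, subcommands, image name and
# commercial type are from memory and may differ.
set -e

for i in $(seq -w 1 15); do
    name="tswarm-${i}"

    # Smallest possible server type, Debian Jessie image
    id=$(scw create --name="${name}" --commercial-type="VC1S" "Debian Jessie")
    scw start "${id}"

    # Wait until the node is up and reachable over SSH (flag name may differ)
    scw exec --wait "${id}" "uname -a"

    # bootstrap-node.sh is a hypothetical helper: it adds the Rudder and
    # Docker GPG keys, installs rudder-agent, points it at the policy server
    # and reuses the Scaleway UUID ("$id") as the node's Rudder UUID, so the
    # node shows up as "pending" on the Rudder master.
    scw cp bootstrap-node.sh "${id}:/root/bootstrap-node.sh"
    scw exec "${id}" "bash /root/bootstrap-node.sh ${id}"
done
```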
The configuration in Rudder
Global Configuration for All Nodes
First, there's a rule named Global Configuration for All Nodes which handles all the other stuff you want to have by default.
In my case, for non-personalized lab systems, that's some add-on utilities, sudo rules and the combo of etckeeper + unattended-upgrades to automatically protect the system.
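"Automatically protect" is nothing fancy on Debian; it boils down to installing the two packages and enabling the periodic apt runs, roughly like this (a sketch using the stock Debian settings, not necessarily the exact values the directive enforces):

```bash
# Sketch: what the base policy effectively ensures on each Debian node.
# Stock Debian settings, not necessarily the exact values of the directive.
apt-get install -y etckeeper unattended-upgrades

# Enable the periodic apt runs so unattended-upgrades actually fires
cat > /etc/apt/apt.conf.d/20auto-upgrades <<'EOF'
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
EOF
```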
This shows the status of my Rudder rule app-DOCKERDOCKERDOCKER
This rule links a number of policy directives to a group containing all nodes that are meant to be part of the experiment.
The interesting part here is the *group*, which is dynamically generated from all nodes whose name matches "tswarm-.*".
Note that dynamic means I don't have to do anything other than "accept" a node on the Rudder master.
The policies then just kick in because that's their whole point.
(In a prod environment the "accept" part is also automated so you do exactly nothing)
Here you can see the components of this rule on both axes (directives, aka settings, and nodes):
Settings in detail
That's all the basics; here's what things are actually configured to do:
The first part does all the installation - this is as easy as it gets.
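On Debian Jessie at that time, "the installation" means the docker-engine package from the apt.dockerproject.org repository, so the directive boils down to something like this (a sketch; the GPG key was already put in place by the creation script):

```bash
# Sketch: roughly what the install directive ensures on Debian Jessie.
# The Docker repository GPG key was already added by the creation script.
apt-get install -y apt-transport-https ca-certificates
echo "deb https://apt.dockerproject.org/repo debian-jessie main" \
    > /etc/apt/sources.list.d/docker.list
apt-get update
apt-get install -y docker-engine
```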
But, a bad thing happened:
The install did not work! The reason was that the default kernel at Scaleway didn't work with Docker.
So I changed the kernel at Scaleway and temporarily put a cronjob directive in place that would reboot the systems at night.
I woke up to systems with running docker daemons, so point proven.
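The temporary directive did little more than drop a cron entry on every node, something like this (the time and file name here are made up):

```bash
# Sketch of the temporary reboot directive: a nightly cron entry so every
# node comes back up on the new bootscript/kernel.
# (Time and file name are made up; the directive was only temporary.)
cat > /etc/cron.d/tswarm-reboot <<'EOF'
30 3 * * * root /sbin/shutdown -r now "switch to a docker-capable kernel"
EOF
```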
Later I found out how to set the bootscript automatically; that modification is already included in the example script above.
Now, one more thing about dynamic groups. Imagine I had some servers with the new bootscript and a few with the old one.
One way to bring them over would be to update the bootscript everywhere and add a "node property" (a key-value setting), so that a system gets rebooted when its bootscript has already been switched but its running kernel doesn't match yet!
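On the node side, such a check could look like the sketch below; the expected kernel version would come from the node property (the value here is made up), and in Rudder this would of course live in a directive rather than a hand-written script.

```bash
#!/bin/bash
# Sketch: reboot a node whose bootscript has already been switched but
# which is still running the old kernel. "expected_kernel" would come
# from a Rudder node property; the value below is made up.
expected_kernel="4.5.7-std-1"

if [ "$(uname -r)" != "${expected_kernel}" ]; then
    # The bootscript already points at the new kernel, but we haven't
    # booted into it yet - schedule a reboot.
    /sbin/shutdown -r +5 "rebooting into ${expected_kernel}"
fi
```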
The next and final part is a double step:
Track your configured swarm herder in a file in /etc - and if that file changes, leave and then re-join the swarm.
Just to be clear: the file is filled automatically, and the other two "command" sections follow suit.
I didn't configure the master / password using a variable since I was really eager to be done at some point :-)
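In shell terms, the double step is roughly the following (a sketch assuming Docker's swarm mode join/leave commands; the tracking file, manager address and token are placeholders, and the "only when the file changed" trigger is handled by Rudder, not by the script itself):

```bash
#!/bin/bash
# Sketch of the leave/re-join step. /etc/swarm-manager is a hypothetical
# tracking file maintained by the directive, containing the manager
# address and the join token (placeholders below).
read -r manager token < /etc/swarm-manager   # e.g. "10.1.2.3:2377 SWMTKN-1-..."

# Rudder triggers this only when the tracked file has changed:
# leave whatever swarm we were in, then join the configured one.
docker swarm leave --force || true
docker swarm join --token "${token}" "${manager}"
```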
That's all there was!
| What I spent time on | Time spent | Fun Factor | Did it work |
|---|---|---|---|
| Using existing policies for base setup | Close to nothing, I didn't even check, just worked from keys to updates | none | yes |
| Modifying Scaleway scripts to set up for docker | Very little | Great! | yes |
| Trying to fix Scaleway issues and configure bootscript (kernel with docker AUFS support) | About 80% of time, multiple hours and no successes | SUCKED | |
| Maintenance of Rules (turn off old ones) | Quite a bit since there's no multiselect | SUCKED | yes |
| Maintenance of failed deployments | Quite a bit, but I was stupid - I forgot I keyed by UUID - I could've had a full workflow for this | | |
| Creating Policy for docker + swarm | Very very little | Great! | yes |
| Find out how to automatically set bootscript | Very little (once I was told how) | Great! | yes |