Make your own configuration deployment system, part 1

Introduction

In this series of articles, I describe the steps for building a flexible configuration deployment system tailored to your needs. It can be as simple or as complete as you care to make it. And since you made it, you will understand it intimately.

If you have two or more machines to manage, you have probably noticed that they have certain similarities of configuration.

These similarities may include

network configuration
basic package list
configurations of packages
aliases and shortcuts
internationalisation settings

You may have spent an enormous amount of time finding the ideal configuration for a piece of software and you would really regret losing your masterpiece in an unfortunate accident. Or you may need to rapidly deploy the same configuration change to a hundred machines. Or you may simply be tired of repeating the same procedures every time you install a new machine.

A configuration deployment system can greatly reduce the amount of work necessary to manage two or more machines, but the amount of time necessary to learn the ins and outs of the existing systems may be daunting. ISconf, FAI, cfengine, debconf+LDAP, Subversion, etc. all have their strong points, but if you are just getting started, they are probably overkill. One solution is to build your own system from scratch.

Essential components

The essential components of the system are:

a configuration repository, i.e. what to deploy: the database of configuration files, data, package lists, scripts, and jobs
a configuration transfer method, i.e. how to get the data to the clients
a collection of deployment scripts, i.e. how to apply the data to the clients

Depending on your needs, you can use solutions that overlap the boundaries of these functional divisions, or you can keep them strictly separate, which allows you to easily substitute methods or build on them if the need arises.

Configuration repository

You have a wide choice available for a configuration repository. Here is a non-exhaustive list of possibilities:

directories of files, one directory for each machine
directories of files, organised by classes
tarballs (or .deb or .rpm packages), one for each machine
versioning systems like CVS or Subversion
LDAP server
SQL database

There is a choice of media too. You can use a network-connected server or some removable media like a floppy, USB key, or a CDROM.

Note that one is not limited to one configuration repository - you can have multiple repositories, but you will have to make decisions about their priorities and what to do if a repository fails.

Configuration transfer method

You need a method to get your configuration from the repository to the client machines. This is somewhat determined by your choice of repository, but there is still some flexibility.

Here is yet another non-exhaustive list of common methods:

direct copy from removable media
direct copy from network-mounted share
rsync or scp or SSH
download from FTP or web server
versioning system check-out (CVS, Subversion, etc.)
transfer integrated into configuration-management software (cfengine)

And here are some more exotic methods for transferring configuration info:

POP or IMAP
LDAP query
SQL query
SNMP query
DHCP query (somewhat limited)
IRC download (think "botnet")
peer-to-peer (like Bittorrent)
DNS query (!)

Deployment scripts

After you get the configuration info to the client it must be used, but how? Again, you have a lot of flexibility.

Config files can simply be copied into place automatically, or they can first be manipulated in a local workspace to resolve configuration priorities coming from several repositories and only then copied into place.
Scripts can be used to automatically edit configuration files and registries using the new values of various parameters if a change is necessary.
Little jobs to check/signal/reload/restart daemons can be triggered if the configuration changes.
Old config files can be backed up before being overwritten.
A configuration roll-back mechanism can be implemented.
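
The backup and daemon-reload ideas in this list can be combined in one small helper. What follows is only a minimal sketch under stated assumptions: the function name install_if_changed, the BACKUP_DIR variable and its default location, and the return-code convention are all invented for illustration, and the target path is expected to be absolute.

```shell
#!/bin/sh
# Hypothetical helper, not part of any existing tool.
# Copies SRC over DST only when their contents differ, after saving
# the old DST under BACKUP_DIR (assumed location, change as needed).
# Returns 0 if DST was replaced, 1 if it was already identical,
# 2 if something went wrong.
: "${BACKUP_DIR:=/var/backup/cfg}"

install_if_changed() {
    src=$1
    dst=$2      # must be an absolute path
    if [ -e "$dst" ]; then
        # cmp -s prints nothing and exits 0 when the files are identical
        cmp -s "$src" "$dst" && return 1
        mkdir -p "$BACKUP_DIR$(dirname "$dst")" || return 2
        cp -p "$dst" "$BACKUP_DIR$dst" || return 2
    fi
    cp -p "$src" "$dst" || return 2
    return 0
}

# Example (the reload command is a Debian-style assumption):
# if install_if_changed /srv/cfg/site/etc/ssh/sshd_config /etc/ssh/sshd_config; then
#     /etc/init.d/ssh reload
# fi
```

Because the helper reports whether anything changed, the daemon is only reloaded when its configuration file was actually replaced. Note that an "if" test treats return code 2 (error) the same as 1 (unchanged), so a real script would probably want to check for errors separately.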

Research and define your needs

One of the most thoroughly thought-out configuration systems is ISconf, found at www.isconf.org. ISconf is probably too complicated for a beginner and overkill for just a few systems, but the philosophy and history of the system are detailed at www.infrastructures.org, and it is well worth the time to read the paper "Bootstrapping an Infrastructure" at http://www.infrastructures.org/papers/bootstrap/bootstrap.html.

Since I usually use Debian or Ubuntu, my preferred installation/configuration system is FAI, "Fully Automatic Installation", http://www.informatik.uni-koeln.de/fai/.

One of the sub-systems used by FAI is cfengine, www.cfengine.org, which is itself a self-contained high-level scripting language and configuration deployment system.

Before you reach for your favorite scripting language, think about what you want your system to manage now and in the future. A few hours of reading and reflection at this point could save a few false starts and re-inventions of the wheel.

contents of system config files only?
file permissions and ownerships?
user files too?
changes are fully automatic or just advisory?
push or pull?
polled or instantaneous changes?
logging?
backups?
roll-back capability?
multiple sources?
package management?
multiple distributions?
multiple OS?
how many sites?
integration with present systems?
preserve local admin changes?
bullet-proof or hackware?
cryptographically secured?
management interface other than the command-line+vi?
uploading of local changes?
confirmation of changes?

Hints and warnings

Organise your deployment by following the checklist at http://www.infrastructures.org/bootstrap/checklist.shtml. The principle is to always assemble the lowest-level infrastructure first in order to save time assembling the rest.

Make sure that everything in your DNS is complete and perfectly correct. A misspelling of a machine name or a false address will cause all sorts of time-wasting mysteries.

Use NTP to make sure every machine knows precisely what time it is. Otherwise, updates based on "make" or on file time-stamps can fail in bizarre ways.

Decide on a method for dealing with local changes (AKA cowboy admins). You might consider strictly forbidding local changes to configuration, as Infrastructures.Org and FAI recommend.

Install integrit or some other file-system integrity checker and tune it so that configuration changes are obvious. That is, tune it to ignore files that are expected to change so that the reports are always tiny.

Simple examples

Here are some simple examples of configuration deployment systems. For small networks composed of a small number of more-or-less identical machines all on one site, these examples may be all that you need. The examples also illustrate how the functions of configuration repository, transfer, and deployment scripts can overlap.

Simple recursive copy

Assume that you have a directly-accessible repository directory /srv/cfg/site/etc. It contains only /etc files that are valid for every machine at your site, e.g. /etc/resolv.conf, /etc/hosts. To deploy these files, just copy them recursively into place using the GNU "cp" command and its "-a" or "--archive" option to preserve modification times, ownerships, and permissions:

        cp -a /srv/cfg/site/etc -T /etc

There are a few problems with the above example. Firstly, the files will be copied every time the command is run even if the source and target files are already identical. Apart from being inefficient, this might cause file integrity systems (like integrit) to trigger a useless warning. Secondly, if modifications were made to the files in /etc but the repository was not updated, the changes will be wiped out without a backup. Nevertheless, if your needs are simple and you intend to manually run the command only on the rare occasions that there is a change, this may be all that you need.

Congratulations - you are done.
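
If the first problem matters to you, one possibility is to check which files would actually change before running the copy. The following is a rough sketch, not a finished tool; the function name list_changed is hypothetical, and it only looks at regular files in the repository.

```shell
#!/bin/sh
# Hypothetical helper: print the path of every file under TARGET whose
# counterpart in the repository directory REPO has different contents,
# or which does not exist under TARGET yet.
list_changed() {
    repo=$1
    target=$2
    ( cd "$repo" && find . -type f ) | while IFS= read -r f; do
        f=${f#./}
        # cmp -s exits non-zero when the files differ or one is missing
        cmp -s "$repo/$f" "$target/$f" 2>/dev/null || printf '%s\n' "$target/$f"
    done
}

# Example: copy only when at least one file would change.
# if [ -n "$(list_changed /srv/cfg/site/etc /etc)" ]; then
#     cp -a /srv/cfg/site/etc -T /etc
# fi
```

This still copies every file when any one of them differs, but it avoids touching /etc at all on the (hopefully common) runs where nothing has changed.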

Simple recursive update (based on file mod time) with backups

GNU cp has two options that are interesting: the "-u" or "--update" option that will copy a source file only if its modification time is newer than the target file and the "-b" or "--backup" option that makes a single or incrementally-numbered backup of the target file if a copy is done. Here is how they might be used:

        cp -u -a --backup=numbered /srv/cfg/site/etc -T /etc

This method has problems too. You end up with /etc directories cluttered with backup files with names like "hosts.~4~" that need to be dealt with. And if one of your target files is touched, which changes the modification timestamp, the cp will not copy the source to the target since the target is newer. This is a problem if all machines are supposed to be always using the canonical configuration file from the repository. Local administrators might consider this problem to be a feature and not a bug.
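
To deal with the clutter, the numbered backups can at least be listed with find. This is a small sketch; the function name list_numbered_backups is made up, and the name pattern is an approximation of what "cp --backup=numbered" produces.

```shell
#!/bin/sh
# Hypothetical helper: list files under directory DIR whose names look
# like numbered backups, e.g. "hosts.~4~".
list_numbered_backups() {
    find "$1" -type f -name '*.~[0-9]*~'
}

# Example: review the list, then move the backups out of /etc
# (the destination directory is an arbitrary choice):
# list_numbered_backups /etc | while IFS= read -r f; do
#     mkdir -p "/var/backup/etc-old$(dirname "$f")" &&
#         mv "$f" "/var/backup/etc-old$f"
# done
```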

Simple recursive update (based on contents) with backups

Ideally, the updates should be based upon the files' contents, not their modification times. By default rsync will update only files with differing modification times or sizes. The "-I" (or "--ignore-times") option tells it not to skip files just because their size and modification time match, and the "-c" (or "--checksum") option tells it to compare checksums of the file contents instead. In addition, one can specify a separate directory for keeping backed-up files:

        rsync -I -c -a --backup --backup-dir=/var/backup /srv/cfg/site/etc/ /etc
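
Before letting rsync loose on /etc, it can be reassuring to see what it would do. rsync has a "-n" (or "--dry-run") option for this, and "-i" (or "--itemize-changes") prints one line per item that would be updated. The wrapper function name preview_update below is hypothetical; the rsync options are the same as in the command above, plus these two.

```shell
#!/bin/sh
# Hypothetical helper: show what the content-based update would change
# without modifying the target. SRC and DST are passed through to rsync.
preview_update() {
    rsync -n -i -I -c -a "$1" "$2"
}

# Example:
# preview_update /srv/cfg/site/etc/ /etc
```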

Simple recursive update from a remote repository with date-organised backups

Of course rsync has extra features that make it the ideal simple configuration deployment tool. It has remote file-transfer capabilities that can be used to solve the problem of access to the configuration repository if it is on another machine in your network instead of some locally-accessible media.

Assume that "cfg" is the name (or even better, a DNS alias) of the configuration repository machine and that we want to save backups of the local files that get replaced into directory hierarchies organised by date (and time, if you need it). The configuration deployment commands could be:

        bd=/var/backup/cfg/$(date '+%Y/%m/%d'); mkdir -p $bd
        rsync -I -c -a --backup --backup-dir=$bd root@cfg:/srv/cfg/site/etc/ /etc
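
Date-organised backups accumulate, so at some point you will want to find the old ones. The following sketch lists the per-day directories that have not been modified for a given number of days. The function name list_old_backups is hypothetical, "-mindepth" and "-maxdepth" are GNU find extensions, and find rounds ages down to whole days, so the cut-off is approximate.

```shell
#!/bin/sh
# Hypothetical helper: list per-day backup directories (ROOT/YYYY/MM/DD)
# whose modification time is more than DAYS days in the past.
list_old_backups() {
    root=$1
    days=$2
    find "$root" -mindepth 3 -maxdepth 3 -type d -mtime +"$days"
}

# Example: review the list first, then remove what is no longer needed
# (xargs -r is a GNU extension that skips the command on empty input):
# list_old_backups /var/backup/cfg 90
# list_old_backups /var/backup/cfg 90 | xargs -r rm -rf
```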

Simple recursive update from multiple remote repositories with date-organised backups

So far, we have only been retrieving site-wide /etc files. It is highly probable that we want to add useful files to /usr/local/{bin,sbin}, /root, and other directories. And we probably want to manage customisations that are valid only for a particular machine. The structure of our configuration repository on "cfg" might look like this:

/srv/cfg/site/
/srv/cfg/site/etc/
/srv/cfg/site/etc/hosts
/srv/cfg/site/etc/resolv.conf
...
/srv/cfg/host01/
/srv/cfg/host01/etc/
/srv/cfg/host01/etc/network/
/srv/cfg/host01/etc/network/interfaces
...
/srv/cfg/host02/
...

Here are the deployment commands to run on host01, host02, etc.:

        bd=/var/backup/cfg/$(date '+%Y/%m/%d'); mkdir -p $bd
        rsync -I -c -a --backup --backup-dir=$bd root@cfg:/srv/cfg/site/ /
        rsync -I -c -a --backup --backup-dir=$bd root@cfg:/srv/cfg/$(hostname)/ /
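
One side effect of running the two rsync commands in sequence is that any file present in both the site tree and the host tree is overwritten twice on every run, first with the site version and then with the host version, and a backup is written each time. A possible refinement is to merge the trees in a local workspace first, as mentioned under "Deployment scripts", and then update the live system only once. This is a sketch under assumptions: the function name deploy_staged, the BACKUP_ROOT variable, and the use of mktemp for the workspace are illustrative choices. The repository and the target root are parameters so that the function can be tried on a scratch directory before being pointed at "/".

```shell
#!/bin/sh
# Hypothetical sketch: merge the site tree and the host tree in a
# temporary workspace, then apply the result to ROOT in one rsync run.
# REPO is e.g. root@cfg:/srv/cfg, ROOT is e.g. /, HOST defaults to the
# local host name. BACKUP_ROOT is an assumed variable.
deploy_staged() {
    repo=$1
    root=$2
    host=${3:-$(hostname)}
    bd=${BACKUP_ROOT:-/var/backup/cfg}/$(date '+%Y/%m/%d')
    stage=$(mktemp -d) || return 1
    mkdir -p "$bd" &&
    # Later sources overwrite earlier ones, but only inside the workspace.
    # With -a the workspace's top directory also takes its attributes
    # from the repository trees, as the target did in the direct commands.
    rsync -a "$repo/site/" "$stage/" &&
    rsync -a "$repo/$host/" "$stage/" &&
    rsync -I -c -a --backup --backup-dir="$bd" "$stage/" "$root"
    rc=$?
    rm -rf "$stage"
    return $rc
}

# Example, equivalent in intent to the commands above:
# deploy_staged root@cfg:/srv/cfg /
```

With this arrangement a file is replaced, and its old version backed up, only when the merged result really differs from what is installed.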

What next?

Part 2 of this series will probably deal with writing helper tools, for example a script to easily check files from the client into the configuration repository. If there is interest in this article, the direction of the series will be determined in part by the questions that readers ask.

About this document

URL: http://www.rtfm-sarl.ch/articles/configuration-deployment-p1.txt

HTML-conversion: txt2html --titlefirst --noanchors --preformat_trigger_lines 1 configuration-deployment-p1.txt > configuration-deployment-p1.html

Title: Make your own configuration deployment system, part 1

Version: 2008-06-27-001

Author: Erik Rossen <rossen@rossen.ch>

Licence: Creative Commons Attribution-Share Alike 2.5 Switzerland, http://creativecommons.org/licenses/by-sa/2.5/ch/