Sam Newman's site, a Consultant at ThoughtWorks

Archive for ‘October, 2005’

Now we’ve worked out what files you want backed up, and we’ve sorted automatic authentication, the only thing left to do is schedule our backups. Unfortunately we can’t make use of the very handy iSync or Backup tools to do this (Apple don’t seem keen on opening these tools up, as they no doubt help sell expensive .Mac accounts).

cron

cron is available on all *NIX systems, and is a well-established solution for running all kinds of housekeeping tasks. On OSX 10.4, however, cron is being phased out in favour of launchd, which we’ll look at next.


launchd

As of OSX 10.4 a new daemon called launchd has been added. Eventually this will replace tools such as cron, rc and xinetd – however it’s low on the user-friendly scale right now. Unlike cron’s simple crontab file, launchd requires XML files, which can be a pain to edit without something like the shareware program launchd editor. Hopefully later OSX releases will bundle a decent UI for managing it.
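For a sense of what those XML files involve, here is a minimal sketch of a shell function that writes a launchd job definition running a backup script at 02:00 each day. The label `com.example.backup`, the schedule and the script path are placeholders of mine rather than part of this setup; the keys (`Label`, `ProgramArguments`, `StartCalendarInterval`) are the ones described in the launchd.plist man page.

```shell
#!/bin/sh
# Write a minimal launchd job definition to the file given as $1.
# The label, schedule and script path are illustrative placeholders.
write_backup_plist() {
    cat > "$1" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.backup</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Users/johnsmith/bin/backup</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>2</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>
</dict>
</plist>
EOF
}

# Example: write the definition to a scratch file and show it.
plist_file="${TMPDIR:-/tmp}/com.example.backup.plist"
write_backup_plist "$plist_file"
cat "$plist_file"
```

On 10.4 a per-user job like this would normally live in ~/Library/LaunchAgents and be loaded with launchctl load. Compare that with the single line a crontab needs and you can see why an editor helps.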

Anacron + launchd – for those who like to switch their computers off

Both cron and the newer launchd suffer from the same problem when it comes to running routine housekeeping tasks – if the computer isn’t on when a task is scheduled, the task won’t get run, and even when you switch the computer on again, neither tool can work out that it should try and catch up by running the task on startup.

So if we use either cron or launchd to manage our backups, the computer has to be on when the backup is scheduled to run.

Anacron attempts to solve these deficiencies. It stores a record of when each job was last run, so on startup it can work out if it needs to catch up. This is especially handy for laptops which are probably only on for a few hours a day.
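The catch-up idea is simple enough to sketch in a few lines of shell. This is only an illustration of the principle, not anacron’s actual implementation: the function name `job_is_due` is mine, and days are counted as whole days since the Unix epoch to keep the arithmetic easy.

```shell
#!/bin/sh
# Decide whether a job is due, given its period in days and the day it last ran.
# Usage: job_is_due PERIOD_DAYS LAST_RUN_DAY TODAY
job_is_due() {
    period=$1
    last_run=$2
    today=$3
    [ $(( today - last_run )) -ge "$period" ]
}

# Today as a whole number of days since the epoch.
today=$(( $(date +%s) / 86400 ))

# A daily job that last ran three days ago (the laptop was off) is overdue.
if job_is_due 1 $(( today - 3 )) "$today"; then
    echo "daily job: run now to catch up"
fi

# A weekly job that ran three days ago can wait.
if ! job_is_due 7 $(( today - 3 )) "$today"; then
    echo "weekly job: not due yet"
fi
```

cron and launchd have no equivalent of the last-run record, which is why they can’t make this decision at startup.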

Depending on your operating system, you can get anacron set up to work with either cron or launchd. Assuming you take the anacron + launchd approach, grab Ronald Florence’s highly useful Anacron for OSX 10.4 installer. Once installed, you should have a file /etc/anacrontab which will look like this:


#period delay job-identifier command
1 5 cron.daily periodic daily
7 10 cron.weekly periodic weekly
30 15 cron.monthly periodic monthly

The first field is the period of the job in days – 1 means daily, 7 means weekly and 30 means monthly. The second field is the delay in minutes between anacron realising a job needs to be run and actually running it. The third field is a unique identifier for the job, and the last is the command itself.

To simplify things we should create a simple script which contains our rsync command(s) – for now you can place it into your bin directory, and call it something sensible like backup. Next, assuming you want to run the backup job daily, add the following line to /etc/anacrontab:


1 10 backup.user /Users/johnsmith/bin/backup
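The backup script itself can be very small. What follows is a rough sketch rather than the script I actually use: the variable names are made up for the example, and `BACKUP_DEST` is deliberately left empty as a placeholder for your own rsync destination.

```shell
#!/bin/sh
# Sketch of a backup script, e.g. /Users/johnsmith/bin/backup.

# Directory of symlinks described in part one.
BACKUP_SRC="${BACKUP_SRC:-/Users/johnsmith/backup}"
# Placeholder: your rsync destination. Left empty on purpose.
BACKUP_DEST="${BACKUP_DEST:-}"

# Run the backup, or explain why not. Returns non-zero on failure.
run_backup() {
    if [ -z "$BACKUP_DEST" ]; then
        echo "backup: BACKUP_DEST is not set, skipping" >&2
        return 1
    fi
    if [ ! -d "$BACKUP_SRC" ]; then
        echo "backup: $BACKUP_SRC does not exist, skipping" >&2
        return 1
    fi
    # -a archive, -z compress, -L follow symlinks; -v is dropped as nobody is watching.
    rsync -azL "$BACKUP_SRC" "$BACKUP_DEST"
}

if run_backup; then
    echo "backup: finished $(date)"
else
    echo "backup: did not complete" >&2
fi
```

Remember to make the script executable with chmod +x, otherwise anacron won’t be able to run it.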

To make sure anacron is working (and is picking up your job) simply start, and select Open System Log from the File menu – you should see its output. If you need to tweak your backup job, you can remove the record of the previous run from /var/spool/anacron/ – it should contain one file per record in anacrontab. Then, you can force anacron to rerun the job by executing anacron -n.

And there you have it! Trouble free, secure backups that will get run even if you switch your computer on for only a few hours each day.

In our last part, we used rsync to connect to a remote server to perform incremental backups. The problem is that we really want this to be automatic. Scheduling when a backup occurs is actually fairly simple. What is more work is performing automatic authentication so our backup can occur without user intervention.

Authentication using an SSH key

We’ll be using an SSH key to authenticate ourselves with Strongspace (or any other rsync server for that matter). Thanks have to go to Jens-Christian Fischer’s post on his own backup solution, which helped get me started. To start with, open a terminal and enter the following command:


$ ssh-keygen -t dsa -b 1024

Leave the key location unchanged, but enter a passphrase. You can choose not to enter a passphrase (and this will simplify things) however this is pretty insecure. If anyone gets hold of your key they’ll be able to access your Strongspace files without the need for a password.

Now we’ve generated our key, we can use it to authenticate ourselves when connecting to the remote server. The server we’re connecting to needs a copy of our public key:

$ cp ~/.ssh/ authorized_keys

Log into Strongspace via the web interface, create a directory called .ssh and copy the authorized_keys file into this directory. To test this, run the rsync command we created in part one:


$ rsync -azvL /Users/johnsmith/backup

This time, rather than the Strongspace server asking you for your password, you’ll get prompted for your key’s passphrase.


So at this point you must be thinking “Well, what was the point of that – you’ve replaced the need to enter the password for the remote server, with the need to enter a passphrase for your key!” – and you’d be right. What we need is some kind of software that can be used to automatically handle unlocking our key.

SSHKeychain is an OSX specific SSH key management tool. When using tools like rsync or ssh, SSHKeychain can automatically look up your passphrase. Download and install the tool, then start it up. Open the Preferences pane, select the Environment tab and enable Manage global environment variables – this will allow other applications to use the keys managed by SSHKeychain. Check the keys tab and ensure your key (~/.ssh/id_dsa) is visible. Finally select Agent/Key Status from the Agent menu, enter your passphrase for your key, and enable the option to add the passphrase to your keychain – this means that when you log on, SSHKeychain will automatically have access to your SSH key’s passphrase. So long as you’re logged on you’ll have no need to type it in again.

For SSHKeychain to start managing your key, you’ll need to log off and back on, but before you do it’s a good idea to add SSHKeychain to your startup items so you don’t have to remember to start it up every time.

Once you’ve logged off and back on, SSHKeychain will be silently managing your keys. The first time you use an application like ssh or rsync, SSHKeychain will look in your OSX Keychain, locate the passphrase for your key and automatically authenticate you. Note that you will need to be logged in for this to work, as SSHKeychain needs to be running, and your OSX Keychain needs to be unlocked.

To test this, after logging back in run our rsync command again – this time you shouldn’t be prompted for any password.
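If you want something more explicit than "it didn’t prompt me", a quick check along these lines can help. It is a sketch that assumes SSHKeychain exposes its agent through the standard `SSH_AUTH_SOCK` environment variable, as ssh-agent does; the function name `agent_has_key` is just one I’ve made up.

```shell
#!/bin/sh
# Return 0 only if an SSH agent is reachable and holds at least one key.
agent_has_key() {
    # Without SSH_AUTH_SOCK, ssh and rsync cannot find any agent.
    [ -n "${SSH_AUTH_SOCK:-}" ] || return 1
    # ssh-add -l exits 0 when the agent lists at least one identity.
    ssh-add -l >/dev/null 2>&1
}

if agent_has_key; then
    echo "agent ready: rsync should not prompt for a passphrase"
else
    echo "no agent or no key loaded: rsync would prompt" >&2
fi
```

A check like this is also handy at the top of an unattended backup script, so that the job gives up cleanly instead of hanging on a passphrase prompt nobody will answer.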

In part three we’ll complete our backup solution by creating an automated backup script.

If you’ve been using computers for a while, chances are you’ve lost valuable data more than once. Most computers nowadays come with CD writers (and many with DVD writers) which is great for the occasional manual backup – there is even the nice OSX-specific iSync tool which reminds you when to perform the backup, and can manage the files being backed up too. What would be better, however, is an automatic solution.

The rsync tool ships with OSX and most Linux distributions, and is even available for Windows. Rather than copying and overwriting every file at the remote location, rsync performs an incremental transfer – only files that are new or have changed are copied – which drastically reduces the time taken to perform backups.

For my purposes, I wanted files to be backed up off site. I decided to sign up for a Strongspace account. Strongspace provide external, secure storage – you can access your files either via the web interface, SFTP, or (handy for our purposes) via rsync. While the rest of the article (and following parts) assume you’ll also be backing up to a Strongspace account, it is a small matter to instead copy your files to an external disk drive or even another computer on your network. Note that Strongspace supports rsync by running an rsync server – so these instructions should work equally well for backing up to any other rsync server.

Assuming you have your Strongspace account set up already, let’s get started. To start with, I simply want to back up the contents of my Documents folder:


rsync -azv /Users/johnsmith/Documents

When run, this will copy the contents of Documents to the remote directory /home/johnsmith/backups/mac. Unfortunately, since the release of Tiger my Documents folder has become stuffed with lots of Dashboard widgets I never use. To ignore these, I specify the exclude option:


rsync -azv --exclude "Widgets" /Users/johnsmith/Documents

The exclude option takes a pattern (shell-style wildcards rather than a full regular expression), and you can repeat the option to exclude multiple directories if you want to. If you want to back up files from multiple locations, then consider having a single backup directory, then create symlinks to the directories you want backed up. You’ll need to add the -L option so that rsync copies the referenced directory rather than simply copying the symlink itself:


rsync -azvL /Users/johnsmith/backup
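Setting up that directory of symlinks is easy to script. Here is a small sketch; the function name `stage_links` and the folder names are only examples, and the demo runs in a scratch directory so you can try it without touching your real files.

```shell
#!/bin/sh
# Create (or refresh) a symlink inside the staging directory for each source given.
# Usage: stage_links STAGING_DIR SOURCE...
stage_links() {
    staging=$1
    shift
    mkdir -p "$staging"
    for src in "$@"; do
        # -n replaces an existing link rather than descending into the directory it points at.
        ln -sfn "$src" "$staging/$(basename "$src")"
    done
}

# Demo in a scratch directory; in real use the staging directory
# would be something like /Users/johnsmith/backup.
demo=$(mktemp -d)
mkdir "$demo/Documents" "$demo/Pictures"
stage_links "$demo/backup" "$demo/Documents" "$demo/Pictures"
ls -l "$demo/backup"
```

Running it again is harmless, as existing links are simply replaced.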

Notice that when you run the command, you get prompted every time for your Strongspace password. In part two we’ll start using an SSH key to avoid this, and we’ll be well on the way to providing an automatic backup solution.

Update: I’ve caught what I think is a cross between avian flu and SARS, with a bit of ebola thrown in. Needless to say I’ll be unable to attend this evening – so Simon B is in charge. I’m a little annoyed about it.

OK, arrangements have been made and we have a new venue for the monthly Django/Rails/Python/Ruby meeting. This time around we’ll have our own space at the Old Bank Of England, which should be much quieter than Smiths. Like last month, we’ll hopefully be joined by the London Python group, and Django/Rails/Python/Ruby newbies are more than welcome.

Given that we’ll have a better venue for it, I’d also be keen on people showing demos – hopefully I’ll get my colleagues to repeat the demo created for our recent Greenpeace bid. Make sure you leave a comment if you’ll be attending, as I’d like to let the barman know if we’ll drink him out of house and home…

Good developers create good technical solutions to problems.

Good consultants find that delicate balance between being themselves and not being themselves that enables them to get the job done for the client.

This was driving me nuts for ages – I couldn’t work out how to turn off that annoying beep you get whenever an auto-completion has multiple or no hits. Anyway, it’s a ReadLine configuration (so it’ll apply to any shell using ReadLine) – edit (or create) .inputrc in your home directory, and insert the following line:

set bell-style off 

Finally – peace!
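If you’d rather not open an editor, something like this does the same job from the command line. It’s a quick sketch (the function name `silence_bell` is mine) that only adds the line when no bell-style setting is already present; the demo writes to a scratch file rather than your real .inputrc.

```shell
#!/bin/sh
# Append the bell setting to an inputrc file unless one is already there.
# Usage: silence_bell [FILE]   (defaults to ~/.inputrc)
silence_bell() {
    inputrc=${1:-$HOME/.inputrc}
    if ! grep -qs '^set bell-style' "$inputrc"; then
        echo 'set bell-style off' >> "$inputrc"
    fi
}

# Demo against a scratch file; call silence_bell with no argument for real use.
demo_rc="${TMPDIR:-/tmp}/inputrc.demo"
silence_bell "$demo_rc"
cat "$demo_rc"
```

The setting is picked up by shells started afterwards, so open a new terminal window to hear the difference.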

Carlos makes a very interesting point concerning my surprise that Flickr haven’t invested in functional testing:

…[not testing can incur] less upfront [costs] and more maintenance costs over time. As a startup (well, before they were acquired by Yahoo, anyway), this makes sense: the whole point of a startup is that you can do riskier things, and they guessed at some point that automatically testing anything but the most significant bits (smoke tests?) wasn’t as important as getting code out the door, fast, and obsessively listening and reacting to user feedback. This probably required keeping insane levels of attention to detail and commitment, which is quite rare I might add, but a great part of what I attribute to their success.

That certainly seems to tally with how the Flickr team went about spending money during their early years (and software is nothing if not expensive) – they only spent money if needed. The problem with developing an application without testing in mind is that it can make testing at any level other than functional very difficult without restructuring after the fact (which by definition is harder without lower level tests).

It is certainly possible the Flickr team decided not to test due to the perceived cost of testing – I wonder how many PHP developers back when Flickr was developed were aware of the testing options out there? They certainly didn’t have Selenium, and I doubt FIT was up to much back then. Without such higher-level functional testing aids, the cost of developing in-house functional tools (or bringing in something expensive and complicated like Mercury) may well have been prohibitive.

On the other hand, is the average PHP developer interested in testing? Take a look at the testing tools available to the large PHP community – then compare them to the wealth of tools and APIs available to Java. There seems to be nothing like the same demand amongst the PHP crowd for such tools. It may well be of course that PHP projects are on the whole smaller scale and less complex, and therefore have less need for higher-level testing tools.

I’d like to think that the fact that many PHP developers now seem to be trying out Rails (and by extension I expect to see them trying out Django if/when it gets more hype) is a sign that they realise being more rigorous in their development approach is important, and that a more advanced language will make it easier to embrace programming techniques such as OO (even if some of them do think it is insane 🙂 ). But the cynic in me still thinks most of them are being attracted by great Ajax support and the perceived benefits of scaffolding.