If you’ve been using computers for a while, chances are you’ve lost valuable data more than once. Most computers nowadays come with CD writers (and many with DVD writers), which is great for the occasional manual backup. There is even the nice OSX-specific iSync tool, which reminds you when to perform the backup and can manage the files being backed up too. However, an automatic solution would be better.
The rsync tool ships with OSX and most Linux distributions, and builds are even available for Windows. Rather than copying and overwriting every file at a remote location, rsync performs an incremental transfer of files: only files that are new or have changed since the last run are copied, which drastically reduces the time taken to perform backups.
For my purposes, I wanted files to be backed up off site. I decided to sign up for a Strongspace account. Strongspace provides external, secure storage: you can access your files via the web interface, SFTP, or (handy for our purposes) rsync. While the rest of the article (and the following parts) assume you’ll also be backing up to a Strongspace account, it is a small matter to instead copy your files to an external disk drive or even another computer on your network. Note that Strongspace supports rsync by running an rsync server, so these instructions should work equally well for backing up to any other rsync server.
Assuming you have your Strongspace account set up already, let’s get started. To start with, I simply want to back up the contents of my Documents folder:
rsync -azv /Users/johnsmith/Documents johnsmith@johnsmith.strongspace.com:/home/johnsmith/backups/mac
When run, this will copy the Documents folder into the remote directory /home/johnsmith/backups/mac (because the source path has no trailing slash, the files end up under mac/Documents). Unfortunately, since the release of Tiger my Documents folder has become stuffed with lots of Dashboard widgets I never use. To ignore these, I specify the exclude option:
rsync -azv --exclude "Widgets" /Users/johnsmith/Documents johnsmith@johnsmith.strongspace.com:/home/johnsmith/backups/mac
The exclude option takes a pattern (a shell-style wildcard pattern rather than a full regular expression), and you can repeat the option to exclude several directories. If you want to back up files from multiple locations, consider having a single backup directory containing symlinks to the directories you want backed up. You’ll need to add the L option so that rsync copies the referenced directory rather than simply copying the symlink itself:
rsync -azvL /Users/johnsmith/backup johnsmith@johnsmith.strongspace.com:/home/johnsmith/backups/mac
Notice that when you run the command, you are prompted for your Strongspace password every time. In part two we’ll start using an SSH key to avoid this, and we’ll be well on the way to providing an automatic backup solution.
9 Responses to “Trouble Free Backups, Part One – rsync and Strongspace”
You should probably mention that the built in rsync in Mac OS X doesn’t copy the resource fork which means that the files are pretty useless when restored from backup. You need to get RSyncX from here:
http://archive.macosxlabs.org/rsyncx/rsyncx.html
Ah – I wasn’t aware of that, thanks Jon (although I have been using the Fink build of rsync rather than the built in one). I’ll get round to updating the tutorial in a day or so.
Googled a bit, and found that RsyncX won’t work perfectly if the destination server isn’t also OSX (apparently some of the extra metadata can get lost). There is a patch (the rsync+hfsmode patch, http://www.quesera.com/reynhout/misc/rsync+hfsmode/) for the OSX rsync which enables the copying of HFS+ resource forks and Finder metadata.
Some more digging turns up at least one reference to rsync supporting resource forks in Tiger – I’ll try and test this tonight – so from the sounds of it you should only need rsyncx/the HFS+ patch if running Panther or earlier.
More googling implies rsync has a bug with resource forks in Tiger:
“on Tiger, rsync has a critical bug when handling files with resource forks. The bug makes rsync generate a new copy of files and directories with resource forks in each increment.”
(from this topic on the AppleTalk forum: http://forums.appletalk.com.au/index.php?s=ba6dc11860726cb374cc170f466d6464&showtopic=11327&st=0&#entry93258)
*sigh* – for me this is probably OK though, I really only back up code and my documents to Strongspace. The bulk of my stuff gets backed up to an external drive, so I guess I can use RsyncX (or even Backup 3) for that.
Yeah, it’s a bit of a pain in the backside, those resource forks. Please do blog if you come up with a good solution for using rsync to a non-OSX machine that preserves resource forks.
(Sometimes I just wish Apple would’ve dropped resource forks when they switched to Unix.)
They _are_ deprecated according to Apple – they’ve been trying to get people to move away from them for a while, but I guess they had to keep them for backwards compatibility.
For the moment it looks like I’ll have to use a local full backup to either an external drive or another Mac, and use my Strongspace stuff for things like my documents, code and the like.
Apparently, the version of rsync that ships with Mac OS X 10.4 includes an -E option to include “extended attributes” of files (such as resource forks). There are issues with Spotlight though: see this hint: http://www.macosxhints.com/article.php?story=20051104185525439&lsrc=osxh
What I’ve done to back up several machines on a staff network is:
1. Create a central server which uses rsync to back up each Mac
2. That central server nfs-mounts a Unix box (but you could do it with sftp virtual drives if nfs is not available or desirable)
3. The central server then saves the data to a disk image using hdiutil on the Unix server. Voila: resource forks preserved, and all you have to do is download a handy disk image when you want to restore 🙂
So, I still have not had a consistent answer on what happens if you *do* use rsync to try and back up to Strongspace (which I have by dint of my Joyent TextDrive account).
Basically, an rsync backup of my Mac would kick a** (particularly as my flat was burgled and the drive stolen a few months back).
So, let’s say I rsync my entire Users folder to Strongspace and then need to restore the entire thing back to a new machine because of a complete hard drive failure on the PB.
Has anyone actually tried to see what would happen, with or without the -E flag? (In particular: mail archives, documents, photos, sites and sources directories, and perhaps a compressed automated mysql dump.)
Basically, I just want to know that I’d be able to reconstruct critical data in the event of catastrophic loss.
let me know,
Daryl.