Note: This post was previously published on January 29, 2009 on a deceased blog. R.I.P.
I’ve had my DreamHost account since May of 2006. Compared to some previous hosts that didn’t give me a lot of control of my portion of the webserver, getting DreamHost which allowed shell access was amazing.
A couple weeks ago I was browsing through the Panel and noticed something that I must have overlooked for quite some time. With each account DreamHost gives you 50 GB of storage that you can use to back up your own personal files. If you use up more than that, it costs only 10 cents per extra gigabyte a month.
You can manually back up your files through FTP, but the secret to a good backup plan is that you shouldn’t have to think about it. Set it up once, let it do its thing, and hopefully you never have to use it. You should still monitor it every once in a while, because the worst thing that could happen is that your backup is incredibly old when the time does come. That’s really not too bad.
Luckily the service supports a Unix command called rsync. Every time rsync is run on a certain folder, it compiles a list of only the files that were modified since the last time it was run. That means that once you’re done with the initial backup, subsequent ones will take little time.
Set up the Backup User
The first step is to go to the DreamHost panel. Under Users, go to Backups User. Set up your password.
Set up Passwordless SSH
You can’t log into the backup server with SSH, but rsync uses the same protocol. If you don’t want to punch in the password every time, you’ll need to set up passwordless SSH.
Open the Terminal and enter in the following commands:
ssh-keygen -t dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Fire up your favorite FTP client and log into the backup server. Create a new folder called .ssh (don’t forget the dot). Upload the authorized_keys file that you created with the previous commands.
Initial rsync Sync
I use the following command:
rsync -avc --delete ~/SourceFolder username@backup.dreamhost.com:~/DestFolder
rsync is the program being run, and -avc are its options. The -a tells rsync to use archive mode, which recurses into folders and preserves timestamps and permissions; the -v increases the verbosity; the -c decides whether a file needs to be transferred by comparing checksums instead of modification time and size.
I added --delete so that any file that is no longer in my Source Folder will also be deleted from the Destination Folder. This is dangerous if you’re planning on using this to keep different versions, but I use Time Machine to save those little files. This backup is just for catastrophic failures, if my laptop and Time Capsule both break.
The Source Folder is the folder you want to sync on your computer. The Destination Folder is the folder you want it to go on the backup server.
Depending on the amount of data, this initial sync can take a long time. I synced about 35 gigabytes my first time, and off and on it took 6 days. My network crapped out a few times, so I had to re-run the command (the nice thing was that it knew where it was and didn’t resync files it didn’t have to).
Setting up a cron job
After the initial sync, I now let rsync run on a cron job. To set it up, fire up the Terminal and type crontab -e.
My syntax is (all one line):
0 3 * * 1 rsync -avc --delete ~/SourceFolder username@backup.dreamhost.com:~/DestFolder >> ~/Documents/rsync_log.txt
The “0 3 * * 1” lets it know to run every Monday morning at 3 AM. More information on crontab syntax can be found elsewhere.
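To make the five schedule fields less cryptic, here is a small sketch that splits the schedule string and labels each piece. It is only an illustration of the field order (minute, hour, day of month, month, day of week, where 1 means Monday); the variable names are my own.

```shell
#!/bin/sh
# A crontab schedule has five fields, in this order:
#   minute  hour  day-of-month  month  day-of-week
schedule='0 3 * * 1'

set -f            # turn off globbing so each * stays a literal asterisk
set -- $schedule  # split the schedule on whitespace into $1..$5
set +f

summary="minute=$1 hour=$2 day_of_month=$3 month=$4 day_of_week=$5"
echo "$summary"
# prints: minute=0 hour=3 day_of_month=* month=* day_of_week=1
```

An asterisk means "every", so this entry fires at 03:00 on any day of any month, as long as that day is a Monday.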
The only other thing that is different is the “>> ~/Documents/rsync_log.txt”. This syntax appends the output of the command to rsync_log.txt so you can look at it at another time.
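Note that >> on its own captures only standard output; error messages go to standard error, which cron typically tries to mail to you instead. If you would rather have both in the log, 2>&1 after the redirection does that. The sketch below uses a made-up noisy function as a stand-in for rsync so it can run anywhere:

```shell
#!/bin/sh
set -eu

log_stdout_only=$(mktemp)
log_both=$(mktemp)

# Stand-in for rsync: one line on stdout, one on stderr.
noisy() {
    echo "file list"
    echo "connection error" >&2
}

# Same form as the crontab line: only stdout is appended to the log.
noisy >> "$log_stdout_only"

# Adding 2>&1 after the >> sends stderr to the same file.
noisy >> "$log_both" 2>&1

# Line counts: the first log has one line, the second has two.
wc -l < "$log_stdout_only"
wc -l < "$log_both"
```

In the crontab entry this would mean ending the line with >> ~/Documents/rsync_log.txt 2>&1.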
And with that, we’re done. The whole process took me a week before everything was set up and running. Now I know that I have yet another backup and I can sleep a little easier.
Ryan,
I think something has changed with DreamHost since you wrote this. In early 2009 I used this to set up cron-based backups for my users, but when I just created a new user and tried the same technique, I noticed that the id_dsa.pub key is different than the authorized_keys previously used in the backup user’s .ssh directory.
I haven’t figured out how to resolve this yet, but if you figure it out, can you update or repost on the subject?
Thanks,
Rich
Wait, never mind, wrong file. I’m an idiot. Thanks for the great info!