Sunday, January 10, 2010

Use it or lose it

When I worry about IT things that could go disastrously wrong at my company, I usually worry first about losing our database of users and all their personal information, and then about losing the database itself to hardware failure, corruption, etc. Joel on Software has a good post about how people need to worry less about backups and more about restores. I think this is great advice.

The primary way we make sure db restores work is to constantly do restores and use the data. Every one of our developers and QA engineers has a development environment that is a full copy of our entire website (website, batch jobs, db, etc.). Every night a production backup is restored into their personal development database. If there's a problem with the backup, the restore, or the code, we know immediately. Better to find out something is broken before you really need that restore to work!
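To make the nightly restore idea concrete, here is a minimal sketch of a restore-and-check step. It uses SQLite from the Python standard library purely as a stand-in, not our actual database or tooling. The function name restore_and_check, the "users" table, and the min_rows threshold are all hypothetical names chosen for illustration.

```python
import sqlite3

def restore_and_check(backup_path, restored_path, table="users", min_rows=1):
    """Restore a SQLite backup into another file and run basic sanity checks.

    Hypothetical helper: the table name and row threshold are assumptions,
    and `table` is expected to be a trusted identifier, not user input.
    """
    src = sqlite3.connect(backup_path)
    dst = sqlite3.connect(restored_path)
    try:
        # Copy the whole backup into the development database.
        src.backup(dst)
        # Ask SQLite to verify the restored file's internal consistency.
        integrity = dst.execute("PRAGMA integrity_check").fetchone()[0]
        # Actually use the data: a restore with no users is a failed restore.
        rows = dst.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        return integrity == "ok" and rows >= min_rows
    except sqlite3.DatabaseError:
        # Corrupt file, missing table, etc. all count as a broken restore.
        return False
    finally:
        src.close()
        dst.close()
```

A check like this only catches gross failures. The real test is still developers and QA working against the restored data every day.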

We store every night's backup going back a week, then every week's backup going back a month, and so on. This helps protect us against subtle database corruption issues that could ruin the last few nights' backups. It also gives us a good way to go back in time and try to figure out when data problems first happened.
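The retention scheme above can be sketched as a small function that decides which backups to keep. The exact cutoffs are assumptions for illustration: nightly backups for 7 days, Sunday backups for roughly a month (31 days), and first-of-the-month backups after that with no expiry. The name backups_to_keep is hypothetical, and the "and so on" tiers are only my guess at one reasonable continuation.

```python
from datetime import date, timedelta

def backups_to_keep(backup_dates, today):
    """Return the backup dates a tiered retention policy would keep, sorted.

    Assumed tiers (not a description of any real system):
      - nightly: everything from the last 7 days
      - weekly:  Sunday backups less than 31 days old
      - monthly: first-of-the-month backups, kept indefinitely
    """
    keep = set()
    for d in backup_dates:
        age = (today - d).days
        if age < 0:
            continue  # ignore dates in the future
        if age < 7:
            keep.add(d)  # nightly tier
        elif age < 31 and d.weekday() == 6:  # weekday() == 6 is Sunday
            keep.add(d)  # weekly tier
        elif d.day == 1:
            keep.add(d)  # monthly tier
    return sorted(keep)

# Example: sixty nightly backups ending on the date of this post.
today = date(2010, 1, 10)
nightly = [today - timedelta(days=i) for i in range(60)]
kept = backups_to_keep(nightly, today)
```

Anything not in the returned list is a candidate for deletion. Keeping the older tiers is what lets you reach back past a run of quietly corrupted nightly backups.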

This advice goes for other stuff beyond backups and restores. I don't trust fail-over servers to work when the primary server goes down unless I've tried failing over live traffic every week. Things rot. Anything you're not doing constantly probably doesn't work.