A few months ago a server crashed and took down two blogs that I read. Within a short time, years of hard work had come down to one thing: a simple backup that did not exist for either of these blogs. This event was so catastrophic that International Backup Awareness Day was declared in remembrance. It made thousands of developers, including me, question their backup/recovery strategies. The “what ifs” played through my head for days. Unlike Phil and Jeff, I would not be lucky enough to pull my data from users’ caches or old virtual hard drives. Even the SEO Toolkit would be of little help. My margin for error is thin, and failure would definitely lead to my demise.
I needed a good evaluation of my existing backup plan. The US military conducts War Games to test scenarios, theories, and battle plans. Taking from this, I decided to conduct my own War Games to test my disaster recovery/prevention strategy.
Scenario 1: Server Failure – FAILED
REQUIREMENT: If the server should fail, I would be able to restore the database on another server, losing less than 1 hour of data.
To test this, I “pretended” my server was shut down. Many will ask, “How did you fail this?” To be fair, I did have a backup from offsite storage. However, it was older than my limit of 1 hour. This or any other lame-ass excuse is not good enough. I would be in a tight spot trying to explain why the past four hours of transactions were lost.
I run regular backups of my databases. Where I went wrong was not having a recent backup somewhere other than the database server. Since our offsite storage solution allows me to schedule a pickup, it would be easy to use it for this. However, I think there are better solutions that would cost less money and be more accessible to those who might need these backups.
Currently I am using Microsoft's SyncToy to sync a shared folder with the database server’s backup folder. The backups are small and take very little bandwidth on a 10 Gb network. This is not my ideal method, but it works within the time constraints I have now.
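Whatever tool does the copying, the 1 hour requirement is easy to check against the shared folder itself. Here is a minimal sketch, assuming the backups are `.bak` files and that the file modification time is a fair stand-in for when the backup was taken. The function names and the `*.bak` pattern are my own choices for illustration, not part of any tool mentioned above.

```python
import time
from pathlib import Path

RPO_SECONDS = 60 * 60  # the 1 hour limit from the requirement


def newest_backup_age(folder, pattern="*.bak", now=None):
    """Return the age in seconds of the newest matching file, or None if there is none."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(folder).glob(pattern) if p.is_file()]
    if not mtimes:
        return None
    return now - max(mtimes)


def meets_rpo(folder, limit=RPO_SECONDS, pattern="*.bak", now=None):
    """True only if at least one backup exists and the newest one is within the limit."""
    age = newest_backup_age(folder, pattern, now)
    return age is not None and age <= limit
```

Run on a schedule against the synced folder, a check like this would have flagged my four-hour-old backup long before a real failure did.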
Another option is to set up Microsoft SQL Server to save the backups directly to a shared folder. This requires the SQL Server service to run under an account that has READ/WRITE privileges to the shared folder. This seems to be a simple solution.
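For the shared folder option, the backup itself comes down to one T-SQL statement pointed at a UNC path. The sketch below only builds that statement as a string. The database name and share path in the usage note are hypothetical, and the `WITH INIT, CHECKSUM` options are my assumption of a sensible default (`INIT` overwrites any backup sets already in the target file), not something my current setup uses.

```python
def backup_statement(database, share_path):
    """Build a full-backup T-SQL statement that writes to a file on a network share."""
    db = database.replace("]", "]]")        # escape the bracket-quoted identifier
    path = share_path.replace("'", "''")    # escape the string literal
    return f"BACKUP DATABASE [{db}] TO DISK = N'{path}' WITH INIT, CHECKSUM;"
```

For example, `backup_statement("Sales", r"\\backuphost\sqlbackups\Sales.bak")` produces a statement that could be scheduled as a job. It will only succeed if the service account has the privileges described above.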
Scenario 2: Server Failure – FAILED
REQUIREMENT: If the server should fail, someone other than me would be able to restore the database on another server, losing less than 1 hour of data.
There is a chance that you might not be available when something happens. To cover your ass as well as provide a second line of defense, it is important that your disaster recovery/prevention plan be readily accessible to key players in your organization. Besides being accessible, your plan needs to provide useful information and instructions.
At a minimum, your disaster recovery/prevention plan should include:
- List of databases and the applications they support
- Ordered list of who is responsible for getting the databases up and running with at least two contact numbers for each
- Details of where the backups are located
- Detailed instructions on how to restore the databases from backups
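The checklist above can also be kept as structured data, so that gaps show up before anyone needs the plan. This is a rough sketch only: the class and field names are made up for illustration, and the checks cover nothing beyond the four items in the list.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    name: str
    phones: list  # at least two contact numbers are expected


@dataclass
class RecoveryPlan:
    databases: dict            # database name -> applications it supports
    contacts: list             # ordered: first entry is called first
    backup_locations: list     # where the backups are located
    restore_instructions: str  # step-by-step restore procedure

    def problems(self):
        """Return a list of human-readable gaps; an empty list means the minimum is met."""
        issues = []
        if not self.databases:
            issues.append("no databases listed")
        for db, apps in self.databases.items():
            if not apps:
                issues.append(f"{db}: no supported applications listed")
        if not self.contacts:
            issues.append("no responsible contacts listed")
        for contact in self.contacts:
            if len(contact.phones) < 2:
                issues.append(f"{contact.name}: fewer than two contact numbers")
        if not self.backup_locations:
            issues.append("no backup locations listed")
        if not self.restore_instructions.strip():
            issues.append("no restore instructions")
        return issues
```

Calling `problems()` on a draft plan gives a quick list of what is still missing before the document is handed to the key players.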
I failed this scenario because I did not have a document that my organization’s key players could access. I have discussed changes in backup plans with my boss as well as the IT department. However, these conversations will likely be forgotten when the shit hits the fan.
To say the least, I did not fare so well on these two scenarios. Yes, there are many other scenarios to play against, but these were the two that I assumed would be the most beneficial. In the spirit of actual military War Games, my organization’s key players will take part in future games, acting in response to possible real-world events extending beyond just database recovery. At least that’s my plan. It will definitely require active participation from key players and IT. Surely they will see the benefits of testing what we set out to do.