Which sounds better after a disaster: a social media post telling customers that your systems will be back up “soon,” or one telling them you will be back online no later than 10 a.m.?
If you don’t regularly test your backup and recovery process, you have no idea if your backups actually work or how long it will take to restore your systems. Are you willing to take that chance?
Let’s say your infrastructure team is responsible for developing a backup, restore, and recovery process. Since your infrastructure has never had a problem, the same must hold true for the backups, right? This is faulty logic.
If testing is never performed on your backup and restore processes, you will never know:
- Does the process work?
- Are your personnel familiar with your backup and recovery procedures?
- Are backups complete and timely?
- How long does it take to restore your system to working order?
- What bugs might occur during the backup or recovery process that can be fixed?
How to Make Backup and Recovery Testing Effective
All backup testing procedures should be automated as much as possible, so staff are not pulled away from their day-to-day tasks. In an ideal world, testing would be performed every time your system completes a backup. However, this is usually impractical due to time and resource constraints. Nonetheless, backup and recovery procedures do need to be tested on a regular basis to ensure that the processes work and that they can handle more data as your business grows. While the actual backup and recovery steps will vary by organization, there are generally three types of backups and testing processes:
1. Spot check file system backups
File-level backup is the basic form of backup, typically used for PCs and servers. To make sure you can recover lost files, try restoring a few files from the backup to a new temporary folder and then compare them to the originals. Are they the same size? Can you open the restored files and view their contents? This simple test confirms that you know how to find files in your backup system, that the backups are current, and that you are familiar with the backup system’s restore functionality.
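The comparison step of a spot check can be scripted. The following is a minimal sketch, not tied to any particular backup product: it assumes you have already restored a file to a temporary folder, and the helper names `sha256_of` and `spot_check` are made up for illustration. It checks the size first and then compares SHA-256 hashes, which catches differences that a size check alone would miss.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def spot_check(original: Path, restored: Path) -> dict:
    """Compare one restored file against its original (hypothetical helper)."""
    same_size = original.stat().st_size == restored.stat().st_size
    # Only hash when the sizes match; a size mismatch already means failure.
    same_content = same_size and sha256_of(original) == sha256_of(restored)
    return {"same_size": same_size, "same_content": same_content}
```

A file that has legitimately changed since the backup was taken will also show up as a mismatch, so it helps to pick files you know were not edited in the meantime.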
2. Full system restore test
This second recovery test is a bit more challenging because it’s time-consuming and requires sufficient free space to accommodate the volume of restored data. However, this test ensures you can recover from total data loss. In this situation, you’re going to need to complete a full system restore, which will require a spare or virtual computer.
After the system is restored, randomly sample files to make sure they were restored properly and test applications and services to ensure they work as expected. If the restored spare or virtual computer is working as expected and the sampled files are correct, you’ll know the test was a success and your backup is working.
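The random sampling step can be sketched in a few lines. This example assumes that both the source system's files and the restored system's files are reachable as directory trees from the machine running the check; the function name `sample_restored_files` and its parameters are hypothetical. It picks a random sample of files from the source tree and reports any that are missing from the restored tree or differ byte for byte.

```python
import filecmp
import random
from pathlib import Path


def sample_restored_files(source_root, restored_root, sample_size=20, seed=None):
    """Return relative paths of sampled files that are missing or different."""
    source_root, restored_root = Path(source_root), Path(restored_root)
    # Collect every regular file under the source tree, relative to its root.
    files = sorted(
        p.relative_to(source_root) for p in source_root.rglob("*") if p.is_file()
    )
    rng = random.Random(seed)  # pass a seed to make a test run repeatable
    chosen = rng.sample(files, min(sample_size, len(files)))

    failures = []
    for rel in chosen:
        restored = restored_root / rel
        # shallow=False forces a byte-by-byte comparison of the contents.
        if not restored.is_file() or not filecmp.cmp(
            source_root / rel, restored, shallow=False
        ):
            failures.append(rel.as_posix())
    return failures
```

An empty list means every sampled file matched. Files that changed on the source after the backup ran will be reported as failures, so any entries in the list need a quick manual look before you treat them as a backup problem.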
3. Database recovery and verification
Testing database backup and recovery works a bit differently from the previous two tests.
First, when testing database backup and recovery, you absolutely must have somewhere to restore the data that won’t interfere with the original database. The database should be restored to a server that is running the same type and version of database management system as the original. If you don’t have a spare server, it’s possible to recover the database to the original server, but be careful. If you’re forced to do this, you will need to use a different name for the restored database so you don’t overwrite your real database or interfere with production use of that database.
Once restored, there are three types of tests you can perform for verification:
- Run a spot check. Run queries against the production database and recovered database and compare the results. If the results match, you know the recovery was a success.
- Run a macro test. In this case, simply count the number of rows in a critical table or get the overall size of the database. These numbers should roughly match your production database, and any differences should be easily explainable by the normal operation of the database.
- Run an application test. The idea is to connect your applications to the recovered database and make sure they operate correctly. This is a little harder to set up, but it is very effective.
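The macro test lends itself to automation. Here is a rough sketch that uses Python's built-in `sqlite3` module as a stand-in for whatever database management system you actually run; the function names, the `tolerance` parameter, and the 5% default are assumptions chosen for illustration. It counts rows in each critical table on both databases and flags any table whose counts differ by more than the allowed drift.

```python
import sqlite3


def row_count(conn, table):
    """Count the rows in one table."""
    # Table names cannot be bound as query parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    return conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]


def macro_test(prod_conn, restored_conn, tables, tolerance=0.05):
    """Compare row counts per table; tolerance is the allowed fractional drift."""
    report = {}
    for table in tables:
        prod = row_count(prod_conn, table)
        restored = row_count(restored_conn, table)
        report[table] = {
            "production": prod,
            "restored": restored,
            # Small differences are expected: production keeps changing
            # after the backup was taken.
            "ok": abs(prod - restored) <= prod * tolerance,
        }
    return report
```

The right tolerance depends on how quickly your tables grow between the backup and the test. A table that fails the check is a prompt to investigate, not proof that the backup is bad.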
When to Test Your Backup and Recovery Process
Backup testing best practices dictate that you set up a dedicated testing schedule. Testing your backups once a year is better than never; however, moving to a quarterly or monthly schedule increases the assurance that when you need to restore, you can. If you can automate backup testing procedures, you can run them at night or on weekends when employees and customers are not interacting with the system.