Why you need to test your Disaster Recovery (DR) plan, and how to do it
While almost all organizations have some form of disaster recovery in place, many don't test whether it will actually work during a crisis. Find out why you need to test your disaster recovery plan, and how you can do it.
What do storms and squirrels have in common?
They are the two leading causes of power outages in the country. Storms can cause massive damage across a wide area, while squirrels can chew through power lines causing localized damage to infrastructure. Both can strike without warning, which goes to show you that a disaster can happen at any time in any form, and your business might take the brunt of the impact.
Disasters can be so costly that, according to FEMA, 40% to 60% of small businesses don't survive major events. These include IT incidents like massive data loss.
Disasters can happen when you least expect them. However, just because you can't predict something, it doesn't mean you can't prepare for it.
That's where your Disaster Recovery (DR) plan comes in. It can help you mitigate financial losses, restore data and inform your team what to do in case of emergencies. Unfortunately, while everyone knows it's important, it often becomes an afterthought.
At ITS, we have years of experience helping our clients recover from cyber attacks, server failures, and other disasters. So believe us when we tell you: DR testing is vital for your business continuity.
In this article, we'll help teach you how you can prevent mishaps like the ones we mentioned above through DR testing. To do that, we'll dive into the following:
- Why do you need Disaster Recovery testing?
- How do you test your DR plan?
- How often should you test your DR plan?
Why do you need Disaster Recovery Testing?
According to a study conducted by Spiceworks, while almost all organizations have a disaster recovery plan in place, nearly one in four (23%) don't test it. That's because many believe that testing the plan isn't worth their time. If you do the same, it may come back to bite you.
What do the Fergusson Medical Group, Delta Air Lines, and the California DMV have in common? They all had DR plans in place, which failed miserably because they didn't test them.
- Fergusson Medical's plan was insufficient, causing them to lose over three months of patient data after a ransomware attack.
- Flawed DR plans were the cause of a Delta outage which canceled around 2,000 flights over a three-day period.
- Both of the California DMV's backup systems failed at the same time after a power source malfunction, causing them to shutter for several days.
It's impossible to know the extent of damage you might incur during a disaster. But the potential losses are far too great to risk. It's the same reason why car manufacturers extensively test their safety features before a new model hits the streets. Because if things go wrong, the consequences could be catastrophic.
According to a FEMA report, nine out of 10 businesses that fail to restore operations within five days of a disaster file for bankruptcy within a year.
Testing your DR plan will help you identify and fix inconsistencies and flaws before they become full-blown problems. Doing that can save you from an oversight that could either hurt your business or shut it down entirely.
How do you test your DR plan?
There is no one-size-fits-all approach to testing your DR plan. Each organization has its own unique processes and systems that factor into the overall plan. With that in mind, here are the steps you need to take when getting started on DR testing:
Step 1: Define Your Goals
Setting well-defined goals will allow you to gauge the readiness of your disaster recovery system properly. They can also serve as benchmarks for future testing.
The most common key performance indicators that you should consider are:
- Recovery Time Objective (RTO) - RTO is the measure of how quickly you need to restore your IT systems after a disaster before your business is seriously affected. An RTO of 2 hours is a good number to shoot for; however, some organizations are able to make do with an RTO of 24 to 48 hours.
- Recovery Point Objective (RPO) - Data loss may be unavoidable during major disasters. RPO determines how much data loss is acceptable after an unplanned incident. It refers to the age of the files that must be recovered from backup storage for normal operations to resume. For example, an RPO of 60 minutes requires that your system is backed up at least every hour. An RPO of 24 hours means you're in good shape, but of course, a smaller number means less data lost.
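To make these two targets concrete, here is a minimal Python sketch of how you might score a test run against them. The target values, function names, and timestamps are hypothetical and only for illustration; the sketch assumes you record when the simulated outage began, when service was restored, and when the last usable backup was taken.

```python
from datetime import datetime, timedelta

# Hypothetical targets for illustration; choose values that fit your business.
RTO = timedelta(hours=2)      # maximum acceptable time to restore systems
RPO = timedelta(minutes=60)   # maximum acceptable window of lost data


def meets_rto(outage_start: datetime, service_restored: datetime,
              rto: timedelta = RTO) -> bool:
    """Return True if the measured recovery time is within the RTO."""
    return service_restored - outage_start <= rto


def meets_rpo(last_good_backup: datetime, outage_start: datetime,
              rpo: timedelta = RPO) -> bool:
    """Return True if the time since the last usable backup is within the RPO."""
    return outage_start - last_good_backup <= rpo


if __name__ == "__main__":
    # Made-up timestamps from an imaginary drill.
    outage = datetime(2024, 1, 15, 9, 0)
    restored = datetime(2024, 1, 15, 10, 45)  # back online 1 hour 45 minutes later
    backup = datetime(2024, 1, 15, 8, 30)     # last backup 30 minutes before the outage

    print("RTO met:", meets_rto(outage, restored))
    print("RPO met:", meets_rpo(backup, outage))
```

With these sample timestamps both checks pass. Recording the same two measurements on every test gives you a benchmark to compare future runs against.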
Step 2: Establish Your Training Scenarios
There are many different scenarios that you should prepare for; the most common ones include:
- Internet and Power Outages
- Equipment Failure
- Human Error
- Natural Disaster
- Cybersecurity Threats
Each industry may have scenarios that apply specifically to its business. For example, sectors like healthcare have zero tolerance for internet and power outages. Testing for the right scenarios will help you develop a clear roadmap when thinking of alternative solutions should your systems fail.
For example, to test for a power outage scenario, you can simulate it by shutting off the electricity for a set period of time and checking how quickly the plan is enacted. You can also determine how long an alternative power source will last. Or, you could try other testing methods, which you can learn more about below.
Step 3: Choose the Right Testing Method
You can choose from several testing methods to help ensure your disaster recovery plan works. However, it's important to choose the one that works best for your team, because a disaster recovery plan is only as good as the people enacting it.
If your test is too complicated, it might just go over the heads of relevant personnel, which opens up your DR plan to failure.
Here are some of the common testing methods you might want to consider:
- Walkthrough Drills - This method involves relevant team members demonstrating the steps they are expected to take during a disruption. It may include moving files to the right backup or contacting specific personnel.
- Tabletop Exercises - In this test, you can lay out specific scenarios and ask each team member what they would do should they encounter them. You should invite a representative from each department of your organization to ensure that the plan cascades to everyone.
- Sandbox - For this test, you can employ a third party who can simulate disaster scenarios. You can then run those simulations against your organization and catch flaws in the plan before they cause headaches.
- Full Interruption Testing - This method will require technical expertise. It involves migrating your main systems to an alternate location, such as a virtual machine. You will then take the main system down and attempt to recover it. It's a risky method. However, it's also the most thorough.
Step 4: Ensure Your Test is Well-Documented
With the number of things you need to take into account during your tests, it's essential to document everything so you can keep track. Detailed documentation can provide you with the information you need to evaluate and improve your DR plan and future testing efforts. Some info you may want to document includes:
- list of relevant personnel and their contact info
- results of the test and areas of improvement
- limitations of the test, etc.
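One way to keep this documentation consistent from test to test is to capture it in a structured record. The sketch below is only an illustration: the `DRTestRecord` and `Contact` classes and their field names are hypothetical, chosen to mirror the items listed above, and the sample values are made up.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Contact:
    """A person involved in the test (hypothetical structure)."""
    name: str
    role: str
    phone: str


@dataclass
class DRTestRecord:
    """One DR test, covering the items suggested in the list above."""
    test_date: str   # ISO date string, e.g. "2024-01-15"
    scenario: str    # what was simulated
    method: str      # how it was tested
    personnel: List[Contact] = field(default_factory=list)
    results: List[str] = field(default_factory=list)
    improvements: List[str] = field(default_factory=list)
    limitations: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the record so it can be stored alongside the DR plan."""
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    record = DRTestRecord(
        test_date="2024-01-15",
        scenario="Power outage",
        method="Tabletop exercise",
        personnel=[Contact("Alex Example", "IT lead", "555-0100")],
        results=["Failover steps were completed in the expected order"],
        improvements=["Update the vendor contact list"],
        limitations=["Backup generator was not actually started"],
    )
    print(record.to_json())
```

Saving each record as a JSON file gives you a history you can compare when reviewing results and planning the next test.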
Step 5: Review the Results and Update Your Plan
Once testing is complete, evaluate the results to confirm that your plan meets your standards, then update the plan to address any gaps you found. Organizations tend to overlook several key elements when updating their DR plan, including:
- Contact list - This list should accurately reflect the contact details of relevant team members.
- IT Assets List - Similarly, this list should reflect any new or replaced hardware, software, or system you have in place.
- Systems Prioritization - As network environments change over time, you need to ensure that your documents reflect the current mission-critical systems you have in place.
- Security and Compliance Requirements - Your documents should also note any security upgrades and all other items you need to stay compliant.
How often should you test your DR plan?
How frequently you should test your DR plan depends entirely on the nature of your business. High turnover rates, process changes, or new regulations are all factors you need to consider when it comes to testing frequency.
That said, testing your DR plan at least once a year is a good practice to start with.
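If you want a simple reminder for that yearly baseline, a small Python sketch like the one below can flag when a test is overdue. The 365-day interval and the function names are assumptions for illustration; shorten the interval if your business changes quickly.

```python
from datetime import date, timedelta

# Assumed baseline of one test per year; adjust to suit your business.
TEST_INTERVAL = timedelta(days=365)


def next_test_due(last_test: date, interval: timedelta = TEST_INTERVAL) -> date:
    """Return the date by which the next DR test should be run."""
    return last_test + interval


def is_overdue(last_test: date, today: date,
               interval: timedelta = TEST_INTERVAL) -> bool:
    """Return True if the due date for the next test has already passed."""
    return today > next_test_due(last_test, interval)


if __name__ == "__main__":
    last = date(2022, 3, 1)  # made-up date of the previous test
    print("Next test due:", next_test_due(last))
    print("Overdue today:", is_overdue(last, date.today()))
```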
Ready to improve your disaster recovery plan?
We all know the value of a Disaster Recovery plan. However, having one in place and not testing it might be detrimental. That's because it gives you a false sense of security, and that's where the danger lies.
Testing your disaster recovery shouldn't be an afterthought; it should be at the forefront of your business continuity plan. That will help ensure that your plan kicks into place and mitigates losses during times when you need it most.
Need help assessing your disaster recovery plan? At ITS, we've spent over a decade helping businesses keep their technology running even through unexpected disasters. Fill out our form so we can help you out with a free tech assessment.