By Pete Schmitt, CTO, cStor
2 Key Areas Your Disaster Recovery Strategy May Be Missing
So far in 2017, we have seen both large natural disasters and man-made disasters, in the form of cyberattacks, wreak havoc on businesses around the world. It’s no wonder the need for Disaster Recovery is more common than ever, becoming a staple for businesses of all sizes, not only to protect their data and financial assets but also to ensure there is a known plan and timeline for recovery. In fact, Gartner now views Disaster Recovery as a Service (DRaaS) as a mainstream market with the potential to become a $3.73 billion business by 2021, but states that “because it is mainstream does not make it less complex for potential customers.”
Some of the complexity lies in the fact that many companies have adopted a “set it and forget it” approach to Disaster Recovery, failing to regularly review, test and update their plans. Given that the average cost of downtime hovers around $7,900 per minute, a plan that is not current and tested can have a devastating impact on the business and its reputation.
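To put that figure in perspective, here is a rough back-of-the-envelope sketch in Python. It treats the $7,900-per-minute average cited above as a flat rate, which is a simplifying assumption, and the function name and outage durations are illustrative only.

```python
# Rough downtime cost estimate using the per-minute average cited above.
# Assumption: cost scales linearly with outage length, which real outages rarely do.
COST_PER_MINUTE_USD = 7_900


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Return the estimated cost in USD of an outage lasting `minutes`."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


# Illustrative outage durations, not data from any real incident.
for label, minutes in [("30 minutes", 30), ("4 hours", 240), ("1 day", 1440)]:
    print(f"{label}: ${downtime_cost(minutes):,.0f}")
```

At that rate, a four-hour outage would come to roughly $1.9 million. Substituting your own organization's per-minute figure gives a more meaningful number.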
With that in mind, here are two recommended areas of focus to ensure your Disaster Recovery plan operates when and how you need it to during a real DR scenario.
Testing and Frequency
The first step to ensure your plan is working and up-to-date is regular testing. Many companies like to conduct DR testing on a quarterly basis, but annual testing is recommended at a minimum.
When preparing for your test, there are three categories to focus on:
- Proactive Testing – Testing to ensure your data and apps are backed up and/or highly available, redundancies for power and cooling systems in all data centers are in place, and all necessary data center infrastructure and backups are in place and accessible to make your backup site ready as soon as needed.
- Detective Testing – Scrutinize your plan and processes to determine what new areas or items are not in line with your current DR plan.
- Corrective Testing – Are the areas you updated working correctly as documented and ready to enact? If not, make sure to align all your relevant workloads to comply with your DR methodology.
There are different ways to conduct a test of your DR strategy. You will want to select the test that makes the most sense for your specific business and downtime requirements.
Real-World Test – This is the best method of testing to ensure everything in your plan is functioning as it should. The test is usually conducted outside of normal business hours, and systems are brought down and put into a full Disaster Recovery scenario. The Real-World Test is often used in organizations that have defined, “normal” business hours, since it can be difficult to accomplish if the business operates 24x7x365.
Simulated Disaster Recovery Scenario – This method takes the most recent copy of the DR data and systems and brings them online in an isolated environment at the DR site. The benefit of this testing method is that it causes no disruption to users. The downside is that you may not be able to test network functionality fully. Still, it provides a good way to test your DR plan in organizations where 24/7 operations are required (such as online commerce or taxi dispatch services), or where it’s difficult to find downtime for the Real-World Test.
Business Continuity Disaster Recovery (BCDR) Scenario – This is when an organization has at least two data centers and/or cloud providers in different locations, where both sites serve up critical business apps in parallel and can sustain the loss of one site without data loss. While this solution can be the most expensive one, it is a constant test of your DR strategy and lessens the need to perform a Real-World Test on a scheduled basis.
Plan Documentation and Updates
Change is the only constant in today’s world. Personnel, organizational structure, divisions, data points and applications may all change in a short span of time. Disaster Recovery plans that were drafted only a few months ago may already be out of date. Just as with testing, it is important to ensure your plan is documented and updated on a regular basis – annually at a minimum.
Review your plan to ensure the following critical components are well documented:
- The ability to sustain a major failure at a production site such as physical destruction or major failure of the infrastructure that local high-availability components cannot recover from
- Your second site has a current and regularly updated copy of your data
- Your core business applications are defined so that you know how to plan for their recovery
- The recovery plan has a run book to bring systems online in the correct order
- Networking components are included to ensure users can access the DR site even in a failure scenario (via home or alternate office location)
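One way to think about the run book item above is as a dependency ordering problem: each system should come online only after the systems it relies on. The sketch below uses Python's standard-library `graphlib` (Python 3.9+) to derive such an order. The system names and their dependencies are hypothetical and would need to be replaced with your own inventory.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical systems; each maps to the set of systems it depends on.
DEPENDENCIES = {
    "network": set(),
    "storage": set(),
    "dns": {"network"},
    "directory_services": {"network", "dns"},
    "database": {"storage", "network"},
    "app_server": {"database", "directory_services"},
    "web_frontend": {"app_server", "dns"},
}


def startup_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an order in which every system starts after its dependencies.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = startup_order(DEPENDENCIES)
for step, system in enumerate(order, start=1):
    print(f"{step}. {system}")
```

Several valid orders can exist for the same set of dependencies, so the printed sequence is one acceptable order rather than the only one. A circular dependency raises an error, which is itself a useful signal that the run book needs attention.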
A crucial area that is sometimes overlooked is ensuring the business defines its Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO). According to Gartner, 41% of customers did not have SLAs for RPO or RTO documented in their plans. When defining the RTO and RPO, it is critical to go beyond IT and get input from key stakeholders in each area of the organization, such as application teams, security teams and operations, as well as HR, marketing and communications, who will need to communicate the plan in the event of a disaster. Failure to do so can cause serious problems if those commitments are not properly documented or met in an actual DR scenario.
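Once RPO and RTO are documented, the results of each DR test can be compared against them. The following is a minimal sketch of that comparison, not a prescribed method. The class names, the "billing" system and all of the numbers are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Objective:
    rpo: timedelta  # maximum tolerable data loss, measured in time
    rto: timedelta  # maximum tolerable time to restore service


@dataclass
class Measured:
    data_loss: timedelta      # age of the newest recoverable data at failover
    recovery_time: timedelta  # time taken to bring the system back online


def meets_objective(objective: Objective, measured: Measured) -> bool:
    """Return True only if both the RPO and the RTO were met."""
    return (measured.data_loss <= objective.rpo
            and measured.recovery_time <= objective.rto)


# Hypothetical objective and test result for a billing system.
billing = Objective(rpo=timedelta(minutes=15), rto=timedelta(hours=1))
last_test = Measured(data_loss=timedelta(minutes=10),
                     recovery_time=timedelta(minutes=95))

print(meets_objective(billing, last_test))  # False: RPO met, but 95 min exceeds the 1 h RTO
```

Recording a result like this after every test makes it easier to see whether the commitments made to stakeholders are actually being met.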
When establishing RTOs, another common mistake is placing all applications and systems under a “Zero RTO” requirement. This can unnecessarily inflate your DR expense. It is important to define which systems and data are critical and must meet the defined RTO, and which items can fall outside of those defined recovery times. This will help keep your DR costs in check.
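As an illustration of tiering instead of a blanket “Zero RTO”, the sketch below assigns each application to the least strict recovery tier that still fits the downtime the business says it can tolerate. The tier names, thresholds and applications are all hypothetical. Real tiers should come out of the stakeholder conversations described above.

```python
from datetime import timedelta

# Hypothetical tiers as (name, maximum RTO), ordered strictest first.
TIERS = [
    ("Tier 1 (near-zero)", timedelta(minutes=15)),
    ("Tier 2 (same day)", timedelta(hours=8)),
    ("Tier 3 (best effort)", timedelta(days=3)),
]


def assign_tier(tolerable_downtime: timedelta) -> str:
    """Return the least strict tier whose RTO fits within the tolerable downtime.

    Falls back to the strictest tier if no tier is fast enough.
    """
    chosen = TIERS[0][0]
    for name, max_rto in TIERS:
        if max_rto <= tolerable_downtime:
            chosen = name  # later entries are less strict, so keep the last fit
    return chosen


# Hypothetical applications and the downtime each can tolerate.
APPLICATIONS = {
    "payment_gateway": timedelta(minutes=15),
    "email": timedelta(days=1),
    "reporting_archive": timedelta(weeks=1),
}

for app, tolerance in APPLICATIONS.items():
    print(f"{app}: {assign_tier(tolerance)}")
```

Only the payment gateway lands in the most expensive tier here. The other two applications fall into cheaper tiers, which is the cost saving that tiering is meant to deliver.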
If you would like more help developing, updating or testing your DR strategy, contact us. cStor solution architects are well versed in understanding client needs, budget, backup, RTO and RPO, and can work with your key stakeholders to help define a plan and testing scenario to best fit your organization.