Unplanned downtime can cost enterprise-level businesses thousands of dollars per minute. Clearly, you need to have a disaster recovery plan that minimizes your losses. But obsessing about downtime can drive up costs, too, as it can lead you to invest in a more robust solution than you need.
To create the most effective disaster recovery and business continuity plan for your organization, you need to assess the needs of each of your workloads separately using two critical metrics: RTO and RPO.
- Recovery time objective (RTO) – The targeted length of time from failure to restoration of business systems and services after a disaster.
- Recovery point objective (RPO) – The maximum amount of data loss the business deems acceptable following a disaster or failure.
Here are three questions you should ask when defining RTOs and RPOs for each of your workloads:
Q1: How mission critical is this workload?
The answer to this question largely defines your RTO (and, to a lesser extent, your RPO), so we’re going to put a lot of emphasis on this one before moving on to a couple of additional questions you should ask.
We can divide the criticality of applications into four tiers:
- Mission critical – We tend to use the term “mission-critical” loosely, so to help you set your disaster recovery objectives effectively, we need to lock down what we mean. In this discussion, mission-critical applies only to those workloads that the organization requires in order to continue to do business. If they’re down, the business is down.
For an organization doing the majority of its business online, this might be the customer-facing website as well as any order entry or inventory systems that allow customers to continue to place orders. Obviously, these systems require the lowest RTOs and RPOs and are candidates for high-availability, synchronous replication. With the technologies available today, some of these solutions can bring recovery times down to near zero.
- Critical – These are vital workloads in which you don’t want to lose any data, but the business can keep functioning for at least a little while even without them. Financial systems, human resources, and payroll often fall into this category. Asynchronous replication, where the data is written to the primary storage array first and then to the failover systems, may be adequate. There will be some data loss due to the delay, but it can be measured in minutes.
- Important – Important workloads may also be vital, but they are one step further removed from mission critical. You might decide, for example, that you can do without your marketing execution systems for several days without suffering any serious side effects. These systems can be handled with the least expensive disaster recovery options, such as asynchronous replication or a physical backup system, e.g., tape backup.
- Not important – Finally, you may have an archive of old data that you need to keep for one reason or another, though it doesn’t need to be accessed frequently (or even at all) in the course of doing business. Again, the least expensive solutions can be used to back up these systems. Just remember that the media used in physical backups can degrade over time, so if you absolutely need to keep these files in an archive, you may want to back up your backups from time to time.
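To make the four tiers concrete, here is a minimal Python sketch of how you might record a tier for each workload and look up the protection approach suggested above. The names `Tier`, `Workload`, and `suggest_protection` are hypothetical, and the example workloads are taken from the descriptions in this article. Treat it as an illustration of the idea, not as a prescribed tool.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """The four criticality tiers described above (names are illustrative)."""
    MISSION_CRITICAL = 1
    CRITICAL = 2
    IMPORTANT = 3
    NOT_IMPORTANT = 4


# Protection approach the article suggests for each tier.
# This table is a summary for illustration, not a sizing rule.
SUGGESTED_PROTECTION = {
    Tier.MISSION_CRITICAL: "high-availability, synchronous replication",
    Tier.CRITICAL: "asynchronous replication",
    Tier.IMPORTANT: "asynchronous replication or physical backup (e.g., tape)",
    Tier.NOT_IMPORTANT: "least expensive backup, refreshed from time to time",
}


@dataclass
class Workload:
    name: str  # hypothetical workload label
    tier: Tier


def suggest_protection(workload: Workload) -> str:
    """Return the suggested protection approach for a workload's tier."""
    return SUGGESTED_PROTECTION[workload.tier]


# Example workloads drawn from the tier descriptions above.
workloads = [
    Workload("customer-facing website", Tier.MISSION_CRITICAL),
    Workload("payroll", Tier.CRITICAL),
    Workload("marketing execution", Tier.IMPORTANT),
    Workload("old data archive", Tier.NOT_IMPORTANT),
]

for w in workloads:
    print(f"{w.name}: {suggest_protection(w)}")
```

Keeping the tier as an explicit field on each workload makes it easy to revisit a classification later, which is exactly what the next two questions may prompt you to do.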
Q2: How easily can the data be reconstructed?
The answer to this question can help you refine both your RTOs and RPOs for each workload. Let’s say, for example, that you prioritized something simple like the systems your field service personnel use to log their time and call details as important, but not critical or mission critical. Your technicians can continue to make customer calls and just log their time and the results of the calls on the old paper systems you used to use. Once their systems are back online, a data entry clerk can enter the details from the call sheets into the system. In your initial assessment, you put this workload in the “important” tier and marked it as a candidate for tape backup.
You may want to rethink that. If your backups run every night, you could lose as much as a day’s worth of data. (Remember, you only started using the call sheets again after the disaster occurred.) Depending on the business you’re in, that data might not be so easy to reconstruct. For example, if a technician makes six or seven calls a day, it could be extremely difficult for them to recall what happened on every call, the time spent, and what the results were. This could have a significant negative impact on customer service.
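The arithmetic behind that concern is simple: with periodic backups, the worst-case data loss is roughly the time between backups. The short Python sketch below checks a backup schedule against an RPO under that assumption. The function names and the RPO values are hypothetical, chosen only to illustrate the comparison.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """Assume a failure just before the next backup loses everything
    written since the previous one, i.e. one full interval."""
    return backup_interval


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True if the worst-case loss stays within the recovery point objective."""
    return worst_case_data_loss(backup_interval) <= rpo


nightly = timedelta(hours=24)

# Hypothetical RPOs for the field service workload.
print(meets_rpo(nightly, rpo=timedelta(hours=4)))  # False
print(meets_rpo(nightly, rpo=timedelta(days=1)))   # True
```

If a day of call records would be hard to reconstruct, a nightly backup does not meet a four-hour RPO, and the workload may belong in a tier with more frequent protection.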
Q3: What compliance requirements govern this workload?
The answer to this final question can affect both RTO and RPO. For instance, HIPAA requires healthcare providers (and their Business Associates) to have a disaster recovery plan that covers any systems that contain ePHI to ensure availability. Though the regulations stop short of dictating RTOs and RPOs, if you’re in healthcare, the nature of the data you collect may require lower targets than in other industries. If a natural disaster, such as a significant storm, strikes your area, faster recovery and less data loss mean you’ll be better able to meet the healthcare needs of your community.
Find the best disaster recovery plan for you
When we work with clients on their disaster recovery strategies, setting RTOs and RPOs per workload is one of the first things we do. It’s vital to ensuring their disaster recovery plan meets their needs, without blowing their budget. Contact us to learn more about building the right disaster recovery plan for your business.