SQL Server 2016 HA/DR: One size does not fit all
SQL Server 2016 has a lot of great options for organizations looking for high availability and disaster recovery (HA/DR). But, sometimes, the needs of the business can put constraints on the options you choose.
Recovery objectives are, of course, two primary constraints. Some workloads will require faster RTOs and lower RPOs than others.
Recovery time objective (RTO) - The targeted length of time from failure to restoration of business systems and services after a disaster.
But other constraints come into play as well. For example, SQL Server comes in a variety of flavors from Express to the Enterprise Edition. Enterprise has the most HA/DR options available, but your organization may have decided not to allocate the budget for Enterprise Edition. Somehow, you’ll need to achieve your objectives with the features you have available.
Another common example is the organization with mission-critical applications built on an older version of SQL Server that can’t yet be migrated to a newer version. They need an HA/DR solution that helps them meet their objectives, using the features found in the older version of SQL Server.
We discussed the various HA/DR options available in SQL Server 2016 on a webinar in conjunction with 451 Research. Mike Donaghey, TierPoint’s Manager of Database Administration Services, also shares a couple of case studies in which the constraints of the business impacted the HA/DR options deployed.
Watch the full webinar here. Here we’ll be going through many of the key points. First, let’s start with a quick review of the various HA/DR features available in SQL Server 2016.
Five SQL Server 2016 HA/DR Options
SQL Server 2016’s HA/DR options can be broken down into five primary features.
Database Mirroring has been traditionally used to create a replica of a single production database. However, you may want to note, Microsoft is phasing out database mirroring in future versions of SQL Server. The recommended functional replacements are Basic Availability Groups or AlwaysOn Availability Groups. However, as you’ll see from our case studies, that doesn’t mean you can’t leverage Mirroring while it lasts.
Basic Availability Groups
Basic Availability Groups is the simplest form of availability groups. Basic Availability Groups enable a primary database to maintain a single replica. This secondary replica remains inactive unless there is a need to failover.
AlwaysOn Availability Groups
AlwaysOn Availability Groups are groups of databases designed to failover together. This consists of a primary replica and between one and four secondary replicas. Replication is performed through SQL Server log-based data movement, and there is no need for shared storage.
AlwaysOn Availability Groups require separate SQL Server instances: Only one SQL Server instance may host an availability group’s primary replica. All secondary replicas reside on a separate instance, and each instance must reside on separate physical nodes or VMs in the cloud.
AlwaysOn Failover Clustering
AlwaysOn Failover Clustering is a SQL Server instance installed across nodes in a cluster with resource dependencies on shared disk storage. In other words, an AlwaysOn Failover Cluster requires symmetrical shared disk storage, such as a SAN or SMB file share. An AlwaysOn Failover Cluster instance can leverage AlwaysOn Availability Groups for remote disaster recovery at the database level.
Log Shipping operates at the database level to maintain one or more standby databases for one primary production database. Essentially, log shipping sends transaction log backups automatically from the primary database to read-only secondary databases.
Real-life client use cases for SQL Server 2016
Now, let’s take a look at how we leveraged SQL Server 2016 features for a couple of our clients.
Case Study #1: Infrastructure Constraints
In this example, our customer had some fairly standard RTO and RPO objectives. In the event of unplanned downtime, they wanted to be up and running within an hour, and they couldn’t afford to lose more than 15 minute’s worth of data.
On top of that, they also had some rather challenging infrastructure constraints that we needed to work around: their DR site was not managed by TierPoint, and the connectivity to it was through a relatively slow VPN. They also had limited disk capacity in their production site.
Looking at the SQL Server HA/DR options available to them, here’s what we saw:
If you’re a DBA and you’ve ever set up a cluster, you know that the resources behind it need to be solid. A slow VPN isn’t a great candidate for setting up a cluster, so we took that off the table.
They didn’t have the space in their production environment to create transaction log backups that could be sent to the secondary environment. And, since they didn’t want to add capacity, this wasn’t a viable option either.
Availability groups have an element of clustering in them, so again, their VPN connection didn’t allow us to leverage this option.
This left us with mirroring. This client was using SQL Server Enterprise Edition for their production environment, which allowed for asynchronous mirroring, so any performance issues at the DR site wouldn’t affect production. (Remember, we weren’t managing their DR site, so we had limited control over it.) Mirroring is also easy to configure, and the failover scripts are relatively simple.
Since Microsoft plans to discontinue Database Mirroring in future versions of SQL Server, this isn’t a great option for the long run, but this client needed a DR solution ASAP. We decided to leverage mirroring to protect their systems immediately, while we explore a viable path to modernize more of their infrastructure.
Because there were no dependencies, problems at the DR site would not affect the production site.
Case Study #2: HA/DR the Highest Priority
In our second example, we helped an organization with tight recovery objectives: They needed an RPO and RTO for their production environment that was as close to real-time as possible. Their infrastructure constraints weren’t as significant, but they did want to move reporting off the production database so it didn’t affect performance.
In addition, they didn’t want to have to alter their applications to access a different server name. We won’t go to deeply into this, but putting it simply, in a traditional cluster, you have a cluster name which has an IP address. When there’s a failover, that name follows the failover. They wanted an HA/DR solution where they didn’t want to have to alter the name that their applications hit if they need to failover.
Again, we looked at the options:
This would help our customer achieve their HA objectives but wouldn’t do much to help them meet their DR requirements. We explored geo clustering with them, but decided this would require a degree of coordination between vendors, which complicated things a bit too much for this organization.
At best, Log Shipping was going to give them an RPO of fifteen minutes, and they were looking for real-time. So, Log Shipping – not an option.
This option failed at every level for this client. It wouldn’t give them the reporting they needed. It wouldn’t allow for the automatic failover necessary for near real-time recovery. It wouldn’t allow for the shared name between the production and recovery sites.
This was really the only viable option for this organization, and it allowed us to meet every one of their requirements.
The image below shows their configuration. The Primary server is in the middle between the Standby Server and the Reporting Server. Replication to the Standby Server is synchronous because this is the site that needed the near real-time RTO and RPO. The Reporting Server is a read-only replica used for reporting purposes. Since it doesn’t require HA/DR, we set that server up for asynchronous replication.
A Managed Services Provider can help
If you don’t have deep expertise in SQL Server in-house, there’s an obvious benefit to reaching out to a Managed Service Provider (MSP) for help. One that provides Managed SQL Server Services can help you cover all day-to-day SQL Server DBA responsibilities. However, not all MSPs are well-versed in SQL Server HA/DR options, so look for someone who has experience with disaster recovery.
If you’d like to learn more about TierPoint’s Managed Services and Disaster Recovery solutions, contact us.