Upgrading SQL Server using Availability Groups – Checklist

This is a checklist put together and followed for an upgrade of a pair of physical SQL Server 2012 servers which hosted a single availability group of several terabytes of data with minimal downtime.

The availability group was configured with synchronous commit and automatic failover.

List of Steps

  • If possible, stop any ETL processes that put data into the affected databases – less traffic here means less traffic to catch up with when you resume the AG HADR.
  • Make sure all AGs hosted by the servers are on the same server
  • Put servers, clusters, listeners, everything into Maintenance Mode in SCOM / SQLMonitor / Solarwinds / whatever other monitoring system you use.
  • Disable Transaction Log backup jobs on all servers hosting the Availability Group(s)
  • Take Log backups
  • Configure the Availability Group(s) to manual failover
  • Suspend data movement for each database on the secondary server
  • Backup the system databases on the secondary server
  • Upgrade the secondary server and reboot if required (it might not be required, but it is recommended to do this anyway)
  • Resume data movement on the secondary, and allow the servers to synchronise.
  • Failover the availability group(s) to the upgraded server with no data loss
  • Check that data movement is suspended to the new secondary / unupgraded server due to version incompatibility
  • Upgrade other server and then reboot
  • Resume data movement on secondary, and allow to synchronise
  • (optional) failover AG with no data loss to test all is OK (this isn’t really required, but can be useful for extra peace of mind)
  • Reconfigure availability group(s) back to automatic failover (if that’s how it was)
  • Re-enable transaction log backup jobs
  • Take servers out of maintenance mode in SCOM/SQLMonitor/Solarwinds/Whatever

A couple of useful SQL queries

Suspend / Resume HADR for all databases on the server

This is the suspend version – change suspend to resume and rerun later

DECLARE @sql VARCHAR(MAX) = '';
SELECT @sql = @sql + 'ALTER DATABASE ' + QUOTENAME(DB_NAME(hars.database_id)) + ' SET HADR SUSPEND;' + CHAR(10) + CHAR(13)
FROM       sys.dm_hadr_database_replica_states AS hars
INNER JOIN sys.dm_hadr_availability_group_states AS hags
        ON hags.group_id = hars.group_id
WHERE hars.is_local = 1
ORDER BY DB_NAME(hars.database_id);
PRINT @sql; -- optional
EXEC (@sql);

Check health of replicas in the AG (run on primary)

SELECT ag.name,
ar.replica_server_name,
hars.is_local,
hars.role_desc,
hars.operational_state_desc,
hars.connected_state_desc,
hars.recovery_health_desc,
hars.synchronization_health_desc,
hars.last_connect_error_number,
hars.last_connect_error_description,
hars.last_connect_error_timestamp,
ag.failure_condition_level,
ag.health_check_timeout,
ar.replica_server_name,
ar.owner_sid,
ar.endpoint_url,
ar.availability_mode_desc,
ar.failover_mode_desc,
ar.session_timeout,
ar.primary_role_allow_connections_desc,
ar.secondary_role_allow_connections_desc,
ar.backup_priority,
ag.automated_backup_preference_desc,
ar.read_only_routing_url
FROM sys.dm_hadr_availability_replica_states AS hars
INNER JOIN sys.availability_groups AS ag
ON ag.group_id = hars.group_id
INNER JOIN sys.availability_replicas AS ar
ON ar.group_id = ag.group_id
AND ar.replica_id = hars.replica_id;

Monitor progress of catchup/synchronisation (run on primary)

SELECT ag.name,
adc.database_name,
ar.replica_server_name,
hdrs.is_local,
hdrs.synchronization_state,
hdrs.synchronization_state_desc,
hdrs.is_commit_participant,
hdrs.synchronization_health_desc,
hdrs.database_state_desc,
hdrs.is_suspended,
hdrs.suspend_reason,
hdrs.suspend_reason_desc,
hdrs.last_sent_time,
hdrs.last_received_time,
hdrs.last_hardened_time,
hdrs.last_redone_time,
hdrs.log_send_queue_size,
hdrs.log_send_rate,
hdrs.redo_queue_size,
hdrs.redo_rate,
hdrs.filestream_send_rate,
CEILING(hdrs.redo_queue_size / hdrs.redo_rate) / 60.0 AS est_redo_completion_minutes, -- approximately
hdrs.last_commit_time,
hdrs.low_water_mark_for_ghosts,
hdrs.secondary_lag_seconds -- for SQL 2016 and above
FROM sys.dm_hadr_database_replica_states AS hdrs
INNER JOIN sys.availability_groups AS ag
ON ag.group_id = hdrs.group_id
INNER JOIN sys.availability_databases_cluster AS adc
ON adc.group_database_id = hdrs.group_database_id
INNER JOIN sys.availability_replicas AS ar
ON ar.replica_id = hdrs.replica_id
WHERE hdrs.log_send_queue_size <> 0
OR hdrs.redo_queue_size <> 0;
This entry was posted in Uncategorized and tagged , , , , , , . Bookmark the permalink.

3 Responses to Upgrading SQL Server using Availability Groups – Checklist

  1. Pingback: Upgrading Servers in an Availability Group – Curated SQL

  2. Alex Friedman says:

    There’s no actual need to disable log backups and suspend data movement, is there?

    • thomasrushton says:

      The checklist above was, in part, provided by a Microsoft database guy, and that was definitely one of his recommendations, not mine…

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.