This is a checklist put together and followed for upgrading a pair of physical SQL Server 2012 servers that hosted a single availability group containing several terabytes of data, with minimal downtime.
The availability group was configured with synchronous commit and automatic failover.
List of Steps
- If possible, stop any ETL processes that put data into the affected databases – less traffic means less catching up to do when you resume data movement later.
- Make sure all AGs hosted by the pair of servers have their primary replica on the same server, so the other server can be upgraded first as a pure secondary
- Put servers, clusters, listeners, everything into Maintenance Mode in SCOM / SQLMonitor / Solarwinds / whatever other monitoring system you use.
- Disable Transaction Log backup jobs on all servers hosting the Availability Group(s) (a sample command for this is sketched after the queries below)
- Take log backups of the databases in the Availability Group(s)
- Configure the Availability Group(s) to manual failover (example commands are sketched just after this list)
- Suspend data movement for each database on the secondary server
- Back up the system databases on the secondary server (a sample backup command is sketched in the queries section below)
- Upgrade the secondary server and reboot (a reboot might not be required, but it is recommended anyway)
- Resume data movement on the secondary, and allow the servers to synchronise.
- Fail over the availability group(s) to the upgraded server with no data loss (see the example commands after this list)
- Check that data movement to the new secondary (the not-yet-upgraded server) is suspended due to version incompatibility
- Upgrade the other server and then reboot
- Resume data movement on the secondary, and allow it to synchronise
- (Optional) Fail over the AG again with no data loss to check that everything is OK; not strictly required, but useful for extra peace of mind
- Reconfigure the availability group(s) back to automatic failover, if that's how it was configured
- Re-enable transaction log backup jobs
- Take servers out of maintenance mode in SCOM/SQLMonitor/Solarwinds/Whatever
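For reference, a minimal sketch of the failover-related commands used in the steps above, assuming an availability group called MyAG and a replica called SQLNODE2; both names are placeholders, so substitute your own.

-- Switch a replica to manual failover before the upgrade (run on the primary, once per replica)
ALTER AVAILABILITY GROUP [MyAG]
MODIFY REPLICA ON N'SQLNODE2' WITH (FAILOVER_MODE = MANUAL);

-- Planned failover with no data loss: run this on the upgraded secondary once it is synchronised
ALTER AVAILABILITY GROUP [MyAG] FAILOVER;

-- Once both servers are upgraded and synchronised, switch the replicas back to automatic failover
ALTER AVAILABILITY GROUP [MyAG]
MODIFY REPLICA ON N'SQLNODE2' WITH (FAILOVER_MODE = AUTOMATIC);

Automatic failover is only valid for synchronous-commit replicas, which is how this availability group was already configured.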
A couple of useful SQL queries
Suspend / Resume HADR for all databases on the server
This is the suspend version – change SUSPEND to RESUME and rerun it later
DECLARE @sql VARCHAR(MAX) = '';

SELECT @sql = @sql + 'ALTER DATABASE ' + QUOTENAME(DB_NAME(hars.database_id))
            + ' SET HADR SUSPEND;' + CHAR(13) + CHAR(10)
FROM sys.dm_hadr_database_replica_states AS hars
INNER JOIN sys.dm_hadr_availability_group_states AS hags
    ON hags.group_id = hars.group_id
WHERE hars.is_local = 1
ORDER BY DB_NAME(hars.database_id);

PRINT @sql; -- optional
EXEC (@sql);
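Back up the system databases (on the server about to be upgraded)
A minimal sketch only – the backup path is a placeholder, so point it at your usual backup location.
-- Run on the server about to be upgraded
BACKUP DATABASE master TO DISK = N'X:\Backups\master_preupgrade.bak' WITH INIT, COMPRESSION;
BACKUP DATABASE model  TO DISK = N'X:\Backups\model_preupgrade.bak'  WITH INIT, COMPRESSION;
BACKUP DATABASE msdb   TO DISK = N'X:\Backups\msdb_preupgrade.bak'   WITH INIT, COMPRESSION;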
Check health of replicas in the AG (run on primary)
SELECT ag.name,
       ar.replica_server_name,
       hars.is_local,
       hars.role_desc,
       hars.operational_state_desc,
       hars.connected_state_desc,
       hars.recovery_health_desc,
       hars.synchronization_health_desc,
       hars.last_connect_error_number,
       hars.last_connect_error_description,
       hars.last_connect_error_timestamp,
       ag.failure_condition_level,
       ag.health_check_timeout,
       ar.owner_sid,
       ar.endpoint_url,
       ar.availability_mode_desc,
       ar.failover_mode_desc,
       ar.session_timeout,
       ar.primary_role_allow_connections_desc,
       ar.secondary_role_allow_connections_desc,
       ar.backup_priority,
       ag.automated_backup_preference_desc,
       ar.read_only_routing_url
FROM sys.dm_hadr_availability_replica_states AS hars
INNER JOIN sys.availability_groups AS ag
    ON ag.group_id = hars.group_id
INNER JOIN sys.availability_replicas AS ar
    ON ar.group_id = ag.group_id
   AND ar.replica_id = hars.replica_id;
Monitor progress of catchup/synchronisation (run on primary)
SELECT ag.name,
       adc.database_name,
       ar.replica_server_name,
       hdrs.is_local,
       hdrs.synchronization_state,
       hdrs.synchronization_state_desc,
       hdrs.is_commit_participant,
       hdrs.synchronization_health_desc,
       hdrs.database_state_desc,
       hdrs.is_suspended,
       hdrs.suspend_reason,
       hdrs.suspend_reason_desc,
       hdrs.last_sent_time,
       hdrs.last_received_time,
       hdrs.last_hardened_time,
       hdrs.last_redone_time,
       hdrs.log_send_queue_size,
       hdrs.log_send_rate,
       hdrs.redo_queue_size,
       hdrs.redo_rate,
       hdrs.filestream_send_rate,
       CEILING(hdrs.redo_queue_size / NULLIF(hdrs.redo_rate, 0)) / 60.0 AS est_redo_completion_minutes, -- approximately
       hdrs.last_commit_time,
       hdrs.low_water_mark_for_ghosts,
       hdrs.secondary_lag_seconds -- SQL 2016 and above only
FROM sys.dm_hadr_database_replica_states AS hdrs
INNER JOIN sys.availability_groups AS ag
    ON ag.group_id = hdrs.group_id
INNER JOIN sys.availability_databases_cluster AS adc
    ON adc.group_database_id = hdrs.group_database_id
INNER JOIN sys.availability_replicas AS ar
    ON ar.replica_id = hdrs.replica_id
WHERE hdrs.log_send_queue_size <> 0
   OR hdrs.redo_queue_size <> 0;
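Disable / re-enable the transaction log backup jobs
A minimal sketch assuming the log backups run as SQL Server Agent jobs – the job name is a placeholder, so substitute whatever your jobs are called, and set @enabled = 1 when it is time to turn them back on.
-- Run on each server hosting the Availability Group(s)
EXEC msdb.dbo.sp_update_job
    @job_name = N'DatabaseBackup - USER_DATABASES - LOG', -- placeholder job name
    @enabled  = 0;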
There’s no actual need to disable log backups and suspend data movement, is there?
The checklist above was, in part, provided by a Microsoft database guy, and that was definitely one of his recommendations, not mine…