T-SQL Tuesday #16 – Aggregates and Average Weekly Backup Sizes


T-SQL Tuesday this month is hosted by Jes Borland. T-SQL Tuesday, which occurs on the second Tuesday of each month, provides an opportunity for many SQL Server bloggers to come together to discuss a single issue, selected by the host for that month. This month, Jes has selected the topic of Aggregation.

Size of Backups

The last few places I have worked have all been concerned with data growth rates.  Can you blame them, with the cost of storage being what it is?  A recent request from the Storage / Infrastructure guys was to provide an average for the weekly backup volumes of each of our SQL Server clusters.  This was also to be of use when planning a new DR strategy and for calculating the load that this would place on the WAN between various offices.  So here’s a query that can be used to perform this calculation.  (This particular query was lifted from an environment which uses SQL 2000 & 2005; it should be OK on 2008 as well.)

DISCLAIMER: None of these queries makes any attempt to work the calculation only for weeks where we have a full set of backups!

Run Anywhere

This is a lash-up of the clunky runs-anywhere version.

First, we use the msdb database – that’s where the backup history lives. Or should live, assuming you’re not being overly enthusiastic with your tidy-up maintenance tasks.

USE msdb
go

(NB: I’m going to miss the above out of subsequent examples…)

So, we grab the total backup size for each database for each week – datepart(week, backup_start_date) – into a temporary table. This is a simple aggregate function (SUM) grouped by the database name and the week number. No fancy tricks.

SELECT  Datepart(week, backup_start_date) AS WeekNumber,
        database_name,
        Sum(backup_size)                  AS TotalBackupSize
INTO    ##WeeklyBackupSizes
FROM    msdb.dbo.backupset
GROUP BY Datepart(week, backup_start_date),
         database_name

We then query this temporary table, using another aggregate function (AVG – average), grouping just by the database name.

SELECT  database_name,
         CONVERT(BIGINT, Avg(totalbackupsize)/(1024*1024)) AS AvgBackupSizeMB
FROM    ##WeeklyBackupSizes
GROUP BY database_name
ORDER BY database_name

Oh, and let’s just get rid of that worktable…

DROP TABLE ##WeeklyBackupSizes

Of course, if you don’t want the database_name field in your resultset, you can miss that out from the various SELECT statements, remembering to also remove it from the GROUP BY statements.

However, this is not the most elegant solution given the fun new features available to us in SQL Server 2005 and above, so let’s try this again, this time using a CTE – Common Table Expression.

CTE version – for SQL Server 2005 and later

This time, instead of faffing about building temporary tables, and making sure we tidy up after ourselves, we’ll use a CTE. If you’re not familiar with these, they are conceptually similar to temporary tables, but they have far more limited shelf-life – just valid for that particular query – not for a query batch, just a query.

Don’t forget that the statement before this will need to be terminated with a semicolon.

WITH WeeklyBackupSizes AS (
    SELECT Datepart(week, backup_start_date) AS WeekNumber,
           database_name,
           Sum(backup_size)                  AS TotalBackupSize
    FROM   msdb.dbo.backupset
    GROUP  BY Datepart(week, backup_start_date),
    database_name
)
SELECT database_name,
CONVERT(BIGINT, Avg(totalbackupsize)/(1024*1024)) AS AvgBackupSizeMB
FROM   WeeklyBackupSizes
GROUP  BY database_name
ORDER  BY database_name

As you can see, this query is almost identical to the previous query. CTEs are a very powerful way of creating these short-term views of data; however, note that they are limited – not everything that can go into a SELECT statement can go into a CTE. For more on CTEs, read Microsoft’s SQL Server documentation on the subject.

Partition Functions / OVER clause – for SQL Server 2005 and later

An alternative to using the Aggregate functions with a GROUP BY clause is to use partition functions and the OVER clause. This time, our query looks like:

WITH WeeklyBackupTotals AS 
(
    SELECT DISTINCT Datepart(week, backup_start_date)                                                     AS WeekNumber,
    database_name,
    Sum(backup_size) OVER (PARTITION BY Datepart(week, backup_start_date), database_name) AS TotalBackupSize
    FROM   backupset
)
SELECT DISTINCT database_name,
       CONVERT(BIGINT, Avg(TotalBackupSize) OVER (Partition BY database_name)/(1024*1024)) AS AvgBackupSizeMB
FROM   WeeklyBackupTotals
ORDER  BY Database_name

Note the use of the DISTINCT clause in the definition of the CTE – if this weren’t there, we would get a row for each record within the msdb..backupset table. This is one of the less interesting uses of the OVER clause – more interesting things happen when you look at ranking functions such as ROW_NUMBER, or DENSE_RANK… Again, see Microsoft’s SQL Server documentation for the OVER BY clause, and for the various ranking functions.

As is usually the case, it’s taken me about ten times as long to write this post as it took to write the query and send the results back to the infrastructure boys in the first place.  But I hope it’s been of use – both to me (re-iterating the use of the OVER clause) and to you.

Advertisements
This entry was posted in SQLServerPedia Syndication, T-SQL Tuesday and tagged , , , , . Bookmark the permalink.

2 Responses to T-SQL Tuesday #16 – Aggregates and Average Weekly Backup Sizes

  1. Pingback: T-SQL Tuesday #018 – CTEs – They’re not always good for you | The Lone DBA

  2. Pingback: Less Than Dot - Blog - Awesome

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s