T-SQL Tuesday this month is hosted by Jes Borland. T-SQL Tuesday, which occurs on the second Tuesday of each month, provides an opportunity for many SQL Server bloggers to come together to discuss a single issue, selected by the host for that month. This month, Jes has selected the topic of Aggregation.
Size of Backups
The last few places I have worked have all been concerned with data growth rates. Can you blame them, with the cost of storage being what it is? A recent request from the Storage / Infrastructure guys was to provide an average for the weekly backup volumes of each of our SQL Server clusters. This was also to be of use when planning a new DR strategy and for calculating the load that this would place on the WAN between various offices. So here’s a query that can be used to perform this calculation. (This particular query was lifted from an environment which uses SQL 2000 & 2005; it should be OK on 2008 as well.)
DISCLAIMER: None of these queries makes any attempt to work the calculation only for weeks where we have a full set of backups!
This is a lash-up of the clunky runs-anywhere version.
First, we use the msdb database – that’s where the backup history lives. Or should live, assuming you’re not being overly enthusiastic with your tidy-up maintenance tasks.
USE msdb go
(NB: I’m going to miss the above out of subsequent examples…)
So, we grab the total backup size for each database for each week –
datepart(week, backup_start_date) – into a temporary table. This is a simple aggregate function (SUM) grouped by the database name and the week number. No fancy tricks.
SELECT Datepart(week, backup_start_date) AS WeekNumber, database_name, Sum(backup_size) AS TotalBackupSize INTO ##WeeklyBackupSizes FROM msdb.dbo.backupset GROUP BY Datepart(week, backup_start_date), database_name
We then query this temporary table, using another aggregate function (AVG – average), grouping just by the database name.
SELECT database_name, CONVERT(BIGINT, Avg(totalbackupsize)/(1024*1024)) AS AvgBackupSizeMB FROM ##WeeklyBackupSizes GROUP BY database_name ORDER BY database_name
Oh, and let’s just get rid of that worktable…
DROP TABLE ##WeeklyBackupSizes
Of course, if you don’t want the
database_name field in your resultset, you can miss that out from the various
SELECT statements, remembering to also remove it from the
GROUP BY statements.
However, this is not the most elegant solution given the fun new features available to us in SQL Server 2005 and above, so let’s try this again, this time using a CTE – Common Table Expression.
CTE version – for SQL Server 2005 and later
This time, instead of faffing about building temporary tables, and making sure we tidy up after ourselves, we’ll use a CTE. If you’re not familiar with these, they are conceptually similar to temporary tables, but they have far more limited shelf-life – just valid for that particular query – not for a query batch, just a query.
Don’t forget that the statement before this will need to be terminated with a semicolon.
WITH WeeklyBackupSizes AS ( SELECT Datepart(week, backup_start_date) AS WeekNumber, database_name, Sum(backup_size) AS TotalBackupSize FROM msdb.dbo.backupset GROUP BY Datepart(week, backup_start_date), database_name ) SELECT database_name, CONVERT(BIGINT, Avg(totalbackupsize)/(1024*1024)) AS AvgBackupSizeMB FROM WeeklyBackupSizes GROUP BY database_name ORDER BY database_name
As you can see, this query is almost identical to the previous query. CTEs are a very powerful way of creating these short-term views of data; however, note that they are limited – not everything that can go into a SELECT statement can go into a CTE. For more on CTEs, read Microsoft’s SQL Server documentation on the subject.
Partition Functions / OVER clause – for SQL Server 2005 and later
An alternative to using the Aggregate functions with a
GROUP BY clause is to use partition functions and the
OVER clause. This time, our query looks like:
WITH WeeklyBackupTotals AS ( SELECT DISTINCT Datepart(week, backup_start_date) AS WeekNumber, database_name, Sum(backup_size) OVER (PARTITION BY Datepart(week, backup_start_date), database_name) AS TotalBackupSize FROM backupset ) SELECT DISTINCT database_name, CONVERT(BIGINT, Avg(TotalBackupSize) OVER (Partition BY database_name)/(1024*1024)) AS AvgBackupSizeMB FROM WeeklyBackupTotals ORDER BY Database_name
Note the use of the
DISTINCT clause in the definition of the CTE – if this weren’t there, we would get a row for each record within the
msdb..backupset table. This is one of the less interesting uses of the
OVER clause – more interesting things happen when you look at ranking functions such as
DENSE_RANK… Again, see Microsoft’s SQL Server documentation for the OVER BY clause, and for the various ranking functions.
As is usually the case, it’s taken me about ten times as long to write this post as it took to write the query and send the results back to the infrastructure boys in the first place. But I hope it’s been of use – both to me (re-iterating the use of the OVER clause) and to you.