Which databases were backed up in which backup task?

It’s a confusing question / title / headline. Let’s see if I can clarify what I’m after.

For each scheduled backup job, which databases were backed up? And (more to the point) which databases weren’t backed up? And how can I find this without trawling through the logs myself?

Assumptions

  1. Backup jobs that do full backups don’t overlap
  2. There’s nothing else doing full backups
  3. erm…
  4. that’s it

The reason I was thinking about this is that we have occasional-but-annoyingly-frequent backup job failures, wherein most of the databases back up just fine, but the odd one fails. (SharePoint box, I’m looking at you…) Rather than trawling through the error logs to find out which particular database didn’t back up successfully, I wanted a query to do the heavy lifting. Yes, I’m a lazy lone DBA…

The key components to this are:

  1. MSDB’s sysjobs and sysjobhistory tables
  2. MSDB’s backupset table
  3. JOINing these to get some useful data in the right form for the inevitable
  4. PIVOT
  5. and join with sysdatabases to see if we’re missing anything.

Right then. Here we go.

MSDB’s sysjobs and sysjobhistory tables

The MSDB database is full of useful bits and pieces, including the database tables used by the SQL Agent service. Two of these are required today:

  1. sysjobs – basic header-type information about the various scheduled tasks – we’re only really interested in the task name
  2. sysjobhistory – information about individual runs of those tasks – we want to get at the recent occurrences of those tasks, in particular the start & end dates

Due to various inconsistencies in the environment where this query was developed, I can’t say that the backup job name is a particular thing, but I can say that it includes the words “backup” and “full”. I’m only interested in the last ten days of data.

SELECT  jh.instance_id ,
        msdb.dbo.agent_datetime(jh.run_date, jh.run_time) AS RunStartDateTime ,
        DATEADD(SECOND, CONVERT(INT, RIGHT('000000' + CONVERT(VARCHAR(20), jh.run_duration), 2)),
                DATEADD(MINUTE, CONVERT(INT, SUBSTRING(RIGHT('000000' + CONVERT(VARCHAR(20), jh.run_duration), 6), 3, 2)),
                        DATEADD(HOUR, CONVERT(INT, LEFT(RIGHT('000000' + CONVERT(VARCHAR(20), jh.run_duration), 6), 2)),
                                DATEADD(DAY, jh.run_duration / 1000000, msdb.dbo.agent_datetime(jh.run_date, jh.run_time))))) AS RunFinishDateTime
FROM    msdb.dbo.sysjobhistory jh
        LEFT JOIN msdb.dbo.sysjobs j ON j.job_id = jh.job_id
WHERE   LOWER(j.name) LIKE '%backup%'
        AND LOWER(j.name) LIKE '%full%'
        AND jh.step_id = 0
        AND msdb.dbo.agent_datetime(jh.run_date, jh.run_time) >= DATEADD(DAY, -10, GETDATE());

Some results to show that we’re on the right lines:
SysJobs and SysJobHistory Query Results

MSDB’s backupset table

MSDB really is a useful little database – also containing various tables relating to SQL Server database backups and restores. Here, we’re just interested in the backupset table, as it contains summary information about backups. This will do for our purposes.

We only want a very small set of information here – the database name, and the creation and finish date of the backup itself.

Again, we’re only interested in full backups that were taken in the last ten days.

SELECT  backup_start_date ,
        backup_finish_date ,
        database_name AS DBName
FROM    msdb.dbo.backupset
WHERE   type = 'D'
        AND backup_start_date > DATEADD(DAY, -10, GETDATE());

And some results:

BackupSet Results

JOIN these together

This is where it starts to get a bit more interesting. What we need to do is join the above two queries together based on database backup files that were created between start & end dates of a particular database backup job. Results required are the name of the database, and the start time of the database backup scheduled task.

I’m joining these using CTEs rather than creating a single SELECT statement, mainly to aid readability. This might come back to bite me later, but, y’know, this is an ongoing thing.

WITH    JobRuns
          AS ( SELECT   jh.instance_id ,
                        msdb.dbo.agent_datetime(jh.run_date, jh.run_time) AS RunStartDateTime ,
                        DATEADD(SECOND, CONVERT(INT, RIGHT('000000' + CONVERT(VARCHAR(20), jh.run_duration), 2)),
                                DATEADD(MINUTE, CONVERT(INT, SUBSTRING(RIGHT('000000' + CONVERT(VARCHAR(20), jh.run_duration), 6), 3, 2)),
                                        DATEADD(HOUR, CONVERT(INT, LEFT(RIGHT('000000' + CONVERT(VARCHAR(20), jh.run_duration), 6), 2)),
                                                DATEADD(DAY, jh.run_duration / 1000000, msdb.dbo.agent_datetime(jh.run_date, jh.run_time))))) AS RunFinishDateTime
               FROM     msdb.dbo.sysjobhistory jh
                        LEFT JOIN msdb.dbo.sysjobs j ON j.job_id = jh.job_id
               WHERE    LOWER(j.name) LIKE '%backup%'
                        AND LOWER(j.name) LIKE '%full%'
                        AND jh.step_id = 0
                        AND msdb.dbo.agent_datetime(jh.run_date, jh.run_time) >= DATEADD(DAY, -10, GETDATE())
             ),
        BackupsTaken
          AS ( SELECT   backup_start_date ,
                        backup_finish_date ,
                        database_name AS DBName
               FROM     msdb.dbo.backupset
               WHERE    type = 'D'
                        AND backup_start_date > DATEADD(DAY, -10, GETDATE())
             )
    SELECT  CONVERT(VARCHAR(20), RunStartDateTime, 120) AS RunStarted ,
            BackupsTaken.DBName
    FROM    JobRuns
            LEFT JOIN BackupsTaken ON JobRuns.RunStartDateTime <= BackupsTaken.backup_start_date
                                      AND BackupsTaken.backup_finish_date <= JobRuns.RunFinishDateTime;

And, as is obligatory, some results:

BackupSet Joined with SysJobs and SysJobHistory

This is looking hopeful. Ugly query, though, and it’s about to get uglier…

PIVOT

The next step is to pivot the data so we can see the results in the more friendly (well…) format.

The output I’m looking for is a table with one row per database, and one column per backup job / scheduled task. In effect, we’ll end up with a series of ticks and crosses (or ones and zeroes) to indicate success/failure. And this is where things get really gnarly.

Because we’re going to be running this on multiple servers, and they don’t all share the same backup schedule, I need to dynamically generate the column headers. I’m using the XML concatenation trick to generate a list of the full backup job start times. This is being held in a string that will be used to generate the field names in the final result set. I’m not going to show the PIVOT right now…

DECLARE @RunStartedList VARCHAR(MAX);
SELECT  @RunStartedList = STUFF(( SELECT    ', ' + QUOTENAME(CONVERT(VARCHAR(20), msdb.dbo.agent_datetime(jh.run_date, jh.run_time), 120))
                                  FROM      msdb.dbo.sysjobhistory jh
                                            LEFT JOIN msdb.dbo.sysjobs j ON j.job_id = jh.job_id
                                  WHERE     LOWER(j.name) LIKE '%backup%'
                                            AND LOWER(j.name) LIKE '%full%'
                                            AND jh.step_id = 0
                                            AND msdb.dbo.agent_datetime(jh.run_date, jh.run_time) >= DATEADD(DAY, -10, GETDATE())
                                  ORDER BY  msdb.dbo.agent_datetime(jh.run_date, jh.run_time) DESC
                                FOR
                                  XML PATH('')
                                ), 1, 2, '');
SELECT  @RunStartedList;

The output from this is just a big old string:

[2016-07-17 21:00:00], [2016-07-16 21:00:00], [2016-07-15 21:00:00], [2016-07-14 21:00:00], [2016-07-13 21:00:00], [2016-07-12 21:00:00], [2016-07-11 21:00:00], [2016-07-10 21:00:00], [2016-07-09 21:00:00], [2016-07-08 21:00:00]

So, putting this all together, we now have a query that looks like this:

WITH  JobRuns
          AS ( SELECT   jh.instance_id ,
                        msdb.dbo.agent_datetime(jh.run_date, jh.run_time) AS RunStartDateTime ,
                        DATEADD(SECOND, CONVERT(INT, RIGHT('000000' + CONVERT(VARCHAR(20), jh.run_duration), 2)),
                                DATEADD(MINUTE, CONVERT(INT, SUBSTRING(RIGHT('000000' + CONVERT(VARCHAR(20), jh.run_duration), 6), 3, 2)),
                                        DATEADD(HOUR, CONVERT(INT, LEFT(RIGHT('000000' + CONVERT(VARCHAR(20), jh.run_duration), 6), 2)),
                                                DATEADD(DAY, jh.run_duration / 1000000, msdb.dbo.agent_datetime(jh.run_date, jh.run_time))))) AS RunFinishDateTime
               FROM     msdb.dbo.sysjobhistory jh
                        LEFT JOIN msdb.dbo.sysjobs j ON j.job_id = jh.job_id
               WHERE    LOWER(j.name) LIKE '%backup%'
                        AND LOWER(j.name) LIKE '%full%'
                        AND jh.step_id = 0
                        AND msdb.dbo.agent_datetime(jh.run_date, jh.run_time) >= DATEADD(DAY, -10, GETDATE())
             ),
        BackupsTaken
          AS ( SELECT   backup_start_date ,
                        backup_finish_date ,
                        database_name AS DBName
               FROM     msdb.dbo.backupset
               WHERE    type = 'D'
             ),
        BackupPivot
          AS ( SELECT   DBName ,
                        [2016-07-17 21:00:00] ,
                        [2016-07-16 21:00:00] ,
                        [2016-07-15 21:00:00] ,
                        [2016-07-14 21:00:00] ,
                        [2016-07-13 21:00:00] ,
                        [2016-07-12 21:00:00] ,
                        [2016-07-11 21:00:00] ,
                        [2016-07-10 21:00:00] ,
                        [2016-07-09 21:00:00] ,
                        [2016-07-08 21:00:00]
               FROM     ( SELECT    CONVERT(VARCHAR(20), RunStartDateTime, 120) AS RunStarted ,
                                    BackupsTaken.DBName
                          FROM      JobRuns
                                    LEFT JOIN BackupsTaken ON JobRuns.RunStartDateTime <= BackupsTaken.backup_start_date
                                                              AND BackupsTaken.backup_finish_date <= JobRuns.RunFinishDateTime
                          WHERE     JobRuns.RunStartDateTime >= DATEADD(DAY, -10, GETDATE())
                        ) AS src PIVOT      ( COUNT(RunStarted) FOR RunStarted IN ( [2016-07-17 21:00:00], [2016-07-16 21:00:00], [2016-07-15 21:00:00],
                                                                                    [2016-07-14 21:00:00], [2016-07-13 21:00:00], [2016-07-12 21:00:00],
                                                                                    [2016-07-11 21:00:00], [2016-07-10 21:00:00], [2016-07-09 21:00:00],
                                                                                    [2016-07-08 21:00:00] ) ) AS pvt
             )
    SELECT  sd.name AS [SysDatabases.Name] ,
            bp.*
    FROM    BackupPivot bp
            FULL OUTER JOIN master.sys.databases sd ON bp.DBName = sd.name
    ORDER BY sd.name ,
            bp.DBName; 

You think that’s ugly? Wait for the full dynamic monstrosity…

Dynamic query, with added sysdatabases

Finally, I’m going to make this thing dynamic. I may want to change the number of days of backup history we’re going to look at, and I will definitely want to run it on servers that have different backup schedules, so can’t hardcode the backup job times. And, for bonus information, I’m going to join the results with master..sysdatabases so we can get information about databases that aren’t being backed up.

Here’s the query:

DECLARE @SQL VARCHAR(MAX);
DECLARE @RunStartedList VARCHAR(MAX);
DECLARE @NumDays INT;
SELECT  @NumDays = 10; -- not doing this as a single line declare-define, as we might be running on older versions.
 
IF @NumDays > 0
    BEGIN
        SELECT  @NumDays = @NumDays * ( -1 );
    END;
 
SELECT  @RunStartedList = STUFF(( SELECT    ', ' + QUOTENAME(CONVERT(VARCHAR(20), msdb.dbo.agent_datetime(jh.run_date, jh.run_time), 120))
                                  FROM      msdb.dbo.sysjobhistory jh
                                            LEFT JOIN msdb.dbo.sysjobs j ON j.job_id = jh.job_id
                                  WHERE     LOWER(j.name) LIKE '%backup%'
                                            AND LOWER(j.name) LIKE '%full%'
                                            AND jh.step_id = 0
                                            AND msdb.dbo.agent_datetime(jh.run_date, jh.run_time) >= DATEADD(DAY, @NumDays, GETDATE())
                                  ORDER BY  msdb.dbo.agent_datetime(jh.run_date, jh.run_time) DESC
                                FOR
                                  XML PATH('')
                                ), 1, 2, '');
 
SELECT  @SQL = '
WITH    JobRuns
          AS ( SELECT   jh.instance_id ,
                        msdb.dbo.agent_datetime(jh.run_date, jh.run_time) AS RunStartDateTime ,
                        DATEADD(SECOND, CONVERT(INT, RIGHT(''000000'' + CONVERT(VARCHAR(20), jh.run_duration), 2)),
                                DATEADD(MINUTE, CONVERT(INT, SUBSTRING(RIGHT(''000000'' + CONVERT(VARCHAR(20), jh.run_duration), 6), 3, 2)),
                                        DATEADD(HOUR, CONVERT(INT, LEFT(RIGHT(''000000'' + CONVERT(VARCHAR(20), jh.run_duration), 6), 2)),
                                                DATEADD(DAY, jh.run_duration / 1000000, msdb.dbo.agent_datetime(jh.run_date, jh.run_time))))) AS RunFinishDateTime
               FROM     msdb.dbo.sysjobhistory jh
                        LEFT JOIN msdb.dbo.sysjobs j ON j.job_id = jh.job_id
               WHERE    LOWER(j.name) LIKE ''%backup%''
                        AND LOWER(j.name) LIKE ''%full%''
                        AND jh.step_id = 0
                                         AND msdb.dbo.agent_datetime(jh.run_date, jh.run_time) >= DATEADD(DAY, ' + CONVERT(NVARCHAR(8), @NumDays) + ', GETDATE())
             ),
        BackupsTaken
          AS ( SELECT   backup_start_date ,
                        backup_finish_date ,
                        database_name AS DBName
               FROM     msdb.dbo.backupset
               WHERE    type = ''D''
             ),
              BackupPivot
                AS ( SELECT DBName, ' + @RunStartedList + '
                        FROM (
                                  SELECT  CONVERT(VARCHAR(20), RunStartDateTime, 120) AS RunStarted,
                                                BackupsTaken.DBName
                                  FROM    JobRuns
                                                LEFT JOIN BackupsTaken ON JobRuns.RunStartDateTime <= BackupsTaken.backup_start_date
                                                                                           AND BackupsTaken.backup_finish_date <= JobRuns.RunFinishDateTime
                                  WHERE   JobRuns.RunStartDateTime >= DATEADD(DAY, ' + CONVERT(NVARCHAR(8), @NumDays) + ', GETDATE())
                                  ) AS src
                           PIVOT
                           ( COUNT(RunStarted) FOR RunStarted IN (' + @RunStartedList + ') ) AS pvt)
SELECT  sd.name as [Sys.Databases.Name],
        bp.*
FROM    BackupPivot bp
        FULL OUTER JOIN master.sys.databases sd ON bp.DBName = sd.name
ORDER BY sd.name ,
        bp.DBName
       ;
       ';
--SELECT  @SQL;
EXEC (@SQL);

And some results:

Backup Query Final Results

Look! We have a database that failed to back up in the job that started on 2016-07-13 at 21:00! And we have a couple of databases that aren’t being backed up at all! (Mind you, one of those is tempdb… I really should filter that out…)

Now, if only I could get this query to identify the reason for the failure from the logs, and create support tickets to annoy the infrastructure team. So many logs, so little time…

Disclaimer

This code worked for me on every server I tested it on, which started off with SQL Server 2005 SP1 (I couldn’t find anything more primitive) in the 2005 range, and this won’t run on SQL 2000 because that can’t cope with VARCHAR(MAX). And, hey, nobody should have SQL 2000 any more. Or 2005.

This code is worth what you paid for it, and may eat your bacon sandwich. Don’t run it without testing it. But how are you going to test it without running it? Hah.

Advertisements
This entry was posted in SQLServerPedia Syndication and tagged , , , . Bookmark the permalink.

3 Responses to Which databases were backed up in which backup task?

  1. Perry whittle says:

    Use master.sys.databases
    I prefer to use FOR XML PATH

    • thomasrushton says:

      Hi Perry. Not spoken with you for ages.

      Not quite sure what you’re getting at here. I am using sys.databases (granted, the alias in the SELECT statement was a bit confusing, but I’ve tweaked that). And I am using FOR XML PATH…

      What else were you thinking might be better achieved with FOR XML PATH?

  2. Pingback: Finding Failed Backups And Jobs – Curated SQL

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s