Retrying and Running a Failed Job After Some Delay in SQL Server

There can be scenarios in certain where there can be requirement of automatically rerunning a failed job after a certain amount of delay. For example, a job which fails due to intermittent issue like a network glitch or some file currently been locked by other process, so it would require that the job need to be attempted to be run again after some time, by which there are possibilities of the issue getting resolved by then and our job will completely successfully and no manual intervention is required by a DBA.

There are multiple ways of making a job to run again itself after sometime and you can choose any approach which you prefer, after testing the solution on Test server.

Method 1: – There is an option available in SQL Server Agent Job Step properties, where we can specify, if that particular job step need to be retried if failed and how many times it should retry to run the failed job and how much delay should be there between each retry of the failed step.

You can configure this from SQL Server Agent -> Jobs -> Job Properties -> Steps Tab -> Edit the Step -> Click on Advanced tab -> Set appropriate value for “Retry Attempts” and “Retry Interval(minutes)”

Job_Step_Retry_Options

Job_Step_Retry_Options

Method 2: – This is applicable for jobs which run T-SQL code as part of job step. We can add multiple schedules to a job and Job will run on both the schedules. Modify the job, “so that it first check if the previous run of this job was failed and was it during the first scheduled time, if Yes, then run the T-SQL code again. If it was successful in its last run and time it was run was during its first schedule, then do not do anything. If it was successful in its last run and time it was run was during its second schedule, then run the T-SQL code”. This needs writing this custom logic and of-course need lot of testing and also is limited only if job consists of T-SQL steps.

Method 3 – Create another job which will monitor if out critical Job completed successfully or not, if it failed, then just reruns that particular job. This is much simpler to implement compared to method 2 and also works for any type of job steps like T-SQL or Maintenance plan related jobs, etc.

Below is a sample code which can be used in the new job which monitors other jobs and rerun them if they failed. This requires you to specify the name of the jobs which need to rerun.

/*

This Job step finds if any of the job with names as "Backup_Full" 
and "App_Job" failed in its previous run and If they have failed in the 
current day, this will rerun that job. Change jobs names as per your names 
of jobs in your environment.

*/
 
DECLARE
  @Value [varchar](2048),
  @JobName [varchar](2048),
  @CurrentDate [datetime],
  @Year [varchar](4),
  @Month [varchar](2),
  @MonthPre [varchar](2),
  @Day [varchar](2),
  @DayPre [varchar](2),
  @FinalDate [int]
 
-- Initialize Variables
SET @CurrentDate = GETDATE()
SET @Year = DATEPART(yyyy, @CurrentDate)
 
SELECT @MonthPre = CONVERT([varchar](2), DATEPART(mm, @CurrentDate)) 
SELECT @Month = RIGHT(CONVERT([varchar], (@MonthPre + 1000000000)), 2) 
SELECT @DayPre = CONVERT([varchar](2), DATEPART(dd, @CurrentDate)) 
SELECT @Day = RIGHT(CONVERT([varchar], (@DayPre + 1000000000)), 2) 
SET @FinalDate = CAST(@Year + @Month + @Day AS [int]) 

--Find failed jobs "Backup_Full" and "App_Job"
DECLARE FailedJobs CURSOR FOR
SELECT DISTINCT 
  j.[name]
FROM 
  [msdb].[dbo].[sysjobhistory] h INNER JOIN
  [msdb].[dbo].[sysjobs] j ON h.[job_id] = j.[job_id] INNER JOIN 
  [msdb].[dbo].[sysjobsteps] s ON j.[job_id] = s.[job_id] AND h.[step_id] = s.[step_id]
WHERE 
  h.[run_status] = 0 AND 
  h.[run_date] = @FinalDate AND 
  j.name IN ('Backup_Full', 'App_Job')
 
OPEN FailedJobs

FETCH NEXT FROM FailedJobs
INTO @JobName

WHILE @@FETCH_STATUS=0
BEGIN

-- Restarts the job
  EXECUTE msdb..sp_start_job @JobName

FETCH NEXT FROM FailedJobs into @JobName
END

CLOSE FailedJobs
DEALLOCATE FailedJobs

There may be other methods as well, but mentioned methods are practical and simple and should serve the purpose.

This is applicable on below versions of SQL Server

SQL Server 2005
SQL Server 2008 R2
SQL Server 2012
SQL Server 2014

Hope this was helpful.

Thanks,
SQLServerF1 Team
In-Depth Blogs on SQL Server, Information about SQL Server Conferences and Events, SQL Server Frequently asked questions, SQL Server Trainings.

 

Leave a Reply

Your email address will not be published. Required fields are marked *