T-SQL-Driven Workflow Engine using SQL Agent jobs

March 22, 2019

The Summary and Apology

I began working on this blog post about six months ago and received a lot of really great feedback from really smart folks who know how to do word… stuff much better than I do. In trying to revise it, I ended up getting bogged down in life, and it sat on the shelf collecting virtual dust. I have come to the conclusion that I either have to set it free as is or let it die. While it may not be the best-structured or most elegant prose, I am still hopeful that the fundamental concepts might be useful to someone out there, and if I manage to add a trick to anyone's toolkit it will have been worth releasing.

So, that being said, if you are interested in learning one possible way to build out a workflow in T-SQL that can flow between executing steps in serial and in parallel through the spinning up and tearing down of agent jobs, then please, enjoy.

The Premise

When working with data in SQL Server it is fairly common to build out a custom sequence of events at runtime and then execute them. Most commonly I have seen this done by building out a single statement made up of the individual code statements and then just executing it.

In the following example you can see that it will always select 'x', then wait for 5 seconds; then, depending on the client, it will either select y and wait 10 seconds or select z and wait 5 seconds. The process will therefore take 10-15 seconds to complete (not counting any additional overhead).
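The original snippet did not survive in this copy of the post, but the concatenate-and-execute pattern it describes looks roughly like the following sketch (the client name and branch logic are illustrative):

```sql
DECLARE @Client varchar(50) = 'ClientA';
DECLARE @sql nvarchar(max) = N'';

-- Always select 'x' and wait 5 seconds first
SET @sql += N'SELECT ''x''; WAITFOR DELAY ''00:00:05''; ';

-- Branch on the client and append the matching step
IF @Client = 'ClientA'
    SET @sql += N'SELECT ''y''; WAITFOR DELAY ''00:00:10'';';
ELSE
    SET @sql += N'SELECT ''z''; WAITFOR DELAY ''00:00:05'';';

-- Everything runs serially in one batch
EXEC sys.sp_executesql @sql;
```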

A simple math question: You have 12 tasks, each taking 5 seconds. How long does it take to complete them? If you said a minute, you are correct. How long -could- it take to complete them? If you said 5 seconds, you are also potentially right. Welcome to the difference between serial and parallel processing. If you have to complete each task in order, one after the other, you are running them in a serial fashion. If, however, you can complete tasks simultaneously, you are running them in parallel.

The Clumsy Analogy

Think of it like building a house, described in very high-level terms by someone who has never built a house:

  1. Floor
  2. Walls
  3. Roof
  4. Decorate Living Room
  5. Decorate Dining Room

You need to do the floor first, then the walls, then the roof. After the house structure is built, you can then decorate the rooms. How you go about this is determined by your available resources. Have enough people to do both rooms at the same time? You could. If not, you would decorate one room and then the other. Decorating them at the same time is an example of handling the tasks in a parallel fashion. If you have to decorate them one after the other, you are handling them in a serial fashion.

Still using the example above, if you were to add another task of 'Throw Housewarming Party', you would want it to happen after all the rooms were decorated. The likely ideal path would be for steps 1-3 to be performed one after another in a serial fashion, steps 4 and 5 to be performed at the same time in a parallel fashion, and then the new final step, Throw Housewarming Party, to be done last.

  1. Floor
  2. Walls
  3. Roof
  4. Decorate
    1. Living Room
    2. Dining Room
  5. Throw Housewarming Party

There is a serial workflow from 1->5 with step 4 having a parallel workflow of 1->2. For a much more technical explanation that doesn’t involve clumsy house analogies, head on over to read up on Amdahl’s Law.

The question became how to accomplish flowing in and out of serial and parallel processing in SQL Server. I will be showing my approach utilizing SQL Server Agent jobs below. Shout out here to Constantine Kokkinos (b|t) for talking through possible ideas and dropping the seed of SQL Server Agent which grew into this solution.

The process I am using I have lovingly termed a task engine. Currently a task can be a stored procedure or a straight bit of code; however, it could be anything that can be run as a SQL Server Agent job, so a task could be an SSIS package or something you want to run as CmdExec. Those require the Agent job to be constructed using different syntax, though, and I have yet to build that out. I will provide a small overview of each of the components, then provide the entirety of the script to set up that component, and at the end I will step through the primary engine snippet by snippet.

The Setup

The demo you will see here relies on previously defined tasks but the tasks can also be generated at runtime.

Task: This table stores the task definition as the straight bit of code or the stored procedure that should be called.
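The table definition itself is not reproduced here; as a sketch, a minimal Task table matching that description might look like the following (the column names are my assumptions):

```sql
CREATE TABLE dbo.Task
(
    TaskID    int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    TaskName  varchar(128)  NOT NULL,
    -- Exactly one of the two columns below is populated per task:
    TaskCode  nvarchar(max) NULL,  -- a straight bit of code to run
    SprocName nvarchar(400) NULL   -- a stored procedure to call instead
);
```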

For the purposes of this demo, every code-based task is going to run:

SELECT 'x'; WAITFOR DELAY '00:00:05';

and every stored procedure based task is going to call common.TestSproc which is defined as:

It runs the same basic code, except that it varies the delay time based on the Client. This demonstrates passing the @Client parameter and verifying that it works.
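Based on that description, a sketch of common.TestSproc (the exact per-client delays here are assumptions):

```sql
CREATE PROCEDURE common.TestSproc
    @Client varchar(50)
AS
BEGIN
    SET NOCOUNT ON;

    SELECT 'x';

    -- Vary the delay based on the client to prove @Client was passed through
    IF @Client = 'ClientA'
        WAITFOR DELAY '00:00:10';
    ELSE
        WAITFOR DELAY '00:00:05';
END;
```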

TaskList: This table stores a defined listing of tasks that should be run as a single task list. This is also where the task order is set. For example if you look at the results of the following:

You will see that they call the identical tasks; the only difference is the ordering. Test1 gives each task its own sequential TaskOrder (fully serial), Test2 gives every task the same TaskOrder (fully parallel), and Test3 takes a more realistic approach by weaving in and out of serial.
Tasks: 1 -> 2 -> (3|4|5) -> (6|7) -> 8 -> 9 -> (10|11|12).
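The query itself did not make it into this copy; it would be something along these lines, assuming TaskList carries Client, TaskListName, TaskID, and TaskOrder columns:

```sql
SELECT tl.Client, tl.TaskListName, tl.TaskOrder, t.TaskName
FROM   dbo.TaskList AS tl
JOIN   dbo.Task     AS t ON t.TaskID = tl.TaskID
WHERE  tl.TaskListName IN ('Test1', 'Test2', 'Test3')
ORDER BY tl.TaskListName, tl.TaskOrder, t.TaskID;
```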

The full setup script is below:
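As a hedged sketch of what that script would contain, seeding Test3's weave of serial and parallel ordering would look roughly like this (client name and exact values are illustrative):

```sql
-- Test3: 1 and 2 serial, 3-5 parallel, 6-7 parallel, 8 and 9 serial, 10-12 parallel
INSERT INTO dbo.TaskList (Client, TaskListName, TaskID, TaskOrder)
VALUES ('ClientA', 'Test3',  1, 1),
       ('ClientA', 'Test3',  2, 2),
       ('ClientA', 'Test3',  3, 3), ('ClientA', 'Test3',  4, 3), ('ClientA', 'Test3',  5, 3),
       ('ClientA', 'Test3',  6, 4), ('ClientA', 'Test3',  7, 4),
       ('ClientA', 'Test3',  8, 5),
       ('ClientA', 'Test3',  9, 6),
       ('ClientA', 'Test3', 10, 7), ('ClientA', 'Test3', 11, 7), ('ClientA', 'Test3', 12, 7);
```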

The Magic

Now on to the Task Engine stored procedure that contains the meat of the process. I should note that some of the things in place are just to make my life easier with my current stored tasklist-based implementation so I will only briefly touch on those.

  • @Client – This parameter specifies a Client. It, along with @TaskListName, is used to look up the task list in the TaskList table.
  • @DeleteIfJobExists – If an Agent Job by the same name exists, delete it prior to trying to create it. Defaults to True.
  • @KeepJobOnFailure – If an Agent Job fails, controls whether it should be deleted or not. Defaults to False.
  • @KeepJobOnSuccess – If an Agent Job succeeds, controls whether it should be deleted or not. Defaults to False.
  • @ExitProcessOnFail – If a failure occurs, controls whether it should continue running the remaining tasks or exit out. Defaults to True.
  • @debug – Set to 1 to output more detailed debugging info such as generated code, etc.

It also contains a number of working parameters.

The first step is to build out a list of the tasks that need to be run. In our case we are pulling it from the defined task lists, but this could certainly be generated dynamically: instead of building statements and concatenating them into a single string to be executed, you could insert them into a similar table. This step also builds out the code that will be placed in the Agent job. You'll see that it turns the code-based task into SELECT ''x''; WAITFOR DELAY ''00:00:05''; and it turns the stored procedure-based task into:
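Roughly speaking, the stored procedure-based command text would end up as something like the following, with the client value inlined (the quotes are doubled because it is still embedded inside another dynamic string at this point):

```sql
EXEC common.TestSproc @Client = ''ClientA'';
```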

Keep in mind that these are not valid scripts stand-alone, but they will be when used as the command text for creating the Agent jobs.

This section also defines the TaskOrder range so it knows what it will need to iterate through for processing.
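Determining that range is a simple aggregate. A sketch, assuming a #Tasks working table with TaskOrder and Completed columns:

```sql
DECLARE @CurrentTaskOrder int, @MaxTaskOrder int;

-- Find the first and last TaskOrder values still left to process
SELECT @CurrentTaskOrder = MIN(TaskOrder),
       @MaxTaskOrder     = MAX(TaskOrder)
FROM   #Tasks
WHERE  Completed = 0;  -- the Completed flag is an assumption
```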

The next section begins the processing of the defined tasks. It drops/creates a new table #Jobs, which will contain the individual tasks corresponding to the current task order. If you look at Test3, you'll see that Task IDs 3, 4, and 5 are defined with TaskOrder 3, so when it is processing TaskOrder 3 there are three entries in the #Jobs table, showing that those three need to be processed in parallel.
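A sketch of that per-iteration population, with the column names assumed:

```sql
IF OBJECT_ID('tempdb..#Jobs') IS NOT NULL
    DROP TABLE #Jobs;

-- Every task sharing the current TaskOrder is a parallel peer
SELECT TaskID, TaskName, TaskCommand,
       CAST(NULL AS uniqueidentifier) AS JobID,  -- filled in at job creation
       0 AS Finished, 0 AS Succeeded
INTO   #Jobs
FROM   #Tasks
WHERE  TaskOrder = @CurrentTaskOrder;
```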

The next step is to create the Agent jobs and start them. This still happens in a serial fashion but it is very quick. It will generate Agent job names based on Client and Task names and use the code from earlier. One thing to note is that the code used to spin up the Agent job here is for the “Transact-SQL script (T-SQL)” job step type. If you are going with an SSIS package type or CmdExec type, the code would need to differ here. It will then issue start statements.
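Creating and starting a T-SQL job via msdb looks roughly like this for each row in #Jobs (the job name and command here are illustrative):

```sql
DECLARE @JobName sysname       = N'TaskEngine_ClientA_Task3',
        @Command nvarchar(max) = N'SELECT ''x''; WAITFOR DELAY ''00:00:05'';',
        @JobID   uniqueidentifier;

EXEC msdb.dbo.sp_add_job
     @job_name = @JobName,
     @job_id   = @JobID OUTPUT;

EXEC msdb.dbo.sp_add_jobstep
     @job_id    = @JobID,
     @step_name = N'Run task',
     @subsystem = N'TSQL',   -- SSIS or CmdExec tasks would need a different subsystem
     @command   = @Command;

EXEC msdb.dbo.sp_add_jobserver @job_id = @JobID;  -- target the local server

EXEC msdb.dbo.sp_start_job @job_id = @JobID;
```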

Executing the stored procedure with @debug = 1 gives us the following snippet so you can see an example of what is output:

The next step loops over those jobs, monitoring them for completion status until they all show either success or failure. You will want to consider adding a logical delay between checks so you aren't constantly hammering your system.
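A polling sketch against msdb's job activity and history views (the #Jobs columns are assumptions, and a production version would also need to filter to the latest Agent session):

```sql
WHILE EXISTS (SELECT 1 FROM #Jobs WHERE Finished = 0)
BEGIN
    -- A job is done once its activity row has a stop time;
    -- the linked history row carries the outcome (run_status 1 = succeeded)
    UPDATE j
    SET    j.Finished  = 1,
           j.Succeeded = CASE WHEN h.run_status = 1 THEN 1 ELSE 0 END
    FROM   #Jobs AS j
    JOIN   msdb.dbo.sysjobactivity AS a
             ON a.job_id = j.JobID
            AND a.start_execution_date IS NOT NULL
            AND a.stop_execution_date  IS NOT NULL
    JOIN   msdb.dbo.sysjobhistory AS h
             ON h.instance_id = a.job_history_id;

    WAITFOR DELAY '00:00:01';  -- logical delay so we don't hammer msdb
END;
```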

Then it will delete the jobs based on your parameter choices and success or failure results.
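Cleanup amounts to a call to msdb.dbo.sp_delete_job per finished job, honoring the keep-on-success/failure switches. Roughly:

```sql
DECLARE @JobID uniqueidentifier;

DECLARE job_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT JobID
    FROM   #Jobs
    WHERE  (Succeeded = 1 AND @KeepJobOnSuccess = 0)
       OR  (Succeeded = 0 AND @KeepJobOnFailure = 0);

OPEN job_cursor;
FETCH NEXT FROM job_cursor INTO @JobID;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC msdb.dbo.sp_delete_job @job_id = @JobID;
    FETCH NEXT FROM job_cursor INTO @JobID;
END;
CLOSE job_cursor;
DEALLOCATE job_cursor;
```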

Then it will generate Success/Failure messages and choose whether or not to exit on failure based on your parameter choices. This section will be used to provide custom failure handling.

It then completes the outer WHILE loop by marking the relevant tasks completed and setting the next TaskOrder to process.

The last bit just has basic placeholders for future custom Success/Failure logic.

The RunTaskEngine code in its entirety:

The Results

Once you have run the setup scripts for both the tasks and the task engine, here are some baseline things to play around with:

While this is still a work in progress, I am excited by the potential I’ve seen so far and I am looking forward to utilizing this technique in future projects to help flow between serial and parallel processing where appropriate.

You can download all the scripts required to play around with this demo HERE.


© Copyright 2019 The Data You Need. All rights reserved.