T-SQL Tuesday #98 – Your Technical Challenges Conquered
January 9, 2018
This month’s topic comes to us via Arun Sirpal (b|t) and the goal is to share a time when I faced a technical challenge and overcame it. Allow me a moment to traverse the mental minefield of near misses and hard-fought victories until I find just the right one to talk about. Ah. There it is. That’ll do.
There once was a data archival process. This process had been running faithfully night after night, week after week, month after month, and year after year until, one day, it stopped. Now, unfortunately for this process, this was during a time before sufficient logging had been implemented, and nobody noticed for a very long time. It came to light during a meeting when multiple people stated there was a data archival process in place; upon further examination, we discovered that while that was technically true, the job that ran it had been disabled for several years.
So what are you to do in this situation? If you are anything like me you glance through the process to see if there is anything destructive in it and then enable the job and hope for the best.
*VOICE-OVER* – The best did not occur.
The process failed as I expected it would, and the investigation began. It failed due to transaction log growth: the log ballooned from ~10GB to ~150GB and ran into disk space issues on the dev instance. Surely you didn’t think I just rolled that out directly in production, right? This was… unfortunate. Digging into the process, I discovered that the entire archival run was wrapped in a single transaction, and in addition to some transactional tables that were simply being pruned, other tables were being archived alongside them. Heaps. Heaps with no simple way to identify any given row. Heaps with a very large number of columns. 30-50 million row heaps with a very large number of columns. Enough of those and the transaction log just… well…
The next step was to add a way to easily identify the rows so we could archive the data in batches, but even just adding an identity column to these tables would cause things to spiral out of control. I reached out to some friends of mine to talk through the issues. Shout out to Peter Schott (b) and Kenneth Fisher (b|t) for helping me out here.
--#region - Create temp table and populate
IF OBJECT_ID('tempdb..#Example') IS NOT NULL
    DROP TABLE #Example;

CREATE TABLE #Example
(
      col1        uniqueidentifier DEFAULT NEWID()
    , id_archival int NULL
);
GO

INSERT INTO #Example DEFAULT VALUES
GO 1000
--#endregion

--#region - Add id
DECLARE @range_first_value int = (SELECT ISNULL(MAX(id_archival), 0) FROM #Example);
DECLARE @rows int = 25;
DECLARE @rowcount int = @rows;
DECLARE @totalCount int = 0;

WHILE @rowcount >= @rows
BEGIN
    -- Number the next batch of un-keyed rows and stamp them
    -- with sequential id_archival values.
    WITH [Archival] AS
    (
        SELECT TOP (@rows)
              id_archival
            , ROW_NUMBER() OVER (ORDER BY id_archival) AS cnt
        FROM #Example
        WHERE id_archival IS NULL
    )
    UPDATE Archival
    SET id_archival = cnt + @range_first_value;

    SET @rowcount = @@ROWCOUNT;
    SET @range_first_value = @range_first_value + @rows;
    SET @totalCount = @totalCount + @rowcount;

    PRINT 'Added id_archival to ' + CONVERT(varchar(10), @totalCount) + ' Records';

    /*
    IF NOT EXISTS
    (
        SELECT 1
        FROM msdb.dbo.sysjobs_view AS job
        INNER JOIN msdb.dbo.sysjobactivity AS act
            ON job.job_id = act.job_id
        WHERE act.run_requested_date IS NOT NULL
          AND act.stop_execution_date IS NULL
          AND job.name = @jobName -- This was the name of a scheduled job to truncate the log
    )
    BEGIN
        EXEC msdb.dbo.sp_start_job @jobName;
    END
    */
END
--#endregion
This code will populate the id_archival column in batches of the number of rows designated in @rows. Obviously you would not want to use 25; that was just scaled down so you can see that it runs in customizable batch sizes.
You will notice there is also a commented-out section. That was used in conjunction with a scheduled log backup job: it would confirm the job was not already running and, if not, kick it off.
This enabled us to get an ID field that we could use to handle the actual archival in batches while completely avoiding the transaction log growth which first caused the process to fail years back.
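Once every row had an id_archival value, the archival itself could run in small, individually committed batches. Here is a minimal sketch of what that loop might look like; the table names (dbo.BigHeap, dbo.BigHeap_Archive), the retention predicate, and the 5000-row batch size are all placeholders, not the actual objects from the original process:

```sql
-- Sketch only: dbo.BigHeap / dbo.BigHeap_Archive are hypothetical names.
DECLARE @rows int = 5000;
DECLARE @rowcount int = @rows;

WHILE @rowcount >= @rows
BEGIN
    BEGIN TRANSACTION;

    -- Move one batch to the archive table, keyed on the new
    -- id_archival column, and commit immediately so the log
    -- can be backed up and reused between batches.
    DELETE TOP (@rows)
    FROM dbo.BigHeap
    OUTPUT DELETED.* INTO dbo.BigHeap_Archive
    WHERE id_archival IS NOT NULL;

    SET @rowcount = @@ROWCOUNT;

    COMMIT TRANSACTION;
END
```

Because each batch commits on its own, the scheduled log backup can reclaim space between iterations instead of one giant open transaction holding years of deletions in the log at once.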
I am happy to say that this process is now, once again, running nightly and archiving data as expected. Although it did have several years of data to catch up on first.