T-SQL Tuesday #98 – Your Technical Challenges Conquered

January 9, 2018


This month’s topic comes to us via Arun Sirpal (b|t) and the goal is to share a time when I faced a technical challenge that I overcame. Allow me to take a moment to traverse the mental minefield of near misses and hard-fought victories until I find just the right one to talk about. Ah. There it is. That’ll do.

There once was a data archival process. This process had been running faithfully night after night, week after week, month after month, and year after year until, one day, it stopped. Now, unfortunately for this process, this was during a time before sufficient logging had been implemented, and nobody noticed for a very long time. The gap was uncovered during a meeting when multiple people stated there was a data archival process in place and then, upon further examination, we discovered that while that was technically true, the job that ran it had been disabled for several years.

So what are you to do in this situation? If you are anything like me, you glance through the process to see if there is anything destructive in it, then enable the job and hope for the best.

*VOICE-OVER* – The best did not occur.

The process failed, as I expected it would, and the investigation began. The failure was due to transaction log growth: the log grew from ~10GB to ~150GB and hit disk space issues on the dev instance. Surely you didn’t think I just rolled that out directly in production, right? This was… unfortunate. After investigating the process I discovered that the entire archival process was wrapped in a single transaction, and in addition to some transactional tables that were simply being pruned, there were other tables being archived along with them. Heaps. Heaps with no simple way to identify any given row. Heaps with a very large number of columns. 30-50 million row heaps with a very large number of columns. Enough of those in one transaction and the transaction log just… well…
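
For context, the original process was shaped roughly like the sketch below. The table names and date filter are invented for illustration; the real culprit is the single wrapping transaction around everything.

-- Hypothetical sketch of the original single-transaction archival.
-- dbo.BigHeap and dbo.BigHeap_Archive are made-up names.
BEGIN TRANSACTION;

	INSERT INTO dbo.BigHeap_Archive
	SELECT *
	FROM dbo.BigHeap
	WHERE SomeDate < DATEADD(YEAR, -2, GETDATE());

	DELETE FROM dbo.BigHeap
	WHERE SomeDate < DATEADD(YEAR, -2, GETDATE());

COMMIT TRANSACTION;
-- Every modified row stays pinned in the active log until the COMMIT,
-- which is how a ~10GB log can balloon to ~150GB.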

The next step was to add a way to easily identify the rows so we could archive the data in batches, but even just adding an identity column to these tables would cause things to spiral out of control. I reached out to some friends of mine to talk through the issues. Shout out to Peter Schott (b) and Kenneth Fisher (b|t) for helping me out here.
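
What we landed on, and what the demo below stands in for, was adding a plain nullable int column and filling it in ourselves in small batches. Adding a nullable column with no default should be a metadata-only change, so it completes almost instantly, whereas an IDENTITY column would rewrite every row in one fully logged operation. A minimal sketch, again with a hypothetical table name:

-- Quick metadata-only change: no existing rows are touched.
ALTER TABLE dbo.BigHeap
	ADD id_archival int NULL;

-- By contrast, this would rewrite every row in a single fully logged
-- operation -- exactly the problem we were trying to avoid:
-- ALTER TABLE dbo.BigHeap ADD id_archival int IDENTITY(1,1);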

The Solution

--#region - Create temp table and populate
IF OBJECT_ID('tempdb..#Example') IS NOT NULL
    DROP TABLE #Example

CREATE TABLE #Example
(
		col1 uniqueidentifier default newid()
	,	id_archival int NULL
)
GO

INSERT INTO #Example DEFAULT VALUES
GO 1000
--#endregion

--#region - Add id
DECLARE @range_first_value int = (SELECT ISNULL(MAX(id_archival),0) FROM #Example);
DECLARE @rows int = 25;        -- batch size (scaled down for the demo)
DECLARE @rowCount int = @rows;
DECLARE @totalCount INT = 0;

-- Loop until a batch updates fewer rows than the batch size,
-- meaning every remaining NULL id_archival has been filled in.
WHILE @rowCount >= @rows
BEGIN
	WITH [Archival] AS
	(
		SELECT TOP (@rows) id_archival, ROW_NUMBER() OVER (ORDER BY id_archival) cnt
		FROM #Example
		WHERE id_archival IS NULL
	)

	UPDATE Archival
	SET id_archival = cnt + CAST(@range_first_value as int);
	
	SET @rowCount = @@ROWCOUNT;
	SET @range_first_value = @range_first_value + @rows;

	SET @totalCount = @totalCount + @rowCount;
	PRINT 'Added id_archival to ' + convert(varchar,@totalCount) + ' Records'

	/*	
	IF NOT EXISTS (
		SELECT 1
		FROM msdb.dbo.sysjobs_view as job
		INNER JOIN msdb.dbo.sysjobactivity as act
			ON job.job_id = act.job_id
		WHERE
			act.run_requested_date IS NOT NULL
			AND act.stop_execution_date IS NULL
			AND job.name = @jobName -- This was the name of a scheduled job to truncate the log
		)
		BEGIN
			EXEC msdb.dbo.sp_start_job @jobName;
	END
	*/
END
--#endregion

This code populates the id_archival column in batches of the number of rows designated in @rows. Obviously you would not want to use 25 in production; that value is scaled down here so you can see that it runs in customizable batch sizes.

You will notice there is also a commented-out section. That was used in conjunction with a scheduled log backup job: it confirms the job is not already running and, if not, kicks it off so the log can be cleared between batches.

This gave us an ID field we could use to handle the actual archival in batches while completely avoiding the transaction log growth that first caused the process to fail years back.
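
For completeness, here is roughly what the batched archival itself looks like once every row has an id_archival value. This is an illustration of the pattern rather than the production code: the table names are the same hypothetical ones as above, the batch size would be tuned per table, and the same kind of log backup check shown in the commented-out block above would slot in between batches.

-- Illustrative only: archive and delete in small, independent transactions
-- keyed on id_archival, so the log can be cleared between batches.
DECLARE @batchSize int = 50000;
DECLARE @minId int, @maxId int;

SELECT @minId = MIN(id_archival), @maxId = MAX(id_archival)
FROM dbo.BigHeap;

WHILE @minId <= @maxId
BEGIN
	BEGIN TRANSACTION;

		-- Real code would list columns explicitly so they line up
		-- with the archive table.
		INSERT INTO dbo.BigHeap_Archive
		SELECT *
		FROM dbo.BigHeap
		WHERE id_archival BETWEEN @minId AND @minId + @batchSize - 1;

		DELETE FROM dbo.BigHeap
		WHERE id_archival BETWEEN @minId AND @minId + @batchSize - 1;

	COMMIT TRANSACTION;

	SET @minId = @minId + @batchSize;
END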

I am happy to say that this process is now, once again, running nightly and archiving data as expected, although it did have several years of backlog to catch up on first.
