DataJumble - Shuffling characters in a data value

This was called a Data-Scrambling Function but it depends on what you mean by "scrambling". This is a function which merely uses the same characters but switches their order randomly, so I've renamed it DataJumble as opposed to my article on Data-Scrambling.

Why?
We want to scramble sensitive data that we send to suppliers for support or analysis. With inspiration from: "Obfuscating your SQL Server Data" by John Magnabosco but tweaked for our purposes. NOTE that the following has only ever been run on development environments and I would not recommend running this on a production system as I have not tested the performance and database load.

copyraw
Before:
           StudentID   StudentName          DateOfBirth
           ----------- -------------------- -------------
           1           John Smith           1990-03-21
           2           Fred Bloggs          1988-11-02
           3           Another User         1985-07-11
           4           Yet Another User     1977-06-25

       After:
           StudentID   StudentName          DateOfBirth
           ----------- -------------------- -------------
           1           mStnoh iJh           1926-01-02
           2           lgreg BFdos          1969-03-14
           3           onrUest ehAr         1968-05-05
           4           otentre AhYe Usr     1964-08-08
  1.  Before: 
  2.             StudentID   StudentName          DateOfBirth 
  3.             ----------- -------------------- ------------- 
  4.             1           John Smith           1990-03-21 
  5.             2           Fred Bloggs          1988-11-02 
  6.             3           Another User         1985-07-11 
  7.             4           Yet Another User     1977-06-25 
  8.   
  9.         After: 
  10.             StudentID   StudentName          DateOfBirth 
  11.             ----------- -------------------- ------------- 
  12.             1           mStnoh iJh           1926-01-02 
  13.             2           lgreg BFdos          1969-03-14 
  14.             3           onrUest ehAr         1968-05-05 
  15.             4           otentre AhYe Usr     1964-08-08 

Precursor
You will need to be able to create the following view to generate random numbers on SQL Server.
copyraw
-- Used to reference RAND with in a function
CREATE VIEW dbo.vwRandom
AS
SELECT RAND() as RandomValue;
GO
  1.  -- Used to reference RAND with in a function 
  2.  CREATE VIEW dbo.vwRandom 
  3.  AS 
  4.  SELECT RAND() as RandomValue; 
  5.  GO 

The Function
Again you need permission to create this function. Don't forget to GRANT permission to execute this function for whatever user/system account that will run it.
copyraw
-- Drop the function if it already exists
IF OBJECT_ID ('ufn_DataJumble', 'FN') IS NOT NULL
DROP FUNCTION ufn_DataJumble ;
GO

-- Create user defined function
CREATE FUNCTION ufn_DataJumble
(
    @OrigVal varchar(max)
)
RETURNS varchar(max)
WITH ENCRYPTION
AS
BEGIN

        -- Variables used
        DECLARE @NewVal varchar(max);
        DECLARE @OrigLen int;
        DECLARE @CurrLen int;
        DECLARE @LoopCt int;
        DECLARE @Rand int;

        -- for dates/times
        DECLARE @MinYear int;
        DECLARE @MaxYear int;
        DECLARE @RandYear int;
        DECLARE @RandMonth int;
        DECLARE @RandDay int;
        DECLARE @RandHour int;
        DECLARE @RandMinute int;
        DECLARE @RandSeconds int;
        DECLARE @RandMSeconds int;
        DECLARE @DateVal date;
        DECLARE @TimeVal time;
        DECLARE @DateMatch int;
        DECLARE @TimeMatch int;
        DECLARE @ColumnTypeIsDate int;

        -- Format given value (prevents CHAR datatype issue)
        SET @OrigVal = LTRIM(RTRIM(@OrigVal));

        -- Set variable default values
        SET @NewVal = '';
        SET @OrigLen = DATALENGTH(@OrigVal);
        SET @CurrLen = @OrigLen;
        SET @LoopCt = 1;
        SET @DateVal = ''; -- 1900-01-01
        SET @TimeVal = ''; -- 00:00:00.000
        SET @DateMatch = 0;
        SET @TimeMatch = 0;
        SET @ColumnTypeIsDate = 0;

        -- Determine if Given Value is date
        IF @OrigVal IS NULL
                RETURN NULL;

        -- Determine if Given Value is date
        IF ISDATE(@OrigVal)=1
                SET @ColumnTypeIsDate = 1;

        -- Determine if Given Value is date (ISDATE does not recognize CAST AS DATE)
        IF @OrigLen=10
                AND (LEN(@OrigVal) - LEN(REPLACE(@OrigVal, '-', '')))=2
                AND ISNUMERIC(SUBSTRING(@OrigVal, 1, 4))=1
                AND ISNUMERIC(SUBSTRING(@OrigVal, 6, 2))=1
                AND ISNUMERIC(SUBSTRING(@OrigVal, 9, 2))=1
                                        SET @ColumnTypeIsDate = 1;


        -- Determine if this is a date/time value
        IF @ColumnTypeIsDate=1
        BEGIN

                -- DETERMINE IF THIS FIELD CONTAINS A DATE VALUE
                SET @DateMatch =
                        CASE
                                WHEN CAST(CAST(@OrigVal AS date) AS VARCHAR(10))'1900-01-01' THEN 1
                                ELSE 0
                        END;


                -- DETERMINE IF THIS FIELD CONTAINS A TIME VALUE
                SET @TimeMatch =
                        CASE
                                WHEN CAST(CAST(@OrigVal AS time) AS VARCHAR(10))'00:00:00.000' THEN 1
                                ELSE 0
                        END;


                -- DETERMINE TIME VALUE IF TIME WAS IN GIVEN VALUE
                IF @DateMatch=1
                BEGIN
                        -- SET YEAR RANGE (1912 - 1994: for Birthdate specifically: Older than 18 but younger than 100)
                        SET @MinYear = DATEPART(YEAR,GETDATE())-100;    -- 100 years ago from today
                        SET @MaxYear = DATEPART(YEAR,GETDATE())-18;             -- 18 years ago from today


                        -- GET RANDOM DATE VALUES
                        SELECT @RandYear = ((@MinYear + 1) - @MaxYear) * RandomValue + @MaxYear FROM dbo.vwRandom;
                        SELECT @RandMonth = (1 - 12) * RandomValue + 12 FROM dbo.vwRandom;
                        SELECT @RandDay = (1 - 28) * RandomValue + 28 FROM dbo.vwRandom;  -- prevent invalid dates (specifically 31 days in feb)
                        SET     @DateVal =
                                CAST(@RandYear AS CHAR(4)) + '-'
                                + STUFF(@RandMonth, 1, 0, REPLICATE('0', 2 - LEN(@RandMonth))) + '-'
                                + STUFF(@RandDay, 1, 0, REPLICATE('0', 2 - LEN(@RandDay)));
                END


                -- DETERMINE TIME VALUE IF TIME WAS IN GIVEN VALUE
                IF @TimeMatch=1
                BEGIN

                        -- SET RANDOM TIME VALUE (include Milliseconds)
                        SELECT @RandHour = (0 - 23) * RandomValue + 23 FROM dbo.vwRandom;
                        SELECT @RandMinute = (0 - 59) * RandomValue + 59 FROM dbo.vwRandom;
                        SELECT @RandSeconds = (0 - 59) * RandomValue + 59 FROM dbo.vwRandom;
                        SELECT @RandMSeconds = (0 - 999) * RandomValue + 999 FROM dbo.vwRandom;
                        SET @TimeVal =
                                STUFF(@RandHour, 1, 0, REPLICATE('0', 2 - LEN(@RandHour)))
                                + ':' + STUFF(@RandMinute, 1, 0, REPLICATE('0', 2 - LEN(@RandMinute)))
                                + ':' + STUFF(@RandSeconds, 1, 0, REPLICATE('0', 2 - LEN(@RandSeconds)))
                                + '.' + STUFF(@RandMSeconds, 1, 0, REPLICATE('0', 3 - LEN(@RandMSeconds)));
                        SET @TimeVal = CONVERT(time, @TimeVal, 14);

                END


                -- DETERMINE WHICH DATE PARTS WERE SUBMITTED
                IF @TimeMatch=1 AND @DateMatch=0
                        SET @NewVal = @TimeVal;
                ELSE IF @TimeMatch=0 AND @DateMatch=1
                        SET @NewVal = @DateVal;
                ELSE IF @TimeMatch=1 AND @DateMatch=1
                        SET @NewVal = CAST(@DateVal AS varchar(10)) + ' ' + CAST(@TimeVal AS varchar(16));
                ELSE
                        SET @NewVal = '';


                -- REDUCE THIS STRING TO THE SAME LENGTH AS SUBMITTED VALUE
                SET @NewVal = SUBSTRING(LTRIM(RTRIM(CAST(@NewVal AS VARCHAR(30)))), 1, LEN(@OrigVal));

        END
        ELSE
        BEGIN

                -- Loop through the characters passed
                WHILE @LoopCt
  1.  -- Drop the function if it already exists 
  2.  IF OBJECT_ID ('ufn_DataJumble', 'FN') IS NOT NULL 
  3.  DROP FUNCTION ufn_DataJumble ; 
  4.  GO 
  5.   
  6.  -- Create user defined function 
  7.  CREATE FUNCTION ufn_DataJumble 
  8.  ( 
  9.      @OrigVal varchar(max) 
  10.  ) 
  11.  RETURNS varchar(max) 
  12.  WITH ENCRYPTION 
  13.  AS 
  14.  BEGIN 
  15.   
  16.          -- Variables used 
  17.          DECLARE @NewVal varchar(max)
  18.          DECLARE @OrigLen int; 
  19.          DECLARE @CurrLen int; 
  20.          DECLARE @LoopCt int; 
  21.          DECLARE @Rand int; 
  22.   
  23.          -- for dates/times 
  24.          DECLARE @MinYear int; 
  25.          DECLARE @MaxYear int; 
  26.          DECLARE @RandYear int; 
  27.          DECLARE @RandMonth int; 
  28.          DECLARE @RandDay int; 
  29.          DECLARE @RandHour int; 
  30.          DECLARE @RandMinute int; 
  31.          DECLARE @RandSeconds int; 
  32.          DECLARE @RandMSeconds int; 
  33.          DECLARE @DateVal date; 
  34.          DECLARE @TimeVal time; 
  35.          DECLARE @DateMatch int; 
  36.          DECLARE @TimeMatch int; 
  37.          DECLARE @ColumnTypeIsDate int; 
  38.   
  39.          -- Format given value (prevents CHAR datatype issue) 
  40.          SET @OrigVal = LTRIM(RTRIM(@OrigVal))
  41.   
  42.          -- Set variable default values 
  43.          SET @NewVal = ''
  44.          SET @OrigLen = DATALENGTH(@OrigVal)
  45.          SET @CurrLen = @OrigLen; 
  46.          SET @LoopCt = 1
  47.          SET @DateVal = ''-- 1900-01-01 
  48.          SET @TimeVal = ''-- 00:00:00.000 
  49.          SET @DateMatch = 0
  50.          SET @TimeMatch = 0
  51.          SET @ColumnTypeIsDate = 0
  52.   
  53.          -- Determine if Given Value is date 
  54.          IF @OrigVal IS NULL 
  55.                  RETURN null
  56.   
  57.          -- Determine if Given Value is date 
  58.          IF ISDATE(@OrigVal)=1 
  59.                  SET @ColumnTypeIsDate = 1
  60.   
  61.          -- Determine if Given Value is date (ISDATE does not recognize CAST AS DATE) 
  62.          IF @OrigLen=10 
  63.                  AND (LEN(@OrigVal) - LEN(REPLACE(@OrigVal, '-', '')))=2 
  64.                  AND ISNUMERIC(SUBSTRING(@OrigVal, 1, 4))=1 
  65.                  AND ISNUMERIC(SUBSTRING(@OrigVal, 6, 2))=1 
  66.                  AND ISNUMERIC(SUBSTRING(@OrigVal, 9, 2))=1 
  67.                                          SET @ColumnTypeIsDate = 1
  68.   
  69.   
  70.          -- Determine if this is a date/time value 
  71.          IF @ColumnTypeIsDate=1 
  72.          BEGIN 
  73.   
  74.                  -- DETERMINE IF THIS FIELD CONTAINS A DATE VALUE 
  75.                  SET @DateMatch = 
  76.                          CASE 
  77.                                  WHEN CAST(CAST(@OrigVal AS date) AS VARCHAR(10))'1900-01-01' THEN 1 
  78.                                  ELSE 0 
  79.                          END; 
  80.   
  81.   
  82.                  -- DETERMINE IF THIS FIELD CONTAINS A TIME VALUE 
  83.                  SET @TimeMatch = 
  84.                          CASE 
  85.                                  WHEN CAST(CAST(@OrigVal AS time) AS VARCHAR(10))'00:00:00.000' THEN 1 
  86.                                  ELSE 0 
  87.                          END; 
  88.   
  89.   
  90.                  -- DETERMINE TIME VALUE IF TIME WAS IN GIVEN VALUE 
  91.                  IF @DateMatch=1 
  92.                  BEGIN 
  93.                          -- SET YEAR RANGE (1912 - 1994: for Birthdate specifically: Older than 18 but younger than 100) 
  94.                          SET @MinYear = DATEPART(YEAR,GETDATE())-100;    -- 100 years ago from today 
  95.                          SET @MaxYear = DATEPART(YEAR,GETDATE())-18;             -- 18 years ago from today 
  96.   
  97.   
  98.                          -- GET RANDOM DATE VALUES 
  99.                          SELECT @RandYear = ((@MinYear + 1) - @MaxYear) * RandomValue + @MaxYear FROM dbo.vwRandom; 
  100.                          SELECT @RandMonth = (1 - 12) * RandomValue + 12 FROM dbo.vwRandom; 
  101.                          SELECT @RandDay = (1 - 28) * RandomValue + 28 FROM dbo.vwRandom;  -- prevent invalid dates (specifically 31 days in feb) 
  102.                          SET     @DateVal = 
  103.                                  CAST(@RandYear AS CHAR(4)) + '-' 
  104.                                  + STUFF(@RandMonth, 1, 0, REPLICATE('0', 2 - LEN(@RandMonth))) + '-' 
  105.                                  + STUFF(@RandDay, 1, 0, REPLICATE('0', 2 - LEN(@RandDay)))
  106.                  END 
  107.   
  108.   
  109.                  -- DETERMINE TIME VALUE IF TIME WAS IN GIVEN VALUE 
  110.                  IF @TimeMatch=1 
  111.                  BEGIN 
  112.   
  113.                          -- SET RANDOM TIME VALUE (include Milliseconds) 
  114.                          SELECT @RandHour = (0 - 23) * RandomValue + 23 FROM dbo.vwRandom; 
  115.                          SELECT @RandMinute = (0 - 59) * RandomValue + 59 FROM dbo.vwRandom; 
  116.                          SELECT @RandSeconds = (0 - 59) * RandomValue + 59 FROM dbo.vwRandom; 
  117.                          SELECT @RandMSeconds = (0 - 999) * RandomValue + 999 FROM dbo.vwRandom; 
  118.                          SET @TimeVal = 
  119.                                  STUFF(@RandHour, 1, 0, REPLICATE('0', 2 - LEN(@RandHour))) 
  120.                                  + ':' + STUFF(@RandMinute, 1, 0, REPLICATE('0', 2 - LEN(@RandMinute))) 
  121.                                  + ':' + STUFF(@RandSeconds, 1, 0, REPLICATE('0', 2 - LEN(@RandSeconds))) 
  122.                                  + '.' + STUFF(@RandMSeconds, 1, 0, REPLICATE('0', 3 - LEN(@RandMSeconds)))
  123.                          SET @TimeVal = CONVERT(time, @TimeVal, 14)
  124.   
  125.                  END 
  126.   
  127.   
  128.                  -- DETERMINE WHICH DATE PARTS WERE SUBMITTED 
  129.                  IF @TimeMatch=1 AND @DateMatch=0 
  130.                          SET @NewVal = @TimeVal; 
  131.                  ELSE IF @TimeMatch=0 AND @DateMatch=1 
  132.                          SET @NewVal = @DateVal; 
  133.                  ELSE IF @TimeMatch=1 AND @DateMatch=1 
  134.                          SET @NewVal = CAST(@DateVal AS varchar(10)) + ' ' + CAST(@TimeVal AS varchar(16))
  135.                  ELSE 
  136.                          SET @NewVal = ''
  137.   
  138.   
  139.                  -- REDUCE THIS STRING TO THE SAME LENGTH AS SUBMITTED VALUE 
  140.                  SET @NewVal = SUBSTRING(LTRIM(RTRIM(CAST(@NewVal AS VARCHAR(30)))), 1, LEN(@OrigVal))
  141.   
  142.          END 
  143.          ELSE 
  144.          BEGIN 
  145.   
  146.                  -- Loop through the characters passed 
  147.                  WHILE @LoopCt 

Usage
copyraw
SELECT 
	Common.ufn_DataJumble(Forenames) AS Forenames
	, Common.ufn_DataJumble(Surname) AS Surnames
	, Common.ufn_DataJumble(AccountNumber) AS AccountNumber
	, Common.ufn_DataJumble(DateOfBirth) AS DateOfBirth 
FROM 
	mySchema.mySensitiveTable;
  1.  SELECT 
  2.      Common.ufn_DataJumble(Forenames) AS Forenames 
  3.      , Common.ufn_DataJumble(Surname) AS Surnames 
  4.      , Common.ufn_DataJumble(AccountNumber) AS AccountNumber 
  5.      , Common.ufn_DataJumble(DateOfBirth) AS DateOfBirth 
  6.  FROM 
  7.      mySchema.mySensitiveTable; 
or
copyraw
UPDATE Student.PersonExtract 
SET Surname=Common.ufn_DataJumble(Surname);
  1.  UPDATE Student.PersonExtract 
  2.  SET Surname=Common.ufn_DataJumble(Surname)

Known Issues (fixed in above function)
  • "Invalid length parameter passed to the LEFT or SUBSTRING function"
    • Add LTRIM(RTRIM(Forenames)) if Forenames is a char datatype.
  • DateOfBirth is interpreted as string (isdate does not recognize datatype date)
    • CAST(Common.ufn_DataJumble(CAST(DateOfBirth AS DATETIME)) AS DATE) AS DateOfBirth

Still To Do
Create stored procedure that will take any table and return the table with scrambled data:
  • Needs to determine datatype
  • Cannot exceed allowed length (shouldn't happen if simply scrambling data)
  • Function needs to be reduced in size


Stored Procedure (using this function)
So that I can give it any table name and it will maintain structure and data types:
copyraw
USE UnitTestDataStaging;
GO


-- Drop Stored Procedure if already exists
IF OBJECT_ID ( 'Common.ufn_DataJumble', 'P' ) IS NOT NULL 
    DROP PROCEDURE Common.ufn_DataJumble;
GO


-- Create Stored Procedure
CREATE PROCEDURE Common.ufn_DataJumble
    @SchemaTableName nvarchar(100) 
AS 
BEGIN
 
	-- Variables used
	DECLARE @TableName varchar(50);
	DECLARE @SqlToExecute nvarchar(4000);


	-- Set variable default values
	SET @TableName = SUBSTRING(@SchemaTableName, CHARINDEX('.', @SchemaTableName)+1, LEN(@SchemaTableName));
	SET @SqlToExecute = '';

	-- Generate Select Query to extract data from given table
	SET @SqlToExecute='SELECT ' + CHAR(10) + STUFF(
		( 
		-- Code to concatenate column names and data types into one string
		SELECT 
			',Common.ufn_DataJumble(ltrim(rtrim(' + COLUMN_NAME+ '))) AS ' + COLUMN_NAME + CHAR(10)
		FROM 
			INFORMATION_SCHEMA.COLUMNS 
		WHERE
			TABLE_NAME = @TableName
		ORDER BY ORDINAL_POSITION  
		FOR XML PATH ('')
		),1,1,''
	) + ' FROM ' + @SchemaTableName;	


	EXEC(@SqlToExecute);
END	
GO

-- execute stored procedure on a table
EXECUTE Common.ufn_DataJumble 'dbo.myTable';
  1.  USE UnitTestDataStaging; 
  2.  GO 
  3.   
  4.   
  5.  -- Drop Stored Procedure if already exists 
  6.  IF OBJECT_ID ( 'Common.ufn_DataJumble', 'P' ) IS NOT NULL 
  7.      DROP PROCEDURE Common.ufn_DataJumble; 
  8.  GO 
  9.   
  10.   
  11.  -- Create Stored Procedure 
  12.  CREATE PROCEDURE Common.ufn_DataJumble 
  13.      @SchemaTableName nvarchar(100) 
  14.  AS 
  15.  BEGIN 
  16.   
  17.      -- Variables used 
  18.      DECLARE @TableName varchar(50)
  19.      DECLARE @SqlToExecute nvarchar(4000)
  20.   
  21.   
  22.      -- Set variable default values 
  23.      SET @TableName = SUBSTRING(@SchemaTableName, CHARINDEX('.', @SchemaTableName)+1, LEN(@SchemaTableName))
  24.      SET @SqlToExecute = ''
  25.   
  26.      -- Generate Select Query to extract data from given table 
  27.      SET @SqlToExecute='SELECT ' + CHAR(10) + STUFF( 
  28.          ( 
  29.          -- Code to concatenate column names and data types into one string 
  30.          SELECT 
  31.              ',Common.ufn_DataJumble(ltrim(rtrim(' + COLUMN_NAME+ '))) AS ' + COLUMN_NAME + CHAR(10) 
  32.          FROM 
  33.              INFORMATION_SCHEMA.COLUMNS 
  34.          WHERE 
  35.              TABLE_NAME = @TableName 
  36.          ORDER BY ORDINAL_POSITION 
  37.          FOR XML PATH ('') 
  38.          ),1,1,'' 
  39.      ) + ' FROM ' + @SchemaTableName; 
  40.   
  41.   
  42.      EXEC(@SqlToExecute)
  43.  END 
  44.  GO 
  45.   
  46.  -- execute stored procedure on a table 
  47.  EXECUTE Common.ufn_DataJumble 'dbo.myTable'
Category: Transact-SQL :: Article: 422

Credit where Credit is Due:


Feel free to copy, redistribute and share this information. All that we ask is that you attribute credit and possibly even a link back to this website as it really helps in our search engine rankings.

Disclaimer: Please note that the information provided on this website is intended for informational purposes only and does not represent a warranty. The opinions expressed are those of the author only. We recommend testing any solutions in a development environment before implementing them in production. The articles are based on our good faith efforts and were current at the time of writing, reflecting our practical experience in a commercial setting.

Thank you for visiting and, as always, we hope this website was of some use to you!

Kind Regards,

Joel Lipman
www.joellipman.com

Related Articles

Joes Revolver Map

Accreditation

Badge - Certified Zoho Creator Associate
Badge - Certified Zoho Creator Associate

Donate & Support

If you like my content, and would like to support this sharing site, feel free to donate using a method below:

Paypal:
Donate to Joel Lipman via PayPal

Bitcoin:
Donate to Joel Lipman with Bitcoin bc1qf6elrdxc968h0k673l2djc9wrpazhqtxw8qqp4

Ethereum:
Donate to Joel Lipman with Ethereum 0xb038962F3809b425D661EF5D22294Cf45E02FebF
© 2024 Joel Lipman .com. All Rights Reserved.