Subscribe to Cache Cow        RSS Feed
-----

A Few String Parsing Functions in SQL Server

Icon Leave Comment
While working on one of my projects at work today, I came to the realization that database programming languages handle strings much differently than do those programming languages that are not database-centric (i.e. C#, Java, C++). I know, you may be thinking "well...duh!", but think about it for a second. A person who has spent their entire career (short or long) working with strings in C# (or the like) is used to strings being easy to work with. Strings are simply arrays of characters. This makes iterating through each of the characters in the string very easy. The .NET framework even has many built in functions to make your string fiddling life easier. In SQL, however, there is no such thing as an array (at least not in a entry to mid level developers frame of reference). If, for example, our application needs to pass an array of values into a stored procedure (SP) or user defined function (UDF) then we have to first convert the array into a delimited string, pass the string to SQL, then parse out the individual values for use. I don't recall where I got this particular function from, but it is the one I use when I need to parse out those values. Normally, I want to join information based on those values or work with them in tables so I use my fn_StringToTable UDF:
alter function [dbo].[fn_StringToTable]( @inputStr varchar(max) , @delimiter varchar(max) = ',' ) 
returns @tbl table (String varchar(max))

begin
	declare @value varchar(max)

	while charindex(@delimiter,@inputStr,0) <> 0
	begin
		select	@value = rtrim(ltrim(substring(@inputStr,1,charindex(@delimiter,@inputStr,0)-1))),
				@inputStr = rtrim(ltrim(substring(@inputStr,charindex(@delimiter,@inputStr,0)+len(@delimiter),len(@inputStr))))

		IF len(@value) > 0
			insert into @tbl select @value
	end

	IF len(@inputStr) > 0
		insert into @tbl select @inputStr -- Put the last item in
	return
end



This code works well for extracting those values that were passed in from our array (as a delimited string) and turning it into a table.

I had to give the back story to bring me to today's job. I was tasked with comparing two fields and showing those rows where the fields were not the same. Seems easy enough, however, the values that are being compared are those from an internal web application where we enter all the data related to a particular item and those which are from our ERP where the first set of data was imported. When the data is imported, the ERP performs some kind of magical transformation (hidden code) which transforms decimals and integers into strings, adds fictional extra zeros to decimals then turns them into strings, and other such Tom Foolery. Once this magical transformation happens, my original number (1.00) becomes a string with the characters (1.00000). As you can see, the two values are the same, but not identical. So, you say we should just perform a conversion to a number and compare! WRONG! Some of the values are indeed strings. Here is how I solved the problem.

I went about creating a function which would cut any leading or trailing zeros off of any string value passed to it. That worked, but would return an empty string if the value passed in only contained zeros (i.e. 0, 00, 0000, 000000000...). That is not what I wanted. So I had to find a way to compare the two values and retain that 0 value for comparison. The empty string would have been fine, however, some of the values entered were actually an empty string and I didn't want to get a false positive. I used the code from the delimited StringToTable UDF to form another function which would take each character of a string and put it in rows. I then counted the distinct characters from that table. Below you will find the code to do both removing the zeros and splitting a string into a table:

Split a non-delimited string to table (removes spaces since they are not technically a character):
alter function [dbo].[fn_Split] (@str varchar(max))
returns @List table ([Char] char(1))

begin
	declare @sItem char(1)
	set @sItem = ''

	while len(@str) > 1
	begin
		select	@sItem = rtrim(ltrim(substring(@str, 1, 1))),
				@str = rtrim(ltrim(substring(@str, 2, len(@str))))

		if len(@sItem) = 1 
			insert into @List select @sItem
	end

	IF len(@str) > 0
		insert into @List select @str -- Put the last item in
	return
end


Remove leading and trailing zereos
alter function [dbo].[fn_RemoveZeros] 
(
	@input varchar(max)
)
returns varchar(max)
as
begin
	declare @result varchar(max)

	if (select count(*) from (select distinct [Char] from dbo.fn_Split(@input)) as t) = 1 and substring(@input, 1,1) = '0'
	begin
		set @result = '0'
	end
	else
	begin
		select @result = replace(rtrim(ltrim(replace(isnull(@input, ''),'0',' '))),' ','0')
	end

	return @result
end



As you can see, the fn_RemoveZeros UDF uses the fn_Split UDF. I'm not sure if there are other ways to do this (I'm sure there are actually), but this method works for me and is fast enough to not interfere with the normal operation of the stored procedures in which I have used them. Hopefully, someone will stumble upon this and same themselves some time looking for the same solution i did (which didn't exists as far as i could see).

0 Comments On This Entry

 

Trackbacks for this entry [ Trackback URL ]

There are no Trackbacks for this entry

November 2014

S M T W T F S
      1
2345678
9101112131415
16171819202122
232425 26 272829
30