Sunday, February 20, 2011

Doing Things with Strings: SQL, Xml, and String Manipulation, Part II

This series of posts is about using SQL Server’s Xml features to do string manipulations.  Part I talked about creating comma-separated (CSV) lists from SQL database data.  Part II, this post, will talk about parsing a comma-separated list.

Until Xml-typed parameters came along in SQL Server 2005, the only easy way to pass a set of information into a stored procedure was by passing a string parameter that represented serialized data.  The two most common forms were an Xml fragment, and a CSV string.  Parsing these CSV strings usually required a custom table-valued function that used T-SQL string manipulation operations to parse out the information and build the return table. 

This is a perfectly good solution, but there’s a faster one.  I’ll give you the code for it, right up front:

CREATE FUNCTION ParseCsv
(
    @Csv varchar(max)
)
RETURNS TABLE
AS
    RETURN 
    (
        -- We're being passed in a CSV string; we'll replace the commas with
        -- endtag/start tag pairs.
        WITH A
        AS  (
            SELECT  REPLACE(@Csv, ',', '</dummy><dummy>') as XmlFrag1
        )
        ,   B
        AS  (
            -- We're building an XML fragment out of the CSV, in fact.
            SELECT  '<dummy>' + XmlFrag1 + '</dummy>' as XmlFrag2
            FROM    A
        )
        ,   C
        AS  (
            -- Convert it actually to type XML
            SELECT  CONVERT(xml, XmlFrag2) as XmlFrag3
            FROM    B
        )
            -- For more on XmlFragment, see below.  Right now, we want the
            -- content of the fragment, which we're getting with the value()
            -- function, and converting it to "int". 
        SELECT  D.XmlFragment.value('.', 'int') as Value
        FROM    C
            -- When you want to access the Xml field in every record, and 
            -- apply the "nodes" function to it, you need to remember that
            -- nodes() is a *function*, and thus can be used with CROSS APPLY
            -- to run the function for every record in the record source (C).
            -- Hence, we run nodes() on C.XmlFrag3 for every record, and call
            -- the resulting set of data D, with the results of "nodes" being
            -- the field XmlFragment.  
            CROSS APPLY C.XmlFrag3.nodes('/dummy') as D(XmlFragment)
     )
     
  


You’ll want to test it, so here’s code (using the AdventureWorksLT database) that generates a CSV string. 



DECLARE @csv varchar(max)
WITH A
as (
    SELECT (
        SELECT  ProductID 
        FROM    SalesLT.Product
        FOR XML AUTO, ELEMENTS, TYPE
    ) as ProductXml
   )
   
,   B
AS  (
    SELECT A.ProductXml.query('data(*)') AS ProductSeries 
    FROM A
)
SELECT  @csv = REPLACE(CONVERT(varchar(max), ProductSeries),' ',',')
        FROM B
SELECT @csv
select * from ParseCsv(@csv)


And here’s the result you get from the test.  The first recordset is just the select on @Csv, the second one is the result of the function.



image 



In tests, this consistently ran in half the time of a T-SQL string manipulation solution, regardless of the number of values in the CSV.

Labels: , ,