Saturday, January 22, 2011

Doing Things With Strings: SQL, Xml, and String Manipulation, Part I

SQL Server 2005 introduced new features for handling and manipulating Xml. Generating Xml via T-SQL is much simpler than it used to be. Parsing Xml is much, much easier, and we have a an Xml data type to use, too. All of that is pretty well documented in the Books Online. So if you need to produce or consume Xml, that's where to look.

But what these posts will be about is using Xml and the Xml features to get some non-Xml things done. Specifically, I'm going to talk about some types of string manipulation that the Xml features make faster and easier.

Generating Comma-Separated Lists (CSV Strings)

Sometimes, when you're summarizing data, you want the values in a column to be aggregated into a comma-separated list. An example of this might be a list of your biggest customers, with the trade associations or buying groups they belong to listed off to one side.

As you know, there is no built-in aggregate to do this in SQL Server. Microsoft (and, I believe, others) have published a SQL CLR aggregate function that will do this for you.

But, thanks to the Xml features, this can be also be done with T-SQL code, without publishing assemblies or writing custom functions. I’m going to demonstrate it with a series of CTEs, so that each step in the transformation is separate, but you can combine all of them into one expression if you like.

Let’s suppose that we are querying the AdventureWorks2008 database, and we want a list of stores with a list of contacts for each store in the same row, in a comma-separated list.

We start by querying the Sales.Store table, and include a subquery of the Person.Contact table, using the Sales.StoreContact table to link between stores and contacts. Our subquery is going to return the names of the contacts as an Xml fragment.
select  CustomerId
    ,   Name
    ,   (
        Select  ContactName  = FirstName + ' ' + ISNULL(MiddleName + ' ','') + LastName
        from    Person.Contact pc
                    inner join
                Sales.StoreContact ssc
                    on    pc.ContactID = ssc.ContactID
        where   ssc.CustomerID = ss.CustomerID
        for     xml auto, elements, type
        ) as ContactList1
from    Sales.Store ss

This looks like this:


Capture


Note that we’re only returning one element in our fragments: the Contact Name element. This is the only text that the fragment contains.


Our next step transforms the ContactList1 Xml field into a space-delimited list of names. Because our data has spaces, we’re going to alter our subquery a little to replace the spaces in the data with a character unlikely to appear in a name. You’ll see why in a minute.

With A
as  (        
    Select    ContactName  = FirstName + ' ' + ISNULL(MiddleName + ' ','') + LastName
         ,    ssc.CustomerID   
    from      Person.Contact pc            
                  inner join                  
              Sales.StoreContact ssc              
                  on   pc.ContactID = ssc.ContactID    
 ),  B
 as  (    
     select  CustomerId        
         ,   Name
         ,   (                
                 select  ContactName = REPLACE(A.ContactName, ' ','|')
                 from    A
                 where   CustomerID = ss.CustomerID 
                 for     xml auto, elements, type 
             ) as ContactList1    
     from   Sales.Store ss
)
select  B.*
   ,    ContactList2 = B.ContactList1.query('data(*)')
from    B

This produces results like this:


Capture2


To convert this to a comma-separated list, we just need to do a conversion and some REPLACE statements:

With A
as  (
        Select    ContactName  = FirstName + ' ' + ISNULL(MiddleName + ' ','') + LastName
             ,    ssc.CustomerID
        from      Person.Contact pc
                        inner join
                  Sales.StoreContact ssc
                        on   pc.ContactID = ssc.ContactID
    )
,   B
as  (
    select  CustomerId
        ,   Name
        ,   (
                select  ContactName = REPLACE(A.ContactName, ' ','|') 
                from    A
                where   CustomerID = ss.CustomerID
                for     xml auto, elements, type
            ) as ContactList1
    from   Sales.Store ss
)
,   C
as  (
select  B.*
   ,    ContactList2 = B.ContactList1.query('data(*)')
from    B
)
select  C.CustomerID
    ,   C.Name
    ,   ContactList = REPLACE(
                        REPLACE(
                            CONVERT(varchar(max), C.ContactList2)
                        ,   ' '  -- convert spaces ...
                        ,   ', ' -- to commas (followed by spaces, if youlike)
                        )
                      , '|'  -- replace the pipe char ...
                      , ' '  -- with the original spaces!
                      )
from    C

… and the result looks like this:


capture3 Voilà: inline comma-separated lists!

Labels: , ,

0 Comments:

Post a Comment

<< Home