A Request
But please, don't call it "OINK".
Software development and such
CREATE PROCEDURE PrintTrace1
    @Text nvarchar(max)
AS
BEGIN
    DECLARE @UserData BINARY(8000) = 0
    DECLARE @UserInfo NVARCHAR(256) = SUBSTRING(@Text, 1, 256)
    PRINT @Text
    EXEC sp_trace_generateevent 82, @UserInfo, @UserData
END
GO

DECLARE @PrintText varchar(256) = 'This is a testThis is a testThis is a testThis is a testThis is a test'
EXEC PrintTrace1 @PrintText
Let’s suppose you want to return some data in the following extremely unpivoted format:
SELECT ID, ColumnName, Value
FROM .. something ..
Your source table or query has (let’s say) about 30 different columns. How can you quickly produce something in the format above?
Well, one way is a bunch of UNION statements, like so (using the AdventureWorks database):
select ID = ProductId, ColumnName = 'Name', Value = Name
from Production.Product
union
select ID = ProductId, ColumnName = 'ProductNumber', Value = ProductNumber
from Production.Product
union
select ID = ProductId, ColumnName = 'MakeFlag', Value = CONVERT(char(1), MakeFlag)
from Production.Product
This works fine, if you have only a few columns you want to return in this format. If you have 20-30, as with the Production.Product table, that’s a lot of UNION statements. And copying the code to use for another table is basically a waste of time: you’ll have to change so many things you might as well just start typing from scratch.
Another way is using the UNPIVOT statement. It produces the same sort of result, and looks like this:
select *
from (
    Select ProductId, ProductLine, Class, Style
    from Production.Product
) as pv
UNPIVOT (Value for FieldName IN (ProductLine, Class, Style)) p
But all the fields you want to unpivot need to be of exactly the same type. For strings, that means the same type AND the same length. (The ProductLine, Class, and Style fields above are all nchar(2).) Plus, you need to list all the fields to which you want the UNPIVOT to apply. If you have 20 or 30 fields, that’s a lot of fields to list.
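For what it's worth, the usual workaround for the type restriction is to convert each column to one shared type inside the subquery before unpivoting. This is only a sketch: the three columns chosen and the nvarchar(50) width are assumptions made for illustration.

```sql
-- Sketch: give every column the same type and length so UNPIVOT accepts them.
-- The column list and nvarchar(50) are illustrative assumptions.
select ProductID, FieldName, Value
from (
    select ProductID
         , Name          = CONVERT(nvarchar(50), Name)
         , ProductNumber = CONVERT(nvarchar(50), ProductNumber)
         , Color         = CONVERT(nvarchar(50), Color)
    from Production.Product
) as pv
UNPIVOT (Value for FieldName IN (Name, ProductNumber, Color)) as p;
```

That gets around the same-type rule, but every column now has to be typed twice, so it gets worse, not better, as the column count grows.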
So, what we need is a way to produce a list of this form, that can adapt to differing column types, can be applied to all fields or some fields in a table with a minimum of column-name-typing, and which can be copied-and-repurposed without requiring a complete retype.
Here’s one way to do it, using XML. In terms of performance, it is the slowest of the three options, but in terms of programmer time, it’s definitely the fastest and most versatile.
with A
as (
    -- We're going to return the product ID, plus an XML version of the
    -- entire record.
    select ProductId, (
        Select *
        from Production.Product
        where ProductID = pp.ProductId
        for xml auto, type
    ) as X
    from Production.Product pp
)
, B
as (
    -- We're going to run an Xml query against the XML field, and transform it
    -- into a series of name-value pairs. But X2 will still be a single XML
    -- field, associated with this product ID.
    select ProductId, X.query('
        for $f in Production.Product/@*
        return
        <product name="{ local-name($f) }" value="{ data($f) }" />
    ') as X2
    from A
)
, C
as (
    -- We're going to run the Nodes function against the X2 field, splitting
    -- our list of "product" elements into individual nodes. We will then use
    -- the Value function to extract the name and value.
    select B.ProductId as ID
         , norm.product.value('@name', 'varchar(max)') as Name
         , norm.product.value('@value', 'varchar(max)') as Value
    from B
    cross apply B.X2.nodes('/product') as norm(product)
)
-- Select our results.
select *
from C
Note a couple of things:
1. We’re using (and returning) all the fields from Production.Product, but we didn’t have to list the field names.
2. We’re using the term “Product” in C and referring to the XML node Production.Product, but those names are arbitrary: we could have formed the XML in step A in such a way that it had a generic name.
3. If you wanted to return fewer columns, you have a couple of choices: you can specify the particular columns you want in the subquery in step A, or you could filter them out at the final step. (Since the column names are in the field Name, you could even join to sys.all_columns or some other list of column names, so as to filter by type or some other criteria.)
4. We are returning the Value column as varchar(max); this is because anything can be expressed as a varchar. But if we were returning, say, all ints, we could specify in the value() function that the data should be returned as type int.
5. Nulls are automatically filtered out.
Here’s what the results look like, by the way:
Not bad, I think.
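To illustrate note 2, here is a sketch of a table-agnostic variation. It assumes FOR XML RAW with the arbitrary element name 'row' in place of FOR XML AUTO, and it applies nodes() directly to the attributes, which folds steps B and C into one. It is a variation on the query above, not a replacement for it, and the names attr and col are made up for the example.

```sql
-- Sketch: same idea with a generic element name, so only the table name
-- needs to change when the query is copied. 'row', attr and col are
-- arbitrary names chosen for this example.
with A
as (
    select ProductID, (
        select *
        from Production.Product
        where ProductID = pp.ProductID
        for xml raw('row'), type
    ) as X
    from Production.Product pp
)
select A.ProductID as ID
     , attr.col.value('local-name(.)', 'varchar(128)') as Name
     , attr.col.value('.', 'varchar(max)') as Value
from A
cross apply A.X.nodes('/row/@*') as attr(col);
```

To point it at another table, change the two table references and the ID column; nothing in the XML handling mentions the table any more.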
This series of posts is about using SQL Server’s Xml features to do string manipulations. Part I talked about creating comma-separated (CSV) lists from SQL database data. Part II, this post, will talk about parsing a comma-separated list.
Until Xml-typed parameters came along in SQL Server 2005, the only easy way to pass a set of information into a stored procedure was by passing a string parameter that represented serialized data. The two most common forms were an Xml fragment, and a CSV string. Parsing these CSV strings usually required a custom table-valued function that used T-SQL string manipulation operations to parse out the information and build the return table.
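For comparison, a string-manipulation parser of the kind described above might look roughly like the following. This is a minimal sketch, not any particular production function: the name ParseCsvLoop is hypothetical, and it assumes a comma-separated list of integers with no surrounding whitespace.

```sql
-- Sketch of a loop-based CSV parser (hypothetical name ParseCsvLoop).
-- Assumes @Csv holds integers separated by commas.
CREATE FUNCTION ParseCsvLoop(@Csv varchar(max))
RETURNS @Result TABLE (Value int)
AS
BEGIN
    DECLARE @Start int = 1;
    DECLARE @Pos int = CHARINDEX(',', @Csv, @Start);

    -- Walk the string one comma at a time, inserting each item as we go.
    WHILE @Pos > 0
    BEGIN
        INSERT INTO @Result (Value)
        VALUES (CONVERT(int, SUBSTRING(@Csv, @Start, @Pos - @Start)));

        SET @Start = @Pos + 1;
        SET @Pos = CHARINDEX(',', @Csv, @Start);
    END;

    -- Whatever follows the last comma is the final item.
    IF @Start <= LEN(@Csv)
        INSERT INTO @Result (Value)
        VALUES (CONVERT(int, SUBSTRING(@Csv, @Start, LEN(@Csv) - @Start + 1)));

    RETURN;
END
```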
This is a perfectly good solution, but there’s a faster one. I’ll give you the code for it, right up front:
CREATE FUNCTION ParseCsv(@Csv varchar(max))
RETURNS TABLE
AS
RETURN
(
    -- We're being passed in a CSV string; we'll replace the commas with
    -- endtag/start tag pairs.
    WITH A
    AS (
        SELECT REPLACE(@Csv, ',', '</dummy><dummy>') as XmlFrag1
    )
    , B
    AS (
        -- We're building an XML fragment out of the CSV, in fact.
        SELECT '<dummy>' + XmlFrag1 + '</dummy>' as XmlFrag2
        FROM A
    )
    , C
    AS (
        -- Convert it actually to type XML
        SELECT CONVERT(xml, XmlFrag2) as XmlFrag3
        FROM B
    )
    -- For more on XmlFragment, see below. Right now, we want the
    -- content of the fragment, which we're getting with the value()
    -- function, and converting it to "int".
    SELECT D.XmlFragment.value('.', 'int') as Value
    FROM C
    -- When you want to access the Xml field in every record, and
    -- apply the "nodes" function to it, you need to remember that
    -- nodes() is a *function*, and thus can be used with CROSS APPLY
    -- to run the function for every record in the record source (C).
    -- Hence, we run nodes() on C.XmlFrag3 for every record, and call
    -- the resulting set of data D, with the results of "nodes" being
    -- the field XmlFragment.
    CROSS APPLY C.XmlFrag3.nodes('/dummy') as D(XmlFragment)
)
You’ll want to test it, so here’s code (using the AdventureWorksLT database) that generates a CSV string.
-- The semicolon matters: a statement that precedes a WITH must be terminated.
DECLARE @csv varchar(max);

WITH A
as (
    SELECT (
        SELECT ProductID
        FROM SalesLT.Product
        FOR XML AUTO, ELEMENTS, TYPE
    ) as ProductXml
)
, B
AS (
    SELECT A.ProductXml.query('data(*)') AS ProductSeries
    FROM A
)
SELECT @csv = REPLACE(CONVERT(varchar(max), ProductSeries), ' ', ',')
FROM B

SELECT @csv

select * from ParseCsv(@csv)
And here’s the result you get from the test. The first recordset is just the select on @Csv, the second one is the result of the function.
In tests, this consistently ran in half the time of a T-SQL string manipulation solution, regardless of the number of values in the CSV.
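In practice, the point of a function like this is usually to join its output against a real table. Here is a small sketch, assuming the ParseCsv function above has been created in AdventureWorksLT; the three IDs are just sample values.

```sql
-- Sketch: use the parsed list as a row source in a join.
-- The ID values are sample input, not anything special.
DECLARE @ids varchar(max) = '680,706,707';

SELECT p.ProductID, p.Name
FROM SalesLT.Product AS p
INNER JOIN ParseCsv(@ids) AS ids
    ON ids.Value = p.ProductID;
```

One limitation to keep in mind: because the string is converted to XML as-is, a value containing a character such as & or < will make the conversion fail. For a list of integers that never comes up.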
Contrary to a lot of what you’ll find said online, Linq to SQL seems to work fairly happily with tables from different SQL Server databases (but the same server) in the same model. You just have to put the full name (including the database) into the Name property of the entities representing tables in the “foreign” database. You can even build associations between tables from different databases: Linq to SQL will take your word for it that those relationships exist.
A caveat: I have only tested this on VS 2008 (what I happen to be working with right now) and I haven’t attempted anything fancy with updates or inserts (since what I’m working on doesn’t require them). But, since I found people saying emphatically that cross-database modeling couldn’t be done in Linq to SQL, I wanted to report that it can.
select CustomerId, Name, (
    Select ContactName = FirstName + ' ' + ISNULL(MiddleName + ' ','') + LastName
    from Person.Contact pc
    inner join
    Sales.StoreContact ssc
    on pc.ContactID = ssc.ContactID
    where ssc.CustomerID = ss.CustomerID
    for xml auto, elements, type
) as ContactList1
from Sales.Store ss

With A
as (
    Select ContactName = FirstName + ' ' + ISNULL(MiddleName + ' ','') + LastName
         , ssc.CustomerID
    from Person.Contact pc
    inner join
    Sales.StoreContact ssc
    on pc.ContactID = ssc.ContactID
)
, B
as (
    select CustomerId, Name, (
        select ContactName = REPLACE(A.ContactName, ' ','|')
        from A
        where CustomerID = ss.CustomerID
        for xml auto, elements, type
    ) as ContactList1
    from Sales.Store ss
)
select B.*, ContactList2 = B.ContactList1.query('data(*)')
from B

With A
as (
    Select ContactName = FirstName + ' ' + ISNULL(MiddleName + ' ','') + LastName
         , ssc.CustomerID
    from Person.Contact pc
    inner join
    Sales.StoreContact ssc
    on pc.ContactID = ssc.ContactID
)
, B
as (
    select CustomerId, Name, (
        select ContactName = REPLACE(A.ContactName, ' ','|')
        from A
        where CustomerID = ss.CustomerID
        for xml auto, elements, type
    ) as ContactList1
    from Sales.Store ss
)
, C
as (
    select B.*, ContactList2 = B.ContactList1.query('data(*)')
    from B
)
select C.CustomerID, C.Name
     , ContactList = REPLACE(
           REPLACE(
               CONVERT(varchar(max), C.ContactList2)
               , ' '  -- convert spaces ...
               , ', ' -- to commas (followed by spaces, if you like)
           )
           , '|' -- replace the pipe char ...
           , ' ' -- with the original spaces!
       )
from C
When it comes to XML, SQL Server won't always eat its own cooking.
What I mean by that is that it's quite possible to generate syntactically valid XML from valid SQL Server data, via SQL Server queries, that is semantically invalid. SQL Server will deliver this data, but it won’t consume it or allow it to be assigned to an XML-type variable.
Here's how. Suppose you have the bad luck to have some non-printable characters in some of your text fields. XML only tolerates a few non-printable characters -- although I'd have to look it up, I think the ASCII characters for 9, 10, and 13 are the only ones it does tolerate.
So consider this SQL:
create table BadChars
(
    TextData varchar(255)
)

insert into BadChars
values
  (char(4))  -- bad char
, (char(6))  -- bad char
, (char(10)) -- good non-printable char
, (char(13)) -- good non-printable char
, (char(33)) -- regular character
Let's retrieve this data as XML:
select *
from BadChars
for xml auto, root('BadCharTable')
This will run fine, and return you a chunk of XML in SSMS, like so:
<BadCharTable>
  <BadChars TextData="" />
  <BadChars TextData="" />
  <BadChars TextData=" " />
  <BadChars TextData=" " />
  <BadChars TextData="!" />
</BadCharTable>
Just try doing anything with it, though:
declare @xml Xml
select @xml =
(
    select *
    from BadChars
    for Xml auto, root('BadCharTable')
)
The error you'll get is this:
Msg 9420, Level 16, State 1, Line 3
XML parsing: line 1, character 40, illegal xml character
That's because the XML it so happily generated a few moments ago is not valid: those non-printable characters are not allowed in XML, even in escaped form.
So SQL Server will return invalid XML in a result set, if the source data contains invalid chars. It just won't consume it, or allow it to be assigned to an XML variable.
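If you just need the assignment to succeed, one option is to strip the offending characters before the XML is built. The following is a sketch under a few assumptions: it only handles the two bad characters from the BadChars example, it uses FOR XML RAW('BadChars') so that the computed column lands on a predictably named element, and it forces a binary collation on the column so that REPLACE compares the control characters byte for byte.

```sql
-- Sketch: remove the illegal characters before generating the XML.
-- Only char(4) and char(6) are handled here; a real cleanup would need
-- to cover every control character other than 9, 10 and 13.
declare @xml xml;

select @xml =
(
    select TextData = REPLACE(
                          REPLACE(TextData collate Latin1_General_BIN, char(4), '')
                          , char(6), '')
    from BadChars
    for xml raw('BadChars'), root('BadCharTable')
);

select @xml;
```

Whether silently dropping the characters is acceptable depends on the data, of course; the alternative is to clean them up at the source.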