Saturday, October 30, 2010

Useful Things to Know, #427

When it comes to XML, SQL Server won't always eat its own cooking.

What I mean by that is that it's quite possible to generate syntactically valid XML from valid SQL Server data, via SQL Server queries, that is semantically invalid.  SQL Server will deliver this data, but it won’t consume it or allow it to be assigned to an XML-type variable.  

Here's how.  Suppose you have the bad luck to have some non-printable characters in some of your text fields. XML only tolerates a few non-printable characters -- although I'd have to look it up, I think the ASCII characters for 9, 10, and 13 are the only ones it does tolerate.

So consider this SQL:

create table BadChars
TextData varchar(255)
insert into BadChars
  (char(4)) -- bad char
, (char(6)) -- bad char
, (char(10)) -- good non-printable char
, (char(13)) -- good non-printable char
, (char(33)) -- regular character

Let's retrieve this data as XML:

select *
from BadChars
for xml auto, root('BadCharTable')

This will run fine, and return you a chunk of XML in SSMS, like so:

  <BadChars TextData="&#4;">
  <BadChars TextData="&#6;">
  <BadChars TextData="&#10;">
  <BadChars TextData="&#13;">
  <BadChars TextData="!">

Just try doing anything with it, though:

declare @xml Xml
select @xml =
(select *
 from BadChars
 for Xml auto, root('BadCharTable')

The error you'll get is this:

Msg 9420, Level 16, State 1, Line 3
XML parsing: line 1, character 40, illegal xml character

That's because the XML it so happily generated a few moments ago is not valid, due to those non-printable characters, that even in escaped form are not allowed.

So SQL Server will return invalid XML in a result set, if the source data contains invalid chars. It just won't consume it, or allow it to be assigned to an XML variable.

Monday, October 25, 2010

The Joy of HashSets

A while back, I mentioned to a colleague that I was going to write a post about hash sets and their general coolness.  He said that hash sets were old hat:  every C++ developer had been using them for years.

“Yes,” I said, “but not all of us were C++ developers!”

Here, then, is the MSDN description:  HashSet<T> Class.  In brief, a hash set is a collection, one that you can very quickly check for membership.  It’s a way to store stuff you find important, when you’re interested in membership in the collection, but not (particularly) in the data or attributes of what you’re storing. 

The key difference between the hash set and its sibling, the hash table, is that with the hash set you’re not looking anything up by key.  You’re just looking it up by itself, or by key without any interest in what the key refers to.  

So what’s so cool about yet another collection class?  This:  those suckers are fast.  Consider the following scenario:

  1. You are processing some data;
  2. You don’t want to process duplicates;
  3. Ergo, you need to keep a list of what you’ve already processed.

This – keeping a collection of stuff you find important, that you can check for membership – is what hash sets are for.   The items in them are in not in any order (although, if you need that, consider the SortedSet class).   And lookups are fast.   I shaved a large percentage of time off one application, just by switching from using a List to store my processed ids to a HashSet.   

Friday, October 22, 2010

Reverse foreign key relationships

Sometimes you have a master table and some child tables, which have foreign key relationships to the master table.   The child table can’t have a value for the foreign key that is not in the master table, and the master table can’t delete a record that is referred to by records in the child table.  All this is well and good.

But sometimes, a correct representation of the entities the data is embodying requires that every master table record have one (or more) records in every child table.  An example might be a table of products for sale and a table of prices for those products.   A price must refer to a specific product, but every product must have at least one price.

How does one enforce this in the database?

Now you can say, “That’s business logic!  That doesn’t belong in the database!”  But I tend to feel that there’s a fuzzier line between business logic and database integrity rules than is sometimes acknowledged.  Why is “a price must refer to a product” a question of database integrity, but “a product must have at least one price” a business logic issue?

Sunday, October 17, 2010

Things I’d Like to See, #766

I’d like a class derived from IEnumerable<T> to which it was possible to add an index.

Hear me out.  I have a project that involves generating and capturing a lot of data.  It’s not being retrieved from a database, and it’s not being saved to a database:  it’s being generated, analyzed, and ultimately discarded. 

While it lives, it lives in some collection classes, Lists and Lookups and the like.  While I’m analyzing it, I often have to summarize in multiple different ways, and do cross-verification and lookups by properties of the classes in the collections.  It’s (relatively) slow.  Any given lookup is, of course, lightning fast, but if it occurs over a big enough data set enough times, it gets slow.

Well, why not use a literal DataSet, then, you ask?  Those have indexes and primary keys and so on.  And that’s true.  But I want to preserve my strongly typed properties, and create derived (anonymous) summary types.  The syntax of doing that from a DataSet can get complex.  And as for typed DataSets, they have their value, but they’re tough to customize, even with the wonders that are partial classes.

Moreover, if I went the DataSet route, I’d need to define a DataTable for the result of every Linq summary class that I create, rather than using an anonymous type.  That does away with all the benefits of anonymous types, including the rather significant one that the compiler will adapt to any new fields I create in the source types (the ones being summarized), and either change the definition of the anonymous type or let me know I have a problem.  If I’m relying on DataTable that I create programmatically (or even with an XSD), I don’t find out I have a conflict in definitions until run time.  (Hopefully, test time, but once in a while something slips by you.) 

So:  I’d like a IEnumerable<T>-based type that supports adding indexes for performance purposes. 

Friday, October 15, 2010

Useful Things to Know, #426

You can add an extension  method to an enumerated type.

You can.  The code looks like that of any other extension method:

  public enum SalesCallObjective
        NewCustomer = 1,
        RetainCustomer = 2,
        KeepInTouch = 3
    public static class SalesCallObjectiveExtensions
        public static ProductCategory ToProductCategory(this SalesCallObjective value)
            switch (value)
                case SalesCallObjective.NewCustomer:
                    return ProductCategory.CompetitiveProduct;
                case SalesCallObjective.RetainCustomer:
                    return ProductCategory.UpgradeProduct;
                case SalesCallObjective.KeepInTouch:
                    return ProductCategory.PromotionalProduct;
                    return ProductCategory.Unknown;

The above is just an example of what you can do, not a real-life attempt to solve a business problem.  But it does show one potential use:  mapping an enum from one domain (sales call objective) to another. 

Friday, October 08, 2010

Malfunctioning scrollbars in Panel/ScrollableControl with AutoScroll turned on

I have recently been working on a WinForms app that involved a Panel control with dynamically-added child controls.  AutoScroll was turned on.  When you scrolled vertically, the “thumb” changed size and position seemingly at random:  large amounts of white space were added as you scrolled to the bottom, which would then disappear as you scrolled in the other direction.   Sometimes the scrollbar would disappear entirely.  The “thumb” seemed to jump around, as if the panel were somehow changing size every time we scrolled.

Googling turned up a lot of complaints and a few workarounds, but none of them worked for us.  (Our dynamic controls were capable of changing size, and most of the workarounds didn’t work with that.  )

Well, after some investigation with Reflector, I think I’ve figured it out.  There appears to be a bug with ScrollableControl, the base of the Panel control.   Scrolling vertically causes the DisplayRectangle property’s Height to increase (incorrectly), which in turn causes the Maximum property of the VerticalScroll object to increase.

What can you do about it?  Well, you can’t change DisplayRectangle, which is read-only, and if you have AutoScroll turned on, attempts to set the VerticalScroll.Maximum property are ignored.  But you can do the following:

If you’re hosting a Panel in something else, put the following code in your Scroll event handler.  If you’re deriving a control from Panel, you’ll have to adapt it slightly for the OnScroll method  (probably by replacing references to “pnl” with “this”):

System.Windows.Forms.Panel pnl = sender as System.Windows.Forms.Panel;
int maxHeight = pnl.Controls.Cast<Control>().Sum(c => c.Height);
if (pnl.VerticalScroll.Maximum > maxHeight)
    MethodInfo mi = typeof(ScrollableControl).GetMethod("SetDisplayRectangleSize", BindingFlags.Instance | BindingFlags.NonPublic, null, new Type[] { typeof(Int32), typeof(Int32) }, null);   
    if (mi != null)
        mi.Invoke(pnl, new object[] { pnl.DisplayRectangle.Width, maxHeight });
    mi = typeof(ScrollableControl).GetMethod("SyncScrollbars", BindingFlags.Instance | BindingFlags.NonPublic, null, new Type[] { typeof(bool) }, null);
    if (mi != null)
        mi.Invoke(pnl, new object[] { pnl.AutoScroll });

ETA:  Actually, depending on how your child controls are arranged, you might want to calculate maxHeight as Max(c => c.Bottom) rather than Sum(c => c.Height).  Ours were arranged in a single column with no intervening space between them, but your situation might be different.