I could have sworn I wrote a collation article a while ago. Anyway, here are some links:
  • an article about getting and setting the collation for Transact SQL databases and tables: TSQL to change collation of database
  • A list of available collations
  • if you get a collation conflict error, you need to add a COLLATE clause to every join or where comparison, like in this example:
    select * from profile, userinfo
    where profile.custid collate database_default = userinfo.custid collate database_default
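For quick reference, checking a database's collation, listing the available ones and changing the default collation look something like this (the database name is just an example):

-- check the current collation of a database
SELECT DATABASEPROPERTYEX('MyDatabase', 'Collation')
-- list the collations available on the server
SELECT name, description FROM fn_helpcollations()
-- change the default collation of the database (existing columns keep their collation)
ALTER DATABASE MyDatabase COLLATE SQL_Latin1_General_CP1_CI_AS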

I was trying to do this update, but I wanted it done in a specific order. I found two solutions. One is to use a Common Table Expression, which would look like this:
WITH q AS (
SELECT TOP 100 PERCENT x,y,z FROM SomeTable
ORDER BY z
) UPDATE q
SET x=y+z
Strangely enough this updates the SomeTable table in the order of z.

However, I had to do it in SQL 2000, so Common Table Expressions were not available. The only other solution I could think of was a temporary table with an identity column:
CREATE TABLE #temp (
realid INT IDENTITY(1,1),
x INT,
y INT,
z INT
)
INSERT INTO #temp (x,y,z)
SELECT x,y,z FROM SomeTable
ORDER BY z
UPDATE #temp SET x=y+z


The most difficult part of the task was to not think procedurally. SQL is a declarative, set-based language and one must think differently. The weird thing is that I knew I had to think that way and I even said it out loud when I started the task. It still took a few hours of trying to create recursive functions or emulate loops in SQL before I could change my thinking. Using SQL in a procedural way, even if possible, means using it the wrong way.

It seems that there is a renewal of the non-relational database movement. Since I know nothing about the specifics I will list only a few links that I found relevant:
No to SQL? Anti-database movement gains steam
NOSQL debrief
Neo4j - The Benefits of Graph Databases

Of course, this "revolution" is only gaining momentum because of the huge quantity of data being moved around on the web. No centralised system could handle data at the scale of Google, Facebook or Amazon, so they all move to distributed systems. Tables become a hindrance rather than a programming advantage, and object oriented schemas step forward.

Will that change our daily work as developers? Well, only if we decide to use some "live" database system to store our data. As proven by the previous "revolution", relational works pretty well for small systems or networks.

This was not a problem I have encountered (maybe), but rather something I read in a blog post from Chris Brandsma at ElegantCode. Apparently, because of attempts by SQL Server (and not only Microsoft's engine, btw) to "optimize" a query based on its input parameters, query execution time may vary immensely without any obvious reason. Imagine debugging that baby!

Anyway, here is the blog post, one important enough for Chris to repost it these last few days.

I have been working on a silly little project that involved importing files and then querying the data in a very specific way. And I wanted to do it with the latest technologies so I used The Entity Framework! (imagine a little glowing halo around that name and a choir in the background).

Well, how do I do an efficient import in Linq to Entities? I can't! At most I can instantiate a lot of classes, add them to the DataModel, then SaveChanges. In the background this translates to a lot of insert statements. So it occurred to me that I don't really need Entities here. All I needed was good old fashioned ADO.Net and a SqlBulkCopy object. So I used just that: a bit of unfortunate translation of objects to a DataTable, because the SqlBulkCopy class knows how to import only a DataTable, and I was set.
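A rough sketch of that approach, in which the record class, the table and the column names are just assumptions for the example:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public class ImportedRecord
{
    public string Name { get; set; }
    public decimal Value { get; set; }
}

public static class ImportHelper
{
    // copy a list of objects into a table by first flattening them into a DataTable,
    // which is what SqlBulkCopy knows how to consume
    public static void BulkImport(string connectionString, IEnumerable<ImportedRecord> records)
    {
        var table = new DataTable();
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Value", typeof(decimal));
        foreach (var record in records)
        {
            table.Rows.Add(record.Name, record.Value);
        }

        using (var bulkCopy = new SqlBulkCopy(connectionString))
        {
            bulkCopy.DestinationTableName = "dbo.ImportedRecords";
            bulkCopy.WriteToServer(table);
        }
    }
}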

Ok, now back to querying the data. I could have used ADO.Net, of course, and in this project I would probably have been right, but I suspected the requirements would change, so I used Entities. It worked like a charm and yes, the requirements did get bigger and stranger as I went. But then I had to select the users that had amassed a number of rows in two related tables (or use a value in the user table), but only if the total number of amassed rows satisfied a formula based on a string column in the user table that mapped to a certain value stored in the web.config: a complicated query. I did it (with some difficulty) in Linq, then I had to solve all kinds of weird issues, like not being able to compare a class variable with an enum value, because the number of types that can be used in a Linq to Entities query is pretty limited at the moment.

Well, was it the best way? I don't know. The generated SQL contains a lot of SELECT from SELECT from SELECT, sometimes 6 or 7 levels deep. Even the joins are done with selects from tables. Shouldn't I have used a stored procedure instead?

To top it off, I listened to a podcast today about Object Databases. They do away with ORMs because there is nothing relational to begin with. The guy argued that if you need to persist objects, wouldn't an Object Database be more appropriate? And, since reporting would be a bitch when having to query large amounts of tabular data, shouldn't one use a Relational Database just for that particular task in the project?

So this is it. It got me thinking. Is the database/data access layer the biggest golden hammer out there? Are we trying to use a single data access model and then build our applications around it like big twisting spaghetti golden nails?

I had this really old site that I was asked to "upgrade", from .Net 1.1 to 3.5. So I had to change the old ADO.Net SqlConnection code to Linq-to-Sql. A lot of unforeseen issues. Sometimes it is close to impossible to recreate a simple SQL query without changing the database, as in the case of updating tables or views without primary keys.

Anyway, I got stuck on a very simple issue: how to sort a Linq query based on the string returned by the GridView Sorting event. After a lot of tries and Googling I found it was easier to use third party software (although it's more of a 2.5th party software, as it is made by Scott Guthrie from Microsoft). Here is the link to the blog entry: Dynamic LINQ (Part 1: Using the LINQ Dynamic Query Library). You can find there a sample project to download, with the LINQ Dynamic Query Library inside.

With this library and the OrderBy extension methods the code becomes:
var somethings=from something in db.Somethings....;
var data = somethings.OrderBy(Sort + " asc");
gv.DataSource=data;
gv.DataBind();

This post will be quite lengthy and it will detail my findings on best practices with SQL, specifically Microsoft SQL Server.

I started with an article written by Vyas Kondreddi in 2001: SQL Server TSQL Coding Conventions, Best Practices, and Programming Guidelines. In 2001 people were not microblogging!

Well, to summarize the article and bring it up to date a little, here are some of the most important points (in my view):
  • Decide upon a database naming convention, standardize it across your organization, and be consistent in following it. It helps make your code more readable and understandable.
  • Write comments in your stored procedures, triggers and SQL batches generously, whenever something is not very obvious.
  • Try to avoid server side cursors as much as possible.
    As Vyas Kondreddi himself says: "I have personally tested and concluded that a WHILE loop is always faster than a cursor"
  • Avoid the creation of temporary tables while processing data as much as possible, as creating a temporary table means more disk I/O. Consider using advanced SQL, views, SQL Server 2000 table variable, or derived tables, instead of temporary tables.
    This is interesting, because I usually use a lot of temporary tables in my stored procedures to make the code more orderly. I guess that in the case of SQL Server 2005 and later one can always use Common Table Expressions to make the code more readable. For SQL 2000 and such I found two interesting articles about not using temporary tables and replacing them with either derived tables (selects in selects) or with table variables, although they do have some limitations, thoroughly explained in the latter post. Here are the links: Eliminate the Use of Temporary Tables For HUGE Performance Gains and Should I use a #temp table or a @table variable?
  • Try to avoid wildcard characters at the beginning of a word while searching using the LIKE keyword, as that results in an index scan, which defeats the purpose of an index.
    For a short analysis of index scans go to SQL SERVER - Index Seek Vs. Index Scan (Table Scan).
  • Use the graphical execution plan in Query Analyzer or SHOWPLAN_TEXT or SHOWPLAN_ALL commands to analyze your queries.
  • Use SET NOCOUNT ON at the beginning of your SQL batches, stored procedures and triggers in production environments, as this suppresses messages like '(1 row(s) affected)' after executing INSERT, UPDATE, DELETE and SELECT statements. This improves the performance of stored procedures by reducing network traffic.
  • Use the more readable ANSI-Standard Join clauses instead of the old style joins.
  • Incorporate your frequently required, complicated joins and calculations into a view so that you don't have to repeat those joins/calculations in all your queries.
  • Use User Defined Datatypes if a particular column repeats in a lot of your tables, so that the datatype of that column is consistent across all your tables.
    Here is a great article about SQL UDTs (not the new .NET CLR types): What's the Point of [SQL Server] User-Defined Types?. Never used them myself, but then again I am not an SQL guy. For me it seems easier to control data from .Net code.
  • Do not let your front-end applications query/manipulate the data directly using SELECT or INSERT/UPDATE/DELETE statements. Instead, create stored procedures, and let your applications access these stored procedures.
    I am afraid I also fail at this point. I don't use stored procedures for simple actions like selecting a specific item or deleting a row. Many times I have to build search pages with lots of parameters and I find it really difficult to pass a variable number of parameters to a stored procedure: for example a string that I have to split by spaces and then search for all the words found. Would it be worth using a stored procedure in such a situation?
  • Avoid dynamic SQL statements as much as possible. Dynamic SQL tends to be slower than static SQL, as SQL Server must generate an execution plan every time at runtime.
    Personally, I never use dynamic SQL. If I need to create an SQL string I do it from .Net code, not from SQL.
  • Consider the following drawbacks before using the IDENTITY property for generating primary keys. IDENTITY is very much SQL Server specific, and you will have problems porting your database application to some other RDBMS. IDENTITY columns have other inherent problems. For example, IDENTITY columns can run out of numbers at some point, depending on the data type selected; numbers can't be reused automatically, after deleting rows; and replication and IDENTITY columns don't always get along well.
    So, come up with an algorithm to generate a primary key in the front-end or from within the inserting stored procedure. There still could be issues with generating your own primary keys too, like concurrency while generating the key, or running out of values. So, consider both options and go with the one that suits you best.
    This is interesting because I always use identity columns for primary keys. I don't think a data export or a database engine change justify creating a custom identity system. However I do have to agree that in the case that data is somehow corrupted a GUID or some other identifier would be more useful. I am sticking with my IDENTITY columns for now.
  • Use Unicode datatypes, like NCHAR, NVARCHAR, or NTEXT.
  • Perform all your referential integrity checks and data validations using constraints (foreign key and check constraints) instead of triggers, as they are faster.
  • Always access tables in the same order in all your stored procedures and triggers consistently. This helps in avoiding deadlocks. Other things to keep in mind to avoid deadlocks are: Keep your transactions as short as possible. Touch as little data as possible during a transaction. Never, ever wait for user input in the middle of a transaction. Do not use higher level locking hints or restrictive isolation levels unless they are absolutely needed. Make your front-end applications deadlock-intelligent, that is, these applications should be able to resubmit the transaction in case the previous transaction fails with error 1205. In your applications, process all the results returned by SQL Server immediately so that the locks on the processed rows are released, hence no blocking.
    I don't have much experience with transactions. Even if I would need transactions in some complex scenarios, I would probably use the .Net transaction system.
  • Offload tasks, like string manipulations, concatenations, row numbering, case conversions, type conversions etc., to the front-end applications.
    Totally agree, except for the row numbering, where SQL 2005 added all those nice aggregate ranking options (see Getting the index or rank of rows in SQL Server 2005).
  • Always add a @Debug parameter to your stored procedures. This can be of BIT data type. When a 1 is passed for this parameter, print all the intermediate results, variable contents using SELECT or PRINT statements and when 0 is passed do not print anything. This helps in quick debugging stored procedures, as you don't have to add and remove these PRINT/SELECT statements before and after troubleshooting problems.
    Interesting, I may investigate this further, although the SQL debugging methods have improved significantly since the article was written. A minimal sketch of the pattern is shown right after this list.
  • Make sure your stored procedures always return a value indicating their status. Standardize on the return values of stored procedures for success and failure. The RETURN statement is meant for returning the execution status only, not data. If you need to return data, use OUTPUT parameters.
  • If your stored procedure always returns a single row resultset, consider returning the resultset using OUTPUT parameters instead of a SELECT statement, as ADO handles output parameters faster than resultsets returned by SELECT statements.
  • Though T-SQL has no concept of constants (like the ones in the C language), variables can serve the same purpose. Using variables instead of constant values within your queries improves readability and maintainability of your code.
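To illustrate the @Debug parameter idea mentioned in the list above, here is a minimal sketch; the procedure, table and column names are made up:

CREATE PROCEDURE pr_UpdatePrices
    @Percent FLOAT,
    @Debug BIT = 0
AS
BEGIN
    SET NOCOUNT ON
    DECLARE @Rows INT

    IF @Debug = 1 PRINT 'Updating prices by ' + CAST(@Percent AS VARCHAR) + '%'

    UPDATE Products
    SET Price = Price * (1 + @Percent / 100)
    SET @Rows = @@ROWCOUNT

    IF @Debug = 1 PRINT 'Rows affected: ' + CAST(@Rows AS VARCHAR)

    RETURN 0 -- status only; actual data would go through OUTPUT parameters
END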



The next stop was SQL Server Best Practices from Microsoft.

Here are the articles I found most important, covering everything from testing the I/O system of the machine you want to install SQL Server on, up to database backup, mirroring and maintenance:
Predeployment I/O Best Practices
SQL Server 2005 Deployment Guidance for Web Hosting Environments
SQL Server 2005 Security Best Practices - Operational and Administrative Tasks
Comparing Tables Organized with Clustered Indexes versus Heaps
Troubleshooting Performance Problems in SQL Server 2005
Implementing Application Failover with Database Mirroring
SQL Server 2005 Waits and Queues
TEMPDB Capacity Planning and Concurrency Considerations for Index Create and Rebuild
The Impact of Changing Collations and of Changing Data Types from Non-Unicode to Unicode
XML Best Practices for Microsoft SQL Server 2005
Performance Optimizations for the XML Data Type in SQL Server 2005
Top 10 Hidden Gems in SQL Server 2005

Lastly some links that I will not go in depth on:

SQL Server 2000 Best Practices
SQL SERVER - 2005 Best Practices Analyzer Tutorial - Sample Example describes the Microsoft Best Practices Analyzer application. I tried it myself; it's not much. It touches mainly on the maintenance and security issues that I don't really concern myself with.
Top 10 Best Practices for Building a Large Scale Relational Data Warehouse. I don't think I will need it soon, but it is a short and interesting read.
SQL Server Pre-Code Review Tips. This Pinal Dave guy is pretty cool. He seems like a good resource for SQL related issues.
CMS Database Administration SQL Server Standards, a set of SQL coding standards for a medical government agency.

I am sure there are a lot of interesting resources on the net. I will update this post with new information once I get to it.

A little used thingie on the SqlConnection object, the InfoMessage event, fires whenever there are info messages (duh!) from the last query or stored procedure execution. That means errors, of course, but it also means warnings and simple prints! You get where I am going with this?

Instead of changing stored procedures, data layers and code whenever I need to get some new information from SQL, I can just add some nicely formatted PRINT commands and get all the information I need! Here is some code:

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Windows.Forms;

namespace SqlPrintCommands
{
    public partial class Form2 : Form
    {
        public Dictionary<string, string> values;

        public Form2()
        {
            InitializeComponent();
            values = new Dictionary<string, string>();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            SqlConnection connection =
                new SqlConnection("[connectionString]");
            connection.Open();
            connection.InfoMessage += sc_InfoMessage;
            SqlCommand comm =
                new SqlCommand("pr_Test", connection);
            comm.CommandType = CommandType.StoredProcedure;
            comm.ExecuteNonQuery();
            connection.Close();
            string s = "";
            foreach (KeyValuePair<string, string> pair in values)
            {
                s += string.Format("{0} : {1}\r\n",
                    pair.Key, pair.Value);
            }
            label1.Text = s;
        }

        private void sc_InfoMessage(object sender,
            SqlInfoMessageEventArgs e)
        {
            string commandPrefix = "Eval: ";
            foreach (SqlError err in e.Errors)
            {
                // expects messages formatted as "Eval: Name=Value"
                if ((err.Message ?? "").StartsWith(commandPrefix))
                {
                    string command =
                        err.Message.Substring(commandPrefix.Length);
                    string[] cmd = command.Trim().Split('=');
                    string commandArgument = cmd[0];
                    string commandValue = cmd[1];
                    values[commandArgument] = commandValue;
                }
            }
        }
    }
}


In this scenario I have a simple form with a button and a label. I execute a pr_Test stored procedure and then I parse the messages it returns. If the messages are of the format
Eval: Name=Value
I store the keys and values in a Dictionary. Not the nicest code, but it's for demo purposes.

So, you want to know the count of whatever operation you executed? Add
PRINT 'Eval: RowCount='+cast(@@rowcount as varchar)
in your stored procedure. Pretty cool huh?
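For reference, the pr_Test procedure used by the demo could be as simple as this (the name and the message format are just the assumptions made in the C# code above):

CREATE PROCEDURE pr_Test
AS
BEGIN
    -- do some actual work, then report interesting values back as formatted messages
    SELECT name FROM sys.objects
    PRINT 'Eval: RowCount=' + CAST(@@ROWCOUNT AS VARCHAR)
    PRINT 'Eval: Server=' + @@SERVERNAME
END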

Unfortunately I haven't been able to send messages asynchronously, even if the connection was async and the running was async and the messages were generated with
RAISERROR('message',1,1) WITH NOWAIT
. BTW, who is the idiot that spelled RAISERROR with only one E? What's a Rror and why would I raise it?

Update 19 February 2016:
I've done the test again, using another computer and .Net 4.6.1. Filling the DataTableReplacement class given at the end of the article and then copying the data into a DataTable object is 30% faster than using a DataTable directly with BeginLoadData/EndLoadData and 50% faster than using a DataTable without the LoadData methods.

Now for the original post:

It was about time I wrote a smashing IT entry. Here is to the obnoxious DataTable object, something I have written about before, regarding bugs and difficulty of handling. Until now I haven't really thought about what kind of performance issues I might face when using it. I mean, yeah, everybody says it is slow, but how slow can it be? Twice as slow? Computers are getting faster and faster, I might not need a personal research into this. I tried to make a DataTable replacement object once and it was not really compatible with anything that needed DataTables, so I gave up. But in this article I will show you how a simple piece of code became 7 times faster when taking into account some DataTable issues.

But let's get to the smashing part :) I was using C# to transform the values in a column from a DataTable into columns. Something like this:

Name    Column    Value
George  Money     100
George  Age       31
George  Children  1
Jack    Money     150
Jack    Age       26
Jack    Children  0
Jane    Money     300
Jane    Age       33
Jane    Children  2



and it must look like this:

Name    Money  Age  Children
George  100    31   1
Jack    150    26   0
Jane    300    33   2



I have no idea how to do this in SQL, if you have any advice, please leave a comment.
Update: Here are some links about how to do it in SQL and SSIS:
Give the New PIVOT and UNPIVOT Commands in SQL Server 2005 a Whirl
Using PIVOT and UNPIVOT
Transposing rows and columns in SQL Server Integration Services

Using PIVOT, the SQL query would look like this:

SELECT * 
FROM #input
PIVOT (
MAX([Value])
FOR [Column]
IN ([Money],[Age],[Children])
) as pivotTable

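For the query above to actually run, the #input table has to exist; a hypothetical version holding the sample data might look like this:

CREATE TABLE #input (
    [Name] NVARCHAR(50),
    [Column] NVARCHAR(50),
    [Value] INT
)

INSERT INTO #input ([Name], [Column], [Value])
SELECT 'George', 'Money', 100
UNION ALL SELECT 'George', 'Age', 31
UNION ALL SELECT 'George', 'Children', 1
UNION ALL SELECT 'Jack', 'Money', 150
UNION ALL SELECT 'Jack', 'Age', 26
UNION ALL SELECT 'Jack', 'Children', 0
UNION ALL SELECT 'Jane', 'Money', 300
UNION ALL SELECT 'Jane', 'Age', 33
UNION ALL SELECT 'Jane', 'Children', 2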

Anyway, the solution I had was to create the necessary table in the code behind, add a row for each Name and a column for each distinct value of Column, then cycle through the rows of the original table and just place the values in the new table. All the values are present and already ordered, so I only need to do it using row and column indexes that are easily computed.

The whole operation lasted 36 seconds. There were many rows and columns, you see. Anyway, I profiled the code, using the great JetBrains dotTrace program, and I noticed that 30 seconds out of 36 were used by DataRow.set_Item(int, object)! I remembered then that the DataTable object has the BeginLoadData and EndLoadData methods, which disable/enable the checks and constraints in the table. I did that and the operation went from 36 to 27 seconds.

Quite an improvement, but the bottleneck was still in the set_Item setter. So, I thought, what would happen if I didn't use a DataTable at all? After all, the end result was being bound to a GridView and it, luckily, knows about object collections. But I was too lazy for that, as there was quite a complicated binding code mess waiting for refactoring. So I just used a List of object arrays instead of the DataTable, then used DataTable.Rows.Add(object[]) to move the data from this intermediary list into the DataTable that I originally wanted to obtain. The time spent on the operation went from... no, wait

The time spent on the operation went from the 27 seconds I had obtained to 5! 5 seconds! Instead of 225,351 calls to DataRow.set_Item, I had 1533 calls to DataRowCollection.Add, going from 21 seconds to 175 milliseconds!

Researching the reflected source of System.Data.dll I noticed that the DataRow indexer with an integer index was going through

DataColumn column=_columns[index]; return this[column];

How bad can it get?! I mean, really! There are sites that recommend you find the integer index of table columns and then use them as integer variables. Apparently this is NOT the best practice. Best is to use the DataColumn directly!

So avoid the DataRow setter.
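In other words, when filling a table in a tight loop, resolve the DataColumn objects once and index the rows by them; a small sketch of the idea (the column names are made up):

using System.Data;

static class DataColumnIndexerExample
{
    public static void Fill(DataTable table)
    {
        // resolve the columns once, outside the loop
        DataColumn xColumn = table.Columns["x"];
        DataColumn yColumn = table.Columns["y"];

        for (var i = 0; i < 100000; i++)
        {
            DataRow row = table.NewRow();
            // row[DataColumn] skips the name/index lookup performed by row["x"] or row[0]
            row[xColumn] = i;
            row[yColumn] = i * 2;
            table.Rows.Add(row);
        }
    }
}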

Update July 18, 2013:

Someone requested code, so here is a console application with some inline classes to replace the DataTable in GridView situations:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Data;

class Program
{
    static void Main(string[] args)
    {
        fillDataTable(false);
        fillDataTable(true);
        fillDataTableWriter();
        Console.ReadKey();
    }

    private static void fillDataTable(bool loadData)
    {
        var dt = new DataTable();
        dt.Columns.Add("cInt", typeof(int));
        dt.Columns.Add("cString", typeof(string));
        dt.Columns.Add("cBool", typeof(bool));
        dt.Columns.Add("cDateTime", typeof(DateTime));
        if (loadData) dt.BeginLoadData();
        for (var i = 0; i < 100000; i++)
        {
            dt.Rows.Add(dt.NewRow());
        }
        var now = DateTime.Now;
        for (var i = 0; i < 100000; i++)
        {
            dt.Rows[i]["cInt"] = 1;
            dt.Rows[i]["cString"] = "Some string";
            dt.Rows[i]["cBool"] = true;
            dt.Rows[i]["cDateTime"] = now;
        }
        if (loadData) dt.EndLoadData();
        Console.WriteLine("Filling DataTable"+(loadData?" with BeginLoadData/EndLoadData":"")+": "+(DateTime.Now - now).TotalMilliseconds);
    }

    private static void fillDataTableWriter()
    {
        var dt = new DataTableReplacement();
        dt.Columns.Add("cInt", typeof(int));
        dt.Columns.Add("cString", typeof(string));
        dt.Columns.Add("cBool", typeof(bool));
        dt.Columns.Add("cDateTime", typeof(DateTime));
        for (var i = 0; i < 100000; i++)
        {
            dt.Rows.Add(dt.NewRow());
        }
        var now = DateTime.Now;
        for (var i = 0; i < 100000; i++)
        {
            dt.Rows[i]["cInt"] = 1;
            dt.Rows[i]["cString"] = "Some string";
            dt.Rows[i]["cBool"] = true;
            dt.Rows[i]["cDateTime"] = now;
        }
        var fillingTime = (DateTime.Now - now).TotalMilliseconds;
        Console.WriteLine("Filling DataTableReplacement: "+fillingTime);
        now = DateTime.Now;
        var newDataTable = dt.ToDataTable();
        var translatingTime = (DateTime.Now - now).TotalMilliseconds;
        Console.WriteLine("Transforming DataTableReplacement to DataTable: " + translatingTime);
        Console.WriteLine("Total filling and transforming: " + (fillingTime+translatingTime));
    }
}

public class DataTableReplacement : IEnumerable<IEnumerable<object>>
{
    public DataTableReplacement()
    {
        _columns = new DtrColumnCollection();
        _rows = new DtrRowCollection();
    }

    private readonly DtrColumnCollection _columns;
    private readonly DtrRowCollection _rows;

    public DtrColumnCollection Columns
    {
        get { return _columns; }
    }

    public DtrRowCollection Rows { get { return _rows; } }

    public DtrRow NewRow()
    {
        return new DtrRow(this);
    }

    public DataTable ToDataTable()
    {
        var dt = new DataTable();
        dt.BeginLoadData();
        _columns.CreateColumns(dt);
        _rows.CreateRows(dt);
        dt.EndLoadData();
        return dt;
    }

    #region Implementation of IEnumerable

    public IEnumerator<IEnumerable<object>> GetEnumerator()
    {
        foreach (var row in _rows)
        {
            yield return row.ToArray();
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    #endregion
}

public class DtrRowCollection : IEnumerable<DtrRow>
{
    private readonly List<DtrRow> _rows;

    public DtrRowCollection()
    {
        _rows = new List<DtrRow>();
    }

    public void Add(DtrRow newRow)
    {
        _rows.Add(newRow);
    }

    public DtrRow this[int i]
    {
        get { return _rows[i]; }
    }

    public void CreateRows(DataTable dt)
    {
        foreach (var dtrRow in _rows)
        {
            dt.Rows.Add(dtrRow.ToArray());
        }
    }

    #region Implementation of IEnumerable

    public IEnumerator<DtrRow> GetEnumerator()
    {
        return _rows.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    #endregion
}

public class DtrRow
{
    private readonly object[] _arr;
    private readonly DataTableReplacement _dtr;

    public DtrRow(DataTableReplacement dtr)
    {
        _dtr = dtr;
        var columnCount = _dtr.Columns.Count;
        _arr = new object[columnCount];
    }

    public object this[string columnName]
    {
        get
        {
            var index = _dtr.Columns.GetIndex(columnName);
            return _arr[index];
        }
        set
        {
            var index = _dtr.Columns.GetIndex(columnName);
            _arr[index] = value;
        }
    }

    public object this[int columnIndex]
    {
        get
        {
            return _arr[columnIndex];
        }
        set
        {
            _arr[columnIndex] = value;
        }
    }

    public object[] ToArray()
    {
        return _arr;
    }
}

public class DtrColumnCollection
{
    private readonly Dictionary<string, int> _columnIndexes;
    private readonly Dictionary<string, Type> _columnTypes;

    public DtrColumnCollection()
    {
        _columnIndexes = new Dictionary<string, int>();
        _columnTypes = new Dictionary<string, Type>();
    }

    public int Count { get { return _columnIndexes.Count; } }

    public void Add(string columnName, Type columnType)
    {
        var index = _columnIndexes.Count;
        _columnIndexes.Add(columnName, index);
        _columnTypes.Add(columnName, columnType);
    }

    public int GetIndex(string columnName)
    {
        return _columnIndexes[columnName];
    }

    public void CreateColumns(DataTable dt)
    {
        foreach (var pair in _columnTypes)
        {
            dt.Columns.Add(pair.Key, pair.Value);
        }
    }
}


As you can see, there is a DataTableReplacement class which uses three other classes instead of DataColumnCollection, DataRowCollection and DataRow. For this example alone, the DtrRowCollection could have easily been replaced with a List<DtrRow>, but I wanted to allow people to replace DataTable wherever they had written code, without any change to the calling code.

In the example above, on my computer, it takes 1300 milliseconds to populate the DataTable the old fashioned way, 1000 milliseconds to populate it with BeginLoadData/EndLoadData and 110 milliseconds to populate the DataTableReplacement. It takes another 920 milliseconds to create a new DataTable with the same data (just in case you really need a DataTable), which brings the total time to 1030 milliseconds. So this is the overhead the DataTable brings for simple scenarios such as these.

A bit slow on the wagon, but I didn't need this until now. It is all about creating a .NET assembly for use by the SQL Server. These are the quick and dirty steps to it.

C# Steps
1. Create a Class Library project (some links suggest an Sql Server Project, but that is not available for Visual Studio versions below Professional)
2. Add a public class
3. Add a public static method and decorate it with [SqlFunction]
4. Do not use static fields in the said class
Compile the assembly.
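A minimal sketch of such a class, using the same names as the SQL steps below, just to show the shape of it:

using Microsoft.SqlServer.Server;

namespace MyNamespace
{
    public class MyClass
    {
        // public static method, decorated with [SqlFunction], no static fields used
        [SqlFunction]
        public static double MyUserDefinedFunction(string s1, string s2)
        {
            // T-SQL FLOAT maps to .NET double, NVARCHAR to string
            return (s1 ?? "").Length + (s2 ?? "").Length;
        }
    }
}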

SQL Steps
1. Define the assembly in SQL:
CREATE ASSEMBLY MyAssembly FROM 'C:\SqlCLR\MyAssembly.dll'

2. Create the SQL function to use the method in the assembly:
CREATE FUNCTION MyUserDefinedFunction(
@s1 NVARCHAR(4000),@s2 NVARCHAR(4000) ... other parameters )
RETURNS FLOAT AS
EXTERNAL NAME MyAssembly.[MyNamespace.MyClass].MyUserDefinedFunction

3. Enable CLR execution in SQL Server:
EXEC sp_configure 'clr enabled', 1
GO
RECONFIGURE
GO

4. use the function like
SELECT dbo.MyUserDefinedFunction('test','test2'...)


Things to remember:
1. Make sure the parameter types are the same in the .NET method and the SQL function
- the float keyword in SQL means double in .NET! I have no idea what kind of SQL type you must use in your function to match a .NET float.
- the string in .NET is matched to nvarchar in SQL
- the bit is matched to a bool as expected
2. Whenever you change the DLL you must DROP all the functions, then DROP the assembly, then create it again. If there are no signature changes, I guess just replacing the dll file could work.
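A hypothetical redeploy sequence after recompiling the DLL would then be:

DROP FUNCTION dbo.MyUserDefinedFunction
DROP ASSEMBLY MyAssembly
CREATE ASSEMBLY MyAssembly FROM 'C:\SqlCLR\MyAssembly.dll'
-- then recreate the functions with CREATE FUNCTION ... EXTERNAL NAME, as above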

There are a lot of things to say about returning tables instead of single values or about defining user defined aggregate functions, but not in this post.

I am going to quickly describe what happened in the briefing, then link to the site where all the presentation materials can be found (if I ever find it :))

The whole thing was supposed to happen at the Grand RIN hotel, but apparently the people there changed their minds suddenly leaving the briefing without a set location. In the end the brief took place at the Marriott Hotel and the MSDN people were nice enough to phone me and let me know of the change.

The conference lasted for 9 hours, with coffee and lunch breaks, and also half an hour for signing in and another 30 minutes for introduction bullshit. You know the drill if you have ever been to one of these events: you sit in a chair waiting for the event to start while you are SPAMMED with video presentations of Microsoft products, then some guy comes in saying hello, presenting the people that will do the talking, then each of the people that do the talking present themselves, maybe even thank the presenter at the beginning... like a circular reference! Luckily I brought my trusted ear plugs and PDA, loaded with sci-fi and tech files.

The actual talk began at 10:00, with Petru Jucovschi presenting as well as holding the first talk, about Linq and C# 3.0. He has recently taken over from Zoltan Herczeg and has not yet gained the confidence needed to keep crowds interested. Luckily, the information and code were reasonably well structured and, even if I had heard them before, they held me watching the whole thing.

Linq highlights:
  • is new in .NET 3.0+ and it takes advantage of a lot of the other newly introduced features like anonymous types and methods, lambda expressions, expression trees, extension methods, object initializers and many others.
  • it works over any object defined as IQueryable<T> or IEnumerable (although this last thing is a bit of a compromise).
  • simplifies our way of working with queries, bringing them closer to the .NET programming languages and moving errors from runtime into the domain of compiler errors.
  • "out of the box" it comes with support for T-Sql, Xml, Objects and Datasets, but providers can be built (easily) for anything imaginable.
  • the linq queries are actually expression trees that are only run when GetEnumerator is called. This is called "deferred execution" and it means more queries can be linked and optimised before the data is actually required.
  • in case you want the data for caching purposes, there are ToList and ToArray methods available


Then there were two back-to-back sessions from my favourite speaker, Ciprian Jichici, about Linq over SQL and Linq over Entities. He was slightly tired and in a hurry to catch the plane to his native lands of Timisoara, VB, but he held it through, even if he had to talk for 2.5 hours straight. He went through the manual motions of creating mappings between Linq to SQL objects and actual database data; it wouldn't compile, but the principles were thoroughly explained and I have all the respect for the fact that he didn't just drag and drop everything and not explain what happened in the background.

Linq to SQL highlights:
  • Linq to SQL does not replace SQL and SQL programming
  • Linq to SQL supports only T-SQL 2005 and 2008 for now, but Linq providers from the other DB manufacturers are sure to come.
  • Linq queries are being translated, wherever possible, to the SQL server and executed there.
  • queries support filtering, grouping, ordering, and C# functions. One of the queries was done with StartsWith. I don't know if that translated into SQL2005 CLR code or into a LIKE, and I don't know exactly what happens with custom methods
  • using simple decoration, mapping between SQL tables and C# objects can be done very easily
  • Visual Studio has GUI tools to accomplish the mapping for you
  • Linq to SQL can make good use of automatic properties and object initialisers and collection initialisers
  • an interesting feature is the ability to tell Linq which of the "child" objects to load with a parent object. You can read a Person object and load all its phone numbers and email addresses, but not the purchases made in that name


Linq to Entities highlights:
  • it does not ship with the .NET framework, but separately, probably a release version will be unveiled in the second half of this year
  • it uses three XML files to map source to destination: conceptual, mapping and database. The conceptual file will hold a schema of local objects, the database file will hold a schema of source objects and the mapping file will describe their relationship.
  • One of my questions was if I can use Linq to Entities to make a data adapter from an already existing data layer to another, using it to redesign data layer architecture. The answer was yes. I find this very interesting indeed.
  • of course, GUI tools will help you do that with drag and drop operations and so on and so on
  • the three level mapping allows you to create objects from more linked tables, making the internal workings of the database engine and even some of its structure irrelevant
  • I do not know if you can create an object from two different sources, like SQL and an XML file
  • for the moment Linq to SQL and Linq to Entities are built by different teams and they may have different approaches to similar problems


Then it was lunch time. For a classy (read expensive like crap) hotel, the service was really badly organised. The food was there, but you had to stand in long queues with a plate in your hand to get some, then quickly hunt for empty tables, the type you stand in front of to eat. The food was good though, although not exceptional.

Aurelian Popa was the third speaker, talking about Silverlight. Now, it may be something personal, but he brought in my mind the image of Tom Cruise, arrogant, hyperactive, a bit petty. I was half expecting him to say "show me the money!" all the time. He insisted on telling us about the great mathematician Comway who, by a silly mistake, created Conway's Life Game. If he could only spell his name right, tsk, tsk, tsk.

Anyway, technically this presentation was the most interesting to me, since it showed concepts I was not familiar with. Apparently Silverlight 1.0 is Javascript based, but Silverlight 2.0, which will be released by the middle of this year, I guess, uses .NET! You can finally program the web with C#. The speed and code protection advantages are great. Silverlight 2.0 maintains the ability to manipulate Html DOM objects and to let Javascript manipulate its elements.

Silverlight 2.0 highlights:
  • Silverlight 2.0 comes with its own .NET compact version, independent of the .NET versions on the system or even of the operating system
  • it is designed with compatibility in mind, cross-browser and cross-platform. One will be able to use it in Safari on Linux
  • the programming can be both declarative (using XAML) and object oriented (programmatic access with C# or VB)
  • I asked if it was possible to manipulate the html DOM of the page and, being written in .NET, work significantly faster than the same operations in pure Javascript. The answer was yes, but since Silverlight is designed to be cross-browser, I doubt it is the whole answer. I wouldn't put it past Microsoft to make some performance optimizations for IE, though.
  • Silverlight 2.0 has extra abilities: CLR, DLR (for Ruby and other dynamic languages), support for RSS, SOAP, WCF, WPF, Generics, Ajax, all the buzzwords are there, including DRM (ugh!)


The fourth presentation was just a bore, not worth mentioning. What I thought would enlighten me with new and exciting WCF features was something long, featureless (the technical details as well as the presenter), and lingering on the description would only make me look vengeful and cruel. One must maintain appearances, after all.

WCF highlights: google for them. WCF replaces Web Services, Remoting, Microsoft Message Queue, DCOM and can communicate with any one of them.

Nullable types are new to .NET 2.0 and at first glance they seem great. The idea that you can now wrap any type into a generic one that also allows for null values seems valuable enough, especially for database interactions and the dreaded SQL null.

However, having an object that can be both instantiated and null can cause a lot of issues. First of all, boxing! You can define a Nullable<int>, set it to null, then still access one of its properties (HasValue). So suddenly a piece of code like obj=null; obj.Property... makes sense. But if you want to send it as a parameter to a method, one that receives an object and then does stuff with it, the nullable gets boxed, and a null Nullable<int> boxes to a plain null reference, which is no longer an instance of anything. Therefore you can't get the type of the variable that was passed to the method!

Quick code snippet:
int? i=null;
DbWrapper.Save("id",i);

With Save defined as:
public void Save(string name,object value)

Now, I want to know what kind of nullable type was sent to the method. I can't see that if the parameter signature is object, so I am creating another signature for the method:
public void Save(string name,Nullable value)

At this point I get an error: static types cannot be used as parameters. And of course I can't, because the non-generic Nullable is just a static helper class; the type that actually wraps the value is the generic Nullable<T>, so the type here is Nullable<int>. My only solution now is to create a signature for every value type: integers, floating point values, booleans, chars and IntPtrs. String and Object are reference types, so they already accept null and Nullable<string> is not even legal, but if you count all the value types, there are 14 of them!

There is another option that I just thought of: a custom object that implicitly converts from a nullable. Then the method would use this object type as a parameter.
I tested it and it works.
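The gist of it is an implicit conversion operator from Nullable<T>; here is a minimal sketch of the idea (the NullableWrapper name comes from the update below, the rest of the members are assumptions and the original class had more to it):

using System;

public abstract class NullableWrapper
{
    public Type UnderlyingType { get; protected set; }
    public object Value { get; protected set; }
}

public class NullableWrapper<T> : NullableWrapper where T : struct
{
    public NullableWrapper(T? value)
    {
        UnderlyingType = typeof(T);
        Value = value.HasValue ? (object)value.Value : null;
    }

    // this conversion is what lets a Save(string, NullableWrapper) overload accept an int? directly
    public static implicit operator NullableWrapper<T>(T? value)
    {
        return new NullableWrapper<T>(value);
    }
}

With this in place, Save(string name, NullableWrapper value) can look at value.UnderlyingType to find out that an int? was passed.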


Update:
There are issues regarding this object, mainly referring to constants. If you pass a constant value like the number 6 to a method that has overloads expecting either Object or NullableWrapper, it will choose NullableWrapper. More than that, it will choose a NullableWrapper<byte>, since the value is under 256. Adding signatures for int, double, etc. causes Ambiguous reference errors. So my only solution so far is to consider the UnderlyingType of even a Nullable<byte> as Int32. It is obviously a hack, but I haven't found a good programming solution to it yet. If you have an idea, please let me know! Thanks.

Links:
Nullable Types (C# Programming Guide)
Check If Type is a Nullable Type - Convert Nullable Type to Underlying Type - C# 2.0 Generics

A while ago I wrote a little post about changing a paging GridView in order to show a certain number of pages and a page index, while extracting from the database only the data you need. I was looking there for something in SQL Server like the MySql LIMIT clause, the equivalent of T-Sql's TOP, but one which accepts two parameters (offset and count) instead of only one.

I just found out that there is a way of doing this in Sql Server 2005 by employing functions that return the position of a row, depending on a certain ordering and partition, called Ranking Functions. From the little googling I did, it seems that Microsoft borrowed this technique from Oracle, but one never knows, maybe it was the other way around.

I will write a short example on ranking functions, then link to the appropriate articles. Given this SQL query:
select *,
Row_Number() OVER (ORDER BY name) as [Index],
Rank() OVER (ORDER BY name) as Rank,
Dense_Rank() OVER (ORDER BY name) as DenseRank,
NTile(3) OVER (ORDER BY name) as NTile,
Row_Number() OVER (PARTITION BY nr ORDER BY name) as IndexNr,
Rank() OVER (PARTITION BY nr ORDER BY name) as RankNr,
Dense_Rank() OVER (PARTITION BY nr ORDER BY name) as DenseRankNr,
NTile(3) OVER (PARTITION BY nr ORDER BY name) as NTileNr
from test
ORDER BY ID

you get the following result:
ID  Name     Nr  Nr2  Index  Rank  DRank  Ntile  IndexNr  RankNr  DRankNr  NtileNr
1   Mark     1   7    8      7     4      2      2        2       2        2
2   Mike     1   4    11     11    6      3      3        3       3        3
3   John     2   8    5      5     3      2      1        1       1        1
4   Charles  3   2    1      1     1      1      1        1       1        1
5   Ellen    3   6    4      3     2      1      2        2       2        2
6   Mary     4   1    9      9     5      3      2        2       2        2
7   Mark     4   17   7      7     4      2      1        1       1        1
8   Mike     2   41   12     11    6      3      2        2       2        2
9   John     6   83   6      5     3      2      1        1       1        1
10  Charles  1   72   2      1     1      1      1        1       1        1
11  Ellen    0   68   3      3     2      1      1        1       1        1
12  Mary     3   21   10     9     5      3      3        3       3        3


As you can see, Row_Number returns the row index, Rank returns the rank, Dense_Rank returns a consecutive rank (no gaps between rank numbers), while NTile puts each row in a category, using a given number of total categories. Partition by makes the operations work for each distinct value of a certain column, in this case nr. If the partition had been on nr2, all the ranking values would have equaled 1, since there are only distinct values in the nr2 column. The Over clause can be used on more than just ranking functions; it also works on Aggregate functions. Yummy!
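And to tie this back to the paging problem that started it all, here is a sketch of the LIMIT-like query using Row_Number (the table and page boundaries are just examples):

-- return rows 21 to 30 (page 3, with 10 rows per page), ordered by name
WITH numbered AS (
    SELECT *, Row_Number() OVER (ORDER BY name) AS rowIndex
    FROM test
)
SELECT *
FROM numbered
WHERE rowIndex BETWEEN 21 AND 30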

Links:
Ranking Functions (Transact-SQL)
OVER Clause (Transact-SQL)
Aggregate Functions (Transact-SQL)
This article also shows a similar method in Sql Server 2000 of which I knew nothing until today: Row_Number() function in SQL Server 2005
Returning Ranked Results with Microsoft SQL Server 2005

Quick fix: look for indexed views or indexes on tables with computed columns and delete the index or change it to not reference the view or the computed column.

I got this error while trying to optimize an SQL server. I did the trace, I used the Database Engine Tuning Advisor, it gave me an sql file of magical Create Statistics and Create Index lines. But after I applied it, happy that I will get a 38% increase in my SQL speed, I got this weird error:
UPDATE failed because the following SET options have incorrect settings: 'ARITHABORT'

I wasn't aware that a cluster could break your database! So I found this little Microsoft article: PRB: Error "INSERT Failed" When You Update Table Referenced in an Indexed View.

Yet it only speaks about Indexed Views and it somehow shifts the blame to some missing ArithAbort setting. Not so. In my case it was about one of the indexes referencing a computed column in the INCLUDE statement.

In my particular case, changing the offending index to not reference the computed column was the solution. Of course, indexing computed columns and views is totally possible, but it depends on the SET options active when you create and use those indexes; in my case, SET ARITHABORT ON was active when the index was created, but not when the table was updated. The solution in the Microsoft Support article might be better, even if less attractive to lazy people like myself.
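For reference, these are the SET options the documentation requires for any session that modifies tables referenced by indexed views or by indexes on computed columns:

SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET NUMERIC_ROUNDABORT OFF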

It's "very complicated" to change the default language for SQL Server: SET LANGUAGE 'English'. Duh!