Converting FLOAT values to string in T-SQL

Published Apr 10, 2014

Posted in
database
programming

I seem to remember that I blogged about this before, but I can't find it anymore. Probably it was just a missed intention. This is simply a warning on how T-SQL converts FLOAT values to string. Here are some Transact SQL queries and their results:

DECLARE @aFloat FLOAT = 1234.123456789
DECLARE @aDecimal DECIMAL(18,9) = 1234.123456789
DECLARE @aNumeric NUMERIC(18,9) = 1234.123456789
DECLARE @aString NVARCHAR(20) = 1234.123456789

SELECT @aFloat,@aDecimal,@aNumeric,@aString -- result: 1234.123456789    1234.123456789    1234.123456789    1234.123456789
SELECT CAST(@aFloat as NVARCHAR(20)),CAST(@aDecimal as NVARCHAR(20)),CAST(@aNumeric as NVARCHAR(20)),CAST(@aString as NVARCHAR(20)) -- result: 1234.12    1234.123456789    1234.123456789    1234.123456789

Wait! What happened there? The FLOAT was the only numeric type that lost precision, down to only 2 decimals. Actually, the number of decimals is not fixed: by default the conversion keeps at most 6 significant digits, so 12345.123456789 would be converted to 12345.1. The solution is either to convert to DECIMAL or NUMERIC before converting to NVARCHAR, or to use the STR function, which receives the total length and the number of decimals as parameters. Like this:

SELECT CAST(@aFloat as NVARCHAR(20)), CAST(CAST(@aFloat as DECIMAL(18,9)) as NVARCHAR(20)), STR(@aFloat,18,9) -- result: 1234.12    1234.123456789        1234.123456789

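The conversion to DECIMAL and the STR function produce the same digits, although STR right-justifies its result, padding it with leading spaces up to the requested length (18 here).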
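To make the difference between "at most six significant digits" and "a fixed number of decimals" concrete, here is a rough JavaScript analogue. It is only an illustration, not SQL Server's algorithm: it does not reproduce SQL Server's exponent format (1.23457e+008) or its exact rules for switching to scientific notation, and both function names are mine.

```javascript
// Rough analogue of the default FLOAT-to-string conversion: keep at most 6 significant digits.
function sixSignificantDigits(x) {
  const s = x.toPrecision(6);
  // leave exponent forms and integers alone; otherwise drop trailing zeros (and a trailing dot)
  if (s.includes('e') || !s.includes('.')) return s;
  return s.replace(/\.?0+$/, '');
}

// Rough analogue of STR(x, length, decimals): a fixed number of decimals, right-justified.
// Unlike STR, it does not cap the decimals at 16 or return asterisks when the number does not fit.
function fixedDecimals(x, length, decimals) {
  return x.toFixed(decimals).padStart(length, ' ');
}

console.log(sixSignificantDigits(1234.123456789));  // 1234.12
console.log(sixSignificantDigits(12345.123456789)); // 12345.1
console.log(fixedDecimals(1234.123456789, 18, 9));  // "    1234.123456789"
```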

I have looked into SQL options to somehow set the default precision used when converting a FLOAT to string, but I could not find anything useful. Nor did settings like

SET ARITHIGNORE OFF
SET NUMERIC_ROUNDABORT ON
SET ARITHABORT ON

have any effect on the queries above. No error and the same result.

You don't want to know what happens with 123456789.123456789!

SET @aFloat = 123456789.123456789
SELECT @aFloat, CAST(@aFloat as NVARCHAR(20)), CAST(CAST(@aFloat as DECIMAL(30,10)) as NVARCHAR(30)), STR(@aFloat,30,10) -- result: 123456789.123457    1.23457e+008    123456789.1234567900    123456789.1234567900

Not only are the digits cut even when selecting the actual value, but scientific notation rears its ugly head. And look at the beautiful STR function returning ugly extra zeroes! The same issue appears when using the XML functions: the resulting XML contains really ugly strings for the float values.

Bottom line: as much as I hate it, you probably should not use FLOAT when trying to display values. Ever.

Code School - a very nice site with the purpose of teaching you to code

Published Apr 3, 2014

Posted in
programming
software

I've heard of Code School for some time now, but never actually tried anything on the site. Today I tried a course that teaches the basics of R, a statistics programming language akin to Matlab, and I thought the site was great. I suspect it all depends on the quality of the course, but at least this one was very nice. You can see my "report card" here, although I doubt I am going to visit the site very often. However, for beginners or people who quickly want to "get" something, it is a good place to start, as it gives a "hands on" experience, actually coding to get results, but in a carefully explained, step by step tutorial format.

Incredible loss of performance when encapsulating a DOM element in jQuery (1.7.2)

Published Mar 27, 2014

I was working on a pretty nice task that involved translating the text in a page in real time. For this I created a one-page function that would work its magic on elements that were added or changed in the page. On specific pages it ran with abysmal speed and I had no idea why. So I profiled the thing and was shocked to see that the problem did not come from my long piece of code, but from a simple encapsulation of an element in a jQuery object. I was using it only to have a nicer interface for getting the name of the element and changing an attribute. Here is the code:

var j=jQuery(elem);
if (j.is('img[alt]')) {
   j.attr('alt',translate(j.attr('alt')));
}

Replaced it with:

if (/^img$/i.test(elem.tagName)) {
  var alt=elem.getAttribute('alt');
  if (alt) {
    elem.setAttribute('alt',translate(alt));
  }
}

And it worked very fast indeed. The element might have been body, so maybe the encapsulation also parses the children or something like that, or perhaps the problem was fixed in later versions of the library. However, think about how many times we have used this kind of code without thinking twice about it. Think twice about it! :)

Accessing AngularJS services from outside AngularJS

Published Mar 25, 2014

If you read an Angular book or a how-to, you will think that Angular is the greatest discovery since fire. Everything is neatly stacked in modules, controllers, templates, directives, factories, etc. The problem comes when you want to use some code of your own, simple JavaScript that does specific work, and then link it nicely with AngularJS. It is not always easy. My example concerns the simple display of a dialog which edits an object. I want it to work on every page, so I added it to the general layout template. The layout does not have a controller. And even if I added one, the dialog engine I had been using was buggy, so I decided to just use jQuery.dialog.

So here is my conundrum: how to load the content of a dialog from an Angular template, display it with jQuery.dialog, load the information with jQuery.get, then bind its input elements to an Angular scope object. I've tried the obvious: just load the template in the dialog and expect Angular to notice that a new DOM element was added, parse it and work its magic. It didn't work. Why can't I just call some angular.refresh(elem) function and get it over with, I thought. There are several other solutions. One is to not create the content dynamically at all: just add it to the layout, mark it with ng-controller="something" and then, in the controller, save the object you are interested in, or the scope, as some sort of globally accessible object that you update from jQuery.get. The dialog would just move the element around afterwards. That means you need to create a controller, maybe in another file, to be nice, then load it into your page. Another is to create some sort of directive or script tag that loads the Angular template dynamically, and hope it works.

Long story short, none of these solutions appealed to me. I wanted a simple refresh(elem) function. And there is one. It is called angular.injector. You call it with the names of the modules you need to load ('ng' is one of them and usually the main application module is the second). The result is an injector object whose invoke method calls a function with its dependencies injected, which gets you the same results as a controller constructor. And that is saying something: if you can do the work that the controller does in your block of code, you don't need a zillion controllers making your life miserable, nor do you need to mark the HTML uselessly for very simple functionality.
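To see what invoke does with the parameter names, here is a toy, self-contained illustration of name-based (implicit) injection. It is NOT AngularJS's implementation: createInjector and the registry object are hypothetical names for this sketch, and it only handles plain `function (a, b) { ... }` expressions.

```javascript
// Toy name-based injector: looks up each parameter name of fn in a registry object.
function createInjector(registry) {
  // read the parameter names between the first '(' and the first ')' of the function source
  function argNames(fn) {
    const src = fn.toString();
    const params = src.slice(src.indexOf('(') + 1, src.indexOf(')'));
    return params.split(',').map(p => p.trim()).filter(p => p.length > 0);
  }
  return {
    invoke(fn) {
      const deps = argNames(fn).map(name => {
        if (!Object.prototype.hasOwnProperty.call(registry, name)) {
          throw new Error('Unknown provider: ' + name);
        }
        return registry[name];
      });
      return fn.apply(null, deps);
    }
  };
}

const injector = createInjector({ $rootScope: { title: 'root' }, greeting: 'hello' });
// parameter order does not matter, only the names do
const result = injector.invoke(function (greeting, $rootScope) {
  return greeting + ' from ' + $rootScope.title;
});
console.log(result); // hello from root
```

This is also why minified code breaks implicit injection: once the parameters are renamed to a and b, there is nothing to look up. AngularJS offers the array notation (`['$rootScope', '$compile', function (r, c) { ... }]`) for that case.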

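Without further ado, here is a function that takes as parameters an element and a data object. The function forces Angular to compile said element as if it were part of the main Angular application, then binds the properties of the data object to the root scope. Note that angular.injector creates a new injector every time it is called, so this $rootScope (and any service it instantiates) is a fresh one, separate from those of the application already running on the page: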

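function angularCompile(elem, data) {
    // create a new injector for the 'ng' module and the application module ('app')
    var $injector = angular.injector(['ng', 'app']);

    // arguments are injected by name (implicit annotation), which breaks if the code is minified;
    // use the array notation ['$rootScope', '$compile', '$document', function (...) {...}] in that case
    $injector.invoke(function ($rootScope, $compile, $document) {
        // compile the element (or the whole document) and link it to the root scope
        var compiled = $compile(elem || $document);
        compiled($rootScope);
        // copy the own properties of data onto the scope
        if (data) {
            for (var k in data) {
                if (data.hasOwnProperty(k)) {
                    $rootScope[k] = data[k];
                }
            }
        }
        // run a digest so the bindings pick up the new values
        $rootScope.$digest();
    });
}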

Example usage:

angularCompile(dialog[0],{editedObject: obj}); // will take the jQuery dialog element, compile it, and add to the scope the editedObject property with the value of obj.

Full code:

OpenTranslationDialog = function (Rule, onProceed, onCancel) {
    jQuery.ajax({
        type: 'GET',
        url: '/Content/ng-templates/dialogs/Translation.html',
        data: Rule,
        success: function (data) {
            var dialog = jQuery('<div></div>')
                .html(data)
                .dialog({
                    resizable: true,
                    width: 700,
                    modal: true,
                    buttons: {
                        "Save": function () {
                            // in a button handler 'this' is the dialog element;
                            // keep it, because inside the AJAX callbacks 'this' is the AJAX settings object
                            var dialogElement = this;
                            var url = '/some/api/url';
                            jQuery.ajax({
                                type: 'PUT',
                                url: url,
                                data: Rule,
                                success: function () {
                                    if (onProceed) onProceed();
                                    jQuery(dialogElement).dialog("close");
                                },
                                error: function () {
                                    alert('There was an error saving the rule');
                                }
                            });
                        },
                        Cancel: function () {
                            if (onCancel) onCancel();
                            jQuery(this).dialog("close");
                        }
                    }
                });

            angularCompile(dialog[0], { Rule: Rule });
        },
        error: function () {
            alert('There was an error getting the dialog template');
        }
    });
};

Before you take my word for it, though, beware: I am an Angular noob and my desire here was to hack away at it in order to merge my own code with the nicely structured code of my colleagues, who now hate me. Although they liked angular.injector when I showed it to them :)

Creating a truly unique value from several others in T-SQL

Published Mar 17, 2014

Posted in
database
programming

Update 2015 August 28: I've replaced the function master.sys.fn_varbintohexstr with CONVERT, with the extra parameter 2, which translates a binary field into a hexadecimal string with no leading 0x. In addition to being ugly to use, fn_varbintohexstr is very slow.

Sometimes you need to create a unique identifier for a bunch of values so that you can use it as an ID in the database. The immediately obvious choices are the CHECKSUM and BINARY_CHECKSUM functions. But beware: the purpose of these functions is to detect changes in a string, not to uniquely identify it. It might seem strange, but the two concepts are very different. Change detection only needs to produce a different value when the input changes a little. A unique identifier needs a value as distinctive as possible for any string. That is why, when you use a checksum (a 32-bit INT), you will get a lot of identical values for (very) different strings.
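Why are collisions so common? CHECKSUM returns a 32-bit INT, and by the birthday paradox the probability that at least two of n rows share a value is roughly 1 - exp(-n(n-1) / 2^(b+1)) for a b-bit value. This assumes evenly spread values, which is optimistic for a checksum. A small JavaScript calculation, with collisionProbability being my own helper:

```javascript
// Birthday-bound approximation: probability that at least two of n values collide
// when values are spread uniformly over 2^bits possibilities.
function collisionProbability(n, bits) {
  const exponent = (n * (n - 1)) / (2 * Math.pow(2, bits));
  // -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x
  return -Math.expm1(-exponent);
}

console.log(collisionProbability(77163, 32).toFixed(3)); // 0.500 (INT-sized, like CHECKSUM)
console.log(collisionProbability(1000000, 32) > 0.999);  // true
console.log(collisionProbability(1000000, 160));         // ~3.4e-37 (SHA1-sized)
```

In other words, a table of only about 77 thousand rows already has even odds of two rows sharing a 32-bit value, while a 160-bit SHA1 hash makes that practically impossible.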

Enter HASHBYTES, another function, whose purpose is to create a cryptographic hash of a string. It is mainly used for password hashing, but it fits our purpose nicely. There are some caveats, though. First, CHECKSUM takes a variable number of parameters, while HASHBYTES only accepts one, so we must take care of the cumbersome concatenation of multiple values. Unfortunately user-defined SQL functions cannot have a variable number of parameters, which is truly a shame, so we can't hack around it. Also, the value that HASHBYTES returns is a VARBINARY. We could cast it to NVARCHAR, but it turns into a weird string of Chinese characters. In order to turn it into a proper string, we need to use the CONVERT function with a style parameter of 2 (hex string without the leading 0x).

So let's compare the two usages. Suppose we have this nice table that contains company data: company name, contact first name, contact last name, phone, email, yearly value. We need to create a unique ID based on these values.
First, CHECKSUM:

SELECT CHECKSUM(companyName, firstName, lastName, phone, email, yearlyValue) FROM OurTable

So easy! Just add the columns, no matter how many or what type they have, and get a value as a result. You can even use * to select all the columns in a row. You also have the advantage of getting the same checksum for differently capitalized strings (with a case-insensitive collation). If you don't want this behaviour, use BINARY_CHECKSUM, which works even better.

Second, HASHBYTES:

SELECT CONVERT(VARCHAR(Max),HASHBYTES('SHA1',companyName+'|'+firstName+'|'+lastName+'|'+phone+'|'+email+'|'+CAST(yearlyValue as NVARCHAR(100))),2) as id,*
FROM OurTable

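Ugly! You need to create a string from different types, using ugly casts. Also, this works more like BINARY_CHECKSUM. If you want the same functionality as CHECKSUM, you need to use LOWER(LTRIM(RTRIM(value))). Horrid! And if any of the columns is NULL, the whole concatenation becomes NULL (under the default CONCAT_NULL_YIELDS_NULL setting), so nullable columns need ISNULL(column, '') as well.
However, it works.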
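For comparison, here is a hypothetical Node.js sketch of the same idea outside the database: join the values with '|', treat NULL as an empty string, hash with SHA-1 and print uppercase hex without 0x, as CONVERT style 2 does. It assumes the concatenated value is NVARCHAR, which SQL Server hashes as UTF-16LE bytes; numbers would have to be turned into strings exactly as SQL Server converts them for the results to match, so treat it as an illustration rather than a drop-in replacement. The rowHash name is mine.

```javascript
const crypto = require('crypto');

// Sketch of CONVERT(VARCHAR(MAX), HASHBYTES('SHA1', v1 + '|' + v2 + ...), 2).
// encoding: 'utf16le' for an NVARCHAR value; 'latin1' roughly approximates an ASCII VARCHAR value.
function rowHash(values, encoding = 'utf16le') {
  // treat null/undefined like ISNULL(column, '')
  const text = values
    .map(v => (v === null || v === undefined ? '' : String(v)))
    .join('|');
  // style 2 of CONVERT gives uppercase hex without the 0x prefix
  return crypto.createHash('sha1').update(text, encoding).digest('hex').toUpperCase();
}

console.log(rowHash(['Acme', 'John', 'Doe', null, 'john@acme.example', '1000']));
```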

WARNING: using CAST to NVARCHAR from a FLOAT loses precision. You should use STR instead!

A middle solution is to use XCHECKSUM. What is that, you ask? A placeholder that can be replaced with some regular expression search and replace, of course :)
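Taking the joke one step further, here is a sketch of such a search and replace in JavaScript. XCHECKSUM is only the placeholder name from the paragraph above and expandXChecksum is a hypothetical helper; it assumes the arguments are plain column names (no nested parentheses or commas) and converts everything with CONVERT(NVARCHAR(4000), ...), so FLOAT columns would still need STR, per the warning above.

```javascript
// Hypothetical expansion of the XCHECKSUM(col1, col2, ...) placeholder into a HASHBYTES expression.
// Limitation: arguments must be plain column names (no nested parentheses or commas).
function expandXChecksum(sql) {
  return sql.replace(/XCHECKSUM\(([^()]*)\)/gi, (match, args) => {
    const parts = args
      .split(',')
      .map(a => a.trim())
      .filter(a => a.length > 0)
      // NULL-safe conversion of each column to NVARCHAR
      .map(a => "ISNULL(CONVERT(NVARCHAR(4000)," + a + "),'')");
    // similar to the HASHBYTES query above: values joined by '|', hex string via style 2
    return "CONVERT(VARCHAR(MAX),HASHBYTES('SHA1'," + parts.join("+'|'+") + "),2)";
  });
}

console.log(expandXChecksum('SELECT XCHECKSUM(companyName, email), * FROM OurTable'));
```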

Update: I've created a query that generates the script to update the value of a column called 'ValuesHash', for the tables that have it, with the hash of all the columns that are not in a list of excluded names, are not primary or foreign keys, and are not identity, computed, rowguidcol or filestream columns.
Imagine the scenario where you have something like this:

Table A:
1. Id: primary identity key
2. Data1: some data
3. Data2: some data
4. CreateTime: the creation time
5. ValuesHash: a VARBINARY(50) column - only 20 are required normally, but let's make sure :)
Table B:
1. Id: primary identity key
2. AId: foreign key to A
3. Data1: some data
4. Data2: some data
5. ModifyTime: the modification time
6. ValuesHash: a VARBINARY(50) column - only 20 are required normally, but let's make sure :)
Table C:
1. Id: primary identity key
2. AId: foreign key to A
3. Data1: some data
4. Data2: some data

The query below will generate UPDATE statements for A and B (C doesn't have the ValuesHash column) that compute a hash from the Data columns. The Id columns will be ignored for being primary keys (and for being in the list of columns to ignore), the AId columns will be ignored for being foreign keys, and ValuesHash, CreateTime and ModifyTime will be ignored for being in the list of custom columns.
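To show what a generated statement looks like, here is a JavaScript sketch that applies the same rules to a hand-written column list instead of the system catalog. The column metadata shape ({ name, type, isPrimaryKey, isForeignKey, isIdentity }), the generateUpdate name, the shortened exclusion list and the column types in the example (Data1 as nvarchar, Data2 as float) are all assumptions. The T-SQL query also sorts the columns by max_length and name, while this sketch keeps them in the given order.

```javascript
// Builds the UPDATE statement for one table from hypothetical column metadata.
function generateUpdate(table, columns, excludedNames) {
  // like the outer query: only tables that have a ValuesHash column get a statement
  if (!columns.some(c => c.name === 'ValuesHash')) return null;
  const parts = columns
    .filter(c => !c.isPrimaryKey && !c.isForeignKey && !c.isIdentity && !excludedNames.includes(c.name))
    .map(c => {
      let expr;
      if (c.type === 'float' || c.type === 'real') expr = 'STR(' + c.name + ',30,30)';
      else if (c.type === 'binary' || c.type === 'varbinary') expr = 'CONVERT(NVARCHAR(4000),' + c.name + ',2)';
      else expr = 'CONVERT(NVARCHAR(4000),' + c.name + ')';
      return 'ISNULL(' + expr + ",'')";
    });
  if (parts.length === 0) return null; // the T-SQL version yields NULL here and filters it out
  return 'UPDATE [' + table + "] SET ValuesHash = HASHBYTES('SHA1',SUBSTRING("
    + parts.join("+ '|'+ ")
    + ',0,4000)) WHERE ValuesHash IS NULL';
}

const excluded = ['Id', 'CreateTime', 'ModifyTime', 'ValuesHash'];
const tableA = [
  { name: 'Id', type: 'int', isPrimaryKey: true, isIdentity: true },
  { name: 'Data1', type: 'nvarchar' },
  { name: 'Data2', type: 'float' },
  { name: 'CreateTime', type: 'datetime' },
  { name: 'ValuesHash', type: 'varbinary' }
];
console.log(generateUpdate('A', tableA, excluded));
```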

WARNING: each column value is always truncated to 4000 characters, then the concatenated string is also truncated to about 4000 characters (8000 bytes of NVARCHAR) before running HASHBYTES (which, before SQL Server 2016, only accepts a maximum of 8000 bytes of input). This hash will help in determining unique records, but it is not 100% reliable.

SELECT * 
FROM   (
SELECT t.name, 
  'UPDATE [' + t.name+ '] SET ValuesHash = HASHBYTES(''SHA1'',SUBSTRING(' 
  + Stuff( 
    (SELECT '+ ''|''+ ISNULL('+CASE 
      WHEN tp.name IN ('float', 'real') THEN 'STR('+c.name+',30,30)' 
      WHEN tp.name IN ('binary', 'varbinary') THEN 'CONVERT(NVARCHAR(4000),'+c.name+',2)'
      ELSE 'CONVERT(NVARCHAR(4000),'+c.name+')' END+','''')'
     FROM sys.all_columns c 
         INNER JOIN sys.types tp 
      ON c.system_type_id=tp.system_type_id
      AND c.user_type_id=tp.user_type_id 
     LEFT JOIN sys.foreign_key_columns fc
      ON fc.parent_object_id=t.object_id 
            AND c.column_id=fc.parent_column_id 
     WHERE t.object_id=c.object_id
      AND ISNULL(c.is_identity, 0)=0
      AND ISNULL(c.is_computed, 0)=0
      AND ISNULL(c.is_filestream, 0)=0
      AND ISNULL(c.is_rowguidcol, 0)=0
      -- exclude primary key columns; NOT EXISTS avoids getting one row per index the column belongs to
      AND NOT EXISTS (SELECT 1
          FROM sys.index_columns ic
          INNER JOIN sys.indexes i
            ON ic.object_id=i.object_id
            AND ic.index_id=i.index_id
          WHERE ic.object_id=t.object_id
            AND ic.column_id=c.column_id
            AND i.is_primary_key=1)
      AND fc.parent_column_id IS NULL
      AND c.name NOT IN ('Id', 'CreateTime', 'ModifyTime', 'AcquireTime', 'IntermediateCreateTime', 'IntermediateModifyTime', 'IntermediateDeleteTime', 'ValuesHash')
     ORDER BY Sign(c.max_length) DESC, c.max_length, Lower(c.name)
     FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)')
    , 1, 7, '') 
  + ',0,4000)) WHERE ValuesHash IS NULL' AS computed 
FROM   sys.tables t 
INNER JOIN sys.all_columns c 
  ON t.object_id = c.object_id 
WHERE  c.name = 'ValuesHash') x 
WHERE  computed IS NOT NULL 
ORDER  BY name

Change it to suit your needs. It is by no means perfect, but it's a start for whatever you need.

Update:

A new FORMAT function was introduced in SQL Server 2012; it works much like the .NET ToString method. Using it is slightly more precise:

SELECT * 
FROM   (
SELECT t.name, 
  'UPDATE [' + t.name+ '] SET ValuesHash = HASHBYTES(''SHA1'',SUBSTRING(' 
  + Stuff( 
    (SELECT '+ ''|''+ ISNULL('+CASE 
      WHEN tp.name IN ('float', 'real') THEN 'FORMAT('+c.name+',''R'')' 
      WHEN tp.name IN ('decimal') THEN 'FORMAT('+c.name+',''G'')' 
      WHEN tp.name IN ('datetime','datetime2') THEN 'FORMAT('+c.name+',''O'')' 
      WHEN tp.name IN ('binary', 'varbinary') THEN 'CONVERT(NVARCHAR(4000),'+c.name+',2)'
      ELSE 'CONVERT(NVARCHAR(4000),'+c.name+')' END+','''')'
     FROM sys.all_columns c 
         INNER JOIN sys.types tp 
      ON c.system_type_id=tp.system_type_id
      AND c.user_type_id=tp.user_type_id 
     LEFT JOIN sys.foreign_key_columns fc
      ON fc.parent_object_id=t.object_id 
            AND c.column_id=fc.parent_column_id 
     WHERE t.object_id=c.object_id
      AND ISNULL(c.is_identity, 0)=0
      AND ISNULL(c.is_computed, 0)=0
      AND ISNULL(c.is_filestream, 0)=0
      AND ISNULL(c.is_rowguidcol, 0)=0
      -- exclude primary key columns; NOT EXISTS avoids getting one row per index the column belongs to
      AND NOT EXISTS (SELECT 1
          FROM sys.index_columns ic
          INNER JOIN sys.indexes i
            ON ic.object_id=i.object_id
            AND ic.index_id=i.index_id
          WHERE ic.object_id=t.object_id
            AND ic.column_id=c.column_id
            AND i.is_primary_key=1)
      AND fc.parent_column_id IS NULL
      AND c.name NOT IN ('Id', 'CreateTime', 'ModifyTime', 'AcquireTime', 'IntermediateCreateTime', 'IntermediateModifyTime', 'IntermediateDeleteTime', 'ValuesHash')
     ORDER BY Sign(c.max_length) DESC, c.max_length, Lower(c.name)
     FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)')
    , 1, 7, '') 
  + ',0,4000)) WHERE ValuesHash IS NULL' AS computed 
FROM   sys.tables t 
INNER JOIN sys.all_columns c 
  ON t.object_id = c.object_id 
WHERE  c.name = 'ValuesHash'
) x 
WHERE  computed IS NOT NULL 
ORDER  BY name
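Why is FORMAT with 'R' more precise? STR keeps at most 16 decimals (a larger decimal argument is truncated to 16), while a round-trip format keeps exactly enough digits to get the same float back. JavaScript's default number-to-string conversion is also round-trip, so the difference can be sketched there. fixed16 is a hypothetical stand-in for STR with 16 decimals, not a faithful reimplementation of STR.

```javascript
// Stand-in for STR(x, 30, 30): STR caps the decimals at 16
function fixed16(x) {
  return x.toFixed(16);
}

// Round-trip formatting: the shortest string that parses back to the same double
function roundTrip(x) {
  return String(x);
}

const tiny = 1e-20;
console.log(fixed16(tiny));   // 0.0000000000000000 (the value is gone)
console.log(roundTrip(tiny)); // 1e-20

const sum = 0.1 + 0.2;
console.log(Number(fixed16(sum)) === sum);   // false
console.log(Number(roundTrip(sum)) === sum); // true
```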

Quickly clustering a large number of items

Published Feb 24, 2014

Posted in
.NET
programming
C#

It just happens that I have two different projects that need cluster analysis, applied in two different ways: one is used on maps, where a large number of items needs to be displayed quickly, while the other involves finding clusters of news items, where the distance between them is determined by their content. The most used clustering algorithm, and the first one found when searching the web, is k-means clustering. Its purpose is to categorize a list of items into k clusters, hence the name. Setting aside the usefulness of the algorithm for my purposes, the biggest problem I see is the complexity: in its standard form, each iteration compares every item with every one of the k cluster centers, so a single iteration costs O(n·k), and many iterations may be needed to converge. The net abounds with scientific papers investigating the k-means complexity and suggesting improvements, but they are highly mathematical and I didn't have the time to investigate further. So I just built my own algorithm. It is clearly fuzzy, imperfect, may even be wrong in some situations, but at least it is fast. I will certainly investigate this area more, maybe even try to understand the math behind it and analyse my results based on this research. When I do that I will update this post or write others. But until then, let me present my work so far.

The first problem I had was, as I said, complexity. For one million points on the map, any algorithm that takes into account the distance between every pair of items has to make on the order of a trillion comparisons. So my solution was to limit the number of items by grouping them in a grid:
Step 1: find the min and max on each dimension (that means going through the entire item collection once, or knowing the map bounds beforehand)
Step 2: determine a number of cells that is a bit more than what I need in the end (that's a decision I have to take, no algorithmic complexity)
Example: for my map I have only two dimensions, X and Y. I want to display an upper bound of 1000 clusters. Therefore I find the minimum and maximum X and Y and then split each dimension into 100 slots. That means I would cluster the items I have into 10000 cells.
Step 3: for each item, find its cell based on X,Y and add the item to the cell. This is done by simple division: the cell column is floor((X-minX)/(maxX-minX)*100), capped at 99, and similarly for Y (again, that means going once through the collection)
Step 4: find the smallest cell (the complexity is now reduced to working with cells)
Step 5: find its smallest neighbour (the complexity of this depends on the implementation)
Step 6: merge the two cells
While the number of cells is larger than the desired number of clusters, repeat from Step 4.
In the end, the algorithm is O(n+p*log(p)), I guess, where p is the number of cells chosen at step 2.
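The steps above can be sketched in JavaScript as follows. This is my simplified illustration, not the C# code below: items are assumed to be { x, y } objects, gridCluster, side and targetClusters are hypothetical names, and the smallest cluster is found with a linear scan instead of a sorted list, so each merge costs O(p).

```javascript
// Sketch of grid-then-merge clustering. Items are { x, y }; returns { x, y, count } clusters.
function gridCluster(items, side, targetClusters) {
  // Step 1: bounds on each dimension
  let minX = Infinity, minY = Infinity, maxX = -Infinity, maxY = -Infinity;
  for (const { x, y } of items) {
    minX = Math.min(minX, x); maxX = Math.max(maxX, x);
    minY = Math.min(minY, y); maxY = Math.max(maxY, y);
  }
  // Step 2: side * side cells; fall back to 1 when all items share a coordinate
  const qX = (maxX - minX) / side || 1;
  const qY = (maxY - minY) / side || 1;
  // Step 3: put every item in its cell, linking new cells to existing left/right/below/above cells
  const cells = new Map(); // "cx:cy" -> cluster
  for (const { x, y } of items) {
    const cx = Math.min(Math.floor((x - minX) / qX), side - 1);
    const cy = Math.min(Math.floor((y - minY) / qY), side - 1);
    const key = cx + ':' + cy;
    let cell = cells.get(key);
    if (!cell) {
      cell = { count: 0, sumX: 0, sumY: 0, neighbors: new Set() };
      cells.set(key, cell);
      for (const nk of [(cx - 1) + ':' + cy, (cx + 1) + ':' + cy, cx + ':' + (cy - 1), cx + ':' + (cy + 1)]) {
        const n = cells.get(nk);
        if (n) { cell.neighbors.add(n); n.neighbors.add(cell); }
      }
    }
    cell.count++;
    cell.sumX += x;
    cell.sumY += y;
  }
  const clusters = new Set(cells.values());
  while (clusters.size > targetClusters) {
    // Step 4: the smallest cluster that still has neighbours (linear scan)
    let smallest = null;
    for (const c of clusters) {
      if (c.neighbors.size > 0 && (!smallest || c.count < smallest.count)) smallest = c;
    }
    if (!smallest) break; // only isolated clusters are left
    // Step 5: its smallest neighbour
    let other = null;
    for (const n of smallest.neighbors) {
      if (!other || n.count < other.count) other = n;
    }
    // Step 6: merge 'other' into 'smallest', which inherits other's neighbours
    smallest.count += other.count;
    smallest.sumX += other.sumX;
    smallest.sumY += other.sumY;
    for (const n of other.neighbors) {
      n.neighbors.delete(other);
      if (n !== smallest) {
        n.neighbors.add(smallest);
        smallest.neighbors.add(n);
      }
    }
    clusters.delete(other);
  }
  // final position of each cluster: the average position of its items
  return [...clusters].map(c => ({ x: c.sumX / c.count, y: c.sumY / c.count, count: c.count }));
}
```

With side = 100 and targetClusters = 1000 this matches the numbers in the map example above.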

Optimizations are the next issue.

How does one find the neighbours of a cell? On Step 3 we also create a list of neighbors for each new cluster by looking for a cluster that is at coordinates immediately above, below, left or right. When we merge two clusters, we get a cluster that is a neighbour to all the neighbours of the merged clusters.
How does one quickly get the cluster at a certain position? We create a dictionary that has the coordinates as the key. What about when we merge two clusters? Then the new cluster will be accessible by any of the original cluster keys (that implies that each cluster also has a list of keys)
How does one find the smallest cell in the cluster list? After Step 3 we sort the cluster list by the number of items they contain and each time we perform a merge we find the first item larger than the merged result and we insert it in the list at that location, so that the list always remains sorted.
How do we easily find the first item larger than an item? By employing a divide-et-impera method (a binary search): split the list in two at each step and choose which half to look into based on the item count of the cluster at the middle position
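The "first item larger than" search is a binary search. A JavaScript sketch over an array kept sorted by count (firstLargerIndex and insertSorted are hypothetical names; this is the List-style approach, since a linked list cannot be indexed in the middle):

```javascript
// Index of the first cluster whose count is larger than `count`;
// clusters must be sorted ascending by count. Returns clusters.length if there is none.
function firstLargerIndex(clusters, count) {
  let lo = 0, hi = clusters.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    // the answer is at mid or to its left if mid is already larger, otherwise to its right
    if (clusters[mid].count > count) hi = mid; else lo = mid + 1;
  }
  return lo;
}

// Insert a merged cluster so that the array stays sorted by count
function insertSorted(clusters, cluster) {
  clusters.splice(firstLargerIndex(clusters, cluster.count), 0, cluster);
}
```

Finding the index is O(log p), but splice still shifts every element after it, which is the cost the LinkedList update below tries to avoid.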

Before you use the code, note that there are specific scenarios where this type of clustering would look a bit off, like items in a long line or an empty polygon (the cluster will appear to be in its center). But I needed speed and I got it.

Enjoy!

Update: The performance of removing or inserting items in the middle of a List is very poor, so I created a LinkedList version that seems to be even faster. Here it is. The old List version is at the end.

using System;
using System.Collections.Generic;
using System.Linq;

/// <summary>
/// Generic x,y positioned item class
/// </summary>
public class Item
{
    public double X { get; set; }
    public double Y { get; set; }
}

public class ClusteringDataProcessor
{
    /// <summary>
    /// the square root of the initial cell number (100*100 = 10000)
    /// </summary>
    private const int initialClustersSquareSide = 100;
    /// <summary>
    /// the desired number of resulting clusters
    /// </summary>
    private const int numberOfFinalClusters = 1000;
    private static Random _rnd = new Random();

    /// <summary>
    /// In this implementation, the Cluster inherits from Item, so the result is a list of Item
    /// In the case of one Item Clusters, we actually return the original Item
    /// </summary>
    /// <param name="list"></param>
    /// <returns></returns>
    public List<Item> Process(List<Item> list)
    {
        if (list.Count <= numberOfFinalClusters) return list;
        // find bounds. If already known, this can be provided as parameters
        double minY = double.MaxValue;
        double minX = double.MaxValue;
        double maxY = double.MinValue;
        double maxX = double.MinValue;
        foreach (var item in list)
        {
            var y = item.Y;
            var x = item.X;
            minY = Math.Min(minY, y);
            maxY = Math.Max(maxY, y);
            minX = Math.Min(minX, x);
            maxX = Math.Max(maxX, x);
        }
        // the original list of clusters
        var clusterArr = new List<Cluster>();

        // the dictionary used to index clusters on their position
        var clusterDict = new Dictionary<string, Cluster>();

        // the unit for finding the cell position for the initial clusters
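        var qX = (maxX - minX) / initialClustersSquareSide;
        var qY = (maxY - minY) / initialClustersSquareSide;
        // avoid dividing by zero when all the items share the same X or the same Y
        if (qX == 0) qX = 1;
        if (qY == 0) qY = 1;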

        foreach (var item in list)
        {
            // compute cell coordinates (integer and the values used as keys in the dictionary)
            var cx = Math.Min((int)((item.X - minX) / qX), initialClustersSquareSide - 1);
            var cy = Math.Min((int)((item.Y - minY) / qY), initialClustersSquareSide - 1);
            var key = getKey(cx, cy);
            Cluster cluster;
            // if the cluster for this position does not exist, create it
            if (!clusterDict.TryGetValue(key, out cluster))
            {
                cluster = new Cluster
                {
                    Keys = new List<string> { key },
                    X = minX + cx * qX + qX / 2,
                    Y = minY + cy * qY + qY / 2,
                    //Items = new List<Item>(),
                    Count = 0,
                    Neighbors = new List<string>()
                };
                // the neighbours of this cluster are the existing clusters that are below, above, left or right. If they exist, this cluster also is added to their neighbour list
                var nkeys = new[] { getKey(cx - 1, cy), getKey(cx + 1, cy), getKey(cx, cy - 1), getKey(cx, cy + 1) };
                for (var j = 0; j < 4; j++)
                {
                    Cluster nc;
                    if (clusterDict.TryGetValue(nkeys[j], out nc))
                    {
                        cluster.Neighbors.Add(nkeys[j]);
                        nc.Neighbors.Add(key);
                    }
                }
                clusterDict[key] = cluster;
                clusterArr.Add(cluster);
            }
            // add the item to the cluster (note that the commented lines hold the items in a list, while the current implementation only remembers the number of items)
            //cluster.Items.Add(item);
            cluster.Item = item;
            cluster.Count++;
            // add the item position to the sums, so that we can compute the final position of the cluster at the Finalize stage without enumerating the items (or having to hold them in an Items list)
            cluster.SumX += item.X;
            cluster.SumY += item.Y;
        }
        // if the number of items is already smaller than the desired number of clusters, just return the clusters
        if (clusterArr.Count <= numberOfFinalClusters)
        {
            return clusterArr.Select(c => c.Finalize()).ToList();
        }

        // sort the cluster list so we can efficiently find the smallest cluster
        //clusterArr.Sort(new Comparison<Cluster>((c1, c2) => c1.Items.Count.CompareTo(c2.Items.Count)));
        LinkedList<Cluster> clusterLinkedList = new LinkedList<Cluster>(clusterArr.OrderBy(c => c.Count));

        // remember last merged cluster, as next merged clusters might have similar sizes
        var lastCount = int.MaxValue;
        LinkedListNode<Cluster> lastLinkedNode = null;

        // do this until we get to the desired number of clusters
        while (clusterLinkedList.Count > numberOfFinalClusters)
        {
            // we need to get the smallest (so first) cluster that has any neighbours
            var cluster1 = clusterLinkedList.First(c => c.Neighbors.Any());
            Cluster cluster2 = null;
            // then find the smallest neighbour
            var min = int.MaxValue;
            foreach (var nkey in cluster1.Neighbors)
            {
                var n = clusterDict[nkey];
                //var l = n.Items.Count;
                var l = n.Count;
                if (l < min)
                {
                    min = l;
                    cluster2 = n;
                }
            }
            // join the clusters
            var keys = cluster1.Keys.Union(cluster2.Keys).ToList();
            var cluster = new Cluster
            {
                Keys = keys,
                // approximate cluster position, not needed
                //X = (cluster1.X + cluster2.X) / 2,
                //Y = (cluster1.Y + cluster2.Y) / 2,

                // the result holds the count of both clusters
                //Items = cluster1.Items.Union(cluster2.Items).ToList(),
                Count = cluster1.Count + cluster2.Count,
                // the neighbors are in the union of their neighbours that does not contain themselves
                Neighbors = cluster1.Neighbors.Union(cluster2.Neighbors)
                    .Distinct()
                    .Except(keys)
                    .ToList(),
                // compute the sums for the final position
                SumX = cluster1.SumX + cluster2.SumX,
                SumY = cluster1.SumY + cluster2.SumY
            };
            foreach (var key in keys)
            {
                clusterDict[key] = cluster;
            }
            // remove the merged clusters (Remove(T) still searches the list for the value, but unlinking the node is cheap)
            clusterLinkedList.Remove(cluster1);
            clusterLinkedList.Remove(cluster2);

            // a little bit of magic to make the finding of the insertion point faster (LinkedLists go through the entire list to find an item)
            // if the last merged cluster is smaller or equal to the new merged cluster, then start searching from it.
            // this halves the insert time operation, but I am sure there are even better implementations, just didn't think it's so important
            LinkedListNode<Cluster> start;
            if (lastCount <= cluster.Count && lastLinkedNode.Value != cluster1 && lastLinkedNode.Value != cluster2)
            {
                start = lastLinkedNode;
            }
            else
            {
                start = clusterLinkedList.First;
            }
            var insertionPoint = nextOrDefault(clusterLinkedList, start, c => c.Count >= cluster.Count);
            // remember last merged cluster
            LinkedListNode<Cluster> link;
            if (insertionPoint == null)
            {
                link = clusterLinkedList.AddLast(cluster);
            }
            else
            {
                link = clusterLinkedList.AddBefore(insertionPoint, cluster);
            }
            lastLinkedNode = link;
            lastCount = cluster.Count;
        }
        return clusterLinkedList.Select(c => c.Finalize()).ToList();
    }

    private LinkedListNode<T> nextOrDefault<T>(LinkedList<T> list, LinkedListNode<T> start, Func<T, bool> condition)
    {
        // walk from start to the end of the list, including the last node
        while (start != null)
        {
            if (condition(start.Value)) return start;
            start = start.Next;
        }
        return null;
    }

    private string getKey(int cx, int cy)
    {
        return cx + ":" + cy;
    }

    private class Cluster : Item
    {
        public Cluster()
        {
            SumX = 0;
            SumY = 0;
        }

        public double SumX { get; set; }
        public double SumY { get; set; }

        public List<string> Keys { get; set; }

        //public List<Item> Items { get; set; }
        public int Count { get; set; }
        public Item Item { get; set; }

        public List<string> Neighbors { get; set; }


        /// <summary>
        /// the function that finalizes the computation of the values or even decides to return the only item in the cluster
        /// </summary>
        /// <returns></returns>
        public Item Finalize()
        {
            //if (Items.Count == 1) return Items[0];
            if (Count == 1) return Item;
            /*Y = SumY / Items.Count;
            X = SumX / Items.Count;
            Count = Items.Count;*/
            Y = SumY / Count;
            X = SumX / Count;
            return this;
        }
    }
}
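The LinkedList version keeps the clusters sorted by count: each merged cluster goes before the first node whose count is at least as large, and the scan may resume from a remembered node instead of the head. A minimal JavaScript sketch of just that insertion, assuming plain numbers stand in for clusters; makeList, addLast, addBefore, insertSorted and toArray are hypothetical helpers, not part of the C# code:

```javascript
// Minimal doubly linked list: { first, last }, nodes are { value, prev, next }.
function makeList() {
  return { first: null, last: null };
}

function addLast(list, value) {
  const node = { value, prev: list.last, next: null };
  if (list.last) list.last.next = node; else list.first = node;
  list.last = node;
  return node;
}

function addBefore(list, before, value) {
  const node = { value, prev: before.prev, next: before };
  if (before.prev) before.prev.next = node; else list.first = node;
  before.prev = node;
  return node;
}

// Walk from start (inclusive) to the end, testing every node, the last one included.
function nextOrDefault(start, condition) {
  for (let node = start; node !== null; node = node.next) {
    if (condition(node.value)) return node;
  }
  return null;
}

// Insert so the list stays sorted ascending; start lets the caller skip a known prefix.
function insertSorted(list, count, start = list.first) {
  const at = start ? nextOrDefault(start, c => c >= count) : null;
  return at ? addBefore(list, at, count) : addLast(list, count);
}

function toArray(list) {
  const out = [];
  for (let n = list.first; n; n = n.next) out.push(n.value);
  return out;
}
```

Inserting 5, 1, 3, 3, 9, 0 in that order yields 0, 1, 3, 3, 5, 9. The interesting case is a value that belongs just before the last node (inserting 4 into 1, 5), which only works if the last node is tested too.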

The old List based code:

    /// <summary>
    /// Generic x,y positioned item class
    /// </summary>
    public class Item
    {
        public double X { get; set; }
        public double Y { get; set; }
    }

    public class ClusteringDataProcessor
    {
        /// <summary>
        /// the square root of the initial cell number (70*70 = 4900)
        /// </summary>
        private const int initialClustersSquareSide = 70;
        /// <summary>
        /// the desired number of resulting clusters
        /// </summary>
        private const int numberOfFinalClusters = 1000;
        private static Random _rnd = new Random();

        /// <summary>
        /// In this implementation, the Cluster inherits from Item, so the result is a list of Item.
        /// For single-item clusters, we actually return the original Item.
        /// </summary>
        /// <param name="list"></param>
        /// <returns></returns>
        public List<Item> Process(List<Item> list)
        {
            if (list.Count <= numberOfFinalClusters) return list;
            // find bounds. If already known, this can be provided as parameters
            double minY = double.MaxValue;
            double minX = double.MaxValue;
            double maxY = double.MinValue;
            double maxX = double.MinValue;
            foreach (var item in list)
            {
                var y = item.Y;
                var x = item.X;
                minY = Math.Min(minY, y);
                maxY = Math.Max(maxY, y);
                minX = Math.Min(minX, x);
                maxX = Math.Max(maxX, x);
            }
            // the list of clusters
            var clusterArr = new List<Cluster>();

            // the dictionary used to index clusters on their position
            var clusterDict = new Dictionary<string, Cluster>();

            // the unit for finding the cell position for the initial clusters
            var qX = (maxX - minX) / initialClustersSquareSide;
            var qY = (maxY - minY) / initialClustersSquareSide;

            foreach (var item in list)
            {
                // compute cell coordinates (integer and the values used as keys in the dictionary)
                var cx = Math.Min((int)((item.X - minX) / qX), initialClustersSquareSide - 1);
                var cy = Math.Min((int)((item.Y - minY) / qY), initialClustersSquareSide - 1);
                var key = getKey(cx, cy);
                Cluster cluster;
                // if the cluster for this position does not exist, create it
                if (!clusterDict.TryGetValue(key, out cluster))
                {
                    cluster = new Cluster
                    {
                        Keys = new List<string> { key },
                        X = minX + cx * qX + qX / 2,
                        Y = minY + cy * qY + qY / 2,
                        //Items = new List<Item>(),
                        Count = 0,
                        Neighbors = new List<string>()
                    };
                    // the neighbours of this cluster are the existing clusters that are below, above, left or right. If they exist, this cluster also is added to their neighbour list
                    var nkeys = new[] { getKey(cx - 1, cy), getKey(cx + 1, cy), getKey(cx, cy - 1), getKey(cx, cy + 1) };
                    for (var j = 0; j < 4; j++)
                    {
                        Cluster nc;
                        if (clusterDict.TryGetValue(nkeys[j],out nc))
                        {
                            cluster.Neighbors.Add(nkeys[j]);
                            nc.Neighbors.Add(key);
                        }
                    }
                    clusterDict[key] = cluster;
                    clusterArr.Add(cluster);
                }
                // add the item to the cluster (note that the commented lines hold the items in a list, while the current implementation only remembers the number of items and the last item added)
                //cluster.Items.Add(item);
                cluster.Item = item;
                cluster.Count++;
                // add the item position to the sums, so that we can compute the final position of the cluster at the Finalize stage without enumerating the items (or having to hold them in an Items list)
                cluster.SumX += item.X;
                cluster.SumY += item.Y;
            }
            // only merge if there are more clusters than desired; otherwise the clusters are returned as they are
            if (clusterArr.Count > numberOfFinalClusters)
            {
                // sort the cluster list so we can efficiently find the smallest cluster
                //clusterArr.Sort(new Comparison<Cluster>((c1, c2) => c1.Items.Count.CompareTo(c2.Items.Count)));
                clusterArr.Sort(new Comparison<Cluster>((c1, c2) => c1.Count.CompareTo(c2.Count)));
                // do this until we get to the desired number of clusters
                while (clusterArr.Count > numberOfFinalClusters)
                {
                    // we need to get the smallest (so first) cluster that has any neighbours
                    var cluster1Index = clusterArr.FindIndex(c => c.Neighbors.Any());
                    if (cluster1Index < 0) break;
                    var cluster1 = clusterArr[cluster1Index];
                    Cluster cluster2 = null;
                    // then find the smallest neighbour
                    var min = int.MaxValue;
                    foreach (var nkey in cluster1.Neighbors)
                    {
                        var n = clusterDict[nkey];
                        //var l = n.Items.Count;
                        var l = n.Count;
                        if (l < min)
                        {
                            min = l;
                            cluster2 = n;
                        }
                    }
                    // join the clusters
                    var keys = cluster1.Keys.Union(cluster2.Keys).ToList();
                    var cluster = new Cluster
                    {
                        Keys = keys,
                        // approximate cluster position, not needed
                        //X = (cluster1.X + cluster2.X) / 2,
                        //Y = (cluster1.Y + cluster2.Y) / 2,

                        // the result holds the count of both clusters
                        //Items = cluster1.Items.Union(cluster2.Items).ToList(),
                        Count = cluster1.Count + cluster2.Count,
                        // the neighbors are in the union of their neighbours that does not contain themselves
                        Neighbors = cluster1.Neighbors.Union(cluster2.Neighbors)
                            .Distinct()
                            .Except(keys)
                            .ToList(),
                        // compute the sums for the final position
                        SumX = cluster1.SumX + cluster2.SumX,
                        SumY = cluster1.SumY + cluster2.SumY
                    };
                    foreach (var key in keys)
                    {
                        clusterDict[key] = cluster;
                    }
                    // efficiently remove first cluster since we know its index
                    clusterArr.RemoveAt(cluster1Index);
                    // remove the second cluster from the list (a linear search; perhaps some sort of caching can speed this up, too)
                    clusterArr.Remove(cluster2);

                    // find the index of the cluster before which we want to insert our merged result
                    // the first comment is the naive implementation
                    // the current implementation is a divide-et-impera (binary search) lookup in the sorted list

                    //var index = clusterArr.FindIndex(c => c.Items.Count > cluster.Items.Count);
                    //var index = findFastIndex(clusterArr, cluster.Items.Count, 0, clusterArr.Count);
                    var index = findFastIndex(clusterArr, cluster.Count, 0, clusterArr.Count);
                    if (index < 0)
                    {
                        // if not found, just add it at the end, it's bigger than all existing clusters
                        clusterArr.Add(cluster);
                    }
                    else
                    {
                        clusterArr.Insert(index, cluster);
                    }
                }
            }
            return clusterArr.Select(c=>c.Finalize()).ToList();
        }

        /// <summary>
        /// Quickly find the insertion index in an ordered list based on a count
        /// (binary search for the first cluster whose Count is greater than count)
        /// </summary>
        /// <param name="clusterArr">the list of clusters, sorted ascending by Count</param>
        /// <param name="count">the Count of the cluster to insert</param>
        /// <param name="start">the first index of the searched range</param>
        /// <param name="length">the number of elements in the searched range</param>
        /// <returns>the insertion index, or -1 if no cluster in the range is bigger (append at the end)</returns>
        private int findFastIndex(List<Cluster> clusterArr, int count, int start, int length)
        {
            var lo = start;
            var hi = start + length;
            while (lo < hi)
            {
                var mid = lo + (hi - lo) / 2;
                if (clusterArr[mid].Count > count)
                {
                    hi = mid;
                }
                else
                {
                    lo = mid + 1;
                }
            }
            return lo < start + length ? lo : -1;
        }

        private string getKey(int cx, int cy)
        {
            return cx + ":" + cy;
        }

        private class Cluster:Item
        {
            public Cluster()
            {
                SumX = 0;
                SumY = 0;
            }

            public double SumX { get; set; }
            public double SumY { get; set; }

            public List<string> Keys { get; set; }

            //public List<Item> Items { get; set; }
            public int Count { get; set; }
            public Item Item { get; set; }

            public List<string> Neighbors { get; set; }


            /// <summary>
            /// the function that finalizes the computation of the values or even decides to return the only item in the cluster
            /// </summary>
            /// <returns></returns>
            public Item Finalize()
            {
                //if (Items.Count == 1) return Items[0];
                if (Count == 1) return Item;
                /*Y = SumY / Items.Count;
                X = SumX / Items.Count;
                Count = Items.Count;*/
                Y = SumY / Count;
                X = SumX / Count;
                return this;
            }
        }

    }
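For readers who want the idea without the C# bookkeeping, here is a compact JavaScript sketch of the same approach: bucket the points into a side x side grid, link edge-adjacent occupied cells, then repeatedly merge the smallest cluster that has a neighbour into its smallest neighbour until at most target clusters remain. The function and field names are hypothetical. For brevity it re-sorts on every iteration instead of keeping the list sorted, and it falls back to a cell size of 1 when all points share an X or Y coordinate, a case the C# code above does not handle.

```javascript
// Grid-based clustering sketch with greedy merging of the smallest clusters.
function clusterPoints(points, side, target) {
  if (points.length <= target) return points.slice();
  let minX = Infinity, minY = Infinity, maxX = -Infinity, maxY = -Infinity;
  for (const p of points) {
    minX = Math.min(minX, p.x); maxX = Math.max(maxX, p.x);
    minY = Math.min(minY, p.y); maxY = Math.max(maxY, p.y);
  }
  // cell size; fall back to 1 when the bounding box has zero width or height
  const qX = (maxX - minX) / side || 1;
  const qY = (maxY - minY) / side || 1;

  const byKey = new Map(); // cell key -> cluster that currently owns that cell
  for (const p of points) {
    const cx = Math.min(Math.floor((p.x - minX) / qX), side - 1);
    const cy = Math.min(Math.floor((p.y - minY) / qY), side - 1);
    const key = cx + ":" + cy;
    let c = byKey.get(key);
    if (!c) {
      c = { keys: [key], cx, cy, count: 0, sumX: 0, sumY: 0, item: null, neighbors: new Set() };
      byKey.set(key, c);
    }
    c.count++;
    c.sumX += p.x;
    c.sumY += p.y;
    c.item = p; // only used when the cluster ends up holding a single point
  }
  // neighbours: occupied cells to the left, right, below and above
  for (const c of byKey.values()) {
    for (const [dx, dy] of [[-1, 0], [1, 0], [0, -1], [0, 1]]) {
      const k = (c.cx + dx) + ":" + (c.cy + dy);
      if (byKey.has(k)) c.neighbors.add(k);
    }
  }

  let clusters = [...byKey.values()];
  while (clusters.length > target) {
    clusters.sort((a, b) => a.count - b.count);
    const c1 = clusters.find(c => c.neighbors.size > 0);
    if (!c1) break; // only isolated clusters are left
    // smallest neighbour; several keys may point to the same merged cluster
    let c2 = null;
    for (const k of c1.neighbors) {
      const n = byKey.get(k);
      if (c2 === null || n.count < c2.count) c2 = n;
    }
    const keys = c1.keys.concat(c2.keys);
    const merged = {
      keys,
      count: c1.count + c2.count,
      sumX: c1.sumX + c2.sumX,
      sumY: c1.sumY + c2.sumY,
      item: null,
      // union of both neighbour sets, minus the cells the merged cluster now owns
      neighbors: new Set([...c1.neighbors, ...c2.neighbors].filter(k => !keys.includes(k))),
    };
    for (const k of keys) byKey.set(k, merged);
    clusters = clusters.filter(c => c !== c1 && c !== c2);
    clusters.push(merged);
  }
  // single-point clusters give back the original point, the others their centroid
  return clusters.map(c => c.count === 1
    ? c.item
    : { x: c.sumX / c.count, y: c.sumY / c.count, count: c.count });
}
```

With four points on a line at x = 0, 1, 2, 3, a 4x4 grid and a target of 2, the sketch merges the first two cells, then the last two, returning centroids at x = 0.5 and x = 2.5.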

How to automatically log in through Remote Desktop Connection when the admin disabled saving credentials

Published Feb 24, 2014

Posted in
misc
programming

Well, sometimes an admin will try to make the system secure by annoying the people who have to use it. Yeah, that always works. My situation is that I have to log in every day to a virtual machine that is on a "secure network". So after imposing a very restrictive password policy that forces everybody to be creative in the way they write "password" and "123456", he also disallowed saving credentials in Remote Desktop Connection. So every day I have to enter the damn complicated password. I couldn't have that. Here is a .js script that you execute with WScript and that logs you in automatically:

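var shell = WScript.CreateObject("WScript.Shell");
// start the Remote Desktop client for the server (newer mstsc versions ignore /console; /admin replaced it)
shell.Run("mstsc /v:[remote server] /console");
// wait until the credentials prompt exists and can be brought to the foreground
while (!shell.AppActivate("Windows Security")) {
    WScript.Sleep(100);
}
// give the dialog a moment to accept input, then type the password and press Enter
WScript.Sleep(100);
shell.SendKeys("[password]{enter}");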

Save this into a Javascript file, replace [remote server] and [password] with your settings, and either double click the .js file or create a batch file like this:
@echo off
start "Auto log on!" wscript c:\Batches\autologin.js

Of course, this means your secure password will be stored in a stupid text file somewhere, so be warned.
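One more catch: SendKeys gives + ^ % ~ ( ) a special meaning, and { } and [ ] must also be enclosed in braces, so a "creative" password containing any of them will be typed wrong. A small helper can escape them first. This is a sketch; escapeSendKeys is a hypothetical name, not part of WScript, and it is written in plain ES3-style JavaScript so it should also run inside the same .js file under WScript:

```javascript
// Escape a string for WScript.Shell.SendKeys: wrap + ^ % ~ ( ) { } [ ] in braces, e.g. "+" -> "{+}".
function escapeSendKeys(text) {
  return text.replace(/[+^%~(){}\[\]]/g, function (ch) {
    return "{" + ch + "}";
  });
}
```

In the script, the last line would then become shell.SendKeys(escapeSendKeys("[password]") + "{enter}"), with [password] replaced as before. For example, "a+b" is sent as "a{+}b" and "{x}" as "{{}x{}}".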

The trailing spaces in T-SQL strings

Published Feb 21, 2014

This is one of those WTF moments. After more than a decade of working in software development I learned something this basic about T-SQL (or rather, any flavour based on SQL-92). What do you think happens when running this script?

IF ''='                 ' SELECT 'WTF?!' -- empty string compared to a bunch of spaces
IF '      '='           ' SELECT 'WTF?!' -- bunch of spaces compared to another bunch of spaces of different length
IF 'x'='x               ' SELECT 'WTF?!' -- 'x' compared to 'x' followed by a bunch of spaces
IF 'x'='               x' SELECT 'WTF?!' -- 'x' compared to 'x' preceded by a bunch of spaces

There will be three WTF rows returned, for the first three queries. You don't believe me? Try it yourself. The reason is explained in the Microsoft article INF: How SQL Server Compares Strings with Trailing Spaces. Short story shorter: in order for SQL to compare two strings of different lengths, it first right-pads the shorter one with spaces.

So what can you do to fix it? Easy enough, use LEN, right? Nope. Read the definition carefully: Returns the number of characters of the specified string expression, excluding trailing blanks. A possible but weird solution is to use DATALENGTH: a string is empty only if it has a DATALENGTH of 0. In the case of NVARCHAR you can even divide the result by 2 in order to get the true length of the string, since NVARCHAR uses 2 bytes per character (4 for characters outside the Basic Multilingual Plane). WTF, right?
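To make the rule concrete, here is a small JavaScript model (an illustration, not SQL Server itself) of the three behaviours described: equality after right-padding the shorter operand, LEN ignoring trailing blanks, and DATALENGTH counting bytes. It assumes only the plain space counts as padding and that NVARCHAR stores 2 bytes per UTF-16 code unit; the function names are made up.

```javascript
// SQL-92 style equality: right-pad the shorter string with spaces, then compare.
function sqlEquals(a, b) {
  const width = Math.max(a.length, b.length);
  return a.padEnd(width, " ") === b.padEnd(width, " ");
}

// LEN-like: number of characters, excluding trailing blanks.
function sqlLen(s) {
  return s.replace(/ +$/, "").length;
}

// DATALENGTH-like for NVARCHAR: 2 bytes per UTF-16 code unit, trailing blanks included.
function nvarcharDataLength(s) {
  return s.length * 2;
}
```

Run against the four comparisons from the script, sqlEquals returns true for the first three and false for the 'x' preceded by spaces, while sqlLen("   ") is 0 and nvarcharDataLength("   ") is 6.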

Careful when using equality in T-SQL

Published Feb 18, 2014

Posted in
database
programming

Well, it's pretty obvious, but I wanted to post it here, as well. You see, we have this query that was supposed to filter some values based on a parameter. The query was written like this: SELECT * FROM table WHERE value=(CASE WHEN @filter IS NULL THEN value ELSE @filter END). Can you spot the problem? Indeed, if value is NULL, then value is NOT equal to value (with ANSI_NULLS ON, NULL=NULL evaluates to UNKNOWN, not TRUE), so those rows are dropped even when no filter is given. Not only is this incorrect, but it is also bad from the standpoint of the SQL execution plan. It is usually much faster to do SELECT * FROM table WHERE (@filter IS NULL OR value=@filter). If, for whatever reason, you need the first syntax, you need to write it like this: SELECT * FROM table WHERE ISNULL(value,@impossibleValueThatIsNotNull)=COALESCE(@filter, value, @impossibleValueThatIsNotNull). Yeah, a bit of a show off, but when the "no filter" value is NULL, it's better to use ISNULL and COALESCE wherever possible.
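A tiny JavaScript model of SQL's three-valued logic shows the difference between the three filters. It is only a sketch under stated assumptions: null stands for UNKNOWN (as with ANSI_NULLS ON), WHERE keeps a row only when the predicate is TRUE, and the rows, the helper names and the IMPOSSIBLE value (playing @impossibleValueThatIsNotNull) are made up for illustration.

```javascript
// Three-valued equality: any NULL operand gives UNKNOWN (null).
function sqlEq(a, b) {
  return a === null || b === null ? null : a === b;
}
function sqlOr(p, q) {
  if (p === true || q === true) return true;
  if (p === null || q === null) return null;
  return false;
}
// WHERE keeps a row only when the predicate is TRUE; UNKNOWN is dropped like FALSE.
function where(rows, predicate) {
  return rows.filter(r => predicate(r) === true);
}
const isnull = (v, d) => (v === null ? d : v);
const coalesce = (...vs) => { for (const v of vs) if (v !== null) return v; return null; };

const rows = [{ value: 1 }, { value: 2 }, { value: null }];
const IMPOSSIBLE = -1; // hypothetical value that never occurs in the column

// value = CASE WHEN @filter IS NULL THEN value ELSE @filter END
const caseFilter = f => where(rows, r => sqlEq(r.value, f === null ? r.value : f));
// (@filter IS NULL OR value = @filter)
const orFilter = f => where(rows, r => sqlOr(f === null, sqlEq(r.value, f)));
// ISNULL(value, @impossible) = COALESCE(@filter, value, @impossible)
const coalesceFilter = f => where(rows, r => sqlEq(isnull(r.value, IMPOSSIBLE), coalesce(f, r.value, IMPOSSIBLE)));
```

With no filter (null), caseFilter returns only 2 of the 3 rows because the NULL row compares as UNKNOWN, while orFilter and coalesceFilter return all 3; with a filter of 2, all three return just the row with value 2.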

Using Privoxy to forward to an authenticated HTTP proxy without the annoyance of entering the username and password every time

Published Feb 17, 2014

I have at work a very annoying HTTP proxy that requires basic authentication. This translates into very inconsistent behaviour between applications and the annoying necessity of entering the username and password whenever the system wants it. So I've decided to add another local proxy to the chain that handles the authentication for me, forever.

This isn't as easy as it sounds. Sure, proxies are a dime a dozen, but for some reason most of them seem to be coming from the Linux world. That means really bad user interaction, vague documentation and unhelpful forums where any request for help ends up in some version of "read the fucking manual". Hence I've decided to help out by creating this lovely post that explains how you can achieve the purpose described above with very little headache. These are the very easy steps that you have to undertake:

1. Download and install Privoxy.
2. Go to the Program Files folder and look for Privoxy. You will need to edit two files: config.txt and user.action.
3. Optional: change the port in listen-address, otherwise the proxy will listen on port 8118.
4. Turn your proxy authentication username and password into the text for user.action and config.txt. This post came with a small Help me configure Privoxy form that did it for you; it was strictly client-side Javascript, so don't worry that I am going to steal your proxy credentials...
5. Edit user.action and add the bit of text generated for that file.
6. Edit config.txt, look for the examples of forward and add the bit that belongs to config.txt, replacing proxy:port and the domains and IP masks with the correct values for you.
7. Restart Privoxy.
8. Configure your internet settings to use a proxy on 127.0.0.1 and the port you configured in step 3 (or the default 8118).

This should be it. Enjoy!
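The configuration helper mentioned in the steps above ran entirely in the browser. Its exact output is not known, so what follows is only a sketch of what such a generator could produce, assuming the credentials reach the parent proxy as a Basic Proxy-Authorization header added through Privoxy's add-header action (user.action), and that traffic is routed with forward lines (config.txt). The function names, proxy:8080 and the exception pattern are placeholders, and whether the header is also applied to HTTPS (CONNECT) requests may depend on the Privoxy version. It uses Node's Buffer for base64; a browser page would use btoa instead.

```javascript
// Basic credentials are base64("username:password"), encoded as UTF-8 here.
function basicAuthValue(username, password) {
  return "Basic " + Buffer.from(username + ":" + password, "utf8").toString("base64");
}

// Build the two snippets: one action for user.action, forward rules for config.txt.
function privoxySnippets(username, password, parentProxy, directPatterns) {
  const userAction = [
    "{+add-header{Proxy-Authorization: " + basicAuthValue(username, password) + "}}",
    "/", // the pattern "/" applies the action to every URL
  ].join("\n");
  const config = [
    "forward / " + parentProxy,
    // a "." as the target means: connect directly, without the parent proxy
    ...directPatterns.map(p => "forward " + p + " ."),
  ].join("\n");
  return { userAction, config };
}
```

For example, privoxySnippets("user", "pass", "proxy:8080", ["192.168.*.*/"]) produces the action line {+add-header{Proxy-Authorization: Basic dXNlcjpwYXNz}} followed by /, and the config lines forward / proxy:8080 and forward 192.168.*.*/ .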

'decimal' is not a recognized built-in function name when using TRY_CONVERT (or some other SQL 2012 function)

Published Feb 17, 2014

Posted in
database
programming

I made a function in T-SQL that parsed some string and returned a decimal. It all worked fine until one of my colleagues wanted to use it on the test server. And there it was, a beautiful error message: 'decimal' is not a recognized built-in function name. I isolated the line and executed it: same error. It was like the server did not understand decimals anymore. The line in question was a simple SELECT TRY_CONVERT(DECIMAL(18,6),'10.33'). If I used CONVERT, though, it all worked fine. Another hint was that the function worked perfectly fine on the same server, in another database. The problem was that for that particular database the compatibility level was set to SQL Server 2008 (100), not SQL Server 2012 (110), and TRY_CONVERT is only recognized as a keyword from level 110 up. We changed it and it all worked fine after that. The setting in question is found in the Properties of the database, Options, Compatibility level.

Animating a WPF property when it changes to a non specified value

Published Feb 16, 2014

While working on a small personal project, I decided to make a graphical tool that displayed a list of items in a Canvas. After making it work (details in the post about displaying ListView items in a Canvas), I decided to make the items animate when changing their position. In my mind it had to be a simple solution, akin to jQuery animate or something like that; it was not.

The final solution was to give up on a generic method for this and switch to the trusted attached properties. But if you are curious to see what else I tried and how ugly it got, read on:

First I had to detect the change in value. This is done by specifying in the bindings for the properties of interest the option NotifyOnTargetUpdated="True" and then creating an EventTrigger that fires on Binding.TargetUpdated. In that trigger, an animation can be stored and/or started. The code would be something like this:

<EventTrigger RoutedEvent="Binding.TargetUpdated">
    <EventTrigger.Actions>
        <BeginStoryboard>
            <Storyboard>
                <local:SafeDoubleAnimation Storyboard.TargetProperty="(Canvas.Left)" Duration="0:0:1" FillBehavior="Stop" />
                <local:SafeDoubleAnimation Storyboard.TargetProperty="(Canvas.Top)" Duration="0:0:1" FillBehavior="Stop" />
            </Storyboard>
        </BeginStoryboard>
    </EventTrigger.Actions>
</EventTrigger>

The problem appears when the bound property that causes the trigger is the animated property. Basically the effect is for the animation to only work if you cause two animations to follow one another, in which case you see the second one. Otherwise the effect is an immediate and brusque change in the position. You may notice that instead of DoubleAnimation I used SafeDoubleAnimation, which is a custom class that inherits from DoubleAnimation and overrides the GetCurrentValueCore method to not throw stupid validation errors. I am sure this can be done in a nicer way, but I didn't research it. Here is the code of the class:

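using System.Windows;
using System.Windows.Media.Animation;

public class SafeDoubleAnimation : DoubleAnimation
{
    // Freezable-derived classes should create an instance of their own type,
    // otherwise clones made by WPF would be plain DoubleAnimations
    protected override Freezable CreateInstanceCore()
    {
        return new SafeDoubleAnimation();
    }

    protected override double GetCurrentValueCore(double defaultOriginValue, double defaultDestinationValue, AnimationClock animationClock)
    {
        // replace NaN (for example an unset Canvas.Left) with usable values
        if (double.IsNaN(defaultOriginValue)) defaultOriginValue = 0;
        if (double.IsNaN(defaultDestinationValue)) defaultDestinationValue = defaultOriginValue;
        try
        {
            return base.GetCurrentValueCore(defaultOriginValue, defaultDestinationValue, animationClock);
        }
        catch
        {
            // swallow the validation error and fall back to 0
            return 0;
        }
    }
}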

Back to the problem at hand. I tried to set a Delay on the bindings; it didn't work. I tried to set NotifyOnTargetUpdated on other bindings and randomly changed those values. Didn't work. I tried to add a StopStoryboard action before or after the BeginStoryboard action. I tried the FillBehavior and HandoffBehavior properties. No effect. Funnily enough, if I set AutoReverse="True" it would work all the time, but it would just return to the previous values.
In the end I just decided to do it manually, which is really ugly and not MVVM at all. That is how I noticed that if I create the animation manually and run it, I get the same effect: the binding overrides the animation, and the only way to make it work is to not fire PropertyChanged for the affected properties until the animation ends.
The result is so awful that I have qualms putting it here. It starts with:

public MyWindow()
{
    InitializeComponent();
    DataContextChanged += ItemGrid_DataContextChanged;
}

void ItemGrid_DataContextChanged(object sender, DependencyPropertyChangedEventArgs e)
{
    var vm = e.NewValue as INotifyPropertyChanged;
    if (vm != null)
    {
        vm.PropertyChanged += (s, ev) =>
        {
            if (ev.PropertyName == "SpecialPropertyName")
            {
                animateChanges();
            }
        };
    }
}

This assumes that the DataContext is only set once. If it could change, the property changed event handler would have to be a separate method, removed from the previous DataContext when the DataContext changes. What it does is call animateChanges when the view model raises PropertyChanged with a special property name.
In the ViewModel, I removed the OnPropertyChanged notifications for the properties that I am interested in. Therefore animateChanges can still read the old positions from the UI elements, even though the items they are bound to already hold the new values.
The animateChanges method looks like this:

private void animateChanges()
{
    var vm = (MyViewModel)DataContext;
    var zoom = vm.Zoom;
    var sb = new Storyboard();
    var converter=new CoordinateConverter(); // the same converter that I am using in the XAML databinding
    foreach (var child in lvKernelItems.Items)
    {
        var item=(KernelItem)child;
        var container = (UIElement)lvKernelItems.ItemContainerGenerator.ContainerFromItem(child); // ugly way of getting the UIElement associated to a data item
        var currentVal=Canvas.GetLeft(container);
        var nextVal=(double)converter.Convert(new object[] { item.X, lvKernelItems.ActualWidth, zoom },typeof(double),null,CultureInfo.CurrentCulture);
        var anim = new DoubleAnimation(nextVal, new Duration(TimeSpan.FromSeconds(0.5)))
        {
            FillBehavior = FillBehavior.Stop
        };
        Storyboard.SetTarget(anim, container);
        Storyboard.SetTargetProperty(anim, new PropertyPath("(Canvas.Left)"));
        sb.Children.Add(anim);
        currentVal = Canvas.GetTop(container);
        nextVal = (double)converter.Convert(new object[] { item.Y, lvKernelItems.ActualHeight, zoom }, typeof(double), null, CultureInfo.CurrentCulture);
        anim = new DoubleAnimation(nextVal, new Duration(TimeSpan.FromSeconds(0.5)))
        {
            FillBehavior = FillBehavior.Stop
        };
        Storyboard.SetTarget(anim, container);
        Storyboard.SetTargetProperty(anim, new PropertyPath("(Canvas.Top)"));
        sb.Children.Add(anim);
    }
    sb.Completed += sb_Completed;
    sb.Begin();
}

void sb_Completed(object sender, EventArgs e)
{
    foreach (var child in lvKernelItems.Items)
    {
        var item = (KernelItem)child;
        item.RefreshCoordinates();  //method that finally fires the OnPropertyChanged event on the X and Y properties, causing the final value to be set.
    }
}

Now, there is an advantage to doing it like this: it is more efficient, resource-wise. There is only one storyboard with animations for all the items instead of one storyboard for each item. But it is incredibly ugly.
I tried to create some sort of generic method of animating any property. I tried to inherit from Binding, MultiBinding, BindingBase, MarkupExtension, even Setter or SetterBase. Whoever built the classes for WPF wanted them sealed and internal protected and what not.

Well, long story short: attached properties. I created two attached properties CanvasLeft and CanvasTop. When they change, I animate the real properties and, at the end of the animation, I set the value. Here is the code:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Media.Animation;

namespace Siderite.AttachedProperties
{
    public static class UIElementProperties
    {
        public static readonly DependencyProperty CanvasLeftProperty = DependencyProperty.RegisterAttached("CanvasLeft", typeof(double), typeof(UIElementProperties), new FrameworkPropertyMetadata(
                                                                                            0.0,
                                                                                            FrameworkPropertyMetadataOptions.OverridesInheritanceBehavior,
                                                                                            CanvasLeftChanged));

        [AttachedPropertyBrowsableForType(typeof(UIElement))]
        public static double GetCanvasLeft(DependencyObject element)
        {
            if (element == null)
            {
                throw new ArgumentNullException("element");
            }
            return (double)element.GetValue(CanvasLeftProperty);
        }

        [DesignerSerializationVisibility(DesignerSerializationVisibility.Visible)]
        public static void SetCanvasLeft(DependencyObject element, double value)
        {
            if (element == null)
            {
                throw new ArgumentNullException("element");
            }
            element.SetValue(CanvasLeftProperty, value);
        }

        private static void CanvasLeftChanged(DependencyObject d, DependencyPropertyChangedEventArgs e)
        {
            var sb = new Storyboard();
            var oldVal = (double)e.OldValue;
            if (double.IsNaN(oldVal)) oldVal = 0;
            var newVal = (double)e.NewValue;
            if (double.IsNaN(newVal)) newVal = oldVal;
            var anim = new DoubleAnimation
            {
                From = oldVal,
                To = newVal,
                Duration = new Duration(TimeSpan.FromSeconds(1)),
                FillBehavior = FillBehavior.Stop
            };
            Storyboard.SetTarget(anim, d);
            Storyboard.SetTargetProperty(anim, new PropertyPath("(Canvas.Left)"));
            sb.Children.Add(anim);
            sb.Completed += (s, ev) =>
            {
                d.SetValue(Canvas.LeftProperty, newVal);
            };
            var fe = d as FrameworkElement;
            if (fe != null)
            {
                sb.Begin(fe, HandoffBehavior.Compose);
                return;
            }
            var fce = d as FrameworkContentElement;
            if (fce != null)
            {
                sb.Begin(fce, HandoffBehavior.Compose);
                return;
            }
            sb.Begin();
        }
        

        public static readonly DependencyProperty CanvasTopProperty = DependencyProperty.RegisterAttached("CanvasTop", typeof(double), typeof(UIElementProperties), new FrameworkPropertyMetadata(
                                                                                        0.0,
                                                                                        FrameworkPropertyMetadataOptions.OverridesInheritanceBehavior,
                                                                                        CanvasTopChanged));

        [AttachedPropertyBrowsableForType(typeof(UIElement))]
        public static double GetCanvasTop(DependencyObject element)
        {
            if (element == null)
            {
                throw new ArgumentNullException("element");
            }
            return (double)element.GetValue(CanvasTopProperty);
        }

        [DesignerSerializationVisibility(DesignerSerializationVisibility.Visible)]
        public static void SetCanvasTop(DependencyObject element, double value)
        {
            if (element == null)
            {
                throw new ArgumentNullException("element");
            }
            element.SetValue(CanvasTopProperty, value);
        }

        private static void CanvasTopChanged(DependencyObject d, DependencyPropertyChangedEventArgs e)
        {
            var sb = new Storyboard();
            var oldVal = (double)e.OldValue;
            if (double.IsNaN(oldVal)) oldVal = 0;
            var newVal = (double)e.NewValue;
            if (double.IsNaN(newVal)) newVal = oldVal;
            var anim = new DoubleAnimation
            {
                From = oldVal,
                To = newVal,
                Duration = new Duration(TimeSpan.FromSeconds(1)),
                FillBehavior = FillBehavior.Stop
            };
            Storyboard.SetTarget(anim, d);
            Storyboard.SetTargetProperty(anim, new PropertyPath("(Canvas.Top)"));
            sb.Children.Add(anim);
            sb.Completed += (s, ev) =>
            {
                d.SetValue(Canvas.TopProperty, newVal);
            };
            var fe = d as FrameworkElement;
            if (fe != null)
            {
                sb.Begin(fe, HandoffBehavior.Compose);
                return;
            }
            var fce = d as FrameworkContentElement;
            if (fce != null)
            {
                sb.Begin(fce, HandoffBehavior.Compose);
                return;
            }
            sb.Begin();
        }
    }
}

and this is how you would use them:

<ListView x:Name="lvKernelItems" ItemsSource="{Binding KernelItems}" 
          SelectedItem="{Binding SelectedItem,Mode=TwoWay}"
          SelectionMode="Single"
          >
    <ListView.ItemsPanel>
        <ItemsPanelTemplate>
            <Canvas HorizontalAlignment="Stretch" VerticalAlignment="Stretch" Background="Black" />
        </ItemsPanelTemplate>
    </ListView.ItemsPanel>
    <ListView.ItemContainerStyle>
        <Style TargetType="{x:Type ListViewItem}">
            <Setter Property="FocusVisualStyle" Value="{x:Null}"/>
            <Setter Property="Foreground" Value="White"/>
            <Setter Property="att:UIElementProperties.CanvasLeft" >
                <Setter.Value>
                    <MultiBinding Converter="{StaticResource CoordinateConverter}">
                        <Binding Path="X"/>
                        <Binding Path="ActualWidth" ElementName="lvKernelItems"/>
                        <Binding Path="DataContext.Zoom" RelativeSource="{RelativeSource AncestorType={x:Type Window}}"/>
                    </MultiBinding>
                </Setter.Value>
            </Setter>
            <Setter Property="att:UIElementProperties.CanvasTop" >
                <Setter.Value>
                    <MultiBinding Converter="{StaticResource CoordinateConverter}">
                        <Binding Path="Y"/>
                        <Binding Path="ActualHeight" ElementName="lvKernelItems"/>
                        <Binding Path="DataContext.Zoom" RelativeSource="{RelativeSource AncestorType={x:Type Window}}"/>
                    </MultiBinding>
                </Setter.Value>
            </Setter>
            <Style.Resources>
                <SolidColorBrush x:Key="{x:Static SystemColors.HighlightBrushKey}" Color="Transparent" />
                <SolidColorBrush x:Key="{x:Static SystemColors.ControlBrushKey}" Color="Transparent" />
                <SolidColorBrush x:Key="{x:Static SystemColors.HighlightTextBrushKey}" Color="Black" />
                <SolidColorBrush x:Key="{x:Static SystemColors.ControlTextBrushKey}" Color="Black" />
            </Style.Resources>
            <Style.Triggers>
                <Trigger Property="IsSelected" Value="True">
                    <Setter Property="Foreground" Value="Cyan"/>
                    <Setter Property="Effect">
                        <Setter.Value>
                            <DropShadowEffect ShadowDepth="0" Color="White" Opacity="0.5" BlurRadius="10"/>
                        </Setter.Value>
                    </Setter>
                    <Setter Property="Canvas.ZIndex" Value="1000"/>
                </Trigger>
            </Style.Triggers>
        </Style>
    </ListView.ItemContainerStyle>
</ListView>

Hope it helps.

Displaying Listview/Listbox items in a Canvas in Windows Presentation Foundation

Published Feb 16, 2014

For a WPF project I wanted to create a graphical representation of a list of items. I computed some X,Y coordinates for each item and started changing the XAML of a ListView in order to reflect the position of each item on a Canvas. Well, it is just as easy as you imagine: change the ItemsPanel property to a Canvas and then style each item however you want. The gotcha comes when trying to set the coordinates. The thing is that for each item in a ListView a container (a ListViewItem) is constructed, and the item template is displayed inside it. So here you have all your items displayed exactly as you want them, except the coordinates don't work, since what needs to be placed on the Canvas are the generated containers, not the items. The solution is to set Canvas.Left and Canvas.Top on the containers, through the ItemContainerStyle:

    <Window.Resources>
        <local:CoordinateConverter x:Key="CoordinateConverter"/>
        <DataTemplate DataType="{x:Type vm:KernelItem}"> <!-- the template for the data items -->
            <Grid>
                <Ellipse Fill="{Binding Background}" Width="100" Height="100" Stroke="DarkGray" Name="ellipse"
                         ToolTip="{Binding Tooltip}"/>
                <TextBlock Text="{Binding Text}" MaxWidth="75"  MaxHeight="75"
                           HorizontalAlignment="Center" VerticalAlignment="Center"
                           TextAlignment="Center"
                           TextWrapping="Wrap"
                           ToolTip="{Binding Tooltip}" />
            </Grid>
        </DataTemplate>
    </Window.Resources>
    <ListView ItemsSource="{Binding KernelItems}" Name="lvKernelItems"
              SelectedItem="{Binding SelectedItem,Mode=TwoWay}"
              SelectionMode="Single"
              >
        <ListView.ItemsPanel>
            <ItemsPanelTemplate>
                <Canvas HorizontalAlignment="Stretch" VerticalAlignment="Stretch" Background="Black" />
            </ItemsPanelTemplate>
        </ListView.ItemsPanel>
        <ListView.ItemContainerStyle>
            <Style TargetType="{x:Type ListViewItem}">
                <Setter Property="FocusVisualStyle" Value="{x:Null}"/><!-- no dotted focus rectangle -->
                <Setter Property="Foreground" Value="White"/>
                <Setter Property="(Canvas.Left)" >
                    <Setter.Value>
                        <MultiBinding Converter="{StaticResource CoordinateConverter}">
                            <Binding Path="X"/>
                            <Binding Path="ActualWidth" ElementName="lvKernelItems"/>
                            <Binding Path="DataContext.Zoom" RelativeSource="{RelativeSource AncestorType={x:Type Window}}"/>
                        </MultiBinding>
                    </Setter.Value>
                </Setter>
                <Setter Property="(Canvas.Top)" >
                    <Setter.Value>
                        <MultiBinding Converter="{StaticResource CoordinateConverter}">
                            <Binding Path="Y"/>
                            <Binding Path="ActualHeight" ElementName="lvKernelItems"/>
                            <Binding Path="DataContext.Zoom" RelativeSource="{RelativeSource AncestorType={x:Type Window}}"/>
                        </MultiBinding>
                    </Setter.Value>
                </Setter>
                <Style.Resources><!-- no highlight of selected items -->
                    <SolidColorBrush x:Key="{x:Static SystemColors.HighlightBrushKey}" Color="Transparent" />
                    <SolidColorBrush x:Key="{x:Static SystemColors.ControlBrushKey}" Color="Transparent" />
                    <SolidColorBrush x:Key="{x:Static SystemColors.HighlightTextBrushKey}" Color="Black" />
                    <SolidColorBrush x:Key="{x:Static SystemColors.ControlTextBrushKey}" Color="Black" />
                </Style.Resources>
                <Style.Triggers>
                    <Trigger Property="IsSelected" Value="True"><!-- custom selected item template -->
                        <Setter Property="Foreground" Value="Cyan"/>
                        <Setter Property="Effect">
                            <Setter.Value>
                                <DropShadowEffect ShadowDepth="0" Color="White" Opacity="0.5" BlurRadius="10"/>
                            </Setter.Value>
                        </Setter>
                        <Setter Property="Canvas.ZIndex" Value="1000"/>
                    </Trigger>
                </Style.Triggers>
            </Style>
        </ListView.ItemContainerStyle>
    </ListView>

As a bonus, you can see how to remove the default selection visuals of an item: the ugly dotted focus rectangle and the highlighted background.

Configuring the proxy settings via a script and testing it

Published Feb 4, 2014

If you go to the system Internet Settings (in Network Connections, or Internet Explorer or Chrome), open the "Connections" tab, then click LAN Settings, then go to Advanced... I mean, why wouldn't you? ... there is a checkbox called "Use automatic configuration script". The script is supposed to dynamically return the correct proxy for a URL. The practice is called Proxy Auto-Configuration (PAC) for some reason. The script is JavaScript and it uses some predefined functions to return either "DIRECT" (don't use a proxy) or "PROXY address:port" (use the proxy at that address and port). You can chain the options by separating them with semicolons, like this: "PROXY 1.2.3.4:55; PROXY 10.20.30.40:50; DIRECT", and they will be tried in order. And before you search like a madman for it, there is no way to specify the username/password for those proxy servers in the script. You still have to type them when asked.
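
To make the return format concrete, here is a small illustrative helper (hypothetical, not part of any PAC API) that splits a result string such as the one above into the ordered list of options a browser would try, first to last. It is only a sketch that assumes well-formed entries of the form `TYPE host:port` or `DIRECT`.

```javascript
// Hypothetical helper, not part of the PAC API: turns a FindProxyForURL
// result into the ordered list of options the browser tries, first to last.
function parsePacResult(result) {
  return result
    .split(";")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .map((entry) => {
      const [type, address] = entry.split(/\s+/);
      // "DIRECT" has no address; "PROXY host:port" has one
      return address
        ? { type: type.toUpperCase(), address }
        : { type: type.toUpperCase() };
    });
}

// two PROXY entries, then DIRECT
console.log(parsePacResult("PROXY 1.2.3.4:55 ; PROXY 10.20.30.40:50; DIRECT"));
```

The original PAC format also allows other entry types, such as SOCKS; the sketch only distinguishes entries with and without an address.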

Use this solution to fix problems with proxies that work well for outside sites, but not for internal networks. For reasons too weird to explain here (but explained here: Understanding Web Proxy Configuration) you cannot just put your script on the local drive and use it; instead you have to read it from an http URL. If you can't (or don't want to) install IIS or some other web server just to serve the .pac file, try loading it from the local drive with a file:// URL (not just C:\...). However, this is a deprecated method and you may experience issues, for example with .NET software or Internet Explorer 11.

Here is a sample file that connects directly to any host that belongs to one of the listed domains or IP classes, and uses the proxy for everything else:

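function FindProxyForURL(url, host) {

    var defProxy = "10.20.30.40:50"; // the misbehaving or incomplete proxy

    // hosts in these domains are accessed directly
    var domains = [
        ".mysite.com",
        ".xxx",
        "localhost"
    ];
    // IP classes accessed directly; zero octets act as wildcards (see getMask)
    var ipClasses = [
        "11.22.33.0",
        "55.0.0.0",
        "127.0.0.0"
    ];

    for (var i = 0; i < domains.length; i++) {
        if (dnsDomainIs(host, domains[i])) return "DIRECT";
    }

    var MYHOST = dnsResolve(host); // may be null if the host cannot be resolved

    if (MYHOST) {
        for (var i = 0; i < ipClasses.length; i++) {
            var mask = getMask(ipClasses[i]);
            if (isInNet(MYHOST, ipClasses[i], mask)) return "DIRECT";
        }
    }

    return "PROXY " + defProxy;

    // builds a mask like 255.255.255.0 from an IP class like 11.22.33.0
    // (function declarations are hoisted, so it can be defined after the return)
    function getMask(ip) {
        var splits = ip.split('.');
        for (var i = 0; i < splits.length; i++) {
            if (splits[i] != '0') splits[i] = '255';
        }
        return splits.join('.');
    }

}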

Just add the domains or the IP classes to the arrays in order to connect directly to them. Do not forget to add the local IP classes as well for direct connection, including 127.0.0.0 to access your own localhost.

In order to test or debug your .pac files, use the PacParser open source utility. A reference to the functions you can use in your script can be found here: Using Automatic Configuration, Automatic Proxy, and Automatic Detection
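
If you only want a quick sanity check before reaching for PacParser, one option is to run the script in Node.js with simplified stand-ins for the helper functions the browser normally provides. The stand-ins below (`dnsDomainIs`, `dnsResolve`, `isInNet`) and the fake DNS table are assumptions for local testing only, and their behaviour is deliberately simplified (`dnsResolve` just looks names up in a table), so the real thing should still be verified in a browser or with PacParser. `FindProxyForURL` is a copy of the sample script, with a null check for hosts that do not resolve.

```javascript
// Simplified stand-ins for the helpers a browser provides to PAC scripts.
// These are assumptions for local testing, not the real implementations.
const FAKE_DNS = {
  "intranet.mysite.com": "11.22.33.44",
  "build-server": "55.1.2.3",
  "www.example.org": "93.184.216.34",
};

function dnsDomainIs(host, domain) {
  return host.endsWith(domain);
}

function dnsResolve(host) {
  return FAKE_DNS[host] || null; // null when the name cannot be resolved
}

function ipToInt(ip) {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

function isInNet(ip, pattern, mask) {
  // both sides go through the same 32-bit conversion of &, so they compare fairly
  const m = ipToInt(mask);
  return (ipToInt(ip) & m) === (ipToInt(pattern) & m);
}

// Copy of the sample script, with a guard for unresolvable hosts.
function FindProxyForURL(url, host) {
  var defProxy = "10.20.30.40:50";
  var domains = [".mysite.com", ".xxx", "localhost"];
  var ipClasses = ["11.22.33.0", "55.0.0.0", "127.0.0.0"];

  for (var i = 0; i < domains.length; i++) {
    if (dnsDomainIs(host, domains[i])) return "DIRECT";
  }
  var MYHOST = dnsResolve(host);
  if (MYHOST) {
    for (var j = 0; j < ipClasses.length; j++) {
      if (isInNet(MYHOST, ipClasses[j], getMask(ipClasses[j]))) return "DIRECT";
    }
  }
  return "PROXY " + defProxy;

  // zero octets stay 0, everything else becomes 255
  function getMask(ip) {
    return ip.split(".").map((part) => (part !== "0" ? "255" : "0")).join(".");
  }
}

console.log(FindProxyForURL("http://intranet.mysite.com/", "intranet.mysite.com")); // DIRECT (domain)
console.log(FindProxyForURL("http://build-server/", "build-server")); // DIRECT (55.0.0.0 class)
console.log(FindProxyForURL("http://www.example.org/", "www.example.org")); // PROXY 10.20.30.40:50
console.log(FindProxyForURL("http://nowhere.test/", "nowhere.test")); // PROXY 10.20.30.40:50
```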

On game Artificial Intelligences and specifically the MinMax algorithm

Published Feb 3, 2014

Posted in
programming
chess
essay

MinMax or Minimax, as some like to call it, is the basis of most Artificial Intelligence built for games like chess. Its principle is extremely easy to understand: a rational player will always choose the best option available, so whatever move is best for me, my adversary will expect it and look for the best reply to it. Following the same pattern, I will also look for his best counter move and plan against it. Therefore the thinking for a game of chess, let's say, is that I will take all possible moves, find the one that leaves me with the best position (evaluated by a function of the board position), then look for the best reply of the adversary in the same way. I continue like this until I get to the end of the game or run out of computing resources.
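
The plain MinMax idea fits in a few lines. A minimal sketch, assuming the game tree has already been built: a leaf is a number (the board evaluation from the first player's point of view) and an inner node is an array of the positions reachable from it.

```javascript
// MinMax over an explicit tree: numbers are leaves (static evaluations),
// arrays are positions whose elements are the positions reachable from them.
function minimax(node, maximizing) {
  if (typeof node === "number") return node; // leaf: evaluate the board
  const scores = node.map((child) => minimax(child, !maximizing));
  // I pick the best score for me, the opponent picks the worst for me
  return maximizing ? Math.max(...scores) : Math.min(...scores);
}

// Two levels: I choose a move, then the opponent chooses a reply.
const tree = [
  [3, 12, 8], // after my move A, the opponent can leave me with 3, 12 or 8
  [2, 4, 6],  // after move B
  [14, 5, 2], // after move C
];
console.log(minimax(tree, true)); // 3
```

Move A wins here: the opponent can hold B and C down to 2, but A guarantees at least 3.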

Now, that sounds logical and it's crazy easy to implement. The problem is that for all but the most childish of games, the tree of all possible moves grows exponentially. And chess isn't even one of the worst games in that respect. Imagine Tic-Tac-Toe, a game played on a 3x3 board between two players. As the first player you have a total of 9 possible moves to choose from, then your opponent has 8, then you have 7, etc. The entire game tree has at most 9! move sequences, or 362,880. But generalize the game to a 10x10 board and a winning rule of 5 in a line and you get up to 100! sequences, which is about 9.3E+157, just under 1E+158, that is, 1 followed by 158 zeros.
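
A quick check of the Tic-Tac-Toe numbers with JavaScript BigInt. Note that n! counts move sequences in which every cell eventually gets filled, so it is an upper bound that ignores games ending early with a win.

```javascript
// n! computed exactly with BigInt (regular numbers lose precision long before 100!).
function factorial(n) {
  let result = 1n;
  for (let i = 2n; i <= n; i++) result *= i;
  return result;
}

console.log(factorial(9n).toString()); // 362880 (3x3 board)

const hundred = factorial(100n).toString(); // 10x10 board
console.log(hundred.length); // 158 digits
console.log(`${hundred[0]}.${hundred.slice(1, 4)}E+${hundred.length - 1}`); // 9.332E+157
```

So 100! has 158 digits: roughly 9.33E+157, just under 1E+158.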

That's why so-called pruning was invented, the most common technique being Alpha-Beta, which stops exploring a branch as soon as it is clear that it cannot turn out better than an alternative already found. Of course, all of this is just the general gist. You might want to take into account the N best moves of the opponent rather than only the best one, as well as try a more lenient pruning algorithm (after all, sacrificing a piece brings you to a worse position than when you started, but it might win the game). All of this increases, rather than decreases, the number of positions to examine.
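
A minimal Alpha-Beta sketch on an explicit tree (numbers are leaves, arrays are positions with children). It returns exactly the value a plain MinMax would, but skips a branch once it is proven that the player choosing there already has something better elsewhere; `stats` is only there to count the evaluated leaves.

```javascript
// alpha: best score the maximizer can already guarantee on the current path.
// beta: best (lowest) score the minimizer can already guarantee on it.
function alphaBeta(node, maximizing, alpha, beta, stats) {
  if (typeof node === "number") {
    stats.leaves++;
    return node;
  }
  let best = maximizing ? -Infinity : Infinity;
  for (const child of node) {
    const score = alphaBeta(child, !maximizing, alpha, beta, stats);
    if (maximizing) {
      best = Math.max(best, score);
      alpha = Math.max(alpha, best);
    } else {
      best = Math.min(best, score);
      beta = Math.min(beta, best);
    }
    if (alpha >= beta) break; // the remaining children cannot change the decision
  }
  return best;
}

const tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]];
const stats = { leaves: 0 };
console.log(alphaBeta(tree, true, -Infinity, Infinity, stats)); // 3
console.log(stats.leaves); // 7 of the 9 leaves
```

After seeing a 2 under the second move, the search knows that move can't beat the 3 guaranteed by the first one, so its remaining leaves are never evaluated. How much gets skipped depends heavily on the order in which moves are tried.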

And now comes my thought on this whole thing: how can I make a computer play like a human when the core edict of the algorithm is that all participating players are rational? Humans rarely are. Mathematically, I could take N, the number of best moves I would consider for my opponent, to be the total number of moves my opponent could make, but that would increase the branching factor, the base of the exponential growth of the tree. Basically it would make the algorithm think of stupid things all the time.

The pruning algorithm seems to be the most important part of the equation. Indeed, I could make the move choice algorithm completely random and, as long as I have a perfect pruning algorithm, it will remove all the stupid choices and leave me with the smart ones. A quote comes to mind: "you reach perfection not when you have nothing else to add, but when there is nothing left to remove". It's appropriate for this situation.

Now, before attacking an algorithm that has survived for so long in the AI industry (and making my own awesome one that will defeat all chess engines in the world - of course, that's realistic), I have to consider the alternative algorithm: the lowly human. How does a human player think in a game of chess? First he surveys the board for any easy wins. That means a broad analysis, one or two levels deep, based on a simple board evaluation function. Immediately we learn something from this: there can be multiple evaluation functions; we don't need just one. The simple one is for looking for greedy wins, like "He moved his queen where I can capture it, yay!".

A MinMax algorithm would achieve the same outcome in situations like this, so we can ignore them. It gets more interesting from here, though. We look for the moves of the most active pieces. I know that this is the rookie system, but I am a rookie, and if I am to play against my computer algorithm I will make it as stupid as I am, so shut up! The rookie will always try to move his queen to attack something. It's the most powerful piece and it should get the most results for the least effort. We left Greed behind, remember? We are now doing Sloth. Still, with a good pruning algorithm we eliminate stupid Queen moves from the beginning, so considering the Queen first, then Rooks, then Bishops, then Knights, etc. is not a bad idea. The order of the pieces can be changed based on personal preferences as well as well-established chess rules, like Knights being better than Bishops in closed games and so on.
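
The piece preference can be expressed as a simple move-ordering step in front of the search. This is a sketch with a hypothetical move format (`{ piece, from, to }`, pieces as Q, R, B, N, K, P) and a priority list that mirrors the order in the text; putting Knights before Bishops models the closed-game preference.

```javascript
// Hypothetical priority: Queen, Rooks, Bishops, Knights, then King and pawns.
const DEFAULT_PRIORITY = ["Q", "R", "B", "N", "K", "P"];

function orderMoves(moves, priority = DEFAULT_PRIORITY) {
  const rank = new Map(priority.map((piece, index) => [piece, index]));
  // Array.prototype.sort is stable (ES2019+), so moves of the same piece type
  // keep their original relative order; the input array is not modified.
  return [...moves].sort((a, b) => rank.get(a.piece) - rank.get(b.piece));
}

// Some candidate moves for White after 1.e4 e5.
const moves = [
  { piece: "P", from: "d2", to: "d3" },
  { piece: "N", from: "g1", to: "f3" },
  { piece: "Q", from: "d1", to: "h5" },
  { piece: "B", from: "f1", to: "c4" },
];
console.log(orderMoves(moves).map((m) => m.piece).join(" ")); // Q B N P
const closedGame = ["Q", "R", "N", "B", "K", "P"];
console.log(orderMoves(moves, closedGame).map((m) => m.piece).join(" ")); // Q N B P
```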

This is a small optimization, one that most game engines probably have. And we haven't even touched pruning; boy, this is going to be a long article! Now, what does the human do? He does depth-first tree searches. Well, he doesn't think of them like that, he thinks of them as a narrative, but it's basically a depth-first search. This is the casual "What if...?" type of play. You move the Queen, let's say, bringing it right into enemy territory. You don't capture anything important, but bringing a strong piece this uncomfortably close to the enemy king is scary. You don't play for game points, but for emotion points, for special effects, for kicks! You don't abandon the narrative, the linear evolution of your attack, until you find that it bears no fruit. It's the equivalent of the hero running toward the enemy firing his pistol. If the enemy is dumb enough not to take cover, aim carefully and shoot a burst from their SMGs, you might get away with it and it would be glorious. If not, you die idiotically.

It is important to note that in "Hollywood" chess thinking you are prone to assume that the enemy will make the mistakes that facilitate your brilliant plan. The evaluation goes as follows: "I will try something that looks cool if the chances of a horrible and immediate loss are small". When some hurdle foils your heroic plan, you make subplans that would, you hope, distract the adversary from your actual target. This, as far as I know, is a typically human type of reasoning and I doubt many (if any) computer game engines have it. In computer terms, one would have to define a completely new game, a smaller one, and direct an AI designed specifically for it to tell you whether it would work or not. Given the massively parallel architecture of the human brain, it is not hard to understand why we do something like this. But we can do the same with a computer, mind you. I am thinking of something like a customized MinMax algorithm working on few levels, one or two, as a human would. That would result in a choice of N possible moves to make. Then construct a narrative for each, a depth search that just tries to get as much as possible out of each move without considering many of the implications. Then assign a risk to each level of this story. If the risk at some level exceeds a threshold, use the small-range MinMax at that point and see whether you can minimize the risk or whether the risk makes your narrative unlikely.

Let's recap the human thinking algorithm so far:

Try to greedily take what the opponent has stupidly made available
Try to lazily use the strongest piece to get the most result with the least effort
Try to pridefully find the most showy move, the one that would make the best drinking story afterwards
Try to delegate the solving of individual problems in your heroic narrative to a different routine

Wow! Doesn't it seem that the seven deadly sins are built-in features, rather than bugs? How come we enjoy playing against opponents that go through pretty much each of them in order to win, more than against a rational, emotionless algorithm that only does what is right?

Again, something relevant transpires: we spend quite a long time imagining the best moves we can make, but we think less about the opponent's replies. In computer terms, we would prune the enemy's possible moves a lot more than our own. In most rookie cases, one gets absorbed by one's own attack and ignores moves that could counterattack. It's not intuitive to think that while you are punching somebody, they would choose to punch back rather than avoid the pain. In chess it's a little bit easier and more effective, since you can abandon a piece in order to achieve an overall gain in the game, but it can be and is done in physical combat as well.
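
The asymmetry can be sketched as a MinMax that keeps only the best-looking few moves at each level, with a different width for each side. Assumptions for this sketch: every node carries a cheap static `score` from my point of view, moves are ranked by that score, and the tree is made up so that one enemy reply looks harmless but leads to a counterattack.

```javascript
// Node shape (assumption): { score, children }. `score` is a cheap static
// evaluation from my point of view; leaves have an empty `children` array.
function narrowMinimax(node, myTurn, myWidth, enemyWidth) {
  if (node.children.length === 0) return node.score;
  // Whoever moves looks first at the moves that seem best for them.
  const ranked = [...node.children].sort((a, b) =>
    myTurn ? b.score - a.score : a.score - b.score
  );
  const considered = ranked.slice(0, myTurn ? myWidth : enemyWidth);
  const values = considered.map((child) =>
    narrowMinimax(child, !myTurn, myWidth, enemyWidth)
  );
  return myTurn ? Math.max(...values) : Math.min(...values);
}

const leaf = (score) => ({ score, children: [] });
const position = {
  score: 0,
  children: [
    // My attack: looks great at first glance.
    { score: 5, children: [
      leaf(2), // the enemy defends
      { score: 3, children: [leaf(-10)] }, // the enemy counterattacks; it only hurts a move later
    ] },
    // A quiet move.
    { score: 1, children: [leaf(1)] },
  ],
};

console.log(narrowMinimax(position, true, 2, 1)); // 2: only the defence is seen, so the attack looks best
console.log(narrowMinimax(position, true, 2, 2)); // 1: the counterattack is seen, so the quiet move wins
```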

Okay, we now have two alternatives. One is the logical one: take into account all the rules chess masters have taught us, shortcuts for achieving a better position on the board; choose moves based on those principles and then gauge the likely response from the opponent. Repeat. This is exactly like a MinMax algorithm! So we won't do that. The hell with it! If I can't enjoy the game, neither will my enemy!!

Human solution: don't do anything. Think of what your opponent would do if you didn't move anything, and foil their immediate plan. This way of thinking would be counterintuitive for a computer algorithm. Functioning on the basis of specific game rules, a computer would never be inclined to think "what would the enemy do if I didn't move anything, which is ILLEGAL in chess?". That makes us superior, obviously ;-)
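
For what it's worth, chess engines do use a related trick, the null-move heuristic, where the engine lets the opponent move twice in a row to see whether the position still holds. Here is a minimal sketch of the "what if I pass?" question on Tic-Tac-Toe, chosen only because it is small; the board encoding and function names are hypothetical. It lets the opponent move on the unchanged board and lists every move that would win for them immediately.

```javascript
// Tic-Tac-Toe board as a 9-character string, "." for empty cells,
// cells numbered 0..8 row by row.
const LINES = [
  [0, 1, 2], [3, 4, 5], [6, 7, 8], // rows
  [0, 3, 6], [1, 4, 7], [2, 5, 8], // columns
  [0, 4, 8], [2, 4, 6],            // diagonals
];

function wins(board, player) {
  return LINES.some((line) => line.every((i) => board[i] === player));
}

// "Pretend I pass": let the opponent move on the current board and collect
// every move that would win for them at once. Those are the plans to foil.
function threatsIfIPass(board, opponent) {
  const threats = [];
  for (let i = 0; i < 9; i++) {
    if (board[i] !== ".") continue;
    const next = board.slice(0, i) + opponent + board.slice(i + 1);
    if (wins(next, opponent)) threats.push(i);
  }
  return threats;
}

// X to move. O has two in the top row; if X did nothing, O would win at cell 2.
console.log(threatsIfIPass("OO.X...X.", "O")); // [ 2 ]
```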

Slowly but surely, a third component of the algorithm becomes apparent: the choice of move order. Let's imagine a naive MinMax implementation. In order to assess every possible move, it has to enumerate them. If the list of moves is always the same in a given board position, the game will always proceed the same way. The solution is to take the list of possible moves in a random order. In the case of the "human algorithm" the ordering becomes more complex (favouring powerful piece moves, for example). One could even make the ordering mechanism responsible for choosing whether to do a careful breadth-first search at each level or a depth-first one.
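
A minimal sketch of the random ordering idea, assuming a flat list of candidate moves and a hypothetical `evaluate` callback: the moves are shuffled with Fisher-Yates before picking the best, so equally good moves are chosen differently from game to game. The random source is a parameter only to make the sketch testable.

```javascript
// Fisher-Yates shuffle of a copy; `random` is injectable so runs can be reproduced.
function shuffled(items, random = Math.random) {
  const copy = [...items];
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}

// Evaluate the moves in a random order and keep the first best one found,
// so ties between equally good moves are broken differently each game.
function pickMove(moves, evaluate, random = Math.random) {
  let best = null;
  let bestScore = -Infinity;
  for (const move of shuffled(moves, random)) {
    const score = evaluate(move);
    if (score > bestScore) {
      best = move;
      bestScore = score;
    }
  }
  return best;
}

// Hypothetical scores: three equally good openings and a bad one.
const scores = { e4: 1, d4: 1, Nf3: 1, a3: -1 };
console.log(pickMove(Object.keys(scores), (move) => scores[move])); // e4, d4 or Nf3
```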

Here is a suggestion for an algorithm, one that takes into account the story of the game and less the objective gain or position strength:

For each of your power pieces - anything but the king and pawns - compute mobility, or the possibility to move and attack. Favour the stronger pieces first.
For each power piece with low mobility consider pawn moves that would maximize that mobility.
For each power piece with high mobility consider the moves that would increase the chance of attack or that would attack directly.
For each strong move, consider the obstacles - enemy pieces, own pieces, possible enemy countermeasures.
Make the move that enables the considered power move or that foils the enemy's attempts to reply.
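
The first step of the list above, computing mobility, could look something like this minimal sketch. Assumptions: squares are (file, rank) pairs from 0 to 7, `occupied` is a Set of "file,rank" strings, only knights and sliding pieces are handled, and a blocked ray still counts the blocking square (it is either a capture or a defended piece), ignoring checks, pins and piece colours.

```javascript
// Board coordinates: file and rank from 0 to 7 (a1 = 0,0; h8 = 7,7).
const SLIDES = {
  R: [[1, 0], [-1, 0], [0, 1], [0, -1]],
  B: [[1, 1], [1, -1], [-1, 1], [-1, -1]],
};
SLIDES.Q = [...SLIDES.R, ...SLIDES.B];
const KNIGHT_JUMPS = [[1, 2], [2, 1], [2, -1], [1, -2], [-1, -2], [-2, -1], [-2, 1], [-1, 2]];

function onBoard(file, rank) {
  return file >= 0 && file < 8 && rank >= 0 && rank < 8;
}

// Number of squares the piece attacks. Sliding pieces stop at the first
// occupied square but still count it.
function mobility(piece, file, rank, occupied) {
  if (piece === "N") {
    return KNIGHT_JUMPS.filter(([df, dr]) => onBoard(file + df, rank + dr)).length;
  }
  let count = 0;
  for (const [df, dr] of SLIDES[piece]) {
    let f = file + df;
    let r = rank + dr;
    while (onBoard(f, r)) {
      count++;
      if (occupied.has(`${f},${r}`)) break; // ray blocked here
      f += df;
      r += dr;
    }
  }
  return count;
}

// White's c1 bishop behind its own pawns on the second rank, then on an empty board.
const whitePawns = new Set(["0,1", "1,1", "2,1", "3,1", "4,1", "5,1", "6,1", "7,1"]);
console.log(mobility("B", 2, 0, whitePawns)); // 2: low, so pawn moves that free it are worth a look
console.log(mobility("B", 2, 0, new Set())); // 7
```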

The advantage of this approach is that it only takes the enemy into account when he can do something to stop you, and the pawns only when they can enable your devious plan, while it focuses on the ventures that yield the best attack for your heroes. For any obstruction, you delegate the resolution of the problem to a different routine. This makes the algorithm parallelizable as well as modular - something we devs love because we can test the individual parts separately.

This algorithm would still use a board evaluation function, but being more focused on heroic attacks, it would favour interesting sequences of moves over static positions, and it would take into account the "fun factor", something essential to a human-like algorithm. If the end result of the attack is a checkmate, then it doesn't really matter what position estimate you get halfway through. All one has to wonder is whether the attack is going to succeed and whether one can do something to improve its chances. And indeed this is one of the most difficult aspects for a human chess player: to switch from a failing plan to a successful one when it is not yet clear that the first plan is failing. We invest energy and thought into an idea and we want it to work. A lot of the chess playing strategy of human rookies relies on prayer, after all. A computer would just assess the situation anew at every move, even if it has a strategy cached somewhere. If the situation demands it, a new strategy will be created and the old one abandoned. It's like killing your child and making another!

But, you will say, all you did so far was to describe an inferior algorithm that can be approximated by MinMax with only custom choices for the pruning and move order functions! You are missing the point. What I am describing is not supposed to beat Grand Masters, but to play a fun game with you, the casual player. More than that, my point is that for different desired results, different algorithms must be employed. This would be akin to creating a different AI for each level of a chess game.

Then there is the issue of generalized Tic-Tac-Toe, or of games such as Arimaa, which was created specifically to be difficult for computer algorithms, where MinMax fails completely. To make a comparison to real life, it's as if you chose your career by considering all the possible jobs available, imagining what it would be like to be employed there, what the difficulties might be, finding solutions to those problems, and repeating the procedure. You would conclude that it is a good idea to become a computer scientist only after thoroughly examining and partially understanding what it would be like to be a garbage man, a quantum scientist, a politician and a gigolo, as well as all the jobs in between. Of course, that is not as far-fetched as you think, since in order to be a success in software development you must be at least a politician and a garbage man, perhaps even a gigolo. Lucky for our profession, quantum computers are in the works, too.

The same incongruity can be found in other games humans enjoy, like races. The desired result can only be achieved at the end of the race, when you actually get somewhere. In order to get to that specific point in space, you could consider the individual value of each direction change, or even of each step. However, humans do it differently: they specify waypoints that must be reached in order to get to the finish and then focus on getting from waypoint to waypoint, rather than rethinking the entire course. In computer terms this is a divide-and-conquer stratagem, where one tries to solve a problem that has known start and end points by introducing a middle point and then solving the problem from the start to the middle and from the middle to the end. BTW, this also solves Zeno's paradox: "Why does the arrow reach its target if, at any point in its course, it has at least half the distance left to fly?" and the answer is "Because of the exit condition that prevents a stack overflow". Try to sell that one in a philosophy class, heh heh.
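
The waypoint idea, reduced to geometry, is a recursive midpoint split. A sketch under the assumption that a route is a straight segment between two points and `maxLeg` (a hypothetical parameter) is the longest leg we are willing to plan in one go; the length check is the exit condition from the joke above.

```javascript
// Recursive midpoint subdivision: split the route at its midpoint until each
// leg is at most `maxLeg` long. Assumes maxLeg is positive; without a sensible
// exit condition the halving would go on practically forever.
function waypoints(start, end, maxLeg) {
  const dx = end.x - start.x;
  const dy = end.y - start.y;
  if (Math.hypot(dx, dy) <= maxLeg) return [start, end]; // exit condition
  const mid = { x: start.x + dx / 2, y: start.y + dy / 2 };
  const firstHalf = waypoints(start, mid, maxLeg);
  const secondHalf = waypoints(mid, end, maxLeg);
  return [...firstHalf, ...secondHalf.slice(1)]; // don't repeat the midpoint
}

console.log(waypoints({ x: 0, y: 0 }, { x: 8, y: 0 }, 2).map((p) => p.x)); // [ 0, 2, 4, 6, 8 ]
```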

So why aren't chess AIs based on human thinking processes? Why don't they implement a divide-and-conquer solution for a game that always starts with a specific board position and ends in capturing a specific piece? Why do chess engines lower their "level" by sometimes randomly choosing a completely losing path instead of something that is plausible to choose, even if completely wrong objectively? How can MinMax be the best general algorithm for game AIs when some games have a branching factor that makes the algorithm almost useless?

I obviously don't have the answers to these questions, but I may have an opportunity to explore them. Hopefully I will be less lazy than I usually am and invent something completely unscientific, but totally fun! Wish me luck!