If you read an Angular book or read a howto, you will think that Angular is the greatest discovery since fire. Everything is neatly stacked in modules, controllers, templates, directives, factories, etc. The problem comes when you want to use some code of your own, using simple Javascript that does specific work, and then you want to link it nicely with AngularJS. It is not always easy. My example concerns the simple display of a dialog which edits an object. I want it to work on every page, so I added it to the general layout template. The layout does not have a controller. Even if I add it, the dialog engine I have been using was buggy and I've decided to just use jQuery.dialog.

So here is my conundrum: How to load the content of a dialog from an Angular template, display it with jQuery.dialog, load the information with jQuery.get, then bind its input elements to an Angular scope object. I've tried the obvious: just load the template in the dialog and expect Angular to notice a new DOM element was added and parse it and work its magic. It didn't work. Why can't I just call an angular.refresh(elem); function and get it over with, I thought. There are several other solutions. One is to not create the content dynamically at all, just add it to the layout, mark it with ng-controller="something" and then, in the controller, save the object you are interested in or the scope as some sort of globally accessible object that you instantiate from jQuery.get. The dialog would just move the element around, afterwards. That means you need to create a controller, maybe in another file, to be nice, then load it into your page. Another is to create some sort of directive or script tag that loads the Angular template dynamically and to hope it works.

Long story short, none of these solutions appealed to me. I wanted a simple refresh(elem) function. And there is one. It is called angular.injector. You call it with the names of the modules you need to load ('ng' one of them and usually the main application module the second). The result is a function that can use invoke to get the same results as a controller constructor. And that is saying something: if you can do the work that the controller does in your block of code, you don't need a zillion controllers making your life miserable, nor do you need to mark the HTML uselessly for very simple functionality.

Without further ado, here is a function that takes as parameters an element and a data object. The function will force angular to compile said element like it was part of the angular main application, then bind to the main scope the properties of the data object:

function angularCompile(elem, data) {
// create an injector
var $injector = angular.injector(['ng','app']);

// use the type inference to auto inject arguments, or use implicit injection
$injector.invoke(function($rootScope, $compile, $document){
var compiled = $compile(elem || $document);
compiled($rootScope);
if (data) {
for (var k in data) {
if (data.hasOwnProperty(k)) {
$rootScope[k]=data[k];
}
}
}
$rootScope.$digest();
});
}


Example usage:

angularCompile(dialog[0],{editedObject: obj}); // will take the jQuery dialog element, compile it, and add to the scope the editedObject property with the value of obj.


Full code:

OpenTranslationDialog=function(Rule, onProceed, onCancel) {
jQuery.ajax({
type: 'GET',
url: '/Content/ng-templates/dialogs/Translation.html',
data: Rule,
success: function(data) {
var dialog=jQuery('<div></div>')
.html(data)
.dialog({
resizable:true,
width:700,
modal:true,
buttons: {
"Save": function() {
var url='/some/api/url';
jQuery.ajax({
type:'PUT',
url:url,
data:Rule,
success:function() {
if (onProceed) onProceed();
$(this).dialog( "close" );
},
error:function() {
alert('There was an error saving the rule');
}
});
},
Cancel: function() {
if (onCancel) onCancel();
$(this).dialog( "close" );
}
}
});

angularCompile(dialog[0],{Rule:Rule});
},
error:function() {
alert('There was an error getting the dialog template');
}
});
}


Before you take my word on it, though, beware: I am an Angular noob and my desire here was to hack away at it in order to merge my own code with the nice structured code of my colleagues, who now hate me. Although they liked angular.injector when I showed it to them :)

Update 2015 August 28: I've replaced the function master.sys.fn_varbintohexstr with CONVERT, with the extra parameter 2, which translates a binary field into a hexadecimal string with no leading 0x. In addition to being ugly to use, fn_varbintohexstr is very slow.

Sometimes you need to create a unique identifier for a bunch of values so that you use it as an ID in the database. The immediately obvious choice is the CHECKSUM and BINARYCHECKSUM functions. But beware, the purpose of these functions is to detect changes in a string, not to uniquely identify it. It might seem strange, but the two concepts are very different. The change modification functionality is only meant to generate very different values on small changes. The uniqueness is trying to create a value as distinctive as possible for any string. That is why when you use a checksum you will get a lot of similar values for (very) different strings.

Enter HASHBYTES, another function that has the purpose of creating a cryptographic hash for a string. It is mainly used for password hashing, but it will fit nicely for our purpose. There are some caveats, though. First, CHECKSUM gets a variable number of parameters, HASHBYTES only accepts one, so we must take care of the cumbersome concatenation of multiple values. Unfortunately SQL functions do not have the option of variable parameters, which is truly a shame, so we can't hack it. Also, the value that HASHBYTES returns is a varbinary. We could cast it to NVARCHAR, but it turns into a weird Chinese characters string. In order to turn it into a proper string, we need to use the same function used by SQL Server to display varbinary when selecting it: master.sys.fn_varbintohexstr the CONVERT function with a parameter of 2 (hex string without the leading 0x).

So let's compare the two usages. Suppose we have this nice table that contains company data: company name, contact first name, contact last name, phone, email, yearly value. We need to create a unique ID based on these values.
First CHECKSUM:
SELECT CHECKSUM(companyName, firstName, lastName, phone, email, yearlyValue) FROM OurTable
So easy! Just add the columns, no matter how many or what type they have, and get a value as a result. You can even use * to select all columns in a row. You also have the advantage of getting the same checksum for differently capitalized strings. If you don't want this behaviour, use BINARYCHECSUM, which works even better.

Second HASHBYTES:
SELECT CONVERT(VARCHAR(Max),HASHBYTES('SHA1',companyName+'|'+firstName+'|'+lastName+'|'+phone+'|'+email+'|'+CAST(yearlyValue as NVARCHAR(100))),2) as id,*
FROM OurTable
Ugly! You need to create a string from different types, using ugly casts. Also, this works more like BINARYCHECKSUM. If you want to get the same functionality as CHECKSUM you need to use LOWER(LTRIM(RTRIM(value))). Horrid!
However, it works.

WARNING: using CAST to NVARCHAR from a FLOAT loses precision. You should use STR instead!

A middle solution is to use XCHECKSUM. What is that, you ask? A placeholder that can be replaced with some regular expression search and replace, of course :)

Update: I've created a query that creates the script to update the value of a column called 'ValuesHash', for tables that have it, with the hash of all columns that are not in a list of names, they are not primary keys and they are not foreign keys, plus they are not computed, rowguidcol or filestream.
Imagine the scenario where you have something like this:
  • Table A:
    1. Id: primary identity key
    2. Data1: some data
    3. Data2: some data
    4. CreateTime: the creation time
    5. ValuesHash: a VARBINARY(50) column - only 20 are required normally, but let's make sure :)
  • Table B:
    1. Id: primary identity key
    2. AId: foreign key to A
    3. Data1: some data
    4. Data2: some data
    5. ModifyTime: the modification time
    6. ValuesHash: a VARBINARY(50) column - only 20 are required normally, but let's make sure :)
  • Table C:
    1. Id: primary identity key
    2. AId: foreign key to A
    3. Data1: some data
    4. Data2: some data
The query below will update ValuesHash for A and B (because C doesn't have the ValuesHash column) with a hash constructed from the Data columns. The Id columns will be ignored for being primary keys (and for being in the list of columns to ignore), the AId columns will be ignored for being foreign keys, ValuesHash and CreateTime and ModifyTime will be ignored for being in a list of custom columns)

WARNING: each column data is always truncated to 4000 characters, then the corresponding string is also truncated to 4000 bytes before running HASHBYTES (which only accepts a maximum of 8000 bytes). This hash will help in determining unique records, but it is not 100%.

SELECT * 
FROM (
SELECT t.name,
'UPDATE [' + t.name+ '] SET ValuesHash = HASHBYTES(''SHA1'',SUBSTRING('
+ Stuff(
(SELECT '+ ''|''+ ISNULL('+CASE
WHEN tp.name IN ('float', 'real') THEN 'STR('+c.name+',30,30)'
WHEN tp.name IN ('binary', 'varbinary') THEN 'CONVERT(NVARCHAR(4000),'+c.name+',2)'
ELSE 'CONVERT(NVARCHAR(4000),'+c.name+')' END+','''')'
FROM sys.all_columns c
INNER JOIN sys.types tp
ON c.system_type_id=tp.system_type_id
AND c.user_type_id=tp.user_type_id
LEFT JOIN sys.index_columns ic
ON ic.object_id=t.object_id
AND ic.column_id=c.column_id
LEFT JOIN sys.indexes i
ON ic.object_id=i.object_id
AND ic.index_id=i.index_id
LEFT JOIN sys.foreign_key_columns fc
ON fc.parent_object_id=t.object_id
AND c.column_id=fc.parent_column_id
WHERE t.object_id=c.object_id
AND ISNULL(c.is_identity, 0)=0
AND ISNULL(c.is_computed, 0)=0
AND ISNULL(c.is_filestream, 0)=0
AND ISNULL(c.is_rowguidcol, 0)=0
AND ISNULL(i.is_primary_key, 0)=0
AND fc.parent_column_id IS NULL
AND c.name NOT IN ('Id', 'CreateTime' , 'AcquireTime' , 'IntermediateCreateTime', 'IntermediateModifyTime', 'IntermediateDeleteTime', 'ValuesHash')
ORDER BY Sign(c.max_length) DESC, c.max_length, Lower(c.name)
FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)')
, 1, 7, '')
+ ',0,4000)) WHERE ValuesHash IS NULL' AS computed
FROM sys.tables t
INNER JOIN sys.all_columns c
ON t.object_id = c.object_id
WHERE c.name = 'ValuesHash') x
WHERE computed IS NOT NULL
ORDER BY name

Change it to suit your needs. It is by no means perfect, but it's a start for whatever you need.

Update:

A new FORMAT function was introduced in SQL Server 2012, working somewhat similar to the .NET ToString method. Using that function is slightly more precise:

SELECT * 
FROM (
SELECT t.name,
'UPDATE [' + t.name+ '] SET ValuesHash = HASHBYTES(''SHA1'',SUBSTRING('
+ Stuff(
(SELECT '+ ''|''+ ISNULL('+CASE
WHEN tp.name IN ('float', 'real') THEN 'FORMAT('+c.name+',''R'')'
WHEN tp.name IN ('decimal') THEN 'FORMAT('+c.name+',''G'')'
WHEN tp.name IN ('datetime','datetime2') THEN 'FORMAT('+c.name+',''O'')'
WHEN tp.name IN ('binary', 'varbinary') THEN 'CONVERT(NVARCHAR(4000),'+c.name+',2)'
ELSE 'CONVERT(NVARCHAR(4000),'+c.name+')' END+','''')'
FROM sys.all_columns c
INNER JOIN sys.types tp
ON c.system_type_id=tp.system_type_id
AND c.user_type_id=tp.user_type_id
LEFT JOIN sys.index_columns ic
ON ic.object_id=t.object_id
AND ic.column_id=c.column_id
LEFT JOIN sys.indexes i
ON ic.object_id=i.object_id
AND ic.index_id=i.index_id
LEFT JOIN sys.foreign_key_columns fc
ON fc.parent_object_id=t.object_id
AND c.column_id=fc.parent_column_id
WHERE t.object_id=c.object_id
AND ISNULL(c.is_identity, 0)=0
AND ISNULL(c.is_computed, 0)=0
AND ISNULL(c.is_filestream, 0)=0
AND ISNULL(c.is_rowguidcol, 0)=0
AND ISNULL(i.is_primary_key, 0)=0
AND fc.parent_column_id IS NULL
AND c.name NOT IN ('Id', 'CreateTime' , 'AcquireTime' , 'IntermediateCreateTime', 'IntermediateModifyTime', 'IntermediateDeleteTime', 'ValuesHash')
ORDER BY Sign(c.max_length) DESC, c.max_length, Lower(c.name)
FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)')
, 1, 7, '')
+ ',0,4000)) WHERE ValuesHash IS NULL' AS computed
FROM sys.tables t
INNER JOIN sys.all_columns c
ON t.object_id = c.object_id
WHERE c.name = 'ValuesHash'
) x
WHERE computed IS NOT NULL
ORDER BY name

and has 0 comments
It just happens that I have two different projects that have the need of cluster analysis, applied in two different ways: one has uses on maps, where a large number of items needs to be displayed quickly, while another implies finding clusters of news items, where the distance between them is determined by their content. The most used clustering algorithm and the first to be found by searching the web is the k-means clustering. Its purpose is to categorize a list of items into a number of k clusters, hence the name. Setting aside the use value of the algorithm for my purposes, the biggest problem I see is the complexity: in its general form it is at least O(n2), and most of the time a lot higher. The net abounds with scientific papers investigating the k-means complexity and suggesting improvements, but they are highly mathematical and I didn't have the time to investigate further. So I just built my own algorithm. It is clearly fuzzy, imperfect, may even be wrong in some situations, but at least it is fast. I will certainly investigate this area more, maybe even try to understand the math behind it and analyse my results based on this research. When I do that I will update this post or write others. But until then, let me present my work so far.

The first problem I had was, as I said, complexity. For one million points on the map, any algorithm that takes into account the distance between any two items will have to make at least one trillion comparisons. So my solution was to limit the number of items by grouping them in a grid:
Step 1: find the min and max on each dimension (that means going through the entire item collection once or knowing beforetime the map bounds)
Step 2: determine a number of cells that would be a bit more than what I need in the end. (that's a decision I have to take, no algorithmic complexity)
Example: for my map example I have only two dimensions: X and Y. I want to display an upper bound of 1000 clusters. Therefore I find the minimum and maximum X and Y and then split each dimension into 100 slots. That means I would cluster the items I have into 10000 cells.
Step 3: for each item, find its cell based on X,Y and add the item to the cell. This is done by simple division: (X-minX)/(maxX-minX). (again that means going once through the collection)
Step 4: find the smallest cell (the complexity is reduced now to working with cells)
Step 5: find its smallest neighbour (the complexity of this on the implementation)
Step 6: merge the two cells
Until the number of cells is larger than the desired number of clusters, repeat from Step 4.
In the end, the algorithm is O(n+p*log(p)), I guess, where p is the number of cells chosen at step 2.

Optimizations are the next issue.
  • How does one find the neighbours of a cell? On Step 3 we also create a list of neighbors for each new cluster by looking for a cluster that is at coordinates immediately above, below, left or right. When we merge two clusters, we get a cluster that is a neighbour to all the neighbours of the merged clusters.
  • How does one quickly get the cluster at a certain position? We create a dictionary that has the coordinates as the key. What about when we merge two clusters? Then the new cluster will be accessible by any of the original cluster keys (that implied that each cluster has a list of keys, as well)
  • How does one find the smallest cell in the cluster list? After Step 3 we sort the cluster list by the number of items they contain and each time we perform a merge we find the first item larger than the merged result and we insert it in the list at that location, so that the list always remains sorted.
  • How do we easily find the first item larger than an item? By employing a divide-et-impera method of splitting the list in two at each step and choosing to look into one bucket based on the item count of the cluster at the middle position

Before you use the code note that there are specific scenarios where this type of clustering would look a bit off, like items in a long line or an empty polygon (the cluster will appear to be in its center). But I needed speed and I got it.

Enjoy!

Update: The performance of removing or adding items from a List is very poor, so I created a LinkedList version that seems to be even faster. Here it is. The old List version is at the end

/// <summary>
/// Generic x,y positioned item class
/// </summary>
public class Item
{
public double X { get; set; }
public double Y { get; set; }
}

public class ClusteringDataProcessor
{
/// <summary>
/// the squared root of the initial cell number (100*100 = 10000)
/// </summary>
private const int initialClustersSquareSide = 100;
/// <summary>
/// the desired number of resulting clusters
/// </summary>
private const int numberOfFinalClusters = 1000;
private static Random _rnd = new Random();

/// <summary>
/// In this implementation, the Cluster inherits from Item, so the result is a list of Item
/// In the case of one Item Clusters, we actually return the original Item
/// </summary>
/// <param name="list"></param>
/// <returns></returns>
public List<Item> Process(List<Item> list)
{
if (list.Count <= numberOfFinalClusters) return list;
// find bounds. If already known, this can be provided as parameters
double minY = double.MaxValue;
double minX = double.MaxValue;
double maxY = double.MinValue;
double maxX = double.MinValue;
foreach (var item in list)
{
var y = item.Y;
var x = item.X;
minY = Math.Min(minY, y);
maxY = Math.Max(maxY, y);
minX = Math.Min(minX, x);
maxX = Math.Max(maxX, x);
}
// the original list of clusters
var clusterArr = new List<Cluster>();

// the dictionary used to index clusters on their position
var clusterDict = new Dictionary<string, Cluster>();

// the unit for finding the cell position for the initial clusters
var qX = (maxX - minX) / initialClustersSquareSide;
var qY = (maxY - minY) / initialClustersSquareSide;

foreach (var item in list)
{
// compute cell coordinates (integer and the values used as keys in the dictionary)
var cx = Math.Min((int)((item.X - minX) / qX), initialClustersSquareSide - 1);
var cy = Math.Min((int)((item.Y - minY) / qY), initialClustersSquareSide - 1);
var key = getKey(cx, cy);
Cluster cluster;
// if the cluster for this position does not exist, create it
if (!clusterDict.TryGetValue(key, out cluster))
{
cluster = new Cluster
{
Keys = new List<string> { key },
X = minX + cx * qX + qX / 2,
Y = minY + cy * qY + qY / 2,
//Items = new List<Item>(),
Count = 0,
Neighbors = new List<string>()
};
// the neighbours of this cluster are the existing clusters that are below, above, left or right. If they exist, this cluster also is added to their neighbour list
var nkeys = new[] { getKey(cx - 1, cy), getKey(cx + 1, cy), getKey(cx, cy - 1), getKey(cx, cy - 1) };
for (var j = 0; j < 4; j++)
{
Cluster nc;
if (clusterDict.TryGetValue(nkeys[j], out nc))
{
cluster.Neighbors.Add(nkeys[j]);
nc.Neighbors.Add(key);
}
}
clusterDict[key] = cluster;
clusterArr.Add(cluster);
}
// add the item to the cluster (note that the commented lines hold the items in a list, while the current implementation only remember the number of items)
//cluster.Items.Add(item);
cluster.Item = item;
cluster.Count++;
// add the item position to the sums, so that we can compute the final position of the cluster at the Finalize stage without enumerating the items (or having to hold them in an Items list)
cluster.SumX += item.X;
cluster.SumY += item.Y;
}
// if the number of items is already smaller than the desired number of clusters, just return the clusters
if (clusterArr.Count <= numberOfFinalClusters)
{
return clusterArr.Select(c => c.Finalize()).ToList();
}

// sort the cluster list so we can efficiently find the smallest cluster
//clusterArr.Sort(new Comparison<Cluster>((c1, c2) => c1.Items.Count.CompareTo(c2.Items.Count)));
LinkedList<Cluster> clusterLinkedList = new LinkedList<Cluster>(clusterArr.OrderBy(c => c.Count));

// remember last merged cluster, as next merged clusters might have similar sizes
var lastCount = int.MaxValue;
LinkedListNode<Cluster> lastLinkedNode = null;

// do this until we get to the desired number of clusters
while (clusterLinkedList.Count > numberOfFinalClusters)
{
// we need to get the smallest (so first) cluster that has any neighbours
var cluster1 = clusterLinkedList.First(c => c.Neighbors.Any());
Cluster cluster2 = null;
// then find the smallest neighbour
var min = int.MaxValue;
foreach (var nkey in cluster1.Neighbors)
{
var n = clusterDict[nkey];
//var l = n.Items.Count;
var l = n.Count;
if (l < min)
{
min = l;
cluster2 = n;
}
}
// join the clusters
var keys = cluster1.Keys.Union(cluster2.Keys).ToList();
var cluster = new Cluster
{
Keys = keys,
// approximate cluster position, not needed
//X = (cluster1.X + cluster2.X) / 2,
//Y = (cluster1.Y + cluster2.Y) / 2,

// the result holds the count of both clusters
//Items = cluster1.Items.Union(cluster2.Items).ToList(),
Count = cluster1.Count + cluster2.Count,
// the neighbors are in the union of their neighbours that does not contain themselves
Neighbors = cluster1.Neighbors.Union(cluster2.Neighbors)
.Distinct()
.Except(keys)
.ToList(),
// compute the sums for the final position
SumX = cluster1.SumX + cluster2.SumX,
SumY = cluster1.SumY + cluster2.SumY
};
foreach (var key in keys)
{
clusterDict[key] = cluster;
}
// efficiently remove clusters since LinkedList removals are fast
clusterLinkedList.Remove(cluster1);
clusterLinkedList.Remove(cluster2);

// a little bit of magic to make the finding of the insertion point faster (LinkedLists go through the entire list to find an item)
// if the last merged cluster is smaller or equal to the new merged cluster, then start searching from it.
// this halves the insert time operation, but I am sure there are even better implementations, just didn't think it's so important
LinkedListNode<Cluster> start;
if (lastCount <= cluster.Count && lastLinkedNode.Value != cluster1 && lastLinkedNode.Value != cluster2)
{
start = lastLinkedNode;
}
else
{
start = clusterLinkedList.First;
}
var insertionPoint = nextOrDefault(clusterLinkedList, start, c => c.Count >= cluster.Count);
// remember last merged cluster
LinkedListNode<Cluster> link;
if (insertionPoint == null)
{
link = clusterLinkedList.AddLast(cluster);
}
else
{
link = clusterLinkedList.AddBefore(insertionPoint, cluster);
}
lastLinkedNode = link;
lastCount = cluster.Count;
}
return clusterLinkedList.Select(c => c.Finalize()).ToList();
}

private LinkedListNode<T> nextOrDefault<T>(LinkedList<T> list, LinkedListNode<T> start, Func<T, bool> condition)
{
while (start.Next != null)
{
if (condition(start.Value)) return start;
start = start.Next;
}
return null;
}

private string getKey(int cx, int cy)
{
return cx + ":" + cy;
}

private class Cluster : Item
{
public Cluster()
{
SumX = 0;
SumY = 0;
}

public double SumX { get; set; }
public double SumY { get; set; }

public List<string> Keys { get; set; }

//public List<Item> Items { get; set; }
public int Count { get; set; }
public Item Item { get; set; }

public List<string> Neighbors { get; set; }


/// <summary>
/// the function that finalizes the computation of the values or even decides to return the only item in the cluster
/// </summary>
/// <returns></returns>
public Item Finalize()
{
//if (Items.Count == 1) return Items[0];
if (Count == 1) return Item;
/*Y = SumY / Items.Count;
X = SumX / Items.Count;
Count = Items.Count;*/
Y = SumY / Count;
X = SumX / Count;
Count = Count;
return this;
}
}
}

old List based code (click to show)

and has 2 comments
Well, sometimes an admin will try to make the system secure by annoying the people who have to use it. Yeah, that always works. My situation is that I have to login every day into a virtual machine that is on a "secure network". So after using a very restrictive password policy that forces everybody to be creative in the way they write "password" and "123456", he also disallowed the saving credentials in Remote Desktop Connection. So every day I have to enter the damn complicated password. I couldn't have that. Here is a .js script that you execute with WScript and it logs you in automatically:
var shell = WScript.CreateObject("WScript.Shell");
shell.Run("mstsc /v:[remote server] /console");
while (!shell.AppActivate("Windows Security")) {
WScript.Sleep(100);
}
WScript.Sleep(100);
shell.SendKeys("[password]{enter}");

Save this into a Javascript file and replace [remove server] and [password] with your settings and either double click the .js file or create a batch file like this:
@echo off
start "Auto log on!" wscript c:\Batches\autologin.js

Of course, this means your secure password will be stored in a stupid text file somewhere, so be warned.

This is one of those WTF moments. After more than a decade of working in software development I learn something this basic about T-SQL (or rather, any flavour based on SQL-92). What would you think happens when running this script?
IF ''='                 ' SELECT 'WTF?!' -- empty string compared to a bunch of spaces
IF ' '=' ' SELECT 'WTF?!' -- bunch or spaces compared to another bunch of spaces of different length
IF 'x'='x ' SELECT 'WTF?!' -- 'x' compared to 'x' followed by a bunch of spaces
IF 'x'=' x' SELECT 'WTF?!' -- 'x' compared to 'x' preceded by a bunch of spaces

There will be three WTF rows returned, for the first three queries. You don't believe me? Try it yourself. The motive is explained here: INF: How SQL Server Compares Strings with Trailing Spaces. Short story shorter: in order for SQL to compare two strings of different lengths, it first right-pads the shorter one with spaces.

So what can you do to fix it? Easy enough, use LEN ,right? Nope. Read the definition carefully: Returns the number of characters of the specified string expression, excluding trailing blanks. A possible but weird solution is to use DATALENGTH. A string is empty only is it has a datalength of 0. In the case of NVARCHAR you could even divide the resulting number to 2 in order to get the true length of the string. WTF, right?

and has 1 comment
Well, it's pretty obvious, but I wanted to post it here, as well. You see, we have this query that was supposed to filter some values based on a parameter. The query was done like this: SELECT * FROM table WHERE value=(CASE WHEN @filter IS NULL THEN value ELSE @filter). Can you spot the problem? Indeed, if value is NULL, then value is NOT equal to value. Not only is this incorrect, but also bad from the standpoint of the SQL execution plan. It is much faster to do SELECT * FROM table WHERE (@filter is NULL OR value=@filter). If, for whatever reason, you need the first syntax, you need to do it like this: SELECT * FROM table WHERE ISNULL(value,@impossibleValueThatIsNotNull)=COALESCE(@filter, value, @impossibleValueThatIsNotNull). Yeah, a bit of a show off, but when the "no filter" value is null, it's better to use ISNULL and COALESCE wherever possible.

I have at work a very annoying HTTP proxy that requires a basic authentication. This translates in very inconsistent behaviour between applications and the annoying necessity of entering the username and password whenever the system wants it. So I've decided to add another local proxy to the chain that handles the authentication for me forever.

This isn't as easy as it sounds. Sure, proxies are a dime a dozen, but for some reason most of them seem to be coming from the Linux world. That is really bad user interaction, vague documentation and unhelpful forums where any request for help ends up in some version of "read the fucking manual". Hence I've decided to help out by creating this lovely post that explains how you can achieve the purpose described above with very little headache. These are the very easy steps that you have to undertake:
  1. Download and install Privoxy
  2. Go to the Program Files folder and look for Privoxy. You will need to edit two files: config.txt and user.action
  3. Optional: change the listen-address port, otherwise the proxy will function on port 8118
  4. Enter your proxy authentication username and password in the fields below and press Help me configure Privoxy - this is strictly a client base Javascript so don't worry that I am going to steal your proxy credentials...
  5. Edit user.action and add the bit of text that appeared as destined for that file.
  6. Edit config.txt, look for examples of forward and add to it the bit that belongs to config.txt and replace proxy:port and the domains and IP masks with the correct values for you
  7. Restart Privoxy
  8. Configure your internet settings to use a proxy on 127.0.0.1 and the port you configured in step 2 (or the default 8118)

This should be it. Enjoy!

Username:

Password:





I made a function in T-SQL that parsed some string and returned a decimal. It all worked fine until one of my colleagues wanted to use it on the test server. And here there was, a beautiful error message: 'decimal' is not a recognized built-in function name. I isolated the line, executed it, same error. It was like the server did not understand decimals anymore. The line in question was a simple SELECT TRY_CONVERT(DECIMAL(18,6),'10.33'). If I used CONVERT, though, it all worked fine. Another hint was that the function worked perfectly fine on the same server, in another database. The problem was that for that particular database, the defined SQL server version was 2008, not 2012. We changed it and it all worked fine after that. The setting in question is found in the Properties of the database, Options, Compatibility level.

While working on a small personal project, I decided to make a graphical tool that displayed a list of items in a Canvas. After making it work (for details go here), I've decided to make the items animate when changing their position. In my mind it had to be a simple solution, akin to jQuery animate or something like that; it was not.

The final solution was to finally give up on a generic method for this and switch to the trusted attached properties. But if you are curious to see what else I tried and how ugly it got, read it here:
Click here to get ugly!

Well, long story short: attached properties. I created two attached properties CanvasLeft and CanvasTop. When they change, I animate the real properties and, at the end of the animation, I set the value. Here is the code:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Media.Animation;

namespace Siderite.AttachedProperties
{
public static class UIElementProperties
{
public static readonly DependencyProperty CanvasLeftProperty = DependencyProperty.RegisterAttached("CanvasLeft", typeof(double), typeof(UIElementProperties), new FrameworkPropertyMetadata(
0.0,
FrameworkPropertyMetadataOptions.OverridesInheritanceBehavior,
CanvasLeftChanged));

[AttachedPropertyBrowsableForType(typeof(UIElement))]
public static double GetCanvasLeft(DependencyObject element)
{
if (element == null)
{
throw new ArgumentNullException("element");
}
return (double)element.GetValue(CanvasLeftProperty);
}

[DesignerSerializationVisibility(DesignerSerializationVisibility.Visible)]
public static void SetCanvasLeft(DependencyObject element, double value)
{
if (element == null)
{
throw new ArgumentNullException("element");
}
element.SetValue(CanvasLeftProperty, value);
}

private static void CanvasLeftChanged(DependencyObject d, DependencyPropertyChangedEventArgs e)
{
var sb = new Storyboard();
var oldVal = (double)e.OldValue;
if (double.IsNaN(oldVal)) oldVal = 0;
var newVal = (double)e.NewValue;
if (double.IsNaN(newVal)) newVal = oldVal;
var anim = new DoubleAnimation
{
From = oldVal,
To = newVal,
Duration = new Duration(TimeSpan.FromSeconds(1)),
FillBehavior = FillBehavior.Stop
};
Storyboard.SetTarget(anim, d);
Storyboard.SetTargetProperty(anim, new PropertyPath("(Canvas.Left)"));
sb.Children.Add(anim);
sb.Completed += (s, ev) =>
{
d.SetValue(Canvas.LeftProperty, newVal);
};
var fe = d as FrameworkElement;
if (fe != null)
{
sb.Begin(fe, HandoffBehavior.Compose);
return;
}
var fce = d as FrameworkContentElement;
if (fce != null)
{
sb.Begin(fce, HandoffBehavior.Compose);
return;
}
sb.Begin();
}


public static readonly DependencyProperty CanvasTopProperty = DependencyProperty.RegisterAttached("CanvasTop", typeof(double), typeof(UIElementProperties), new FrameworkPropertyMetadata(
0.0,
FrameworkPropertyMetadataOptions.OverridesInheritanceBehavior,
CanvasTopChanged));

[AttachedPropertyBrowsableForType(typeof(UIElement))]
public static double GetCanvasTop(DependencyObject element)
{
if (element == null)
{
throw new ArgumentNullException("element");
}
return (double)element.GetValue(CanvasTopProperty);
}

[DesignerSerializationVisibility(DesignerSerializationVisibility.Visible)]
public static void SetCanvasTop(DependencyObject element, double value)
{
if (element == null)
{
throw new ArgumentNullException("element");
}
element.SetValue(CanvasTopProperty, value);
}

private static void CanvasTopChanged(DependencyObject d, DependencyPropertyChangedEventArgs e)
{
var sb = new Storyboard();
var oldVal = (double)e.OldValue;
if (double.IsNaN(oldVal)) oldVal = 0;
var newVal = (double)e.NewValue;
if (double.IsNaN(newVal)) newVal = oldVal;
var anim = new DoubleAnimation
{
From = oldVal,
To = newVal,
Duration = new Duration(TimeSpan.FromSeconds(1)),
FillBehavior = FillBehavior.Stop
};
Storyboard.SetTarget(anim, d);
Storyboard.SetTargetProperty(anim, new PropertyPath("(Canvas.Top)"));
sb.Children.Add(anim);
sb.Completed += (s, ev) =>
{
d.SetValue(Canvas.TopProperty, newVal);
};
var fe = d as FrameworkElement;
if (fe != null)
{
sb.Begin(fe, HandoffBehavior.Compose);
return;
}
var fce = d as FrameworkContentElement;
if (fce != null)
{
sb.Begin(fce, HandoffBehavior.Compose);
return;
}
sb.Begin();
}
}
}

and this is how you would use them:

<ListView ItemsSource="{Binding KernelItems}" 
SelectedItem="{Binding SelectedItem,Mode=TwoWay}"
SelectionMode="Single"
>
<ListView.ItemsPanel>
<ItemsPanelTemplate>
<Canvas HorizontalAlignment="Stretch" VerticalAlignment="Stretch" Background="Black" />
</ItemsPanelTemplate>
</ListView.ItemsPanel>
<ListView.ItemContainerStyle>
<Style TargetType="{x:Type ListViewItem}">
<Setter Property="FocusVisualStyle" Value="{x:Null}"/>
<Setter Property="Foreground" Value="White"/>
<Setter Property="att:UIElementProperties.CanvasLeft" >
<Setter.Value>
<MultiBinding Converter="{StaticResource CoordinateConverter}">
<Binding Path="X"/>
<Binding Path="ActualWidth" ElementName="lvKernelItems"/>
<Binding Path="DataContext.Zoom" RelativeSource="{RelativeSource AncestorType={x:Type Window}}"/>
</MultiBinding>
</Setter.Value>
</Setter>
<Setter Property="att:UIElementProperties.CanvasTop" >
<Setter.Value>
<MultiBinding Converter="{StaticResource CoordinateConverter}">
<Binding Path="Y"/>
<Binding Path="ActualHeight" ElementName="lvKernelItems"/>
<Binding Path="DataContext.Zoom" RelativeSource="{RelativeSource AncestorType={x:Type Window}}"/>
</MultiBinding>
</Setter.Value>
</Setter>
<Style.Resources>
<SolidColorBrush x:Key="{x:Static SystemColors.HighlightBrushKey}" Color="Transparent" />
<SolidColorBrush x:Key="{x:Static SystemColors.ControlBrushKey}" Color="Transparent" />
<SolidColorBrush x:Key="{x:Static SystemColors.HighlightTextBrushKey}" Color="Black" />
<SolidColorBrush x:Key="{x:Static SystemColors.ControlTextBrushKey}" Color="Black" />
</Style.Resources>
<Style.Triggers>
<Trigger Property="IsSelected" Value="True">
<Setter Property="Foreground" Value="Cyan"/>
<Setter Property="Effect">
<Setter.Value>
<DropShadowEffect ShadowDepth="0" Color="White" Opacity="0.5" BlurRadius="10"/>
</Setter.Value>
</Setter>
<Setter Property="Canvas.ZIndex" Value="1000"/>
</Trigger>
</Style.Triggers>
</Style>
</ListView.ItemContainerStyle>
</ListView>

Hope it helps.

For a WPF project I wanted to create a graphical representation of a list of items. I computed some X,Y coordinates for each item and started changing the XAML of a Listview in order to reflect the position of each item on a Canvas. Well, it is just as easy as you imagine: change the ItemsPanel property to a Canvas and then style each item as whatever you want. The gotcha comes when trying to set the coordinates. The thing is that for each item in a listview a container is constructed and inside the item template is displayed. So here you have all you items displayed exactly as you want them, except the coordinates don't work, since what needs to be placed on the Canvas are the generated containers, not the items. Here is the solution:
<Window.Resources>
<local:CoordinateConverter x:Key="CoordinateConverter"/>
<DataTemplate DataType="{x:Type vm:KernelItem}"> <!-- the template for the data items -->
<Grid>
<Ellipse Fill="{Binding Background}" Width="100" Height="100" Stroke="DarkGray" Name="ellipse"
ToolTip="{Binding Tooltip}"/>
<TextBlock Text="{Binding Text}" MaxWidth="75" MaxHeight="75"
HorizontalAlignment="Center" VerticalAlignment="Center"
TextAlignment="Center"
TextWrapping="Wrap"
ToolTip="{Binding Tooltip}" />
</Grid>
</DataTemplate>
</Window.Resources>
<ListView ItemsSource="{Binding KernelItems}" Name="lvKernelItems"
SelectedItem="{Binding SelectedItem,Mode=TwoWay}"
SelectionMode="Single"
>
<ListView.ItemsPanel>
<ItemsPanelTemplate>
<Canvas HorizontalAlignment="Stretch" VerticalAlignment="Stretch" Background="Black" />
</ItemsPanelTemplate>
</ListView.ItemsPanel>
<ListView.ItemContainerStyle>
<Style TargetType="{x:Type ListViewItem}">
<Setter Property="FocusVisualStyle" Value="{x:Null}"/><!-- no highlight of selected items -->
<Setter Property="Foreground" Value="White"/>
<Setter Property="(Canvas.Left)" >
<Setter.Value>
<MultiBinding Converter="{StaticResource CoordinateConverter}">
<Binding Path="X"/>
<Binding Path="ActualWidth" ElementName="lvKernelItems"/>
<Binding Path="DataContext.Zoom" RelativeSource="{RelativeSource AncestorType={x:Type Window}}"/>
</MultiBinding>
</Setter.Value>
</Setter>
<Setter Property="(Canvas.Top)" >
<Setter.Value>
<MultiBinding Converter="{StaticResource CoordinateConverter}">
<Binding Path="Y"/>
<Binding Path="ActualHeight" ElementName="lvKernelItems"/>
<Binding Path="DataContext.Zoom" RelativeSource="{RelativeSource AncestorType={x:Type Window}}"/>
</MultiBinding>
</Setter.Value>
</Setter>
<Style.Resources><!-- no highlight of selected items -->
<SolidColorBrush x:Key="{x:Static SystemColors.HighlightBrushKey}" Color="Transparent" />
<SolidColorBrush x:Key="{x:Static SystemColors.ControlBrushKey}" Color="Transparent" />
<SolidColorBrush x:Key="{x:Static SystemColors.HighlightTextBrushKey}" Color="Black" />
<SolidColorBrush x:Key="{x:Static SystemColors.ControlTextBrushKey}" Color="Black" />
</Style.Resources>
<Style.Triggers>
<Trigger Property="IsSelected" Value="True"><!-- custom selected item template -->
<Setter Property="Foreground" Value="Cyan"/>
<Setter Property="Effect">
<Setter.Value>
<DropShadowEffect ShadowDepth="0" Color="White" Opacity="0.5" BlurRadius="10"/>
</Setter.Value>
</Setter>
<Setter Property="Canvas.ZIndex" Value="1000"/>
</Trigger>
</Style.Triggers>
</Style>
</ListView.ItemContainerStyle>
</ListView>

As a bonus, you see the way to remove the default selection of an item: the ugly dotted line and the highlighting background.

If you go to the system Internet Settings (in Network Connections, or Internet Explorer or Chrome), and you advance to the Tab "Connections", then click LAN Settings, then go to Advanced... I mean, why wouldn't you? ... there is a checkbox called "Use automatic configuration script". The script is supposed to dynamically return the correct proxy for an URL. The practice is called Proxy Auto Configuration for some reason. The script is Javascript and it uses some predefined functions to return either "DIRECT" (don't use a proxy) or "PROXY address:port" (use the proxy at that address and port). You can chain the options by separating them with a semicolon like this: "PROXY 1.2.3.4:55 ; PROXY 10.20.30.40:50; DIRECT". And before you search like a madman for it, there is no way to specify the username/password for those proxy servers in your config file. You still have to type them when asked.

Use this solution to fix problems with proxies that work well for outside sites, but not for internal networks. For reasons too weird to explain here (but explained here: Understanding Web Proxy Configuration) you cannot just put your script on the local drive and use it, instead you have to read it from an http URL. If you don't have the possibility (or it's too annoying) to install IIS or some other web server in order to serve the pac file, try using it from the local drive with a file:// URL (not just C:\...). However, it is a deprecated method and you may experience issues, with .NET software or Internet Explorer 11, for example.

Here is a sample file that connects directly to any URL that is part of a domain or is part of an IP class:
function FindProxyForURL(url, host) {

var defProxy="10.20.30.40:50"; // the misbehaving or incomplete proxy

var domains=[
".mysite.com",
".xxx",
"localhost"
];
var ipClasses=[
"11.22.33.0",
"55.0.0.0",
"127.0.0.0"
];

for (var i=0; i<domains.length; i++) {
if (dnsDomainIs(host,domains[i])) return "DIRECT";
}

var MYHOST = dnsResolve(host);

for (var i=0; i<ipClasses.length; i++) {
var mask=getMask(ipClasses[i]);
if (isInNet(MYHOST, ipClasses[i],mask)) return "DIRECT";
}

return "PROXY "+defProxy;

function getMask(ip) {
var splits=ip.split('.');
for (var i=0; i<splits.length; i++) {
if (splits[i]!='0') splits[i]='255';
}
return splits.join('.');
}

}

Just add the domains or the IP classes to the arrays in order to connect directly to them. Do not forget to add the local IP classes as well for direct connection, including 127.0.0.0 to access your own localhost.

In order to test or debug your .pac files, use the PacParser open source utility. A reference to the functions you can use in your script can be found here: Using Automatic Configuration, Automatic Proxy, and Automatic Detection

MinMax or Minimax, as some like to call it, is the basis of most Artificial Intelligence built for games like chess. Its basis is extremely easy to understand: a rational player will try to take the best option available to them, so whatever is good for me the adversary will take as the most likely outcome and he will find the best solution against that outcome. I, following the same pattern, will also look for his best counter move and plan against it. Therefore the thinking for a game of chess, let's say, is that I will take all possible moves, find the one that leaves me with the best position (evaluated by a function from the board position), then look for the similar best play for the adversary. I continue this way until I get to the end of the game or am out of computing resources.

Now, that sounds logical and it's crazy easy to implement. The problem is that for all but the most childish of plays, the tree of all possible moves increases exponentially. And chess isn't even one of the worst games to do that. Imagine Tic-Tac-Toe, a game played on a 3x3 board between two players. You have a total of 9 possible moves to choose from as the first player, then 8, then 7, etc. The entire game tree has a total of 9! possible moves, or 362880. But generalize the game to a board of 10x10 and a winning rule of 5 in a line and you get 100! moves, which is less than 1E+158, that is 10 followed by 158 zeros.

That's why the so called pruning was created, the most common of all being Alpha-Beta, which tries to abort the processing of leaves that seem to reach a worse situation than their parent node. Of course, all of this is the general gist. You might want to take into account a number N best moves from the opponent, as well as try a more lenient pruning algorithm (after all, sacrificing a piece brings you to a worse position than when you started, but it might win the game). All of this increases, not decreases the number of possible moves.

And now comes my thought on this whole thing: how can I make a computer play like a human when the core edict of the algorithm is that all participating players are rational? Humans are rarely so. Mathematically I could take N, the number of best moves I would consider for my opponent, to be the total number of moves my opponent could make, but it would increase the exponential base of the tree of moves. Basically it would make the algorithm think of stupid things all the time.

The pruning algorithm seems to be the most important part of the equation. Indeed, I could consider the move choice algorithm to be completely random and as long as I have a perfect pruning algorithm it will remove all the stupid choices from me and let me with the smart ones. A quote comes to mind: "you reach perfection not when you have nothing else to add, but when there is nothing left to remove". It's appropriate for this situation.

Now, before attacking an algorithm that has survived for so long in the AI industry (and making my own awesome one that will defeat all chess engines in the world - of course, that's realistic) I have to consider the alternative algorithm: the lowly human. How does a human player think in a game of chess? First he surveys the board for any easy wins. That means a broad one or two levels analysis based on a simple board evaluation function. Immediately we get something from this: there might be multiple evaluation functions, we don't need just one. The simple one is for looking for greedy wins, like "He moved his queen where I can capture it, yay!".

The same outcome for situations like this would be achieved by a MinMax algorithm, so we ignore this situation. It gets more interesting from now, though. We look for the moves of the most active pieces. I know that this is the rookie system, but I am a rookie, I will make my computer algorithm be as stupid as I am, if I am to play it, so shut up! The rookie will always try to move his queen to attack something. It's the most powerful piece and it should get the most results for the least effort. We left Greed behind, remember? We are now doing Sloth. Still, with a good pruning algorithm we eliminate stupid Queen moves from the beginning, so considering the Queen first, then Rooks, then Bishops, then Knights, etc. is not a bad idea. The order of the pieces can be changed based on personal preferences as well as well established chess rules, like Knights being better that Bishops in closed games and so on.

This is a small optimization, one that probably most game engines have. And we haven't even touched pruning; boy, this is going to be a long article! Now, what does the human do? He does the depth first tree searches. Well, he doesn't think of them like that, he thinks of them as narrative, but it's basically a depth first search. This is the casual "What if...?" type of play. You move the Queen, let's say, bringing it right in the enemy territory. You don't capture anything important, but to bring a strong piece this uncomfortably near to the enemy king is scary. You don't play for game points, but for emotion points, for special effects, for kicks! You don't abandon the narrative, the linear evolution of your attack, until you find that it bears no fruit. It's the equivalent of the hero running toward the enemy firing his pistol. If the enemy is dumb enough to not take cover, aim carefully and shoot a burst from their SMGs, you might get away with it and it would be glorious. If not, you die idiotically.

It is important to note that in the "Hollywood" chess thinking you are prone to assume that the enemy will make mistakes in order to facilitate your brilliant plan. The evaluation goes as follows: "I will try something that looks cool if the chances for a horrible and immediate loss are small". When some hurdle foils your heroic plan, you make subplans that would, as well as you hope, distract the adversary from your actual target. This, as far as I know, is a typical human reasoning type and I doubt many (if any) computer game engines have it. In computer terms, one would have to define a completely new game, a smaller one, and direct an AI designed specifically for it to tell you if it would work or not. Given the massively parallel architecture of the human brain, it is not hard to understand why we do something like this. But we can do the same with a computer, mind you. I am thinking of something like a customized MinMax algorithm working on few levels, one or two, as the human would. That would result in a choice of N possible moves to make. Then construct a narrative for each, a depth search that just tries to get as much as possible from each move without considering many of the implications. Then assign a risk to each level of this story. If the level exceeds a threshold, use the small range MinMax at those points and try to see if you can minimize the risk or if at that point the risk makes your narrative unlikely.


Let's recap the human thinking algorithm so far:
  1. Try to greedily take what the opponent has stupidly made available
  2. Try to lazily use the strongest piece to get the most result with the least effort
  3. Try to pridefully find the most showy move, the one that would make the best drinking story afterwards
  4. Try to delegate the solving of individual problems in your heroic narrative to a different routine

Wow! Doesn't it seem that the seven deadly sins are built-in features, rather than bugs? How come we enjoy playing with opponents that pretty much go through each of them in order to win more than we do with a rational emotionless algorithm that only does what is right?

Again, something relevant transpires: we take quite a long time imagining the best moves we can make, but we think less of the opponent's replies. In computer terms we would prune a lot more the enemy possible moves than we would our own. In most rookie cases, one gets absorbed by their own attack and ignores moves that could counterattack. It's not intuitive to think that while you are punching somebody, they would choose to punch back rather than avoid the pain. In chess it's a little bit easier and more effective, since you can abandon a piece in order to achieve an overall gain in the game, but it can and it is done in physical combat as well.

Okay, we now have two alternatives. One is the logical one: take into account all the rules chess masters have taught us, shortcuts for achieving a better position on the board; choose moves based on those principles and then gauge the likely response from the opponent. Repeat. This is exactly like a MinMax algorithm! So we won't do that. The hell with it! If I can't enjoy the game, neither will my enemy!!

Human solution: don't do anything. Think of what your opponent would do, if you wouldn't move anything and foil their immediate plan. This way of thinking would be counterintuitive for a computer algorithm. Functioning on the basis of specific game rules, a computer would never be inclined to think "what would the enemy do if I didn't move anything, which is ILLEGAL in chess?". That makes us superior, obviously ;-)

Slowly, but surely, a third component of the algorithm becomes apparent: the move order choice. Let's imagine a naive MinMax implementation. In order to assess every possible move, it would have to enumerate them. If the list of moves is always the same in a certain board position, the game will always proceed the same way. The solution is to take the list of possible moves, but in a random order. In the case of the "human algorithm" the ordering becomes more complex (favouring powerful piece moves, for example). One could even consider the ordering mechanism responsible for choosing whether to do a careful breadth search for each level or a depth first one.


Here is a suggestion for an algorithm, one that takes into account the story of the game and less the objective gain or position strength:
  1. For each of your power pieces - anything but the king and pawns - compute mobility, or the possibility to move and attack. Favour the stronger pieces first.
  2. For each power piece with low mobility consider pawn moves that would maximize that mobility.
  3. For each power piece with high mobility consider the moves that would increase the chance of attack or that would attack directly
  4. For each strong move, consider the obstacles - enemy pieces, own pieces, possible enemy countermeasures
  5. Make the move that enables the considered power move or that foils the enemy attempts of reply

The advantage of this approach is that it only takes into account the enemy when he can do something to stop you, the pawns only when they can enable your devious plan and focuses on ventures that yield the best attack for your heroes. For any obstruction, you delegate the resolution of the problem to a different routine. This makes the algorithm parallelizable as well as modular - something we devs love because we can test the individual parts separately.

This algorithm would still use a board estimation function, but being more focused on heroic attacks, it would prefer interesting move orders to static positions as well as the "fun factor", something that is essential to a human-like algorithm. If the end result of the attack is a check-mate, then it doesn't really matter what position estimate you get when you did half the moves. All one has to wonder is if the attack is going to be successful or not and if one can do something to improve the chances of success. And indeed this is one of the most difficult aspects for a chess playing human: to switch from a failing plan to a successful plan when it is not yet clear is the first plan is failing. We invest energy and thought into an idea and we want it to work. A lot of the chess playing strategy of human rookies relies on prayer, after all. A computer would just assess the situation anew at every move, even if it has a strategy cached somewhere. If the situation demands it, a new strategy will be created and the last one abandoned. It's like killing your child and making another!


But, you will say, all you did so far was to describe an inferior algorithm that can be approximated by MinMax with only custom choices for the pruning and move order functions! You are missing the point. What I am describing is not supposed to beat Grand Masters, but to play a fun game with you, the casual player. More than that, my point is that for different desired results, different algorithms must be employed. This would be akin to creating a different AI for each level of a chess game.

Then there is the issue of the generalized TicTacToe or other games, such as Arimaa, created specially to make it difficult for computer algorithms to play, where MinMax fails completely. To make a comparison to real life, it's like you would consider the career steps you would take in life based on all possible jobs available, imagining what would it be to be employed there, what the difficulties might be, finding solutions to those problems, repeating the procedure. You will get to the conclusion that it is a good idea to become a computer scientist after thoroughly examining and partially understanding what it would be like to be a garbage man, a quantum scientist, a politician and a gigolo, as well as all the jobs in between. Of course, that is not as far fetched as you think, since in order to be a success in software development you must be at least a politician and a garbage man, perhaps even a gigolo. Lucky for our profession, quantum computers are in the works, too.

The same incongruency can be found when thinking of other games humans enjoy, like races. The desired result can only be achieved at the end of the race, when you actually get somewhere. In order to get to that specific point in space, you could consider the individual value of each direction change, or even of each step. However humans do it differently, they specify waypoints that must be achieved in order to get to the finish and then focus on getting from waypoint to waypoint, rather than rethinking the entire course. In computer terms this is a divide-and-conquer strategem, where one tries to solve a problem that has known start and end points by introducing a middle point and then solving the problem from the start to the middle. BTW, this also solves Zeno's paradox: "Why does the arrow reach its target if, at any point in its course, it has at least half the distance left to fly?" and the answer is "Because of the exit condition that prevents a stack overflow". Try to sell that one in a philosophy class, heh heh.


So why aren't chess AIs based on human thinking processes? Why don't they implement a divide and conquer solution for a game that always starts with a specific board position and ends in capturing a specific piece? Why do chess engines lower their "level" by sometimes randomly choosing a completely losing path instead of something that is plausible to choose, even if completely wrong objectively? How can MinMax be the best general algorithm for game AIs, when some of them have a branching factor that makes the use of the algorithm almost useless?

I obviously don't have the answers to these questions, but I may have an opportunity to explore them. Hopefully I will be less lazy than I usually am and invent something completely unscientific, but totally fun! Wish me luck!

and has 12 comments
I wanted to take an arbitrary XML format and turn it into C# classes. I even considered for a while to write my own IXmlSerializable implementation of the classes, but quickly gave up because of their large number and heavy imbrication. Before we proceed you should know that there are several ways in which to turn XML into C# classes. Here is a short list (google it to learn more):
  • In Visual Studio 2012, all you have to do is copy the XML, then go to Edit -> Paste Special -> Paste XML as classes. There is an option for pasting JSON there as well.
  • There is the xsd.exe option. This is usually shipped with the Windows SDK and you have to either add the folder to the PATH environment variable so that the utility works everywhere, or use the complete path (which depends on which version of SDK you have).
  • xsd2Code is an addon for Visual Studio which gives you an extra menu option when you right click an .xsd file in the Solution Explorer to transform it to classes
  • Other zillion custom made tools that transform the XML into whatever

Anyway, the way to turn this XML into classes manually (since I didn't like the output of any of the tools above and some were even crashing) is this:
  • Create a class that is decorated with the XmlRoot attribute. If the root element has a namespace, don't forget to specify the namespace as well. Example:
    [XmlRoot(ElementName = "RootElement", Namespace = "http://www.somesite.org/2005/someSchema", IsNullable = false)]
  • For each descendant element you create a class. You add a get/set property to the parent element class, then you decorate it with the XmlElement (or XmlAttribute, or XmlText, etc). Specify the ElementName as the exact name of the element in the source XML and the Namespace url if it is different from the namespace of the document root. Example:
    [XmlElement(ElementName = "Integer", Namespace = "http://www.somesite.org/2005/differentSchemaThanRoot")]
  • If there are supposed to be more children elements of the same type, just set the type of the property to an array or a List of the class type representing one element
  • Create an instance of an XmlSerializer using the type of the root element class as a parameter. Example:
    var serializer = new XmlSerializer(typeof(RootElementEntity));
  • Create an XmlSerializerNamespaces instance and add all the namespaces in the document to it. Example:
    var ns = new XmlSerializerNamespaces(); ns.Add("ss", "http://www.somesite.org/2005/someSchema"); ns.Add("ds", "http://www.somesite.org/2005/differentSchemaThanRoot");
  • Use the namespaces instance to serialize the class. Example: serializer.Serialize(stream, instance, ns);

The above technique serializes a RootElementEntity instance to something similar to:
<ss:RootElement xmlns:ss="http://www.somesite.org/2005/someSchema" xmlns:ds="http://www.somesite.org/2005/differentSchemaThanRoot">
<ds:Integer>10</ds:Integer>
</ss:RootElement>

Now, everything is almost good. The only problem I met doing this was trying to deserialize an XML containing xsi:type attributes. An exception of type InvalidOperationException was thrown with the message "The specified type was not recognized: name='TheType', namespace='http://www.somesite.org/2005/someschema', at " and then the XML element that caused the exception. (Note that this is an internal exception of the first InvalidOperationException thrown that just says there was an error in the XML)

I finally found the solution, even if it is not the most intuitive. You need to create a type that inherits from the type you want associated to the element. Then you need to decorate it (and the original element) with an XmlRoot attribute specifying the namespace (even if the namespace is the same as the one of the document root element). And then you need to decorate the base type with the XmlInclude attribute. Here is an example.

The XML:

<ss:RootElement xmlns:ss="http://www.somesite.org/2005/someSchema" xmlns:ds="http://www.somesite.org/2005/differentSchemaThanRoot">
<ds:Integer>10</ds:Integer>
<ss:MyType xsi:type="ss:TheType">10</ss:MyType>
</ss:RootElement>

You need to create the class for MyType then inherit TheType from it:
[XmlRoot(Namespace="http://www.somesite.org/2005/someSchema")]
[XmlInclude(typeof(TheType))]
public class MyTypeEntity {}

[XmlRoot(Namespace="http://www.somesite.org/2005/someSchema")]
public class TheType: MyTypeEntity {}
Removing any of these attributes makes the deserialization fail.

Hope this helps somebody.

It is a bit embarrassing not knowing this at my level of software development, but I was stunned to see other people, even more experienced than I, had the same lack of knowledge. Apparently Microsoft SQL Server converts empty or whitespace strings to default values when using CONVERT or CAST. So CONVERT(INT,''), equivalent to CAST('' as INT), equals 0. DATETIME conversion leads to a value of 1900-01-01. And so on. That means that a good practice for data conversion when you don't know what data you may be getting is to always turn whitespace to null before using CONVERT or CAST. Also, in related news, newline is NOT whitespace in T-SQL so LTRIM(CHAR(10)) and LTRIM(CHAR(13)) is not empty string!

Bottom line: instead of CONVERT(<type>,<unknown string value>) use the cumbersome CONVERT(<type>,CASE WHEN LTRIM(RTRIM(<unknown string value>))!='' THEN <unknown string value> END). Same with CAST.

Here is a table of conversions for some values converted to FLOAT:

ValueNormal CONVERTCumbersome CONVERTTRY_CONVERT
NULLNULLNULLNULL
'' (empty string)0NULL0
' ' (whitespace)0NULL0
'
' (whitespace and newlines)
Conversion errorConversion errorNULL
'123'123123123

You might think this is not such a big deal, but in Microsoft SQL 2012 they introduced TRY_CONVERT and the similar TRY_CAST, which return null if there is a conversion error. This means that for an incorrect string value the function would return null for most but empty string, where it would return the default value of the type chosen, thus resulting in an inconsistent behavior.

and has 0 comments
For a personal project of mine I needed to gather a lot of data and condense it into a newsletter. What I needed was to take information from selected blogs, google queries and various pages that I find and take only what was relevant into account. Great, I thought, I will make a software to help me do that. And now, proverbially, I have two problems.

The major issue is that after getting all the info I needed, I was stuck on reading thousands of web pages to get to the information I needed. I was practically spammed. The thing is that there aren't even so many stories, it's just the same content copied from news site to news site, changing only the basic structure of the text, maybe using other words or expanding and collapsing terms in and out of abbreviations and sometimes just pasting it exactly as it was in the source, but displayed in a different web page, with a different template.

So the challenge was to compare two or more web pages for the semantic similarity of the stories. While there is such theory as semantic text analysis, just google for semantic similarity and you will get mostly PDF academic white papers and software that is done in Python or some equally disgusting language used only in scientific circles. And while, true, I was intrigued and for a few days I entertained the idea of understanding all that and actually building a C# library up to the task, I did not have the time for it. Not to mention that the data file I was supposed to parse was growing day by day while I was dallying in arcane algorithms.

In conclusion I used a faster and more hackish way to the same end. Here is how I did it.



The first major hurdle was to clear the muck from the web page and get to the real information. A simple html node innerText would not do. I had to ignore not only HTML markup, but such lovely things as menus, ads, sidebars with blog information, etc. Luckily there is already a project that does that called Boilerpipe. And before you jump at me for linking to a Java project, there is also a C# port, which I had no difficulties to download and compile.

At the time of the writing, the project would not compile well because of its dependency to a Mono.Posix library. Fortunately the library was only used for two methods that were never used, so I just removed the reference and the methods and all was well.

So now I would mostly have the meaningful text of both web pages. I needed an algorithm to quickly determine their similarity. I skipped the semantic bit of the problem altogether (trying to detect synonyms or doing lexical parsing) and I resorted to String Kernels. Don't worry if you don't understand a lot of the Wikipedia page, I will explain how it works right away. My hypothesis was that even if they change some words, the basic structure of the text remains the same, so while I am trying to find the pages with the same basic meaning, I could find them by looking for pages with the same text structure.

In order to do that I created for each page a dictionary with string keys and integer values. The keys would be text n-grams from the page (all combinations of three characters that are digits and letters) and the values the count of those kernels in the Boilerpipe text. At first I also allowed spaces in the character list of kernels, but it only complicated the analysis.

To compare a page to others, I would take the keys in the kernel dictionary for my page and look for them in the dictionaries of other pages, then compute a distance out of the counts. And it worked! It's not always perfect, but sometimes I even get pages that have a different text altogether, but reference the same topic.

You might want to know what made me use 3-grams and not words. The explanation comes mostly from what I read first when I started to look for a solution, but also has some logic. If I would have used words, then abbreviations would have changed the meaning of the text completely. Also, I did not know how many words would have been in a few thousand web pages. Restricting the length to three characters gave me an upper limit for the memory used.


Conclusion: use the .Net port of Boilerpipe to extract text from the html, create a kernel dictionary for each page, then compute the vector distance between the dictionaries.

I also found a method to compare the dictionaries better. I make a general kernel dictionary (for all documents at once) and then the commonality of a bit of text is the number of times it appears divided by the total count of kernels. Or the number of documents in which it is found divided by the total number of documents. I chose commonality as the product of these two. Then, one computes the difference between kernel counts in two documents by dividing the squared difference for each kernel by its commonality and adding the result up. It works much better like this. Another side effect of this method is that one can compute how "interesting" a document is, by adding up the counts of all kernels divided by their commonality, then dividing that to the length of the text (or the total count of kernels). The higher the number, the less common its content would be.