Short version: here is the link to the uploaded .NET regular expression (look in the comments for the updated version).

I noticed that the Javascript code I am using to parse and display PGN chess games is rather slow, so I wanted to create my own PGN parser, one optimized for speed. "It should be easy", I thought, imagining myself getting the BNF syntax for PGN, copy-pasting it into a parser generator and effortlessly obtaining a Javascript parser that would spit out all the secrets of the game. It wasn't easy.

First of all, the BNF notation for Portable Game Notation is not complete. Sure, prose was used to explain the leftovers, but there was no real information about it in any of the "official" PGN pages or on Wikipedia. Software and chess related FTPs and websites seemed to be terribly obsolete or missing altogether.

Then there was the parser generator. Wikipedia tells me that ANTLR is pretty good, as it can spew Javascript code on the other end. I downloaded it (a .jar Java file - ugh!), ran it, pasted BNF into it... and got a generic error. Ten minutes later I learned that ANTLR does not support BNF, only its own notation. Searches for tools that would do the conversion automatically led me to smartass RTFM people who explained how easy it is to do it manually. Maybe they should have done it for me, then.

After all this (and many fruitless searches on Google) I decided to use regular expressions. It might make a lot of sense to have a parser in a language like C#, but the difference in speed between a Javascript implementation and a native regular expression engine should be pretty large, no matter how much they optimize the Javascript engine. OK, let's define the rules of a PGN file, then.

In a PGN file, a game always starts with some tags explaining what the event is, who played, when, and so on. The format of a tag is [name "value"]. There are PGN files that do not have these tags, but then there would not be more than one game inside. The regular expression for the tag section is: (\[\s*(?<tagName>\w+)\s*"(?<tagValue>[^"]*)"\s*\]\s*)+. Don't be scared: it only means maybe some empty space, then a word, some empty space again, then a quoted string that does not contain quotes, then some empty space again, all in square brackets and maybe followed by more empty space, all of this appearing at least once.
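If you want to play with the tag expression on its own, here is a minimal C# sketch; the sample text and the names used are my own, not part of the original expression:

using System;
using System.Text.RegularExpressions;

class TagDemo
{
    static void Main()
    {
        // a made-up tag section; any PGN header would do
        string pgnText = "[Event \"F/S Return Match\"]\n[Site \"Belgrade\"]\n[Result \"1/2-1/2\"]";

        var tagRegex = new Regex(@"(\[\s*(?<tagName>\w+)\s*""(?<tagValue>[^""]*)""\s*\]\s*)+");
        Match match = tagRegex.Match(pgnText);

        // .Net keeps every capture of a repeated group, so all the tags are available
        for (int i = 0; i < match.Groups["tagName"].Captures.Count; i++)
        {
            Console.WriteLine("{0} = {1}",
                match.Groups["tagName"].Captures[i].Value,
                match.Groups["tagValue"].Captures[i].Value);
        }
    }
}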

So far so good; now comes the list of moves. The simplest possible move looks like 1. e4, so a move number and a move. But there are more things that can be added to a move. For starters, the move for black could follow immediately (1. e4 e5) or come a bit later, for example if there are commentaries or variations after the white player's move (1... e5). The move itself has a variety of possible forms:
  • e4 - pawn moved to e4
  • Nf3 - knight moved to f3
  • Qxe5 - queen captured on e5
  • R6xf6 - the rook on the 6th rank captured on f6
  • Raa8 - the rook on the a file moved to a8
  • Na1xc2 - the knight on a1 captured on c2
  • f8=Q - pawn moved to f8 and promoted to queen
  • dxe8=B - pawn on the d file captured on e8 and promoted to bishop


There is more information about the moves. If you give check, you must end the move with a + sign; if you mate, you end it with #; if the move is weird, special, very good or very bad, you can end it with things like !!, !?, ?, !, etc., which are the PGN version of WTF?!. And if that is not enough, there are some numbers called NAG which are supposed to represent a numeric, language independent status code. Also, the letters that represent the pieces are not language independent, so a French PGN might look completely different from an English one. So let's attempt a regular expression for the move only. I will not implement NAG or piece letters for non-English languages: (?:[PNBRQK]?[a-h]?[1-8]?x?[a-h][1-8](?:\=[PNBRQK])?|O(-?O){1,2})[\+#]?(\s*[\!\?]+)?. I know, scary. But it means a letter from the list PNBRQK, one for each possible type of chess piece, which may or may not appear, then a letter between a and h, which would represent a file, then a number between 1 and 8, which would represent a rank. Both letter and number might be missing, since they are only hints about where the piece that moved came from. Then there is a possible letter x, indicating a capture, and, finally, the destination coordinates of the move. An equal sign and another piece letter may follow, in case of promotion. An astute reader might observe that this also matches a rook that promotes to something else, for example; the expression is not completely strict. If this long expression is not matched, maybe something that looks like OO, O-O, OOO or O-O-O could be matched, representing the two possible types of castling, when a rook and the king move at the same time around each other, provided neither has moved yet. And to top it off, we allow for some empty space and the characters ! and ? in order to let chess annotators express their feelings.
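To check that the move expression behaves as described, here is a small C# sketch that anchors it and runs it over the example moves from the list above; the anchoring and the sample list are my additions:

using System;
using System.Text.RegularExpressions;

class MoveDemo
{
    static void Main()
    {
        // the move expression from above, anchored so the whole string must be one move
        var moveRegex = new Regex(
            @"^(?:[PNBRQK]?[a-h]?[1-8]?x?[a-h][1-8](?:\=[PNBRQK])?|O(-?O){1,2})[\+#]?(\s*[\!\?]+)?$");

        string[] samples = { "e4", "Nf3", "Qxe5", "R6xf6", "Raa8", "f8=Q", "dxe8=B", "O-O", "O-O-O", "Qh4#", "Nxe5!?" };
        foreach (string move in samples)
        {
            Console.WriteLine("{0,-8} {1}", move, moveRegex.IsMatch(move) ? "matches" : "does not match");
        }
    }
}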

It's not over yet. PGN notation allows for commentaries, which are bits of text inside curly brackets {what an incredibly bad move!} and also variations. The variations show possible outcomes from the main branch. They are lists of moves that are enclosed in round brackets. The branches can be multiple and they can branch themselves! Now, this is a problem, as regular expressions are not recursive. But we only need to match variations and then reparse them in code when found. So, let's attempt a regular expression. It is getting quite big already, so let's add some tokens that can represent already discussed bits. I will use a @ sign to enclose the tokens. Here we go:
  • @tags@ - we will use this as a marker for one or more tags
  • @move@ - we will use this as a marker for the complicated move syntax explained above
  • (?<moveNumber>\d+)(?<moveMarker>\.|\.{3})\s*(?<moveValue>@move@)(?:\s*(?<moveValue2>@move@))?\s* - the move number, one or three dots, some empty space, then a move. It can be followed directly by another move, for black. Let's call this @line@
  • (?:\{(?<varComment>[^\}]*?)\}\s*)? - this is a comment match, something enclosed in curly brackets; we'll call it @comment@
  • (?:@line@@variations@@comment@)* - wow, so simple! Multiple lines, each maybe followed by variations and a comment. This would be a @list@ of moves.
  • (?<endMarker>1\-?0|0\-?1|1/2\-?1/2|\*)?\s* - this is the end marker of a game. It should be there, but in some cases it is not. It shows the final score or an unfinished match. We'll call it @ender@
  • (?<pgnGame>\s*@tags@@list@@ender@) - The final tokenised regular expression, containing an entire PGN game.


But it is not over yet. Remember @variations@? We did not define it, and with good reason. A good approximation would be (?:\((?<variation>.*)\)\s*)*, which matches something enclosed in round brackets. But it would not work well. Regular expressions are greedy by default, so it would just grab everything from the first opening bracket to the last closing bracket found in the file! Using the non-greedy marker ? would not work either, as the match would stop at the first closing bracket inside a variation. Comments might contain parenthesis characters as well.

The only solution is to better match a variation so that some sort of syntax checking is performed. We know that a variation contains a list of moves, so we can use that, by defining @variations@ as (?:\((?<variation>@list@)\)\s*)*. @list@ already contains @variations@, though, so we can do this a number of times, up to the maximum supported branch depth, then replace the final variation with the generic "everything goes" approximation from above. When we read the results of the match, we just take the variation matches and reparse them with the list subexpression, programmatically, and check extra syntax features, like the move numbers being consecutive.
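Since the tokens have to be expanded anyway, the expansion can also be generated in code. Below is a rough C# sketch of the idea; the inner named groups are left out to keep it readable and the maximum depth of 3 is an arbitrary choice, so this is an illustration of the technique, not the exact expression from this post:

using System;

class PgnPatternBuilder
{
    static string BuildGamePattern()
    {
        const string tags = @"(?:\[\s*(?<tagName>\w+)\s*""(?<tagValue>[^""]*)""\s*\]\s*)+";
        const string move = @"(?:[PNBRQK]?[a-h]?[1-8]?x?[a-h][1-8](?:\=[PNBRQK])?|O(-?O){1,2})[\+#]?(?:\s*[\!\?]+)?";
        const string comment = @"(?:\{[^\}]*?\}\s*)?";
        string line = @"\d+(?:\.|\.{3})\s*" + move + @"(?:\s*" + move + @")?\s*";

        // deepest level: the greedy "everything in round brackets" approximation
        string variations = @"(?:\(.*\)\s*)*";
        string list = "";
        for (int depth = 0; depth < 3; depth++)
        {
            // a @list@ is lines, each maybe followed by variations and a comment
            list = "(?:" + line + variations + comment + ")*";
            // the next level up gets real syntax checking for its variations
            variations = @"(?:\(\s*" + list + @"\s*\)\s*)*";
        }

        const string ender = @"(?:1\-?0|0\-?1|1/2\-?1/2|\*)?\s*";
        return @"(?<pgnGame>\s*" + tags + list + ender + ")";
    }

    static void Main()
    {
        Console.WriteLine(BuildGamePattern());
    }
}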

It is no wonder that on the Regular Expressions Library site there was no expression for PGN. I made the effort to upload mine; maybe other people will refine it and make it even better. Here is the link to the uploaded regular expression. The complete regular expression is here:
(?<pgnGame>\s*(?:\[\s*(?<tagName>\w+)\s*"(?<tagValue>[^"]*)"\s*\]\s*)+(?:(?<moveNumber>\d+)(?<moveMarker>\.|\.{3})\s*(?<moveValue>(?:[PNBRQK]?[a-h]?[1-8]?x?[a-h][1-8](?:\=[PNBRQK])?|O(-?O){1,2})[\+#]?(\s*[\!\?]+)?)(?:\s*(?<moveValue2>(?:[PNBRQK]?[a-h]?[1-8]?x?[a-h][1-8](?:\=[PNBRQK])?|O(-?O){1,2})[\+#]?(\s*[\!\?]+)?))?\s*(?:\(\s*(?<variation>(?:(?<varMoveNumber>\d+)(?<varMoveMarker>\.|\.{3})\s*(?<varMoveValue>(?:[PNBRQK]?[a-h]?[1-8]?x?[a-h][1-8](?:\=[PNBRQK])?|O(-?O){1,2})[\+#]?(\s*[\!\?]+)?)(?:\s*(?<varMoveValue2>(?:[PNBRQK]?[a-h]?[1-8]?x?[a-h][1-8](?:\=[PNBRQK])?|O(-?O){1,2})[\+#]?(\s*[\!\?]+)?))?\s*(?:\((?<varVariation>.*)\)\s*)?(?:\{(?<varComment>[^\}]*?)\}\s*)?)*)\s*\)\s*)*(?:\{(?<comment>[^\}]*?)\}\s*)?)*(?<endMarker>1\-?0|0\-?1|1/2\-?1/2|\*)?\s*)


Note: the flavour of the regular expression above is .Net. Javascript does not support named capture groups (the things between the angle brackets), so if you want to make it work in js, remove the ?<name> constructs from it.

Now to work on the actual javascript (ouch!)

Update: I took my glorious regular expression and used it in Javascript code, only to find out that capture groups in Javascript do not act like collections of everything they matched; a repeated group keeps only its last match. In other words, if you match 'abc' with (.)* (match as many characters in a row as possible, capturing each character separately) you will get an array that contains 'abc' as the first item and 'c' as the second. That's insane!
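For contrast, this is the .Net behaviour the expression was written against: a repeated group exposes a Captures collection with every single capture. A tiny sketch, obviously not part of the original code:

using System;
using System.Text.RegularExpressions;

class CapturesDemo
{
    static void Main()
    {
        // group 1 is repeated; .Net remembers every capture it made
        Match m = Regex.Match("abc", "(.)*");
        foreach (Capture c in m.Groups[1].Captures)
        {
            Console.Write(c.Value + " "); // prints: a b c
        }
        Console.WriteLine();
        // Javascript's 'abc'.match(/(.)*/) would only give you 'c' for group 1
    }
}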

Update: As per Matty's suggestion, I've added the less used RxQ move syntax (I do have a hunch that it is not complete; for example, things like RxN2, RxNa or RxNa2 might also be accepted, but they are not implemented in the regex). I also removed the requirement of at least one PGN tag. To avoid false positives you might still want to use the + instead of the * quantifier after the tagName/tagValue construct. The final version is here:

(?<pgnGame>\s*(?:\[\s*(?<tagName>\w+)\s*"(?<tagValue>[^"]*)"\s*\]\s*)*(?:(?<moveNumber>\d+)(?<moveMarker>\.|\.{3})\s*(?<moveValue>(?:[PNBRQK]?[a-h]?[1-8]?x?(?:[a-h][1-8]|[NBRQK])(?:\=[PNBRQK])?|O(-?O){1,2})[\+#]?(\s*[\!\?]+)?)(?:\s*(?<moveValue2>(?:[PNBRQK]?[a-h]?[1-8]?x?(?:[a-h][1-8]|[NBRQK])(?:\=[PNBRQK])?|O(-?O){1,2})[\+#]?(\s*[\!\?]+)?))?\s*(?:\(\s*(?<variation>(?:(?<varMoveNumber>\d+)(?<varMoveMarker>\.|\.{3})\s*(?<varMoveValue>(?:[PNBRQK]?[a-h]?[1-8]?x?(?:[a-h][1-8]|[NBRQK])(?:\=[PNBRQK])?|O(-?O){1,2})[\+#]?(\s*[\!\?]+)?)(?:\s*(?<varMoveValue2>(?:[PNBRQK]?[a-h]?[1-8]?x?(?:[a-h][1-8]|[NBRQK])(?:\=[PNBRQK])?|O(-?O){1,2})[\+#]?(\s*[\!\?]+)?))?\s*(?:\((?<varVariation>.*)\)\s*)?(?:\{(?<varComment>[^\}]*?)\}\s*)?)*)\s*\)\s*)*(?:\{(?<comment>[^\}]*?)\}\s*)?)*(?<endMarker>1\-?0|0\-?1|1/2\-?1/2|\*)?\s*)

The Regexlib version has also been updated in a comment (I don't know how - or if it is possible - to edit the original).

There was this FTP surrogate program that used SQL as a filesystem. I needed to store the size of the file (which was an HTML template and was stored as NTEXT) in the row where the content was stored. The problem is that a text in a Microsoft SQL Server NTEXT column takes about two bytes per character, while the actual size of the content, stored web-style in UTF8, was different, down to almost half of that.

I thought that there must be an easy way to compute it, trying to cast the string to TEXT then using LEN, trying DATALENGTH, BINARY, etc. Nothing worked. In the end I made my own function, because the size of a string in UTF8 is documented on the Wikipedia page of that encoding: 1 byte for ASCII characters (character code < 128), 2 bytes for codes under 2048, 3 bytes for codes under 65536 and 4 bytes for the rest. So here is the SQL function that computes the size in UTF8:

CREATE FUNCTION [fn_UTF8Size]
(
    @text NVARCHAR(MAX)
)
RETURNS INT
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @i INT = 1
    DECLARE @size INT = 0
    DECLARE @val INT

    WHILE (@i <= LEN(@text))
    BEGIN
        -- the code of the current character
        SET @val = UNICODE(SUBSTRING(@text, @i, 1))

        -- UTF8 byte count per character, as per the encoding rules above
        SET @size = @size +
            CASE
                WHEN @val < 128 THEN 1
                WHEN @val < 2048 THEN 2
                WHEN @val < 65536 THEN 3
                ELSE 4
            END

        SET @i = @i + 1
    END

    RETURN @size
END


A similar approach would work for any other encoding.
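As a side note, if you only need this number on the application side, .Net can compute it directly, which also makes for a quick sanity check of the SQL function. A trivial sketch:

using System;
using System.Text;

class Utf8SizeCheck
{
    static void Main()
    {
        // 'a' is 1 byte, 'é' is 2 bytes and '€' is 3 bytes in UTF8
        string text = "aé€";
        Console.WriteLine(Encoding.UTF8.GetByteCount(text)); // prints 6
    }
}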

I was updating the content of a span element via AJAX when I noticed that the content was duplicated. It was no rocket science: get element, replace content with the same thing that was rendered when the page was first loaded, just with updated values. How could it be duplicated?

Investigating the DOM in the browser (any browser, btw) I've noticed something strange: When the page was first loaded, the content was next to the container element, not inside it. I've looked at the page source, only to see that it was, by all counts, correct. It looked something like this:
<p><div><table>...</table></div></p>
The DOM would show the div inside the paragraph element and the table as the sibling of the paragraph element. The behavior was fixed if the page DOCTYPE was changed from XHTML into something else.

It appears that block elements like div or table are not allowed inside a paragraph element. Browsers attempt to "fix" this problem and so they change the DOM, assuming that if a table starts inside the paragraph, you must have forgotten to close the paragraph. When I added the content via AJAX, the browser did not try to fix it in any way, as I was manipulating the DOM directly and there was no parsing phase.

I have been reviewing my blog posts from the last few months and I noticed a troubling trend: a lot more social commentary and hobby related stuff than actual tech work. Check out these statistics of posts from the last three months:
  • TV and Movie: 5
  • Books: 6
  • Personal or hobby: 6
  • Social commentary: 1
  • Tech: 8
8 is marginally more than 6, but split them between misc and programming and you get 18 misc for 10 programming (with some overlapping). And consider that two of the tech posts were attempts to fix something that did not work so well.

What does this mean? Do I not learn new stuff at work? Am I not interested in tech work anymore? Am I working too much and not having time to blog? Well, it is a bit of all of them. I am interested in tech work, but right now I am fighting to adapt to the new job. I am learning new stuff, but it is more office related than new frontiers of programming. And I am a bit tired as well.

I have been thinking of cool tech stuff to share with you at least in this post, but I could find none. I am reading a lot of blogs with new information about stuff ranging from Windows 8, .Net 5, the future of C# and Visual Studio to videos of Vesta, things that verge on proving the dark matter model is wrong and amazing BIOS rootkits, but that is not what I am doing.

So let me summarize the technical state of my work so far:
  • Scrum - my workplace uses Scrum as a development practice and invests a lot in maintaining the quality of its implementation. I've learned a lot about the advantages, but also the disadvantages of the practice (there is nothing as annoying as an Outlook alert telling you to join the daily scrum meeting when you are concentrating on a task)
  • Visual Basic - as the original application that was bought by my employing company 5 years ago was written in Visual Basic, large portions of it are still VB. That only proves my point that refactoring code should be a priority, not a nice-to-have option. I wonder how many development hours, research hours and hair roots could have been saved if the company had invested in moving the application to a readable and canonical code form. I also wonder if the guy that invented Visual Basic is now burning in hell, as so many devs I've talked to about VB seem to wish.
  • Visual Basic - it just deserves two bullet points, for the bullet reason only at least. Also, try converting C# generic and lambda expression code to Visual Basic. Hilarious!
  • Computing power - I am now working on a laptop that has a quad-core i7 processor, 8GB of RAM and a solid state drive. And I still want it 10 times faster. It seems to me that computing power only keeps up with the size of the software projects and the complexity of the tools used to develop them, so that the total compile time for a project remains constant. Also, if for some reason the company issues you a computer powerful enough to break the constant, they also need to enforce drive encryption so as to compensate.
  • Continuous Integration and Unit Testing - it gives one a good feeling of comfort to know that after "it works on my machine", the source control server can compile, test and run the software successfully (while you are working at something else, no less).
  • Software Patterns - there are people who can think and visualize software patterns. They can architect any piece of code and make it really neat. However, it now seems to me that over-architected software is just as hard to read and follow as non-architected software. Fortunately for me, my colleagues are more the smart "let's make it work" type.


That is about it. No magical silver bullet practices, no amazing software, no technological edge code, just plain software shop work.

When I was a child I watched, wide-eyed, movies like Hackers, enjoying the shenanigans of computer rebels fighting stupid law enforcement and "evil" hackers. Of course, there was also Angelina Jolie. Even then I knew that my pleasure was a guilty one: no way could the police be that stupid, no way would it be that easy to penetrate all kinds of systems and produce effects so flashy. A while after that I watched Skeet Ulrich in the movie Operation Takedown, which was a more realistic hacker movie (and one I think Skeet did a great job in). It depicted how Kevin Mitnick was apprehended by the authorities. I really loved that movie, although it had a lot of eye rolling moments.



Fast forward to now: I have been reading Ghost in the Wires, Kevin Mitnick's book about himself, practically a hacking autobiography, and I loved this book every bit as much as I liked those movies as a kid. Not only could I not put the book down once I started reading it, but I was shocked to see that reality is not that far from what was depicted in hacking movies. It was also interesting to read how the script of Operation Takedown came to be, which Kevin considers defamatory and mostly untrue.



Long story short, Mitnick is a smart kid with a great memory, an absent father and no real friends. He starts dabbling with radio and telephones and manages to get access to phone systems way before computers were personal or connected to each other. He's a kid, though, and he gets caught a few times. Nobody seems to understand he does it just for the fun of it and he can't seem to understand why nobody gets him. In the end, pushed by the desire to challenge himself, but also by authorities baiting him all the time, he becomes a lifelong hacker and eventually gets caught.



A shocking part of the book is how easy it is to penetrate any system, not by any technical wizardry, but by simply tricking people into giving you information and access. Called "social engineering", it was Mitnick's strongest point, and at several points in the book, when the technology would not allow him to enter one system or another, he would just abandon the tech stuff and go with tricking people. Already having knowledge of how to manipulate phone systems made that a lot easier as well.



Another, less shocking, but utterly disappointing part is about the authorities. Just as they are now about file sharing and whatever "crisis" they are in, law enforcement agencies base their entire existence on the pure power of coercion, ignoring the rules that they themselves are enforcing and being motivated only by keeping that power in their hands. Technical morons, they only seem to get into the action when their pride is affected. In this book Kevin Mitnick dances around security personnel, local cops, the FBI and the NSA, several steps ahead of them, but they only seem to really mind when newspapers start publishing articles that make law enforcement look bad. And once they have him, caught only with the help of other hackers, they use all the dirty tricks in the book to bring Mitnick to his knees. Nothing has changed from then to now; just look at cases like Gary McKinnon's. Intimidation is a bully's greatest strength. That's sad.



I would have to say that the most unexpected thing was the tone of the book, which is almost exuberant. Mitnick has not become a bitter and paranoid man after countless personal betrayals and abuses of authority, and he is not angry at all. If anything, the guy is happy to have lived as the lead actor in the "Myth of Kevin Mitnick", which has grown way bigger than the real person. There is a scene where he gets out of a building and there are hundreds of fans there, shouting, and he looks behind him to see if there is a celebrity around.



Bottom line: this is a book you can't miss. It is easy to read to the point of being instantly addictive, it is well written, with enough juicy technical details to keep one interested, and, most of all, it makes you feel good, even in the horrible moments of his detention. It makes one wonder: did Mitnick socially engineer himself into remaining an open and cool guy in the face of adversity? Or did he have this strength all along, and is that his most powerful "magic"?

I was debugging an application when I noticed an exception thrown in one of the legacy DAL modules. A method was using plain ADO.Net to run a stored procedure and fill a DataSet, then proceeded to get an IDataReader for the set, using the CreateDataReader method. On reader.Read() the exception DataTableReader is invalid for current DataTable 'Table' was thrown.



I've investigated the issue, only to notice that the DataSet was correctly retrieved from the database; the only problem came when creating the reader. Any meaningful property threw this error. I've found a thread that discussed this problem here, but the answer was not there. Instead, I read an obscure line somewhere saying that this exception is thrown when the DataSet has changes, and tried that solution: before running CreateDataReader, I ran DataSet.AcceptChanges. And it worked!
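For reference, this is roughly the pattern, with the workaround in place; the connection string and procedure name are placeholders, not the actual legacy code:

using System.Data;
using System.Data.SqlClient;

class LegacyDalSketch
{
    static void ReadWithDataTableReader()
    {
        using (var connection = new SqlConnection("<connection string>"))
        using (var command = new SqlCommand("dbo.SomeProcedure", connection))
        using (var adapter = new SqlDataAdapter(command))
        {
            command.CommandType = CommandType.StoredProcedure;

            var dataSet = new DataSet();
            adapter.Fill(dataSet);

            // the workaround: mark the freshly filled DataSet as unchanged
            dataSet.AcceptChanges();

            using (IDataReader reader = dataSet.CreateDataReader())
            {
                while (reader.Read())
                {
                    // read the values as usual
                }
            }
        }
    }
}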



The strange part comes now: I did the AcceptChanges bit in the Watch window during debugging, not as a permanent change to the code. From then on, the code worked, no matter how many times I ran iisreset or restarted the browser. I've added the call to the code for good measure, but I am still not sure if this is the solution or simply some fluke of the universe. One possible answer is the "race condition" described in this discussion, which also suggests this happens in debug mode only. Strange, innit?



Update: AcceptChanges did not solve the issue on the production server. I am still investigating, but if you know what this is about, please share :)

A colleague of mine asked a question that seemed trivial, but then it revealed interesting layers of complexity: how would you build an algorithm that returns a random number in any integer interval, assuming you already have a function that returns a random binary bit? The distribution of the bit is perfectly uniform, and so should be the distribution of your function.



My first attempt was to divide the interval in two, then choose the first or second half based on the random bit function. This worked perfectly for intervals of even length, but there were issues with odd sized intervals. Let's take the most basic version there is: we want a random number between 7 and 9. The interval has a size of 3, which is not divisible by 2.



One solution is to split it in half anyway, ignoring one number, then use the random bit function one more time to determine the half to which the remaining number should be added. For example, the random bit yields 1, so we add the odd number to the second half: 7,8,9 -> 7 and 8,9. Now the random bit is 0, thus choosing the first half, which is 7. This sounds good enough; let's see how it works:



Possible random bit results:
  • 0 (7,8|9)
    • 0 (7|8)
      • 0 (=7)
      • 1 (=8)
    • 1 (=9)
  • 1 (7|8,9)
    • 0 (=7)
    • 1 (8|9)
      • 0 (=8)
      • 1 (=9)




The interesting part comes when deciding (pun not intended) what type of probability we should consider. From the tree above, if we take the terminal leaves and count them, there are exactly 6. Each of the numbers in the interval appears exactly twice. There is a perfectly balanced probability that a number will appear in the leaf nodes. But if we decide that each random bit run divides the total probability by two, then we have a 50% chance for 0 or 1, and thus the probability that 7 would be chosen is 1/4 + 1/8 (3/8), the same for 9, but then 8 would have only a 2/8 probability of being chosen; not so perfect.



What is the correct way to compute it? As I see it, the terminal leaf way is the external method: the algorithm can end in just 6 possible states and an external observer would not care about the inner workings of the algorithm; the second is an internal view of the use of the "coin toss" function inside the algorithm. The methods could be reconciled by continuing the algorithm even when the function has terminated, until all the possible paths have the same length, something akin to splitting 7 into two 7 nodes, for example, so that the probability would be computed over all the 2 to the power of the maximum tree height options. If the random bit yielded 0, then 0, we still toss the coin to get 000 and 001; now there are 8 terminal nodes and they are divided into 3, 2 and 3 nodes per number in the interval. But if we force this method, then we will never get a perfect result. No power of two is divisible by 3.



Then I came up with another algorithm. What if we could divide even an odd-length interval in two, by multiplying it by two? So instead of solving for 7,8,9, what if we could solve it for 7,7,8,8,9,9? Now things become interesting, because even for a small finite interval length like 3, the algorithm does not have a deterministic running length. Let's run it again:



Possible random bit results:
  • 0 (7,7,8)
    • 0 (7,7,7)
    • 1 (7,8,8)
      • 0 (7,7,8)... and so on
      • 1 (8,8,8)
  • 1 (8,9,9)
    • 0 (8,8,9)
      • 0 (8,8,8)
      • 1 (8,9,9)... and so on
    • 1 (9,9,9)




As you can see, the tree looks similar, but the algorithm never truly completes. At each step there are always exactly two branches on which the algorithm continues. Now, the algorithm does end most of the time, with the probability of ending increasing exponentially with each step, but its maximum theoretical length is infinite. We are getting into Cantoresque sets of infinite numbers, wanting to calculate the probability that a random infinite number would be part of one set or another. Ugh!



And even so, for the small example above, it does seem that the probability for each number is 25%, while there is another 25% chance that the algorithm continues; but if you look at the previous stage, you have a 25% chance for 7 or 9 and no chance for 8 at all. If we arbitrarily stop in the middle of the algorithm, not only does it invalidate the result, but it also makes no sense to compute any probability.



You can look at it another way: this new algorithm is splitting probability in three equal integer parts, then it throws the rest into the future. It is a funny way of using time and space equivalence, as we are trading interval space for time. (See the third and last algorithm in the post)



My conclusion is that the internal method of computing the probability of the result was flawed. As a black box operator of the algorithm I don't really care how it spews its output, only that it does so with as perfect a probability as possible (pun, again, not intended). That means that if I use the algorithm two times there is no way it can output equal amounts of the three values. The probability can't be computed like that. If we use it a million times we would expect roughly 333333 occurrences of each value, but still one would be off one way or another. So the two algorithms are just as good.



Also, some people might ask: how can you possibly use the second algorithm for large intervals? You are not going to work with arrays of millions of items for million-sized intervals, are you? In fact, you only need five values for the algorithm: the limits of the interval (a and b), the amount of lower edge values (p), the amount of higher edge values (r), and the amount for any number in between (q). Example: 7778888888899999 means a=7, b=9, p=3, q=8, r=5. You split this in two and (for a coin toss of 0) you get 77788888: a=7, b=8, p=3, q (don't care at this point), r=5. In the next step of the algorithm you multiply p, q and r by two, when the total count is odd, and you go on until a=b.



You can consider a simpler version, though: there are three values in the interval, so we need a power of two that is at least as big as three. That means four, so two coin tosses. If the coin tosses give 00, the result is 7; if they give 01, the result is 8; for 10, the result is 9. What happens when you get 11? Well, you run the algorithm again.
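Here is a minimal C# sketch of that last version, rejection included; the RandomBit stand-in uses System.Random, since the post assumes the perfect bit function already exists:

using System;

class RandomInterval
{
    static readonly Random rng = new Random();

    // stand-in for the perfectly distributed random bit function assumed by the post
    static int RandomBit()
    {
        return rng.Next(2);
    }

    // uniform integer in [min, max]: build a number from just enough bits and
    // reject anything outside the interval (the "11 means try again" case)
    static int RandomInRange(int min, int max)
    {
        int size = max - min + 1;
        int bits = 0;
        while ((1 << bits) < size)
        {
            bits++; // smallest power of two that covers the interval
        }

        while (true)
        {
            int value = 0;
            for (int i = 0; i < bits; i++)
            {
                value = (value << 1) | RandomBit();
            }
            if (value < size)
            {
                return min + value;
            }
            // otherwise reject and toss the coins again
        }
    }

    static void Main()
    {
        for (int i = 0; i < 10; i++)
        {
            Console.Write(RandomInRange(7, 9) + " ");
        }
        Console.WriteLine();
    }
}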

I needed to pass an array of IDs to a stored procedure on SQL Server 2008. This version of the server supports user defined table types and, of course, a way to access them from .Net. A comprehensive resource for sending arrays to any version of SQL Server can be found here.



Long story short, for 2008 you first define a user defined table type that has a single int column (we are talking about an array of integers here, obviously), then a stored procedure that takes a parameter of that type. A way to send the array from .Net code is detailed here. As you can see, you create an array of something called SqlMetaData, holding the information of each column as defined in the user defined type, then you use an SqlParameter with SqlDbType Structured and with TypeName set to the name of the user defined table type in SQL Server. The parameter value will be a list of SqlDataRecord instances that have the integer values in their first column. Yes, there is an even longer story and I consider this short :-P



All nice and easy, but there is a caveat, something that is not immediately obvious from the code. The column metadata is set on each of the records that are added to the SQL parameter value list. What if the list is empty? In this case it appears that there is a bug somewhere: the stored procedure fails, I guess because it does not receive the structure of the user defined table declared in the metadata and cannot map it to the user defined type.



A solution for this is to add a dummy SqlDataRecord with no values and then, in the stored procedure, check for NULL. A very ugly solution. The solution on Erland Sommarskog's blog did not say anything about this specifically, but I did find this: There are a few peculiarities, though. This does not work:

EXEC get_product_names NULL


but results in this error message:

Operand type clash: void type is incompatible with integer_list_tbltype


It is quite logical when you think of it: NULL is a scalar value, and not a table value. But what do you think about this:

EXEC get_product_names


You may expect this to result in an error about missing parameters, but instead this runs and produces an empty result set!
Therefore the solution I used was to check in code whether the .Net list of integers was empty and, in that case, not send the parameter to the stored procedure at all. And it worked.
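Here is a rough C# sketch of that approach; the parameter name, the column name "Id", the type name "dbo.IntegerListType" and the idea of wrapping it in a helper are placeholders of mine, standing in for whatever you defined in the database:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

class IntListParameter
{
    static void AddIdListParameter(SqlCommand command, IList<int> ids)
    {
        // the workaround: when the list is empty, send no parameter at all
        // and the stored procedure will see an empty table
        if (ids == null || ids.Count == 0)
        {
            return;
        }

        var metaData = new[] { new SqlMetaData("Id", SqlDbType.Int) };
        var records = new List<SqlDataRecord>();
        foreach (int id in ids)
        {
            var record = new SqlDataRecord(metaData);
            record.SetInt32(0, id);
            records.Add(record);
        }

        var parameter = command.Parameters.Add("@ids", SqlDbType.Structured);
        parameter.TypeName = "dbo.IntegerListType";
        parameter.Value = records;
    }
}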

Ok, I am cheating now. I was feeling bad for not playing chess lately (or playing badly when I had other stuff to do, generating even more guilt) and having nothing to blog about except maybe books and also thinking about all the other directions of the blog that I failed to cover: programming, music, tech news.

So I bring you Brute force or intelligence? The slow rise of computer chess, which is an article about chess; it is from Ars Technica (tech news) and it involves some notions of programming. All I need for this to be complete is music!

Seriously now, I went to a friend's place last night and played a bit of chess. We were both a little tired and drunk, so we played chess "for fun" (which translates to incredibly badly), but it really felt fun, as opposed to playing a computer at a very low level. Why is that? I believe it is all about prioritization.

When a human plays, he tries to use the principles of chess, but he doesn't have the time or mental resources to take each one and analyse every piece or position. Humans do use subconscious mechanisms to quickly scan a board, but those only come with a lot of chess training. So basically, what a beginner human player is left with is finding a strategy that would quickly and (preferably) forcibly win the game. That means we use something akin to the "Type B" algorithm from the article above. But it's not quite that either, because it is a bit of everything, something that is traditionally hard to implement in a computer program (and that has more to do with the psychology of programming engineers than with a specific level of difficulty). Basically we look at the pieces, prioritised by their power and reach as well as their position relative to an area of attack or defence. That is why we don't notice the queen or bishop in the corner of the board: looking in ever wider circles around the area we are focused on, we suddenly stop and start doing something else. Compare that with a computer, which can take the measly 32 pieces on the board and compute, in a few fractions of a second, all their possible moves and the resulting board positions.

Then, when we see a possible good move, we take it forward as many steps as we can. Does a chess beginner build a comprehensive tree of all possible moves in that scenario? Of course not. Not only do we not see all (or most) of the moves, but when we see a possibility for the opponent to play a counter move, we quickly assess the likelihood that the other guy would see it, and sometimes we even gamble that he won't do it, just because we wish he didn't. This is also psychological: the gambler way of thinking has been documented for a while; gamblers are motivated by loss, which gives them more of an adrenaline rush than winning, or which makes winning ever sweeter; also, the guy we play with is probably our friend and we partly root for him as well. Program that into a computer! I've had games where I took huge risks in the hope that my friend would a) not see the move, which would make me look good when playing a cool game and b) see the move, making his game look cool, thus making the entire session interesting.

Back to programming, I think that the easiest way of implementing this kind of bad human play in a computer game is to take a normal computer algorithm for playing chess, like mini-max, then program a sort of Alzheimer routine that removes bits of its reasoning based on a probability computed from the following factors: the proximity of pieces to a region of interest (which would also have to be defined, but let's just assume it would be the average of the positions of the pieces involved in the current line of thought), the artistic value of a line of thought (which would be defined either by massive sacrifices for important gains, or by how severely we limit the opponent's options - in other words: power), the probability that the opponent would see a move (computed based on the current history of play) and also the artistic value of the entire game, as described in the previous paragraph.

In other words, what I am proposing here is that we have a perfect algorithm for playing chess, one that is limited by computing power alone. What we don't have is a good algorithm for bad play, for fun play. Most computer programs I've seen, including ChessMaster, which boasts of its ability to simulate human players of varying abilities, have incredibly stupid ways of limiting performance. For example: a knight wants to attack f7, the black soft spot, and it has plans to move a bishop there as well. I move a pawn to prevent the bishop from attacking that spot and the computer takes with the knight, sacrificing a minor piece for a pawn and my king's ability to castle. Or a rook attacks a knight, then takes the knight, even though it is defended. In other words, random, pointless moves. Every human move is purposeful, even if the purpose is flawed by bad judgement. Random moves won't do; they have to be moves that follow a plan, no matter how bad that plan is. We need a perfect algorithm for throttling the chess playing level. We need to look at bad human games, make a chess database out of them, extract rules for bad play and implement them into computers.

I guess it is finally official: I am now a corporate employee. While the previous company I worked with was nice in terms of the people there and the technology used, I got bored. I blame myself for getting depressed when assigned disconnected UI tasks and when singled out socially. It shouldn't have mattered. Surely I could have worked on overcoming adversity and improving my development methods, no matter how boring the task at hand.

However, bored I did get, and when a big corporate company approached me with a job offer, I was intrigued. This is a long story, though, because I passed their phone screening, their six-hour-long technical interview and got the approval of the top brass in yet another interview, all some time at the end of March. This coincided with my birthday, so I thought of it as a present to myself: an opportunity to learn new things, work in an environment I was scared of, but which was different and exciting, not to mention the better pay, although that didn't matter that much.

So, why am I writing this blog entry now, at the end of July? Because I only got hired two days ago. Budgetary strategy, corporate decisional speed and pure bad luck (I hope) pushed the employment date back by four stressful and uncertain months. And I am not even fully employed; I am a contractor working through an intermediary for the time being.

I can't tell you yet how things truly are in the new company. People are certainly more professional and yet relaxed, not at all like the stick-in-the-ass image I had (well, most of them). Frankly, these people are more geek and less social monkey than some of the juniors at my last job, which is great. On the other hand, until I start actual work (which will take another two weeks of gruelling meetings and annoying bureaucracy) I will not know how (and if) this company gets anything done.

Certainly, a quad-core laptop with 8GB of RAM and an SSD will decrease development time (I used to watch movies and read books while compiling projects at the old job). They also seem very communicative (to the point of never stopping talking about a project), which is something I am less used to and which I gladly welcome. They encourage and help with personal development and good development techniques, like TDD and a commitment to Scrum. And if you don't know something, people do not sneer, but offer to help. So far, I can't complain (and you know me, I am so good at it).

I will be working on an ASP.Net CRM project, something evolving from an older VB ASP.Net 1.0 thing to a C# ASP.Net MVC monster. Hopefully, this will reignite my passion for development, rather than reassert my disgust with web work. So you will see Javascript and ASP.Net posts again soon and not so much WPF. Too bad, I really liked that particular technology.

So, wish me luck!

I was working on the chess board support for my blog, which involves a third party javascript script. The idea was that, if any element on the page is of class pgn, the parser should try to read its contents and transform them into a chess board.

Traditionally, what you would do is:
  1. load the external scripts
  2. load the external css files
  3. run a small initialization script that finds all the pgn elements and applies the transformation on them
Now, imagine that I am doing several processes like this. In some blog posts I will talk about chess, but in others I might display a chart or add extra functionality. Wouldn't it be nice if I could load the external scripts only when I actually need them?

In order to do that, I must reverse the order of the steps: if I find pgn elements, I load the external scripts, and when they are loaded, I execute the initialization script. Long story short, here is a javascript function that loads an external script (or several) and then executes a function when they are loaded:

function loadScript(src, onload) {
    if (src && typeof(src) != 'string' && src.length) {
        // in case the first parameter is a list of items
        var loadedScripts = 0;
        // execute the load function only if all the scripts have finished loading
        var increment = onload
            ? function() {
                  loadedScripts++;
                  if (loadedScripts == src.length) {
                      onload(null);
                  }
              }
            : null;
        for (var i = 0; i < src.length; i++) {
            loadScript(src[i], increment);
        }
        return;
    }
    var script = document.createElement('script');
    script.type = 'text/javascript';
    // HTML5 only, but it could work
    script.async = true;
    script.src = src;
    if (onload) {
        if (typeof(script.onload) == 'undefined' && typeof(script.readyState) != 'undefined') {
            // IE
            script.onreadystatechange = function() {
                if (script.readyState == 'complete' || script.readyState == 'loaded') {
                    onload(script);
                }
            };
        } else {
            // not IE
            script.onload = function() {
                onload(script);
            };
        }
    }
    document.body.appendChild(script);
}


I hope it helps.

I've met a very interesting WPF bug today, something that is hard to explain or reproduce, but might give terrible headaches if you don't know its source.

I had a WPF UserControl, with its xaml and cs files. Now, I know that in MVVM I shouldn't really use those much, but it was out of my control. The control had a resources section (UserControl.Resources) in which there was a ResourceDictionary with some stuff in it. Considering that I had removed all the merged dictionaries from it, I thought I had no need for the ResourceDictionary tags; after all, the Resources property of an element is already a ResourceDictionary. So it was something like this:

<UserControl ... >
<UserControl.Resources>
<!-- <ResourceDictionary> with these tags commented the error occurs -->
... stuff ...
<!-- </ResourceDictionary> -->
</UserControl.Resources>
</UserControl>

The error itself is that, during compilation, the partial user control class defined in the code behind doesn't seem to find things from the xaml. Probably the compiler fails to build the xaml into a class, but it fails silently, while the code behind ends up completely disconnected from the xaml because it is the only partial file left for that class name.

By selectively removing items from the resources I've narrowed it down to one of the converters. It was created using the MarkupExtension trick, but it was also declared as a resource for some reason. I do not see why that should matter, but still.

Bottom line: when the partial code behind class for a WPF user control (or maybe for a window as well) fails to connect to the xaml, it means the XAML compilation failed silently and you should check the resources of the elements therein.

I've finally finished reading Pro ASP.Net MVC Framework by Steven Sanderson. The book is slightly dated, since it discusses the technology used in Visual Studio 2008 and without any mention of the new Razor engine, but these are details that are not important to the content of the book anyway. I can say that it is a very nice book and it was worth reading, especially the first part.

There are two parts to this book, the first being a TDD ASP.Net MVC web shop application built step by step and explained line by line. It goes through some Domain Driven Design concepts as well, it does unit testing and mocking, and it even shows off a little dependency injection via Castle Windsor. What I liked most, though, is how painstakingly thorough Sanderson was in explaining every single detail. He didn't assume anything as he documented every step of the way, down to what lambda expressions are and which .Net features he was using.

The second part of the book is a little less readable, as it goes through the classes and features of ASP.Net MVC, complete with methods, properties and small samples. I highly recommend reading this part while actually experimenting with the framework on the computer. Even if you do not, this part of the book remains a very valuable reference for when you do. In this section of the book you can learn about data entry, Ajax and partial updates, application security and deployment, even how to mix classic ASP.Net with MVC, though not really recommended.

The bottom line is that Pro ASP.Net MVC Framework is a must read for a developer learning ASP.Net MVC. There is an updated version of the book for VS2010 and .Net 4 that I think that I will also read (the book was so good). Here is the link for Pro ASP.NET MVC 2 Framework.

A quick post here about using a ContentPresenter (or a ContentControl which uses a ContentPresenter in its template) with its Content property. The intended usage of ContentPresenter is to set the Content to some binding to a data object, then control the element tree via the ContentTemplate property. That may lead to a counterintuitive situation when you want to specify some UI element content and then use bindings in that content. Let's take an example:

<!--
This ContentControl has a MainViewModel class as a DataContext.
The MainViewModel class exposes a MyButtonCommand property.
-->
<ContentControl>
<ContentControl.Content>
<Button Command="{Binding MyButtonCommand}">Press me!</Button>
</ContentControl.Content>
</ContentControl>
You may expect to press the button and execute the command, but it doesn't work. In fact, the binding on the Command property will fail.

Here is a working example:

<!--
This ContentControl has a MainViewModel class as a DataContext.
The MainViewModel class exposes a MyButtonCommand property.
-->
<ContentControl Content="{Binding MyButtonCommand}">
<ContentControl.ContentTemplate>
<DataTemplate>
<Button Command="{Binding}">Press me!</Button>
</DataTemplate>
</ContentControl.ContentTemplate>
</ContentControl>


I realize this is not what most of you have in mind when using a ContentControl. Another solution is to use the Content as in the first example, but add an explicit DataContext property to it before using any binding, something like this:

<!--
This ContentControl has a MainViewModel class as a DataContext.
The MainViewModel class exposes a MyButtonCommand property.
-->
<ContentControl>
<ContentControl.Content>
<DataTemplate>
<Button
DataContext="{Binding DataContext,RelativeSource={RelativeSource AncestorType={x:Type ContentControl}}}"
Command="{Binding MyButtonCommand}">Press me!</Button>
</DataTemplate>
</ContentControl.Content>
</ContentControl>
In this case, though, you specify the DataContext as an ugly binding and, worst of all, you cannot set it via the ContentControl, but you need to access the actual content.

Perhaps another solution, one that would involve a custom DataTemplateSelector on the ContentControl would work, but right now I have no perfectly satisfactory solution.

A colleague of mine started using a control I made in which there was a Hyperlink. Well, the purpose of it was not complex, I just needed a text that can be clicked. A Hyperlink sounded like the only solution out of the box, since there is no LinkButton control in the standard WPF controls. That aside, after I finish writing this blog post, I will write myself a LinkButton control to use in these situations instead, since Hyperlink seems to have several design flaws. It's not even a control, but a flow element.

What my colleague reported is that the Hyperlink would not get enabled in certain situations, and we've come to the conclusion that, after the initialization of the control, the IsEnabled property of the Hyperlink does not change when an ancestor control changes its enabled state. The only way to force it is to actually bind it to an ancestor's IsEnabled.

Here is the scenario: you place several items in a XAML file: a text in a Hyperlink, in a TextBlock (since Hyperlinks cannot be directly part of a Panel, first design flaw), in a StackPanel. The text in the text block will appear as a clickable link. Set the StackPanel to IsEnabled="False" and the text will appear as disabled. Now, add a ToggleButton and bind its IsChecked property to the StackPanel's IsEnabled property. Click the button and the StackPanel will get disabled, but not the hyperlink. Start with a disabled StackPanel and the link will be disabled; click the button and the hyperlink will stay disabled. The solution: set on the Hyperlink, inline or via a style, IsEnabled="{Binding IsEnabled,RelativeSource={RelativeSource AncestorType={x:Type FrameworkElement}}}". Now that is ugly.

As a sideline, whenever you see a WPF element inexplicably disabled and you use Snoop on it and try to set IsEnabled to true and you can't, there is probably one of two situations:
  1. A parent of the control is disabled
  2. The control is implementing ICommandSource and its Command property is set to an ICommand that returns false from its CanExecute method