Fast String Distance (SIFT) Algorithm

Published Nov 28, 2006

Posted in
programming
essay

This article is obsolete; a better version of the algorithm has been published: Sift3

While researching different ways of measuring the distance between two strings, or how different they are, I of course found the Levenshtein algorithm. The problem with it is that it is slow. Searching more, I saw some algorithms that seemed fast, but I didn't have the time or brain power to understand them. So I've devised my own algorithm, called SIFT. You might think the name comes from Siderite's Intelligent and Fast Technique, but it comes from the English word 'sift'. :)
How does it work? Well, the common scenario in comparing strings is that someone made a mistake, a typo. So in principle, the two strings should be very similar in order to be worth comparing them. So what I do is this:

foreach phase
    remove identical overlapping characters
    shake strings
return number of shakes + the length of the longest string between the two.

There is an optimisation towards the safe side: if the sift similarity is big enough, perform the costly Levenshtein distance.

Ok, it might not be so clear, let's take an example:
INTERESTING
INFORMATIVE

Step 1: remove all identical overlapping characters (sift)
TEESNG
FOMAVE

Now we have smaller words to check. Let's suppose there was a typo; that means that part of one word is offset by one or maybe two characters from the other. So we move them a bit. That's a 'shake'.

Step 2: shake
TEESNG
[]FOMAVE

Oops, no overlapping characters. We do this one or two times more and there is no result, so...

Step 3: return result
MaxLength(TEESNG, FOMAVE)=6

There you have it. The sifting algorithm, because it resembles sifting grain.
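
A minimal JavaScript sketch of the steps above, under some assumptions: the names siftOnce, siftDistance and maxOffset are made up for illustration, a 'shake' is taken to mean shifting one string by up to maxOffset positions in either direction, and the result follows the worked example (only the length of the longest remaining string is returned, without counting shakes). It is not the original SIFT source code.

```javascript
// Remove the characters that are identical at the same position when
// b is shifted `offset` places to the right of a (negative: to the left).
function siftOnce(a, b, offset) {
  const dropA = new Set();
  const dropB = new Set();
  for (let i = 0; i < a.length; i++) {
    const j = i - offset; // index in b that sits under a[i]
    if (j >= 0 && j < b.length && a[i] === b[j]) {
      dropA.add(i);
      dropB.add(j);
    }
  }
  return [
    a.filter((_, i) => !dropA.has(i)),
    b.filter((_, j) => !dropB.has(j)),
  ];
}

// Assumed reading of the algorithm: sift once without any offset,
// then shake by 1..maxOffset in both directions, sifting after each shake.
function siftDistance(s1, s2, maxOffset = 2) {
  let a = Array.from(s1);
  let b = Array.from(s2);
  [a, b] = siftOnce(a, b, 0);
  for (let shift = 1; shift <= maxOffset; shift++) {
    [a, b] = siftOnce(a, b, shift);
    [a, b] = siftOnce(a, b, -shift);
  }
  // What is left could not be matched; the longest leftover is the distance.
  return Math.max(a.length, b.length);
}

console.log(siftDistance("INTERESTING", "INFORMATIVE")); // 6, as in the example
```

With identical strings everything is sifted out at the first step and the distance is 0; a single dropped letter, as in "hello" versus "helo", is caught by the first shake and gives 1.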

Tests have shown it to be pretty close to Levenshtein, at least in the cases that matter, while being substantially faster.

Running processes in background

Published Nov 27, 2006

Posted in
.NET
programming
C#

and has 0 comments

Long story short: the BackgroundWorker object, available in .NET 2.0.
This is a Microsoft tutorial on using BackgroundWorker:
How to: Run an Operation in the Background
This is an older and more basic Windows Forms tutorial on multithreading:
Safe, Simple Multithreading in Windows Forms, Part 1
Safe, Simple Multithreading in Windows Forms, Part 2

Details:
BackgroundWorker has the DoWork, ProgressChanged, RunWorkerCompleted, and Disposed events. You need to assign at least one method for DoWork and one for RunWorkerCompleted, then run

bw.RunWorkerAsync(obj);

The DoWork method should do something like

e.Result=BackgroundOperation(obj);

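while the RunWorkerCompleted method should do anything related to the GUI. There is also a CancelAsync() method to request that a background operation stop; it requires WorkerSupportsCancellation to be true, and the DoWork method has to check CancellationPending itself.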

Also, here is an article about a possible bug in BackgroundWorker, but I haven't replicated it on my computer.

Building a GridView, DataGrid or Table with THEAD, TBODY or TFOOT sections

Published Nov 22, 2006

Posted in
.NET
ASP.NET
programming

and has 10 comments

There are 726 articles on Google when you search "gridview thead". Most of them, and certainly all the first ones, talk about not being able to render thead, tbody and tfoot elements for .NET 2.0 table-based controls. But it's not so!

Each table row has a property called TableSection. If you set it to TableRowSection.TableHeader, TableBody or TableFooter, the specific tags will be created. Let me show a quick example of creating a THEAD element in a gridview:

gridView.HeaderRow.TableSection=TableRowSection.TableHeader;

And that's it. This kind of behaviour works for the Table WebControl and everything that derives from it or uses it to render itself.
However, the rendering of these elements inside the Table control is done simply with writer.RenderBeginTag(HtmlTextWriterTag.Thead), so there is no way to change the attributes of those sections from .NET code. You can't have it all! You can use CSS, though. For example:

.tableClass thead {
  position:relative;
}

Multiple IE versions on the same computer

Published Nov 20, 2006

and has 1 comment

Usually, when I decide to blog on something, I do the testing and researching and installing first, then blog about it. But now I intend to use this cool site:
http://tredosoft.com/Multiple_IE
which boasts of installing all IE versions since 3.0 (oh, beloved 3.0) on the same computer without any problems. Since I am not a trusting guy, I am blogging about it beforehand; then, if no one ever hears from me again, it means that no browser on any computer worked anymore after this :)

Well, without further ado, let me proceed :-SS
Extra info: http://www.positioniseverything.net/articles/multiIE.html

Step 1: Installing IE7.
Of course I had to validate my copy of Windows to download it, then I had to download all updates (even though I went to Windows Update right before installing IE7), then wait until it searched my computer for malicious software, then install everything. You can't imagine a smoother installer. It just tells you to wait and does everything in the background, showing you meaningless text labels, a cool progress bar and, of course, asking you to close everything before and restart Windows after the installation.
But it worked, and I am not writing this from Firefox :)

Step 2: Installing multiple-ie-setup.exe
Wow! It took around 1 minute to install everything. Of course, not everything is going as smoothly as planned. First of all, I can't see Blogger (and supposedly not any other cookie-using site) in IE6.0. Then, it redirects me to a nocookies.html file that doesn't exist :-/ But that's a Blogger issue. The Options menu in IE6.0 is actually the IE7.0 menu and the settings for cookies cannot be overridden. Actually, you can override them, but the settings won't save.
After looking at the TredoSoft site, I've noticed that this bug is considered solved, even if some people seem to continue to have problems. So I've tested more thoroughly. Session variables seem to be saved, but Blogger continues to take me to the nocookies page. Also, AjaxPro, the ajax library I am using, doesn't seem to work with IE 6.0.

All in all it seems a pretty functional program. However, the type of sites that I am building have certain characteristics that seem not to work with it. I will try to log on to TredoSoft and get the problem fixed, but I guess Yousif did the best he could so far and resolving every issue I have will be hard if not impossible.

TransactionScope in .NET

Published Nov 7, 2006

and has 0 comments

Amirthalingam Prasanna's article about the transactional model in .NET 2.0.

As far as I understand, the old explicit ADO.NET Begin/RollBack/CommitTransaction model has become obsolete and a new TransactionScope model is used in .NET 2.0. You have to add the System.Transactions.dll file to your references.

Basically, the C# code is like this:


using (TransactionScope scope=new TransactionScope
                       (scopeOption,transactionOptions,interopOption)) {
    // do database ops

    // if everything is alright
    scope.Complete();
}

scopeOption is an enum of type TransactionScopeOption, with the options Required (requires a transaction and uses the existing one if there is already one open), RequiresNew (always opens a new transaction) and Suppress (don't use a transaction, even if one is open).

transactionOptions is of type TransactionOptions which has two interesting properties: IsolationLevel and Timeout.

interopOption is an enum of type EnterpriseServicesInteropOption and specifies how distributed transactions interact with COM+ transactions.

But what about the old .NET 1.1 framework? Doesn't it have something like that? Here comes Alexander Shirshov with help:
TransactionScope in .NET 1.1

Sort datatable with the Select method when column names contain commas

Published Oct 27, 2006

Posted in
.NET
programming
C#

and has 0 comments

Well, basically, you can't do it.
I am looking at the internal ParseSortString(string sortString) in the DataTable object of .NET 1.1, where the first thing the method does is to split the string by commas, then check for square brackets. This ensures that there is no way to sort datatables by columns with commas in their names. The funny thing is that the filtering expression is treated like royalty by an ExpressionParser object, and allows column names with commas inside.
Now let's check the code for .NET 2.0. It's identical.

The solution is a bit of annoying code like this:

private DataRow[] SelectSafe(DataTable dt, string filter, string sort)
{
  var columns = new string[dt.Columns.Count];
  for (var c=0; c<dt.Columns.Count; c++)
  {
    columns[c] = dt.Columns[c].ColumnName;
    if (dt.Columns[c].ColumnName.IndexOf(',')>-1)
    {
      dt.Columns[c].ColumnName = dt.Columns[c].ColumnName.Replace(',', ';');
      // assume that the column name was bracketed correctly in the select
      sort = sort.Replace(
        "[" + columns[c] + "]",
        "[" + dt.Columns[c].ColumnName + "]");
    }
  }
  var dr = dt.Select(filter, sort);
  for (int c=0; c<dt.Columns.Count; c++) {
    dt.Columns[c].ColumnName = columns[c];
  }
  return dr;
}

I am sure there is more elegant code, but this seems to be the only solution so far except manually sorting a DataTable.

MSDN Briefing Bucharest 24th of October 2006

Published Oct 25, 2006

Posted in
misc
news
programming

and has 0 comments

The whole meeting took place in the Titulescu room at RomExpo. They had 4 desks that spanned the alphabetical ordering of the software firms participating, so the entry was really comfortable. They gave us a pen and a little notebook to take notes, too. The whole meeting lasted from 9:00 to 16:30, then there was an hour of free talks.
My general impression of the briefing was good. The presenters were enthusiastic and talked about application development on Windows Vista with .NET 3.0, Sharepoint and Office 2007. The new Microsoft XML office formats were presented, along with programmatic methods of accessing and creating them, how to mix Sharepoint and Office in order to create quick Excel based web sites, etc. The most interesting part, though, was of course the last. It presented the advances in programming technology like the C# 3.0 features and ADO.NET vNext. Too bad I've already read about those technologies, but the enthusiastic presentation mode (alas, SPOKEN TOO LOUD) was refreshing.
The information was as compacted as possible, but there was too little code for me, even if most presenters seemed to have the same hatred for marketing slides as I do.
There were also 30 minutes of coffee break and 1 hour of lunch. The lunch food was very good and varied, from Chinese appetizers like sesame meatballs and Shanghai chicken to sandwiches, salads and sweets. Taking into account that I've been to a similar Microsoft thing in Milano, where they barely gave us some sandwiches in plastic bags, this was truly great.

NullReferenceException in GridView BuildCallbackArgument when trying to render it dynamically with paging

Published Oct 23, 2006

Posted in
ASP.NET
programming
C#

and has 9 comments

Whoa! Big title there. This article applies to errors in ASP.NET 2.0 when trying to dynamically render a gridview with paging.

I have been trying to "Ajaxify" my web programming and I've found that the easiest method is to wrap everything I need in Web User Controls, render the controls, then send the rendered string through ajax to fill some innerHTML.
Well, the fine people at Microsoft thought otherwise. I have been trying to render a web user control with a gridview inside it for 3 hours now and nothing helped. I kept getting a NullReferenceException when calling GridView.DataBind and the StackTrace showed it originated in the BuildCallbackArgument(int) method.
Nothing helped except actually decompiling the code of the GridView itself. I've found out that it called a method in the Page of the control; in other words, it needed a page. I gave it a page, but then another problem occurred that I'd already solved here. I already had a parent page that overrode the offending method, so the final code is this:

ParentPage pp=new ParentPage();  // where ParentPage is a Page with VerifyRenderingInServerForm(Control control) overridden so it doesn't do anything
pp.EnableEventValidation = false; //another silly error
uc.Page = pp; //uc is the user control needing rendering
uc.Bind();

Javascript replace function

Published Oct 19, 2006

and has 0 comments

Unlike .NET C# (or most other programming languages), the Javascript replace function only replaces the first instance of a string. To replace all the instances, you need to use regular expressions.
Simon Willison's Weblog contains a good article on this.

Basically, if you use str1.replace(str2,str3) it will return str1 with the first occurrence of str2 replaced with str3. If you use str1.replace(regexp1,str3) and the regular expression has the g modifier it will return str1 with all matches of regexp1 replaced with str3.

A regular expression looks like: /searchpattern/modifiers.
You can create a regular expression from a string by using the

new RegExp(str2,modifiers)

syntax.

The problem comes when you want to create a regular expression from a variable that may contain characters with a special meaning in regular expressions. Here is Simon Willison's function that "escapes" the string in order to use the RegExp syntax safely, slightly modified to contain the '^' character:


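RegExp.escape = function(text) {
  // the regular expression is built once and cached on the function itself
  if (!arguments.callee.sRE) {
    // characters that have a special meaning inside a regular expression
    var specials = [
      '/', '.', '*', '+', '?', '|', '<', '>',
      '(', ')', '[', ']', '{', '}', '\\', '^'
    ];
    // one alternation of escaped characters: (\/|\.|\*|...)
    arguments.callee.sRE = new RegExp(
      '(\\' + specials.join('|\\') + ')', 'g'
    );
  }
  // prefix every special character with a backslash
  return text.replace(arguments.callee.sRE, '\\$1');
}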

Example: str=str.replace(/\\/g,'\\\\') (replace backslashes with double backslashes)
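
To see the difference between the two forms of replace side by side, here is a small sketch. The helper names escapeRegExp and replaceAllLiteral are made up for illustration and are not part of any library; the escaping is a compact character-class variant of the idea above (it also covers '$' and '|'), not Simon Willison's original function.

```javascript
// Put a backslash in front of every character that is special in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\\/]/g, '\\$&');
}

// Replace every literal occurrence of `search` in `str`.
// Note: `replacement` is still interpreted by replace(), so sequences
// such as $& or $1 inside it keep their special meaning.
function replaceAllLiteral(str, search, replacement) {
  return str.replace(new RegExp(escapeRegExp(search), 'g'), replacement);
}

console.log("a.b.a.b".replace(".", "-"));            // a-b.a.b (first occurrence only)
console.log(replaceAllLiteral("a.b.a.b", ".", "-")); // a-b-a-b
```

Without the escaping step, the "." would be read as "any character" and every character of the string would be replaced.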

Checking your web app against different browser resolutions

Published Oct 17, 2006

and has 0 comments

Use an online type of viewer.
You select the resolution, type in the URL of your site, then sit back and relax.
The problems:
- you don't want everybody to know you are working on a site
- you don't want the site to be open to the public while you work on it
- you need an internet connection

Use an external program.
BrowserSizer is as good as any. It's free and easy to use. You select the resolution and it resizes your IE window accordingly.
The only problem I see is that you need the program. You install it, it messes with your browser settings, etc.

Use a javascript script in the IE addressbar:
```
javascript:db=document.body;bst=db.style.zoom=1;rd=prompt("Width:",800);
db.style.zoom=db.offsetWidth/rd;void('')
```
Unfortunately, the zoom messes up quite a few things, including fixed table headers or custom javascript controls. However, I do believe it is the best solution for most situations.

ASP.NET DefaultButton on LinkButton or ImageButton and Mozilla Firefox

Published Oct 17, 2006

Posted in
.NET
ASP.NET
programming

and has 0 comments

You may experience unexpected results when you open a Web page that is in an ASP.NET 2.0-based application in Mozilla Firefox and the DefaultButton property is assigned to a LinkButton control or an ImageButton control.

Just a reminder of a bug I am likely to encounter. The obvious solution is to use ControlAdapters or inheritance to create LinkButtons and ImageButtons that are rendered as buttons.

FxCop - automatic check of .NET code.

Published Oct 17, 2006

Posted in
.NET
programming

and has 0 comments

I've accidentally stumbled upon FxCop, a free Microsoft tool that analyses the generated .NET code (.exe or .dll) for bad design practices.
While many of the errors and warnings I got were related to casing, a lot of them were not, and they came with links, extended information and solutions. FxCop's rules tell you, for example, to make a member static if it uses none of the instance's properties or members, to use a case-insensitive String.Compare instead of comparing two ToLower strings, to use StringBuilder when concatenating in loops, to use .NET 2.0 constructs instead of 1.1 ones, etc.
I find it at least interesting and I intend to use it in my future software projects.

Virtualize! Good bye, cruel real world!

Published Oct 12, 2006

Posted in
.NET
ASP.NET
programming

and has 0 comments

ASP.NET 2.0 has this nice feature called Virtual Path Providers. What it
actually does is enable you to get your site files from anywhere using an
override of the VirtualPathProvider class.

Virtualizing Access to Content: Serving Your Web Site from a ZIP File
This is a very nice article where a Microsoft guy shows how to run a
complete ASP.NET site from a ZIP archive. Just two lines of code in
global.asax, a standard web config file and a ZIP archive.

This opens up a lot of possibilities, like reading the ASPX or CS files from
a class that creates them dynamically, or reading the files from multiple
sources at once. Yummy!

Nullable types in C# 2.0

Published Oct 10, 2006

Posted in
.NET
programming
C#

and has 0 comments

I vaguely remember reading of nullable types in the C# 2.0 "what's new" documentation, but somehow it slipped by me. Now I've stumbled over this useful new feature and I can explain it for a bit.
Here is the Microsoft explanation.

Basically you can declare any value type as nullable by using the syntax <type>?.
Example: int? x=null;
There is even a nice operator ?? that acts like the SQL isnull function.
Example: int y=x ?? 0;
The two examples above are short for the following:
System.Nullable<int> x=null;
int y=x==null?0:x.Value; OR int y=x.HasValue?x.Value:0; OR int y=x.GetValueOrDefault(0);

GetValueOrDefault([value]) - returns the value or, when the nullable type is null, the specified value (the default value of the type if none is specified)

HasValue - a property, something like "is not null"

There is no IsNull method on the Nullable type. Also, x=null makes x==null true, as opposed to, let's say, the SqlInt32 type.

ASP.NET 2.0 Parser Error Message: Access to the path '[something.cs]' is denied

Published Sep 18, 2006

Posted in
.NET
ASP.NET
programming

and has 2 comments

I was stunned today to see that a site that I was working on was not starting because of this idiotic error:
ASP.NET 2.0 Parser Error Message: Access to the path '[something.cs]' is denied.
And further down:
No relevant source lines

The only thing I remembered doing was closing the project in Visual Studio and working on another. I tried starting the site with the URL, without loading it into Visual Studio, and the error occurred. If you search the net for this error, you will see that there are a lot of articles that talk about the indexing service, but I stopped it a long time ago. So what was going on?

The file was accessible

The file was not opened by another application

I could freely delete/remove/modify the file

In desperation I compared the Security settings for this file with other files in the directory. To my surprise, there was a major difference. The files that were accessible had a lot of users with access rights to them, the file that gave the error had around three.

I have no idea what caused this. I just copied the same rights from the other files to the problematic one and it worked. I am using Visual Studio 2005, Resharper and SourceSafe. Do you also hear the Twilight Zone soundtrack?