Chrome extension: Bookmark Explorer

Published Mar 23, 2016

Update 29 August 2017 - Version 3.0.4: The extension has been rewritten in EcmaScript6 and tested on Chrome, Firefox and Opera.

Update 03 March 2017 - Version 2.9.3: added a function to remove marketing URLs from all created bookmarks. Enable it in the Advanced settings section. Please let me know of any particular parameters you need purged. So far it removes utm_*, wkey, wemail, _hsenc, _hsmi and hsCtaTracking.

Update 26 February 2017: Version (2.9.1): added customizing the URL comparison function. People can choose what makes pages different in general or for specific URL patterns
Update 13 June 2016: Stable version (2.5.0): added Settings page, Read Later functionality, undelete bookmarks page and much more.
Update 8 May 2016: Rewritten the extension from scratch, with unit testing.
Update 28 March 2016: The entire source code of the extension is now open sourced at GitHub.

Whenever I read my news, I open a bookmark folder containing my favorite news sites, Twitter, Facebook, etc. I then proceed to open new tabs for each link I find interesting, closing the originating links when I am done. Usually I get a number of 30-60 open tabs. This wreaks havoc on my memory and computer responsiveness. And it's really stupid, because I only need to read them one by one. In the end I've decided to fight my laziness and create my first browser extension to help me out.

The extension is published here: Siderite's Bookmark Explorer and what it does is check if the current page is found in any bookmark folder, then allow you to go forward or backwards inside that folder.

So this is my scenario on using it:

Open the sites that you want to get the links from.
Open new tabs for the articles you want to read or YouTube videos you want to watch,etc.
Bookmark all tabs into a folder.
Close all the tabs.
Navigate to the bookmark folder and open the first link.
Read the link, then press the Bookmark Navigator button and then the right arrow. (now added support for context menu and keyboard shortcuts)
If you went too far by mistake, press the left arrow to go back.

OK, let's talk about how I did it. In order to create your own Chrome browser extension you need to follow these steps:

1. Create the folder

Create a folder and put inside a file called manifest.json. It's possible structure is pretty complex, but let's start with what I used:

{
    "manifest_version" : 2,

    "name" : "Siderite's Bookmark Explorer",
    "description" : "Gives you a nice Next button to go to the next bookmark in the folder",
    "version" : "1.0.2",

    "permissions" : [
        "tabs",
        "activeTab",
        "bookmarks",
        "contextMenus"
    ],
    "browser_action" : {
        "default_icon" : "icon.png",
        "default_popup" : "popup.html"
    },
    "background" : {
        "scripts" : ["background.js"],
        "persistent" : false
    },
    "commands" : {
        "prevBookmark" : {
            "suggested_key" : {
                "default" : "Ctrl+Shift+K"
            },
            "description" : "Navigate to previous bookmark in the folder"
        },
        "nextBookmark" : {
            "suggested_key" : {
                "default" : "Ctrl+Shift+L"
            },
            "description" : "Navigate to next bookmark in the folder"
        }
    }
}

The manifest version must be 2. You need a name, a description and a version number. Start with something small, like 0.0.1, as you will want to increase the value as you make changes. The other thing is that mandatory is the permissions object, which tells the browser what Chrome APIs you intend to use. I've set there activeTab, because I want to know what the active tab is and what is its URL, tabs, because I might want to get the tab by id and then I don't get info like URL if I didn't specify this permission, bookmarks, because I want to access the bookmarks, and contextMenus, because I want to add items in the page context menu. More on permissions here.

Now, we need to know what the extension should behave like.

If you want to click on it and get a popup that does stuff, you need to specify the browser_action object, where you specify the icon that you want to have in the Chrome extensions bar and/or the popup page that you want to open. If you don't specify this, you get a default button that does nothing on click and presents the standard context menu on right click. You may only specify the icon, though. More on browserAction here.

If you want to have an extension that reacts to background events, monitors URL changes on the current page, responds to commands, then you need a background page. Here I specify that the page is a javascript, but you can add HTML and CSS and other stuff as well. More on background here.

Obviously, the files mentioned in the manifest must be created in the same folder.

The last item in the manifest is the commands object. For each command you need to define the id, the keyboard shortcut (only the 0..9 and A..Z are usable unfortunately) and a description. In order to respond to commands you need a background page as shown above.

2. Test the extension

Next you open a Chrome tab and go to chrome://extensions, click on the 'Developer mode' checkbox if it is not checked already and you get a Load unpacked extension button. Click it and point the following dialog to your folder and test that everything works OK.

3. Publish your extension

In order to publish your extension you need to have a Chrome Web Store account. Go to Chrome Web Store Developer Dashboard and create one. You will need to pay a one time 5$ fee to open it. I know, it kind of sucks, but I paid it and was done with it.

Next, you need to Add New Item, where you will be asked for a packed extension, which is nothing but the ZIP archive of all the files in your folder.

That's it.

Let's now discuss actual implementation details.

Adding functionality to popup elements

Getting the popup page elements is easy with vanilla Javascript, because we know we are building for only one browser: Chrome! So getting elements is done via document.getElementById(id), for example, and adding functionality is done via elem.addEventListener(event,handler,false);

One can use the elements as objects directly to set values that are related to those elements. For example my prev/next button functionality takes the URL from the button itself and changes the location of the current tab to that value. Code executed when the popup opens sets the 'url' property on the button object.

Just remember to do it when the popup has finished loading (with document.addEventListener('DOMContentLoaded', function () { /*here*/ }); )

Getting the currently active tab

All the Chrome APIs are asynchronous, so the code is:

chrome.tabs.query({
    'active' : true,
    'lastFocusedWindow' : true
}, function (tabs) {
    var tab = tabs[0];
    if (!tab) return;
    // do something with tab
});

Changing the URL of a tab

chrome.tabs.update(tab.id, {
  url : url
});

Changing the icon in the Chrome extensions bar

if (chrome.browserAction) chrome.browserAction.setIcon({
    path : {
        '19' : 'anotherIcon.png'
    },
    tabId : tab.id
});

The icons are 19x19 PNG files. browserAction may not be available, if not declared in the manifest.

Get bookmarks

Remember you need the bookmarks permission in order for this to work.

chrome.bookmarks.getTree(function (tree) {
  //do something with bookmarks
});

The tree is an array of items that have title and url or children. The first tree array item is the Bookmarks Bar, for example. More about bookmarks here.

Hooking to Chrome events

chrome.tabs.onUpdated.addListener(refresh);
chrome.tabs.onCreated.addListener(refresh);
chrome.tabs.onActivated.addListener(refresh);
chrome.tabs.onActiveChanged.addListener(refresh);
chrome.contextMenus.onClicked.addListener(function (info, tab) {
    navigate(info.menuItemId, tab);
});
chrome.commands.onCommand.addListener(function (command) {
    navigate(command, null);
});

In order to get extended info on the tab object received by tabs events, you need the tabs permission. For access to the contextMenus object you need the contextMenus permission.

Warning: if you install your extension from the store and you disable it so you can test your unpacked extension, you will notice that keyboard commands do not work. Seems to be a bug in Chrome. The solution is to remove your extension completely so that the other version can hook into the keyboard shortcuts.

Creating, detecting and removing menu items

To create a menu item is very simple:

chrome.contextMenus.create({
  "id" : "menuItemId",
  "title" : "Menu item description",
  "contexts" : ["page"] //where the menuItem will be available
});

However, there is no way to 'get' a menu item and if you try to blindly remove a menu item with .remove(id) it will throw an exception. My solution was to use an object to store when I created and when I destroyed the menu items so I can safely call .remove().

To hook to the context menu events, use chrome.contextMenus.onClicked.addListener(function (info, tab) { }); where info contains the menuItemId property that is the same as the id used when creating the item.

Again, to access the context menu API, you need the contextMenus permission. More about context menus here.

Commands

You use commands basically to define keyboard shortcuts. You define them in your manifest and then you hook to the event with chrome.commands.onCommand.addListener(function (command) { });, where command is a string containing the key of the command.

Only modifiers, letters and digits can be used. Amazingly, you don't need permissions for using this API, but since commands are defined in the manifest, it would be superfluous, I guess.

That's it for what I wanted to discuss here. Any questions, bug reports, feature requests... use the comments in the post.

Firebase - Queries

Published Feb 18, 2016

and has 0 comments

In the previous post I was discussing Firebase, used in Javascript, and that covered initialization, basic security, read all and insert. In this post I want to discuss about complex queries: filtering, ordering, limiting, indexing, etc. For that I will get inspiration (read: copy with impunity) from the Firebase documentation on the subject Retrieving Data, but make it quick and dirty... you know, like sex! Thank you, ma'am!

OK, the fluid interface for getting the data looks a lot like C# LInQ and I plan to work on a Linq2Firebase thing, but not yet. Since LInQ itself got its inspiration from SQL, I was planning to structure the post in a similar manner: how to do order by, top/limit, select conditions, indexing and so on, so we can really use Firebase like a database. An interesting concept to explore is joining, since this is an object database, but we still need it, because we want to filter by the results of the join before we return the result, like getting all the transaction of users that have the name 'Adam'. Aggregating is another thing that I feel Firebase needs to support. I don't want a billion records in order to compute the sum of a property.

However, the Firebase API is rather limited at the moment. You get .orderByChild, then stuff like .equalTo, .startAt and .endAt and then .limitToFirst and .limitToLast. No aggregation, no complex filters, no optimized indexing, no joining. As far as I can see, this is by design, so that the server is as dumb as possible, but think about that 1GB for the free plan. It is a lot.

So, let's try a complex query, see were it gets us.

ref.child('user')
  .once('value',function(snapshot) {
    var users=[];
    snapshot.forEach(function(childSnapshot) {
      var item=childSnapshot.val();
      if (/adam/i.test(item.name)) {
        users.push(item.userId);
      }
    });
    getInvoiceTotalForUsers(users,DoSomethingWithSum);
  });


function getInvoiceTotalForUsers(users,callback)
{
  var sum=0;
  var count=0;
  for (var i=0; i<users.length; i++) {
    var id=users[i];
    ref.child('invoice')
      .equalTo(id,'userId')
      .orderByChild('price')
      .startAt(10)
      .endAt(100)
      .once('value',function(snapshot) {
        snapshot.forEach(function(childSnapshot) {
          var item = childSnapshot.val();
          sum+=item.price;
          count++;
          if (count==users.length) callback(sum);
        });
      });
  }
}

First, I selected the users that have 'adam' in the name. I used .once instead of .on because I don't want to wait for new data to arrive, I want the data so far. I used .forEach to enumerate the data from the value event. With the array of userIds I call getInvoiceTotalForUsers, which gets all the invoices for each user, with a price bigger or equal to 10 and less or equal to 100, which finally calls a callback with the resulting sum of invoice prices.

For me this feels very cumbersome. I can think of several methods to simplify this, but the vanilla code would probably look like this.

Firebase - a free Javascript/Rest accessible cloud no SQL database - Introduction

Published Feb 18, 2016

and has 0 comments

I have been looking for a long time for this kind of service, mainly because I wanted to monitor and persist stuff for my blog. Firebase is all of that and more and, with a free plan of 1GB, it's pretty awesome. However, as it is a no SQL database and as it can be accessed via Javascript, it may be a bit difficult to get it at first. In this post I will be talking about how to use Firebase as a traditional database using their Javascript library.

So, first off go to the main website and signup with Google. Once you do, you get a page with a 5 minute tutorial, quickstarts, examples, API docs... but you want the ultra-quick start! Copy pasted working code! So click on the Manage App button.

Take note of the URL where you are redirected. It is the one used for all data usage as well. Ok, quick test code:

var testRef = new Firebase('https://*******.firebaseio.com/test');
testRef.push({
  val1: "any object you like",
  val2: 1,
  val3: "as long as it is not undefined or some complex type like a Date object",
  val4: "think of it as JSON"
});

What this does is take that object there and save it in your database, in the "test" container. Let's say it's like a table. You can also save objects directly in the root, but I don't recommend it, as the path of the object is the only one telling you what type of object it is.

Now, in order to read inserted objects, you use events. It's a sort of reactive way of doing things that might be a little unfamiliar. For example, when you run the following piece of code, you will get after you connect all the objects you ever inserted into "test".

var testRef = new Firebase('https://*******.firebaseio.com/test');
testRef.on('child_added', function(snapshot) {
   var obj = snapshot.val();
   handle(obj); //do what you want with the object
});

Note that you can use either child_added or value, as the retrieve event. While 'child_added' is fired on each retrieved object, 'value' returns one snapshot containing all data items, then proceeds to fire on each added item with full snapshots. Beware!, that means if you have a million items and you do a value query, you get all of them (or at least attempt to, I think there are limits), then on the next added item you get a million and one. If you use .limitToLast(50), for example, you will get the last 50 items, then when a new one is added, you get another 50 item snapshot. In my mind, 'value' is to be used with .once(), while 'child_added' with .on(). More details in my Queries post

Just by using that, you have created a way to insert and read values from the database. Of course, you don't want to leave your database unprotected. Anyone could read or change your data this way. You need some sort of authentication. For that go to the left and click on Login & Auth, then you go to Email & Password and you configure what are the users to log in to your application. Notice that every user has a UID defined. Here is the code to use to authenticate:

var testRef = new Firebase('https://*******.firebaseio.com/test');
testRef.authWithPassword({
  email    : "some@email.com",
  password : "password"
}, function(error, authData) {
  if (error) {
    console.log("Login Failed!", error);
  } else {
    console.log("Authenticated successfully with payload:", authData);
  }
});

There is an extra step you want to take, secure your database so that it can only be accessed by logged users and for that you have to go to Security & Rules. A very simple structure to use is this:

{
  "rules": {
    "test": {
      ".read": false,
      ".write": false,
      "$uid": {
        // grants write access to the owner of this user account whose uid must exactly match the key ($uid)
        ".write": "auth !== null && auth.uid === $uid",
        // grants read access to any user who is logged in with an email and password
        ".read": "auth !== null && auth.provider === 'password'"
      }
    }
  }
}

This means that:

It is forbidden to write to test directly, or to read from it
It is allowed to write to test/uid (remember the user UID when you created the email/password pair) only by the user with the same uid
It is allowed to read from test/uid, as long as you are authenticated in any way

Gotcha! This rule list allows you to read and write whatever you want on the root itself. Anyone could just waltz on your URL and fill your database with crap, just not in the "test" path. More than that, they can just listen to the root and get EVERYTHING that you write in. So the correct rule set is this:

{
  "rules": {
    ".read": false,
    ".write": false,
    "test": {
      ".read": false,
      ".write": false,
      "$uid": {
        // grants write access to the owner of this user account whose uid must exactly match the key ($uid)
        ".write": "auth !== null && auth.uid === $uid",
        // grants read access to any user who is logged in with an email and password
        ".read": "auth !== null && auth.provider === 'password'"
      }
    }
  }
}

In this particular case, in order to get to the path /test/$uid you can use the .child() function, like this: testRef.child(authData.uid).push(...), where authData is the object you retrieve from the authentication method and that contains your logged user's UID.

The rule system is easy to understand: use ".read"/".write" and a Javascript expression to allow or deny that operation, then add children paths and do the same. There are a lot more things you could learn about the way to authenticate: one can authenticate with Google, Twitter, Facebook, or even with custom tokens. Read more at Email & Password Authentication, User Authentication and User Based Security.

But because you want to do a dirty little hack and just make it work, here is one way:

{
  "rules": {
    ".read": false,
    ".write": false,
    "test": {
      ".read": "auth.uid == 'MyReadUser'",
      ".write": "auth.uid == 'MyWriteUser'"
    }
  }
}

This tells Firebase that no one is allowed to read/write except in /test and only if their UID is MyReadUser, MyWriteUser, respectively. In order to authenticate for this, we use this piece of code:

testRef.authWithCustomToken(token,success,error);

The handlers for success and error do the rest. In order to create the token, you need to do some cryptography, but nevermind that, there is an online JsFiddle where you can do just that without any thought. First you need a secret, for which you go into your Firebase console and click on Secrets. Click on "Show" and copy paste that secret into the JsFiddle "secret" textbox. Then enter MyReadUser/MyWriteUser in the "uid" textbox and create the token. You can then authenticate into Firebase using that ugly string that it spews out at you.

Done, now you only need to use the code. Here is an example:

var testRef = new Firebase('https://*****.firebaseio.com/test');
testRef.authWithCustomToken(token, function(err,authData) {
    if (err) alert(err);
    myDataRef.on('child_added', function(snapshot) {
        var message = snapshot.val();
           handle(message);
    });
});

where token is the generated token and handle is a function that will run with each of the objects in the database.

In my case, I needed a way to write messages on the blog for users to read. I left read access on for everyone (true) and used the token idea from above to restrict writing. My html page that I run locally uses the authentication to write the messages.

There you have it. In the next post I will examine how you can query the database for specific objects.

Performance of the NOT operator in Javascript when used to verify existence or if a value is set

Published Oct 23, 2015

and has 0 comments

I had this Javascript code that I was trying to write as tight as possible and I realized that I don't know if using "!" on an object to check if it is set to a value is slow or fast. I mean, in a strong typed language, I would compare the object with null, not use the NOT operator. So I randomly filled an array with one of five items: null, undefined, empty string, 0 and new Date(), then compared the performance of a loop checking the array items for having a value with the NOT operator versus other methods. I used Chrome 48.0.2564.109 m, Internet Explorer 11.0.9600.18163 and Firefox 44.0.2 for the tests.

Fast tally:

NOT operator: 1600/51480/1200ms
=== 0 (strong type equality): 1360/47510/2180ms
=== null : 550/45590/510ms
== with 0: 38700/63030/131940ms
== with null: 1100/48230/900ms
=== used twice(with 0 and null): 1760/69460/3500ms
typeof == 'object' (which besides the Dates also catches null): 1360/382980/1380ms
typeof === 'object' (which besides the Dates also catches null): 1370/407000/1400ms
instanceof Date: 1060/69200/600ms

Thoughts: the !/NOT operator is reasonably fast. Using normal equality can really mess up your day when it tries to transform 0 into a Date or viceversa (no, using 0 == arr[i] instead of arr[i] == 0 wasn't faster). Fastest, as expected, was the strong type equality to null. Surprising was the null equality, which catches both null and undefined and is second place. typeof was also surprising, since it not only gets the type of the object, but also compares the result with a string. Funny thing: the === comparison in the case of typeof was slower than the normal == comparison for all browsers, so probably it gets treated as a special construct.

It is obvious that both Chrome and Firefox have really optimized the Javascript engine. Internet explorer has a 18 second overhead for the loops alone (so no comparison of any kind), while the other browsers optimize it to 300ms. Sure, behind the scenes they realize that nothing happens in those loops and drop them, but still, it was a drag to wait for the result from Internet Explorer. Compared with the other huge values, the ===null comparison is insignificantly smaller than all the others on Internet Explorer, but still takes first place, while typeof took forever! Take these results with a grain of salt, though. When I was at FOSDEM I watched this presentation from Firefox when they were actually advising against this type of profiling, instead recommending special browser tools that would do that. You can watch it yourselves here.

Final conclusion: if you are checking if an object exists or not, especially if you can insure that the value of a non existent object is the same (like null), === kicks ass. The NOT operator can be used to check a user provided value, since it catches all of null, undefined, empty space or 0 and it's reasonably fast.

Here is the code:

var arr=[];
for (var i=0; i<100000; i++) {
    var r=parseInt(Math.random()*5);
    switch(r) {
        case 0: arr.push(null); break;
        case 1: arr.push(undefined); break;
        case 2: arr.push(''); break;
        case 3: arr.push(0); break;
        case 4: arr.push(new Date()); break;
    }
}

var n=0;
var start=performance.now();
for (var j=0; j<1000; j++) {
    for (var i=0; i<100000; i++) {
        if (!arr[i]) n++;
    }
}
var end=performance.now();
console.log('!value                    '+n+': '+(end-start));

n=0;
start=performance.now();
for (var j=0; j<1000; j++) {
    for (var i=0; i<100000; i++) {
        if (arr[i] === 0) n++;
    }
}
end=performance.now();
console.log('value===0                 '+n+': '+(end-start));

n=0;
start=performance.now();
for (var j=0; j<1000; j++) {
    for (var i=0; i<100000; i++) {
        if (arr[i] === null) n++;
    }
}
end=performance.now();
console.log('value===null              '+n+': '+(end-start));

n=0;
start=performance.now();
for (var j=0; j<1000; j++) {
    for (var i=0; i<100000; i++) {
        if (arr[i] == 0) n++;
    }
}
end=performance.now();
console.log('value==0                  '+n+': '+(end-start));

n=0;
start=performance.now();
for (var j=0; j<1000; j++) {
    for (var i=0; i<100000; i++) {
        if (arr[i] == null) n++;
    }
}
end=performance.now();
console.log('value==null               '+n+': '+(end-start));

n=0;
start=performance.now();
for (var j=0; j<1000; j++) {
    for (var i=0; i<100000; i++) {
        if (arr[i] === 0 || arr[i] === null) n++;
    }
}
end=performance.now();
console.log('value===0 || value===null '+n+': '+(end-start));

n=0;
start=performance.now();
for (var j=0; j<1000; j++) {
    for (var i=0; i<100000; i++) {
        if (typeof(arr[i])=='object') n++;
    }
}
end=performance.now();
console.log('typeof(value)==\'object\'   '+n+': '+(end-start));

n=0;
start=performance.now();
for (var j=0; j<1000; j++) {
    for (var i=0; i<100000; i++) {
        if (typeof(arr[i])==='object') n++;
    }
}
end=performance.now();
console.log('typeof(value)===\'object\'  '+n+': '+(end-start));

n=0;
start=performance.now();
for (var j=0; j<1000; j++) {
    for (var i=0; i<100000; i++) {
        if (arr[i] instanceof Date) n++;
    }
}
end=performance.now();
console.log('value instanceof Date     '+n+': '+(end-start));

navigator.geolocation API strange behaviour on Microsoft Edge - a foolproof location snippet in Javascript

Published Aug 13, 2015

and has 2 comments

Today we tested a web application in the new Microsoft Edge browser. To our surprize, the site failed where Internet Explorer, Chrome, Firefox and even Safari worked perfectly well. I narrowed the problem to the navigator.geolocation.getCurrentLocation which wasn't working. The site would see navigator.geolocation, ask for the current location, the user would be prompted to allow the site to access location and after that would silently fail. What I mean by that is that neither the success or the error callbacks were called, even if the options object specified one second for the timeout. I don't have access to a lot of Windows 10 machines and I assume that if a lot of people met with this problem they would invade the Internet with angry messages, but so far I've found no one having the same issue.

Bottom line: forced to take into consideration the possibility that the geolocation API would silently fail, I changed the code like this:

if (navigator.geolocation) {
    var timeoutInSeconds=1;
    var geotimeout=setTimeout(function() {
        handleNoGeolocation();
    },timeoutInSeconds*1000+500); //plus 500 ms to allow the API to timeout normally
    navigator.geolocation.getCurrentPosition(function (position) {
        clearTimeout(geotimeout);
        var pos = doSomethingWith(position.coords.latitude, position.coords.longitude);
    }, function () {
        clearTimeout(geotimeout);
        handleNoGeolocation();
    },{
        enableHighAccuracy:true,
        timeout: timeoutInSeconds*1000
    });
} else {
    handleNoGeolocation();
}

In the handleNoGeolocation function I've accessed the great service FreeGeoIp, that returns vague coordinates based on your IP and fell back to a static latitude, longitude pair if even this call failed.

Note: for the first time the function is called for your site, a browser dialog will appear, requesting permission to share the location. During the display of the dialog the timeout will fire, then, based on the user choice (and browser) a success/error handler will be called or nothing (like in this case), so make sure your code can handle running handleNoGeolocation followed by doSomethingWith.

Blog Question: How to use a large image as a page background, without showing it loading slowly?

Published Mar 25, 2015

and has 0 comments

A blog reader asked me to help him get rid of the ugly effect of a large background image getting loaded. I thought of several solutions, all more complicated than the rest, but in the end settled on one that seems to be working well and doesn't require complicated libraries or difficult implementation: using the img onload event.

Let's assume that the background image is on the body element of the page. The solution involves setting a style on the body to hide it (style="display:none") then adding as child of the body an image that also is hidden and that, when completing loading, shows the body element. Here is the initial code:

<style>
body {
    background: url(bg.jpg) no-repeat center center fixed;
}
</style>
<body>

And after:

<style>
body {
    background: url(bg.jpg) no-repeat center center fixed;
}
</style>
<body style="display:none">
<img src="bg.jpg" onload="document.body.style.display=''" style="display:none;" />

This loads the image in a hidden img element and shows the body element when the image finished loading.

The solution might have some problems with Internet Explorer 9, as it seems the load event is not fired for images retrieved from the cache. In that case, a slightly more complex Javascript solution is needed as detailed in this blog post: How to Fix the IE9 Image Onload Bug. Also, in Internet Explorer 5-7 the load event fires for animated GIFs at every loop. I am sure you know it's a bad idea to have an animated GIF as a page background, though :)

Warning: While this hides the effect of slow loading background images, it also hides the page until the image is loaded. This makes the page appear blank until then. More complex solutions would show some simple html content while the page is loading rather than hiding the entire page, but this post is about the simplest solution for the question asked.

A more comprehensive analysis of image preloading, complete with a very nice Javascript code that covers a lot of cases, can be found at Preloading images using javascript, the right way and without frameworks

Javascript "arguments" weirdness

Published Feb 5, 2015

and has 0 comments

I was using this JavaScript function that I wanted to accept an array of arguments or just a bunch of arguments that would be interpreted as an array. As you probably know, for each function in JS you get a variable called arguments which contains the arguments to the function. It is a pseudo array, not a real array, and has some extra properties. My code looked like this:

function x(arg) {
  if (!arg) {
    arg=[];
  } else if (!arg.length) {
    arg=[];
    for(var i=0; i<arguments.length; i++) arg.push(arguments[i]);
  }
  // do something with arg
}

The logic is simple, if there is no first parameter, arg becomes an empty array, else, if there is a first argument, but it doesn't have a length property (not an array) set arg to an array and push all arguments of the function as items of the array. But it doesn't work! The point is this: you set arg to an empty array and at that moment arguments[0] is no longer the original argument, but the empty array. Even worse, the code then adds the array as an item of itself, which makes the object be infinitely recursive.

Let's make this simpler:

function x(arg) {
  arg=[];
  console.log(arguments[0]);
}

After you execute x() with any arguments, the console will show an empty array, not the original argument. Weird, huh?

Super Fast and Accurate string distance algorithm: Sift4

Published Nov 10, 2014

and has 44 comments

The algorithm works perfectly well and is better than Sift3, however it's slightly more complex. You might want to start with Sift3 in order to understand where it came from.

Update November 8 2022: I found a bug in the algorithm, relating to maxDistance. I've updated the code. If you didn't use maxDistance, you are unaffected. Basically the fix is to compare temporaryDistance>minDistance (before it was >= ) and to move the calculation of the temporary distance after c1 and c2 are updated to their minimum value when a token was not found (otherwise the temporary distance might become larger than the final distance)

Try the Javascript implementation here:

Algorithm:

Levenstein
Sift3
Sift4 MaxOffset:

String 1:

String 2:

Result:

Update 28 Mar 2015: I've changed the algorithm significantly. The transpositions are now computed differently and the cost of a transposition in the final result is 1, rather than 0.5. Also, while I think a value of 1 is better conceptually, I noticed that Sift4 approximates Levenshtein a little better when the cost of a transposition is either 2 or a function depending on the offset difference between c2 and c1, especially when maxOffset grows. This can be now changed via the new options function transpositionCostEvaluator. The problem I am having now is more false positives when the letters/tokens of the two strings are the same, but their positions are jumbled differently. With small maxOffset values, like 5 or 10, the result is much better than Sift3, however when maxOffset grows, lots of matches can be found and the cost of transpositions becomes very important.

Update 27 Mar 2015: Thanks to Emanuele Bastianelli who discovered a bug that appeared in an edge case, I've updated the algorithms. Now, at the end of the while loop there is an extra check to prevent the algorithm exiting prematurely, before computing remaining tokens.

Intro

A really long time ago I wrote the third version of Sift, the string distance algorithm. It so happens that I am going to give a small presentation, here in Ispra, about this algorithm, so I had the opportunity to review it. I found some inconsistencies and I actually did some research in the field that gave more more ideas. So before giving the presentation I thought of publishing what I think is the fourth version. What's new:

33% more accurate
three different variants: simple, common and general
new concepts added
support for own value and matching functions, different tokenizer functions, etc.
actually tested with a (slightly more) serious test
more robust, working better for large values of maxOffset

Before I get into the details, I am publishing the algorithm here for the moment, no Codeplex or PasteBin or GitHub or whatever. Also, it is written in Javascript now, the C# and T-SQL version pending. Of course, it would be great if, as before, the community of people using the algorithm would go into implementing it into various programming languages, however I am a bit apprehensive because more often than not people came with their own improvements or interpretations when translating the algorithm into another language. But support is always welcome!

New concepts in Sift4

I created a test that used random strings, but also a huge list of commonly used English phrases as well as mutations on these strings, adding or removing small bits and so on. I then implemented Sift3, Levenstein and the new algorithm and computed the error distance between the Levenstein distance and the two Sift variants. This permitted me to see how the error evolves when changing the algorithm and the parameters. One thing I noticed is that when increasing the maxOffset value to large values like 15 or 20, the accuracy of Sift3 was going down. Also, as pointed out by one commenter on the Sift3 post, there are cases when Sift3(a,b) is different from Sift3(b,a). There are edge cases, but this one in particular grated me.

After implementing Sift4, I can now tell you that the simple version is slightly better than Sift3 for small maxOffset values like 5, but it gets better as the value increases. The common version is a bit more complex, but the error decreases with 33% and maintains a low error for large maxOffset values. The extended or general version receives an options object that can change almost everything, but most important is the tokenizer function. Imagine that you want to compute the distance based not on letters, but on n-grams (groups of n characters). Or that you want to compare them by the words in the text, maybe even their synonyms. This can all be achieved just by changing the tokenizer function. The other parameters involve defining what it means for two tokens to match and what is the value of their match, etc.

One of the new concepts implemented is taken from the Jaro distance. Jaro seems a lot like Sift in the way that it considers two characters to match if they are in close proximity. Also, if "the streams cross", like 'ab' vs 'ba', one considers them transpositions and removes some of their value from the distance. Actually, if I look at the implementation, it might be that I have independently discovered the Jaro distance. I will research this further. I don't know if the transposition calculation is the most optimal. At the moment it uses an array of all matches found until a point, clearing it of values as the cursors move along the string. The difference between the simple and the common versions of Sift4 is that the simple version is not computing the transpositions at all and has no concept of maxDistance. In that respect it is a slightly fixed up Sift3.

Another new concept added is the one of local substring. Imagine that the Largest Common Subsequence that Sift is actually trying to find in order to determine the distance is made of substrings, separated by non matching characters. Each of these substrings can be used to improve the distance function. For example one could argue that 'abcdex' is closer to 'abcde' than 'abcxde', because even if the largest common subsequence is 5, the largest common substring is 5 for the first string and only 3 for the second. The extended version of the algorithm allows for changing the value of each substring individually.

Well, here they are, the three versions. The extended version has some examples at the end for possible parameters.

The code

Simplest Sift4:

// Sift4 - simplest version
// online algorithm to compute the distance between two strings in O(n)
// maxOffset is the number of characters to search for matching letters
function sift4(s1, s2, maxOffset) {
    if (!s1 || !s1.length) {
        if (!s2) {
            return 0;
        }
        return s2.length;
    }

    if (!s2 || !s2.length) {
        return s1.length;
    }

    var l1 = s1.length;
    var l2 = s2.length;

    var c1 = 0; //cursor for string 1
    var c2 = 0; //cursor for string 2
    var lcss = 0; //largest common subsequence
    var local_cs = 0; //local common substring
    while ((c1 < l1) && (c2 < l2)) {
        if (s1.charAt(c1) == s2.charAt(c2)) {
            local_cs++;
        } else {
            lcss += local_cs;
            local_cs = 0;
            if (c1 != c2) {
                c1 = c2 = Math.max(c1, c2); //using max to bypass the need for computer transpositions ('ab' vs 'ba')
            }
            for (var i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++) {
                if ((c1 + i < l1) && (s1.charAt(c1 + i) == s2.charAt(c2))) {
                    c1 += i;
                    local_cs++;
                    break;
                }
                if ((c2 + i < l2) && (s1.charAt(c1) == s2.charAt(c2 + i))) {
                    c2 += i;
                    local_cs++;
                    break;
                }
            }
        }
        c1++;
        c2++;
    }
    lcss += local_cs;
    return Math.round(Math.max(l1, l2) - lcss);
}

Common Sift4:

// Sift4 - common version
// online algorithm to compute the distance between two strings in O(n)
// maxOffset is the number of characters to search for matching letters
// maxDistance is the distance at which the algorithm should stop computing the value and just exit (the strings are too different anyway)
function sift4(s1, s2, maxOffset, maxDistance) {
    if (!s1 || !s1.length) {
        if (!s2) {
            return 0;
        }
        return s2.length;
    }

    if (!s2 || !s2.length) {
        return s1.length;
    }

    var l1 = s1.length;
    var l2 = s2.length;

    var c1 = 0; //cursor for string 1
    var c2 = 0; //cursor for string 2
    var lcss = 0; //largest common subsequence
    var local_cs = 0; //local common substring
    var trans = 0; //number of transpositions ('ab' vs 'ba')
    var offset_arr = []; //offset pair array, for computing the transpositions
    while ((c1 < l1) && (c2 < l2)) {
        if (s1.charAt(c1) == s2.charAt(c2)) {
            local_cs++;
            var isTrans = false;
            //see if current match is a transposition
            var i = 0;
            while (i < offset_arr.length) {
                var ofs = offset_arr[i];
                if (c1 <= ofs.c1 || c2 <= ofs.c2) {
                    // when two matches cross, the one considered a transposition is the one with the largest difference in offsets
                    isTrans = Math.abs(c2 - c1) >= Math.abs(ofs.c2 - ofs.c1);
                    if (isTrans) {
                        trans++;
                    } else {
                        if (!ofs.trans) {
                            ofs.trans = true;
                            trans++;
                        }
                    }
                    break;
                } else {
                    if (c1 > ofs.c2 && c2 > ofs.c1) {
                        offset_arr.splice(i, 1);
                    } else {
                        i++;
                    }
                }
            }
            offset_arr.push({
                c1: c1,
                c2: c2,
                trans: isTrans
            });
        } else {
            lcss += local_cs;
            local_cs = 0;
            if (c1 != c2) {
                c1 = c2 = Math.min(c1, c2); //using min allows the computation of transpositions
            }
            if (maxDistance) {
                var temporaryDistance = Math.max(c1, c2) - lcss + trans;
                if (temporaryDistance > maxDistance)
                    return temporaryDistance;
            }
            //if matching characters are found, remove 1 from both cursors (they get incremented at the end of the loop)
            //so that we can have only one code block handling matches
            for (var i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++) {
                if ((c1 + i < l1) && (s1.charAt(c1 + i) == s2.charAt(c2))) {
                    c1 += i - 1;
                    c2--;
                    break;
                }
                if ((c2 + i < l2) && (s1.charAt(c1) == s2.charAt(c2 + i))) {
                    c1--;
                    c2 += i - 1;
                    break;
                }
            }
        }
        c1++;
        c2++;
        // this covers the case where the last match is on the last token in list, so that it can compute transpositions correctly
        if ((c1 >= l1) || (c2 >= l2)) {
            lcss += local_cs;
            local_cs = 0;
            c1 = c2 = Math.min(c1, c2);
        }
    }
    lcss += local_cs;
    return Math.max(l1, l2) - lcss + trans; //add the cost of transpositions to the final result
}

Extended/General Sift4:

// Sift4 - extended version
// online algorithm to compute the distance between two strings in O(n)
// maxOffset is the number of positions to search for matching tokens
// options: the options for the function, allowing for customization of the scope and algorithm:
//         maxDistance: the distance at which the algorithm should stop computing the value and just exit (the strings are too different anyway)
//         tokenizer: a function to transform strings into vectors of tokens
//        tokenMatcher: a function to determine if two tokens are matching (equal)
//        matchingEvaluator: a function to determine the way a token match should be added to the local_cs. For example a fuzzy match could be implemented.
//        localLengthEvaluator: a function to determine the way the local_cs value is added to the lcss. For example longer continuous substrings could be awarded.
//        transpositionCostEvaluator: a function to determine the value of an individual transposition. For example longer transpositions should have a higher cost.
//        transpositionsEvaluator: a function to determine the way the total cost of transpositions affects the final result
// the options can and should be implemented at a class level, but this is the demo algorithm
function sift4(s1, s2, maxOffset, options) {

    options = extend(options, {
            maxDistance: null,
            tokenizer: function (s) {
                return s ? s.split('') : [];
            },
            tokenMatcher: function (t1, t2) {
                return t1 == t2;
            },
            matchingEvaluator: function (t1, t2) {
                return 1;
            },
            localLengthEvaluator: function (local_cs) {
                return local_cs;
            },
            transpositionCostEvaluator: function (c1, c2) {
                return 1;
            },
            transpositionsEvaluator: function (lcss, trans) {
                return lcss - trans;
            }
        });

    var t1 = options.tokenizer(s1);
    var t2 = options.tokenizer(s2);

    var l1 = t1.length;
    var l2 = t2.length;

    if (l1 == 0)
        return l2;
    if (l2 == 0)
        return l1;

    var c1 = 0; //cursor for string 1
    var c2 = 0; //cursor for string 2
    var lcss = 0; //largest common subsequence
    var local_cs = 0; //local common substring
    var trans = 0; //number of transpositions ('ab' vs 'ba')
    var offset_arr = []; //offset pair array, for computing the transpositions
    while ((c1 < l1) && (c2 < l2)) {
        if (options.tokenMatcher(t1[c1], t2[c2])) {
            local_cs += options.matchingEvaluator(t1[c1], t2[c2]);
            var isTrans = false;
            //see if current match is a transposition
            var i = 0;
            while (i < offset_arr.length) {
                var ofs = offset_arr[i];
                if (c1 <= ofs.c1 || c2 <= ofs.c2) {
                    // when two matches cross, the one considered a transposition is the one with the largest difference in offsets
                    isTrans = Math.abs(c2 - c1) >= Math.abs(ofs.c2 - ofs.c1);
                    if (isTrans) {
                        trans += options.transpositionCostEvaluator(c1, c2);
                    } else {
                        if (!ofs.trans) {
                            ofs.trans = true;
                            trans += options.transpositionCostEvaluator(ofs.c1, ofs.c2);
                        }
                    }
                    break;
                } else {
                    if (c1 > ofs.c2 && c2 > ofs.c1) {
                        offset_arr.splice(i, 1);
                    } else {
                        i++;
                    }
                }
            }
            offset_arr.push({
                c1: c1,
                c2: c2,
                trans: isTrans
            });
        } else {
            lcss += options.localLengthEvaluator(local_cs);
            local_cs = 0;
            if (c1 != c2) {
                c1 = c2 = Math.min(c1, c2); //using min allows the computation of transpositions
            }
            if (options.maxDistance) {
                var temporaryDistance = options.localLengthEvaluator(Math.max(c1, c2)) - options.transpositionsEvaluator(lcss, trans);
                if (temporaryDistance > options.maxDistance)
                    return Math.round(temporaryDistance);
            }
            //if matching tokens are found, remove 1 from both cursors (they get incremented at the end of the loop)
            //so that we can have only one code block handling matches
            for (var i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++) {
                if ((c1 + i < l1) && options.tokenMatcher(t1[c1 + i], t2[c2])) {
                    c1 += i - 1;
                    c2--;
                    break;
                }
                if ((c2 + i < l2) && options.tokenMatcher(t1[c1], t2[c2 + i])) {
                    c1--;
                    c2 += i - 1;
                    break;
                }
            }
        }
        c1++;
        c2++;
        // this covers the case where the last match is on the last token in list, so that it can compute transpositions correctly
        if ((c1 >= l1) || (c2 >= l2)) {
            lcss += options.localLengthEvaluator(local_cs);
            local_cs = 0;
            c1 = c2 = Math.min(c1, c2);
        }
    }
    lcss += options.localLengthEvaluator(local_cs);
    return Math.round(options.localLengthEvaluator(Math.max(l1, l2)) - options.transpositionsEvaluator(lcss, trans)); //add the cost of found transpositions
}

function extend(obj, def) {
    var result = {};
    for (var prop in def) {
        if (!obj || !obj.hasOwnProperty(prop)) {
            result[prop] = def[prop];
        } else {
            result[prop] = obj[prop];
        }
    }
    return result;
}

// possible values for the options
// tokenizers:
function nGramTokenizer(s, n) { //tokenizer:function(s) { return nGramTokenizer(s,2); }
    var result = [];
    if (!s)
        return result;
    for (var i = 0; i <= s.length - n; i++) {
        result.push(s.substr(i, n));
    }
    return result;
}

function wordSplitTokenizer(s) { //tokenizer:wordSplitTokenizer
    if (!s)
        return [];
    return s.split(/\s+/);
}

function characterFrequencyTokenizer(s) { //tokenizer:characterFrequencyTokenizer (letters only)
    var result = [];
    for (var i = 0; i <= 25; i++) {
        var val = 0;
        if (s) {
            for (j = 0; j < s.length; j++) {
                var code = s.charCodeAt(j);
                if (code == i + 65 || code == i + 97)
                    val++;
            }
        }
        result.push(val);
    }
    return result;
}

//tokenMatchers:
function sift4TokenMatcher(t1, t2) { //tokenMatcher:sift4TokenMatcher
    var similarity = 1 - sift4(t1, t2, 5) / Math.max(t1.length, t2.length);
    return similarity > 0.7;
}

//matchingEvaluators:
function sift4MatchingEvaluator(t1, t2) { //matchingEvaluator:sift4MatchingEvaluator
    var similarity = 1 - sift4(t1, t2, 5) / Math.max(t1.length, t2.length);
    return similarity;
}

//localLengthEvaluators:
function rewardLengthEvaluator(l) {
    if (l < 1)
        return l; //0 -> 0
    return l - 1 / (l + 1); //1 -> 0.5, 2-> 0.66, 9 -> 0.9
}

function rewardLengthEvaluator2(l) {
    return Math.pow(l, 1.5); // 0 -> 0, 1 -> 1, 2 -> 2.83, 10 -> 31.62
}

//transpositionCostEvaluators:
function longerTranspositionsAreMoreCostly(c1, c2) {
    return Math.abs(c2 - c1) / 9 + 1;
}

As always, I will be most happy to know if you used my algorithm and how it performed, as well as receive any suggestion that you might have.

Options explained

Here is some explanation for the options of the general algorithm.

It no longer searches for characters, but for tokens. That is why the default tokenizer function splits the values into characters so that the algorithm would work on an array of one character long tokens. Other options are possible, like splitting the strings by empty spaces so that the comparisons are done on words or transforming a string into an array of strings N characters long, the so called N-grams. The tokenizer can be anything, like the characterFrequencyTokenizer, which turns each word in an array of 25 values representing the number of letters in the word for each letter a-z.

The tokenMatcher function returns true if two tokens are matching. They can be fuzzy matched, for example the sift4tokenMatcher example function uses Sift inside Sift to determine the character distance between two tokens and returns true if they match more than 70%.

The matchingEvaluator is a function that returns the value that will be added to the "common substring" length value when two tokens match. The default is 1, but one can use some other metric, like the similarity, for example. Of course, the common substring length has lost its meaning when these functions change, but the variable local_cs is still used.

The lengthEvaluator is taking the length value of the local common substring and returns a value that will be added to the longest common subsequence value. Usually it returns the same value as the one provided, but some functions could reward longer substrings.

FAQ

Q: Can you make Sift4 to work case insensitive?
A: Just turn the strings to lower or upper case before you compare them. Since this algorithm is more general, the concept of 'case' might not apply. Or implement a case insensitive tokenMatcher.

Q: Can you make Sift4 to compare strings based on their meaning, like using synonyms?
A: Use a tokenizer function that splits the strings into words, then replaces them with the most used of their synonyms. A more complex solution would require to analyze the strings beforehand and turn them into some ordered synonym or equivalent expressions equivalents, then use Sift4 with a word tokenizer (one is provided in the Extended algorithm source)

Q: I need an implementation for this programming language, can you help?
A: I can, but I might not have the time. Ask anyway, maybe I can be persuaded :)

Q: I have been using Sift3 until now, how do I upgrade to Sift4?
A: The best way I can think of is to implement Sift4 Simplest, as it needs only the Sift3 code and some minor changes. Since you never needed tokens before, I doubt you need them now. But if you do, I can help, see the above question.

Q: How can I reward you for this fantastic piece of software engineering?
A: While I did this for free and I don't expect to make any money out of it and while this algorithm is completely free to use and change as you see fit, I don't mind having a beer every now and then ;)

Q: Your algorithm really sucks because... reasons.
A: It may. I would be glad to discuss the reasons, though, and try to fix any problem you encounter.

Q: I compared Sift4 with another algorithm that is much more exact and there are differences.
A: Of course, they are different algorithms. This is a fuzzy distance calculator, it doesn't give you the exact value. There are still edge cases. But the idea of Sift is to be fast and relatively accurate, rather than very accurate. You need more accuracy, try to combine Sift with Levenshtein for example, computing Levenshtein only where Sift says the strings are above a certain similarity.

Q: I want to make maxOffset dependent on the length of the strings compared. Can you do that?
A: That is a perfect example why maxOffset should be a parameter of the function rather than a member of the class. Since this implementation is so far Javascript only, just compute the maxOffset that is convenient to you before you compare.

Q: I want to vary the weight of matches based on the position of the match, for example matches at the beginning of the string could be more valuable than those at the end.
A: The position of the match is indeed not sent to the functions that can be specified in the options object of the Sift4 Extended, but that can be trivially changed in the code. I don't think this particular request is very common, though, and I prefer to keep it out of the published implementation to make the code easier to understand.

Q: I found a bug!
A: Let me know it and I will try and fix it.

Q: If you need to compare large lists of strings, it is better to precompute some things, like specific hashes or suffix trees, etc. This will speed up the comparison tremendously!
A: Sift is what is called an online algorithm. It does not precompute anything, it just gets the two strings and the parameters for its functioning and returns the distance. You are correct in what you are saying, but that kind of solution is not in the scope of Sift, at least not version 4.

Q: What are the edge cases for Sift?
A: Probably there are several, but I didn't really spot them. One of them is that one might find both letters at a position matching letters at other positions, but only one will count. Example 'abxx' and 'bayy'. The algorithm will look at position 0, find no match, then try to find the closest match for each letter. Starting with position 0 in the first string it will find 'a' matched in position 1 in the second. It will increase both counters and lcss will be increase as well. Next check will be 'b', the character at position 1 in the first string matched with position 2 in the second string. No match, therefore both counters will be reset to 1, and starting search again. The 'b' match is lost and distance is 3 instead of 2. Also I think there might be some situations where the counters are not equal and the biggest of them reaches the end of its string, thus terminating the algorithm, but there could have been more matches. Incidentally I tried to fix both these issues and the error from Levenshtein was not really affected, but I am not 100% sure of the implementation.

Q: The algorithm continues to be asymmetric, Sift4(s1,s2) can be different from Sift4(s2,s1).
A: Yes. This is one of the artifacts of the linear nature of the algorithm. There is a function that is symmetric and that is Math.min(Sift4(a,b),Sift4(b,a)), however it is twice as slow, obviously.

Implementations in other languages

C# implementation [expand/collapse]

public class Sift4
{
    private Options _options;

    public Sift4(Options options)
    {
        if (options == null) options = new Options();
        if (options.Tokenizer == null)
        {
            options.Tokenizer = (s) => s == null
            ? new string[0]
            : s.ToCharArray().Select(c => c.ToString()).ToArray();
        }
        if (options.TokenMatcher == null)
        {
            options.TokenMatcher = (t1, t2) => t1 == t2;
        }
        if (options.MatchingEvaluator == null)
        {
            options.MatchingEvaluator = (t1, t2) => 1;
        }
        if (options.LocalLengthEvaluator == null)
        {
            options.LocalLengthEvaluator = (l) => l;
        }
        if (options.TranspositionCostEvaluator == null)
        {
            options.TranspositionCostEvaluator = (c1, c2) => 1;
        }
        if (options.TranspositionsEvaluator == null)
        {
            options.TranspositionsEvaluator = (l, t) => l - t;
        }
        _options = options;
    }

    /// <summary>
    /// General distance algorithm uses all the parameters in the options object and works on tokens
    /// </summary>
    /// <param name="s1"></param>
    /// <param name="s2"></param>
    /// <param name="maxOffset"></param>
    /// <returns></returns>
    public double GeneralDistance(string s1, string s2, int maxOffset)
    {
        var t1 = _options.Tokenizer(s1);
        var t2 = _options.Tokenizer(s2);

        var l1 = t1.Length;
        var l2 = t2.Length;

        if (l1 == 0) return _options.LocalLengthEvaluator(l2);
        if (l2 == 0) return _options.LocalLengthEvaluator(l1);

        var c1 = 0;  //cursor for string 1
        var c2 = 0;  //cursor for string 2
        var lcss = 0.0;  //largest common subsequence
        var local_cs = 0.0; //local common substring
        var trans = 0.0;  //cost of transpositions ('axb' vs 'xba')
        var offset_arr = new LinkedList<OffsetPair>();  //offset pair array, for computing the transpositions

        while ((c1 < l1) && (c2 < l2))
        {
            if (_options.TokenMatcher(t1[c1], t2[c2]))
            {
                local_cs += _options.MatchingEvaluator(t1[c1], t2[c2]);
                var isTransposition = false;
                var op = offset_arr.First;
                while (op != null)
                {  //see if current match is a transposition
                    var ofs = op.Value;
                    if (c1 <= ofs.C1 || c2 <= ofs.C2)
                    {
                        // when two matches cross, the one considered a transposition is the one with the largest difference in offsets
                        isTransposition = Math.Abs(c2 - c1) >= Math.Abs(ofs.C2 - ofs.C1);
                        if (isTransposition)
                        {
                            trans += _options.TranspositionCostEvaluator(c1, c2);
                        }
                        else
                        {
                            if (!ofs.IsTransposition)
                            {
                                ofs.IsTransposition = true;
                                trans += _options.TranspositionCostEvaluator(ofs.C1, ofs.C2);
                            }
                        }
                        break;
                    }
                    else
                    {
                        var next_op = op.Next;
                        if (c1 > ofs.C2 && c2 > ofs.C1)
                        {
                            offset_arr.Remove(op);
                        }
                        op = next_op;
                    }
                }
                offset_arr.AddLast(new OffsetPair(c1, c2)
                {
                    IsTransposition = isTransposition
                });
            }
            else
            {
                lcss += _options.LocalLengthEvaluator(local_cs);
                local_cs = 0;
                if (c1 != c2)
                {
                    c1 = c2 = Math.Min(c1, c2);  //using min allows the computation of transpositions
                }
                if (_options.MaxDistance != null)
                {
                    var temporaryDistance = _options.LocalLengthEvaluator(Math.Max(c1, c2)) - _options.TranspositionsEvaluator(lcss, trans);
                    if (temporaryDistance > _options.MaxDistance) return Math.Round(temporaryDistance, MidpointRounding.AwayFromZero);
                }
                //if matching tokens are found, remove 1 from both cursors (they get incremented at the end of the loop)
                //so that we can have only one code block handling matches 
                for (var i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++)
                {
                    if ((c1 + i < l1) && _options.TokenMatcher(t1[c1 + i], t2[c2]))
                    {
                        c1 += i - 1;
                        c2--;
                        break;
                    }
                    if ((c2 + i < l2) && _options.TokenMatcher(t1[c1], t2[c2 + i]))
                    {
                        c1--;
                        c2 += i - 1;
                        break;
                    }
                }
            }
            c1++;
            c2++;
            // this covers the case where the last match is on the last token in list, so that it can compute transpositions correctly
            if ((c1 >= l1) || (c2 >= l2))
            {
                lcss += _options.LocalLengthEvaluator(local_cs);
                local_cs = 0;
                c1 = c2 = Math.Min(c1, c2);
            }
        }
        lcss += _options.LocalLengthEvaluator(local_cs);
        return Math.Round(_options.LocalLengthEvaluator(Math.Max(l1, l2)) - _options.TranspositionsEvaluator(lcss, trans), MidpointRounding.AwayFromZero); //apply transposition cost to the final result
    }

    /// <summary>
    /// Static distance algorithm working on strings, computing transpositions as well as stopping when maxDistance was reached.
    /// </summary>
    /// <param name="s1"></param>
    /// <param name="s2"></param>
    /// <param name="maxOffset"></param>
    /// <param name="maxDistance"></param>
    /// <returns></returns>
    public static double CommonDistance(string s1, string s2, int maxOffset, int maxDistance = 0)
    {
        var l1 = s1 == null ? 0 : s1.Length;
        var l2 = s2 == null ? 0 : s2.Length;

        if (l1 == 0) return l2;
        if (l2 == 0) return l1;

        var c1 = 0;  //cursor for string 1
        var c2 = 0;  //cursor for string 2
        var lcss = 0;  //largest common subsequence
        var local_cs = 0; //local common substring
        var trans = 0;  //number of transpositions ('axb' vs 'xba')
        var offset_arr = new LinkedList<OffsetPair>();  //offset pair array, for computing the transpositions

        while ((c1 < l1) && (c2 < l2))
        {
            if (s1[c1] == s2[c2])
            {
                local_cs++;
                var isTransposition = false;
                var op = offset_arr.First;
                while (op != null)
                {  //see if current match is a transposition
                    var ofs = op.Value;
                    if (c1 <= ofs.C1 || c2 <= ofs.C2)
                    {
                        // when two matches cross, the one considered a transposition is the one with the largest difference in offsets
                        isTransposition = Math.Abs(c2 - c1) >= Math.Abs(ofs.C2 - ofs.C1);
                        if (isTransposition)
                        {
                            trans++;
                        }
                        else
                        {
                            if (!ofs.IsTransposition)
                            {
                                ofs.IsTransposition = true;
                                trans++;
                            }
                        }
                        break;
                    }
                    else
                    {
                        var next_op = op.Next;
                        if (c1 > ofs.C2 && c2 > ofs.C1)
                        {
                            offset_arr.Remove(op);
                        }
                        op = next_op;
                    }
                }
                offset_arr.AddLast(new OffsetPair(c1, c2)
                {
                    IsTransposition = isTransposition
                });
            }
            else
            {
                lcss += local_cs;
                local_cs = 0;
                if (c1 != c2)
                {
                    c1 = c2 = Math.Min(c1, c2);  //using min allows the computation of transpositions
                }
                if (maxDistance > 0)
                {
                    var temporaryDistance = Math.Max(c1, c2) - (lcss - trans);
                    if (temporaryDistance > maxDistance) return temporaryDistance;
                }
                //if matching tokens are found, remove 1 from both cursors (they get incremented at the end of the loop)
                //so that we can have only one code block handling matches 
                for (var i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++)
                {
                    if ((c1 + i < l1) && s1[c1 + i] == s2[c2])
                    {
                        c1 += i - 1;
                        c2--;
                        break;
                    }
                    if ((c2 + i < l2) && s1[c1] == s2[c2 + i])
                    {
                        c1--;
                        c2 += i - 1;
                        break;
                    }
                }
            }
            c1++;
            c2++;
            // this covers the case where the last match is on the last token in list, so that it can compute transpositions correctly
            if ((c1 >= l1) || (c2 >= l2))
            {
                lcss += local_cs;
                local_cs = 0;
                c1 = c2 = Math.Min(c1, c2);
            }
        }
        lcss += local_cs;
        return Math.Max(l1, l2) - (lcss - trans); //apply transposition cost to the final result
    }

    /// <summary>
    /// Standard Sift algorithm, using strings and taking only maxOffset as a parameter
    /// </summary>
    /// <param name="s1"></param>
    /// <param name="s2"></param>
    /// <param name="maxOffset"></param>
    /// <returns></returns>
    public static int SimplestDistance(string s1, string s2, int maxOffset)
    {
        var l1 = s1 == null ? 0 : s1.Length;
        var l2 = s2 == null ? 0 : s2.Length;

        if (l1 == 0) return l2;
        if (l2 == 0) return l1;

        var c1 = 0;  //cursor for string 1
        var c2 = 0;  //cursor for string 2
        var lcss = 0;  //largest common subsequence
        var local_cs = 0; //local common substring

        while ((c1 < l1) && (c2 < l2))
        {
            if (s1[c1] == s2[c2])
            {
                local_cs++;
            }
            else
            {
                lcss += local_cs;
                local_cs = 0;
                if (c1 != c2)
                {
                    c1 = c2 = Math.Max(c1, c2);
                }
                //if matching tokens are found, remove 1 from both cursors (they get incremented at the end of the loop)
                //so that we can have only one code block handling matches 
                for (var i = 0; i < maxOffset && (c1 + i < l1 && c2 + i < l2); i++)
                {
                    if ((c1 + i < l1) && s1[c1 + i] == s2[c2])
                    {
                        c1 += i - 1;
                        c2--;
                        break;
                    }
                    if ((c2 + i < l2) && s1[c1] == s2[c2 + i])
                    {
                        c1--;
                        c2 += i - 1;
                        break;
                    }
                }
            }
            c1++;
            c2++;
        }
        lcss += local_cs;
        return Math.Max(l1, l2) - lcss;
    }

    private class OffsetPair
    {
        public int C1 { get; set; }
        public int C2 { get; set; }
        public bool IsTransposition { get; set; }

        public OffsetPair(int c1, int c2)
        {
            this.C1 = c1;
            this.C2 = c2;
            this.IsTransposition = false;
        }
    }

    public class Options
    {
        /// <summary>
        /// If set, the algorithm will stop if the distance reaches this value
        /// </summary>
        public int? MaxDistance { get; set; }

        /// <summary>
        /// The function that turns strings into a list of tokens (also strings)
        /// </summary>
        public Func<string, string[]> Tokenizer { get; set; }

        /// <summary>
        /// The function that determines if two tokens are matching (similar to characters being the same in the simple algorithm)
        /// </summary>
        public Func<string, string, bool> TokenMatcher { get; set; }

        /// <summary>
        /// The function that determines the value of a match of two tokens (the equivalent of adding 1 to the lcss when two characters match)
        /// This assumes that the TokenMatcher function is a lot less expensive than this evaluator. If that is not the case, 
        /// you can optimize the speed of the algorithm by using only the matching evaluator and then calculating if two tokens match on the returned value.
        /// </summary>
        public Func<string, string, double> MatchingEvaluator { get; set; }

        /// <summary>
        /// Determines if the local value (computed on subsequent matched tokens) must be modified.
        /// In case one wants to reward longer matched substrings, for example, this is what you need to change.
        /// </summary>
        public Func<double, double> LocalLengthEvaluator { get; set; }

        /// <summary>
        /// The function determining the cost of an individual transposition, based on its counter positions.
        /// </summary>
        public Func<int, int, double> TranspositionCostEvaluator { get; set; }

        /// <summary>
        /// The function determining how the cost of transpositions affects the final result
        /// Change it if it doesn't suit you.
        /// </summary>
        public Func<double, double, double> TranspositionsEvaluator { get; set; }
    }
}

PHP implementation [expand/collapse]

// Sift4 - common version
// online algorithm to compute the distance between two strings in O(n)
// $maxOffset is the number of characters to search for matching letters
// $maxDistance is the distance at which the algorithm should stop computing the value and just exit (the strings are too different anyway)
function sift4($s1, $s2, $maxOffset, $maxDistance = 0) {
    if (!$s1 || !strlen($s1)) {
        if (!$s2) {
            return 0;
        }
        return strlen($s2);
    }
    if (!$s2 || !strlen($s2)) {
        return strlen($s1);
    }
    $l1 = strlen($s1);
    $l2 = strlen($s2);
    $c1 = 0; //cursor for string 1
    $c2 = 0; //cursor for string 2
    $lcss = 0; //largest common subsequence
    $local_cs = 0; //local common substring
    $trans = 0; //number of transpositions ('ab' vs 'ba')
    $offset_arr = array(); //offset pair array, for computing the transpositions
    while (($c1 < $l1) && ($c2 < $l2)) {
        if (substr($s1, $c1, 1) == substr($s2, $c2, 1)) {
            $local_cs++;
            $isTrans = false;
            $i = 0;
            while ($i < sizeof($offset_arr)) { //see if current match is a transposition
                $ofs = $offset_arr[$i];
                if ($c1 <= $ofs['c1'] || $c2 <= $ofs['c2']) {
                    $isTrans = abs($c2 - $c1) >= abs($ofs['c2'] - $ofs['c1']);
                    if ($isTrans) {
                        $trans++;
                    } else {
                        if (!$ofs['trans']) {
                            $ofs['trans'] = true;
                            $trans++;
                        }
                    }
                    break;
                } else {
                    if ($c1 > $ofs['c2'] && $c2 > $ofs['c1']) {
                        array_splice($offset_arr, $i, 1);
                    } else {
                        $i++;
                    }
                }
            }
            array_push($offset_arr, array('c1' = > $c1, 'c2' = > $c2, 'trans' = > $isTrans));
        } else {
            $lcss+= $local_cs;
            $local_cs = 0;
            if ($c1 != $c2) {
                $c1 = $c2 = min($c1, $c2); //using min allows the computation of transpositions
            }
            if ($maxDistance) {
                $temporaryDistance = max($c1, $c2) - $lcss + $trans;
                if ($temporaryDistance > $maxDistance) return $temporaryDistance;
            }

            //if matching characters are found, remove 1 from both cursors (they get incremented at the end of the loop)
            //so that we can have only one code block handling matches
            for ($i = 0;$i < $maxOffset && ($c1 + $i < $l1 || $c2 + $i < $l2);$i++) {
                if (($c1 + $i < $l1) && (substr($s1, $c1 + $i, 1) == substr($s2, $c2, 1))) {
                    $c1+= $i - 1;
                    $c2--;
                    break;
                }
                if (($c2 + $i < $l2) && (substr($s1, $c1, 1) == substr($s2, $c2 + $i, 1))) {
                    $c1--;
                    $c2+= $i - 1;
                    break;
                }
            }
        }
        $c1++;
        $c2++;
        // this covers the case where the last match is on the last token in list, so that it can compute transpositions correctly
        if (($c1 >= $l1) || ($c2 >= $l2)) {
            $lcss+= $local_cs;
            $local_cs = 0;
            $c1 = $c2 = min($c1, $c2);
        }
    }
    $lcss+= $local_cs;
    return max($l1, $l2) - $lcss + $trans; //apply transposition cost to final result
    
}

Thanks to Ferenc Szatmári for corrections in the PHP code

The Simplest and General versions of the algorithm remain as an exercise to you, since I haven't been working in PHP for more than a decade.

T-SQL implementation [expand/collapse]

---<summary>
---Static distance algorithm working on strings, computing transpositions as well as stopping when maxDistance was reached.
---</summary>
---<param name="s1"></param>
---<param name="s2"></param>
---<param name="maxOffset"></param>
---<param name="maxDistance"></param>
---<returns></returns>
CREATE FUNCTION Sift4common(@s1          NVARCHAR(max), 
                            @s2          NVARCHAR(max), 
                            @maxOffset   INT, 
                            @maxDistance INT) 
returns INT 
AS 
  BEGIN 
      DECLARE @l1 INT = Len(Isnull(@s1, '')); 
      DECLARE @l2 INT = Len(Isnull(@s2, '')); 

      IF ( @l1 = 0 ) 
        RETURN @l2; 

      IF ( @l2 = 0 ) 
        RETURN @l1; 

      DECLARE @c1 INT = 0; 
      DECLARE @c2 INT = 0; 
      DECLARE @lcss INT = 0; 
      DECLARE @local_cs INT = 0; 
      DECLARE @trans INT = 0; 
      DECLARE @offset_arr TABLE 
        ( 
           C1    INT, 
           C2    INT, 
           Trans BIT 
        ) 
      DECLARE @i INT 
      DECLARE @temporaryDistance FLOAT 
      DECLARE @result INT 
      DECLARE @oc1 INT, 
              @oc2 INT, 
              @otr BIT 
      DECLARE @isTrans BIT 

      WHILE ( ( @c1 < @l1 ) 
              AND ( @c2 < @l2 ) ) 
        BEGIN 
            IF ( Substring(@s1, @c1 + 1, 1) = Substring(@s2, @c2 + 1, 1) ) 
              BEGIN 
                  SET @local_cs=@local_cs + 1; 
                  SET @isTrans=0 
                  SET @oc1=NULL; 

                  SELECT TOP 1 @oc1 = o.C1, 
                               @oc2 = o.C2, 
                               @otr = o.Trans 
                  FROM   @offset_arr o 
                  WHERE  @c1 <= o.C1 
                          OR @c2 <= o.C2 

                  IF ( @oc1 IS NOT NULL ) 
                    BEGIN 
                        SET @isTrans=CASE 
                                       WHEN Abs(@c2 - @c1) >= Abs(@oc2 - @oc1) 
                                     THEN 1 
                                       ELSE 0 
                                     END 

                        IF ( @isTrans = 1 ) 
                          BEGIN 
                              SET @trans=@trans + 1 
                          END 
                        ELSE IF ( @otr = 0 ) 
                          BEGIN 
                              SET @trans=@trans + 1 

                              UPDATE @offset_arr 
                              SET    Trans = 1 
                              WHERE  C1 = @oc1 
                                     AND C2 = @oc2 
                          END 
                    END 

                  DELETE FROM @offset_arr 
                  WHERE  @c1 > C1 
                         AND @c1 > C2 
                         AND @c2 > C1 
                         AND @c2 > C2; 

                  INSERT INTO @offset_arr 
                  VALUES      (@c1, 
                               @c2, 
                               @isTrans); 
              END 
            ELSE 
              BEGIN 
                  SET @lcss = @lcss + @local_cs; 
                  SET @local_cs = 0; 

                  IF ( @c1 != @c2 ) 
                    -- using min allows the computation of transpositions  
                    BEGIN 
                        IF ( @c1 < @c2 ) 
                          BEGIN 
                              SET @c2=@c1; 
                          END 
                        ELSE 
                          BEGIN 
                              SET @c1=@c2; 
                          END 
                    END 
                IF ( @maxDistance > 0 ) 
                  BEGIN 
                    IF ( @c1 > @c2 ) 
                    BEGIN 
                      SET @temporaryDistance = @c1 - ( @lcss - @trans / 2.0 ); 
                    END 
                    ELSE 
                    BEGIN 
                      SET @temporaryDistance = @c2 - ( @lcss - @trans / 2.0 ); 
                    END 

                  IF ( @temporaryDistance > @maxDistance ) 
                    RETURN Round(@temporaryDistance, 0); 
              END 


                  --if matching tokens are found, remove 1 from both cursors (they get incremented at the end of the loop)
                  --so that we can have only one code block handling matches   
                  SET @i=0; 

                  WHILE ( @i < @maxOffset 
                          AND ( @c1 + @i < @l1 
                                 OR @c2 + @i < @l2 ) ) 
                    BEGIN 
                        IF ( ( @c1 + @i < @l1 ) 
                             AND Substring(@s1, @c1 + @i + 1, 1) = 
                                 Substring(@s2, @c2 + 1, 1) 
                           ) 
                          BEGIN 
                              SET @c1 = @c1 + @i - 1; 
                              SET @c2=@c2 - 1; 

                              BREAK; 
                          END 

                        IF ( ( @c2 + @i < @l2 ) 
                             AND Substring(@s1, @c1 + 1, 1) = Substring(@s2, 
                                                              @c2 + @i + 1, 1 
                                                              ) 
                           ) 
                          BEGIN 
                              SET @c1 = @c1 - 1; 
                              SET @c2=@c2 + @i - 1; 

                              BREAK; 
                          END 

                        SET @i=@i + 1; 
                    END 
              END 

            SET @c1=@c1 + 1; 
            SET @c2=@c2 + 1; 

            IF ( ( @c1 >= @l1 ) 
                  OR ( @c2 >= @l2 ) ) 
              BEGIN 
                  SET @lcss = @lcss + @local_cs; 
                  SET @local_cs = 0; 

                  IF ( @c1 < @c2 ) 
                    BEGIN 
                        SET @c2=@c1; 
                    END 
                  ELSE 
                    BEGIN 
                        SET @c1=@c2; 
                    END 
              END 
        END 

      SET @lcss = @lcss + @local_cs; 

      --apply the transposition cost to the final result 
      IF ( @l1 > @l2 ) 
        BEGIN 
            SET @result = @l1 - ( @lcss - @trans ); 
        END 
      ELSE 
        BEGIN 
            SET @result = @l2 - ( @lcss - @trans ); 
        END 

      RETURN @result; 
  END

Clearly a general version of the algorithm is not possible in Transact SQL.

Here is a MySQL version, gracefully provided by Ferenc Szatmári:

BEGIN 
DECLARE l1 INT DEFAULT Length(IFNULL(s1, '')); 
DECLARE l2 INT DEFAULT Length(IFNULL(s2, '')); 


DECLARE c1 INT Default 0; 
DECLARE c2 INT default 0; 
DECLARE lcss INT default 0; 
DECLARE local_cs INT default 0; 
DECLARE trans INT default 0; 
DECLARE i INT;
DECLARE temporaryDistance FLOAT;
DECLARE result INT;
DECLARE oc1 INT;
DECLARE oc2 INT;
DECLARE otr smallint;
DECLARE isTrans smallint;

drop temporary table if exists offset_arr;
CREATE TEMPORARY TABLE IF not EXISTS offset_arr
( 
C1 INT, 
C2 INT,
Trans BIT
)engine=memory;


/*      delete from offset_arr;*/


IF l1 = 0 THEN
RETURN l2; 
END IF;
IF l2 = 0 THEN 
RETURN l1; 
END IF;  






WHILE ( ( c1 < l1 ) AND ( c2 < l2 ) ) 
DO 
IF ( Substring(s1, c1 + 1, 1) = Substring(s2, c2 + 1, 1) ) THEN

SET local_cs=local_cs + 1; 
SET isTrans=0;
SET oc1=NULL;
SELECT  o.C1, o.C2,o.Trans  into oc1, oc2, otr
FROM offset_arr o 
WHERE c1 <= o.C1 OR c2 <= o.C2
LIMIT 1;
IF oc1 IS NOT NULL THEN

SET isTrans=CASE WHEN ABS(c2-c1)>=ABS(oc2-oc1) THEN 1 ELSE 0 END;
IF (isTrans=1) THEN
SET trans=trans+1;
ELSE
IF (otr=0) THEN


SET trans=trans+1;
UPDATE offset_arr SET Trans=1 WHERE C1=oc1 AND C2=oc2;
END IF;
END IF;  
END IF;


DELETE FROM offset_arr 
WHERE  c1 > C1 
AND c1 > C2
AND c2 > C1 
AND c2 > C2; 


INSERT INTO offset_arr 
VALUES (c1, c2, isTrans); 

ELSE 

SET lcss = lcss + local_cs; 
SET local_cs = 0; 


IF ( c1 != c2 ) THEN
-- using min allows the computation of transpositions 


IF ( c1 < c2 ) THEN
SET c2=c1; 
ELSE 
SET c1=c2; 
END IF;

IF ( maxDistance > 0 ) THEN


IF ( c1 > c2 ) THEN
SET temporaryDistance = c1 - ( lcss - trans / 2.0 ); 
ELSE 
SET temporaryDistance = c2 - ( lcss - trans / 2.0 ); 
END IF;


IF ( temporaryDistance > maxDistance ) THEN
RETURN Round(temporaryDistance, 0); 
END IF;  
END IF;

END IF;


/*if matching tokens are found, remove 1 from both cursors (they get incremented at the end of the loop)
--so that we can have only one code block handling matches  */
SET i=0; 


label : WHILE ( i < maxOffset AND (c1 + i < l1 OR c2 + i < l2) ) do


IF ( ( c1 + i < l1 ) 
AND Substring(s1, c1 + i + 1, 1) = Substring(s2, c2 + 1, 1) 
) THEN


SET c1 = c1 + i - 1; 
SET c2=c2 - 1; 


leave label; 
END IF; 


IF ( ( c2 + i < l2 ) 
AND Substring(s1, c1 + 1, 1) = Substring(s2, c2 + i + 1, 1) 
)THEN 


SET c1 = c1 - 1; 
SET c2=c2 + i - 1; 


leave label;  
END IF;


SET i=i + 1; 
END WHILE;
END IF; 


SET c1=c1 + 1; 
SET c2=c2 + 1; 


IF ( ( c1 >= l1 ) OR ( c2 >= l2 ) ) THEN
SET lcss = lcss + local_cs; 
SET local_cs = 0; 


IF ( c1 < c2 ) THEN
SET c2=c1; 
ELSE 
SET c1=c2; 
END IF;
END IF;
END while;


SET lcss = lcss + local_cs; 


/*apply the transposition cost to the final result*/
IF ( l1 > l2 ) THEN
SET result = l1 - (lcss - trans);
ELSE 
SET result = l2 - (lcss - trans);
END IF;
RETURN result; 
END

Java implementation [expand/collapse]

Here is a Java version, gracefully provided by Nathanæl Fischer:

/**
 * Sift4 - common version
 * online algorithm to compute the distance between two strings in O(n)
 * Algorithm by siderite, java port by Nathan Fischer 2016
 * /blog/super-fast-and-accurate-string-distance.html
 * @param s1
 * @param s2
 * @param maxOffset the number of characters to search for matching letters
 * @return
 */
public static double sift4(String s1, String s2, int maxOffset) {
    class Offset {
        int c1;
        int c2;
        boolean trans;

        Offset(int c1, int c2, boolean trans) {
            this.c1 = c1;
            this.c2 = c2;
            this.trans = trans;
        }
    }

    if (s1 == null || s1.isEmpty())
        return s2 == null ? 0 : s2.length();

    if (s2 == null || s2.isEmpty())
        return s1.length();

    int l1 = s1.length();
    int l2 = s2.length();

    int c1 = 0; //cursor for string 1
    int c2 = 0; //cursor for string 2
    int lcss = 0; //largest common subsequence
    int local_cs = 0; //local common substring
    int trans = 0; //number of transpositions ('ab' vs 'ba')
    LinkedList < Offset > offset_arr = new LinkedList < > (); //offset pair array, for computing the transpositions

    while ((c1 < l1) && (c2 < l2)) {
        if (s1.charAt(c1) == s2.charAt(c2)) {
            local_cs++;
            boolean isTrans = false;
            //see if current match is a transposition
            int i = 0;
            while (i < offset_arr.size()) {
                Offset ofs = offset_arr.get(i);
                if (c1 <= ofs.c1 || c2 <= ofs.c2) {
                    // when two matches cross, the one considered a transposition is the one with the largest difference in offsets
                    isTrans = Math.abs(c2 - c1) >= Math.abs(ofs.c2 - ofs.c1);
                    if (isTrans) {
                        trans++;
                    } else {
                        if (!ofs.trans) {
                            ofs.trans = true;
                            trans++;
                        }
                    }
                    break;
                } else {
                    if (c1 > ofs.c2 && c2 > ofs.c1) {
                        offset_arr.remove(i);
                    } else {
                        i++;
                    }
                }
            }
            offset_arr.add(new Offset(c1, c2, isTrans));
        } else {
            lcss += local_cs;
            local_cs = 0;
            if (c1 != c2) {
                c1 = c2 = Math.min(c1, c2); //using min allows the computation of transpositions
            }
            //if matching characters are found, remove 1 from both cursors (they get incremented at the end of the loop)
            //so that we can have only one code block handling matches
            for (int i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++) {
                if ((c1 + i < l1) && (s1.charAt(c1 + i) == s2.charAt(c2))) {
                    c1 += i - 1;
                    c2--;
                    break;
                }
                if ((c2 + i < l2) && (s1.charAt(c1) == s2.charAt(c2 + i))) {
                    c1--;
                    c2 += i - 1;
                    break;
                }
            }
        }
        c1++;
        c2++;
        // this covers the case where the last match is on the last token in list, so that it can compute transpositions correctly
        if ((c1 >= l1) || (c2 >= l2)) {
            lcss += local_cs;
            local_cs = 0;
            c1 = c2 = Math.min(c1, c2);
        }
    }
    lcss += local_cs;
    return Math.round(Math.max(l1, l2) - lcss + trans); //add the cost of transpositions to the final result
}

Powershell implementation [expand/collapse]

Powershell implementation of simple Sift4, by Kirk Sayre:

function Calc-Sift4Distance 
{
  <# 
      .SYNOPSIS 

      Compute the edit distance between 2 strings using the sift4 string
      edit distance algorithm.

      .PARAMETER s1

      The 1st string

      .PARAMETER s2

      The 2nd string

      .PARAMETER maxOffset

      The maximum common substring length for which to search. The default
      is 10.

      .RETURN

      The # of edits needed to make the given 2 strings equal.
  #>

  Param(
    [Parameter(Mandatory = $True, ValueFromPipelineByPropertyName = $True)]
    [String]
    $s1,

    [Parameter(Mandatory = $True, ValueFromPipelineByPropertyName = $True)]
    [String]
    $s2,

    [Parameter(ValueFromPipelineByPropertyName = $True)]
    [Int]
    $maxOffset = 10
  )

  # Handle null or empty strings.
  if ((-not $s1) -or ($s1.length -eq 0)) 
  {
    if (-not $s2) 
    {
      return 0
    }
    return $s2.length
  }

  if ((-not $s2) -or ($s2.length -eq 0)) 
  {
    return $s1.length
  }

  # Initialization.
  $l1 = $s1.length
  $l2 = $s2.length
  $c1 = 0 # Cursor for string 1.
  $c2 = 0 # Cursor for string 2.
  $lcss = 0 # Largest common subsequence.
  $local_cs = 0 # Local common substring.

  # Scan strings.
  while (($c1 -lt $l1) -and ($c2 -lt $l2)) 
  {
    if ($s1[$c1] -eq $s2[$c2]) 
    {
      $local_cs++
    }
    else 
    {
      $lcss += $local_cs
      $local_cs = 0
      if ($c1 -ne $c2) 
      {
        # Using max to bypass the need for computer transpositions ('ab' vs 'ba').
        $c1 = $c2 = (@($c1, $c2) | Measure-Object -Maximum).Maximum
      }

      for ($i = 0; (($i -lt $maxOffset) -and ((($c1 + $i) -lt $l1) -or (($c2 + $i) -lt $l2))); $i++) 
      {
        if ((($c1 + $i) -lt $l1) -and ($s1[$c1 + $i] -eq $s2[$c2])) 
        {
          $c1 += $i
          $local_cs++
          break
        }

        if ((($c1 + $i) -lt $l2) -and ($s1[$c1] -eq $s2[$c2 + $i])) 
        {
          $c2 += $i
          $local_cs++
          break
        }
      }
    }
    $c1++
    $c2++
  }
  $lcss += $local_cs
  return [math]::Round((@($l1, $l2) | Measure-Object -Maximum).Maximum - $lcss)
}

C++ implementation [expand/collapse]

Thanks to Hugo Amaro, a C++ implementation:

struct sift_offset {
  int c1;
  int c2;
  bool trans;
};

template < typename T >
  int sift4(T * s1, int s1_size, T * s2, int s2_size, int maxOffset, int maxDistance) {
    if (!s1 || !s1_size) {
      if (!s2) {
        return 0;
      }
      return s2_size;
    }

    if (!s2 || !s2_size) {
      return s1_size;
    }

    int l1 = s1_size;
    int l2 = s2_size;

    int c1 = 0; //cursor for string 1
    int c2 = 0; //cursor for string 2
    int lcss = 0; //largest common subsequence
    int local_cs = 0; //local common substring
    int trans = 0; //number of transpositions ('ab' vs 'ba')
    std::vector < sift_offset > offset_arr; //offset pair array, for computing the transpositions

    while ((c1 < l1) && (c2 < l2)) {
      if (s1[c1] == s2[c2]) {
        local_cs++;
        bool isTrans = false;
        //see if current match is a transposition
        int i = 0;
        while (i < offset_arr.size()) {
          sift_offset ofs = offset_arr[i];
          if (c1 <= ofs.c1 || c2 <= ofs.c2) {
            // when two matches cross, the one considered a transposition is the one with the largest difference in offsets
            isTrans = std::abs(c2 - c1) >= std::abs(ofs.c2 - ofs.c1);

            if (isTrans) {
              trans++;
            } else {
              if (!ofs.trans) {
                ofs.trans = true;
                trans++;
              }
            }
            break;
          } else {
            if (c1 > ofs.c2 && c2 > ofs.c1) {
              offset_arr.erase(offset_arr.begin() + i);

            } else {
              i++;
            }
          }
        }
        offset_arr.push_back({
          c1,
          c2,
          isTrans
        });
      } else {
        lcss += local_cs;
        local_cs = 0;
        if (c1 != c2) {
          c1 = c2 = (int) min(c1, c2); //using min allows the computation of transpositions
        }
        //if matching characters are found, remove 1 from both cursors (they get incremented at the end of the loop)
        //so that we can have only one code block handling matches 
        for (int i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++) {
          if ((c1 + i < l1) && (s1[c1 + i] == s2[c2])) {
            c1 += i - 1;
            c2--;
            break;
          }
          if ((c2 + i < l2) && (s1[c1] == s2[c2 + i])) {
            c1--;
            c2 += i - 1;
            break;
          }
        }
      }
      c1++;
      c2++;
      if (maxDistance) {
        int temporaryDistance = (int) max(c1, c2) - lcss + trans;
        if (temporaryDistance >= maxDistance) return std: round(temporaryDistance);
      }

    }
    // this covers the case where the last match is on the last token in list, so that it can compute transpositions correctly
    if ((c1 >= l1) || (c2 >= l2)) {
      lcss += local_cs;
      local_cs = 0;
      c1 = c2 = (int) min(c1, c2);
    }
  }
lcss += local_cs;
return std::round((int) max(l1, l2) - lcss + trans); //add the cost of transpositions to the final result
}

You can find a Go implementation here, written by Jason W. Hutchinson.
There is also a Swift implementation here.
A Perl 6 (now called Raku) implementation can be found here.

Changing the grouping template of the AngularJS grid

Published Jul 1, 2014

and has 1 comment

I had this AngularJS grid that I wanted to show totals for certain categories and types. To be more precise, I had a list of items with Category and Type and I want to know the total for all categories and, for each category, the total for each type. This works perfectly if I load all items as individual, but I had hundred of thousands of items so it was clearly impractical. The solution? Send totals for every category and type, then just add them in the grid. In order to do that, though, I had to change the template of the "grouping row", the one that in ngGrid has the ngAggregate class.

It seems that all that is required for that is to change the aggregate template in the grid options. If you are not interested in the details, jump directly to the solution.

There is already an aggregate template in ngGrid.js, one that (at this time) looks like this:

<div ng-click="row.toggleExpand()" ng-style="rowStyle(row)" class="ngAggregate">
    <span class="ngAggregateText">{{row.label CUSTOM_FILTERS}} ({{row.totalChildren()}} {{AggItemsLabel}})</span>
    <div class="{{row.aggClass()}}"></div>
</div>

So we see that that the number displayed in an aggregate row is coming from a function of the row object called totalChildren, which is defined in ngAggregate.prototype and looks like this:

ngAggregate.prototype.totalChildren = function () {
    if (this.aggChildren.length > 0) {
        var i = 0;
        var recurse = function (cur) {
            if (cur.aggChildren.length > 0) {
                angular.forEach(cur.aggChildren, function (a) {
                    recurse(a);
                });
            } else {
                i += cur.children.length;
            }
        };
        recurse(this);
        return i;
    } else {
        return this.children.length;
    }
};

Maybe one could change the function to cover specific types of objects and return a sum instead of a count, but that is not the scope of the current post.

The solution described here will involve a custom function and a custom template. Here is how you do it:

Define the options for the grid. I am sure you already have it defined somewhere, if not, it is advisable you would. Sooner or later you will want to customize the output and functionality.
Add a new property to the options called aggregateTemplate. This will look probably like the default template, but with another function instead of totalChildren.
Define the function that will aggregate the items.

Solution 1:

Define template:

$scope.gridOptions = {
        data: 'Data.Units',
        enableColumnResize: true,
        showColumnMenu: false,
        showFilter: true,
        multiSelect: false,
        showGroupPanel: true,
        enablePaging: false,
        showFooter: true,
        columnDefs: [
            { field: 'Country', displayName: 'Country', width: 190 },
            { field: 'Type', displayName: 'Unit Type' },
            { field: 'Count', displayName: 'Count', width: 180}
        ],
        aggregateTemplate: "<div ng-click=\"row.toggleExpand()\" ng-style=\"rowStyle(row)\" class=\"ngAggregate\">" +
    "    <span class=\"ngAggregateText\">{{row.label CUSTOM_FILTERS}} ({{aggFunc(row)}} {{AggItemsLabel}})</span>" +
    "    <div class=\"{{row.aggClass()}}\"></div>" +
    "</div>"
    };

Define function:

$scope.aggFunc = function (row) {
        var sumColumn='Count';
        var total = 0;
        angular.forEach(row.children, function(entry) {
            total+=entry.entity[sumColumn];
        });
        angular.forEach(row.aggChildren, function(entry) {
            total+=$scope.aggFunc(entry);
        });
        return total;
    };

What we did here is we replaced row.totalChildren() with aggFunc(row) which we defined in the scope. What it does is add to the total the value of 'Count' rather than just count the items. It goes through row.children, which contains normal row items, then through aggChildren, which contains aggregate rows, which we pass through the same function in order to get their total.

Well, this works perfectly, but doesn't that mean we need to use this for each grid? There is a lot of code duplication. Let's first put the template in the cache so we can reuse it:

module.run(["$templateCache", function($templateCache) {

  $templateCache.put("aggregateCountTemplate.html",
    "<div ng-click=\"row.toggleExpand()\" ng-style=\"rowStyle(row)\" class=\"ngAggregate\">" +
    "    <span class=\"ngAggregateText\">{{row.label CUSTOM_FILTERS}} ({{aggFunc(row)}} {{AggItemsLabel}})</span>" +
    "    <div class=\"{{row.aggClass()}}\"></div>" +
    "</div>"
  );

}]);

Now the gridOptions change to

$scope.gridOptions = {
        [...]
        aggregateTemplate: "aggregateCountTemplate.html"
    };

and we can reuse the template anytime we want. Alternatively we can create a file and reference it, without using the cache:

$scope.gridOptions = {
        [...]
        aggregateTemplate: "/templates/aggregateCountTemplate.html"
    };

Now, if we could replace the aggFunc function with a row function, adding it to ngAggregate.prototype. Unfortunately we cannot do that, since ngAggregate is a 'private' object. The only thing we can do is to add some sort of static function. The solution is to add it in the root scope, so that is available everywhere.

Solution 2:

Here is the content of the file aggregateCountTemplateCache.js, that I created and load every time in the site. It does two things: inject the function in the root scope of the application and add the template to the cache. The only other thing to do is to use the aggregateTemplate: "aggregateCountTemplate.html" grid options.

var module = angular.module('app', ['ngResource', 'ui.bootstrap', 'ui', 'ngGrid', 'importTreeFilter', 'ui.date', 'SharedServices', 'ui.autocomplete']);

module.run(["$templateCache","$rootScope", function($templateCache,$rootScope) {

  $rootScope.aggregateCountFunc = function (row) {
        var total = 0;
        angular.forEach(row.children, function(entry) {
            total+=entry.entity.Count;
        });
        angular.forEach(row.aggChildren, function(entry) {
            total+=$rootScope.aggregateCountFunc(entry);
        });
        return total;
    };

  $templateCache.put("aggregateCountTemplate.html",
    "<div ng-click=\"row.toggleExpand()\" ng-style=\"rowStyle(row)\" class=\"ngAggregate\">" +
    "    <span class=\"ngAggregateText\">{{row.label CUSTOM_FILTERS}} ({{aggregateCountFunc(row)}}{{AggItemsLabel}})</span>" +
    "    <div class=\"{{row.aggClass()}}\"></div>" +
    "</div>"
  );

}]);

Enjoy!

Incredible loss of performance when encapsulating a DOM element in jQuery (1.7.2)

Published Mar 27, 2014

and has 0 comments

I was working on a pretty nice task that involved translating the text in a page in real time. For this I created a one page function that would do magic on elements that were added or changed in the page. On specific pages it moved with abysmal speed and I had no idea why. So I went to profile the thing and I was shocked to see that the problem did not come from my long piece of code, but from a simple encapsulation of an element in a jQuery object. I was using it only to have a nicer interface for getting the name of the element and changing an attribute. Here is the code:

var j=jQuery(elem);
if (j.is('img[alt]')) {
   j.attr('alt',translate(j.attr('alt')));
}

Replaced it with:

if (/^img$/i.test(elem.tagName)) {
  var alt=elem.getAttribute('alt');
  if (alt) {
    elem.setAttribute('alt',translate(alt));
  }
}

And it worked very fast indeed. The element might have been body so maybe the encapsulation tries to also parse the children or something like that or perhaps the problem was fixed with later versions of the library. However, think about how many times we used this kind of code without thinking twice about it. Think twice about it! :)

Accessing AngularJS services from outside AngularJS

Published Mar 25, 2014

and has 3 comments

If you read an Angular book or read a howto, you will think that Angular is the greatest discovery since fire. Everything is neatly stacked in modules, controllers, templates, directives, factories, etc. The problem comes when you want to use some code of your own, using simple Javascript that does specific work, and then you want to link it nicely with AngularJS. It is not always easy. My example concerns the simple display of a dialog which edits an object. I want it to work on every page, so I added it to the general layout template. The layout does not have a controller. Even if I add it, the dialog engine I have been using was buggy and I've decided to just use jQuery.dialog.

So here is my conundrum: How to load the content of a dialog from an Angular template, display it with jQuery.dialog, load the information with jQuery.get, then bind its input elements to an Angular scope object. I've tried the obvious: just load the template in the dialog and expect Angular to notice a new DOM element was added and parse it and work its magic. It didn't work. Why can't I just call an angular.refresh(elem); function and get it over with, I thought. There are several other solutions. One is to not create the content dynamically at all, just add it to the layout, mark it with ng-controller="something" and then, in the controller, save the object you are interested in or the scope as some sort of globally accessible object that you instantiate from jQuery.get. The dialog would just move the element around, afterwards. That means you need to create a controller, maybe in another file, to be nice, then load it into your page. Another is to create some sort of directive or script tag that loads the Angular template dynamically and to hope it works.

Long story short, none of these solutions appealed to me. I wanted a simple refresh(elem) function. And there is one. It is called angular.injector. You call it with the names of the modules you need to load ('ng' one of them and usually the main application module the second). The result is a function that can use invoke to get the same results as a controller constructor. And that is saying something: if you can do the work that the controller does in your block of code, you don't need a zillion controllers making your life miserable, nor do you need to mark the HTML uselessly for very simple functionality.

Without further ado, here is a function that takes as parameters an element and a data object. The function will force angular to compile said element like it was part of the angular main application, then bind to the main scope the properties of the data object:

function angularCompile(elem, data) {
    // create an injector
    var $injector = angular.injector(['ng','app']);
            
    // use the type inference to auto inject arguments, or use implicit injection
    $injector.invoke(function($rootScope, $compile, $document){
        var compiled = $compile(elem || $document);
        compiled($rootScope);
        if (data) {
            for (var k in data) {
                if (data.hasOwnProperty(k)) {
                    $rootScope[k]=data[k];
                }
            }
        }
           $rootScope.$digest();
    });
}

Example usage:

angularCompile(dialog[0],{editedObject: obj}); // will take the jQuery dialog element, compile it, and add to the scope the editedObject property with the value of obj.

Full code:

OpenTranslationDialog=function(Rule, onProceed, onCancel) {
  jQuery.ajax({
          type: 'GET',
          url: '/Content/ng-templates/dialogs/Translation.html',
          data: Rule,
          success: function(data) {
            var dialog=jQuery('<div></div>')
              .html(data)
              .dialog({
                resizable:true,
                width:700,
                modal:true,
                    buttons: {
                      "Save": function() {
                    var url='/some/api/url';
                    jQuery.ajax({
                        type:'PUT',
                        url:url,
                        data:Rule,
                        success:function() {
                          if (onProceed) onProceed();
                              $(this).dialog( "close" );
                        },
                        error:function() {
                          alert('There was an error saving the rule');
                        }
                      });
                      },
                      Cancel: function() {
                    if (onCancel) onCancel();
                          $(this).dialog( "close" );
                      }
                    }
              });

            angularCompile(dialog[0],{Rule:Rule});
          },
          error:function() {
              alert('There was an error getting the dialog template');
                  }
      });
}

Before you take my word on it, though, beware: I am an Angular noob and my desire here was to hack away at it in order to merge my own code with the nice structured code of my colleagues, who now hate me. Although they liked angular.injector when I showed it to them :)

Configuring the proxy settings via a script and testing it

Published Feb 4, 2014

and has 0 comments

If you go to the system Internet Settings (in Network Connections, or Internet Explorer or Chrome), and you advance to the Tab "Connections", then click LAN Settings, then go to Advanced... I mean, why wouldn't you? ... there is a checkbox called "Use automatic configuration script". The script is supposed to dynamically return the correct proxy for an URL. The practice is called Proxy Auto Configuration for some reason. The script is Javascript and it uses some predefined functions to return either "DIRECT" (don't use a proxy) or "PROXY address:port" (use the proxy at that address and port). You can chain the options by separating them with a semicolon like this: "PROXY 1.2.3.4:55 ; PROXY 10.20.30.40:50; DIRECT". And before you search like a madman for it, there is no way to specify the username/password for those proxy servers in your config file. You still have to type them when asked.

Use this solution to fix problems with proxies that work well for outside sites, but not for internal networks. For reasons too weird to explain here (but explained here: Understanding Web Proxy Configuration) you cannot just put your script on the local drive and use it, instead you have to read it from an http URL. If you don't have the possibility (or it's too annoying) to install IIS or some other web server in order to serve the pac file, try using it from the local drive with a file:// URL (not just C:\...). However, it is a deprecated method and you may experience issues, with .NET software or Internet Explorer 11, for example.

Here is a sample file that connects directly to any URL that is part of a domain or is part of an IP class:

function FindProxyForURL(url, host) {

var defProxy="10.20.30.40:50"; // the misbehaving or incomplete proxy

var domains=[
   ".mysite.com",
   ".xxx",
   "localhost"
];
var ipClasses=[
   "11.22.33.0",
   "55.0.0.0",
   "127.0.0.0"
];

for (var i=0; i<domains.length; i++) {
  if (dnsDomainIs(host,domains[i])) return "DIRECT";
}

var MYHOST = dnsResolve(host);

for (var i=0; i<ipClasses.length; i++) {
  var mask=getMask(ipClasses[i]);
  if (isInNet(MYHOST, ipClasses[i],mask)) return "DIRECT";
}

return "PROXY "+defProxy;

function getMask(ip) {
 var splits=ip.split('.');
 for (var i=0; i<splits.length; i++) {
  if (splits[i]!='0') splits[i]='255';
 }
    return splits.join('.');
}

}

Just add the domains or the IP classes to the arrays in order to connect directly to them. Do not forget to add the local IP classes as well for direct connection, including 127.0.0.0 to access your own localhost.

In order to test or debug your .pac files, use the PacParser open source utility. A reference to the functions you can use in your script can be found here: Using Automatic Configuration, Automatic Proxy, and Automatic Detection

Transposition of ranges in SQL and charts

Published Oct 25, 2013

and has 0 comments

I had this database table containing ranges (a start value and an end value). The challenge was creating a query that overlaps and transposes those ranges so I can say how many ranges are at any point in the total interval or values. As an example, "SELECT * FROM Ranges" would result in a table like:

Start	End
10	20
10	30
25	35
20	40

and I am looking for something like this:

Value	Count
0	0
1	0
...	...
10	2
11	2
...	...
24	2
25	3
26	3

A naive implementation would get the minimum Start (or start with 0, as I did) and the maximum End, create an in memory or temporary table (Values) from min to max using an ugly WHILE block, then join it with the Ranges tables something like:

SELECT v.Val,Count(1) as Nr
FROM #Values v
INNER JOIN Ranges r
ON r.Start<=v AND r.[End]>=v

This kind of works, but for large ranges it becomes difficult. It takes a lot to create the Values table and the join and for extreme cases, like I had with values from 0 to 6 billion, it becomes impossible. The bottleneck here is this Values table, which is pretty much a horror to create and maintain. But what if you don't need all the values?

Before I tell you the solution I found, be warned that you have to properly define what a range is. Is a range 10-20 actually 10-19? In my case it was so, so that is why there are some subtractions with 1 or less than rather than less or equal conditions.

The solution is this:

SELECT DISTINCT Val
INTO #Values
FROM (
  SELECT 0 as Val
    UNION ALL
  SELECT Start FROM Ranges
    UNION ALL
  SELECT [End]-1 FROM Ranges
) x
ORDER BY Val

The idea is that after you compute the ranges count per each of the start and end values you know that between one and the next the count of ranges will remain the same. The join is significantly faster, there is no ugly WHILE block and you don't need a 6 billion value table. It's easier to plot on a chart as well, with either of these variations:
See the variations:

SELECT v.Val,
       Count(r.Start) as Nr
FROM #Values v
LEFT JOIN Ranges r
  ON r.Start<=v.Val AND r.[End]>v.Val
GROUP BY v.Val

The result of the query above will be:

Value	Nr
0	0
10	2
19	2
20	2
25	3
29	3
34	2
39	1

Hope this helps you

OpenLayers & AngularJS - add features, choose their appearance and behaviour with clustering

Published Oct 10, 2013

and has 0 comments

Being a beginner in both OpenLayers and AngularJS it took me a long while to do this simple thing: add stuff on a map and make it show as I wanted. There were multiple gotchas and I intend to chronicle each and every one of those bastards.
First, while creating a map and doing all kinds of stuff with it using OpenLayers is a breeze, doing it "right" with AngularJS is not as simple. I thought I would not reinvent the wheel and looked for some integration of the two technologies and I found AzimuthJS. In order to add a map with Azimuth all you have to do is:

<div ol-map controls="zoom,navigation,layerSwitcher,attribution,mousePosition" control-opts="{navigation:{handleRightClicks:true}}">
    <az-layer name="Street" lyr-type="tiles"></az-layer>
    <az-layer name="Airports" lyr-type="geojson" lyr-url="examples/data/airports.json" projection="EPSG:4326"></az-layer>
</div>

You may notice that it has a simple syntax, it offers the possibility of multiple layers and one of them is even loading features dynamically from a URL. Perfect so far.
First problem: the API that I am using is not in the GeoJSON format that Azimuth know how to handle and I cannot or will not change the API. I've tried a lot of weird crap, including adding a callback on the loadend layer event for a GeoJson layer in order to reparse the data and configure what I wanted. It all worked, but it was incredibly ugly. I've managed to add the entire logic in a Javascript file and do it all in that event. It wasn't any different from doing it from scratch in Javascript without any Angular syntax, though. So what I did was to create my own OpenLayers.Format. It wasn't so complicated, basically I inherited from OpenLayers.Format.JSON and added my own read logic. Here is the result:

OpenLayers.Format.RSI = OpenLayers.Class(OpenLayers.Format.JSON, {

    read: function(json, type, filter) {
        type = (type) ? type : "FeatureCollection";
        var results = null;
        var obj = null;
        if (typeof json == "string") {
            obj = OpenLayers.Format.JSON.prototype.read.apply(this,
                                                              [json, filter]);
        } else { 
            obj = json;
        }    
        if(!obj) {
            OpenLayers.Console.error("Bad JSON: " + json);
        }

        var features=[];
        for (var i=0; i<obj.length; i++) {
            var item=obj[i];
            var point=new OpenLayers.Geometry.Point(item.Lon,item.Lat).transform('EPSG:4326', 'EPSG:3857');
            if (!isNaN(point.x)&&!isNaN(point.y)) {
                var feature=new OpenLayers.Feature.Vector(point,item);
                features.push(feature);
            }
        }
        
        return features;
    },
    

    CLASS_NAME: "OpenLayers.Format.RSI" 

});

All I had to do is load this in the page. But now the problem was that Azimuth only knows some types of layers based on a switch block. I've not refactored the code to be plug and play, instead I shamelessly changed it to try to use the GeoJson code with the format I provide as the lyr-type, if it exists in the OpenLayers.Format object. That settled that. By running the code so far I see the streets layer and on top of it a lot of yellow circles for each of my items.
Next problem: too many items. The map was very slow because I was adding over 30000 items on the map. I was in need of clustering. I wasted almost an entire day trying to figure out what it wouldn't work until I realised that it was an ordering issue. Duh! But still, in this new framework that I was working on I didn't want to add configuration in a Javascript event, I wanted to be able to configure as much as possible via AngularJS parameters. I noticed that Azimuth already had support for strategy parameters. Unfortunately it only supported an actual strategy instance as the parameter rather than a string. I had, again, to change the Azimuth code to first search for the name of the strategy parameters in OpenLayers.Strategy and if not found to $parse the string. Yet it didn't work as expected. The clustering was not engaging. Wasting another half an hour I realised that, at least in the case of this weirdly buggy Cluster strategy, I not only needed it, but also a Fixed strategy. I've changed the code to add the strategy instead of replacing it and suddenly clustering was working fine. I still have to make it configurable, but that is a detail I don't need to go into right now. Anyway, remember that the loadend event was not fired when only the Cluster strategy was in the strategies array of the layer; I think you need the Fixed strategy to load data from somewhere.
Next thing I wanted to do was to center the map on the features existent on the map. The map also needed to be resized to the actual page size. I added a custom directive to expand a div's height down to an element which I styled to be always on the bottom of the page. The problem now was that the map was getting instantiated before the div was resized. This means that maybe I had to start with a big default height of the div. Actually that caused a lot of problems since the map remained as big as first defined and centering the map was not working as expected. What was needed was a simple map.updateSize(); called after the div was resized. In order to then center and zoom the map on the existent features I used this code:

        var bounds={
            minLon:1000000000,
            minLat:1000000000,
            maxLon:-1000000000,
            maxLat:-1000000000
        };
    
        for (var i=0; i<layer.features.length; i++) {
            var feature=layer.features[i];
            var point=feature.geometry;
            if (!isNaN(point.x)&&!isNaN(point.y)) {
                bounds.minLon=Math.min(bounds.minLon,point.x);
                bounds.maxLon=Math.max(bounds.maxLon,point.x);
                bounds.minLat=Math.min(bounds.minLat,point.y);
                bounds.maxLat=Math.max(bounds.maxLat,point.y);
            }
        }
        map.updateSize();
        var extent=new OpenLayers.Bounds(bounds.minLon,bounds.minLat,bounds.maxLon,bounds.maxLat);
        map.zoomToExtent(extent,true);

Now, while the clustering was working OK, I wanted to show stuff and make those clusters do things for me. I needed to style the clusters. This is done via:

        layer.styleMap=new OpenLayers.StyleMap({
                "default": defaultStyle,
                "select": selectStyle
            });

        layer.events.on({
            "featureselected": clickFeature
        });

        var map=layer.map;

        var hover = new OpenLayers.Control.SelectFeature(
            layer, {hover: true, highlightOnly: true}
        );
        map.addControl(hover);
        hover.events.on({"featurehighlighted": displayFeature});
           hover.events.on({"featureunhighlighted": hideFeature});
        hover.activate();

        var click = new OpenLayers.Control.SelectFeature(
            layer, {hover: false}
        );
        map.addControl(click);
        click.activate();

I am adding two OpenLayers.Control.SelectFeature controls on the map, one activates on hover, the other on click. The styles that are used in the style map define different colors and also a dynamic radius based on the number of features in a cluster. Here is the code:

        var defaultStyle = new OpenLayers.Style({
            pointRadius: "${radius}",
            strokeWidth: "${width}",
            externalGraphic: "${icon}",
            strokeColor: "rgba(55, 55, 28, 0.5)",
            fillColor: "rgba(55, 55, 28, 0.2)"
        }, {
            context: {
                width: function(feature) {
                    return (feature.cluster) ? 2 : 1;
                },
                radius: function(feature) {
                    return feature.cluster&&feature.cluster.length>1
                        ? Math.min(feature.attributes.count, 7) + 2
                        : 7;
                }
            }
        });

You see that the width and radius are defined as dynamic functions. But here we have an opportunity that I couldn't let pass. You see, in these styles you can also define the icons. How about defining the icon dynamically using canvas drawing and then toDataURL? And I did that! It's not really that useful, but it's really interesting:

        function fIcon(feature,type) {
                    var iconKey=type+'icon';
                    if (feature[iconKey]) return feature[iconKey];
                    if(feature.cluster&&feature.cluster.length>1) {
                        var canvas = document.createElement("canvas");
                        var radius=Math.min(feature.cluster.length, 7) + 2;
                        canvas.width = radius*2;
                        canvas.height = radius*2;
                        var ctx = canvas.getContext("2d");
                            ctx.fillStyle = this.defaultStyle.fillColor;
                        ctx.strokeStyle = this.defaultStyle.strokeColor;
                        //ctx.fillRect(0,0,canvas.width,canvas.height);
                        ctx.beginPath();
                        ctx.arc(radius,radius,radius,0,Math.PI*2);
                        ctx.fill();
                        ctx.stroke();
                            ctx.fillStyle = this.defaultStyle.strokeColor;
                        var bounds={
                            minX:1000000000,
                            minY:1000000000,
                            maxX:-1000000000,
                            maxY:-1000000000
                        };
                        for(var c = 0; c < feature.cluster.length; c++) {
                              var child=feature.cluster[c];
                            var x=feature.geometry.x-child.geometry.x;
                            var y=feature.geometry.y-child.geometry.y;
                                bounds.minX=Math.min(bounds.minX,x);
                                bounds.minY=Math.min(bounds.minY,y);
                                bounds.maxX=Math.max(bounds.maxX,x);
                                bounds.maxY=Math.max(bounds.maxY,y);
                        }
                        var q=0;
                        q=Math.max(Math.abs(bounds.maxX),q);
                        q=Math.max(Math.abs(bounds.maxY),q);
                        q=Math.max(Math.abs(bounds.minX),q);
                        q=Math.max(Math.abs(bounds.minY),q);
                        q=radius/q;
                        var zoom=2;
                        for(var c = 0; c < feature.cluster.length; c++) {
                              var child=feature.cluster[c];
                            var x=-(feature.geometry.x-child.geometry.x)*q+radius;
                            var y=(feature.geometry.y-child.geometry.y)*q+radius;
                                ctx.fillRect(parseInt(x-zoom/2), parseInt(y-zoom/2), zoom, zoom);
                        }
                        feature[iconKey] = canvas.toDataURL("image/png");
                    } else {
                        feature[iconKey] = OpenLayers.Marker.defaultIcon().url;
                    }
                    return feature[iconKey];
                };

        defaultStyle.context.icon=function(feature) {
            return fIcon.call(defaultStyle,feature,'default');
        }
        selectStyle.context.icon=function(feature) {
            return fIcon.call(selectStyle,feature,'select');
        }

This piece of code builds a map of the features in the cluster, zooms it to the size of the cluster icon, then also draws a translucent circle as a background.
I will not bore you with the displayFeature and clickFeature code, enough said that the first would set the html title on the layer element and the other would either zoom and center or display the info card for one single feature. There is a gotcha here as well, probably caused initially by the difference in size between the map and the layer. In order to get the actual pixel based on latitude and longitude you have to use map.getLayerPxFromLonLat(lonlat), not map.getPixelFromLonLat(lonlat). The second will work, but only after zooming or moving the map once. Pretty weird.

There are other issues that come to mind now, like making the URL for the data dynamic, based on specific parameters, but that's for another time.

Detecting if an URL can be loaded in an IFrame

Published Apr 1, 2013

and has 2 comments

I was trying to solve a problem on this blog, where the opening of links in their own fancy javascript window would fail if the server did not allow opening their pages in frames. The result would be an ugly empty black window and an ugly javascript error in the browser console in the form of Refused to display '[some URL]' in a frame because it set 'X-Frame-Options' to 'SAMEORIGIN'.

So I started looking for a way to detect these pesky URLs. First attempt was using jQuery.Ajax with method 'HEAD', which inquires the HTTP headers only from a given URL. There is no reason I can see to deny access to 'HEAD' requests, but the browser does it anyway based on... HTTP headers! Not to mention that this solution fails for more links than a frame because of Ajax cross-site scripting issues.

Second attempt: use an adhoc hidden iframe to detect if the URL can be opened. This worked, but at a cost that prohibits me using the solution in the blog. I will publicize it, though, maybe it works for other scenarios. It uses jQuery, so you will have to translate it yourself into the raw Javascript version or to make it use your favorite framework.

The code first:

var Result={
    CouldNotLoadUrl:1,
    UrlLoadedButContentCannotBeAccessed:2,
    UrlLoadedContentCanBeAccessed:3
};


function isAvailable(url, callback, timeout) {
    if (!+(timeout)||+(timeout)<0) {
        timeout=5000;
    }
    var timer=setTimeout(function() {
        ifr.remove();
        callback(Result.CouldNotLoadUrl,url);
    },timeout);
    var ifr=$('<iframe></iframe>')
        .hide()
        .appendTo('body');
    ifr.on('load',function() {
        if (timer) clearTimeout(timer);
        var result;
        try {
            var iframe=ifr[0];
            var doc=(iframe.contentWindow||iframe.contentDocument).location.href;
            result=Result.UrlLoadedContentCanBeAccessed;
        } catch(ex) {
            result=Result.UrlLoadedButContentCannotBeAccessed;
            alt=ex;
        }
        ifr.remove();
        callback(result,url,alt);
    });
    ifr.attr('src',url);
}

You use it like this:

isAvailable('https://siderite.dev',function(result,url,alt) {
    switch(result) {
        case Result.CouldNotLoadUrl:
            alert('Could not load '+url+' in an iframe (timeout after '+alt+' milliseconds)');
            break;
        case Result.UrlLoadedButContentCannotBeAccessed:
            alert(url+' loaded in an iframe, but content is innaccessible ('+alt+')');
            break;
        case Result.UrlLoadedContentCanBeAccessed:
            alert(url+' loaded in an iframe and content is accessible');
            break;
    }
},10000);

You will need to have jQuery loaded and to have a html body loaded in the DOM (so if you copy these into an empty html file to test, make sure you add <body></body> before the script or execute isAvailable on the DOM Ready event.

And now the explanation.
First, it is imperative to first append the iframe element to body before binding the load event. That is because jQuery creates the element in a document fragment and this process fires a load event by itself! Then, different browsers act differently. Google Chrome does not fire a load event for an iframe with an URL that has this problem. Internet Explorer does fire the event, but the iframe's content document is not accessible (and this can be caught in a try/catch block). FireFox does fire the event, but only the leaf properties of the content document throw an exception, like the href of the location. In order to fix all of these, I used a timeout for Chrome, to return a false result after a time, then an access to ifr[0].contentDocument.location.href to make it throw an exception in both Internet Explorer and FireFox.

Finally, the reason why I cannot use it on the blog is that it would force the browser of the viewer to load all the URLs completely in the background in order to add a silly click event on the links. I have one more idea in mind, though, and that is to detect the frame loading problem when I open it and in that case to create the content of the iframe manually to contain a link to the URL. I will attempt it sometime soon.

Update: I found a solution that seems reasonable enough. When creating the iframe in which I want to nicely load the page that the link points to, I am not just creating an empty frame, but I also add content: a link that points to the same page. The SAMEORIGIN problem is still there, so the link opens the URL in target="_blank" and has a click handler that closes the dialog 100 milliseconds later. Thus, when changing the frame src, if the content of the frame does not change, the user will have the option to click the link and see the page open in a new tab/window.