Update 17 June 2016: I've changed the focus of the extension to simply alter the appearance of stories based on their type, so that stories with original content are highlighted over simple shares. I am currently working on another extension that is more adaptive, but it will be branded differently.

Update 27 May 2016: I've published a very early draft of the extension because it already does one cool thing: it puts original content in the foreground and shrinks the reposts, photo uploads, "feeling" shares and all that. You may find and install the extension here.

Have you ever wanted to decrease the spam in your Facebook feed, but couldn't do it in any way that would not make you miss important posts? I mean, even if you categorize all your contacts into good friends, close friends, relatives and acquaintances, then unfollow the ones that really spam too much and hide all the posts that you don't like, you still have no control over how Facebook decides to order what you see on the page. Worse than that, refresh your Facebook page repeatedly and watch the wildly oscillating results: posts appear, disappear and reorder themselves. It's a mess.

Well, true to my word, I have started work on a Chrome extension to help me with this. My plan is pretty complicated, so before I publish the extension on the Chrome Web Store, like I did with my previous two efforts, I will publish it on GitHub while I am still working on it. So, depending on where I am, this might be alpha, beta or stable. At the moment of this writing - the first commit - alpha is a pretty big word.

Here is the plan for the extension:
  1. Detect the user has opened the Facebook page
  2. Inject jQuery and extension code into the page
  3. Detect any post as it appears on the page
  4. Extract as many features as possible
  5. Allow the user to create categories for posts
  6. Allow the user to drag posts into categories or out of them
  7. Use AI to determine the category a post most likely belongs to
  8. Alternatively, let the user create their own filters, a la Outlook
  9. Show a list of categories (as tabs, perhaps) and hide all posts under the respective categories
This way one might skip the annoying posts, based on personal preferences, while still enjoying the interesting ones. At the time of this writing - the first draft - the extension only works on https://www.facebook.com, not on any subpages; it extracts the type of each post and sets a CSS class on it. It also injects a CSS file which makes posts dimmer and smaller based on category. Mouse over a post to restore the normal size and opacity.
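As a rough illustration of the approach (the selectors and class names here are made up for the sketch, not the extension's actual code), the injected script might categorize posts and tag them like this:

```javascript
// Guess a post's category from its text content (heuristics are illustrative)
function categorizePost(post) {
    var text = post.textContent || '';
    if (/shared a (link|photo|video)/i.test(text)) return 'share';
    if (/is feeling/i.test(text)) return 'feeling';
    return 'original';
}

// Tag every post element with a class the injected CSS can target
function processPosts(container) {
    var posts = container.querySelectorAll('[role="article"]');
    for (var i = 0; i < posts.length; i++) {
        posts[i].className += ' fbpp-' + categorizePost(posts[i]);
    }
}
```

The CSS then only has to style `.fbpp-share`, `.fbpp-feeling` and so on, dimming and shrinking whatever categories you care less about.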

How to make it work for you:
  1. In Chrome, go to Manage Extensions (chrome://extensions/)
  2. Click on the Developer Mode checkbox
  3. Click on the Load unpacked extension... button
  4. Select a folder where you have downloaded the source of this extension
  5. Open a new tab and load Facebook there
  6. You should see the posts getting smaller and dimmer based on category.
Change statusProcessor.css to suit your own preferences (you may hide posts altogether, change the background color, etc.).
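For instance, assuming the extension sets per-category classes along these lines (the names are illustrative, check statusProcessor.css for the real ones), overrides could look like:

```css
/* dim and shrink simple shares, restore them on hover */
.fbpp-share { opacity: 0.3; transform: scale(0.6); }
.fbpp-share:hover { opacity: 1; transform: none; }

/* or hide a category altogether */
.fbpp-feeling { display: none; }
```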

As usual, please let me know what you think and contribute with code and ideas.

I've written another Chrome extension that I consider to be in beta, but so far it works. Really ugly makeshift code, but I am now gathering data about the way I will use it, then I am going to refactor it, just as I did with Bookmark Explorer. You may find the code on GitHub and the extension in the Chrome Web Store.

This is how it works: every time you access anything with the browser, the extension remembers the IPs for any given host. It holds a list of these IPs, in reverse order (the last one first), that you can just copy and paste into your hosts file. The hosts file is found at C:\Windows\System32\drivers\etc\hosts on Windows and at /etc/hosts on Linux. Once you add a line in the format "IP host" to it, the computer will resolve that host with the provided IP. Every time there is a problem with DNS resolution, the extension adds the latest known IP to the hosts text. Since the extension doesn't have access to your hard drive, you need to edit the file yourself. The icon of DNS Resolver shows the number of hosts that it wants to resolve locally, or nothing if everything is OK.
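For example, if the extension reports 93.184.216.34 as the last known IP for example.com (values purely illustrative), the lines to paste into the hosts file would look like:

```
# lines in the "IP host" format
93.184.216.34    example.com
93.184.216.34    www.example.com
```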

The extension allows manual selection of an IP for a host and forced inclusion in or exclusion from the list of IP/host lines. Data can be erased as well (all at once, for now). The extension does not communicate with the outside, but it does store a list of all the domains you visit, so it is a slight privacy risk - although if someone has access to the local storage of a browser extension, it's already too late. The extension could also replace the host with the IP directly in browser requests, but this only works inside the browser and fails whenever the host name is important, as in the case of multiple servers using the same IP, so I don't recommend using it.

There are two scenarios for which this extension is very useful:
  • The DNS server fails for some reason or gives you a wrong IP
  • Someone removed the IP address from DNS servers or replaced it with one of their own, as in the case of government censorship

I have some ideas for the future:
  • Sharing of working IP/host pairs - have to think of privacy before that, though
  • Installing a local DNS server that can communicate locally with the extension, so no more hosts editing - have to research and create one
  • Upvoting/downvoting/flagging shared pairs - with all the horrible headache this comes with

As usual, let me know what you think here, or open issues on GitHub.

I have started writing Chrome extensions, mainly to address issues that my browser is not solving, like opening dozens of tabs and, lately, DNS errors/blocking and ad blocking. My code writing process is chaotic at first: just writing stuff and changing it until things work, until I get to something I feel is stable. Then I feel the need to refactor the code, organizing and cleaning it and, why not, unit testing it. This raises the question of how to do that in Javascript and, even though I knew it once, I needed to refresh my understanding with new work. Without further ado: QUnit, a Javascript testing framework. Note that all code here will be in ES5 or earlier, mainly because I have not studied ES6 and I want this to work with most Javascript engines.

QUnit


QUnit is something that has withstood the test of time. It was first launched in 2008, but even now it is easy to use with simple design and clear documentation. Don't worry, you can use it even without jQuery. In order to use it, create an HTML page that links to the Javascript and CSS files from QUnit, then create your own Javascript file containing the tests and add it to the page together with whatever you are testing.

Already this raises the issue of having Javascript code that can be safely embedded in a random web page, so consider how you may encapsulate the code. Other testing frameworks can run the code in a headless Javascript engine, so if you want to be as generic as possible, also remove all dependencies on an existing web page. The oldest and simplest way of doing this is to use the fact that an orphan function in Javascript has its own scope and, in non-strict mode, has this pointing to the global object - in the case of a web page, window. So instead of something like:
i = 0;
while (i < +(document.getElementById('inpNumber').value)) {
    i++;
    // do something
}
do something like this:
(function () {

    var global = this;

    var i = 0;
    while (i < +(global.document.getElementById('inpNumber').value)) {
        i++;
        // do something
    }

})();

It's a silly example, but it does several things:
  • It keeps variable i in the scope of the anonymous function, thus keeping it from interfering with other code on the page
  • It clearly defines a global object, which in case of a web page is window, but may be something else
  • It uses global to access any out of scope values

In this particular case, there is still a dependency on the default global object, but if instead one would pass the object somehow, it could be abstracted and the only change to the code would be the part where global is defined and acquired.
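For example, here is a minimal sketch of that abstraction (the names are mine, not from any framework): the environment is received as a parameter, so a test can substitute a mock without any web page at all:

```javascript
// Instead of relying on `this`, receive the environment as a parameter
function countInputs(env) {
    // env stands in for window; in production you would pass window itself
    return env.document.getElementsByTagName('input').length;
}

// In a unit test, a hand-made mock is enough:
var mockEnv = {
    document: {
        getElementsByTagName: function () { return [{}, {}]; }
    }
};
```

Calling `countInputs(mockEnv)` then exercises the logic with no dependency on a real page.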

Let's start with QUnit. Here is a Hello World kind of thing:
QUnit.test("Hello World", function (assert) {
    assert.equal(1 + 1, 2, "One plus one is two");
});
We put it in 'tests.js' and include it into a web page that looks like this:
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width">
    <title>Unit Tests</title>
    <link rel="stylesheet" href="https://code.jquery.com/qunit/qunit-1.23.1.css">
</head>
<body>
    <script src="https://code.jquery.com/qunit/qunit-1.23.1.js"></script>
    <div id="qunit"></div>
    <div id="qunit-fixture"></div>

    <script src="tests.js"></script>
</body>
</html>

The result:


As you can see, we declare a test with the static QUnit.test function, which receives a name and a function as parameters. Within the function, the assert object will do everything we need, mainly checking whether a result conforms to an expected value or a block throws an exception. I will not go through a detailed explanation of simple uses like that. If you are interested, peruse the QUnit site for tutorials.

Modules


What I want to talk about are slightly more advanced scenarios. The first thing I want to address is the concept of modules. If we just declare all the tests, regardless of how many scripts they are spread across, the test page will list them one after another, in a huge blob. In order to somehow separate them into regions, we need modules. Here is another example:
QUnit.module("Addition");
QUnit.test("One plus one", function (assert) {
    assert.equal(1 + 1, 2, "One plus one is two");
});

QUnit.module("Multiplication");
QUnit.test("Two by two", function (assert) {
    assert.equal(2 * 2, 4, "Two by two is four");
});
resulting in:


It may look the same, but a Module: dropdown has appeared, allowing one to choose which module to test or visualize. The names of the tests also include the module name. Unfortunately, the resulting HTML doesn't have containers for modules, something one could collapse or expand at will. That is too bad, but it can easily be fixed - that is beyond the scope of this post, though. A good strategy is simply to put all related tests in the same Javascript file and use QUnit.module as the first line.

Asynchronicity


Another interesting issue is asynchronous testing. If we want to test functions that return asynchronously, like setTimeout or ajax calls or Promises, then we need to use assert.async. Here is an example:
QUnit.config.testTimeout = 1000;
QUnit.module("Asynchronous tests");
QUnit.test("Called after 100 milliseconds", function (assert) {
    var a = assert.async();
    setTimeout(function () {
        assert.ok(true, "Assertion was called from setTimeout");
        a();
    }, 100);
});

First of all, we needed to declare that we expect a result asynchronously, therefore we call assert.async() and hold a reference to the result. The result is actually a function. After we make all the assertions on the result, we call that function in order to finish the test. I've added a line before the test, though, which sets the testTimeout configuration value. Without it, an async test that fails would freeze the test suite indefinitely. You can easily test this by setting testTimeout to less than the setTimeout duration.

Asynchronous tests raise several questions, though. The example above is all nice and easy, but what about cases when the test is more complex, with multiple asynchronous code blocks that follow each other, like a Promise chain? What if the assertions themselves need to be called asynchronously, like when checking for the outcome of a click handler? If you run jQuery(selector).click() an immediately following assertion would fail, since the click handler is executed in another context, for example. One can imagine code like this, but look how ugly it is:
QUnit.test("Called after 500 milliseconds", function (assert) {
    var a = assert.async();
    setTimeout(function () {
        assert.ok(true, "First setTimeout");
        setTimeout(function () {
            assert.ok(true, "Second setTimeout");
            setTimeout(function () {
                assert.ok(true, "Third setTimeout");
                setTimeout(function () {
                    assert.ok(true, "Fourth setTimeout");
                    a();
                }, 100);
            }, 100);
        }, 100);
    }, 100);
    setTimeout(function () {
        assert.notOk(true, "Test timed out");
    }, 500);
});

In order to solve at least this arrow antipattern I've created a stringFunctions function that looks like this:
function stringFunctions() {
    if (!arguments.length)
        throw 'needs functions as parameters';
    var f = function () {};
    var args = arguments;
    for (var i = args.length - 1; i >= 0; i--) {
        (function () {
            var x = i;
            var func = args[x];
            if (typeof(func) != 'function')
                throw 'parameter ' + x + ' is not a function';
            var prev = f;
            f = function () {
                setTimeout(function () {
                    func();
                    prev();
                }, 100);
            };
        })();
    }
    f();
}
which makes the previous code look like this:
QUnit.test("Called after 500 milliseconds", function (assert) {
    var a = assert.async();
    stringFunctions(function () {
        assert.ok(true, "First setTimeout");
    }, function () {
        assert.ok(true, "Second setTimeout");
    }, function () {
        assert.ok(true, "Third setTimeout");
    }, function () {
        assert.ok(true, "Fourth setTimeout");
    }, a);
    setTimeout(function () {
        assert.notOk(true, "Test timed out");
    }, 500);
});

Of course, this is a specific case, but at least in a very common scenario - the one when the results of event handlers are checked - stringFunctions with 1ms instead of 100ms is very useful. Click on a button, see if a checkbox is available, check the checkbox, see if the value in a span has changed, stuff like that.

Testing average jQuery web code


Another thing I want to address is how to test Javascript that is intended as a web page companion script, with jQuery manipulations of the DOM, event listeners and all that. Ideally, all of this would be stored in some sort of object that is instantiated with parameters specifying the test context, the various mocks and so on. Since it is not an ideal world, I want to show you a way to test a typical such script: one that executes a function at DOM-ready and does everything in it. Here is an example:
$(function () {

    $('#btnSomething').click(function () {
        $('#divSomethingElse').empty();
    });

});
The code assumes $ is jQuery, then adds a click handler to a button that empties another element. Think about how this should be tested:
  1. Declare a QUnit test
  2. In it, execute the script
  3. Then make some assertions

I was a bit lazy and changed the scripts themselves to check if a testContext exists and use that one. Something like this:
(function ($) {

    var global = this;
    var jQueryContext = global.testContext && global.testContext.document ? global.testContext.document : global.document;
    var chrome = global.testContext && global.testContext.chrome ? global.testContext.chrome : global.chrome;
    // etc.

    $(function () {

        $('#btnSomething', jQueryContext).click(function () {
            $('#divSomethingElse', jQueryContext).empty();
        });

    });

})(jQuery);
which has certain advantages. First, it makes you aware of all the uses of jQuery in the code, yet it doesn't force you to declare everything in an object and refactor everything. Funny how you need to refactor the code in order to write unit tests in order to be able to refactor the code; automated testing gets like that. It also solves some problems with testing Javascript offline - directly from the file system - because all you need to do now is define the testContext, then load the script by creating a script tag in the testing page and setting its src attribute:
var script = document.createElement('script');
script.onload = function () {
    // your assertions here
};
script.src = "http://whatever.com/the/script.js";
document.getElementsByTagName('head')[0].appendChild(script);
In this case, even if you are running the page from the filesystem, the script will be loaded and executed correctly. Another, more elegant solution would load the script as a string and execute it inside a closure where jQuery was replaced with something that uses a mock document by default. This means you don't have to change your code at all, but you need to be able to read the script as text, which is impossible from the filesystem; some really messy script tag creation would be needed:
QUnit.test("jQuery script Tests", function (assert) {

    var global = (function () {
        return this;
    })();

    function setIsolatedJquery() {
        global.originalJquery = jQuery.noConflict(true);
        var tc = global.testContext.document;
        global.jQuery = global.$ = function (selectorOrHtmlOrFunction, context) {
            if (typeof(selectorOrHtmlOrFunction) == 'function')
                return global.originalJquery.apply(this, arguments);
            var newContext;
            if (!context) {
                newContext = tc; // if not specified, use the testContext
            } else {
                if (typeof(context) == 'string') {
                    newContext = global.originalJquery(context, tc); // if context is a selector, use it inside the testContext
                } else {
                    newContext = context; // use the one provided
                }
            }
            return global.originalJquery(selectorOrHtmlOrFunction, newContext);
        };
    }

    function restoreJquery() {
        global.jQuery = global.$ = global.originalJquery;
        delete global.originalJquery;
    }

    var a = assert.async();

    global.testContext = {
        document: jQuery('<div><button id="btnSomething">Something</button><div id="divSomethingElse"><span>Content</span></div></div>')
    };

    setIsolatedJquery();

    var script = document.createElement('script');
    script.onload = function () {

        assert.notEqual($('#divSomethingElse').children().length, 0, "SomethingElse has children");
        $('#btnSomething').click();
        setTimeout(function () {
            assert.equal($('#divSomethingElse').children().length, 0, "clicking Something clears SomethingElse");
            restoreJquery();
            a();
        }, 1);
    };
    script.src = "sample.js";
    document.getElementsByTagName('head')[0].appendChild(script);

});

There you have it: an asynchronous test that replaces jQuery with something with an isolated context, loads a script dynamically, performs a click in the isolated context, checks the results. Notice the generic way in which to get the value of the global object in Javascript.

Bottom-Up or Top-Bottom approach


A last point I want to make is more theoretical. After some consultation with a colleague, I've finally cleared up some confusion I had about the direction of automated tests. You see, once you have the code - or even before, in TDD - you know what every small piece of code does and also the final requirements of the product. Where should you start in order to create automated tests?

One solution is to start from the bottom and check that your methods call everything they need to call in the mocked dependencies. If your method calls chrome.tabs.create and you have mocked chrome, your tabs.create mock should count how many times it is called and your assertion should check that the count is 1. This has the advantage of being straightforward, but it also tests details that might be irrelevant. One might refactor the method to call some other API and then the test would fail, as it tested the actual implementation details, not a result. Of course, methods that return the same result for the same input values - usually called pure functions - are perfect for this type of testing.
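As a sketch of this style of test (makeMockChrome and openHelpTab are made-up names for the illustration, not real extension code), a hand-rolled mock simply counts calls:

```javascript
// A minimal mock of the chrome API surface the method under test touches
function makeMockChrome() {
    var calls = { create: 0, lastOptions: null };
    return {
        calls: calls,
        tabs: {
            create: function (options) {
                calls.create++;
                calls.lastOptions = options;
            }
        }
    };
}

// The method under test receives its chrome dependency explicitly
function openHelpTab(chromeApi) {
    chromeApi.tabs.create({ url: 'help.html' });
}

// Bottom-up assertions: the dependency was called exactly once, with the right URL
var mock = makeMockChrome();
openHelpTab(mock);
```

After the call, `mock.calls.create` is 1 and `mock.calls.lastOptions.url` is 'help.html' - exactly the implementation-detail assertions described above.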

Another solution is to start from the requirements and test that the entire codebase does what it is supposed to do. This makes more sense, but the number of possible test cases increases exponentially and it is difficult to spot where the problem lies when a test fails. This would be called acceptance testing.

Well, the answer is: both! It all depends on your budget, of course, as you need to take into consideration not only the writing of the tests, but their maintenance as well. Automated acceptance tests would not need to change a lot, only when requirements change, while unit tests would need to be changed whenever the implementation is altered or new code is added.

Conclusion


I am not an expert on unit testing, so what I have written here describes my own experiments. Please let me know if you have anything to add or to comment. My personal opinion on the matter is that testing provides a measure of confidence that minimizes the stress of introducing changes or refactoring code. It also forces people to think in terms of "how will I test this?" while writing code, which I think is great from the viewpoint of separation of concerns and code modularity. On the other hand it adds a relatively large resource drain, both in writing and (especially) in maintaining the tests. There is also a circular kind of issue where someone needs to test the tests. Psychologically, I also believe automated testing only works for certain people. Chaotic asses like myself like to experiment a lot, which makes testing a drag. I don't even know what I want to achieve and someone tries to push testing down my throat. Later on, though, tests would be welcome, if only my manager allows the time for it. So it is, as always, a matter of logistics.

More info about unit testing with QUnit on their page.

During one revamp of the blog I realized that I didn't have images for some of my posts. I had counted pretty much on the Blogger system, which provides a post.thumbnailUrl metadata value that I can use when displaying the post, but the URL is not always there. Of course, if you have a nice image prominently displayed somewhere in the post, the thumbnail URL will be populated, but what if you have a video? Surprisingly, Blogger has a pretty shitty video-to-thumbnail mechanism, which prompted me to build my own.

So the requirements would be: get me the image representing a video embedded in my page, using Javascript only.

Well, first of all, videos can be actual video tags, but most of the time they are iframe elements coming from a reliable global provider like YouTube, Dailymotion, Vimeo, etc., and all the information available is the URL of the display frame. Here is how to get the thumbnail in these scenarios:

YouTube


Given the iframe src value:

// find youtube.com/embed/[videohash] or youtube.com/embed/video/[videohash]
var m = /youtube\.com\/embed(?:\/video)?\/([^\/\?]+)/.exec(src);
if (m) {
    // the thumbnail url is https://img.youtube.com/vi/[videohash]/0.jpg
    imgSrc = 'https://img.youtube.com/vi/' + m[1] + '/0.jpg';
}

If you have embeds in the old object format, it is best to replace them with the iframe one. If you can't change the content, it remains your job to create the code to give you the thumbnail image.

Dailymotion


Given the iframe src value:

// find dailymotion.com/embed/video/[videohash]
var m = /dailymotion\.com\/embed\/video\/([^\/\?]+)/.exec(src);
if (m) {
    // the thumbnail url is the same URL with `thumbnail` replacing `embed`
    imgSrc = src.replace('embed', 'thumbnail');
}

Vimeo


Vimeo doesn't have a one-URL thumbnail format that I am aware of, but it has a Javascript-accessible API.

// find vimeo.com/video/[videohash]
m = /vimeo\.com\/video\/([^\/\?]+)/.exec(src);
if (m) {
    // set the value to the videohash initially
    imgSrc = m[1];
    $.ajax({
        // call the API video/[videohash].json
        url: 'https://vimeo.com/api/v2/video/' + m[1] + '.json',
        method: 'GET',
        success: function (data) {
            if (data && data.length) {
                // and with the data replace the initial value with the thumbnail_medium value
                replaceUrl(data[0].thumbnail_medium, m[1]);
            }
        }
    });
}

In this example, the replaceUrl function would look for img elements to which the videohash value is attached and replace the url with the correct one, asynchronously.
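The post doesn't show replaceUrl itself; a possible implementation, assuming the videohash was stored on the img elements in a data attribute (the attribute name and the injectable doc parameter are my own choices for the sketch), might be:

```javascript
// Swap the placeholder src of every img tagged with this video hash
// for the real thumbnail URL, once the API call has returned
function replaceUrl(url, videoHash, doc) {
    doc = doc || document; // allow a document to be injected for testing
    var imgs = doc.querySelectorAll('img[data-video-hash="' + videoHash + '"]');
    for (var i = 0; i < imgs.length; i++) {
        imgs[i].setAttribute('src', url);
    }
}
```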

TED


I am proud to announce that I was the one pestering them to make their API available over Javascript.

// find ted.com/talks/[video title].html
m = /ted\.com\/talks\/(.*)\.html/.exec(src);
if (m) {
    // associate the video title with the image element
    imgSrc = m[1];
    $.ajax({
        // call oembed.json?url=frame_url
        url: 'https://www.ted.com/services/v1/oembed.json?url=' + encodeURIComponent(src),
        method: 'GET',
        success: function (data) {
            // set the value of the image element asynchronously
            replaceUrl(removeSchemeForHttpsEnabledSites(data.thumbnail_url), m[1]);
        }
    });
    return false;
}

video tags


Of course there is no API to get the image from an arbitrary video URL, but the standard for the video tag specifies a poster attribute that can describe the static image associated with a video.

// if the element is a video with a poster value
if ($(this).is('video[poster]')) {
    // use it
    imgSrc = $(this).attr('poster');
}

I had this crazy idea that I could make each word on my page come alive. The word "robot" would walk around the page, "explosion" would explode, "rotate" would rotate, color words would appear in the color they represent, no matter how weirdly named, like OliveDrab, Chocolate, Crimson, DeepPink, DodgerBlue and so on; "radioactive" would pulse green, "Siderite" would appear in all its rocking glory, and so on. And so I did!

The library is on GitHub and I urge you to come and bring your own ideas. Every effect that you see there is an addon that can be included or not in your setup.



Also see it directly on GitHub Pages.

Update 29 August 2017 - Version 3.0.4: The extension has been rewritten in EcmaScript6 and tested on Chrome, Firefox and Opera.

Update 03 March 2017 - Version 2.9.3: added a function to remove marketing URLs from all created bookmarks. Enable it in the Advanced settings section. Please let me know of any particular parameters you need purged. So far it removes utm_*, wkey, wemail, _hsenc, _hsmi and hsCtaTracking.

Update 26 February 2017 - Version 2.9.1: added customizing of the URL comparison function. People can choose what makes pages different in general or for specific URL patterns.
Update 13 June 2016: Stable version (2.5.0): added Settings page, Read Later functionality, undelete bookmarks page and much more.
Update 8 May 2016: Rewritten the extension from scratch, with unit testing.
Update 28 March 2016: The entire source code of the extension is now open sourced at GitHub.

Whenever I read my news, I open a bookmark folder containing my favorite news sites, Twitter, Facebook, etc. I then proceed to open new tabs for each link I find interesting, closing the originating tabs when I am done. Usually I end up with 30-60 open tabs. This wreaks havoc on my memory and computer responsiveness. And it's really stupid, because I only need to read them one by one. In the end I decided to fight my laziness and create my first browser extension to help me out.

The extension is published here: Siderite's Bookmark Explorer. What it does is check whether the current page is found in any bookmark folder, then allow you to go forward or backward inside that folder.

So this is my scenario on using it:
  1. Open the sites that you want to get the links from.
  2. Open new tabs for the articles you want to read, YouTube videos you want to watch, etc.
  3. Bookmark all tabs into a folder.
  4. Close all the tabs.
  5. Navigate to the bookmark folder and open the first link.
  6. Read the link, then press the Bookmark Navigator button and then the right arrow. (now added support for context menu and keyboard shortcuts)
  7. If you went too far by mistake, press the left arrow to go back.

OK, let's talk about how I did it. In order to create your own Chrome browser extension you need to follow these steps:

1. Create the folder


Create a folder and put inside it a file called manifest.json. Its possible structure is pretty complex, but let's start with what I used:
{
    "manifest_version": 2,

    "name": "Siderite's Bookmark Explorer",
    "description": "Gives you a nice Next button to go to the next bookmark in the folder",
    "version": "1.0.2",

    "permissions": [
        "tabs",
        "activeTab",
        "bookmarks",
        "contextMenus"
    ],
    "browser_action": {
        "default_icon": "icon.png",
        "default_popup": "popup.html"
    },
    "background": {
        "scripts": ["background.js"],
        "persistent": false
    },
    "commands": {
        "prevBookmark": {
            "suggested_key": {
                "default": "Ctrl+Shift+K"
            },
            "description": "Navigate to previous bookmark in the folder"
        },
        "nextBookmark": {
            "suggested_key": {
                "default": "Ctrl+Shift+L"
            },
            "description": "Navigate to next bookmark in the folder"
        }
    }
}

The manifest version must be 2. You need a name, a description and a version number. Start with something small, like 0.0.1, as you will want to increase the value as you make changes. Also mandatory is the permissions object, which tells the browser what Chrome APIs you intend to use. I've set activeTab there, because I want to know what the active tab is and what its URL is; tabs, because I might want to get a tab by id, and without this permission I wouldn't get info like the URL; bookmarks, because I want to access the bookmarks; and contextMenus, because I want to add items to the page context menu. More on permissions here.

Now, we need to know what the extension should behave like.

If you want to click on it and get a popup that does stuff, you need to specify the browser_action object, where you specify the icon that you want to have in the Chrome extensions bar and/or the popup page that you want to open. If you don't specify this, you get a default button that does nothing on click and presents the standard context menu on right click. You may only specify the icon, though. More on browserAction here.

If you want an extension that reacts to background events, monitors URL changes on the current page or responds to commands, then you need a background page. Here I specify that the page is a Javascript file, but you can add HTML, CSS and other stuff as well. More on background here.

Obviously, the files mentioned in the manifest must be created in the same folder.

The last item in the manifest is the commands object. For each command you define an id, a keyboard shortcut (unfortunately, only 0..9 and A..Z are usable) and a description. In order to respond to commands you need a background page, as shown above.

2. Test the extension


Next you open a Chrome tab and go to chrome://extensions, click on the 'Developer mode' checkbox if it is not checked already and you get a Load unpacked extension button. Click it and point the following dialog to your folder and test that everything works OK.

3. Publish your extension


In order to publish your extension you need a Chrome Web Store account. Go to the Chrome Web Store Developer Dashboard and create one. You will need to pay a one-time $5 fee to open it. I know, it kind of sucks, but I paid it and was done with it.

Next, you need to Add New Item, where you will be asked for a packed extension, which is nothing but the ZIP archive of all the files in your folder.

That's it.

Let's now discuss actual implementation details.

Adding functionality to popup elements


Getting the popup page elements is easy with vanilla Javascript, because we know we are building for only one browser: Chrome! Getting an element is done via document.getElementById(id), for example, and adding functionality via elem.addEventListener(event, handler, false);

One can use the elements as objects directly to set values that are related to those elements. For example my prev/next button functionality takes the URL from the button itself and changes the location of the current tab to that value. Code executed when the popup opens sets the 'url' property on the button object.

Just remember to do it when the popup has finished loading (with document.addEventListener('DOMContentLoaded', function () { /*here*/ }); )
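Putting the above together, the popup wiring might look like this sketch (the element id and URL are illustrative, and the dependencies are passed in as parameters, which also keeps the function testable with mocks):

```javascript
// Wire the popup's Next button: the popup-open code stores the target URL
// directly on the element, the click handler navigates the current tab to it
function wirePopup(doc, chromeApi, tab) {
    var btn = doc.getElementById('btnNext');
    btn.url = 'https://example.com/next-bookmark'; // set when the popup opens
    btn.addEventListener('click', function () {
        chromeApi.tabs.update(tab.id, { url: btn.url });
    }, false);
}
```

In the real popup you would call `wirePopup(document, chrome, currentTab)` inside the DOMContentLoaded handler.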

Getting the currently active tab


All the Chrome APIs are asynchronous, so the code is:
chrome.tabs.query({
    'active': true,
    'lastFocusedWindow': true
}, function (tabs) {
    var tab = tabs[0];
    if (!tab) return;
    // do something with tab
});

More on chrome.tabs here.

Changing the URL of a tab


chrome.tabs.update(tab.id, {
    url: url
});

Changing the icon in the Chrome extensions bar


if (chrome.browserAction) chrome.browserAction.setIcon({
    path : {
        '19' : 'anotherIcon.png'
    },
    tabId : tab.id
});

The icons are 19x19 PNG files. browserAction may not be available if it is not declared in the manifest.

Get bookmarks


Remember you need the bookmarks permission in order for this to work.
chrome.bookmarks.getTree(function (tree) {
    //do something with bookmarks
});

The tree is an array of items that have title and url or children. The first tree array item is the Bookmarks Bar, for example. More about bookmarks here.
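To make that shape concrete, here is a small sketch that flattens such a tree into a plain list of title/url pairs. The function name and the sample data are mine; only the title/url/children shape comes from the API:

```javascript
// Sketch: flatten a bookmarks tree (nodes with 'title' plus either 'url' or
// 'children') into a simple list of {title, url} pairs.
function flattenBookmarks(nodes) {
    var result = [];
    for (var i = 0; i < nodes.length; i++) {
        var node = nodes[i];
        if (node.url) {
            result.push({ title: node.title, url: node.url });
        }
        if (node.children) {
            result = result.concat(flattenBookmarks(node.children));
        }
    }
    return result;
}

// In the extension you would call it from the asynchronous API:
// chrome.bookmarks.getTree(function (tree) { var all = flattenBookmarks(tree); });

// Sample tree, mimicking the structure Chrome returns:
var sampleTree = [{
    title: '',
    children: [{
        title: 'Bookmarks Bar',
        children: [
            { title: 'Example', url: 'http://example.com/' }
        ]
    }]
}];
console.log(flattenBookmarks(sampleTree)); // [ { title: 'Example', url: 'http://example.com/' } ]
```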

Hooking to Chrome events


chrome.tabs.onUpdated.addListener(refresh);
chrome.tabs.onCreated.addListener(refresh);
chrome.tabs.onActivated.addListener(refresh);
chrome.tabs.onActiveChanged.addListener(refresh);
chrome.contextMenus.onClicked.addListener(function (info, tab) {
    navigate(info.menuItemId, tab);
});
chrome.commands.onCommand.addListener(function (command) {
    navigate(command, null);
});

In order to get extended info on the tab object received by tabs events, you need the tabs permission. For access to the contextMenus object you need the contextMenus permission.

Warning: if you install your extension from the store and you disable it so you can test your unpacked extension, you will notice that keyboard commands do not work. Seems to be a bug in Chrome. The solution is to remove your extension completely so that the other version can hook into the keyboard shortcuts.

Creating, detecting and removing menu items


To create a menu item is very simple:
chrome.contextMenus.create({
    "id" : "menuItemId",
    "title" : "Menu item description",
    "contexts" : ["page"] //where the menu item will be available
});
However, there is no way to 'get' a menu item, and if you blindly remove a menu item with .remove(id) it will throw an exception. My solution was to use an object that tracks which menu items I have created and destroyed, so I can safely call .remove().
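A hedged sketch of that bookkeeping (MenuTracker and the fake chrome.contextMenus stand-in are made-up names for illustration):

```javascript
// Sketch: track which menu items we created, so we never call remove() on an
// id that does not exist (which throws). In the extension you would pass in
// chrome.contextMenus instead of the fake API below.
function MenuTracker(menusApi) {
    this.menus = menusApi;
    this.created = {};
}
MenuTracker.prototype.create = function (options) {
    if (this.created[options.id]) return; // already there, don't create twice
    this.menus.create(options);
    this.created[options.id] = true;
};
MenuTracker.prototype.remove = function (id) {
    if (!this.created[id]) return; // safe: never created (or already removed)
    this.menus.remove(id);
    delete this.created[id];
};

// A minimal stand-in for chrome.contextMenus, so the sketch runs anywhere:
var fakeMenus = {
    items: {},
    create: function (opts) { this.items[opts.id] = opts; },
    remove: function (id) {
        if (!(id in this.items)) throw new Error('No menu item with id ' + id);
        delete this.items[id];
    }
};

var tracker = new MenuTracker(fakeMenus);
tracker.create({ id: 'menuItemId', title: 'Menu item description', contexts: ['page'] });
tracker.remove('menuItemId'); // removed once...
tracker.remove('menuItemId'); // ...and silently ignored the second time
```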

To hook to the context menu events, use chrome.contextMenus.onClicked.addListener(function (info, tab) { }); where info contains the menuItemId property that is the same as the id used when creating the item.

Again, to access the context menu API, you need the contextMenus permission. More about context menus here.

Commands


You use commands basically to define keyboard shortcuts. You define them in your manifest and then you hook to the event with chrome.commands.onCommand.addListener(function (command) { });, where command is a string containing the key of the command.

Only modifiers, letters and digits can be used. Amazingly, you don't need any permission for this API, but since commands are already defined in the manifest, I guess a permission would be superfluous.

That's it for what I wanted to discuss here. Any questions, bug reports, feature requests... use the comments in the post.

In the previous post I discussed Firebase, used from Javascript, and that covered initialization, basic security, reading everything and inserting. In this post I want to discuss complex queries: filtering, ordering, limiting, indexing, etc. For that I will get inspiration (read: copy with impunity) from the Firebase documentation on the subject, Retrieving Data, but make it quick and dirty... you know, like sex! Thank you, ma'am!

OK, the fluid interface for getting the data looks a lot like C# LInQ and I plan to work on a Linq2Firebase thing, but not yet. Since LInQ itself got its inspiration from SQL, I was planning to structure the post in a similar manner: how to do order by, top/limit, select conditions, indexing and so on, so we can really use Firebase like a database. An interesting concept to explore is joining, since this is an object database, but we still need it, because we want to filter by the results of the join before we return the result, like getting all the transactions of users that have the name 'Adam'. Aggregation is another thing that I feel Firebase needs to support. I don't want to download a billion records just to compute the sum of a property.

However, the Firebase API is rather limited at the moment. You get .orderByChild, then stuff like .equalTo, .startAt and .endAt and then .limitToFirst and .limitToLast. No aggregation, no complex filters, no optimized indexing, no joining. As far as I can see, this is by design, so that the server is as dumb as possible, but think about that 1GB for the free plan. It is a lot.

So, let's try a complex query and see where it gets us.
ref.child('user')
    .once('value', function (snapshot) {
        var users = [];
        snapshot.forEach(function (childSnapshot) {
            var item = childSnapshot.val();
            if (/adam/i.test(item.name)) {
                users.push(item.userId);
            }
        });
        getInvoiceTotalForUsers(users, DoSomethingWithSum);
    });

function getInvoiceTotalForUsers(users, callback) {
    var sum = 0;
    var count = 0;
    for (var i = 0; i < users.length; i++) {
        var id = users[i];
        ref.child('invoice')
            .equalTo(id, 'userId')
            .orderByChild('price')
            .startAt(10)
            .endAt(100)
            .once('value', function (snapshot) {
                snapshot.forEach(function (childSnapshot) {
                    var item = childSnapshot.val();
                    sum += item.price;
                });
                count++; //count completed user queries, not individual invoices
                if (count == users.length) callback(sum);
            });
    }
}

First, I selected the users that have 'adam' in the name. I used .once instead of .on because I don't want to wait for new data to arrive, I want the data so far. I used .forEach to enumerate the data from the value event. With the array of userIds I call getInvoiceTotalForUsers, which gets all the invoices for each user with a price greater than or equal to 10 and less than or equal to 100, and finally calls a callback with the resulting sum of invoice prices.

For me this feels very cumbersome. I can think of several methods to simplify this, but the vanilla code would probably look like this.
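One of those simplifying methods could be a small aggregator that takes over the counting bookkeeping and fires its callback exactly once, when every per-user sum has arrived. This is only a sketch: the names are mine, and fetchSumForUser merely stands in for the per-user Firebase invoice query above.

```javascript
// Collects one numeric result per id and calls 'done' exactly once, when all
// results are in. 'fetchSumForUser' stands in for the per-user Firebase query.
function sumForUsers(userIds, fetchSumForUser, done) {
    var total = 0;
    var pending = userIds.length;
    if (pending === 0) return done(0);
    userIds.forEach(function (id) {
        fetchSumForUser(id, function (sum) {
            total += sum;
            if (--pending === 0) done(total); // fires once, after the last result
        });
    });
}

// Example with a fake asynchronous fetcher:
var fakeSums = { u1: 30, u2: 12 };
sumForUsers(['u1', 'u2'], function (id, cb) {
    setTimeout(function () { cb(fakeSums[id]); }, 0);
}, function (total) {
    console.log(total); // 42
});
```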

I have been looking for a long time for this kind of service, mainly because I wanted to monitor and persist stuff for my blog. Firebase is all of that and more and, with a free plan of 1GB, it's pretty awesome. However, as it is a NoSQL database accessed via Javascript, it may be a bit difficult to grasp at first. In this post I will be talking about how to use Firebase as a traditional database using their Javascript library.

So, first off go to the main website and sign up with Google. Once you do, you get a page with a 5 minute tutorial, quickstarts, examples, API docs... but you want the ultra-quick start! Copy-pasted working code! So click on the Manage App button.

Take note of the URL where you are redirected; it is the one you will use for all data access as well. OK, quick test code:
var testRef = new Firebase('https://*******.firebaseio.com/test');
testRef.push({
    val1: "any object you like",
    val2: 1,
    val3: "as long as it is not undefined or some complex type like a Date object",
    val4: "think of it as JSON"
});
What this does is take that object and save it in your database, in the "test" container. Let's say it's like a table. You can also save objects directly in the root, but I don't recommend it, as the path of the object is the only thing telling you what type of object it is.

Now, in order to read inserted objects, you use events. It's a sort of reactive way of doing things that might be a little unfamiliar. For example, when you run the following piece of code, you will receive, once you connect, all the objects you ever inserted into "test".
var testRef = new Firebase('https://*******.firebaseio.com/test');
testRef.on('child_added', function (snapshot) {
    var obj = snapshot.val();
    handle(obj); //do what you want with the object
});

Note that you can use either child_added or value as the retrieve event. While 'child_added' is fired on each retrieved object, 'value' returns one snapshot containing all data items, then proceeds to fire on each added item with full snapshots. Beware: that means if you have a million items and you do a value query, you get all of them (or at least an attempt is made, I think there are limits), then on the next added item you get a million and one. If you use .limitToLast(50), for example, you will get the last 50 items, then when a new one is added, you get another 50-item snapshot. In my mind, 'value' is to be used with .once(), while 'child_added' with .on(). More details in my Queries post.

Just by using that, you have created a way to insert and read values from the database. Of course, you don't want to leave your database unprotected. Anyone could read or change your data this way. You need some sort of authentication. For that, go to the left and click on Login & Auth, then go to Email & Password and configure which users can log in to your application. Notice that every user has a UID defined. Here is the code to use to authenticate:
var testRef = new Firebase('https://*******.firebaseio.com/test');
testRef.authWithPassword({
    email : "some@email.com",
    password : "password"
}, function (error, authData) {
    if (error) {
        console.log("Login Failed!", error);
    } else {
        console.log("Authenticated successfully with payload:", authData);
    }
});
There is an extra step you want to take: securing your database so that it can only be accessed by logged-in users. For that you have to go to Security & Rules. A very simple structure to use is this:
{
    "rules": {
        "test": {
            ".read": false,
            ".write": false,
            "$uid": {
                // grants write access to the owner of this user account whose uid must exactly match the key ($uid)
                ".write": "auth !== null && auth.uid === $uid",
                // grants read access to any user who is logged in with an email and password
                ".read": "auth !== null && auth.provider === 'password'"
            }
        }
    }
}
This means that:
  1. It is forbidden to write to test directly, or to read from it
  2. It is allowed to write to test/uid (remember the user UID when you created the email/password pair) only by the user with the same uid
  3. It is allowed to read from test/uid, as long as you are authenticated in any way

Gotcha! This rule list allows you to read and write whatever you want on the root itself. Anyone could just waltz on your URL and fill your database with crap, just not in the "test" path. More than that, they can just listen to the root and get EVERYTHING that you write in. So the correct rule set is this:
{
    "rules": {
        ".read": false,
        ".write": false,
        "test": {
            ".read": false,
            ".write": false,
            "$uid": {
                // grants write access to the owner of this user account whose uid must exactly match the key ($uid)
                ".write": "auth !== null && auth.uid === $uid",
                // grants read access to any user who is logged in with an email and password
                ".read": "auth !== null && auth.provider === 'password'"
            }
        }
    }
}

In this particular case, in order to get to the path /test/$uid you can use the .child() function, like this: testRef.child(authData.uid).push(...), where authData is the object you retrieve from the authentication method and that contains your logged user's UID.

The rule system is easy to understand: use ".read"/".write" and a Javascript expression to allow or deny that operation, then add children paths and do the same. There are a lot more things you could learn about the way to authenticate: one can authenticate with Google, Twitter, Facebook, or even with custom tokens. Read more at Email & Password Authentication, User Authentication and User Based Security.

But because you want to do a dirty little hack and just make it work, here is one way:
{
    "rules": {
        ".read": false,
        ".write": false,
        "test": {
            ".read": "auth.uid == 'MyReadUser'",
            ".write": "auth.uid == 'MyWriteUser'"
        }
    }
}
This tells Firebase that no one is allowed to read/write except in /test, and only if their UID is MyReadUser or MyWriteUser, respectively. In order to authenticate for this, we use this piece of code:
testRef.authWithCustomToken(token,success,error);
The handlers for success and error do the rest. In order to create the token you need to do some cryptography, but never mind that, there is an online JsFiddle where you can do just that without any thought. First you need a secret, for which you go into your Firebase console and click on Secrets. Click on "Show" and copy-paste that secret into the JsFiddle "secret" textbox. Then enter MyReadUser/MyWriteUser in the "uid" textbox and create the token. You can then authenticate into Firebase using that ugly string that it spews out at you.

Done, now you only need to use the code. Here is an example:
var testRef = new Firebase('https://*****.firebaseio.com/test');
testRef.authWithCustomToken(token, function (err, authData) {
    if (err) alert(err);
    testRef.on('child_added', function (snapshot) {
        var message = snapshot.val();
        handle(message);
    });
});
where token is the generated token and handle is a function that will run with each of the objects in the database.

In my case, I needed a way to write messages on the blog for users to read. I left read access on for everyone (true) and used the token idea from above to restrict writing. My html page that I run locally uses the authentication to write the messages.

There you have it. In the next post I will examine how you can query the database for specific objects.

I had this Javascript code that I was trying to write as tight as possible and I realized that I don't know if using "!" on an object to check if it is set to a value is slow or fast. I mean, in a strongly typed language I would compare the object with null, not use the NOT operator. So I randomly filled an array with one of five items: null, undefined, the empty string, 0 and new Date(), then compared the performance of a loop checking the array items for having a value with the NOT operator versus other methods. I used Chrome 48.0.2564.109 m, Internet Explorer 11.0.9600.18163 and Firefox 44.0.2 for the tests.

Fast tally (Chrome / Internet Explorer / Firefox):
  • NOT operator: 1600/51480/1200ms
  • === 0 (strong type equality): 1360/47510/2180ms
  • === null : 550/45590/510ms
  • == with 0: 38700/63030/131940ms
  • == with null: 1100/48230/900ms
  • === used twice(with 0 and null): 1760/69460/3500ms
  • typeof == 'object' (which besides the Dates also catches null): 1360/382980/1380ms
  • typeof === 'object' (which besides the Dates also catches null): 1370/407000/1400ms
  • instanceof Date: 1060/69200/600ms

Thoughts: the !/NOT operator is reasonably fast. Using normal equality can really mess up your day when it tries to transform 0 into a Date or vice versa (no, using 0 == arr[i] instead of arr[i] == 0 wasn't faster). Fastest, as expected, was the strong type equality to null. Surprising was the loose null equality, which catches both null and undefined and takes second place. typeof was also surprising, since it not only gets the type of the object, but also compares the result with a string. Funny thing: the === comparison in the case of typeof was slower than the normal == comparison for all browsers, so probably it gets treated as a special construct.

It is obvious that both Chrome and Firefox have really optimized their Javascript engines. Internet Explorer has an 18-second overhead for the loops alone (with no comparison of any kind), while the other browsers optimize it down to 300ms. Sure, behind the scenes they realize that nothing happens in those loops and drop them, but still, it was a drag to wait for the result from Internet Explorer. Compared with the other huge values, the ===null comparison is only insignificantly smaller than the others on Internet Explorer, but it still takes first place, while typeof took forever! Take these results with a grain of salt, though. When I was at FOSDEM I watched a presentation from Firefox developers in which they actually advised against this type of profiling, recommending instead special browser tools that do it properly. You can watch it yourselves here.

Final conclusion: if you are checking whether an object exists or not, especially if you can ensure that the value of a non-existent object is always the same (like null), === kicks ass. The NOT operator can be used to check a user-provided value, since it catches all of null, undefined, the empty string and 0, and it's reasonably fast.
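To make that concrete, here is a quick snippet (separate from the timing code below) showing exactly which of the five test values each check catches:

```javascript
// The five kinds of values used in the test array:
var values = [null, undefined, '', 0, new Date(0)];

// The NOT operator catches every falsy value: null, undefined, '' and 0
var notOp = values.filter(function (v) { return !v; });
// Strict equality catches only null itself
var strictNull = values.filter(function (v) { return v === null; });
// Loose equality with null catches null and undefined, nothing else
var looseNull = values.filter(function (v) { return v == null; });

console.log(notOp.length, strictNull.length, looseNull.length); // 4 1 2
```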

Here is the code:
var arr = [];
for (var i = 0; i < 100000; i++) {
    var r = parseInt(Math.random() * 5);
    switch (r) {
        case 0: arr.push(null); break;
        case 1: arr.push(undefined); break;
        case 2: arr.push(''); break;
        case 3: arr.push(0); break;
        case 4: arr.push(new Date()); break;
    }
}

var n = 0;
var start = performance.now();
for (var j = 0; j < 1000; j++) {
    for (var i = 0; i < 100000; i++) {
        if (!arr[i]) n++;
    }
}
var end = performance.now();
console.log('!value ' + n + ': ' + (end - start));

n = 0;
start = performance.now();
for (var j = 0; j < 1000; j++) {
    for (var i = 0; i < 100000; i++) {
        if (arr[i] === 0) n++;
    }
}
end = performance.now();
console.log('value===0 ' + n + ': ' + (end - start));

n = 0;
start = performance.now();
for (var j = 0; j < 1000; j++) {
    for (var i = 0; i < 100000; i++) {
        if (arr[i] === null) n++;
    }
}
end = performance.now();
console.log('value===null ' + n + ': ' + (end - start));

n = 0;
start = performance.now();
for (var j = 0; j < 1000; j++) {
    for (var i = 0; i < 100000; i++) {
        if (arr[i] == 0) n++;
    }
}
end = performance.now();
console.log('value==0 ' + n + ': ' + (end - start));

n = 0;
start = performance.now();
for (var j = 0; j < 1000; j++) {
    for (var i = 0; i < 100000; i++) {
        if (arr[i] == null) n++;
    }
}
end = performance.now();
console.log('value==null ' + n + ': ' + (end - start));

n = 0;
start = performance.now();
for (var j = 0; j < 1000; j++) {
    for (var i = 0; i < 100000; i++) {
        if (arr[i] === 0 || arr[i] === null) n++;
    }
}
end = performance.now();
console.log('value===0 || value===null ' + n + ': ' + (end - start));

n = 0;
start = performance.now();
for (var j = 0; j < 1000; j++) {
    for (var i = 0; i < 100000; i++) {
        if (typeof(arr[i]) == 'object') n++;
    }
}
end = performance.now();
console.log('typeof(value)==\'object\' ' + n + ': ' + (end - start));

n = 0;
start = performance.now();
for (var j = 0; j < 1000; j++) {
    for (var i = 0; i < 100000; i++) {
        if (typeof(arr[i]) === 'object') n++;
    }
}
end = performance.now();
console.log('typeof(value)===\'object\' ' + n + ': ' + (end - start));

n = 0;
start = performance.now();
for (var j = 0; j < 1000; j++) {
    for (var i = 0; i < 100000; i++) {
        if (arr[i] instanceof Date) n++;
    }
}
end = performance.now();
console.log('value instanceof Date ' + n + ': ' + (end - start));

Today we tested a web application in the new Microsoft Edge browser. To our surprise, the site failed where Internet Explorer, Chrome, Firefox and even Safari worked perfectly well. I narrowed the problem down to navigator.geolocation.getCurrentPosition, which wasn't working. The site would see navigator.geolocation, ask for the current location, the user would be prompted to allow the site to access the location, and after that it would silently fail. What I mean by that is that neither the success nor the error callback was called, even though the options object specified one second for the timeout. I don't have access to a lot of Windows 10 machines, and I assume that if a lot of people had met this problem they would have invaded the Internet with angry messages, but so far I've found no one having the same issue.

Bottom line: forced to take into consideration the possibility that the geolocation API would silently fail, I changed the code like this:
if (navigator.geolocation) {
    var timeoutInSeconds = 1;
    var geotimeout = setTimeout(function () {
        handleNoGeolocation();
    }, timeoutInSeconds * 1000 + 500); //plus 500 ms to allow the API to timeout normally
    navigator.geolocation.getCurrentPosition(function (position) {
        clearTimeout(geotimeout);
        var pos = doSomethingWith(position.coords.latitude, position.coords.longitude);
    }, function () {
        clearTimeout(geotimeout);
        handleNoGeolocation();
    }, {
        enableHighAccuracy: true,
        timeout: timeoutInSeconds * 1000
    });
} else {
    handleNoGeolocation();
}

In the handleNoGeolocation function I've accessed the great service FreeGeoIp, which returns approximate coordinates based on your IP, and fell back to a static latitude/longitude pair if even that call failed.

Note: the first time the function is called for your site, a browser dialog will appear requesting permission to share the location. While the dialog is displayed the timeout will fire; then, based on the user's choice (and browser), a success/error handler will be called, or nothing at all (as in this case), so make sure your code can handle handleNoGeolocation running first, followed by doSomethingWith.

A blog reader asked me to help him get rid of the ugly effect of a large background image loading slowly. I thought of several solutions, each more complicated than the last, but in the end settled on one that seems to work well and doesn't require complicated libraries or difficult implementation: using the img onload event.

Let's assume that the background image is on the body element of the page. The solution involves setting a style on the body to hide it (style="display:none") then adding as child of the body an image that also is hidden and that, when completing loading, shows the body element. Here is the initial code:
<style>
    body {
        background: url(bg.jpg) no-repeat center center fixed;
    }
</style>
<body>

And after:

<style>
    body {
        background: url(bg.jpg) no-repeat center center fixed;
    }
</style>
<body style="display:none">
<img src="bg.jpg" onload="document.body.style.display=''" style="display:none;" />

This loads the image in a hidden img element and shows the body element when the image finished loading.

The solution might have some problems with Internet Explorer 9, as it seems the load event is not fired for images retrieved from the cache. In that case, a slightly more complex Javascript solution is needed as detailed in this blog post: How to Fix the IE9 Image Onload Bug. Also, in Internet Explorer 5-7 the load event fires for animated GIFs at every loop. I am sure you know it's a bad idea to have an animated GIF as a page background, though :)

Warning: While this hides the effect of slow loading background images, it also hides the page until the image is loaded. This makes the page appear blank until then. More complex solutions would show some simple html content while the page is loading rather than hiding the entire page, but this post is about the simplest solution for the question asked.

A more comprehensive analysis of image preloading, complete with a very nice Javascript code that covers a lot of cases, can be found at Preloading images using javascript, the right way and without frameworks

I was using this JavaScript function that I wanted to accept an array of arguments or just a bunch of arguments that would be interpreted as an array. As you probably know, for each function in JS you get a variable called arguments which contains the arguments to the function. It is a pseudo array, not a real array, and has some extra properties. My code looked like this:
function x(arg) {
    if (!arg) {
        arg = [];
    } else if (!arg.length) {
        arg = [];
        for (var i = 0; i < arguments.length; i++) arg.push(arguments[i]);
    }
    // do something with arg
}

The logic is simple: if there is no first parameter, arg becomes an empty array; else, if there is a first argument but it doesn't have a length property (so it's not an array), set arg to an array and push all the function's arguments as items of that array. But it doesn't work! The point is this: you set arg to an empty array and at that moment arguments[0] is no longer the original argument, but the empty array. Even worse, the code then adds the array as an item of itself, which makes the object infinitely recursive.

Let's make this simpler:
function x(arg) {
    arg = [];
    console.log(arguments[0]);
}
After you execute x() with any arguments, the console will show an empty array, not the original argument. Weird, huh?
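The fix I'd suggest, as a sketch keeping the original intent: snapshot the arguments before touching the named parameter, because in non-strict mode arg and arguments[0] are aliases of each other.

```javascript
// Copy the arguments BEFORE reassigning the named parameter; once 'arg' is
// reassigned, arguments[0] changes with it (non-strict mode aliasing).
function x(arg) {
    var args = Array.prototype.slice.call(arguments); // snapshot first
    if (!arg) {
        arg = [];
    } else if (!arg.length) {
        arg = args; // safe: 'args' was captured before the aliasing kicked in
    }
    return arg;
}
console.log(x());        // []
console.log(x(1, 2, 3)); // [1, 2, 3]
console.log(x([4, 5]));  // [4, 5]
```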

Warning: the algorithm works perfectly well and is better than Sift3, however you might want to consider it in beta, as I am still looking for better implementation solutions that might change the structure of the code.

Try the Javascript implementation here:





Update 28 Mar 2015: I've changed the algorithm significantly. The transpositions are now computed differently and the cost of a transposition in the final result is 1, rather than 0.5. Also, while I think a value of 1 is better conceptually, I noticed that Sift4 approximates Levenshtein a little better when the cost of a transposition is either 2 or a function depending on the offset difference between c2 and c1, especially when maxOffset grows. This can be now changed via the new options function transpositionCostEvaluator. The problem I am having now is more false positives when the letters/tokens of the two strings are the same, but their positions are jumbled differently. With small maxOffset values, like 5 or 10, the result is much better than Sift3, however when maxOffset grows, lots of matches can be found and the cost of transpositions becomes very important.

Update 27 Mar 2015: Thanks to Emanuele Bastianelli, who discovered a bug that appeared in an edge case, I've updated the algorithms. Now, at the end of the while loop there is an extra check to prevent the algorithm exiting prematurely, before computing the remaining tokens.

A really long time ago I wrote the third version of Sift, the string distance algorithm. It so happens that I am going to make a small presentation, here in Ispra, about this algorithm, so I had the opportunity to review it. I found some inconsistencies and I actually did some research in the field that gave me more ideas. So before giving the presentation I thought of publishing what I think is the fourth version. What's new:

  • 33% more accurate
  • three different variants: simple, common and general
  • new concepts added
  • support for own value and matching functions, different tokenizer functions, etc.
  • actually tested with a (slightly more) serious test
  • more robust, working better for large values of maxOffset


Before I get into the details, I am publishing the algorithm here for the moment, no Codeplex or PasteBin or GitHub or whatever. Also, it is written in Javascript now, the C# and T-SQL version pending. Of course, it would be great if, as before, the community of people using the algorithm would go into implementing it into various programming languages, however I am a bit apprehensive because more often than not people came with their own improvements or interpretations when translating the algorithm into another language. But support is always welcome!

I created a test that used random strings, but also a huge list of commonly used English phrases, as well as mutations on these strings, adding or removing small bits and so on. I then implemented Sift3, Levenshtein and the new algorithm and computed the error distance between the Levenshtein distance and the two Sift variants. This permitted me to see how the error evolves when changing the algorithm and the parameters. One thing I noticed is that when increasing the maxOffset value to large values like 15 or 20, the accuracy of Sift3 was going down. Also, as pointed out by one commenter on the Sift3 post, there are cases when Sift3(a,b) is different from Sift3(b,a). These are edge cases, but this one in particular grated on me.

After implementing Sift4, I can now tell you that the simple version is slightly better than Sift3 for small maxOffset values like 5, and it gets better as the value increases. The common version is a bit more complex, but the error decreases by 33% and stays low for large maxOffset values. The extended or general version receives an options object that can change almost everything, but most important is the tokenizer function. Imagine that you want to compute the distance based not on letters, but on n-grams (groups of n characters). Or that you want to compare texts by their words, maybe even their synonyms. This can all be achieved just by changing the tokenizer function. The other parameters involve defining what it means for two tokens to match and what the value of their match is, etc.

One of the new concepts implemented is taken from the Jaro distance. Jaro seems a lot like Sift in the way that it considers two characters to match if they are in close proximity. Also, if "the streams cross", like 'ab' vs 'ba', one considers them transpositions and removes some of their value from the distance. Actually, if I look at the implementation, it might be that I have independently discovered the Jaro distance. I will research this further. I don't know if the transposition calculation is the most optimal. At the moment it uses an array of all matches found until a point, clearing it of values as the cursors move along the string. The difference between the simple and the common versions of Sift4 is that the simple version is not computing the transpositions at all and has no concept of maxDistance. In that respect it is a slightly fixed up Sift3.

Another new concept added is the one of local substring. Imagine that the Largest Common Subsequence that Sift is actually trying to find in order to determine the distance is made of substrings, separated by non matching characters. Each of these substrings can be used to improve the distance function. For example one could argue that 'abcdex' is closer to 'abcde' than 'abcxde', because even if the largest common subsequence is 5, the largest common substring is 5 for the first string and only 3 for the second. The extended version of the algorithm allows for changing the value of each substring individually.

Well, here they are, the three versions. The extended version has some examples at the end for possible parameters.

Simplest Sift4:

// Sift4 - simplest version
// online algorithm to compute the distance between two strings in O(n)
// maxOffset is the number of characters to search for matching letters
function sift4(s1, s2, maxOffset) {
    if (!s1 || !s1.length) {
        if (!s2) {
            return 0;
        }
        return s2.length;
    }

    if (!s2 || !s2.length) {
        return s1.length;
    }

    var l1 = s1.length;
    var l2 = s2.length;

    var c1 = 0; //cursor for string 1
    var c2 = 0; //cursor for string 2
    var lcss = 0; //largest common subsequence
    var local_cs = 0; //local common substring
    while ((c1 < l1) && (c2 < l2)) {
        if (s1.charAt(c1) == s2.charAt(c2)) {
            local_cs++;
        } else {
            lcss += local_cs;
            local_cs = 0;
            if (c1 != c2) {
                c1 = c2 = Math.max(c1, c2); //using max to bypass the need for computing transpositions ('ab' vs 'ba')
            }
            for (var i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++) {
                if ((c1 + i < l1) && (s1.charAt(c1 + i) == s2.charAt(c2))) {
                    c1 += i;
                    local_cs++;
                    break;
                }
                if ((c2 + i < l2) && (s1.charAt(c1) == s2.charAt(c2 + i))) {
                    c2 += i;
                    local_cs++;
                    break;
                }
            }
        }
        c1++;
        c2++;
    }
    lcss += local_cs;
    return Math.round(Math.max(l1, l2) - lcss);
}
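For a quick sanity check, here is the simplest version again, repeated so the snippet runs standalone, with a few example calls (the strings are my own; the distances were verified by hand against the algorithm):

```javascript
// Sift4 - simplest version, repeated verbatim so this snippet is self-contained
function sift4(s1, s2, maxOffset) {
    if (!s1 || !s1.length) {
        if (!s2) {
            return 0;
        }
        return s2.length;
    }
    if (!s2 || !s2.length) {
        return s1.length;
    }
    var l1 = s1.length;
    var l2 = s2.length;
    var c1 = 0; //cursor for string 1
    var c2 = 0; //cursor for string 2
    var lcss = 0; //largest common subsequence
    var local_cs = 0; //local common substring
    while ((c1 < l1) && (c2 < l2)) {
        if (s1.charAt(c1) == s2.charAt(c2)) {
            local_cs++;
        } else {
            lcss += local_cs;
            local_cs = 0;
            if (c1 != c2) {
                c1 = c2 = Math.max(c1, c2);
            }
            for (var i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++) {
                if ((c1 + i < l1) && (s1.charAt(c1 + i) == s2.charAt(c2))) {
                    c1 += i;
                    local_cs++;
                    break;
                }
                if ((c2 + i < l2) && (s1.charAt(c1) == s2.charAt(c2 + i))) {
                    c2 += i;
                    local_cs++;
                    break;
                }
            }
        }
        c1++;
        c2++;
    }
    lcss += local_cs;
    return Math.round(Math.max(l1, l2) - lcss);
}

console.log(sift4('exact', 'exact', 5));   // 0 - identical strings
console.log(sift4('London', 'Londen', 5)); // 1 - one substitution
console.log(sift4('abcdex', 'abcde', 5));  // 1 - one extra character
```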
Common Sift4:
// Sift4 - common version
// online algorithm to compute the distance between two strings in O(n)
// maxOffset is the number of characters to search for matching letters
// maxDistance is the distance at which the algorithm should stop computing the value and just exit (the strings are too different anyway)
function sift4(s1, s2, maxOffset, maxDistance) {
    if (!s1 || !s1.length) {
        if (!s2) {
            return 0;
        }
        return s2.length;
    }

    if (!s2 || !s2.length) {
        return s1.length;
    }

    var l1 = s1.length;
    var l2 = s2.length;

    var c1 = 0; //cursor for string 1
    var c2 = 0; //cursor for string 2
    var lcss = 0; //largest common subsequence
    var local_cs = 0; //local common substring
    var trans = 0; //number of transpositions ('ab' vs 'ba')
    var offset_arr = []; //offset pair array, for computing the transpositions
    while ((c1 < l1) && (c2 < l2)) {
        if (s1.charAt(c1) == s2.charAt(c2)) {
            local_cs++;
            var isTrans = false;
            //see if current match is a transposition
            var i = 0;
            while (i < offset_arr.length) {
                var ofs = offset_arr[i];
                if (c1 <= ofs.c1 || c2 <= ofs.c2) {
                    // when two matches cross, the one considered a transposition is the one with the largest difference in offsets
                    isTrans = Math.abs(c2 - c1) >= Math.abs(ofs.c2 - ofs.c1);
                    if (isTrans) {
                        trans++;
                    } else {
                        if (!ofs.trans) {
                            ofs.trans = true;
                            trans++;
                        }
                    }
                    break;
                } else {
                    if (c1 > ofs.c2 && c2 > ofs.c1) {
                        offset_arr.splice(i, 1);
                    } else {
                        i++;
                    }
                }
            }
            offset_arr.push({
                c1: c1,
                c2: c2,
                trans: isTrans
            });
        } else {
            lcss += local_cs;
            local_cs = 0;
            if (c1 != c2) {
                c1 = c2 = Math.min(c1, c2); //using min allows the computation of transpositions
            }
            //if matching characters are found, remove 1 from both cursors (they get incremented at the end of the loop)
            //so that we can have only one code block handling matches
            for (var i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++) {
                if ((c1 + i < l1) && (s1.charAt(c1 + i) == s2.charAt(c2))) {
                    c1 += i - 1;
                    c2--;
                    break;
                }
                if ((c2 + i < l2) && (s1.charAt(c1) == s2.charAt(c2 + i))) {
                    c1--;
                    c2 += i - 1;
                    break;
                }
            }
        }
        c1++;
        c2++;
        if (maxDistance) {
            var temporaryDistance = Math.max(c1, c2) - lcss + trans;
            if (temporaryDistance >= maxDistance)
                return Math.round(temporaryDistance);
        }
        // this covers the case where the last match is on the last token in list, so that it can compute transpositions correctly
        if ((c1 >= l1) || (c2 >= l2)) {
            lcss += local_cs;
            local_cs = 0;
            c1 = c2 = Math.min(c1, c2);
        }
    }
    lcss += local_cs;
    return Math.round(Math.max(l1, l2) - lcss + trans); //add the cost of transpositions to the final result
}


Extended/General Sift4:

// Sift4 - extended version
// online algorithm to compute the distance between two strings in O(n)
// maxOffset is the number of positions to search for matching tokens
// options: the options for the function, allowing customization of the scope and algorithm:
//     maxDistance: the distance at which the algorithm should stop computing the value and just exit (the strings are too different anyway)
//     tokenizer: a function to transform strings into vectors of tokens
//     tokenMatcher: a function to determine if two tokens are matching (equal)
//     matchingEvaluator: a function to determine how a token match should be added to the local_cs. For example, a fuzzy match could be implemented.
//     localLengthEvaluator: a function to determine how the local_cs value is added to the lcss. For example, longer continuous substrings could be rewarded.
//     transpositionCostEvaluator: a function to determine the cost of an individual transposition. For example, longer transpositions could have a higher cost.
//     transpositionsEvaluator: a function to determine how the total cost of transpositions affects the final result
// the options can and should be implemented at a class level, but this is the demo algorithm
function sift4(s1, s2, maxOffset, options) {

    options = extend(options, {
            maxDistance: null,
            tokenizer: function (s) {
                return s ? s.split('') : [];
            },
            tokenMatcher: function (t1, t2) {
                return t1 == t2;
            },
            matchingEvaluator: function (t1, t2) {
                return 1;
            },
            localLengthEvaluator: function (local_cs) {
                return local_cs;
            },
            transpositionCostEvaluator: function (c1, c2) {
                return 1;
            },
            transpositionsEvaluator: function (lcss, trans) {
                return lcss - trans;
            }
        });

    var t1 = options.tokenizer(s1);
    var t2 = options.tokenizer(s2);

    var l1 = t1.length;
    var l2 = t2.length;

    if (l1 == 0)
        return l2;
    if (l2 == 0)
        return l1;

    var c1 = 0; //cursor for string 1
    var c2 = 0; //cursor for string 2
    var lcss = 0; //largest common subsequence
    var local_cs = 0; //local common substring
    var trans = 0; //number of transpositions ('ab' vs 'ba')
    var offset_arr = []; //offset pair array, for computing the transpositions
    while ((c1 < l1) && (c2 < l2)) {
        if (options.tokenMatcher(t1[c1], t2[c2])) {
            local_cs += options.matchingEvaluator(t1[c1], t2[c2]);
            var isTrans = false;
            //see if current match is a transposition
            var i = 0;
            while (i < offset_arr.length) {
                var ofs = offset_arr[i];
                if (c1 <= ofs.c1 || c2 <= ofs.c2) {
                    // when two matches cross, the one considered a transposition is the one with the largest difference in offsets
                    isTrans = Math.abs(c2 - c1) >= Math.abs(ofs.c2 - ofs.c1);
                    if (isTrans) {
                        trans += options.transpositionCostEvaluator(c1, c2);
                    } else {
                        if (!ofs.trans) {
                            ofs.trans = true;
                            trans += options.transpositionCostEvaluator(ofs.c1, ofs.c2);
                        }
                    }
                    break;
                } else {
                    if (c1 > ofs.c2 && c2 > ofs.c1) {
                        offset_arr.splice(i, 1);
                    } else {
                        i++;
                    }
                }
            }
            offset_arr.push({
                c1: c1,
                c2: c2,
                trans: isTrans
            });
        } else {
            lcss += options.localLengthEvaluator(local_cs);
            local_cs = 0;
            if (c1 != c2) {
                c1 = c2 = Math.min(c1, c2); //using min allows the computation of transpositions
            }
            //if matching tokens are found, remove 1 from both cursors (they get incremented at the end of the loop)
            //so that we can have only one code block handling matches
            for (var i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++) {
                if ((c1 + i < l1) && options.tokenMatcher(t1[c1 + i], t2[c2])) {
                    c1 += i - 1;
                    c2--;
                    break;
                }
                if ((c2 + i < l2) && options.tokenMatcher(t1[c1], t2[c2 + i])) {
                    c1--;
                    c2 += i - 1;
                    break;
                }
            }
        }
        c1++;
        c2++;
        if (options.maxDistance) {
            var temporaryDistance = options.localLengthEvaluator(Math.max(c1, c2)) - options.transpositionsEvaluator(lcss, trans);
            if (temporaryDistance >= options.maxDistance)
                return Math.round(temporaryDistance);
        }
        // this covers the case where the last match is on the last token in list, so that it can compute transpositions correctly
        if ((c1 >= l1) || (c2 >= l2)) {
            lcss += options.localLengthEvaluator(local_cs);
            local_cs = 0;
            c1 = c2 = Math.min(c1, c2);
        }
    }
    lcss += options.localLengthEvaluator(local_cs);
    return Math.round(options.localLengthEvaluator(Math.max(l1, l2)) - options.transpositionsEvaluator(lcss, trans)); //add the cost of found transpositions
}

function extend(obj, def) {
    var result = {};
    for (var prop in def) {
        if (!obj || !obj.hasOwnProperty(prop)) {
            result[prop] = def[prop];
        } else {
            result[prop] = obj[prop];
        }
    }
    return result;
}

// possible values for the options
// tokenizers:
function nGramTokenizer(s, n) { //tokenizer:function(s) { return nGramTokenizer(s,2); }
    var result = [];
    if (!s)
        return result;
    for (var i = 0; i <= s.length - n; i++) {
        result.push(s.substr(i, n));
    }
    return result;
}

function wordSplitTokenizer(s) { //tokenizer:wordSplitTokenizer
    if (!s)
        return [];
    return s.split(/\s+/);
}

function characterFrequencyTokenizer(s) { //tokenizer:characterFrequencyTokenizer (letters only)
    var result = [];
    for (var i = 0; i <= 25; i++) {
        var val = 0;
        if (s) {
            for (var j = 0; j < s.length; j++) {
                var code = s.charCodeAt(j);
                if (code == i + 65 || code == i + 97)
                    val++;
            }
        }
        result.push(val);
    }
    return result;
}

//tokenMatchers:
function sift4TokenMatcher(t1, t2) { //tokenMatcher:sift4TokenMatcher
    var similarity = 1 - sift4(t1, t2, 5) / Math.max(t1.length, t2.length);
    return similarity > 0.7;
}

//matchingEvaluators:
function sift4MatchingEvaluator(t1, t2) { //matchingEvaluator:sift4MatchingEvaluator
    var similarity = 1 - sift4(t1, t2, 5) / Math.max(t1.length, t2.length);
    return similarity;
}

//localLengthEvaluators:
function rewardLengthEvaluator(l) {
    if (l < 1)
        return l; //0 -> 0
    return l - 1 / (l + 1); //1 -> 0.5, 2 -> 1.66, 9 -> 8.9
}

function rewardLengthEvaluator2(l) {
    return Math.pow(l, 1.5); // 0 -> 0, 1 -> 1, 2 -> 2.83, 10 -> 31.62
}

//transpositionCostEvaluators:
function longerTranspositionsAreMoreCostly(c1, c2) {
    return Math.abs(c2 - c1) / 9 + 1;
}
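
To get a feel for how these example evaluators shape the score, here is a quick numeric check (the evaluator definitions are repeated from above so the snippet runs standalone):

```javascript
// evaluators repeated from above so this snippet runs standalone
function rewardLengthEvaluator(l) {
    if (l < 1)
        return l;
    return l - 1 / (l + 1);
}

function rewardLengthEvaluator2(l) {
    return Math.pow(l, 1.5);
}

function longerTranspositionsAreMoreCostly(c1, c2) {
    return Math.abs(c2 - c1) / 9 + 1;
}

console.log(rewardLengthEvaluator(1));                // 0.5 - very short substrings are devalued
console.log(rewardLengthEvaluator2(4));               // 8 - longer substrings rewarded superlinearly
console.log(longerTranspositionsAreMoreCostly(0, 9)); // 2 - a transposition spanning 9 positions costs double
```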

As always, I will be most happy to know if you used my algorithm and how it performed, and to receive any suggestions you might have.


Here is some explanation for the options of the general algorithm.

It no longer searches for characters, but for tokens. That is why the default tokenizer function splits the values into characters, so that the algorithm works on arrays of one-character tokens. Other options are possible, like splitting the strings by whitespace so that the comparisons are done on words, or transforming a string into an array of strings N characters long, the so-called N-grams. The tokenizer can be anything, like the characterFrequencyTokenizer, which turns each word into an array of 26 values representing the number of occurrences in the word of each letter a-z.
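
For example, here is what the two simpler tokenizers produce (definitions repeated from the listing above so the snippet runs standalone):

```javascript
// tokenizers repeated from the listing above so this snippet runs standalone
function nGramTokenizer(s, n) {
    var result = [];
    if (!s)
        return result;
    for (var i = 0; i <= s.length - n; i++) {
        result.push(s.substr(i, n));
    }
    return result;
}

function wordSplitTokenizer(s) {
    if (!s)
        return [];
    return s.split(/\s+/);
}

console.log(nGramTokenizer('sift', 2));             // ['si', 'if', 'ft']
console.log(wordSplitTokenizer('quick brown fox')); // ['quick', 'brown', 'fox']
```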

The tokenMatcher function returns true if two tokens match. The match can be fuzzy; for example, the sift4TokenMatcher uses Sift inside Sift to determine the distance between two tokens and returns true if they are more than 70% similar.

The matchingEvaluator is a function that returns the value that will be added to the "common substring" length when two tokens match. The default is 1, but one can use some other metric, like the similarity of the tokens, for example. Of course, the common substring length loses its literal meaning when these functions change, but the variable local_cs is still used.

The localLengthEvaluator takes the length of the local common substring and returns a value that will be added to the longest common subsequence value. Usually it returns the same value as the one provided, but some functions could reward longer substrings.

FAQ:

Q: Can you make Sift4 work case-insensitively?
A: Just convert the strings to lower or upper case before you compare them. Since this algorithm is more general, the concept of 'case' might not apply to every token type.
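
A minimal sketch of that advice, using the Simplest variant (repeated here in compact form so the snippet runs standalone):

```javascript
// Sift4 Simplest, repeated in compact form so this snippet runs standalone
function sift4simplest(s1, s2, maxOffset) {
    if (!s1 || !s1.length) return s2 ? s2.length : 0;
    if (!s2 || !s2.length) return s1.length;
    var l1 = s1.length, l2 = s2.length;
    var c1 = 0, c2 = 0, lcss = 0, local_cs = 0;
    while (c1 < l1 && c2 < l2) {
        if (s1.charAt(c1) == s2.charAt(c2)) {
            local_cs++;
        } else {
            lcss += local_cs;
            local_cs = 0;
            if (c1 != c2) c1 = c2 = Math.max(c1, c2);
            for (var i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++) {
                if (c1 + i < l1 && s1.charAt(c1 + i) == s2.charAt(c2)) { c1 += i; local_cs++; break; }
                if (c2 + i < l2 && s1.charAt(c1) == s2.charAt(c2 + i)) { c2 += i; local_cs++; break; }
            }
        }
        c1++; c2++;
    }
    return Math.max(l1, l2) - (lcss + local_cs);
}

// case-insensitive wrapper: just lowercase both inputs first
function sift4CaseInsensitive(s1, s2, maxOffset) {
    return sift4simplest((s1 || '').toLowerCase(), (s2 || '').toLowerCase(), maxOffset);
}

console.log(sift4simplest('Hello', 'hello', 5));        // 1 - 'H' and 'h' differ
console.log(sift4CaseInsensitive('Hello', 'hello', 5)); // 0
```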

Q: Can you make Sift4 compare strings based on their meaning, for example using synonyms?
A: Use a tokenizer function that splits the strings into words, then replaces each word with the first of its synonyms. A more complex solution would require analysing the strings beforehand and turning them into ordered lists of synonyms or equivalent expressions, then using Sift4 with a word tokenizer (one is provided in the Extended algorithm source).
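
A hedged sketch of such a tokenizer; the synonym map here is a made-up example, and a real one would come from a thesaurus:

```javascript
// hypothetical synonym map - illustration only
var synonyms = { big: 'large', huge: 'large', enormous: 'large' };

// word tokenizer that canonicalizes each word through the synonym map,
// suitable for the tokenizer option of the Extended version
function synonymTokenizer(s) {
    if (!s) return [];
    return s.toLowerCase().split(/\s+/).map(function (w) {
        return synonyms[w] || w;
    });
}

console.log(synonymTokenizer('a BIG house')); // ['a', 'large', 'house']
console.log(synonymTokenizer('a huge house')); // ['a', 'large', 'house'] - identical token streams
```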

Q: I need an implementation for this programming language, can you help?
A: I can, but I might not have the time. Ask anyway, maybe I can be persuaded :)

Q: I have been using Sift3 until now, how do I upgrade to Sift4?
A: The best way I can think of is to implement Sift4 Simplest, as it needs only the Sift3 code and some minor changes. Since you never needed tokens before, I doubt you need them now. But if you do, I can help, see the above question.

Q: How can I reward you for this fantastic piece of software engineering?
A: While I did this for free and I don't expect to make any money out of it and while this algorithm is completely free to use and change as you see fit, I don't mind having a beer every now and then ;)

Q: Your algorithm really sucks because... reasons.
A: It may. I would be glad to discuss the reasons, though, and try to fix any problem you encounter.

Q: I compared Sift4 with another algorithm that is much more exact and there are differences.
A: Of course; they are different algorithms. This is a fuzzy distance calculator, so it doesn't give you the exact value, and there are still edge cases. The idea of Sift is to be fast and reasonably accurate, rather than exact. If you need more accuracy, try combining Sift with Levenshtein, for example computing Levenshtein only where Sift says the strings are above a certain similarity.
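
A sketch of that combination, using the Simplest variant as the cheap prefilter and a textbook two-row Levenshtein for the exact value; both are repeated or implemented here so the snippet runs standalone, and the threshold is an arbitrary example:

```javascript
// Sift4 Simplest, repeated in compact form so this snippet runs standalone
function sift4simplest(s1, s2, maxOffset) {
    if (!s1 || !s1.length) return s2 ? s2.length : 0;
    if (!s2 || !s2.length) return s1.length;
    var l1 = s1.length, l2 = s2.length;
    var c1 = 0, c2 = 0, lcss = 0, local_cs = 0;
    while (c1 < l1 && c2 < l2) {
        if (s1.charAt(c1) == s2.charAt(c2)) {
            local_cs++;
        } else {
            lcss += local_cs;
            local_cs = 0;
            if (c1 != c2) c1 = c2 = Math.max(c1, c2);
            for (var i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++) {
                if (c1 + i < l1 && s1.charAt(c1 + i) == s2.charAt(c2)) { c1 += i; local_cs++; break; }
                if (c2 + i < l2 && s1.charAt(c1) == s2.charAt(c2 + i)) { c2 += i; local_cs++; break; }
            }
        }
        c1++; c2++;
    }
    return Math.max(l1, l2) - (lcss + local_cs);
}

// textbook two-row Levenshtein, O(n*m) - the expensive exact distance
function levenshtein(s1, s2) {
    var prev = [], cur = [], tmp;
    for (var j = 0; j <= s2.length; j++) prev[j] = j;
    for (var i = 1; i <= s1.length; i++) {
        cur[0] = i;
        for (j = 1; j <= s2.length; j++) {
            var cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
            cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost);
        }
        tmp = prev; prev = cur; cur = tmp;
    }
    return prev[s2.length];
}

// cheap prefilter: only pay for Levenshtein when Sift4 says the strings are close enough
function refinedDistance(s1, s2, maxOffset, threshold) {
    var estimate = sift4simplest(s1, s2, maxOffset);
    return estimate > threshold ? estimate : levenshtein(s1, s2);
}

console.log(refinedDistance('kitten', 'sitting', 5, 4)); // 3 - close enough, exact Levenshtein used
```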

Q: I want to make maxOffset dependent on the length of the strings compared. Can you do that?
A: That is a perfect example of why maxOffset should be a parameter of the function rather than a member of the class. Since this implementation is so far JavaScript only, just compute whatever maxOffset is convenient for you before you compare.
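
For example, a hypothetical heuristic deriving maxOffset from the inputs before the call; the "10% of the longer string, at least 5" rule is an arbitrary choice, not part of the algorithm:

```javascript
// hypothetical heuristic: maxOffset as ~10% of the longer string, but at least 5
function adaptiveMaxOffset(s1, s2) {
    var l = Math.max(s1 ? s1.length : 0, s2 ? s2.length : 0);
    return Math.max(5, Math.round(l / 10));
}

// then call: sift4(s1, s2, adaptiveMaxOffset(s1, s2));
console.log(adaptiveMaxOffset('abc', 'abcdef'));              // 5
console.log(adaptiveMaxOffset(new Array(101).join('a'), '')); // 10 (a 100-character string)
```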

Q: I want to vary the weight of matches based on the position of the match, for example matches at the beginning of the string could be more valuable than those at the end.
A: The position of the match is indeed not sent to the functions that can be specified in the options object of the Sift4 Extended, but that can be trivially changed in the code. I don't think this particular request is very common, though, and I prefer to keep it out of the published implementation to make the code easier to understand.

Q: I found a bug!
A: Let me know and I will try to fix it.

Q: If you need to compare large lists of strings, it is better to precompute some things, like specific hashes or suffix trees, etc. This will speed up the comparison tremendously!
A: Sift is what is called an online algorithm. It does not precompute anything; it just gets the two strings and the parameters for its functioning and returns the distance. You are correct in what you are saying, but that kind of solution is not in the scope of Sift, at least not in version 4.

Q: What are the edge cases for Sift?
A: There are probably several, but I haven't really spotted them all. One is that letters at a given position in both strings might match letters at other positions, yet only one of the matches will count. Take 'abxx' and 'bayy': the algorithm looks at position 0, finds no match, then tries to find the closest match for each letter. Starting with position 0 in the first string, it finds the 'a' matched at position 1 in the second. It increases both cursors and lcss is increased as well. The next check is 'b', the character at position 1 in the first string, against position 2 in the second string. No match, so both cursors are reset to 1 and the search starts again. The 'b' match is lost and the distance is 3 instead of 2. I also think there might be situations where the cursors are unequal and the bigger of them reaches the end of its string, terminating the algorithm even though more matches were possible. Incidentally, I tried to fix both of these issues and the error relative to Levenshtein was not really affected, but I am not 100% sure of the implementation.

Q: The algorithm continues to be asymmetric, Sift4(s1,s2) can be different from Sift4(s2,s1).
A: Yes. This is one of the artifacts of the linear nature of the algorithm. A symmetric function does exist, Math.min(Sift4(a,b), Sift4(b,a)), but it is obviously twice as slow.
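
That min-of-both-directions trick works as a generic wrapper over any directional distance function. The sketch below demonstrates it with a deliberately asymmetric toy function; with Sift4 you would pass sift4 itself:

```javascript
// wrap any directional distance function into a symmetric one
function symmetrize(distFn) {
    return function (s1, s2, maxOffset) {
        return Math.min(distFn(s1, s2, maxOffset), distFn(s2, s1, maxOffset));
    };
}

// toy asymmetric "distance" just to demonstrate; with Sift4: var symSift4 = symmetrize(sift4);
function toyDistance(s1, s2) {
    return s1.length * 2 - s2.length + 10;
}

var symToy = symmetrize(toyDistance);
console.log(toyDistance('ab', 'abcd'), toyDistance('abcd', 'ab')); // 10 16 - asymmetric
console.log(symToy('ab', 'abcd'), symToy('abcd', 'ab'));           // 10 10 - same both ways
```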

Implementations in other languages:

C# implementation:
public class Sift4
{
    private Options _options;

    public Sift4(Options options)
    {
        if (options == null) options = new Options();
        if (options.Tokenizer == null)
        {
            options.Tokenizer = (s) => s == null
                ? new string[0]
                : s.ToCharArray().Select(c => c.ToString()).ToArray();
        }
        if (options.TokenMatcher == null)
        {
            options.TokenMatcher = (t1, t2) => t1 == t2;
        }
        if (options.MatchingEvaluator == null)
        {
            options.MatchingEvaluator = (t1, t2) => 1;
        }
        if (options.LocalLengthEvaluator == null)
        {
            options.LocalLengthEvaluator = (l) => l;
        }
        if (options.TranspositionCostEvaluator == null)
        {
            options.TranspositionCostEvaluator = (c1, c2) => 1;
        }
        if (options.TranspositionsEvaluator == null)
        {
            options.TranspositionsEvaluator = (l, t) => l - t;
        }
        _options = options;
    }

    /// <summary>
    /// General distance algorithm uses all the parameters in the options object and works on tokens
    /// </summary>
    /// <param name="s1"></param>
    /// <param name="s2"></param>
    /// <param name="maxOffset"></param>
    /// <returns></returns>
    public double GeneralDistance(string s1, string s2, int maxOffset)
    {
        var t1 = _options.Tokenizer(s1);
        var t2 = _options.Tokenizer(s2);

        var l1 = t1.Length;
        var l2 = t2.Length;

        if (l1 == 0) return _options.LocalLengthEvaluator(l2);
        if (l2 == 0) return _options.LocalLengthEvaluator(l1);

        var c1 = 0;  //cursor for string 1
        var c2 = 0;  //cursor for string 2
        var lcss = 0.0;  //largest common subsequence
        var local_cs = 0.0; //local common substring
        var trans = 0.0;  //cost of transpositions ('axb' vs 'xba')
        var offset_arr = new LinkedList<OffsetPair>();  //offset pair array, for computing the transpositions

        while ((c1 < l1) && (c2 < l2))
        {
            if (_options.TokenMatcher(t1[c1], t2[c2]))
            {
                local_cs += _options.MatchingEvaluator(t1[c1], t2[c2]);
                var isTransposition = false;
                var op = offset_arr.First;
                while (op != null)
                {  //see if current match is a transposition
                    var ofs = op.Value;
                    if (c1 <= ofs.C1 || c2 <= ofs.C2)
                    {
                        // when two matches cross, the one considered a transposition is the one with the largest difference in offsets
                        isTransposition = Math.Abs(c2 - c1) >= Math.Abs(ofs.C2 - ofs.C1);
                        if (isTransposition)
                        {
                            trans += _options.TranspositionCostEvaluator(c1, c2);
                        }
                        else
                        {
                            if (!ofs.IsTransposition)
                            {
                                ofs.IsTransposition = true;
                                trans += _options.TranspositionCostEvaluator(ofs.C1, ofs.C2);
                            }
                        }
                        break;
                    }
                    else
                    {
                        var next_op = op.Next;
                        if (c1 > ofs.C2 && c2 > ofs.C1)
                        {
                            offset_arr.Remove(op);
                        }
                        op = next_op;
                    }
                }
                offset_arr.AddLast(new OffsetPair(c1, c2)
                {
                    IsTransposition = isTransposition
                });
            }
            else
            {
                lcss += _options.LocalLengthEvaluator(local_cs);
                local_cs = 0;
                if (c1 != c2)
                {
                    c1 = c2 = Math.Min(c1, c2);  //using min allows the computation of transpositions
                }
                //if matching tokens are found, remove 1 from both cursors (they get incremented at the end of the loop)
                //so that we can have only one code block handling matches 
                for (var i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++)
                {
                    if ((c1 + i < l1) && _options.TokenMatcher(t1[c1 + i], t2[c2]))
                    {
                        c1 += i - 1;
                        c2--;
                        break;
                    }
                    if ((c2 + i < l2) && _options.TokenMatcher(t1[c1], t2[c2 + i]))
                    {
                        c1--;
                        c2 += i - 1;
                        break;
                    }
                }
            }
            c1++;
            c2++;
            if (_options.MaxDistance != null)
            {
                var temporaryDistance = _options.LocalLengthEvaluator(Math.Max(c1, c2)) - _options.TranspositionsEvaluator(lcss, trans);
                if (temporaryDistance >= _options.MaxDistance) return Math.Round(temporaryDistance, MidpointRounding.AwayFromZero);
            }
            // this covers the case where the last match is on the last token in list, so that it can compute transpositions correctly
            if ((c1 >= l1) || (c2 >= l2))
            {
                lcss += _options.LocalLengthEvaluator(local_cs);
                local_cs = 0;
                c1 = c2 = Math.Min(c1, c2);
            }
        }
        lcss += _options.LocalLengthEvaluator(local_cs);
        return Math.Round(_options.LocalLengthEvaluator(Math.Max(l1, l2)) - _options.TranspositionsEvaluator(lcss, trans), MidpointRounding.AwayFromZero); //apply transposition cost to the final result
    }

    /// <summary>
    /// Static distance algorithm working on strings, computing transpositions as well as stopping when maxDistance was reached.
    /// </summary>
    /// <param name="s1"></param>
    /// <param name="s2"></param>
    /// <param name="maxOffset"></param>
    /// <param name="maxDistance"></param>
    /// <returns></returns>
    public static double CommonDistance(string s1, string s2, int maxOffset, int maxDistance = 0)
    {
        var l1 = s1 == null ? 0 : s1.Length;
        var l2 = s2 == null ? 0 : s2.Length;

        if (l1 == 0) return l2;
        if (l2 == 0) return l1;

        var c1 = 0;  //cursor for string 1
        var c2 = 0;  //cursor for string 2
        var lcss = 0;  //largest common subsequence
        var local_cs = 0; //local common substring
        var trans = 0;  //number of transpositions ('axb' vs 'xba')
        var offset_arr = new LinkedList<OffsetPair>();  //offset pair array, for computing the transpositions

        while ((c1 < l1) && (c2 < l2))
        {
            if (s1[c1] == s2[c2])
            {
                local_cs++;
                var isTransposition = false;
                var op = offset_arr.First;
                while (op != null)
                {  //see if current match is a transposition
                    var ofs = op.Value;
                    if (c1 <= ofs.C1 || c2 <= ofs.C2)
                    {
                        // when two matches cross, the one considered a transposition is the one with the largest difference in offsets
                        isTransposition = Math.Abs(c2 - c1) >= Math.Abs(ofs.C2 - ofs.C1);
                        if (isTransposition)
                        {
                            trans++;
                        }
                        else
                        {
                            if (!ofs.IsTransposition)
                            {
                                ofs.IsTransposition = true;
                                trans++;
                            }
                        }
                        break;
                    }
                    else
                    {
                        var next_op = op.Next;
                        if (c1 > ofs.C2 && c2 > ofs.C1)
                        {
                            offset_arr.Remove(op);
                        }
                        op = next_op;
                    }
                }
                offset_arr.AddLast(new OffsetPair(c1, c2)
                {
                    IsTransposition = isTransposition
                });
            }
            else
            {
                lcss += local_cs;
                local_cs = 0;
                if (c1 != c2)
                {
                    c1 = c2 = Math.Min(c1, c2);  //using min allows the computation of transpositions
                }
                //if matching tokens are found, remove 1 from both cursors (they get incremented at the end of the loop)
                //so that we can have only one code block handling matches 
                for (var i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++)
                {
                    if ((c1 + i < l1) && s1[c1 + i] == s2[c2])
                    {
                        c1 += i - 1;
                        c2--;
                        break;
                    }
                    if ((c2 + i < l2) && s1[c1] == s2[c2 + i])
                    {
                        c1--;
                        c2 += i - 1;
                        break;
                    }
                }
            }
            c1++;
            c2++;
            if (maxDistance > 0)
            {
                var temporaryDistance = Math.Max(c1, c2) - (lcss - trans);
                if (temporaryDistance >= maxDistance) return temporaryDistance;
            }
            // this covers the case where the last match is on the last token in list, so that it can compute transpositions correctly
            if ((c1 >= l1) || (c2 >= l2))
            {
                lcss += local_cs;
                local_cs = 0;
                c1 = c2 = Math.Min(c1, c2);
            }
        }
        lcss += local_cs;
        return Math.Max(l1, l2) - (lcss - trans); //apply transposition cost to the final result
    }

    /// <summary>
    /// Standard Sift algorithm, using strings and taking only maxOffset as a parameter
    /// </summary>
    /// <param name="s1"></param>
    /// <param name="s2"></param>
    /// <param name="maxOffset"></param>
    /// <returns></returns>
    public static int SimplestDistance(string s1, string s2, int maxOffset)
    {
        var l1 = s1 == null ? 0 : s1.Length;
        var l2 = s2 == null ? 0 : s2.Length;

        if (l1 == 0) return l2;
        if (l2 == 0) return l1;

        var c1 = 0;  //cursor for string 1
        var c2 = 0;  //cursor for string 2
        var lcss = 0;  //largest common subsequence
        var local_cs = 0; //local common substring

        while ((c1 < l1) && (c2 < l2))
        {
            if (s1[c1] == s2[c2])
            {
                local_cs++;
            }
            else
            {
                lcss += local_cs;
                local_cs = 0;
                if (c1 != c2)
                {
                    c1 = c2 = Math.Max(c1, c2);
                }
                //if matching tokens are found, remove 1 from both cursors (they get incremented at the end of the loop)
                //so that we can have only one code block handling matches 
                for (var i = 0; i < maxOffset && (c1 + i < l1 && c2 + i < l2); i++)
                {
                    if ((c1 + i < l1) && s1[c1 + i] == s2[c2])
                    {
                        c1 += i - 1;
                        c2--;
                        break;
                    }
                    if ((c2 + i < l2) && s1[c1] == s2[c2 + i])
                    {
                        c1--;
                        c2 += i - 1;
                        break;
                    }
                }
            }
            c1++;
            c2++;
        }
        lcss += local_cs;
        return Math.Max(l1, l2) - lcss;
    }

    private class OffsetPair
    {
        public int C1 { get; set; }
        public int C2 { get; set; }
        public bool IsTransposition { get; set; }

        public OffsetPair(int c1, int c2)
        {
            this.C1 = c1;
            this.C2 = c2;
            this.IsTransposition = false;
        }
    }

    public class Options
    {
        /// <summary>
        /// If set, the algorithm will stop if the distance reaches this value
        /// </summary>
        public int? MaxDistance { get; set; }

        /// <summary>
        /// The function that turns strings into a list of tokens (also strings)
        /// </summary>
        public Func<string, string[]> Tokenizer { get; set; }

        /// <summary>
        /// The function that determines if two tokens are matching (similar to characters being the same in the simple algorithm)
        /// </summary>
        public Func<string, string, bool> TokenMatcher { get; set; }

        /// <summary>
        /// The function that determines the value of a match of two tokens (the equivalent of adding 1 to the lcss when two characters match)
        /// This assumes that the TokenMatcher function is a lot less expensive than this evaluator. If that is not the case, 
        /// you can optimize the speed of the algorithm by using only the matching evaluator and then calculating if two tokens match on the returned value.
        /// </summary>
        public Func<string, string, double> MatchingEvaluator { get; set; }

        /// <summary>
        /// Determines if the local value (computed on subsequent matched tokens) must be modified.
        /// In case one wants to reward longer matched substrings, for example, this is what you need to change.
        /// </summary>
        public Func<double, double> LocalLengthEvaluator { get; set; }

        /// <summary>
        /// The function determining the cost of an individual transposition, based on its counter positions.
        /// </summary>
        public Func<int, int, double> TranspositionCostEvaluator { get; set; }

        /// <summary>
        /// The function determining how the cost of transpositions affects the final result
        /// Change it if it doesn't suit you.
        /// </summary>
        public Func<double, double, double> TranspositionsEvaluator { get; set; }
    }
}

PHP implementation:
// Sift4 - common version
// online algorithm to compute the distance between two strings in O(n)
// $maxOffset is the number of characters to search for matching letters
// $maxDistance is the distance at which the algorithm should stop computing the value and just exit (the strings are too different anyway)
function sift4($s1, $s2, $maxOffset, $maxDistance = 0) {
    if (!$s1 || !strlen($s1)) {
        if (!$s2) {
            return 0;
        }
        return strlen($s2);
    }
    if (!$s2 || !strlen($s2)) {
        return strlen($s1);
    }
    $l1 = strlen($s1);
    $l2 = strlen($s2);
    $c1 = 0; //cursor for string 1
    $c2 = 0; //cursor for string 2
    $lcss = 0; //largest common subsequence
    $local_cs = 0; //local common substring
    $trans = 0; //number of transpositions ('ab' vs 'ba')
    $offset_arr = array(); //offset pair array, for computing the transpositions
    while (($c1 < $l1) && ($c2 < $l2)) {
        if (substr($s1, $c1, 1) == substr($s2, $c2, 1)) {
            $local_cs++;
            $isTrans = false;
            $i = 0;
            while ($i < sizeof($offset_arr)) { //see if current match is a transposition
                $ofs = $offset_arr[$i];
                if ($c1 <= $ofs['c1'] || $c2 <= $ofs['c2']) {
                    $isTrans = abs($c2 - $c1) >= abs($ofs['c2'] - $ofs['c1']);
                    if ($isTrans) {
                        $trans++;
                    } else {
                        if (!$ofs['trans']) {
                            $offset_arr[$i]['trans'] = true; //$ofs is a copy, so write back into the array
                            $trans++;
                        }
                    }
                    break;
                } else {
                    if ($c1 > $ofs['c2'] && $c2 > $ofs['c1']) {
                        array_splice($offset_arr, $i, 1);
                    } else {
                        $i++;
                    }
                }
            }
            array_push($offset_arr, array('c1' => $c1, 'c2' => $c2, 'trans' => $isTrans));
        } else {
            $lcss+= $local_cs;
            $local_cs = 0;
            if ($c1 != $c2) {
                $c1 = $c2 = min($c1, $c2); //using min allows the computation of transpositions
                
            }
            //if matching characters are found, remove 1 from both cursors (they get incremented at the end of the loop)
            //so that we can have only one code block handling matches
            for ($i = 0;$i < $maxOffset && ($c1 + $i < $l1 || $c2 + $i < $l2);$i++) {
                if (($c1 + $i < $l1) && (substr($s1, $c1 + $i, 1) == substr($s2, $c2, 1))) {
                    $c1+= $i - 1;
                    $c2--;
                    break;
                }
                if (($c2 + $i < $l2) && (substr($s1, $c1, 1) == substr($s2, $c2 + $i, 1))) {
                    $c1--;
                    $c2+= $i - 1;
                    break;
                }
            }
        }
        $c1++;
        $c2++;
        if ($maxDistance) {
            $temporaryDistance = max($c1, $c2) - $lcss + $trans;
            if ($temporaryDistance >= $maxDistance) return $temporaryDistance;
        }
        // this covers the case where the last match is on the last token in list, so that it can compute transpositions correctly
        if (($c1 >= $l1) || ($c2 >= $l2)) {
            $lcss+= $local_cs;
            $local_cs = 0;
            $c1 = $c2 = min($c1, $c2);
        }
    }
    $lcss+= $local_cs;
    return max($l1, $l2) - $lcss + $trans; //apply transposition cost to final result
    
}
Thanks to Ferenc Szatmári for corrections in the PHP code

The Simplest and General versions of the algorithm remain an exercise for the reader, since I haven't been working in PHP for more than a decade.
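For reference, here is a minimal JavaScript sketch of what the Simplest version looks like: no offset pair array (so no transposition counting) and no maxDistance early exit; on a mismatch, max() simply moves both cursors forward. This follows the same logic as the PowerShell simple version further below and is a sketch, not a drop-in library.

```javascript
// Minimal sketch of the Simplest version of Sift4: only cursors and the
// local common substring counter; no transpositions, no early exit.
function sift4simplest(s1, s2, maxOffset) {
    if (!s1 || !s1.length) return s2 ? s2.length : 0;
    if (!s2 || !s2.length) return s1.length;
    var l1 = s1.length, l2 = s2.length;
    var c1 = 0, c2 = 0;   // cursors for the two strings
    var lcss = 0;         // largest common subsequence
    var local_cs = 0;     // local common substring
    while (c1 < l1 && c2 < l2) {
        if (s1.charAt(c1) == s2.charAt(c2)) {
            local_cs++;
        } else {
            lcss += local_cs;
            local_cs = 0;
            if (c1 != c2) {
                c1 = c2 = Math.max(c1, c2); // using max bypasses the need for transpositions
            }
            // look ahead up to maxOffset characters for a match on either side
            for (var i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++) {
                if (c1 + i < l1 && s1.charAt(c1 + i) == s2.charAt(c2)) {
                    c1 += i;
                    local_cs++;
                    break;
                }
                if (c2 + i < l2 && s1.charAt(c1) == s2.charAt(c2 + i)) {
                    c2 += i;
                    local_cs++;
                    break;
                }
            }
        }
        c1++;
        c2++;
    }
    lcss += local_cs;
    return Math.max(l1, l2) - lcss;
}
```

For example, sift4simplest('London', 'Londen', 5) returns 1, the same as the Levenshtein distance for that pair.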

T-SQL implementation
---<summary>
---Static distance algorithm working on strings, computing transpositions as well as stopping when maxDistance was reached.
---</summary>
---<param name="s1"></param>
---<param name="s2"></param>
---<param name="maxOffset"></param>
---<param name="maxDistance"></param>
---<returns></returns>
CREATE FUNCTION Sift4common(@s1          NVARCHAR(max), 
                            @s2          NVARCHAR(max), 
                            @maxOffset   INT, 
                            @maxDistance INT) 
returns INT 
AS 
  BEGIN 
      DECLARE @l1 INT = Len(Isnull(@s1, '')); 
      DECLARE @l2 INT = Len(Isnull(@s2, '')); 

      IF ( @l1 = 0 ) 
        RETURN @l2; 

      IF ( @l2 = 0 ) 
        RETURN @l1; 

      DECLARE @c1 INT = 0; 
      DECLARE @c2 INT = 0; 
      DECLARE @lcss INT = 0; 
      DECLARE @local_cs INT = 0; 
      DECLARE @trans INT = 0; 
      DECLARE @offset_arr TABLE 
        ( 
           C1    INT, 
           C2    INT, 
           Trans BIT 
        ) 
      DECLARE @i INT 
      DECLARE @temporaryDistance FLOAT 
      DECLARE @result INT 
      DECLARE @oc1 INT, 
              @oc2 INT, 
              @otr BIT 
      DECLARE @isTrans BIT 

      WHILE ( ( @c1 < @l1 ) 
              AND ( @c2 < @l2 ) ) 
        BEGIN 
            IF ( Substring(@s1, @c1 + 1, 1) = Substring(@s2, @c2 + 1, 1) ) 
              BEGIN 
                  SET @local_cs=@local_cs + 1; 
                  SET @isTrans=0 
                  SET @oc1=NULL; 

                  SELECT TOP 1 @oc1 = o.C1, 
                               @oc2 = o.C2, 
                               @otr = o.Trans 
                  FROM   @offset_arr o 
                  WHERE  @c1 <= o.C1 
                          OR @c2 <= o.C2 

                  IF ( @oc1 IS NOT NULL ) 
                    BEGIN 
                        SET @isTrans=CASE 
                                       WHEN Abs(@c2 - @c1) >= Abs(@oc2 - @oc1) 
                                     THEN 1 
                                       ELSE 0 
                                     END 

                        IF ( @isTrans = 1 ) 
                          BEGIN 
                              SET @trans=@trans + 1 
                          END 
                        ELSE IF ( @otr = 0 ) 
                          BEGIN 
                              SET @trans=@trans + 1 

                              UPDATE @offset_arr 
                              SET    Trans = 1 
                              WHERE  C1 = @oc1 
                                     AND C2 = @oc2 
                          END 
                    END 

                  DELETE FROM @offset_arr 
                  WHERE  @c1 > C1 
                         AND @c1 > C2 
                         AND @c2 > C1 
                         AND @c2 > C2; 

                  INSERT INTO @offset_arr 
                  VALUES      (@c1, 
                               @c2, 
                               @isTrans); 
              END 
            ELSE 
              BEGIN 
                  SET @lcss = @lcss + @local_cs; 
                  SET @local_cs = 0; 

                  IF ( @c1 != @c2 ) 
                    -- using min allows the computation of transpositions  
                    BEGIN 
                        IF ( @c1 < @c2 ) 
                          BEGIN 
                              SET @c2=@c1; 
                          END 
                        ELSE 
                          BEGIN 
                              SET @c1=@c2; 
                          END 
                    END 

                  --if matching tokens are found, remove 1 from both cursors (they get incremented at the end of the loop)
                  --so that we can have only one code block handling matches   
                  SET @i=0; 

                  WHILE ( @i < @maxOffset 
                          AND ( @c1 + @i < @l1 
                                 OR @c2 + @i < @l2 ) ) 
                    BEGIN 
                        IF ( ( @c1 + @i < @l1 ) 
                             AND Substring(@s1, @c1 + @i + 1, 1) = 
                                 Substring(@s2, @c2 + 1, 1) 
                           ) 
                          BEGIN 
                              SET @c1 = @c1 + @i - 1; 
                              SET @c2=@c2 - 1; 

                              BREAK; 
                          END 

                        IF ( ( @c2 + @i < @l2 ) 
                             AND Substring(@s1, @c1 + 1, 1) = Substring(@s2, 
                                                              @c2 + @i + 1, 1 
                                                              ) 
                           ) 
                          BEGIN 
                              SET @c1 = @c1 - 1; 
                              SET @c2=@c2 + @i - 1; 

                              BREAK; 
                          END 

                        SET @i=@i + 1; 
                    END 
              END 

            SET @c1=@c1 + 1; 
            SET @c2=@c2 + 1; 

            IF ( @maxDistance > 0 ) 
              BEGIN 
                  IF ( @c1 > @c2 ) 
                    BEGIN 
                        SET @temporaryDistance = @c1 - ( @lcss - @trans / 2.0 ); 
                    END 
                  ELSE 
                    BEGIN 
                        SET @temporaryDistance = @c2 - ( @lcss - @trans / 2.0 ); 
                    END 

                  IF ( @temporaryDistance >= @maxDistance ) 
                    RETURN Round(@temporaryDistance, 0); 
              END 

            IF ( ( @c1 >= @l1 ) 
                  OR ( @c2 >= @l2 ) ) 
              BEGIN 
                  SET @lcss = @lcss + @local_cs; 
                  SET @local_cs = 0; 

                  IF ( @c1 < @c2 ) 
                    BEGIN 
                        SET @c2=@c1; 
                    END 
                  ELSE 
                    BEGIN 
                        SET @c1=@c2; 
                    END 
              END 
        END 

      SET @lcss = @lcss + @local_cs; 

      --apply the transposition cost to the final result 
      IF ( @l1 > @l2 ) 
        BEGIN 
            SET @result = @l1 - ( @lcss - @trans ); 
        END 
      ELSE 
        BEGIN 
            SET @result = @l2 - ( @lcss - @trans ); 
        END 

      RETURN @result; 
  END 

Clearly a general version of the algorithm is not possible in Transact SQL.

Here is a MySQL version, gracefully provided by Ferenc Szatmári:
BEGIN
  DECLARE l1 INT DEFAULT Length(IFNULL(s1, ''));
  DECLARE l2 INT DEFAULT Length(IFNULL(s2, ''));
  DECLARE c1 INT DEFAULT 0;
  DECLARE c2 INT DEFAULT 0;
  DECLARE lcss INT DEFAULT 0;
  DECLARE local_cs INT DEFAULT 0;
  DECLARE trans INT DEFAULT 0;
  DECLARE i INT;
  DECLARE temporaryDistance FLOAT;
  DECLARE result INT;
  DECLARE oc1 INT;
  DECLARE oc2 INT;
  DECLARE otr SMALLINT;
  DECLARE isTrans SMALLINT;

  DROP TEMPORARY TABLE IF EXISTS offset_arr;
  CREATE TEMPORARY TABLE IF NOT EXISTS offset_arr
  (
    C1 INT,
    C2 INT,
    Trans BIT
  ) ENGINE=memory;

  IF l1 = 0 THEN
    RETURN l2;
  END IF;
  IF l2 = 0 THEN
    RETURN l1;
  END IF;

  WHILE ((c1 < l1) AND (c2 < l2)) DO
    IF (Substring(s1, c1 + 1, 1) = Substring(s2, c2 + 1, 1)) THEN
      SET local_cs = local_cs + 1;
      SET isTrans = 0;
      SET oc1 = NULL;
      SELECT o.C1, o.C2, o.Trans INTO oc1, oc2, otr
      FROM offset_arr o
      WHERE c1 <= o.C1 OR c2 <= o.C2
      LIMIT 1;
      IF oc1 IS NOT NULL THEN
        SET isTrans = CASE WHEN ABS(c2 - c1) >= ABS(oc2 - oc1) THEN 1 ELSE 0 END;
        IF (isTrans = 1) THEN
          SET trans = trans + 1;
        ELSE
          IF (otr = 0) THEN
            SET trans = trans + 1;
            UPDATE offset_arr SET Trans = 1 WHERE C1 = oc1 AND C2 = oc2;
          END IF;
        END IF;
      END IF;

      DELETE FROM offset_arr
      WHERE  c1 > C1
        AND c1 > C2
        AND c2 > C1
        AND c2 > C2;

      INSERT INTO offset_arr
      VALUES (c1, c2, isTrans);
    ELSE
      SET lcss = lcss + local_cs;
      SET local_cs = 0;

      IF (c1 != c2) THEN
        -- using min allows the computation of transpositions
        IF (c1 < c2) THEN
          SET c2 = c1;
        ELSE
          SET c1 = c2;
        END IF;
      END IF;

      /* if matching tokens are found, remove 1 from both cursors (they get incremented at the end of the loop)
         so that we can have only one code block handling matches */
      SET i = 0;
      label: WHILE (i < maxOffset AND (c1 + i < l1 OR c2 + i < l2)) DO
        IF ((c1 + i < l1)
            AND Substring(s1, c1 + i + 1, 1) = Substring(s2, c2 + 1, 1)) THEN
          SET c1 = c1 + i - 1;
          SET c2 = c2 - 1;
          LEAVE label;
        END IF;

        IF ((c2 + i < l2)
            AND Substring(s1, c1 + 1, 1) = Substring(s2, c2 + i + 1, 1)) THEN
          SET c1 = c1 - 1;
          SET c2 = c2 + i - 1;
          LEAVE label;
        END IF;

        SET i = i + 1;
      END WHILE;
    END IF;

    SET c1 = c1 + 1;
    SET c2 = c2 + 1;

    IF (maxDistance > 0) THEN
      IF (c1 > c2) THEN
        SET temporaryDistance = c1 - (lcss - trans / 2.0);
      ELSE
        SET temporaryDistance = c2 - (lcss - trans / 2.0);
      END IF;

      IF (temporaryDistance >= maxDistance) THEN
        RETURN Round(temporaryDistance, 0);
      END IF;
    END IF;

    IF ((c1 >= l1) OR (c2 >= l2)) THEN
      SET lcss = lcss + local_cs;
      SET local_cs = 0;

      IF (c1 < c2) THEN
        SET c2 = c1;
      ELSE
        SET c1 = c2;
      END IF;
    END IF;
  END WHILE;

  SET lcss = lcss + local_cs;

  /* apply the transposition cost to the final result */
  IF (l1 > l2) THEN
    SET result = l1 - (lcss - trans);
  ELSE
    SET result = l2 - (lcss - trans);
  END IF;
  RETURN result;
END



Java implementation
Here is a Java version, gracefully provided by Nathanæl Fischer:
/**
 * Sift4 - common version
 * online algorithm to compute the distance between two strings in O(n)
 * Algorithm by siderite, java port by Nathan Fischer 2016
 * /blog/super-fast-and-accurate-string-distance.html
 * @param s1
 * @param s2
 * @param maxOffset the number of characters to search for matching letters
 * @return
 */
public static double sift4(String s1, String s2, int maxOffset) {
    class Offset {
        int c1;
        int c2;
        boolean trans;

        Offset(int c1, int c2, boolean trans) {
            this.c1 = c1;
            this.c2 = c2;
            this.trans = trans;
        }
    }

    if (s1 == null || s1.isEmpty())
        return s2 == null ? 0 : s2.length();

    if (s2 == null || s2.isEmpty())
        return s1.length();

    int l1 = s1.length();
    int l2 = s2.length();

    int c1 = 0; //cursor for string 1
    int c2 = 0; //cursor for string 2
    int lcss = 0; //largest common subsequence
    int local_cs = 0; //local common substring
    int trans = 0; //number of transpositions ('ab' vs 'ba')
    LinkedList<Offset> offset_arr = new LinkedList<>(); //offset pair array, for computing the transpositions

    while ((c1 < l1) && (c2 < l2)) {
        if (s1.charAt(c1) == s2.charAt(c2)) {
            local_cs++;
            boolean isTrans = false;
            //see if current match is a transposition
            int i = 0;
            while (i < offset_arr.size()) {
                Offset ofs = offset_arr.get(i);
                if (c1 <= ofs.c1 || c2 <= ofs.c2) {
                    // when two matches cross, the one considered a transposition is the one with the largest difference in offsets
                    isTrans = Math.abs(c2 - c1) >= Math.abs(ofs.c2 - ofs.c1);
                    if (isTrans) {
                        trans++;
                    } else {
                        if (!ofs.trans) {
                            ofs.trans = true;
                            trans++;
                        }
                    }
                    break;
                } else {
                    if (c1 > ofs.c2 && c2 > ofs.c1) {
                        offset_arr.remove(i);
                    } else {
                        i++;
                    }
                }
            }
            offset_arr.add(new Offset(c1, c2, isTrans));
        } else {
            lcss += local_cs;
            local_cs = 0;
            if (c1 != c2) {
                c1 = c2 = Math.min(c1, c2); //using min allows the computation of transpositions
            }
            //if matching characters are found, remove 1 from both cursors (they get incremented at the end of the loop)
            //so that we can have only one code block handling matches
            for (int i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++) {
                if ((c1 + i < l1) && (s1.charAt(c1 + i) == s2.charAt(c2))) {
                    c1 += i - 1;
                    c2--;
                    break;
                }
                if ((c2 + i < l2) && (s1.charAt(c1) == s2.charAt(c2 + i))) {
                    c1--;
                    c2 += i - 1;
                    break;
                }
            }
        }
        c1++;
        c2++;
        // this covers the case where the last match is on the last token in list, so that it can compute transpositions correctly
        if ((c1 >= l1) || (c2 >= l2)) {
            lcss += local_cs;
            local_cs = 0;
            c1 = c2 = Math.min(c1, c2);
        }
    }
    lcss += local_cs;
    return Math.round(Math.max(l1, l2) - lcss + trans); //add the cost of transpositions to the final result
}
Powershell implementation


Powershell implementation of simple Sift4, by Kirk Sayre:

function Calc-Sift4Distance 
{
  <# 
      .SYNOPSIS 

      Compute the edit distance between 2 strings using the sift4 string
      edit distance algorithm.

      .PARAMETER s1

      The 1st string

      .PARAMETER s2

      The 2nd string

      .PARAMETER maxOffset

      The maximum common substring length for which to search. The default
      is 10.

      .RETURN

      The # of edits needed to make the given 2 strings equal.
  #>

  Param(
    [Parameter(Mandatory = $True, ValueFromPipelineByPropertyName = $True)]
    [String]
    $s1,

    [Parameter(Mandatory = $True, ValueFromPipelineByPropertyName = $True)]
    [String]
    $s2,

    [Parameter(ValueFromPipelineByPropertyName = $True)]
    [Int]
    $maxOffset = 10
  )

  # Handle null or empty strings.
  if ((-not $s1) -or ($s1.length -eq 0)) 
  {
    if (-not $s2) 
    {
      return 0
    }
    return $s2.length
  }

  if ((-not $s2) -or ($s2.length -eq 0)) 
  {
    return $s1.length
  }

  # Initialization.
  $l1 = $s1.length
  $l2 = $s2.length
  $c1 = 0 # Cursor for string 1.
  $c2 = 0 # Cursor for string 2.
  $lcss = 0 # Largest common subsequence.
  $local_cs = 0 # Local common substring.

  # Scan strings.
  while (($c1 -lt $l1) -and ($c2 -lt $l2)) 
  {
    if ($s1[$c1] -eq $s2[$c2]) 
    {
      $local_cs++
    }
    else 
    {
      $lcss += $local_cs
      $local_cs = 0
      if ($c1 -ne $c2) 
      {
        # Using max bypasses the need for computing transpositions ('ab' vs 'ba').
        $c1 = $c2 = (@($c1, $c2) | Measure-Object -Maximum).Maximum
      }

      for ($i = 0; (($i -lt $maxOffset) -and ((($c1 + $i) -lt $l1) -or (($c2 + $i) -lt $l2))); $i++) 
      {
        if ((($c1 + $i) -lt $l1) -and ($s1[$c1 + $i] -eq $s2[$c2])) 
        {
          $c1 += $i
          $local_cs++
          break
        }

        if ((($c2 + $i) -lt $l2) -and ($s1[$c1] -eq $s2[$c2 + $i]))
        {
          $c2 += $i
          $local_cs++
          break
        }
      }
    }
    $c1++
    $c2++
  }
  $lcss += $local_cs
  return [math]::Round((@($l1, $l2) | Measure-Object -Maximum).Maximum - $lcss)
}
C++ implementation


Thanks to Hugo Amaro, a C++ implementation:

struct sift_offset {
  int c1;
  int c2;
  bool trans;
};

template < typename T >
  int sift4(T * s1, int s1_size, T * s2, int s2_size, int maxOffset, int maxDistance) {
    if (!s1 || !s1_size) {
      if (!s2) {
        return 0;
      }
      return s2_size;
    }

    if (!s2 || !s2_size) {
      return s1_size;
    }

    int l1 = s1_size;
    int l2 = s2_size;

    int c1 = 0; //cursor for string 1
    int c2 = 0; //cursor for string 2
    int lcss = 0; //largest common subsequence
    int local_cs = 0; //local common substring
    int trans = 0; //number of transpositions ('ab' vs 'ba')
    std::vector < sift_offset > offset_arr; //offset pair array, for computing the transpositions

    while ((c1 < l1) && (c2 < l2)) {
      if (s1[c1] == s2[c2]) {
        local_cs++;
        bool isTrans = false;
        //see if current match is a transposition
        int i = 0;
        while (i < offset_arr.size()) {
          sift_offset ofs = offset_arr[i];
          if (c1 <= ofs.c1 || c2 <= ofs.c2) {
            // when two matches cross, the one considered a transposition is the one with the largest difference in offsets
            isTrans = std::abs(c2 - c1) >= std::abs(ofs.c2 - ofs.c1);

            if (isTrans) {
              trans++;
            } else {
              if (!ofs.trans) {
                ofs.trans = true;
                trans++;
              }
            }
            break;
          } else {
            if (c1 > ofs.c2 && c2 > ofs.c1) {
              offset_arr.erase(offset_arr.begin() + i);

            } else {
              i++;
            }
          }
        }
        offset_arr.push_back({
          c1,
          c2,
          isTrans
        });
      } else {
        lcss += local_cs;
        local_cs = 0;
        if (c1 != c2) {
          c1 = c2 = std::min(c1, c2); //using min allows the computation of transpositions
        }
        //if matching characters are found, remove 1 from both cursors (they get incremented at the end of the loop)
        //so that we can have only one code block handling matches 
        for (int i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++) {
          if ((c1 + i < l1) && (s1[c1 + i] == s2[c2])) {
            c1 += i - 1;
            c2--;
            break;
          }
          if ((c2 + i < l2) && (s1[c1] == s2[c2 + i])) {
            c1--;
            c2 += i - 1;
            break;
          }
        }
      }
      c1++;
      c2++;
      if (maxDistance) {
        int temporaryDistance = std::max(c1, c2) - lcss + trans;
        if (temporaryDistance >= maxDistance) return temporaryDistance;
      }
      // this covers the case where the last match is on the last token in list, so that it can compute transpositions correctly
      if ((c1 >= l1) || (c2 >= l2)) {
        lcss += local_cs;
        local_cs = 0;
        c1 = c2 = std::min(c1, c2);
      }
    }
    lcss += local_cs;
    return std::max(l1, l2) - lcss + trans; //add the cost of transpositions to the final result
  }

You can find a Go implementation here, written by Jason W. Hutchinson.
There is also a Swift implementation here.
A Perl 6 implementation can be found here.



I had this AngularJS grid in which I wanted to show totals for certain categories and types. To be more precise, I had a list of items with Category and Type and I wanted to know the total for all categories and, for each category, the total for each type. This works perfectly if I load all items individually, but I had hundreds of thousands of items, so it was clearly impractical. The solution? Send totals for every category and type, then just add them up in the grid. In order to do that, though, I had to change the template of the "grouping row", the one that in ngGrid has the ngAggregate class.

It seems that all that is required for that is to change the aggregate template in the grid options. If you are not interested in the details, jump directly to the solution.

There is already an aggregate template in ngGrid.js, one that (at this time) looks like this:
<div ng-click="row.toggleExpand()" ng-style="rowStyle(row)" class="ngAggregate">
<span class="ngAggregateText">{{row.label CUSTOM_FILTERS}} ({{row.totalChildren()}} {{AggItemsLabel}})</span>
<div class="{{row.aggClass()}}"></div>
</div>

So we see that the number displayed in an aggregate row comes from a function of the row object called totalChildren, which is defined in ngAggregate.prototype and looks like this:
ngAggregate.prototype.totalChildren = function () {
    if (this.aggChildren.length > 0) {
        var i = 0;
        var recurse = function (cur) {
            if (cur.aggChildren.length > 0) {
                angular.forEach(cur.aggChildren, function (a) {
                    recurse(a);
                });
            } else {
                i += cur.children.length;
            }
        };
        recurse(this);
        return i;
    } else {
        return this.children.length;
    }
};
Maybe one could change the function to cover specific types of objects and return a sum instead of a count, but that is not the scope of the current post.

The solution described here will involve a custom function and a custom template. Here is how you do it:
  • Define the options for the grid. I am sure you already have it defined somewhere, if not, it is advisable you would. Sooner or later you will want to customize the output and functionality.
  • Add a new property to the options called aggregateTemplate. This will look probably like the default template, but with another function instead of totalChildren.
  • Define the function that will aggregate the items.

Solution 1:

Define template:
$scope.gridOptions = {
    data: 'Data.Units',
    enableColumnResize: true,
    showColumnMenu: false,
    showFilter: true,
    multiSelect: false,
    showGroupPanel: true,
    enablePaging: false,
    showFooter: true,
    columnDefs: [
        { field: 'Country', displayName: 'Country', width: 190 },
        { field: 'Type', displayName: 'Unit Type' },
        { field: 'Count', displayName: 'Count', width: 180 }
    ],
    aggregateTemplate: "<div ng-click=\"row.toggleExpand()\" ng-style=\"rowStyle(row)\" class=\"ngAggregate\">" +
        "    <span class=\"ngAggregateText\">{{row.label CUSTOM_FILTERS}} ({{aggFunc(row)}} {{AggItemsLabel}})</span>" +
        "    <div class=\"{{row.aggClass()}}\"></div>" +
        "</div>"
};
Define function:
$scope.aggFunc = function (row) {
    var sumColumn = 'Count';
    var total = 0;
    angular.forEach(row.children, function (entry) {
        total += entry.entity[sumColumn];
    });
    angular.forEach(row.aggChildren, function (entry) {
        total += $scope.aggFunc(entry);
    });
    return total;
};

What we did here is we replaced row.totalChildren() with aggFunc(row) which we defined in the scope. What it does is add to the total the value of 'Count' rather than just count the items. It goes through row.children, which contains normal row items, then through aggChildren, which contains aggregate rows, which we pass through the same function in order to get their total.
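To see the recursion in isolation, the same aggregation logic can be run over plain objects shaped like ngGrid rows. The structure below (children entries wrapping their data in an entity property, aggChildren holding nested aggregate rows) mirrors how the rows are accessed above and is illustrative, not the real ngGrid internals:

```javascript
// The same recursive sum as $scope.aggFunc, over plain row-shaped objects.
// Leaf rows live in children (each wrapping its data in entity); nested
// aggregate rows live in aggChildren.
function aggFunc(row, sumColumn) {
    var total = 0;
    (row.children || []).forEach(function (entry) {
        total += entry.entity[sumColumn];
    });
    (row.aggChildren || []).forEach(function (entry) {
        total += aggFunc(entry, sumColumn);
    });
    return total;
}

// A category row with two type sub-groups: counts 2 + 3 and 5.
var categoryRow = {
    children: [],
    aggChildren: [
        { children: [{ entity: { Count: 2 } }, { entity: { Count: 3 } }], aggChildren: [] },
        { children: [{ entity: { Count: 5 } }], aggChildren: [] }
    ]
};
```

Here aggFunc(categoryRow, 'Count') yields 10, whereas a totalChildren-style count would only have reported the three leaf rows.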

Well, this works perfectly, but doesn't that mean we need to use this for each grid? There is a lot of code duplication. Let's first put the template in the cache so we can reuse it:
module.run(["$templateCache", function ($templateCache) {

    $templateCache.put("aggregateCountTemplate.html",
        "<div ng-click=\"row.toggleExpand()\" ng-style=\"rowStyle(row)\" class=\"ngAggregate\">" +
        "    <span class=\"ngAggregateText\">{{row.label CUSTOM_FILTERS}} ({{aggFunc(row)}} {{AggItemsLabel}})</span>" +
        "    <div class=\"{{row.aggClass()}}\"></div>" +
        "</div>"
    );

}]);
Now the gridOptions change to
$scope.gridOptions = {
    [...]
    aggregateTemplate: "aggregateCountTemplate.html"
};
and we can reuse the template anytime we want. Alternatively we can create a file and reference it, without using the cache:
$scope.gridOptions = {
    [...]
    aggregateTemplate: "/templates/aggregateCountTemplate.html"
};

Now, it would be nicer if we could replace the aggFunc function with a row function, adding it to ngAggregate.prototype. Unfortunately, we cannot do that, since ngAggregate is a 'private' object. The only thing we can do is add some sort of static function. The solution is to add it to the root scope, so that it is available everywhere.

Solution 2:

Here is the content of the file aggregateCountTemplateCache.js, that I created and load every time in the site. It does two things: inject the function in the root scope of the application and add the template to the cache. The only other thing to do is to use the aggregateTemplate: "aggregateCountTemplate.html" grid options.
var module = angular.module('app', ['ngResource', 'ui.bootstrap', 'ui', 'ngGrid', 'importTreeFilter', 'ui.date', 'SharedServices', 'ui.autocomplete']);

module.run(["$templateCache", "$rootScope", function ($templateCache, $rootScope) {

    $rootScope.aggregateCountFunc = function (row) {
        var total = 0;
        angular.forEach(row.children, function (entry) {
            total += entry.entity.Count;
        });
        angular.forEach(row.aggChildren, function (entry) {
            total += $rootScope.aggregateCountFunc(entry);
        });
        return total;
    };

    $templateCache.put("aggregateCountTemplate.html",
        "<div ng-click=\"row.toggleExpand()\" ng-style=\"rowStyle(row)\" class=\"ngAggregate\">" +
        "    <span class=\"ngAggregateText\">{{row.label CUSTOM_FILTERS}} ({{aggregateCountFunc(row)}} {{AggItemsLabel}})</span>" +
        "    <div class=\"{{row.aggClass()}}\"></div>" +
        "</div>"
    );

}]);

Enjoy!

I was working on a pretty nice task that involved translating the text in a page in real time. For this I created a function that would work its magic on elements that were added to or changed in the page. On certain pages it ran abysmally slowly and I had no idea why. So I went to profile the thing and was shocked to see that the problem did not come from my long piece of code, but from a simple encapsulation of an element in a jQuery object. I was using it only to have a nicer interface for getting the name of the element and changing an attribute. Here is the code:
var j = jQuery(elem);
if (j.is('img[alt]')) {
    j.attr('alt', translate(j.attr('alt')));
}

Replaced it with:
if (/^img$/i.test(elem.tagName)) {
    var alt = elem.getAttribute('alt');
    if (alt) {
        elem.setAttribute('alt', translate(alt));
    }
}

And it worked very fast indeed. The element might have been body, so maybe the encapsulation also tries to parse the children, or perhaps the problem was fixed in later versions of the library. Still, think about how many times we have used this kind of code without thinking twice about it. Think twice about it! :)
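The lesson generalizes: in hot paths, touch the native DOM API directly instead of wrapping every element. Here is the fast path above factored into a small function; translateImgAlt is my name for illustration, and translate stands for whatever translation function the page uses:

```javascript
// The fast variant factored out: only native DOM calls (tagName,
// getAttribute, setAttribute), no per-element jQuery allocation.
// Returns true when the alt attribute was actually translated.
function translateImgAlt(elem, translate) {
    if (!/^img$/i.test(elem.tagName)) return false;
    var alt = elem.getAttribute('alt');
    if (!alt) return false;
    elem.setAttribute('alt', translate(alt));
    return true;
}
```

Because it only relies on three standard DOM members, it can even be exercised outside a browser with a stub element object.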