This is something that appeared in C# 5, so a long time ago, with .NET 4.5, but I only found out about it recently. Remember when you wanted to know the name of a property when doing INotifyPropertyChanged? Or when you wanted to log the name of the method that was calling? Or you wanted to know which line in which source file is responsible for calling a certain piece of code? All of this can be done with the Caller Information feature.

  And it is easy enough to use, just decorate a method parameter with an explicit default value with any of these three attributes:

The parameter value, if not set when calling the method, will be filled in with the member name or file name or line number. It's something that the compiler does, so no overhead from reflection. Even better, it works on the caller of the method, not the interior of the method. Imagine you had to write a piece of code to do the same. How would you reference the name of the method calling the method you are in?

Example from Microsoft's site:

public void DoProcessing()
{
    TraceMessage("Something happened.");
}

public void TraceMessage(string message,
        [System.Runtime.CompilerServices.CallerMemberName] string memberName = "",
        [System.Runtime.CompilerServices.CallerFilePath] string sourceFilePath = "",
        [System.Runtime.CompilerServices.CallerLineNumber] int sourceLineNumber = 0)
{
    System.Diagnostics.Trace.WriteLine("message: " + message);
    System.Diagnostics.Trace.WriteLine("member name: " + memberName);
    System.Diagnostics.Trace.WriteLine("source file path: " + sourceFilePath);
    System.Diagnostics.Trace.WriteLine("source line number: " + sourceLineNumber);
}

// Sample Output:
//  message: Something happened.
//  member name: DoProcessing
//  source file path: c:\Visual Studio Projects\CallerInfoCS\CallerInfoCS\Form1.cs
//  source line number: 31

  So you want to use a queue, a structure that has items added at one side and removed on another, in Javascript code. Items are added to the tail of the queue, while they are removed at the head. We, Romanians, are experts because in the Communist times resources were scarce and people often formed long queues to get to them, sometimes only on the basis of rumour. They would see a line of people and ask "Don't they have meat here?" and the answer would come "No, they don't have milk here. It's next building they don't have meat at". Anyway...

  There is an option that can be used directly out of the box: the humble array. It has methods like .push (add an item), .pop (remove the latest added item - when you use it as a stack) and .shift (remove the oldest added item - when you use it as a queue). For small cases, that is all you need.

  However, I needed it in a high performance algorithm and if you think about it, removing the first element of an array usually means shifting (hence the name of the function) all elements one slot and decreasing the length of the array. Consider a million items array. This is not an option.

  One of the data structure concepts we are taught at school is the linked list. Remember that? Each item has a reference to the next (and maybe the previous) item in the list. You explore it by going from one item to the next, without indexing, and you can remove any part of the list or add to any part of the list just by changing the value of these references. This also means that for each value you want stored you have the value, the reference(s) and the overhead of handling a more complex data object. Again, consider a million numbers array. It's not the right fit for this problem.

  Only one option remains: still using an array, but moving the start and the end of the array in an abstract manner only, so that all queue/dequeue operations take no effort. This means keeping a reference to the tail and the head of the queue in relation to the length of the queue and of the underlying array.

  But first let's establish a baseline. Let's write a test and implement a queue using the default array pop/shift implementation:

// the test
const size = 100000;
const q=new Queue();
time(()=> { for (let i=0; i<size; i++) q.enqueue(i); },'Enqueue '+size+' items');
time(()=> { for (let i=0; i<size; i++) q.dequeue(i); },'Dequeue '+size+' items');
time(()=> { for (let i=0; i<size/10; i++) {
	for (let j=0; j<10; j++) q.enqueue(i);
	for (let j=0; j<9; j++) q.dequeue(i);
} },'Dequeue and enqueue '+size+' items');

// the Queue implementation
class Queue {
  constructor() {
    this._arr = [];
  }

  enqueue(item) {
    this._arr.push(item);
  }

  dequeue() {
    return this._arr.shift();
  }
}

// the results
Enqueue 100000 items, 10ms
Dequeue 100000 items, 1170ms
Dequeue and enqueue 100000 items, 19ms

  The Enqueue operation is just adding to an array, enqueuing and dequeuing by leaving an item at ever series of dequeues is slightly slower, as the amount of array shifting is negligible. Dequeuing, though, is pretty heavy. Note that increasing just a little bit the amount of items leads to an exponential increase in time:

Enqueue 200000 items, 12ms
Dequeue 200000 items, 4549ms
Dequeue and enqueue 200000 items, 197ms

  Now let's improve the implementation of the queue. We will keep enqueue using Array.push, but use a _head index to determine which items to dequeue. This means faster speed, but the queue will never shorten. It's the equivalent of Romanians getting their product, but remaining in the queue.

// the Queue implementation
class Queue {
  constructor() {
    this._arr = [];
    this._head = 0;
  }

  enqueue(item) {
    this._arr.push(item);
  }

  dequeue() {
    if (this._head>=this._arr.length) return;
    const result = this._arr[this._head];
    this._head++;
    return result;
  }
}

// the results
Enqueue 200000 items, 11ms
Dequeue 200000 items, 4ms
Dequeue and enqueue 200000 items, 11ms

  The performance has reached the expected level. Dequeuing is now even faster than enqueuing because it doesn't need to expand the array as items are added. However, for all scenarios the queue is only growing, even when dequeuing all the items. What I can do is reuse the slots of the dequeued items for the items to be added. Now it gets interesting!

  My point is that right now we can improve the functionality of our queue by replacing dequeued but still stored items with newly enqueued items. That is the equivalent of Romanians leaving the queue only after they get the meat and a new Romanian comes to take their place. If there are more people coming than getting served, then people that got their meat will all leave and we can just add people to the tail of the queue.

  So let's recap the algorithm:

  • we will use an array as a buffer
  • the queue items start at the head and end at the tail, but wrap around the array buffer
  • whenever we add an item, it will be added in the empty space inside the array and the tail will increment
  • if there is no empty space (queue length is the same as the array length) then the array will be rearranged so that it has space for new itms
  • when we dequeue, the item at the head will be returned and the head incremented
  • whenever the head or tail reach the end of the array, they will wrap around

Some more improvements:

  • if we enqueue a lot of items then dequeue them, the array will not decrease until we dequeue them all. An improvement is to rearrange the array whenever the queue length drops below half of that of the array. It will add computation, but reduce space.
  • when we make space for new items (when the array size is the same as the one of the logical queue) we should add more space than just 1, so I will add the concept of a growth factor and the smallest size increase.

Here is the code:

/**
 * A performant queue implementation in Javascript
 *
 * @class Queue
 */
class Queue {

    /**
     *Creates an instance of Queue.
     * @memberof Queue
     */
    constructor() {
        this._array = [];
        this._head = 0;
        this._tail = 0;
        this._size = 0;
        this._growthFactor = 0.1;
        this._smallestSizeIncrease = 64;
    }

    /**
     * Adding an iterator so we can use the queue in a for...of loop or a destructuring statement [...queue]
     */
    *[Symbol.iterator]() {
        for (let i = 0; i < this._size; i++) {
            yield this.getAt(i);
        }
    }

    /**
     * Returns the length of the queue
     *
     * @readonly
     * @memberof Queue
     */
    get length() {
        return this._size;
    }

    /**
     * Get item based on item in the queue
     *
     * @param {*} index
     * @returns
     * @memberof Queue
     */
    getAt(index) {
        if (index >= this._size) return;
        return this._array[(this._head + index) % this._array.length];
    }

    /**
     * Gets the item that would be dequeued, without actually dequeuing it
     *
     * @returns
     * @memberof Queue
     */
    peek() {
        return this.getAt(0);
    }

    /**
     * Clears the items and shrinks the underlying array
     */
    clear() {
        this._array.length = 0;
        this._head = 0;
        this._tail = 0;
        this._size = 0;
    }

    /**
     * Adds an item to the queue
     *
     * @param {*} obj
     * @memberof Queue
     */
    enqueue(obj) {
        // special case when the size of the queue is the same as the underlying array
        if (this._size === this._array.length) {
            // this is the size increase for the underlying array
            const sizeIncrease = Math.max(this._smallestSizeIncrease, ~~(this._size * this._growthFactor));
            // if the tail is behind the head, it means we need to move the data from the head to 
            // the end of the array after we increase the array size
            if (this._tail <= this._head) {
                const toMove = this._array.length - this._head;
                this._array.length += sizeIncrease;
                for (let i = 0; i < toMove; i++) {
                    this._array[this._array.length - 1 - i] = this._array[this._array.length - 1 - i - sizeIncrease];
                }
                this._head = (this._head + sizeIncrease) % this._array.length;
            }
            else
            // the array size can just increase (head is 0 and tail is the end of the array)
            {
                this._array.length += sizeIncrease;
            }
        }
        this._array[this._tail] = obj;
        this._tail = (this._tail + 1) % this._array.length;
        this._size++;
    }

    /**
     * Removed the oldest items from the queue and returns it
     *
     * @returns
     * @memberof Queue
     */
    dequeue() {
        if (this._size === 0) {
            return undefined;
        }
        const removed = this._array[this._head];
        this._head = (this._head + 1) % this._array.length;
        this._size--;
        // special case when the size of the queue is too small compared to the size of the array
        if (this._size > 1000 && this._size < this._array.length / 2 - this._smallestSizeIncrease) {
            if (this._head<this._tail) {
                this._array = this._array.slice(this._head,this._tail);
            } else {
                this._array=this._array.slice(this._head, this._array.length).concat(this._array.slice(0,this._tail));
            }
            this._head = 0;
            this._tail = 0;
        }   
        return removed;
    }
}

  Final notes:

  • there is no specification on how an array should be implemented in Javascript, therefore I've used the growth factor concept, just like in C#. However, according to James Lawson, the array implementation is pretty smart in modern Javascript engines, we might not even need it.
  • the optimization in dequeue might help with space, but it could be ignored if what you want is speed and don't care about the space usage
  • final benchmarking results are:
    Enqueue 200000 items, 15ms, final array size 213106
    Dequeue 200000 items, 19ms, final array size 1536
    Dequeue and enqueue 200000 items, 13ms, final array size 20071

  While working on my pet project Linqer (LINQ for Javascript and Typescript) I've spent quite a lot of time improving the performance of the Quicksort algorithm I am using for .orderBy. Therefore I am publishing it here, even if you could extract it just the same from the Linqer sources, with limited discussion on what is going on.

  Why

  First, why use it at all? Doesn't Javascript have the .sort method in the Array class? What's wrong with that?

  The answer is that the implementation for sort is different from browser to browser, or better said, from Javascript engine to Javascript engine. In Chrome, for example, the algorithm used is insertion sort, which is simple, in place, stable and reasonably fast. It is optimized for the most common usage: small arrays that need to be sorted for UI purposes and such. However, when using large arrays, the algorithm doesn't perform as well as one might expect.

  For Linqer I had an additional reason, because I would use ordering followed by skip and take methods that limited the scope of the need for sorting. Imagine a million items array that I wanted ordered and then needed the first ten items. Sorting the entire thing for just ten items would have been overkill. The default .sort function doesn't have parameters for such scenarios.

  And there is another reason: the default function used to compare array items is alphanumeric. [1, 2, 10] would get ordered as [1, 10, 2].

  Second, why Quicksort? There are a bunch of sorting algorithms out there. Mergesort, Heapsort, Radixsort, Timsort, Selectionsort. What's so special about Quicksort.

  I have to admit that I went for it by googling fast sorting algorithm. It does have "quick" in the name, doesn't it? I also found it elegant and easy to comprehend. And for my particular scenario, I liked that it used a divide et impera strategy which allowed me to ignore parts of the array if I didn't need the items there. In other words, it's very well suited both as a general sorting algorithm and a partial sorting algorithm.

  What

  I would like to tell you that it's simple to explain what Quicksort does, but it requires some amount of attention and time. In general terms, it chooses an arbitrary item (called a pivot) then orders the remaining items relative to the pivot, in two so called partitions: the smaller items to the left, the larger to the right. Then it repeats the process for each of the two sides. How the pivot is chosen and how the partitions are handled is what differentiates Quicksort algorithms and determines their performance.

  It is an in place algorithm, meaning it doesn't copy the array in some other type of structure and instead it moves items around inside it. It is not a stable algorithm, meaning the order of "equal" items is not preserved. The average computational complexity is O(n log n), with the worst cases O(n^2). The space complexity is harder to determine. Most people say it is O(1) because it uses no additional data structures, but that is not really correct. Being a recursive algorithm, the call stack gets used quite a lot, an invisible storage that should be computed in the data complexity.

  Unfortunately, the worst case scenarios are also very common: already sorted arrays and arrays filled with the same value. There are various optimizations to be used in order to handle this sort of thing. Also, Quicksort is efficient with large quantities of data, but less so with small numbers of items.

  How

  Finally, we get to the code. The _quicksort function receives:

  • an array
  • left and right index values determining the inclusive area that will be sorted (usually 0 and array.length-1)
  • a comparer function (item1,item2)=> 1, 0 or -1 and that defaults to _defaultComparer which tries to sort items based on the > and < operators
  • min and max index values determining the window of the array that we need to have sorted

The left and right indexes determine which section (before the sort) of the array will be sorted, the min and max indexes determine which items I am interested in (after the sort). This allows me to skip ordering partitions that are outside my area of interest.

As I said, the pivot choice is important. Some strategies are very popular:

  • the last item in the array as the pivot
    • this is the strategy used in the original incarnation of Quicksort
    • leads to very poor performance when the array is already sorted
  • the median item
    • this suggests parsing the array in order to get the value, implying extra computation
    • it only makes sense when the values in the array are numbers
  • the average between the first, the last and the middle item
    • it only makes sense when the values in the array are numbers
  • the item that is in the middle of the array
    • this is the one that I am using
  • a random item in the array
    • this makes the algorithm escape scenarios where the performance would be bad
    • the outcome of the sorting is unpredictable in terms of time used and stability of items
  • multiple pivots
    • an interesting concept, but one that complicated the algorithm too much for comfort

Then there is the matter of the partitioning. I've used an optimization that involves two indexes, one at the start the other at the end of a partition, coming toward each other and swapping items that are on the wrong side of the pivot. In some implementations, if the pivot is the last item, the partitioning is from one side only. In others, multiple indexes are used to handle multiple pivots.

In most implementations, the algorithm recurses on _quicksort, but I refactored it to only recurse on the partitioning. Then, because I didn't want to get stack overflows when bad data was used, I've eliminated the recursion and instead used a stack of my own where the partitions to be sorted are stored and wait their turn. This is where the data complexity comes around. In my case I was using a little more data than I actually needed, because I was adding partitions to the stack and also incrementing the index of the current partition, meaning the stack array grew with handled partitions. Even if there is no computation performance benefit, I've optimized this as well by adding a queueIndex which is used to recycle the slots in the partition array that are behind the partitionIndex. New partitions are being added behind the partitionIndex and queueIndex is incremented. When the loop reaches the last partition in the stack, a new loop is started with the partitions from 0 to queueIndex. Thus, for a ten million item array the partition stack rarely goes above 40000 in length.

A further optimization is to use insertion sort on partitions that have become too small (under 64 items). It irks me to have had to do this, I would have liked to use a "pure" algorithm, but this improved the performance and minimized the size of the partition stack.

The Code

That's about it. Here is the code:

    function _insertionsort(arr, leftIndex, rightIndex, comparer) {
        for (let j = leftIndex; j <= rightIndex; j++) {
            const key = arr[j];
            let i = j - 1;
            while (i >= leftIndex && comparer(arr[i], key) > 0) {
                arr[i + 1] = arr[i];
                i--;
            }
            arr[i + 1] = key;
        }
    }
    function _swapArrayItems(array, leftIndex, rightIndex) {
        const temp = array[leftIndex];
        array[leftIndex] = array[rightIndex];
        array[rightIndex] = temp;
    }
    function _partition(items, left, right, comparer) {
        const pivot = items[(right + left) >> 1];
        while (left <= right) {
            while (comparer(items[left], pivot) < 0) {
                left++;
            }
            while (comparer(items[right], pivot) > 0) {
                right--;
            }
            if (left < right) {
                _swapArrayItems(items, left, right);
                left++;
                right--;
            }
            else {
                if (left === right)
                    return left + 1;
            }
        }
        return left;
    }
    const _insertionSortThreshold = 64;
    function _quicksort(items, 
                        left, right, comparer = _defaultComparer,
                        minIndex = 0, maxIndex = Number.MAX_SAFE_INTEGER) {
        if (!items.length)
            return items;
        const partitions = [];
        partitions.push({ left, right });
        while (partitions.length) {
            ({ left, right } = partitions.pop());
            if (right - left < _insertionSortThreshold) {
                _insertionsort(items, left, right, comparer);
                continue;
            }
            const index = _partition(items, left, right, comparer);
            if (left < index - 1 && index - 1 >= minIndex) {
                partitions.push({ left, right: index - 1 });
            }
            if (index < right && index < maxIndex) {
                partitions.push({ left: index, right });
            }
        }

        return items;
    }
    _defaultComparer = (item1, item2) => {
        if (item1 > item2)
            return 1;
        if (item1 < item2)
            return -1;
        return 0;
    };

So I was trying to optimize a sorting algorithm, only the metrics did not make any sense. On the test page I had amazing performance, on the other it was slow as hell. What could possibly be the problem?

The difference between the two tests was that one was sorting inline (normal array reading and writing) and the other was using a more complex function and iterated a data source. So I went to test the performance of iterating itself.

The code tests the speed of adding all the items of a large array in three cases:

  • a classic for...in loop that increments an index and reads the array at that index
  • a for...of loop that iterates the items of the array directly
  • a for...of loop that iterates over a generator function that yields the values of the array
time(()=>{ let sum=0; for (let i=0; i<arr.length; i++) sum+=arr[i]; },'for in');
time(()=>{ let sum=0; for (const v of arr) sum+=v; },'iterator for of');
time(()=>{ let sum=0; for (const v of (function*(){ for (let i=0; i<arr.length; i++) yield arr[i]; })()) sum+=v; },'generator for of');

time is a function I used to compute the speed of the execution. The array is 100 million integers. And here are the results:

for in: 155.12999997008592
for of: 1105.7250000303611
for of: 2309.88499999512

I haven't yet processed what it means, but I really thought using an iterator was going to be at least as fast as a for loop that uses index accessing to read values. Instead there is a whooping 7 to 14 times decrease in speed.

So from now on I will avoid for...of in high performance scenarios.

Just a thing I learned today: using the bitwise not operator (~) on a number in Javascript ignores its fractional part (it converts it to integer first), therefore using it twice gives you the integer part of original number. Thanks to fetishlace for clarifications.

Notes:

  • this is equivalent to (int)number in languages that support the int type
  • this is equivalent to Math.trunc for numbers in the integer range
  • this is equivalent to Math.floor only for positive numbers in the integer range

Examples:
~~1.3 = 1
~~-6.5432 = -6
~~(2 ** 32 + 0.5) = 0
~~10000000000 = 1410065408

P.S. If you don't know what ** is, it's called the Exponentiation operator and it came around in Javascript ES2016 (ES7) and it is the equivalent of Math.pow. 2 ** 3 = Math.pow(2,3) = 8

  Sometimes a good regular expression can save a lot of time and lead to a robust, yet flexible system that works very efficiently in terms of performance. It may feel like having superpowers. I don't remember when exactly I've decided they were useful, but it happened during my PHP period, when the Linux command line system and its multitude of tools made using regular expressions a pretty obvious decision. Fast forward (waaaay forward) and now I am a .NET developer, spoiled by the likes of Visual Studio and the next-next approach to solving everything, yet I still use regular expressions, well... regularly, sometimes even when I shouldn't. The syntax is concise, flexible and easy to use most of the time.

  And yet I see many senior developers avoiding regular expressions like they were the plague. Why is that? In this post I will argue that the syntax makes a regular pattern be everything developers hate: unreadable and hard to maintain. Having many of the characters common in both XML and JSON have special meaning doesn't help either. And even when bent on learning them, having multiple flavors depending on the operating system, language and even the particular tool you use makes it difficult. However, using small incremental steps to get the job done and occasionally look for references to less used features is usually super easy, barely an inconvenience. As for using them in your code, new language features and a few tricks can solve most problems one has with regular expressions.

 The Syntax Issue

The most common scenario for the use of regular expressions is wanting to search, search and replace or validate strings and you almost always start with a bunch of strings that you want to match. For example, you want to match both dates in the format 2020-01-21 and 21/01/2020. Immediately there are problems:

  • do you need to escape the slashes?
  • if you match the digits, are you going for two digit month and day segments or do you also accept something like 21/1/2020?
  • is there the possibility of having strings in your input that look like 321/01/20201, in which case you will still match 21/01/2020, but it's clearly not a date?
  • do you need to validate stuff like months being between 1-12 and days between 1-31? Or worse, validate stuff like 30 February?

But all of these questions, as valid as they are, can be boiled down to one: given my input, is my regular expression matching what I want it to match? And with that mindset, all you have to do is get a representative input and test your regular expression in a tester tool. There are many out there, free, online, all you have to do is open a web site and you are done. My favourite is RegexStorm, because it tests .NET style regex, but a simple Google search will find many and varied tools for reading and writing and testing regular expressions.

The syntax does present several problems that you will hit every time:

  • you cannot reuse parts of the pattern
    • in the example above, even if you have clearly three items that you look for - year, month, day - you will need to copy/paste the pattern for each variation you are looking for
  • checking the same part of the input string multiple times is not what regular expressions were created for and even those that support various methods to do that do it in a hackish way
    • example: find a any date in several formats, but not the ones that are in March or in 2017
    • look behind and look ahead expressions are usually used for scenarios like this, but they are not easy to use and reduce a lot of the efficiency of the algorithm
  • classic regular expression syntax doesn't support named groups, meaning you often need to find and maintain the correct index for the match
    • what index does one of the capturing groups have?
    • if you change something, how do other indexes change?
    • how do you count groups inside groups? Do they even count if they match nothing?
  • the "flags" indicating how the regular engine should interpret the pattern are different in every implementation
    • /x/g will look for all x characters in the string in Javascript
    • C# doesn't even need a global flag and the flags themselves, like CaseInsensitive, are .NET enums in code, not part of the regular pattern string
  • from the same class of issues as the two above (inconsistent implementation), many a tool uses a regular expression syntax that is particular to it
    • a glaring example is Visual Studio, which does not use a fully compatible .NET syntax

  The Escape Issue

The starting point for writing any regular expression has to be a regex escaped sample string that you want to match. Escaping means telling the regular expression engine that characters from your sample string that have meaning to it are just simple characters. Example 12.23.34.56, which is an IP address, if used exactly like that as a regular pattern, will match 12a23b34c56, because the dot is a catch all special character for regular expressions. The pattern working as expected would be 12\.23\.34\.56. Escaping brings several severe problems:

  • it makes the pattern less humanly readable
    • think of a phrase in which all white space has been replaced with \s+ to make it more robust (it\s+makes\s+the\s+pattern\s+less\s+humanly\s+readable)
  • you only escape with a backslash in every flavor of regular expressions, but the characters that are considered special are different depending on the specific implementation
  • many characters found in very widely used data formats like XML and JSON are special characters in regular expressions and the escaping character for regex is a special character in JSON and also string in many popular programming languages, forcing you to "double escape", which magnifies the problem
    • this is often an annoyance when you try to store regular patterns in config files

  Readability and Maintainability

Perhaps the biggest issue with regular expressions is that they are hard to maintain. Have you ever tried to understand what a complex regular expression does? The author of the code started with some input data and clear objectives of what they wanted to match, went to regular expression testers, found what worked, then dumped a big string in the code or configuration file. Now you have neither the input data or the things that they wanted matched. You have to decompile the regular pattern in your head and try to divine what it was trying to do. Even when you manage to do that, how often do developers redo the testing step so they verify the changes in a regular expressions do what was intended?

Combine this with the escape issue and the duplication of subpatterns issue and you get every developer's nightmare: a piece of code they can't understand and they are afraid to touch, one that is clearly breaking every tenet of their religion, like Don't Repeat Yourself or Keep It Simple Silly, but they can't change. It's like an itch they can't scratch. The usual solution for code like that is to unit test it, but regular expression unit tests are really really ugly:

  • they contain a lot of text variables, on many lines
  • they seem to test the functionality of the regular expression engine, not that of the regular expression pattern itself
  • usually regular expressions are used internally, they are not exposed outside a class, making it difficult to test by themselves

  Risk

Last, but not least, regular expressions can work poorly in some specific situations and people don't want to learn the very abstract computer science concepts behind regular expression engines in order to determine how to solve them.

  • typical example is lazy modifiers (.*? instead of .*) which tell the engine to not be greedy (get the least, not the most)
    • ex: for input "ineffective" the regular expression .*n will work a lot worse than .*?n, because it will first match the entire word, then see it doesn't end with n, then backtrack until it gets to "in" which it finally matches. The other syntax just stops immediately as it finds the n.
  • another typical example is people trying to find the HTML tag that has a an attribute and they do something like \<.*href=\"something\".*\/\> and what it matches is the entire HTML document up to a href attribute and the end of the last atomic tag in the document.
  • the golden hammer strikes again
    • people start with a simple regular expression in the proof of concept, they put it in a config file, then for the real life application they continuously tweak the configured pattern instead of trying to redesign the code, until they get to some horrible monstrosity
    • a regex in an online library that uses look aheads and look behinds solves the immediate problem you have, so you copy paste it in the code and forget about it. Then the production app has serious performance problems. 

  Solutions

  There are two major contexts in which to look for solutions. One is the search/replace situation in a tool, like a text editor. In this case you cannot play with code. The most you can hope for is that you will find a tester online that supports the exact syntax for regular expressions of the tool you are in. A social solution would be to throw shade on lazy developers that think only certain bits of regular expressions should be supported and implemented and then only in one particular flavor that they liked when they were children.

  The second provides more flexibility: you are writing the code and you want to use the power of regular expressions without sacrificing readability, testability and maintainability. Here are some possible solutions:

  • start with simple stuff and learn as you go
    • the overwhelming majority of the time you need only the very basic features of regular expressions, the rest you can look up when you need them
    • if the regular expression becomes too complex it is an indication that maybe it's not the best approach
  • store subpatterns in constants than you can then reuse using templated strings
    • ex: var yearPattern = @"(?<year>\d{4})"; var datePattern = $@"\b(?:{yearPattern}-(?<month>\d{2})-(?<day>\d{2})|(?<month>\d{2})\/(?<day>\d{2})\/{yearPattern})\b";
    • the example above only stores the year in another variable, but you can store the two different formats, the day, the month, etc
    • in the end your code might look more like the Backus-Naur (BNF) syntax, in which every separate component is described separately
  • use verbatim strings to make the patterns more readable by avoiding double escaping
    • in C# use @"\d+" not "\\d+"
    • in Javascript they are called template literals and use backticks instead of quotes, but they have the same mechanism for escaping characters as normal strings, so they are not a solution
  • use simple regular expressions or, if not, abstract their use
    • a good solution is using a fluent interface (check this discussion out) that allows the expressivity of human language for something that ends up being a regular expression
    • no, I am not advocating creating your own regular expression syntax... I just think someone probably already did it and you just have to find the right one for you :)
  • look for people who have solved the same problem with regular expression and you don't have to rewrite the wheel
  • always test your regular expressions on valid data and pay attention to the time it took to get the matches
  • double check any use of the string methods like IndexOf, StartsWith, Contains, even Substring. Could you use regular expressions?
    • note that you cannot really chain these methods. Replacing a regular expression like ^http[s]?:// with methods always involves several blocks of code and the introduction of cumbersome constants:
      if (text.StartsWith("http")) {
        // 4 works now, but when you change the string above, it stops working
        // you save "http" to a constant and then use .Length, you write yet another line of code
        text=text.Substring(4);
      } else {
        return false;
      }
      if (text.StartsWith("s")) {
        text=text.Substring(1);
      }
      return text.StartsWith("://");
      
      // this can be easily refactored to
      return text
               .IfStartsWith("http")
               .IfStartsWithOrIgnore("s")
               .IfStartsWith("://");
      // but you have to write your own helper methods
      // and you can't save the result in a config file​

  Conclusion

Regular expressions look daunting. Anyone not familiar with the subject will get scared by trying to read a regular expression. Yet most regular expression patterns in use are very simple. No one actually knows by heart the entire feature set of regular expressions: they don't need to. Language features and online tools can help tremendously to make regex readable, maintainable and even testable. Regular expressions shine for partial input validation and efficient string search and replace operations and can be easily stored in configuration files or data stores. When regular expressions become too complex and hard to write, it might be a sign you need to redesign your feature. Often you do not need to rewrite a regular expression, as many libraries with patterns to solve most common problems already exist.

  I was looking into the concept of partial sorting, something that would help in a scenario where you want the k smaller or bigger items from an array of n items and k is significantly smaller than n. Since I am tinkering with LInQer, my implementation for LINQ methods in Javascript, I wanted to tackle the OrderBy(...).Take(k) situation. Anyway, doing that I found out some interesting things.

  First, the default Javascript array .sort function has different implementations depending on browser and by that I mean different sort algorithms. Chrome uses insertion sort and Firefox uses merge sort. None of them is Quicksort, the one that would function best when the number of items is large, but that has worst case complexity O(n^2) when the array is already sorted (or filled with the same value).

I've implemented a custom function to do Quicksort and after about 30000 items it becomes faster than the default one.  For a million items the default sort was almost three times slower than the Quicksort implementation. To be fair, I only tested this on Chrome. I have suspicions that the merge sort implementation might be better.

  I was reporting in a previous version of this post that QuickSort, implemented in Javascript, was faster than the default .sort function, but it seems this was only an artifact of the unit tests I was using. Afterwards, I've found an optimized version of quicksort and it performed three times faster on ten million integers. I therefore conclude that the default sort implementation leaves a lot to be desired.

  Second, the .sort function is by default alphanumeric. Try it: [1,2,10].sort() will return [1,10,2]. In order to do it numerically you need to hack away at it with [1,2,10].sort((i1,i2)=>i1-i2). In order to sort the array based on the type of item, you need to do something like: [1,2,10].sort((i1,i2)=>i1>i2?1:(i1<i2?-1:0)).

  Javascript has two useful functions defined on the prototype of Object: toString and valueOf. They can also be overwritten, resulting in interesting hacks. For some reason, the default .sort function is calling toString on the objects in the array before sorting and not valueOf. Using the custom comparers functions above we force using valueOf. I can't test this on primitive types though (number, string, boolean) because for them valueOf cannot be overridden.

  And coming back to the partial sorting, you can't do that with the default implementation, but you can with Quicksort. Simply do not sort any partition that is above and below the indexes you need to get the items from. The increases in time are amazing!

  There is a difference between the browser implemented sort algorithms and QuickSort: they are stable. QuickSort does not guarantee the order of items with equal sort keys. Here is an example: [1,3,2,4,5,0].sort(i=>i%2) results in [2,4,0,1,3,5] (first even numbers, then odd, but in the same order as the original array). The QuickSort order depends on the choice of the pivot in the partition function, but assuming a clean down the middle index, the result is [4,2,0,5,1,3]. Note that in both cases the requirement has been met and the even numbers come first.

  A little gem I stumbled upon today that I didn't know about: https://placeholder.com/. You use it really simple like https://via.placeholder.com/300x150?text=Placeholder, but there is also a shorter URL that looks like this: https://placehold.it/1280x800/8020A0/FFFFFF?text=Siderite's blog.

  The simplest syntax is https://placehold.it/300 (a 300x300 image that displays "Placeholder" and "Powered by HTML.COM").

Update: The npm package can now be used in Node.js and Typescript (with Intellisense). Huge optimizations on the sorting and new extra functions published.

Update: Due to popular demand, I've published the package on npm at https://www.npmjs.com/package/@siderite/linqer. Also, while at it, I've moved the entire code to Typescript, fixed a few issues and also create the Linqer.Slim.js file, that only holds the most common methods. The minimized version can be found at https://siderite.github.io/LInQer/LInQer.slim.min.js and again you can use it freely on your websites. Now, I hope you can use it in Node JS as well, I have no idea how to test it and frankly I don't want to. I've also added a tests.slim.html that only tests the slim version of Linqer. This version is now official and I will try to limit any breaking changes from now on. WARNING: Now the type is Linqer.Enumerable, not just Enumerable as before. If you are already using it in code, just do const Enumerable = Linqer.Enumerable; and you are set to go.

Update: LInQer can now be found published at https://siderite.github.io/LInQer/LInQer.min.js and https://siderite.github.io/LInQer/LInQer.extra.min.js and you can freely use it in your web sites. Source code on GitHub.

Update: All the methods in Enumerable are now implemented, and some extra ones as well. Optimizations have been developed for operations like orderBy().take() and .count().

  This entire thing started from a blog post that explained something that seems obvious in retrospect, but you have to actually think about it to realise it: Array functions in Javascript may seem similar to LInQ methods in C#, but they work with arrays! Every time you filter or you map something you create a new array! The post suggested creating functions that would use the modern Javascript features of iterators and generator functions and supplant the standard Array functions. It was brilliant!

  And then the post abruptly veered and swerved into a tree. The author suggested we add the pipe operator to Javascript, just like they have in functional programming languages, so that we can use those static functions with hash characters as placeholders and... ugh! It was horrid! So I decided to solve the problem in a way that was more compatible with my own sensibilities: make it work just like LInQ (or at least like the Java streams).

This is how LInQer was born. You can find the sources on GitHub at /Siderite/LInQer. The library introduces a class called Enumerable, which can wrap any iterable (meaning arrays, but also generator functions or anything that supports for...of), then expose the methods that the Enumerable class in .NET exposes. There is even a unit test file called tests.html. Open it in the browser and see the supported methods. As you can see in the image of the blog post, the same logic (filtering, mapping and slicing a 10 million item array) took 2000ms using standard Array functions and 150ms using the Enumerable class.

"But wait! It says there that Enumerable exposes static methods! What exactly did you improve?", you will ask. In .NET we have something called "extension methods", which are indeed static methods that the language understands apply to specific classes or interfaces and allows you to call them like you would instance methods. So a method that looks like public static int Sum(this IEnumerable<int> enumerable) can be used statically Enumerable.Sum(arr) or like an instance method arr.Sum(). It's syntactic sugar. But enough about C#, in Javascript I've implemented Enumerable as a wrapper, as Javascript doesn't know about interfaces or static extension methods. This class then exposes prototype functions that work the same way as in LInQ.

Here is where the magic happens:

Enumerable.prototype = {
  [Symbol.iterator]() {
    const iterator = this._src[Symbol.iterator].bind(this._src);
    return iterator();
  },

In other words I wrap _src (the original iterable object) in my Enumerable by exposing the same iterator for both! Now both initial object and its wrapper can be used in for...of loops. The difference is that the wrapper now exposes all that juicy functionality. Let's see a simple example: select (which is the logical equivalent of Array.map):

select(op) {
  _ensureFunction(op);
  const gen = function* () {
    for (const item of this) {
      yield op(item);
    }
  }.bind(this);
  return new Enumerable(gen());
},

Here I am constructing a generator function which I bind to the Enumerable instance so that inside it 'this' is always that instance. Then I am returning the wrapper over the generator. In the function I am yielding the result of the op function on each item. Because of how generator functions work, only while iterating will it need to do its work, meaning that if I use the Enumerable into a for...of loop and breaking after three items, the op function will only be applied on those items, not on all of them. In order to use the Enumerable object in regular code, that works with arrays, you just use .toArray and you are done!

Let's see the code in the tests used to compare the standard Array function use with Enumerable:

// this is the large array we are using in both code blocks:
  const largeArray = Array(10000000).fill(10);

// Standard Array functions
  const someCalculation = largeArray
                             .filter(x=>x===10)
                             .map(x=>'v'+x)
                             .slice(100,110);
// Enumerable
  const someCalculation = Linqer.Enumerable.from(largeArray)
                             .where(x=>x===10)
                             .select(x=>'v'+x)
                             .skip(100)
                             .take(10)
                             .toArray();

There are differences from the C# Enumerable, of course. Some things don't make sense, like thenBy after orderBy. It can be done, but it's complicated and in my career I've only used thenBy twice! toDictionary, toHashSet, toLookup, toList have no meaning in Javascript. Use instead toMap, toSet, toObject, toArray. Last, but not least... join. Join is so complicated to use in LInQ that I almost never used it. Also, joining is usually done in the database, so I rarely needed it. I didn't see a point in implementing it. Cast and AsEnumerable also didn't make sense. Stuff like that.

But don't fret. Any function that you want to add to this can simply be added by your own code into Enumerable.prototype! So if you really need something custom, it's easy to add without modifying the original library. Also, I have no idea how kids use and pack libraries nowadays in Javascript. I am testing all this in the browser. If you want it with modules, npm, Node.js... you can do it yourself! :D

There is one disadvantage for the prototype approach and the reason why the initial article suggested to use standalone functions: tree-shaking, the fancy word used to express the automatic elimination of unused code. But I think there is a solution for that, one that I won't implement since I believe the library is small enough and minified and compressed it will be very small indeed. The solution would involve separating each of the LINQ methods (or categories of methods, like first, firstOrDefault, last and lastOrDefault, for example) in different files. Then you can use only what you need and the files would attach the functions to the Enumerable prototype.

In fact, there are a lot of LINQ methods I have never had use for, stuff like intersect and except or append and prepend. It makes sense to create a core Enumerable class and then add other stuff only when required. But as I said, it's small enough to not matter.

Bottom line, I hope you find this library useful or at least it inspires you as it did me when I've read the original blog post from Dan Shappir.

P.S. I am not the only one that had this idea, you might also want to look at linq.js which uses a completely different approach, using regular expressions. I think the closest to what LINQ should be is Manipula, that uses custom iterators for each operation rather than use the same class. Another possibility is linq-collections, which does the work via TypeScript, apparently. Also linq.es6... I am sure there are more examples if one looks closely. Feel free to let me know if you did or know of similar work and also please give me whatever feedback you see fit. I would like this library to be useful, not just a code example for my blog.

I hope you at least heard of the concept of Unit Testing as it is one of the principal pillars of software development. Its purpose is to automatically check the functionality of isolated components of your code, but it adds a lot of benefits:

  • your code becomes more modular - you only worry about the code you change
  • your code becomes more readable - clear dependency chain and tests demonstrate in code what was the purpose of particular features
  • you gain confidence that your code works as you are making changes to it - refactoring without unit tests in place is usually dangerous
  • changing any component with another implementation does not affect the overall application

This post is a companion to the Programming a simple game in pure HTML and Javascript in the sense that it is that game that we will be testing. Also, as per the requirements of that project, we will not be using any of the numerous unit testing frameworks available for Javascript code.

When we left off the development of the game we had three files:

  • complementary.html - the structure of the page
  • complementary.css - the visual design of the page components
  • complementary.js - all the code of the game, including the initialization and the game starting bit

In order to test our individual components, we need to separate them. So let's split complementary.js in four files:

  • complementary.js - just the game start (instantiating Game and initializing it)
  • game.js - the Game class
  • color.js - the Color class
  • elements.js - the custom HTML elements and their registration

Obviously the HTML will change to load all of these Javascript files. When we get to modules, this will become a non issue.

There are things that we could test on the custom HTML elements, but let's leave that aside. Also the two lines in complementary.js will not need testing. The simplest component should be the first to be tested and that is Color.

We start by creating a new HTML file (color tests.html) and we fill it with code that checks the Color class works as expected. First the code and then the discussion:

<html>

<head>
    <script src="color.js"></script>
    <script>
        // this can easily be changed to display a nice report in this page
        const assert ={
            true:(value, message)=> {
                if (value) {
                    console.log('Test PASSED ('+message+')');
                } else {
                    console.warn('Test FAILED');
                    alert(message);
                    throw new Error(message);
                }
            }
        };
    </script>
</head>

<body>
    <script>
        // Arrange
        const color1 = new Color(123);
        const color2 = new Color(123);
        const color3 = new Color(234);
        // Act
        const equalsWorks = color1.equals(color2);
        const notEqualsWorks = !color1.equals(color3);
        // Assert
        assert.true(equalsWorks,'Expected two colors initialized with the same value to be equal');
        assert.true(notEqualsWorks,'Expected two colors initialized with different values not to be equal');
    </script>

    <script>
        // Arrange
        const color = new Color();
        const doubleInvertedColor = color.complement().complement();
        // Act
        const complementAndEqualsWork = color.equals(doubleInvertedColor);
        // Assert
        assert.true(complementAndEqualsWork,'Expected the complementary or a complementary color to be the original color');
    </script>

    <script>
        // Arrange
        const acolor = new Color(0x6789AB);
        const stringColor = acolor.toString();
        // Act
        const toStringWorks = stringColor==='#6789ab';
        // Assert
        assert.true(toStringWorks,'Expected the HTML representation of the color to be #6789ab and it was '+stringColor);
    </script>

</body>

</html>

A test should follow the AAA structure:

  • Arrange - sets up the necessary items for the test to run
    • instantiate classes to be tested
    • mock functionality of dependencies - we will see this when we test Game
  • Act - executes the code intended to be tested and acquires results
  • Assert - verifies that the test results are the ones expected

Normally, a framework would take tests written in a certain way and then produce some sort of report, with nice green and red rows, with messages, with information on where errors occurred and so on. However, I intend to demonstrate the basics of unit testing, so no framework is actually needed.

A unit test:

  • is a piece of code (who unit tests the unit test?!)
  • it requires effort, it is just as prone to bugs as normal code and requires maintenance just like any other code (shit doesn't just happen, it takes time and effort)
  • it requires infrastructure that uses it to periodically test your changes (either someone does it manually or there is some setup to run it automatically and display/email the report)

In the code above I created a script tag for each test in which I am following AAA to create Color instances and then check their functionality does what it should. A very basic assert object is handling the reporting part. It remains homework for the reader to update that part or to plug in an existing unit testing framework.

The Color class is very simple:

  • it supports initializing with a color integer representing the RGB values of the color
  • toString method that returns the HTML representation of the color
  • complement method returns the complement of the color
  • equals method checks if two colors are equal

There are no external dependencies, meaning that it needs nothing from the outside in order to work. Game, for example, requires Color, which is a dependency for Game.

Open the color tests.html file in the browser and open the development tools (F12 or Ctrl-Shift-I) and refresh the page. You should see in the console that all the tests passed. Change something in a test so it fails and it should both throw an error in the console and show you a dialog with the failing test.

Now, let's test Game. The code of the game tests.html file:

<html>

<head>
    <script src="game.js"></script>
    <script>
        // this can easily be changed to display a nice report in this page
        const assert ={
            true:(value, message)=> {
                if (value) {
                    console.log('Test PASSED ('+message+')');
                } else {
                    console.warn('Test FAILED');
                    alert(message);
                    throw new Error(message);
                }
            }
        };
    </script>
</head>

<body>
    <script>
        // Arrange
        const game = new Game();
        // Act
        // Assert
    </script>

</body>

</html>

I used the same structure, the same assert object, only I loaded game.js instead of color.js. Then I wrote a test in which I do nothing than instantiate a Game. If you execute this page it will work just fine. No errors because we have not, in fact, executed anything. We need to execute .init(document), remember?

And now it becomes apparent why I chose to initialize the document from init instead of using window.document in my code. window.document is now the document of the test page, it has no complementary-board element in it. We haven't even defined any custom HTML elements or a Color class. In fact, we can now test that calling init with no parameter will fail:

    <script>
        // Arrange
        const game = new Game();
        // Act
        let anyError = null;
        try {
            game.init();
        } catch(error) {
            anyError = error;
        }
        // Assert
        assert.true(anyError!=null,'Expected the init method of a Game class to fail if run with no parameters ('+anyError+')');
    </script>

And, indeed, if we open the page now we get a console log like this: 

Test PASSED (Expected the init method of a Game class to fail if run with no parameters (TypeError: Cannot read property 'addEventListener' of undefined))

You just learned another important characteristic of a unit test: it tests both what should work and what shouldn't. This is one of the reasons why unit testing is hard. For each piece of code you need to test when it works and when it fails as expected. Now we have to test how Game should work, and that means we get into mocking.

Mocking is when you replace a dependency with something that looks exactly the same, but does something else. For unit tests mock objects need to do the simplest things and those things must be predictable.

Let's see how one of these tests would look:

    <script>
        // Arrange
        const mockDoc = {
            addEventListener:function() {
            }
        };
        const game2 = new Game();
        // Act
        let anyError2 = null;
        try {
            game2.init(mockDoc);
        } catch(error) {
            anyError2 = error;
        }
        // Assert
        assert.true(anyError2==null,'Expected the init method of a Game class to not fail if run with correct parameters ('+anyError2+')');
    </script>

Just by providing an object with an addEventListener function, the initialization of the game now works. mockDoc is a mocked document and this is called mocking.

Let's look at how would a "happy path" test look, one that assumes everything goes correctly and moves through and entire flow:

    <script>
        // Arrange
        const mockDoc3 = {
            testData : {},
            addEventListener:function(eventName,eventHandler) {
                this.testData.eventName = eventName;
                this.onLoad = eventHandler;
            },
            getElementsByTagName:function(tagName) {
                this.testData.tagName = tagName;
                return [this.mockBoard];
            },
            mockBoard: {
                setChoiceHandler:function(handler) {
                    this.choiceHandler=handler;
                }
            }
        };
        const game3 = new Game();
        // Act
        let anyError3 = null;
        try {
            game3.init(mockDoc3);
        } catch(error) {
            anyError3 = error;
        }
        Math = {
            random:function() {
                return 0.4;    
            },
            floor:function(x) { return x; },
            round:function(x) { return x; },
            pow:function(x,p) { if (p==1) return x; else throw new Error('Expected 1 as the exponent, not '+p); }
        };
        Color = {
            index:0,
            new:function(value) {
                return {
                    val: value,
                    equals: function(x) { return x.val==this.val; },
                    complement: function() { return Color.new(1000-this.val); }
                };
            },
            random:function() {
                this.index++;
                return Color.new(this.index*10);
            }
        }
        mockDoc3.onLoad();
        mockDoc3.mockBoard.choiceHandler(3);
        // Assert
        assert.true(anyError3==null,'Expected the init method of a Game class to not fail if run with correct parameters ('+anyError3+')');
        assert.true(mockDoc3.testData.eventName==='DOMContentLoaded','DOMContentLoaded was not handled!');
        assert.true(mockDoc3.testData.tagName==='complementary-board','Game is not looking for a complementary-board element');
        assert.true(game3._roundData.guideColor.val === 10,'Guide color object expected to have val=10 ('+JSON.stringify(game3._roundData.guideColor)+')');
        assert.true(game3._roundData.tries.size === 1,'Expected 1 unsuccessful try ('+JSON.stringify(game3._roundData.tries)+')');
        assert.true(game3._roundData.tries.has(3),'Expected unsuccessful try with value of 3 ('+JSON.stringify(game3._roundData.tries)+')');
        assert.true(game3._log.length === 0,'Expected no score after one unsuccessful try ('+JSON.stringify(game3._log)+')');
        // Act 2 (heh!)
        mockDoc3.mockBoard.choiceHandler(2);
        // Assert 2
        assert.true(game3._roundData.guideColor.val === 60,'Guide color object expected to have val=60 ('+JSON.stringify(game3._roundData.guideColor)+')');
        assert.true(game3._roundData.tries.size === 0,'Expected no unsuccessful tries after correct response ('+JSON.stringify(game3._roundData.tries)+')');
        assert.true(game3._log.length === 1,'Expected one item of score after correct answer ('+JSON.stringify(game3._log)+')');
        assert.true(game3._log[0] === 50,'Expected 50 as the score after one fail and one correct answer ('+JSON.stringify(game3._log)+')');
        
    </script>

There is a lot to unpack here, including the lazy parts of the test. Here are some of the issues with it:

  • it doesn't follow the AAA pattern, it asserts, then acts again, then asserts again
    • while this works, it doesn't encapsulate testing of a single feature
    • it is the equivalent of the classes or methods with multiple responsibilities from normal code
    • the correct way to do it is to write another test, have two choiceHandler calls in the Act part and assert only that particular path
  • it tests internal functionality
    • in order to write the test I had to look at how the Game class works internally and then tailor the test so it works
    • it accesses private data like _log and _roundData
  • the Game class did not abstract all of its dependencies
    • this is painfully obvious when mocking the Math object - in Javascript this is easy, but in other languages the Math functionality comes as an interface (a declaration of available members with no implementation)
    • this is not a test problem, but a Game problem which makes testing tedious
  • test contains logic
    • look at the Math and Color mocks, how they compute values and return objects
  • some values there are particularly chosen to "work"
    • have you noticed how random returns 0.4 so that multiplied with 5 (number of choices) it returns 2, which then is equal with the correct choice?
  • test is not independent
    • Math and Color are replaced with some static objects, thus changing the environment for following tests

But it works, even if it's not very well written. Here are some of its good features:

  • it tests a full game round with one bad and one good choice
  • it doesn't try to go around randomness by adding more code in the assertion part
    • it could have easily gone that way in order to determine the correct color in a random list, thus replicating or reverse engineering the Game logic into the test
  • all the expected results are completely predictable (the score values, the indexes, etc)
  • it tests only Game functionality, not others
    • I could have not mocked Color, loading color.js and thus relying in a single test on functionality from a dependency

The lessons learned from this tell us that both the Game code and the test code could have been written better in regard to maintainability, with dependencies clearly declared, easy to mock or replace. In fact, the greatest gain of a company with hiring a senior developer is not on how well code works, but how much time is saved and how much risk is avoided when the code is written with separation of concerns and unit testing in mind.

It is funny, but when a piece of code is not testable, writing a single unit test forces you to refactor it into a good shape. The rest of the tests only test functionality, as the class has been rewritten with testing in mind. So that is my advice: whenever you write something, even a silly game like Complementary, write one "happy path" unit test per class.

Homework for you: rewrite the Game class and its test class so that testing becomes easier and more correct.

This is a side post to Programming a simple game in pure HTML and Javascript in the sense that we are going to add that project to source control in GitHub using Visual Studio Code.

The first step is to create a repository. You need to create a GitHub account first, then either go to Your Repositories

or click the green New button on the top of your repositories list in the main page.

Make sure you give the repository a name and a decent description. I recommend adding a README and a licence (I use MIT), too.

Now that you have a remote repo on GitHub, you need to remember its URL, you will need it later.

Then download Git from their website. The Windows installer is simple to find, download and Next-next-next to success.

Open Visual Studio Code in your project's folder. On the left side menu there are a bunch of vertical buttons, click on the Source Control one 

Then initialize your local repository by clicking on the plus sign and selecting your project folder: 

Now you need to connect your local repo to your remote repo and you do this by typing some stuff in the Visual Studio Code Terminal window:

  1. set the remote by typing git remote add origin <your GitHub repo URL>
  2. verify the remote by typing git remote -v
  3. run a git pull to make sure your project loads the readme and license file by typing git pull origin master

Last step is to push your local files to the remote repository. You do this by

  1. clicking on the check mark at the top of the Source Control window and giving a description to this first commit

  2. pushing/publishing the commit, by clicking on the ... mark and choosing Publish Branch

There you have it. Your project is on source control.

You do this in order to not lose your changes, in order to track your changes forwards and backwards in time and to publish your work for others to see it (like other devs or potential employers)

Verify that all went well by refreshing your GitHub project page. The files in your local folder should now be visible in the file list on the page.

Now, every time you make changes to your code you push the changes on GitHub by committing (the check mark) and then pushing/syncing/publishing (from the ... menu)

The code for this series of posts can be found at https://github.com/Siderite/Complementary

  I was helping a friend with basic programming and I realized that I've been so caught up with the newest fads and development techniques that I've forgotten about simple programming, for fun, with just the base principles and tools provided "out of the box". This post will demonstrate me messing up writing a game using HTML and Javascript only.

Mise en place

This French phrase is used in professional cooking to represent the preparation of ingredients and utensils before starting the actual cooking. We will need this before starting developing our game:

  • description: the game will show a color and the player must choose from a selection of other colors the one that is complementary
    • two colors are complementary if when they are mixed, they cancel each other out, resulting in a grayscale "color" like white, black or some shade of gray. Wait! Was that the metaphor in Fifty Shades of Grey?
  • technological stack: HTML, Javascript, CSS
    • flavor of Javascript: ECMAScript 2015 (also known as ES6)
    • using modules: no - this would be nice, but modules obey CORS, so you won't be able to run it with the browser from the local file system.
    • unit testing: yes, but we have to do it as simply as possible (no external libraries)
  • development IDE: Visual Studio Code
    • it's free and if you don't like it, you can just use Notepad to the same result
  • source control: Git (on GitHub)

Installing Visual Studio Code

Installing VS Code is just as simple as downloading the installer and running it.

Then, select the Open Folder option, create a project folder (let's call it Complementary), then click on Select Folder.

The vanilla installation will help you with syntax highlighting, code completion, code formatting.

Project structure

For starters we will need the following files:

  • complementary.html - the actual page that will be open by the browser
  • complementary.js - the Javascript code
  • complementary.css - the CSS stylesheet

Other files will be added afterwards, but this is the most basic separation of concerns: code and data in the .js file, structure in .html and presentation in .css.

Starting to code

First, let's link the three files together by writing the simplest HTML structure:

<html>
    <head>
        <link rel="stylesheet" href="complementary.css"/>
        <script src="complementary.js"></script>
    </head>
    <body>
        
    </body>
</html>

This instructs the browser to load the CSS and JS files. 

In the Javascript file we encapsulate out logic into a Game class:

"use strict";
class Game {
  init(doc) {
    this._document = doc;
    this._document.addEventListener('DOMContentLoaded',this.onLoad.bind(this),false);
  }
  onLoad() {

  }
}

const game=new Game();
game.init(document);

We declared a class (a new concept in Javascript ES6) and a method called init that receives a doc. The idea here is that when the script is loaded, a new Game will be created and the initialization function will receive the current document so it can interact with the user interface. We used the DOMContentLoaded event to call onLoad only when the page document object model (DOM) has been completely loaded, otherwise the script would run before the elements have been loaded.

Also, not the use of the bind method on a function. addEventListener expects a function as the event handler. If we only specify this.onLoad, it will run the function, but with the this context of the event, which would be window, not our game object. this.onLoad.bind(this), on the other hand, is a function that will be executed in the context of our game.

Now, let's consider how we want to game to play out:

  • a guide color must be shown
    • this means the color needs to be generated
  • a list of colors to choose from must be displayed
    • colors need to be generated
    • one color needs to be complementary to the guide color
    • color elements need to respond to mouse clicks
  • a result must be computed from the chosen color
    • the outcome of the user choice must be displayed
    • the score will need to be calculated

This gives us the structure of the game user interface. Let's add:

  • a guide element
  • a choice list element
  • a score element
<html>
    <head>
        <link rel="stylesheet" href="complementary.css"/>
        <script type="module" src="complementary.js"></script>
    </head>
    <body>
        <div id="guideColor"></div>
        <div id="choiceColors"></div>
        <div id="score"></div>
    </body>
</html>

Note that we don't need to choose how they look (that's the CSS) or what they do (that's the JS).

This is a top-down approach, starting from user expectations and then filling in more and more details until it all works out.

Let's write the logic of the game. I won't discuss that too much, because it's pretty obvious and this post is about structure and development, not the game itself.

"use strict";
class Game {
    constructor() {
        // how many color choices to have
        this._numberOfChoices = 5;
        // the list of user scores
        this._log = [];
    }
    init(doc) {
        this._document = doc;
        this._document.addEventListener('DOMContentLoaded', this.onLoad.bind(this), false);
    }
    onLoad() {
        this._guide = this._document.getElementById('guideColor');
        this._choices = this._document.getElementById('choiceColors');
        // one click event on the parent, but event.target contains the exact element that was clicked
        this._choices.addEventListener('click', this.onChoiceClick.bind(this), false);
        this._score = this._document.getElementById('score');
        this.startRound();
    }
    startRound() {
        // all game logic works with numeric data
        const guideColor = this.randomColor();
        this._roundData = {
            guideColor: guideColor,
            choiceColors: this.generateChoices(guideColor),
            tries: new Set()
        };
        // only this method transforms the data into visuals
        this.refreshUI();
    }
    randomColor() {
        return Math.round(Math.random() * 0xFFFFFF);
    }
    generateChoices(guideColor) {
        const complementaryColor = 0xFFFFFF - guideColor;
        const index = Math.floor(Math.random() * this._numberOfChoices);
        const choices = [];
        for (let i = 0; i < this._numberOfChoices; i++) {
            choices.push(i == index
                ? complementaryColor
                : this.randomColor());
        }
        return choices;
    }
    refreshUI() {
        this._guide.style.backgroundColor = '#' + this._roundData.guideColor.toString(16).padStart(6, '0');
        while (this._choices.firstChild) {
            this._choices.removeChild(this._choices.firstChild);
        }
        for (let i = 0; i < this._roundData.choiceColors.length; i++) {
            const color = this._roundData.choiceColors[i];
            const elem = this._document.createElement('span');
            elem.style.backgroundColor = '#' + color.toString(16).padStart(6, '0');
            elem.setAttribute('data-index', i);
            this._choices.appendChild(elem);
        }
        while (this._score.firstChild) {
            this._score.removeChild(this._score.firstChild);
        }
        const threshold = 50;
        for (let i = this._log.length - 1; i >= 0; i--) {
            const value = this._log[i];
            const elem = this._document.createElement('span');

            elem.className = value >= threshold
                ? 'good'
                : 'bad';
            elem.innerText = value;
            this._score.appendChild(elem);
        }
    }
    onChoiceClick(ev) {
        const elem = ev.target;
        const index = elem.getAttribute('data-index');
        // just a regular expression test that the attribute value is actually a number
        if (!/^\d+$/.test(index)) {
            return;
        }
        const result = this.score(+index);
        elem.setAttribute('data-result', result);
    }
    score(index) {
        const expectedColor = 0xFFFFFF - this._roundData.guideColor;
        const isCorrect = this._roundData.choiceColors[index] == expectedColor;
        if (!isCorrect) {
            this._roundData.tries.add(index);
        }
        if (isCorrect || this._roundData.tries.size >= this._numberOfChoices - 1) {
            const score = 1 / Math.pow(2, this._roundData.tries.size);
            this._log.push(Math.round(100 * score));
            this.startRound();
        }
        return isCorrect;
    }
}

const game = new Game();
game.init(document);

This works, but it has several problems, including having too many responsibilities (display, logic, handling clicks, generating color strings from numbers, etc).

And while we have the logic and the structure, the display leaves a lot to be desired. Let's fix this first (I am terrible with design, so I will just dump the result here and it will be a homework for the reader to improve on the visuals).

First, I will add a new div to contain the three others. I could work directly with body, but it would be ugly:

<html>

<head>
    <link rel="stylesheet" href="complementary.css" />
    <script src="complementary.js"></script>
</head>

<body>
    <div class="board">
        <div id="guideColor"></div>
        <div id="choiceColors"></div>
        <div id="score"></div>
    </div>
</body>

</html>

Then, let's fill in the CSS:

body {
    width: 100vw;
    height: 100vh;
    margin: 0;
}
.board {
    width:100%;
    height:100%;
    display: grid;
    grid-template-columns: 50% 50%;
    grid-template-rows: min-content auto;
}
#score {
    grid-column-start: 1;
    grid-column-end: 3;
    grid-row: 1;
    display: flex;
    flex-direction: row;
    flex-wrap: nowrap;
}
#score span {
    display: inline-block;
    padding: 1rem;
    border-radius: 0.5rem;
    background-color: darkgray;
    margin-left: 2px;
}
#score span.good {
    background-color: darkgreen;
}
#score span.bad {
    background-color: red;
}
#guideColor {
    grid-column: 1;
    grid-row: 2;
}
#choiceColors {
    grid-column: 2;
    grid-row: 2;
    display: flex;
    flex-direction: column;
}
#choiceColors span {
    flex-grow: 1;
    cursor: pointer;
}
#choiceColors span[data-result=false] {
    opacity: 0.3;
}

I used a lot of flex and grid to display things.

The game should now do the following:

  • displays a left side color
  • displays five rows of different colors in the right side
  • clicking on any of them modifies the score (each wrong choice halves the maximum score of 100)
  • when there are no more moves left or the correct choice is clicked, the score is added to a list at the top of the board
  • the score tiles are either green (score>=50) or red

However, I am dissatisfied with the Javascript code. If Game has too many responsibilities it is a sign that new classes need to be created.

Refactoring the code

First, I will encapsulate all color logic into a Color class.

class Color {
    constructor(value = 0  /* black */) {
        this._value = value;
    }
    toString() {
        return '#' + this._value.toString(16).padStart(6, '0');
    }
    complement() {
        return new Color(0xFFFFFF - this._value);
    }
    equals(anotherColor) {
        return this._value === anotherColor._value;
    }
    static random() {
        return new Color(Math.round(Math.random() * 0xFFFFFF));
    }
}

This simplifies the Game class like this:

class Game {
    constructor() {
        // how many color choices to have
        this._numberOfChoices = 5;
        // the list of user scores
        this._log = [];
    }
    init(doc) {
        this._document = doc;
        this._document.addEventListener('DOMContentLoaded', this.onLoad.bind(this), false);
    }
    onLoad() {
        this._guide = this._document.getElementById('guideColor');
        this._choices = this._document.getElementById('choiceColors');
        // one click event on the parent, but event.target contains the exact element that was clicked
        this._choices.addEventListener('click', this.onChoiceClick.bind(this), false);
        this._score = this._document.getElementById('score');
        this.startRound();
    }
    startRound() {
        // all game logic works with numeric data
        const guideColor = Color.random();
        this._roundData = {
            guideColor: guideColor,
            choiceColors: this.generateChoices(guideColor),
            tries: new Set()
        };
        // only this method transforms the data into visuals
        this.refreshUI();
    }
    generateChoices(guideColor) {
        const complementaryColor = guideColor.complement();
        const index = Math.floor(Math.random() * this._numberOfChoices);
        const choices = [];
        for (let i = 0; i < this._numberOfChoices; i++) {
            choices.push(i == index
                ? complementaryColor
                : Color.random());
        }
        return choices;
    }
    refreshUI() {
        this._guide.style.backgroundColor = this._roundData.guideColor.toString();
        while (this._choices.firstChild) {
            this._choices.removeChild(this._choices.firstChild);
        }
        for (let i = 0; i < this._roundData.choiceColors.length; i++) {
            const color = this._roundData.choiceColors[i];
            const elem = this._document.createElement('span');
            elem.style.backgroundColor = color.toString();
            elem.setAttribute('data-index', i);
            this._choices.appendChild(elem);
        }
        while (this._score.firstChild) {
            this._score.removeChild(this._score.firstChild);
        }
        const threshold = 50;
        for (let i = this._log.length - 1; i >= 0; i--) {
            const value = this._log[i];
            const elem = this._document.createElement('span');

            elem.className = value >= threshold
                ? 'good'
                : 'bad';
            elem.innerText = value;
            this._score.appendChild(elem);
        }
    }
    onChoiceClick(ev) {
        const elem = ev.target;
        const index = elem.getAttribute('data-index');
        // just a regular expression test that the attribute value is actually a number
        if (!/^\d+$/.test(index)) {
            return;
        }
        const result = this.score(+index);
        elem.setAttribute('data-result', result);
    }
    score(index) {
        const expectedColor = this._roundData.guideColor.complement();
        const isCorrect = this._roundData.choiceColors[index].equals(expectedColor);
        if (!isCorrect) {
            this._roundData.tries.add(index);
        }
        if (isCorrect || this._roundData.tries.size >= this._numberOfChoices - 1) {
            const score = 1 / Math.pow(2, this._roundData.tries.size);
            this._log.push(Math.round(100 * score));
            this.startRound();
        }
        return isCorrect;
    }
}

But it's still not enough. Game is still doing a lot of UI stuff. Can we fix that? Yes, with custom HTML elements!

Here is the code. It looks verbose, but what it does is completely encapsulate UI logic into UI elements:

class GuideColor extends HTMLElement {
    set color(value) {
        this.style.backgroundColor = value.toString();
    }
}

class ChoiceColors extends HTMLElement {
    connectedCallback() {
        this._clickHandler = this.onChoiceClick.bind(this);
        this.addEventListener('click', this._clickHandler, false);
    }
    disconnectedCallback() {
        this.removeEventListener('click', this._clickHandler, false);
    }
    onChoiceClick(ev) {
        const elem = ev.target;
        if (!(elem instanceof ChoiceColor)) {
            return;
        }
        const result = this._choiceHandler(elem.choiceIndex);
        elem.choiceResult = result;
    }
    setChoiceHandler(handler) {
        this._choiceHandler = handler;
    }
    set colors(value) {
        while (this.firstChild) {
            this.removeChild(this.firstChild);
        }
        for (let i = 0; i < value.length; i++) {
            const color = value[i];
            const elem = new ChoiceColor(color, i);
            this.appendChild(elem);
        }
    }
}

class ChoiceColor extends HTMLElement {
    constructor(color, index) {
        super();
        this.color = color;
        this.choiceIndex = index;
    }
    get choiceIndex() {
        return +this.getAttribute('data-index');
    }
    set choiceIndex(value) {
        this.setAttribute('data-index', value);
    }
    set choiceResult(value) {
        this.setAttribute('data-result', value);
    }
    set color(value) {
        this.style.backgroundColor = value.toString();
    }
}

class Scores extends HTMLElement {
    set scores(log) {
        while (this.firstChild) {
            this.removeChild(this.firstChild);
        }
        for (let i = log.length - 1; i >= 0; i--) {
            const value = log[i];
            const elem = new Score(value);
            this.appendChild(elem);
        }
    }
}

class Score extends HTMLElement {
    constructor(value) {
        super();
        this.innerText = value;
        this.className = value > 50
            ? 'good'
            : 'bad';
    }
}

class Board extends HTMLElement {
    constructor() {
        super();
        this._guide = new GuideColor();
        this._choices = new ChoiceColors();
        this._score = new Scores();
    }
    connectedCallback() {
        this.appendChild(this._guide);
        this.appendChild(this._choices);
        this.appendChild(this._score);
    }
    setChoiceHandler(handler) {
        this._choices.setChoiceHandler(handler);
    }
    set guideColor(value) {
        this._guide.color = value;
    }
    set choiceColors(value) {
        this._choices.colors = value;
    }
    set scores(value) {
        this._score.scores = value;
    }
}

window.customElements.define('complementary-board', Board);
window.customElements.define('complementary-guide-color', GuideColor);
window.customElements.define('complementary-choice-colors', ChoiceColors);
window.customElements.define('complementary-choice-color', ChoiceColor);
window.customElements.define('complementary-scores', Scores);
window.customElements.define('complementary-score', Score);

With this, the HTML becomes:

<html>

<head>
    <link rel="stylesheet" href="complementary.css" />
    <script src="complementary.js"></script>
</head>

<body>
    <complementary-board>
    </complementary-board>
</html>

and the CSS:

body {
    width: 100vw;
    height: 100vh;
    margin: 0;
}
complementary-board {
    width:100%;
    height:100%;
    display: grid;
    grid-template-columns: 50% 50%;
    grid-template-rows: min-content auto;
}
complementary-scores {
    grid-column-start: 1;
    grid-column-end: 3;
    grid-row: 1;
    display: flex;
    flex-direction: row;
    flex-wrap: nowrap;
}
complementary-score {
    display: inline-block;
    padding: 1rem;
    border-radius: 0.5rem;
    background-color: darkgray;
    margin-left: 2px;
}
complementary-score.good {
    background-color: darkgreen;
}
complementary-score.bad {
    background-color: red;
}
complementary-guide-color {
    grid-column: 1;
    grid-row: 2;
}
complementary-choice-colors {
    grid-column: 2;
    grid-row: 2;
    display: flex;
    flex-direction: column;
}
complementary-choice-color {
    flex-grow: 1;
    cursor: pointer;
}
complementary-choice-color[data-result=false] {
    opacity: 0.3;
}

Next

In the next blog posts we will see how we can test our code (we have to make it more testable first!) and how we can use Git as source control. Finally we should have a working game that can be easily modified independently: the visual design, the working code, the structural elements.

First of all, what is TaskCompletionSource<T>? It's a class that returns a task that does not finish immediately and then exposes methods such as TrySetResult. When the result is set, the task completes. We can use this class to turn an event based programming model to an await/async one.

In the example below I will use a Windows Forms app, just so I have access to the Click handler of a Button. Only instead of using the normal EventHandler approach, I will start a thread immediately after InitializeComponent that will react to button clicks.

Here is the Form constructor. Note that I am using Task.Factory.StartNew instead of Task.Run because I need to specify the TaskScheduler in order to have access to a TextBox object. If it were to log something or otherwise not involve the UI, a Task.Run would have been sufficient.

    public Form1()
    {
        InitializeComponent();
        Task.Factory.StartNew(async () =>
        {
            while (true)
            {
                await ClickAsync(button1);
                textBox1.AppendText($"I was clicked at {DateTime.Now:HH:mm:ss.fffff}!\r\n");
            }
        },
        CancellationToken.None,
        TaskCreationOptions.DenyChildAttach,
        TaskScheduler.FromCurrentSynchronizationContext());
    }

What's going on here? I have a while (true) block and inside it I am awaiting a method then write something in a text box. Since await is smart enough to not use CPU and not block threads, this approach doesn't have any performance drawbacks.

Now, for the ClickAsync method:

    private Task ClickAsync(Button button1)
    {
        var tcs = new TaskCompletionSource<object>();
        void handler(object s, EventArgs e) => tcs.TrySetResult(null);
        button1.Click += handler;
        return tcs.Task.ContinueWith(_ => button1.Click -= handler);
    }

Here I am creating a task completion source, I am adding a handler to the Click event, then I am returning the task, which I continue with removing the handler. The handler just sets the result on the task source, thus completing the task.

The flow comes as follows:

  1. the source is created
  2. the handler is attached
  3. the task is returned, but does not complete, thus the loop is halted in await
  4. when the button is clicked, the source result is set, then the handler is removed
  5. the task completed, the await finishes and the text is appended to the text box
  6. the loop continues

It would have been cool if the method to turn an event to an async method would have worked like this: await button1.Click.MakeAsync(), but events are not first class citizens in .NET. Instead, something more cumbersome can be used to make this more generic (note that there is no error handling, for demo purposes):

    public Form1()
    {
        InitializeComponent();
        Task.Factory.StartNew(async () =>
        {
            while (true)
            {
                await EventAsync(button1, nameof(Button.Click));
                textBox1.AppendText($"I was clicked at {DateTime.Now:HH:mm:ss.fffff}!\r\n");
            }
        },
        CancellationToken.None,
        TaskCreationOptions.DenyChildAttach,
        TaskScheduler.FromCurrentSynchronizationContext());
    }

    private Task EventAsync(object obj, string eventName)
    {
        var eventInfo = obj.GetType().GetEvent(eventName);
        var tcs = new TaskCompletionSource<object>();
        EventHandler handler = delegate (object s, EventArgs e) { tcs.TrySetResult(null); };
        eventInfo.AddEventHandler(obj, handler);
        return tcs.Task.ContinueWith(_ => eventInfo.RemoveEventHandler(obj, handler));
    }

Notes:

  • is this a better method of doing things? That depends on what you want to do.
  • If you were to use Reactive Extensions, you can turn an event into an Observable with Observable.FromEventPattern.
  • I see it useful not for button clicks (that while true loop scratches at my brain), but for classes that have Completed events.
  • obviously the EventAsync method is not optimal and has no exception handling

  You are writing some code and you find yourself needing to call an async method in your event handler. The event handler, obviously, has a void return type and is not async, so when using await in it, you will get a compile error. I actually had a special class to execute async method synchronously and I used that one, but I didn't actually need it.

  The solution is extremely simple: just mark the event handler as async.

  You should never use async void methods, instead use async Task or async Task<T1,...>. The exception, apparently, is event handlers. And it kind of makes sense: an event handler is designed to be called asynchronously.

  More details here: Tip 1: Async void is for top-level event-handlers only

  And bonus: but what about constructors? They can't be marked as async, they have no return type!

  First, why are you executing code in your constructor? And second, if you absolutely must, you can also create an async void method that you call from the constructor. But the best solution is to make the constructor private and instead use a static async method to create the class, which will execute whatever code you need and then return new YourClass(values returned from async methods).

  That's a very good pattern, regardless of asynchronous methods: if you need to execute code in the constructor, consider hiding it and using a creation method instead.

As you possibly know, I am sending automated Tweets whenever I am writing a new blog post, containing the URL of the post. And since I've changed the blog engine, I was always seeing the Twitter card associated with the URL having the same image, the blog tile. So I made changes to display a separate image for each post, by populating the og:image meta property. It didn't work. The Twitter card now refused to show the image. And it was all because the URL I was using was relative to the blog site.

First of all, in order to test how a URL will look in Twitter, use the Twitter Card validator. Second, the Open Graph doesn't specify that the URL in og:image should be absolute, but in practice both Twitter and Facebook expect it to be.

So whenever populating Open Graph meta tags with URLs, make sure they are absolute, not relative.