On the SQLite reference page for the WITH clause there is a little example of solving a Sudoku puzzle. Using SQL. I wanted to see it in action and therefore I've translated it into T-SQL.

  You might think that there is a great algorithm at play, something that will blow your mind. I mean, people have blogged about Sudoku solvers to hone their programming skills for ages and they have worked quite a lot, writing lines and lines of how clever they were. And this is SQL, it works, but how do you do something complex in it? But no, it's very simple, very straightforward and also performant. Kind of a let down, I know, but it pretty much takes all possible solutions and only selects for the valid ones using CTEs (Common Table Expressions).

  Here is the translation, followed by some explanation of the code:

DECLARE @Board VARCHAR(81) = '86....3...2...1..7....74...27.9..1...8.....7...1..7.95...56....4..1...5...3....81';
WITH x(s,ind) AS
(
  SELECT sud,CHARINDEX('.',sud) as ind FROM (VALUES(@Board)) as input(sud)
  UNION ALL
  SELECT
	CONVERT(VARCHAR(81),CONCAT(SUBSTRING(s,1,ind-1),z,SUBSTRING(s,ind+1,81))) as s,
	CHARINDEX('.',CONCAT(SUBSTRING(s,1,ind-1),z,SUBSTRING(s,ind+1,81))) as ind
  FROM x
  INNER JOIN (VALUES('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9')) as digits(z)
  ON NOT EXISTS (
            SELECT 1
              FROM (VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9)) as positions(lp)
             WHERE z = SUBSTRING(s, ((ind-1)/9)*9 + lp, 1)
                OR z = SUBSTRING(s, ((ind-1)%9) + (lp-1)*9 + 1, 1)
                OR z = SUBSTRING(s, (((ind-1)/3) % 3) * 3
                        + ((ind-1)/27) * 27 + lp
                        + ((lp-1) / 3) * 6, 1)
	)
	WHERE ind>0
)
SELECT s FROM x WHERE ind = 0

  The only changes from the original code I've done is to extract the unsolved puzzle into its own variable and to change the puzzle values. Also, added a more clear INNER JOIN syntax to replace the obnoxious, but still valid, comma (aka CROSS JOIN) notation. Here is the breakdown of the algorithm, as it were:

  • start with an initial state of the unsolved puzzle as a VARCHAR(81) string and the first index of a dot in that string, representing an empty slot - this is the anchor part
  • for the recursive member, join the current state with all the possible digit values (1 through 9) and return the strings with the first empty slot replaced by all valid possibilities and the position of the next empty slot
  • stop when there are no more empty slots
  • select the solutions (no empty slots)

  It's that simple. And before you imagine it will generate a huge table in memory or that it will take a huge time, worry not. It takes less than a second (a lot less) to find the solution. Obviously, resource use increases exponentially when the puzzle doesn't have just one solution. If you empty the first slot (. instead of 8) the number of rows is 10 and it takes a second to compute them all. Empty the next slot, too (6) and you get 228 solutions in 26 seconds and so on.

 The magical parts are the recursive Common Table Expression itself and the little piece of code that checks for validity, but the validity check is quite obvious as it is the exact translation of the Sudoku rules: no same digits on lines, rows or square sections.

  A recursive CTE has three parts:

  • an initial query that represents the starting state, often called the anchor member
  • a recursive query that references the CTE itself, called the recursive member, which is UNIONed with the anchor
  • a termination condition, to tell SQL when to end the recursion

  For us, we started with one unsolved solution, we recursed on all possible valid solutions for replacing the first empty slot and we stopped when there were no more empty slots.

  CTEs are often confusing because the notation seems to indicate something else to a procedural programmer. You imagine doing this without CTEs, maybe in an object oriented programming language, and you think of this huge buffer that just keeps increasing and you have to remember where you left off so you don't process the same partial solution multiple times and you have to clean the data structure so it doesn't get too large, etc. SQL, though, is at heart a declarative programming language, very close to functional programming. It will take care not only of the recursion, but also filter the rows by the final condition of no empty slots while (and sometimes before) it makes the computations.

  Once you consider the set of possible solutions for a problem as a working set, SQL can do wonders to find the solution, provided you can encode it in a way the SQL engine will understand. This is just another example of the right tool for the right job. Hope you learned something.

and has 0 comments

  One Word Kill starts off like an episode of Stranger Things. You've got the weird kid, his weird friends and the mysterious girl who is both beautiful, smart and hangs out with them to play D&D, all set in the 80's. Then the main character gets cancer and his future self comes to save... the girl. There is also a school boy psycho after them. But that's where the similarities end... the rest of the story is just... nothing. People explain things that needed little explaining and make no sense, good kids and their parents run around from a school boy, as psychotic as he could possibly be, without involving police or gang member allies and, in the middle of all the drama: cancer, psycho killer, future self, time travel... they play Dungeons and Dragons, a game that promotes imagination and creativity that then the protagonists fail to use in any amount in their real life.

  Having just read Prince of Thorns, I really expected a lot more from Mark Lawrence. Instead I get a derivative and boring story that brings absolutely nothing new to the table. It's reasonably well written, I guess, but nothing Wow!, which is exactly the reaction reviewers seem to have about this book. Have I read a different story somehow?

  Bottom line: I am tempted to rate this average, on account of other raving reviews and on the fact that I liked another Mark Lawrence book, but I have to be honest with me and rate this book alone, which I am sorry to say, is sub par.

  This is something that appeared in C# 5, so a long time ago, with .NET 4.5, but I only found out about it recently. Remember when you wanted to know the name of a property when doing INotifyPropertyChanged? Or when you wanted to log the name of the method that was calling? Or you wanted to know which line in which source file is responsible for calling a certain piece of code? All of this can be done with the Caller Information feature.

  And it is easy enough to use, just decorate a method parameter with an explicit default value with any of these three attributes:

The parameter value, if not set when calling the method, will be filled in with the member name or file name or line number. It's something that the compiler does, so no overhead from reflection. Even better, it works on the caller of the method, not the interior of the method. Imagine you had to write a piece of code to do the same. How would you reference the name of the method calling the method you are in?

Example from Microsoft's site:

public void DoProcessing()
{
    TraceMessage("Something happened.");
}

public void TraceMessage(string message,
        [System.Runtime.CompilerServices.CallerMemberName] string memberName = "",
        [System.Runtime.CompilerServices.CallerFilePath] string sourceFilePath = "",
        [System.Runtime.CompilerServices.CallerLineNumber] int sourceLineNumber = 0)
{
    System.Diagnostics.Trace.WriteLine("message: " + message);
    System.Diagnostics.Trace.WriteLine("member name: " + memberName);
    System.Diagnostics.Trace.WriteLine("source file path: " + sourceFilePath);
    System.Diagnostics.Trace.WriteLine("source line number: " + sourceLineNumber);
}

// Sample Output:
//  message: Something happened.
//  member name: DoProcessing
//  source file path: c:\Visual Studio Projects\CallerInfoCS\CallerInfoCS\Form1.cs
//  source line number: 31

  So you want to use a queue, a structure that has items added at one side and removed on another, in Javascript code. Items are added to the tail of the queue, while they are removed at the head. We, Romanians, are experts because in the Communist times resources were scarce and people often formed long queues to get to them, sometimes only on the basis of rumour. They would see a line of people and ask "Don't they have meat here?" and the answer would come "No, they don't have milk here. It's next building they don't have meat at". Anyway...

  There is an option that can be used directly out of the box: the humble array. It has methods like .push (add an item), .pop (remove the latest added item - when you use it as a stack) and .shift (remove the oldest added item - when you use it as a queue). For small cases, that is all you need.

  However, I needed it in a high performance algorithm and if you think about it, removing the first element of an array usually means shifting (hence the name of the function) all elements one slot and decreasing the length of the array. Consider a million items array. This is not an option.

  One of the data structure concepts we are taught at school is the linked list. Remember that? Each item has a reference to the next (and maybe the previous) item in the list. You explore it by going from one item to the next, without indexing, and you can remove any part of the list or add to any part of the list just by changing the value of these references. This also means that for each value you want stored you have the value, the reference(s) and the overhead of handling a more complex data object. Again, consider a million numbers array. It's not the right fit for this problem.

  Only one option remains: still using an array, but moving the start and the end of the array in an abstract manner only, so that all queue/dequeue operations take no effort. This means keeping a reference to the tail and the head of the queue in relation to the length of the queue and of the underlying array.

  But first let's establish a baseline. Let's write a test and implement a queue using the default array pop/shift implementation:

// the test
const size = 100000;
const q=new Queue();
time(()=> { for (let i=0; i<size; i++) q.enqueue(i); },'Enqueue '+size+' items');
time(()=> { for (let i=0; i<size; i++) q.dequeue(i); },'Dequeue '+size+' items');
time(()=> { for (let i=0; i<size/10; i++) {
	for (let j=0; j<10; j++) q.enqueue(i);
	for (let j=0; j<9; j++) q.dequeue(i);
} },'Dequeue and enqueue '+size+' items');

// the Queue implementation
class Queue {
  constructor() {
    this._arr = [];
  }

  enqueue(item) {
    this._arr.push(item);
  }

  dequeue() {
    return this._arr.shift();
  }
}

// the results
Enqueue 100000 items, 10ms
Dequeue 100000 items, 1170ms
Dequeue and enqueue 100000 items, 19ms

  The Enqueue operation is just adding to an array, enqueuing and dequeuing by leaving an item at ever series of dequeues is slightly slower, as the amount of array shifting is negligible. Dequeuing, though, is pretty heavy. Note that increasing just a little bit the amount of items leads to an exponential increase in time:

Enqueue 200000 items, 12ms
Dequeue 200000 items, 4549ms
Dequeue and enqueue 200000 items, 197ms

  Now let's improve the implementation of the queue. We will keep enqueue using Array.push, but use a _head index to determine which items to dequeue. This means faster speed, but the queue will never shorten. It's the equivalent of Romanians getting their product, but remaining in the queue.

// the Queue implementation
class Queue {
  constructor() {
    this._arr = [];
    this._head = 0;
  }

  enqueue(item) {
    this._arr.push(item);
  }

  dequeue() {
    if (this._head>=this._arr.length) return;
    const result = this._arr[this._head];
    this._head++;
    return result;
  }
}

// the results
Enqueue 200000 items, 11ms
Dequeue 200000 items, 4ms
Dequeue and enqueue 200000 items, 11ms

  The performance has reached the expected level. Dequeuing is now even faster than enqueuing because it doesn't need to expand the array as items are added. However, for all scenarios the queue is only growing, even when dequeuing all the items. What I can do is reuse the slots of the dequeued items for the items to be added. Now it gets interesting!

  My point is that right now we can improve the functionality of our queue by replacing dequeued but still stored items with newly enqueued items. That is the equivalent of Romanians leaving the queue only after they get the meat and a new Romanian comes to take their place. If there are more people coming than getting served, then people that got their meat will all leave and we can just add people to the tail of the queue.

  So let's recap the algorithm:

  • we will use an array as a buffer
  • the queue items start at the head and end at the tail, but wrap around the array buffer
  • whenever we add an item, it will be added in the empty space inside the array and the tail will increment
  • if there is no empty space (queue length is the same as the array length) then the array will be rearranged so that it has space for new itms
  • when we dequeue, the item at the head will be returned and the head incremented
  • whenever the head or tail reach the end of the array, they will wrap around

Some more improvements:

  • if we enqueue a lot of items then dequeue them, the array will not decrease until we dequeue them all. An improvement is to rearrange the array whenever the queue length drops below half of that of the array. It will add computation, but reduce space.
  • when we make space for new items (when the array size is the same as the one of the logical queue) we should add more space than just 1, so I will add the concept of a growth factor and the smallest size increase.

Here is the code:

/**
 * A performant queue implementation in Javascript
 *
 * @class Queue
 */
class Queue {

    /**
     *Creates an instance of Queue.
     * @memberof Queue
     */
    constructor() {
        this._array = [];
        this._head = 0;
        this._tail = 0;
        this._size = 0;
        this._growthFactor = 0.1;
        this._smallestSizeIncrease = 64;
    }

    /**
     * Adding an iterator so we can use the queue in a for...of loop or a destructuring statement [...queue]
     */
    *[Symbol.iterator]() {
        for (let i = 0; i < this._size; i++) {
            yield this.getAt(i);
        }
    }

    /**
     * Returns the length of the queue
     *
     * @readonly
     * @memberof Queue
     */
    get length() {
        return this._size;
    }

    /**
     * Get item based on item in the queue
     *
     * @param {*} index
     * @returns
     * @memberof Queue
     */
    getAt(index) {
        if (index >= this._size) return;
        return this._array[(this._head + index) % this._array.length];
    }

    /**
     * Gets the item that would be dequeued, without actually dequeuing it
     *
     * @returns
     * @memberof Queue
     */
    peek() {
        return this.getAt(0);
    }

    /**
     * Clears the items and shrinks the underlying array
     */
    clear() {
        this._array.length = 0;
        this._head = 0;
        this._tail = 0;
        this._size = 0;
    }

    /**
     * Adds an item to the queue
     *
     * @param {*} obj
     * @memberof Queue
     */
    enqueue(obj) {
        // special case when the size of the queue is the same as the underlying array
        if (this._size === this._array.length) {
            // this is the size increase for the underlying array
            const sizeIncrease = Math.max(this._smallestSizeIncrease, ~~(this._size * this._growthFactor));
            // if the tail is behind the head, it means we need to move the data from the head to 
            // the end of the array after we increase the array size
            if (this._tail <= this._head) {
                const toMove = this._array.length - this._head;
                this._array.length += sizeIncrease;
                for (let i = 0; i < toMove; i++) {
                    this._array[this._array.length - 1 - i] = this._array[this._array.length - 1 - i - sizeIncrease];
                }
                this._head = (this._head + sizeIncrease) % this._array.length;
            }
            else
            // the array size can just increase (head is 0 and tail is the end of the array)
            {
                this._array.length += sizeIncrease;
            }
        }
        this._array[this._tail] = obj;
        this._tail = (this._tail + 1) % this._array.length;
        this._size++;
    }

    /**
     * Removed the oldest items from the queue and returns it
     *
     * @returns
     * @memberof Queue
     */
    dequeue() {
        if (this._size === 0) {
            return undefined;
        }
        const removed = this._array[this._head];
        this._head = (this._head + 1) % this._array.length;
        this._size--;
        // special case when the size of the queue is too small compared to the size of the array
        if (this._size > 1000 && this._size < this._array.length / 2 - this._smallestSizeIncrease) {
            if (this._head<this._tail) {
                this._array = this._array.slice(this._head,this._tail);
            } else {
                this._array=this._array.slice(this._head, this._array.length).concat(this._array.slice(0,this._tail));
            }
            this._head = 0;
            this._tail = 0;
        }   
        return removed;
    }
}

  Final notes:

  • there is no specification on how an array should be implemented in Javascript, therefore I've used the growth factor concept, just like in C#. However, according to James Lawson, the array implementation is pretty smart in modern Javascript engines, we might not even need it.
  • the optimization in dequeue might help with space, but it could be ignored if what you want is speed and don't care about the space usage
  • final benchmarking results are:
    Enqueue 200000 items, 15ms, final array size 213106
    Dequeue 200000 items, 19ms, final array size 1536
    Dequeue and enqueue 200000 items, 13ms, final array size 20071

and has 0 comments

  While on the road with his mother and baby brother, a ten year old prince is attacked by an enemy armed group. Thrown into a patch of thorns from where he could not move, only watch, he sees his mother defiled and killed and his brother smashed on a rock like a toy. He vows vengeance. Such a classic story, right? Only we see him a few years later, leading a band of brigands, murdering and looting and raping, his vengeance all but forgotten and replaced by a desire to unite all the hundred little states warring against each other. Well, more interesting, but still pretty classic, right? Nope, stuff still happens that makes the lead character (and you) doubt his thoughts and the true nature of reality and retroactively explains some of the more incredulous questions that the reader is asking.

  I would say Prince of Thorns is all about revealing layers of this world that Mark Lawrence is still shaping. I quite liked that. The first book sets things up, but it is not a setup book. It is filled with action. Nor does it tell us everything, leaving a lot to be explored in the next books in the series. That's something that is sorely missing in many modern stories. In order to enjoy the book, though, you have to suspend your disbelief when it tells of an eleven year old boy smashing heads, swinging swords and leading men. Yes, in feudal times being 11 is the time to have a midlife crisis, but it is all a little bit too much for a child.

  It is a game of thrones kind of book, but mercifully from the standpoint of a single character. There is not a lot of lore, but there is magic and a mysterious connection to an advanced but now dead civilisation, plenty of violence and strategy. I will probably read the next books in the series.

  While working on my pet project Linqer (LINQ for Javascript and Typescript) I've spent quite a lot of time improving the performance of the Quicksort algorithm I am using for .orderBy. Therefore I am publishing it here, even if you could extract it just the same from the Linqer sources, with limited discussion on what is going on.

  Why

  First, why use it at all? Doesn't Javascript have the .sort method in the Array class? What's wrong with that?

  The answer is that the implementation for sort is different from browser to browser, or better said, from Javascript engine to Javascript engine. In Chrome, for example, the algorithm used is insertion sort, which is simple, in place, stable and reasonably fast. It is optimized for the most common usage: small arrays that need to be sorted for UI purposes and such. However, when using large arrays, the algorithm doesn't perform as well as one might expect.

  For Linqer I had an additional reason, because I would use ordering followed by skip and take methods that limited the scope of the need for sorting. Imagine a million items array that I wanted ordered and then needed the first ten items. Sorting the entire thing for just ten items would have been overkill. The default .sort function doesn't have parameters for such scenarios.

  And there is another reason: the default function used to compare array items is alphanumeric. [1, 2, 10] would get ordered as [1, 10, 2].

  Second, why Quicksort? There are a bunch of sorting algorithms out there. Mergesort, Heapsort, Radixsort, Timsort, Selectionsort. What's so special about Quicksort.

  I have to admit that I went for it by googling fast sorting algorithm. It does have "quick" in the name, doesn't it? I also found it elegant and easy to comprehend. And for my particular scenario, I liked that it used a divide et impera strategy which allowed me to ignore parts of the array if I didn't need the items there. In other words, it's very well suited both as a general sorting algorithm and a partial sorting algorithm.

  What

  I would like to tell you that it's simple to explain what Quicksort does, but it requires some amount of attention and time. In general terms, it chooses an arbitrary item (called a pivot) then orders the remaining items relative to the pivot, in two so called partitions: the smaller items to the left, the larger to the right. Then it repeats the process for each of the two sides. How the pivot is chosen and how the partitions are handled is what differentiates Quicksort algorithms and determines their performance.

  It is an in place algorithm, meaning it doesn't copy the array in some other type of structure and instead it moves items around inside it. It is not a stable algorithm, meaning the order of "equal" items is not preserved. The average computational complexity is O(n log n), with the worst cases O(n^2). The space complexity is harder to determine. Most people say it is O(1) because it uses no additional data structures, but that is not really correct. Being a recursive algorithm, the call stack gets used quite a lot, an invisible storage that should be computed in the data complexity.

  Unfortunately, the worst case scenarios are also very common: already sorted arrays and arrays filled with the same value. There are various optimizations to be used in order to handle this sort of thing. Also, Quicksort is efficient with large quantities of data, but less so with small numbers of items.

  How

  Finally, we get to the code. The _quicksort function receives:

  • an array
  • left and right index values determining the inclusive area that will be sorted (usually 0 and array.length-1)
  • a comparer function (item1,item2)=> 1, 0 or -1 and that defaults to _defaultComparer which tries to sort items based on the > and < operators
  • min and max index values determining the window of the array that we need to have sorted

The left and right indexes determine which section (before the sort) of the array will be sorted, the min and max indexes determine which items I am interested in (after the sort). This allows me to skip ordering partitions that are outside my area of interest.

As I said, the pivot choice is important. Some strategies are very popular:

  • the last item in the array as the pivot
    • this is the strategy used in the original incarnation of Quicksort
    • leads to very poor performance when the array is already sorted
  • the median item
    • this suggests parsing the array in order to get the value, implying extra computation
    • it only makes sense when the values in the array are numbers
  • the average between the first, the last and the middle item
    • it only makes sense when the values in the array are numbers
  • the item that is in the middle of the array
    • this is the one that I am using
  • a random item in the array
    • this makes the algorithm escape scenarios where the performance would be bad
    • the outcome of the sorting is unpredictable in terms of time used and stability of items
  • multiple pivots
    • an interesting concept, but one that complicated the algorithm too much for comfort

Then there is the matter of the partitioning. I've used an optimization that involves two indexes, one at the start the other at the end of a partition, coming toward each other and swapping items that are on the wrong side of the pivot. In some implementations, if the pivot is the last item, the partitioning is from one side only. In others, multiple indexes are used to handle multiple pivots.

In most implementations, the algorithm recurses on _quicksort, but I refactored it to only recurse on the partitioning. Then, because I didn't want to get stack overflows when bad data was used, I've eliminated the recursion and instead used a stack of my own where the partitions to be sorted are stored and wait their turn. This is where the data complexity comes around. In my case I was using a little more data than I actually needed, because I was adding partitions to the stack and also incrementing the index of the current partition, meaning the stack array grew with handled partitions. Even if there is no computation performance benefit, I've optimized this as well by adding a queueIndex which is used to recycle the slots in the partition array that are behind the partitionIndex. New partitions are being added behind the partitionIndex and queueIndex is incremented. When the loop reaches the last partition in the stack, a new loop is started with the partitions from 0 to queueIndex. Thus, for a ten million item array the partition stack rarely goes above 40000 in length.

A further optimization is to use insertion sort on partitions that have become too small (under 64 items). It irks me to have had to do this, I would have liked to use a "pure" algorithm, but this improved the performance and minimized the size of the partition stack.

The Code

That's about it. Here is the code:

    function _insertionsort(arr, leftIndex, rightIndex, comparer) {
        for (let j = leftIndex; j <= rightIndex; j++) {
            const key = arr[j];
            let i = j - 1;
            while (i >= leftIndex && comparer(arr[i], key) > 0) {
                arr[i + 1] = arr[i];
                i--;
            }
            arr[i + 1] = key;
        }
    }
    function _swapArrayItems(array, leftIndex, rightIndex) {
        const temp = array[leftIndex];
        array[leftIndex] = array[rightIndex];
        array[rightIndex] = temp;
    }
    function _partition(items, left, right, comparer) {
        const pivot = items[(right + left) >> 1];
        while (left <= right) {
            while (comparer(items[left], pivot) < 0) {
                left++;
            }
            while (comparer(items[right], pivot) > 0) {
                right--;
            }
            if (left < right) {
                _swapArrayItems(items, left, right);
                left++;
                right--;
            }
            else {
                if (left === right)
                    return left + 1;
            }
        }
        return left;
    }
    const _insertionSortThreshold = 64;
    function _quicksort(items, 
                        left, right, comparer = _defaultComparer,
                        minIndex = 0, maxIndex = Number.MAX_SAFE_INTEGER) {
        if (!items.length)
            return items;
        const partitions = [];
        partitions.push({ left, right });
        while (partitions.length) {
            ({ left, right } = partitions.pop());
            if (right - left < _insertionSortThreshold) {
                _insertionsort(items, left, right, comparer);
                continue;
            }
            const index = _partition(items, left, right, comparer);
            if (left < index - 1 && index - 1 >= minIndex) {
                partitions.push({ left, right: index - 1 });
            }
            if (index < right && index < maxIndex) {
                partitions.push({ left: index, right });
            }
        }

        return items;
    }
    _defaultComparer = (item1, item2) => {
        if (item1 > item2)
            return 1;
        if (item1 < item2)
            return -1;
        return 0;
    };

  I've accepted the old man should teach me as the only solution to becoming a champion, but it is hard to swallow it. He is very old, but mischievous, so whenever I try to learn something from him, he kicks me to the ground. He tricks me again and again and again. I am frustrated, but I am trying to keep my cool. I am strong. If I were to really fight him, he might be smart, but every attack would break bone and then what good would he be? Just a bag of meat and broken shards. I close my eyes, I breath, I tell myself it is worth it.

  The old man apologizes and offers me a hand, I take it, only to be kicked in the ass and thrown into a jumble of debris. I lose my temper and stomp away. He doesn't understand. Getting angry at him is pointless, hurting him futile. I have nothing to learn from him. I walk through the old grounds of my conquests, now just the walled in and decrepit underground of the large arena above. I feel a presence behind me and I see the old man is following me, eyes to the ground. Contrition? Surely another of his tricks. "Begone!" I roar at him, but he goes to his knees and kowtows in front of me, his hands touching my feet. I feel tears swelling up in my eyes. He might as well be a little boy asking for forgiveness. Just who is the teacher and who is the student? Who is the adult here?

  "How did you get to a hundred years or whatever behaving like a little kid?! You are a child!" I shout at him in admonishment. I look around and ghosts of my past awaken my anguish. I feel my face contort into a painful grin as my tears flow freely. "Every week I was coming here to murder people!", I rage, my voice barely my own, a booming, low, animal growl, my expression that of an enraptured madman, for sure. "I would stake my life every time and I would leave, alive, every time!". The images of old fights flash before my wet blurred vision and I imagine that some of the painted white walls might contain some of the scrolls of the ancient arts, built over by a world that doesn't get it anymore. "I loved it!", I say, walking in the dead halls, every step a pulse of power overlaying glorious past over grey reality. My body is shaking with now uncontrollable weeping. "I killed so many people and I miss it... so.... very... MUCH!".

  Does he get it now, I ask myself? Has he even an inkling of the power he needs to teach me to control? I burst through the door to the surface and climb the stairs that get me to the arena above. The seats are packed with oblivious spectators, all watching some performance I don't even care to notice. I breathe in the fresh air and feel better. Ready to come to a final understanding with the old man, if he is capable of it ,I turn around. There is little time and we should not fight each other. But the old man is gone.

   I strain my eyes into the darkness of the stairs and I feel it, The Beast, the adversary I need to fight is there. He's got the old man and, even if I cannot see it, I know it is there, all cunning, fury and power. My body roars by itself, a predator sound, strong and fearless, no sound a man should ever be able to make. The arena spectators panic in surprised horror, but I ignore them. I jump into the darkness with animal strength. I will fight this beast, I will meet it head on, I will be the most savage, alone I will remain alive.

and has 0 comments

  The Grace of Kings feels long from the very start. Ken Liu is starting off from a fictional empire of seven islands, but it might as well have been a historical book. Everything is mostly realistic, with very human characters that do what human characters do: harm and kill other people and find rationalizations for it. Some of them are heroic and occasionally think of other people, too.

  Half way through the book (which is one of a trilogy, of course) I couldn't keep up with all the characters that kind of did the same thing, the long expositions of why people did stupid or horrible things to others and the various anecdotes that made some of the characters heroes or villains. And I call them anecdotes because that's what they feel like: short moments that disrupt rather than enforce the long and unfortunately boring history of the realm.

  Bottom line, it feels like a Chinese Game of Thrones, with less interesting characters and no magic as of yet. It's not badly written, quite the contrary, but its subject is long winding and doesn't interest me. I will therefore abandon reading it.

So I was trying to optimize a sorting algorithm, only the metrics did not make any sense. On the test page I had amazing performance, on the other it was slow as hell. What could possibly be the problem?

The difference between the two tests was that one was sorting inline (normal array reading and writing) and the other was using a more complex function and iterated a data source. So I went to test the performance of iterating itself.

The code tests the speed of adding all the items of a large array in three cases:

  • a classic for...in loop that increments an index and reads the array at that index
  • a for...of loop that iterates the items of the array directly
  • a for...of loop that iterates over a generator function that yields the values of the array
time(()=>{ let sum=0; for (let i=0; i<arr.length; i++) sum+=arr[i]; },'for in');
time(()=>{ let sum=0; for (const v of arr) sum+=v; },'iterator for of');
time(()=>{ let sum=0; for (const v of (function*(){ for (let i=0; i<arr.length; i++) yield arr[i]; })()) sum+=v; },'generator for of');

time is a function I used to compute the speed of the execution. The array is 100 million integers. And here are the results:

for in: 155.12999997008592
for of: 1105.7250000303611
for of: 2309.88499999512

I haven't yet processed what it means, but I really thought using an iterator was going to be at least as fast as a for loop that uses index accessing to read values. Instead there is a whooping 7 to 14 times decrease in speed.

So from now on I will avoid for...of in high performance scenarios.

  I guess I don't have to tell you about ad blockers and browser extensions that improve YouTube. They are a dime a dozen and bring many features to the habitual YouTube watcher. However there is one particular new YouTube annoyance that you don't really need an extension to get rid of: the dreaded Video paused dialog.

  To get rid of it is easy: on an interval, check if there is a visible element of a certain type containing a certain text and click its button. While this can be done in simple Javascript, I am lazy, so the script that I am using will first load jQuery, then run a one line function periodically. This code is to be copied and pasted in the Console tab of the browser's development tools, then press Enter.

const scr = document.createElement('script');
scr.setAttribute('src','https://code.jquery.com/jquery-3.4.1.min.js');
document.querySelector('head').appendChild(scr);
setInterval(()=> { 
  $('#button:contains(Yes)','yt-confirm-dialog-renderer:visible:contains(Video paused)').click();
},500);

It's easy to understand:

  • create a script element
  • set its source to jQuery
  • append it to the page
  • execute every 500 milliseconds a code that:
    • finds the element with id button containing the text "Yes"
    • inside an element of type yt-confirm-dialog-renderer which is visible and contains the text "Video paused"
    • click the element

There is an even more comfortable solution, though, that I recommend. You will need a Chrome extension called cjs that loads whatever script you tell it in whatever page you want. It gives you the option to inject jQuery, so all you have to do is write 

setInterval(()=> { $('#button:contains(Yes)','yt-confirm-dialog-renderer:visible:contains(Video paused)').click(); },500);

 as the script to be executed on YouTube.

That's it. You're all done.

and has 0 comments

  Wakenhyrst is very well written, but where it excels is the dissection of the hypocrisy of people. Michelle Paver is telling the story from the viewpoint of a young girl who must navigate the world and her own adolescence in the house of a father that has no love for her or for her mother, finds every reason to blame others for his shortcomings and deeds, and yet is untouchable because he is a man and the lord of the manor. What legions of screeching feminists could not do, Paver manages with her subdued, yet defiant description of how women are used and ignored and pretty much treated as glorified pets. It is impossible to not hate the father figure in the book, even as the main character is torn between wanting to forgive him and dealing with the creepy and sometimes evil shit he pulls. The ending is powerful, as the daughter finds the strength to sublimate her hate into an even more appropriate emotion: pity.

  But the story's power is not limited to the detailed analysis of the human psyche. It binds together Anglican folklore, medieval beliefs about devils and angels and art, whitewashed (in the actual sense of the term) by Puritans and systematically destroyed by Victorians, the power of untamed nature and the horror of the human complacency. How refreshing to have a very young girl be the rational and intelligent agent that fends for herself in a world of mystical belief and societal poppycock, so that we can identify with her and see it as it was. How wonderful to have Paver describe it all without any trace of anachronism, as if she has lived in that world herself.

  The story starts slow and the pace almost never picks up, yet the tension and the level of details are constantly increasing, managing to somehow convey at the same time two distinct and contrary feelings: one of slow burn and the other of untamed power rising to a crescendo. It brilliantly mingles the oppressive hot wet feel of subconscious fear and superstition with cold analytical reason as its adversary. In the beginning I wanted to rate it above average only, but now, the more I think about it the more I admire the writing and the way the book tells the story. Good job, Michelle Paver!

  Bottom line: move past the slower start, it is certainly worth reading. A gothic tale of subliminal supernatural horror and a very human and real one at the same time.

and has 0 comments

  Salvation Lost is the second book in the Salvation Sequence trilogy from Peter F. Hamilton. I was commenting on the previous book saying that it is mostly filler and action that is irrelevant to the larger story. This book is a lot more action packed and a bit more interesting, but ultimately just as pointless. A lot of characters that will only get relevant in the third book, if at all, a lot of stories that happen in the past as we know what is going 10000 years into the future and no closure on anything. The horror of the alien invasion is powerful, but not as much as it could have been. The story invests so much in some people only to kill them later with no apparent effect on the timeline of events.

  Bottom line: I will probably read the last book, scheduled sometime in Sep 2020, but I feel this series is one of Hamilton's weakest.

Just a thing I learned today: using the bitwise not operator (~) on a number in Javascript ignores its fractional part (it converts it to integer first), therefore using it twice gives you the integer part of original number. Thanks to fetishlace for clarifications.

Notes:

  • this is equivalent to (int)number in languages that support the int type
  • this is equivalent to Math.trunc for numbers in the integer range
  • this is equivalent to Math.floor only for positive numbers in the integer range

Examples:
~~1.3 = 1
~~-6.5432 = -6
~~(2 ** 32 + 0.5) = 0
~~10000000000 = 1410065408

P.S. If you don't know what ** is, it's called the Exponentiation operator and it came around in Javascript ES2016 (ES7) and it is the equivalent of Math.pow. 2 ** 3 = Math.pow(2,3) = 8

and has 0 comments

  I consider Peter F. Hamilton to be one of the great science fiction writers. Yes, he has a formula, yes he messes up the endings, but the ideas and worlds that he puts on paper have rarely disappointed me. I can't say that of Salvation, either, but I didn't especially like it, as it gives away too much too soon, then proceeds on boring or enraging the reader with police procedural vignettes that we already know will have no impact on future events.

  Hamilton has this method of combining at least two threads, usually one is hard science fiction and the other is fantasy or police procedural or something different, only to bind them together at some point in time. In this book, we see a group of people running away from an enemy species bent on exterminating humanity and also a peaceful future in which humanity has discovered how to create instantaneous transport portals to other places and was contacted by two different alien species. Somehow, how we get from one point to the other is the topic of the book, but the vast majority of it is about corporate security people that abuse their power to "get things done" or toxic cleanup people or other kinds of short stories that only bring some new information to light while eating up reading time. Then the book ends!

  And it's a bit annoying that even the technical aspects don't add up, like how you can know how to make portals, but you still rely on nuclear weapons or rockets to do war, or that people fear sabotage and terrorism, but don't see the possible threat posed by personal portals to other worlds. Gravity alterations, atmosphere loss or pollution, a portal from a star to an inhabited area would have been much more dangerous. Also some of the ways characters act are completely unnatural and it feels jarring to see them do things in a sequence (heh!) only to further the story and not consistent to their character and expertise.

  Anyway, I am reading Salvation Lost now, but in my view the first book in this series could have been a lot more, especially from a brilliant writer such as Hamilton.

and has 2 comments

  Sometimes a good regular expression can save a lot of time and lead to a robust, yet flexible system that works very efficiently in terms of performance. It may feel like having superpowers. I don't remember when exactly I've decided they were useful, but it happened during my PHP period, when the Linux command line system and its multitude of tools made using regular expressions a pretty obvious decision. Fast forward (waaaay forward) and now I am a .NET developer, spoiled by the likes of Visual Studio and the next-next approach to solving everything, yet I still use regular expressions, well... regularly, sometimes even when I shouldn't. The syntax is concise, flexible and easy to use most of the time.

  And yet I see many senior developers avoiding regular expressions like they were the plague. Why is that? In this post I will argue that the syntax makes a regular pattern be everything developers hate: unreadable and hard to maintain. Having many of the characters common in both XML and JSON have special meaning doesn't help either. And even when bent on learning them, having multiple flavors depending on the operating system, language and even the particular tool you use makes it difficult. However, using small incremental steps to get the job done and occasionally look for references to less used features is usually super easy, barely an inconvenience. As for using them in your code, new language features and a few tricks can solve most problems one has with regular expressions.

 The Syntax Issue

The most common scenario for the use of regular expressions is wanting to search, search and replace or validate strings and you almost always start with a bunch of strings that you want to match. For example, you want to match both dates in the format 2020-01-21 and 21/01/2020. Immediately there are problems:

  • do you need to escape the slashes?
  • if you match the digits, are you going for two digit month and day segments or do you also accept something like 21/1/2020?
  • is there the possibility of having strings in your input that look like 321/01/20201, in which case you will still match 21/01/2020, but it's clearly not a date?
  • do you need to validate stuff like months being between 1-12 and days between 1-31? Or worse, validate stuff like 30 February?

But all of these questions, as valid as they are, can be boiled down to one: given my input, is my regular expression matching what I want it to match? And with that mindset, all you have to do is get a representative input and test your regular expression in a tester tool. There are many out there, free, online, all you have to do is open a web site and you are done. My favourite is RegexStorm, because it tests .NET style regex, but a simple Google search will find many and varied tools for reading and writing and testing regular expressions.

The syntax does present several problems that you will hit every time:

  • you cannot reuse parts of the pattern
    • in the example above, even if you have clearly three items that you look for - year, month, day - you will need to copy/paste the pattern for each variation you are looking for
  • checking the same part of the input string multiple times is not what regular expressions were created for and even those that support various methods to do that do it in a hackish way
    • example: find a any date in several formats, but not the ones that are in March or in 2017
    • look behind and look ahead expressions are usually used for scenarios like this, but they are not easy to use and reduce a lot of the efficiency of the algorithm
  • classic regular expression syntax doesn't support named groups, meaning you often need to find and maintain the correct index for the match
    • what index does one of the capturing groups have?
    • if you change something, how do other indexes change?
    • how do you count groups inside groups? Do they even count if they match nothing?
  • the "flags" indicating how the regular engine should interpret the pattern are different in every implementation
    • /x/g will look for all x characters in the string in Javascript
    • C# doesn't even need a global flag and the flags themselves, like CaseInsensitive, are .NET enums in code, not part of the regular pattern string
  • from the same class of issues as the two above (inconsistent implementation), many a tool uses a regular expression syntax that is particular to it
    • a glaring example is Visual Studio, which does not use a fully compatible .NET syntax

  The Escape Issue

The starting point for writing any regular expression has to be a regex escaped sample string that you want to match. Escaping means telling the regular expression engine that characters from your sample string that have meaning to it are just simple characters. Example 12.23.34.56, which is an IP address, if used exactly like that as a regular pattern, will match 12a23b34c56, because the dot is a catch all special character for regular expressions. The pattern working as expected would be 12\.23\.34\.56. Escaping brings several severe problems:

  • it makes the pattern less humanly readable
    • think of a phrase in which all white space has been replaced with \s+ to make it more robust (it\s+makes\s+the\s+pattern\s+less\s+humanly\s+readable)
  • you only escape with a backslash in every flavor of regular expressions, but the characters that are considered special are different depending on the specific implementation
  • many characters found in very widely used data formats like XML and JSON are special characters in regular expressions and the escaping character for regex is a special character in JSON and also string in many popular programming languages, forcing you to "double escape", which magnifies the problem
    • this is often an annoyance when you try to store regular patterns in config files

  Readability and Maintainability

Perhaps the biggest issue with regular expressions is that they are hard to maintain. Have you ever tried to understand what a complex regular expression does? The author of the code started with some input data and clear objectives of what they wanted to match, went to regular expression testers, found what worked, then dumped a big string in the code or configuration file. Now you have neither the input data or the things that they wanted matched. You have to decompile the regular pattern in your head and try to divine what it was trying to do. Even when you manage to do that, how often do developers redo the testing step so they verify the changes in a regular expressions do what was intended?

Combine this with the escape issue and the duplication of subpatterns issue and you get every developer's nightmare: a piece of code they can't understand and they are afraid to touch, one that is clearly breaking every tenet of their religion, like Don't Repeat Yourself or Keep It Simple Silly, but they can't change. It's like an itch they can't scratch. The usual solution for code like that is to unit test it, but regular expression unit tests are really really ugly:

  • they contain a lot of text variables, on many lines
  • they seem to test the functionality of the regular expression engine, not that of the regular expression pattern itself
  • usually regular expressions are used internally, they are not exposed outside a class, making it difficult to test by themselves

  Risk

Last, but not least, regular expressions can work poorly in some specific situations and people don't want to learn the very abstract computer science concepts behind regular expression engines in order to determine how to solve them.

  • typical example is lazy modifiers (.*? instead of .*) which tell the engine to not be greedy (get the least, not the most)
    • ex: for input "ineffective" the regular expression .*n will work a lot worse than .*?n, because it will first match the entire word, then see it doesn't end with n, then backtrack until it gets to "in" which it finally matches. The other syntax just stops immediately as it finds the n.
  • another typical example is people trying to find the HTML tag that has a an attribute and they do something like \<.*href=\"something\".*\/\> and what it matches is the entire HTML document up to a href attribute and the end of the last atomic tag in the document.
  • the golden hammer strikes again
    • people start with a simple regular expression in the proof of concept, they put it in a config file, then for the real life application they continuously tweak the configured pattern instead of trying to redesign the code, until they get to some horrible monstrosity
    • a regex in an online library that uses look aheads and look behinds solves the immediate problem you have, so you copy paste it in the code and forget about it. Then the production app has serious performance problems. 

  Solutions

  There are two major contexts in which to look for solutions. One is the search/replace situation in a tool, like a text editor. In this case you cannot play with code. The most you can hope for is that you will find a tester online that supports the exact syntax for regular expressions of the tool you are in. A social solution would be to throw shade on lazy developers that think only certain bits of regular expressions should be supported and implemented and then only in one particular flavor that they liked when they were children.

  The second provides more flexibility: you are writing the code and you want to use the power of regular expressions without sacrificing readability, testability and maintainability. Here are some possible solutions:

  • start with simple stuff and learn as you go
    • the overwhelming majority of the time you need only the very basic features of regular expressions, the rest you can look up when you need them
    • if the regular expression becomes too complex it is an indication that maybe it's not the best approach
  • store subpatterns in constants than you can then reuse using templated strings
    • ex: var yearPattern = @"(?<year>\d{4})"; var datePattern = $@"\b(?:{yearPattern}-(?<month>\d{2})-(?<day>\d{2})|(?<month>\d{2})\/(?<day>\d{2})\/{yearPattern})\b";
    • the example above only stores the year in another variable, but you can store the two different formats, the day, the month, etc
    • in the end your code might look more like the Backus-Naur (BNF) syntax, in which every separate component is described separately
  • use verbatim strings to make the patterns more readable by avoiding double escaping
    • in C# use @"\d+" not "\\d+"
    • in Javascript they are called template literals and use backticks instead of quotes, but they have the same mechanism for escaping characters as normal strings, so they are not a solution
  • use simple regular expressions or, if not, abstract their use
    • a good solution is using a fluent interface (check this discussion out) that allows the expressivity of human language for something that ends up being a regular expression
    • no, I am not advocating creating your own regular expression syntax... I just think someone probably already did it and you just have to find the right one for you :)
  • look for people who have solved the same problem with regular expression and you don't have to rewrite the wheel
  • always test your regular expressions on valid data and pay attention to the time it took to get the matches
  • double check any use of the string methods like IndexOf, StartsWith, Contains, even Substring. Could you use regular expressions?
    • note that you cannot really chain these methods. Replacing a regular expression like ^http[s]?:// with methods always involves several blocks of code and the introduction of cumbersome constants:
      if (text.StartsWith("http")) {
        // 4 works now, but when you change the string above, it stops working
        // you save "http" to a constant and then use .Length, you write yet another line of code
        text=text.Substring(4);
      } else {
        return false;
      }
      if (text.StartsWith("s")) {
        text=text.Substring(1);
      }
      return text.StartsWith("://");
      
      // this can be easily refactored to
      return text
               .IfStartsWith("http")
               .IfStartsWithOrIgnore("s")
               .IfStartsWith("://");
      // but you have to write your own helper methods
      // and you can't save the result in a config file​

  Conclusion

Regular expressions look daunting. Anyone not familiar with the subject will get scared by trying to read a regular expression. Yet most regular expression patterns in use are very simple. No one actually knows by heart the entire feature set of regular expressions: they don't need to. Language features and online tools can help tremendously to make regex readable, maintainable and even testable. Regular expressions shine for partial input validation and efficient string search and replace operations and can be easily stored in configuration files or data stores. When regular expressions become too complex and hard to write, it might be a sign you need to redesign your feature. Often you do not need to rewrite a regular expression, as many libraries with patterns to solve most common problems already exist.