Optimized general Quicksort in Javascript
While working on my pet project Linqer (LINQ for Javascript and Typescript) I've spent quite a lot of time improving the performance of the Quicksort algorithm I am using for .orderBy. Therefore I am publishing it here, even if you could extract it just the same from the Linqer sources, with limited discussion on what is going on.
Why
First, why use it at all? Doesn't Javascript have the .sort method in the Array class? What's wrong with that?
The answer is that the implementation of sort differs from browser to browser or, better said, from Javascript engine to Javascript engine. In Chrome, for example, older versions of the V8 engine used insertion sort for short arrays (and a Quicksort variant for longer ones), while newer versions use Timsort. Insertion sort is simple, in place, stable and reasonably fast, and it suits the most common usage: small arrays that need to be sorted for UI purposes and such. However, when using large arrays, the built-in sort may not perform as well as one might expect.
For Linqer I had an additional reason: I would use ordering followed by skip and take methods, which limit how much of the array actually needs sorting. Imagine a million-item array that I wanted ordered and then needed only the first ten items. Sorting the entire thing for just ten items would have been overkill. The default .sort function doesn't have parameters for such scenarios.
And there is another reason: by default, .sort converts the array items to strings and compares them alphabetically. [1, 2, 10] would get ordered as [1, 10, 2].
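A quick way to see this behavior is the snippet below. It only uses the standard Array.prototype.sort; the variable names are just for illustration.

```javascript
// Without a comparer, items are converted to strings and compared as text,
// so "10" comes before "2".
const byDefault = [1, 2, 10].sort();
console.log(byDefault); // [1, 10, 2]

// With an explicit numeric comparer the order is the expected one.
const numeric = [1, 2, 10].sort((a, b) => a - b);
console.log(numeric); // [1, 2, 10]
```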
Second, why Quicksort? There are a bunch of sorting algorithms out there: Mergesort, Heapsort, Radixsort, Timsort, Selectionsort. What's so special about Quicksort?
I have to admit that I went for it by googling fast sorting algorithm. It does have "quick" in the name, doesn't it? I also found it elegant and easy to comprehend. And for my particular scenario, I liked that it used a divide et impera strategy which allowed me to ignore parts of the array if I didn't need the items there. In other words, it's very well suited both as a general sorting algorithm and a partial sorting algorithm.
What
I would like to tell you that it's simple to explain what Quicksort does, but it requires some amount of attention and time. In general terms, it chooses an arbitrary item (called a pivot), then orders the remaining items relative to the pivot, in two so-called partitions: the smaller items to the left, the larger to the right. Then it repeats the process for each of the two sides. How the pivot is chosen and how the partitions are handled is what differentiates Quicksort algorithms and determines their performance.
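To make the idea concrete, here is a deliberately naive sketch of it. It is not the implementation discussed in this post: it allocates new arrays at every step instead of working in place, and the name naiveQuicksort is made up for the example.

```javascript
// Illustration only: pick a pivot, split the rest into "smaller" and
// "larger", then repeat the process on each side.
function naiveQuicksort(items) {
  if (items.length <= 1) return items;
  const pivotIndex = items.length >> 1;
  const pivot = items[pivotIndex];
  const smaller = [];
  const larger = [];
  for (let i = 0; i < items.length; i++) {
    if (i === pivotIndex) continue; // the pivot itself is placed between the sides
    (items[i] < pivot ? smaller : larger).push(items[i]);
  }
  return [...naiveQuicksort(smaller), pivot, ...naiveQuicksort(larger)];
}

const naiveResult = naiveQuicksort([5, 3, 8, 1, 9, 2]);
console.log(naiveResult); // [1, 2, 3, 5, 8, 9]
```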
It is an in place algorithm, meaning it doesn't copy the array into some other type of structure and instead moves items around inside it. It is not a stable algorithm, meaning the order of "equal" items is not preserved. The average computational complexity is O(n log n), with a worst case of O(n^2). The space complexity is harder to determine. Most people say it is O(1) because it uses no additional data structures, but that is not really correct. Because the algorithm is recursive, the call stack gets used quite a lot, and that invisible storage should be counted in the space complexity.
Unfortunately, with a naive pivot choice the worst case scenarios are also very common: already sorted arrays and arrays filled with the same value. There are various optimizations to be used in order to handle this sort of thing. Also, Quicksort is efficient with large quantities of data, but less so with small numbers of items.
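The effect can be measured with a small experiment. The sketch below is an assumption-laden illustration, not the code from this post: it uses a simple one-sided partition (the pivot is moved to the end of the range first) and a made-up countComparisons helper that reports how many comparisons a sort needed.

```javascript
// Sorts items in place with a one-sided partition and returns the number
// of comparisons made. pickPivot(left, right) returns the pivot index.
function countComparisons(items, pickPivot) {
  let comparisons = 0;
  const stack = [[0, items.length - 1]];
  while (stack.length) {
    const [left, right] = stack.pop();
    if (left >= right) continue;
    // Move the chosen pivot to the end of the range, then partition.
    const p = pickPivot(left, right);
    [items[p], items[right]] = [items[right], items[p]];
    const pivot = items[right];
    let store = left;
    for (let i = left; i < right; i++) {
      comparisons++;
      if (items[i] < pivot) {
        [items[i], items[store]] = [items[store], items[i]];
        store++;
      }
    }
    // Put the pivot in its final position, between the two sides.
    [items[store], items[right]] = [items[right], items[store]];
    stack.push([left, store - 1], [store + 1, right]);
  }
  return comparisons;
}

const n = 1000;
const sortedInput = () => Array.from({ length: n }, (_, i) => i);
const lastPivot = countComparisons(sortedInput(), (left, right) => right);
const middlePivot = countComparisons(sortedInput(), (left, right) => (left + right) >> 1);
const allEqual = countComparisons(new Array(n).fill(7), (left, right) => (left + right) >> 1);

console.log(lastPivot);   // 499500, which is n * (n - 1) / 2
console.log(middlePivot); // far fewer comparisons on the same sorted input
console.log(allEqual);    // 499500 again: this partition cannot split equal values
```

With the last item as the pivot, every partition of a sorted array puts all remaining items on one side, so the work adds up to n(n-1)/2 comparisons. The middle pivot avoids that on sorted input, but an array of identical values still degrades with this particular partition scheme.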
How
Finally, we get to the code. The _quicksort function receives:
- an array
- left and right index values determining the inclusive area that will be sorted (usually 0 and array.length-1)
- a comparer function (item1, item2) => 1, 0 or -1, which defaults to _defaultComparer, a function that tries to sort items based on the > and < operators
- min and max index values determining the window of the array that we need to have sorted
The left and right indexes determine which section (before the sort) of the array will be sorted, the min and max indexes determine which items I am interested in (after the sort). This allows me to skip ordering partitions that are outside my area of interest.
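The window idea can be sketched in a few lines. This is a simplified recursive version written for illustration, not the Linqer code: the name partialQuicksort is hypothetical, it compares numbers directly with < and >, and it assumes both minIndex and maxIndex are inclusive.

```javascript
// Only descends into partitions that overlap the window [minIndex, maxIndex].
// Partitions entirely outside the window are left unsorted.
function partialQuicksort(items, left, right, minIndex, maxIndex) {
  if (left >= right) return;
  const pivot = items[(left + right) >> 1];
  let i = left;
  let j = right;
  while (i <= j) {
    while (items[i] < pivot) i++;
    while (items[j] > pivot) j--;
    if (i <= j) {
      [items[i], items[j]] = [items[j], items[i]];
      i++;
      j--;
    }
  }
  // Here items[left..j] <= pivot <= items[i..right] and j < i.
  if (j >= minIndex && left <= maxIndex) {
    partialQuicksort(items, left, j, minIndex, maxIndex);
  }
  if (i <= maxIndex && right >= minIndex) {
    partialQuicksort(items, i, right, minIndex, maxIndex);
  }
}

// 1000 distinct numbers in a scrambled but deterministic order.
const data = Array.from({ length: 1000 }, (_, k) => (k * 7919) % 1000);
partialQuicksort(data, 0, data.length - 1, 0, 9);
console.log(data.slice(0, 10)); // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Only the first ten positions are guaranteed to hold the right items afterwards; the rest of the array is partitioned around various pivots but not fully ordered.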
As I said, the pivot choice is important. Some strategies are very popular:
- the last item in the array as the pivot
  - this is the strategy used in the original incarnation of Quicksort
  - leads to very poor performance when the array is already sorted
- the median item
  - this suggests parsing the array in order to get the value, implying extra computation
  - it only makes sense when the values in the array are numbers
- the average between the first, the last and the middle item
  - it only makes sense when the values in the array are numbers
- the item that is in the middle of the array
  - this is the one that I am using
- a random item in the array
  - this makes the algorithm escape scenarios where the performance would be bad
  - the outcome of the sorting is unpredictable in terms of time used and stability of items
- multiple pivots
  - an interesting concept, but one that complicated the algorithm too much for comfort
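Some of these strategies can be written as small functions that return the index of the pivot for a given range. The pivotStrategies object and its property names are made up for this sketch. The medianOfThree entry is a common variant of the "first, last and middle" idea that picks the median of the three items using only the comparer, rather than an arithmetic average.

```javascript
// Each strategy returns the index of the pivot for the range [left, right].
const pivotStrategies = {
  last: (items, left, right) => right,
  middle: (items, left, right) => (left + right) >> 1,
  random: (items, left, right) =>
    left + Math.floor(Math.random() * (right - left + 1)),
  medianOfThree: (items, left, right, comparer) => {
    const mid = (left + right) >> 1;
    const a = items[left];
    const b = items[mid];
    const c = items[right];
    if (comparer(a, b) < 0) {
      if (comparer(b, c) < 0) return mid;       // a < b < c
      return comparer(a, c) < 0 ? right : left; // c is between, or a is
    }
    if (comparer(a, c) < 0) return left;        // b <= a < c
    return comparer(b, c) < 0 ? right : mid;    // c is between, or b is
  },
};

const numbers = [9, 1, 5];
const byNumber = (x, y) => x - y;
console.log(pivotStrategies.last(numbers, 0, 2));                    // 2
console.log(pivotStrategies.middle(numbers, 0, 2));                  // 1
console.log(pivotStrategies.medianOfThree(numbers, 0, 2, byNumber)); // 2 (the value 5)
```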
Then there is the matter of the partitioning. I've used an optimization that involves two indexes, one at the start and the other at the end of a partition, coming toward each other and swapping items that are on the wrong side of the pivot. In some implementations, if the pivot is the last item, the partitioning is from one side only. In others, multiple indexes are used to handle multiple pivots.
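A small worked example may help to follow the two indexes. The function below repeats the logic of the _partition function from the code in this post, so that the snippet runs on its own; the sample array and the variable names are just for illustration.

```javascript
// Two indexes walk toward each other and swap items that are on the wrong
// side of the pivot. The returned index is where the right side starts.
function partition(items, left, right, comparer) {
  const pivot = items[(right + left) >> 1];
  while (left <= right) {
    while (comparer(items[left], pivot) < 0) left++;
    while (comparer(items[right], pivot) > 0) right--;
    if (left < right) {
      [items[left], items[right]] = [items[right], items[left]];
      left++;
      right--;
    } else if (left === right) {
      return left + 1;
    }
  }
  return left;
}

const sample = [7, 2, 9, 4, 1, 8, 3];
// The pivot is the middle item, 4. First 7 and 3 are swapped, then 9 and 1,
// then both indexes meet on the pivot itself.
const splitIndex = partition(sample, 0, sample.length - 1, (a, b) => a - b);
console.log(sample);     // [3, 2, 1, 4, 9, 8, 7]
console.log(splitIndex); // 4
```

Everything before index 4 is less than or equal to the pivot and everything from index 4 onward is greater than or equal to it, so the two sides can then be sorted independently.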
In most implementations, the algorithm recurses on _quicksort, but I refactored it to only recurse on the partitioning. Then, because I didn't want to get stack overflows when bad data was used, I've eliminated the recursion and instead used a stack of my own where the partitions to be sorted are stored and wait their turn. This is where the space complexity comes in. In my case I was using a little more data than I actually needed, because I was adding partitions to the stack and also incrementing the index of the current partition, meaning the stack array grew with handled partitions. Even if there is no computation performance benefit, I've optimized this as well by adding a queueIndex which is used to recycle the slots in the partition array that are behind the partitionIndex. New partitions are being added behind the partitionIndex and queueIndex is incremented. When the loop reaches the last partition in the stack, a new loop is started with the partitions from 0 to queueIndex. Thus, for a ten million item array the partition stack rarely goes above 40000 in length.
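The basic idea of replacing recursion with a stack of pending partitions can be sketched as follows. This is a minimal version with a plain push and pop, without the slot recycling described above; the name stackQuicksort and the maxStackLength counter are made up for the example, and numbers are compared directly with < and >.

```javascript
// Quicksort driven by an explicit stack of { left, right } ranges instead
// of recursive calls. Returns the largest stack length seen while sorting.
function stackQuicksort(items) {
  const stack = [];
  let maxStackLength = 0;
  if (items.length > 1) stack.push({ left: 0, right: items.length - 1 });
  while (stack.length) {
    maxStackLength = Math.max(maxStackLength, stack.length);
    const { left, right } = stack.pop();
    const pivot = items[(left + right) >> 1];
    let i = left;
    let j = right;
    while (i <= j) {
      while (items[i] < pivot) i++;
      while (items[j] > pivot) j--;
      if (i <= j) {
        [items[i], items[j]] = [items[j], items[i]];
        i++;
        j--;
      }
    }
    // Queue the two sides for later instead of recursing into them.
    if (left < j) stack.push({ left, right: j });
    if (i < right) stack.push({ left: i, right });
  }
  return maxStackLength;
}

// 10000 distinct numbers in a scrambled but deterministic order.
const values = Array.from({ length: 10000 }, (_, k) => (k * 7919) % 10000);
const maxStackLength = stackQuicksort(values);
console.log(values.slice(0, 5)); // [0, 1, 2, 3, 4]
console.log(maxStackLength);     // how many pending partitions were stored at most
```

Because the pending partitions live in an ordinary array, bad input can make the array grow but cannot overflow the call stack.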
A further optimization is to use insertion sort on partitions that have become too small (under 64 items). It irks me to have had to do this, as I would have liked to use a "pure" algorithm, but this improved the performance and minimized the size of the partition stack.
The Code
That's about it. Here is the code:
// Sorts arr[leftIndex..rightIndex] in place using insertion sort.
function _insertionsort(arr, leftIndex, rightIndex, comparer) {
    for (let j = leftIndex; j <= rightIndex; j++) {
        const key = arr[j];
        let i = j - 1;
        // Shift larger items one slot to the right to make room for key.
        while (i >= leftIndex && comparer(arr[i], key) > 0) {
            arr[i + 1] = arr[i];
            i--;
        }
        arr[i + 1] = key;
    }
}

function _swapArrayItems(array, leftIndex, rightIndex) {
    const temp = array[leftIndex];
    array[leftIndex] = array[rightIndex];
    array[rightIndex] = temp;
}

// Partitions items[left..right] around the middle item and returns the
// index where the right partition starts.
function _partition(items, left, right, comparer) {
    const pivot = items[(right + left) >> 1];
    while (left <= right) {
        while (comparer(items[left], pivot) < 0) {
            left++;
        }
        while (comparer(items[right], pivot) > 0) {
            right--;
        }
        if (left < right) {
            _swapArrayItems(items, left, right);
            left++;
            right--;
        }
        else {
            if (left === right)
                return left + 1;
        }
    }
    return left;
}

// Declared with const (the original assignment had no declaration, which
// throws in strict mode) and placed before _quicksort, which uses it as a
// default parameter value.
const _defaultComparer = (item1, item2) => {
    if (item1 > item2)
        return 1;
    if (item1 < item2)
        return -1;
    return 0;
};

const _insertionSortThreshold = 64;

function _quicksort(items,
    left, right, comparer = _defaultComparer,
    minIndex = 0, maxIndex = Number.MAX_SAFE_INTEGER) {
    if (!items.length)
        return items;
    // Stack of partitions that are still waiting to be sorted.
    const partitions = [];
    partitions.push({ left, right });
    while (partitions.length) {
        ({ left, right } = partitions.pop());
        // Small partitions are handed over to insertion sort.
        if (right - left < _insertionSortThreshold) {
            _insertionsort(items, left, right, comparer);
            continue;
        }
        const index = _partition(items, left, right, comparer);
        // Skip partitions that fall outside the window of interest.
        if (left < index - 1 && index - 1 >= minIndex) {
            partitions.push({ left, right: index - 1 });
        }
        if (index < right && index < maxIndex) {
            partitions.push({ left: index, right });
        }
    }
    return items;
}