Why all articles about demystifying JS array methods are rubbish
Every month or so I see another article posted by some dev, usually with a catchy title using words like "demystifying" or "understanding" or "N array methods you should be using" or "simplify your Javascript" or something similar. It has become so mundane and boring that it makes me mad that someone is still trying to cash in on these tired ideas to appear smart. So stop doing it! There is no need to explain methods that were introduced in 2009!
But it gets worse. These articles are partially misleading because Javascript has evolved past the need to receive or return data as arrays. Let me demystify the hell out of you.
First of all, the methods we are discussing here are .filter and .map. There is of course .reduce, but that one doesn't necessarily return an array. Ironically, one can write both .filter and .map as a reduce function, so learn that one and you can get far. There is also .sort, which for performance reasons works a bit differently: it sorts the array in place and returns that same array, so it can still be chained, but it mutates its source instead of producing a new array as the others do. All of these methods from the Array object have something in common: they receive functions as parameters that are then applied to all of the items in the array. Read that again: all of the items.
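To make the reduce remark concrete, here is a minimal sketch of .filter and .map written on top of .reduce. The names filterWithReduce and mapWithReduce are mine, invented for illustration; they are not part of any library.

```javascript
// Hypothetical helper: .filter expressed through .reduce.
const filterWithReduce = (arr, predicate) =>
  arr.reduce((acc, item, index) => {
    if (predicate(item, index)) acc.push(item);
    return acc;
  }, []);

// Hypothetical helper: .map expressed through .reduce.
const mapWithReduce = (arr, transform) =>
  arr.reduce((acc, item, index) => {
    acc.push(transform(item, index));
    return acc;
  }, []);

const numbers = [5, 2, 8, 1];
const evens = filterWithReduce(numbers, n => n % 2 === 0); // [2, 8]
const doubled = mapWithReduce(numbers, n => n * 2);        // [10, 4, 16, 2]
```

Note that both helpers still visit every item and still allocate a new array, exactly like the built-in methods they imitate.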
Having functions as first class citizens of the language has always been the case for Javascript, so that's not a great new thing to teach developers. And now, with arrow functions, these methods are even easier to use, because arrow functions don't rebind "this", which removes the scope issues that caused so many hidden errors in the past.
Let's take a common use example for these methods for data display. You have many data records that need to be displayed. You have to first filter them using some search parameters, then you have to order them so you can take just a maximum of n records to display on a page. Because what you display is not necessarily what you have as a data source, you also apply a transformation function before returning something. The code would look like this:
var colors = [
  { name: 'red', R: 255, G: 0, B: 0 },
  { name: 'blue', R: 0, G: 0, B: 255 },
  { name: 'green', R: 0, G: 255, B: 0 },
  { name: 'pink', R: 255, G: 128, B: 128 }
];
// it would be more efficient to get the reddish colors in an array
// and sort only those, but we want to discuss chaining array methods
colors.sort((c1, c2) => c1.name > c2.name ? 1 : (c1.name < c2.name ? -1 : 0));
const result = colors
  .filter(c => c.R > c.G && c.R > c.B)
  .slice(page * pageSize, (page + 1) * pageSize)
  .map(c => ({
    name: c.name,
    color: `#${hex(c.R)}${hex(c.G)}${hex(c.B)}`
  }));
This code takes a bunch of colors that have RGB values and a name and returns a page (defined by page and pageSize) of the colors that are "reddish" (more red than blue and green), ordered by name. The resulting objects have a name and an HTML color string.
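The snippet relies on hex, page and pageSize being defined somewhere else. Here is a runnable sketch with those filled in; the hex helper and the paging values are my assumptions for the sake of the example, so swap in whatever you actually use.

```javascript
// Hypothetical helper: formats one 0-255 channel as two lowercase hex digits.
const hex = value => value.toString(16).padStart(2, '0');

// Assumed paging parameters: the first page, two records per page.
const page = 0;
const pageSize = 2;

const colors = [
  { name: 'red', R: 255, G: 0, B: 0 },
  { name: 'blue', R: 0, G: 0, B: 255 },
  { name: 'green', R: 0, G: 255, B: 0 },
  { name: 'pink', R: 255, G: 128, B: 128 }
];

// Sort by name, keep the reddish colors, cut one page, shape the output.
colors.sort((c1, c2) => c1.name > c2.name ? 1 : (c1.name < c2.name ? -1 : 0));
const result = colors
  .filter(c => c.R > c.G && c.R > c.B)
  .slice(page * pageSize, (page + 1) * pageSize)
  .map(c => ({
    name: c.name,
    color: `#${hex(c.R)}${hex(c.G)}${hex(c.B)}`
  }));

// result:
// [ { name: 'pink', color: '#ff8080' }, { name: 'red', color: '#ff0000' } ]
```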
This works for an array of four elements, and it works fine for arrays of thousands of elements, too, but let's look at what it is doing:
- we pushed the sort up, thus sorting all colors in order to get the nice syntax at the end, rather than sorting just the reddish colors
- we filtered all colors, even if we needed just pageSize elements
- we created an array at every step (three times), even if we only needed one with a max size of pageSize
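You don't have to take my word for the wasted work. A small sketch, using plain numbers instead of colors (the data and the counter are made up for the demonstration), counts how many times the filter callback runs when only one small page is needed:

```javascript
// 1000 items, of which we only want the first 5 multiples of three.
const items = Array.from({ length: 1000 }, (_, i) => i);
const pageSize = 5;

let predicateCalls = 0;
const firstPage = items
  .filter(n => {
    predicateCalls++;          // count every invocation of the callback
    return n % 3 === 0;
  })
  .slice(0, pageSize);

// firstPage is [0, 3, 6, 9, 12], yet predicateCalls is 1000:
// .filter had to finish the whole array before .slice could start.
```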
Let's write this in a classical way, with loops, to see how it works:
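const result = [];
let i = 0; // number of reddish colors seen so far
for (const c of colors) {
  // skip the colors that are not reddish (the .filter condition, negated)
  if (c.R <= c.G || c.R <= c.B) continue;
  i++;
  // skip the reddish colors that belong to previous pages
  if (i <= page * pageSize) continue;
  result.push({
    name: c.name,
    color: `#${hex(c.R)}${hex(c.G)}${hex(c.B)}`
  });
  // stop as soon as the page is full
  if (result.length >= pageSize) break;
}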
And it does this:
- it iterates through the colors array, but it has an exit condition
- it ignores the colors that are not reddish
- it ignores the colors of previous pages, but without storing them anywhere
- it stores the reddish colors in the result as their transformed version directly
- it exits the loop when the result reaches the size of a page, thus visiting only as many colors as it takes to find (page+1)*pageSize reddish ones
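The steps above can be packed into a reusable function to see the early exit in numbers. This is only a sketch: pageWithLoop and its visited counter are names I made up for the demonstration, and the data is plain numbers rather than colors.

```javascript
// Hypothetical helper: one page of filtered, transformed items, using a
// single loop with an early exit. Also reports how many items it visited.
function pageWithLoop(source, predicate, transform, page, pageSize) {
  const result = [];
  let visited = 0;
  let matched = 0;
  if (pageSize <= 0) return { result, visited };
  for (const item of source) {
    visited++;
    if (!predicate(item)) continue;            // not a match
    matched++;
    if (matched <= page * pageSize) continue;  // belongs to a previous page
    result.push(transform(item));
    if (result.length >= pageSize) break;      // page is full, stop
  }
  return { result, visited };
}

const items = Array.from({ length: 1000 }, (_, i) => i);
const { result, visited } = pageWithLoop(
  items,
  n => n % 3 === 0, // keep multiples of three
  n => n * 10,      // output shape
  1,                // second page (zero based)
  5
);
// result is [150, 180, 210, 240, 270] and visited is 28, not 1000
```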
No extra arrays, no extra iterations, only some ugly ass code. But what if we could write this as nicely as in the first example and make it work as efficiently as the second? Because of ECMAScript 6 we actually can!
Take a look at this:
const result = Enumerable.from(colors)
  .where(c => c.R > c.G && c.R > c.B)
  //.orderBy(c => c.name)
  .skip(page * pageSize)
  .take(pageSize)
  .select(c => ({
    name: c.name,
    color: `#${hex(c.R)}${hex(c.G)}${hex(c.B)}`
  }))
  .toArray();
What is this Enumerable thing? It's a class I made to encapsulate the methods .where, .skip, .take and .select, and we will examine it later. Why these names? Because they mirror similar method names in LINQ (Language Integrated Query from .NET) and because I wanted to clearly separate them from the array methods.
How does it all work? If you look at the "classical" version of the code you see the for...of loop introduced in ES6. It uses the concept of "iterable" to go through all of the elements a source contains. An array is an iterable, but so is the object returned by a generator function, also an ES6 construct. A generator function is a function that generates values as it is iterated, the advantage being that it doesn't need to hold all of the items in memory (like an array does) and any operation that needs doing on the values is done only on the ones requested by code.
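If generators are new to you, here is a tiny sketch of the idea. The naturals function is just an illustration I made up: it describes an infinite sequence, yet nothing is computed beyond what the loop asks for.

```javascript
// A generator function: produces 1, 2, 3, ... on demand, forever.
function* naturals() {
  let n = 1;
  while (true) {
    yield n++;
  }
}

// for...of pulls values one at a time; break stops the generator.
const firstThree = [];
for (const n of naturals()) {
  firstThree.push(n);
  if (firstThree.length === 3) break;
}
// firstThree is [1, 2, 3]

// Arrays are iterable too: they expose an iterator via Symbol.iterator.
const iterator = ['a', 'b'][Symbol.iterator]();
const firstStep = iterator.next(); // { value: 'a', done: false }
```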
Here is what the code above does:
- it creates an Enumerable wrapper over the array (performs no operation, just assignments)
- it filters by defining a generator function that only returns reddish colors (but performs no operation) and returns an Enumerable wrapper over the function
- it ignores the items from previous pages by defining a generator function that counts items and only returns items after the specified number (again, no operation) and returns an Enumerable wrapper over the function
- it then takes a page full of items, stopping immediately after, by defining a generator function that does that (no operation) and returns an Enumerable wrapper over the function
- it transforms the colors in output items by defining a generator function that iterates existing items and returns the transformed values (no operation) and returns an Enumerable wrapper over the function
- it iterates the generator function in the current Enumerable and fills an array with the values (all the operations are performed here)
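The "no operation until the end" part is easy to verify. The sketch below uses two bare generator functions instead of the Enumerable class, plus a log array that exists only for the demonstration, and records when each step actually runs.

```javascript
const log = [];

// Bare-bones, illustrative versions of the two operators.
function* where(source, predicate) {
  for (const item of source) {
    log.push(`where ${item}`);
    if (predicate(item)) yield item;
  }
}

function* select(source, transform) {
  for (const item of source) {
    log.push(`select ${item}`);
    yield transform(item);
  }
}

// Building the pipeline runs none of the code inside the generators.
const pipeline = select(where([1, 2, 3, 4], n => n % 2 === 0), n => n * 10);
const callsBeforeIteration = log.length; // 0

// Only enumeration does the work, one item at a time through all the steps.
const result = Array.from(pipeline); // [20, 40]
// log is now:
// ['where 1', 'where 2', 'select 2', 'where 3', 'where 4', 'select 4']
```

The log shows the steps interleaved per item, rather than one full pass per step as with the array methods.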
And here is the flow for each item:
1. .toArray enumerates the generator function of .select
2. .select enumerates the generator function of .take
3. .take enumerates the generator function of .skip
4. .skip enumerates the generator function of .where
5. .where enumerates the generator function that iterates over the colors array
6. the first color is red, which is reddish, so .where "yields" it, passing it on as the next item in the iteration
7. the page is 0, let's say, so .skip has nothing to skip, it yields the color
8. .take still has pageSize items to take, let's assume 20, so it yields the color
9. .select yields the color transformed for output
10. .toArray pushes the color in the result
11. go to 1.
If for some reason you would only need the first item, not the entire page (imagine using a .first method instead of .toArray) only the steps from 1. to 10. would be executed. No extra arrays, no extra filtering, mapping or assigning.
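A .first like the one imagined above could look something like this. It is a sketch written as a standalone function over any iterable, not the method from my library, and the checked counter is there only to show how little work gets done.

```javascript
// Hypothetical helper: returns the first item of any iterable, or undefined.
// Returning from inside for...of also closes the underlying generator.
function first(iterable) {
  for (const item of iterable) {
    return item;
  }
  return undefined;
}

let checked = 0;
function* where(source, predicate) {
  for (const item of source) {
    checked++; // count how many items were actually tested
    if (predicate(item)) yield item;
  }
}

const numbers = Array.from({ length: 1000 }, (_, i) => i + 1);
const firstMultipleOfSeven = first(where(numbers, n => n % 7 === 0));
// firstMultipleOfSeven is 7 and checked is 7: the other 993 items were never touched
```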
Am I trying too hard to seem smart? Well, imagine that there are three million colors, a third of them are reddish. The first code would create an array of a million items, by iterating and checking all three million colors, then take a page slice from that (another array, however small), then create another array of mapped objects. This code? It is the equivalent of the classical one, but with extreme readability and ease of use.
OK, what is that .orderBy thing that I commented out? It's a possible method that orders items online, as they come, at the moment of execution (so when .toArray is executed). It is too complex for this blog post, but there is a full implementation of Enumerable that I wrote containing everything you will ever need. In that case .orderBy would only order the minimal number of items required to extract the page ((page+1) * pageSize). The implementation can use custom sorting algorithms that take into account .take and .skip operators, just like in LiNQer.
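To give a rough idea of the deferred part only, here is a deliberately naive sketch. It is not the implementation from my library and it does none of the partial sorting described above: it simply postpones buffering and sorting until the first item is requested. The function names and the sample data are assumptions for illustration.

```javascript
// Naive deferred ordering: nothing is read or sorted until enumeration starts.
function* orderBy(source, keySelector) {
  const buffer = Array.from(source); // runs on the first next(), not before
  buffer.sort((a, b) => {
    const ka = keySelector(a);
    const kb = keySelector(b);
    return ka < kb ? -1 : (ka > kb ? 1 : 0);
  });
  yield* buffer;
}

// Minimal take, so the example is self-contained.
function* take(source, count) {
  if (count <= 0) return;
  let left = count;
  for (const item of source) {
    yield item;
    if (--left <= 0) return;
  }
}

const names = ['red', 'blue', 'green', 'pink'];
const query = take(orderBy(names, name => name), 2); // no sorting has happened yet
const firstTwo = Array.from(query);                  // ['blue', 'green']
```

A smarter version would know about the following take and keep only the smallest items it needs while reading the source, which is where having a dedicated object pays off.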
The purpose of this post was to raise awareness of how Javascript has evolved and of how we can write code that is both readable AND efficient.
One actually doesn't need an Enumerable wrapper, and can add the methods to the prototype of all generator functions, as well (see LINQ-like functions in JavaScript with deferred execution). As you can see, this was written 5 years ago, and still people "teach" others that .filter and .map are the Javascript equivalents of .Where and .Select from .NET. NO, they are NOT!
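For the curious, the prototype approach can be sketched like this. It is my own minimal illustration, not the code from the linked post; the operator names where and select and the range generator are assumptions, and patching a built-in prototype is something you should think twice about in shared code.

```javascript
// The object that every generator object inherits from.
const GeneratorPrototype = Object.getPrototypeOf(function* () {}).prototype;

// Hypothetical helper: attaches a non-enumerable operator to all generators.
function addOperator(name, operator) {
  Object.defineProperty(GeneratorPrototype, name, {
    value: operator,
    writable: true,
    configurable: true,
    enumerable: false
  });
}

// Each operator is itself a generator function, so calls can be chained.
addOperator('where', function* (predicate) {
  for (const item of this) {
    if (predicate(item)) yield item;
  }
});

addOperator('select', function* (transform) {
  for (const item of this) {
    yield transform(item);
  }
});

function* range(start, end) {
  for (let i = start; i < end; i++) yield i;
}

const oddSquares = Array.from(
  range(0, 10).where(n => n % 2 === 1).select(n => n * n)
);
// oddSquares is [1, 9, 25, 49, 81]
```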
The immense advantage of using a dedicated object is that you can store information for each operator and use it in other operators to optimize things even further (like for orderBy). All code is in one place, it can be unit tested and refined to perfection, while the code using it remains the same.
Here is the code for the simplified Enumerable object used for this post:
class Enumerable {
  constructor(generator) {
    // generator: a function that returns a fresh iterator every time it is called
    this.generator = generator || function* () { };
  }
  static from(arr) {
    return new Enumerable(arr[Symbol.iterator].bind(arr));
  }
  where(condition) {
    const source = this.generator;
    const gen = function* () {
      let index = 0;
      // source() is called here, not in where(), so nothing runs until
      // enumeration and the same Enumerable can be enumerated more than once
      for (const item of source()) {
        if (condition(item, index)) {
          yield item;
        }
        index++;
      }
    };
    return new Enumerable(gen);
  }
  take(nr) {
    const source = this.generator;
    const gen = function* () {
      let nrLeft = nr;
      // don't pull an item from the source if we would never yield it
      if (nrLeft <= 0) return;
      for (const item of source()) {
        yield item;
        nrLeft--;
        // stop right after the last item, without asking for another one
        if (nrLeft <= 0) {
          break;
        }
      }
    };
    return new Enumerable(gen);
  }
  skip(nr) {
    const source = this.generator;
    const gen = function* () {
      let nrLeft = nr;
      for (const item of source()) {
        if (nrLeft > 0) {
          nrLeft--;
        } else {
          yield item;
        }
      }
    };
    return new Enumerable(gen);
  }
  select(transform) {
    const source = this.generator;
    const gen = function* () {
      for (const item of source()) {
        yield transform(item);
      }
    };
    return new Enumerable(gen);
  }
  toArray() {
    // the only place where the whole chain is actually executed
    return Array.from(this.generator());
  }
}
The post is filled with links and for whatever you don't understand from the post, I urge you to search and learn.