Lambdas, LINQ, JavaScript and so on

Published Apr 13, 2017

Posted in: Java, programming, Haxe, C#

As a .NET developer I am very familiar with LINQ, or Language Integrated Query, which is a collection of fluent interface methods for querying data from collections. However, many people outside the .NET ecosystem are not familiar with the concept, or they use it as disparate functions in their language of choice. What makes it even more confusing is that the same concept is implemented in other languages under different names. Let me give you an example:

var arr=new[]{1,2,3,4,5,6};
var result=arr
  .Where(v=>v%2==0) //get only even values
  .Select(v=>v*10)  //return their values multiplied with 10
  .Aggregate(15,(s,v)=>v+s); //aggregate their value into a sum that starts with a seed of 15
// result should be 15+2*10+4*10+6*10=135

We see here the use of three of these methods:

Where - filters the values on a condition
Select - changes the values it returns
Aggregate - creates an aggregate value using an operation on all the values in the collection

Let me write you the same C# code without using these methods:

var arr=new[]{1,2,3,4,5,6};
var result=15;
foreach(var v in arr) {
  if (v%2==0) {
    result+=v*10;
  }
}

In this case, some people might prefer the second version, but it is only an example. LINQ is not a silver bullet that replaces all loops, only one tool among many in a large toolset. Some advantages of using such methods are concise code, better readability, and a common API for iterating, filtering and querying collections. For example, in the widely used Entity Framework or its previous incarnations such as LINQ to SQL, the queries look the same, but they are translated into SQL, sent to the database and executed just once. So instead of fetching a list of thousands of records and filtering it in memory, the framework translates the expression passed to the query into SQL and executes it on the database. The same sort of operations can be used on streams of data, rather than fixed collections, as in the case of Reactive Extensions.
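The point about streams can be sketched in JavaScript with generator functions. The helper names `where`, `select`, `take` and `naturals` below are made up for this illustration and are not part of any library; they only show how the same filter-and-transform steps can run lazily over a source that never ends.

```javascript
// Hypothetical lazy helpers, written as generator functions.
// Nothing runs until a consumer asks for values.
function* where(source, predicate) {
  for (const v of source) if (predicate(v)) yield v;
}
function* select(source, transform) {
  for (const v of source) yield transform(v);
}
function* take(source, count) {
  if (count <= 0) return;
  let taken = 0;
  for (const v of source) {
    yield v;
    if (++taken >= count) return; // stop pulling from the source
  }
}
// An endless source: 1, 2, 3, ...
function* naturals() {
  let n = 1;
  while (true) yield n++;
}

const firstThree = Array.from(
  take(select(where(naturals(), v => v % 2 === 0), v => v * 10), 3)
);
console.log(firstThree); // [ 20, 40, 60 ]
```

Because each step pulls values only on demand, the query finishes even though `naturals()` never does.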

Some other methods in this set include:

First/Last - getting the first or last element in an enumerable that satisfies a boolean condition
Skip - ignoring a number of values in a collection
Take - returning a number of values in a collection
Any/All - returning true if at least one or all of the items satisfy a boolean condition
Average/Sum/Min/Max - specific aggregating methods for the elements in the collection
OrderBy/OrderByDescending - sorting
Count - counting
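For readers coming from JavaScript, most of these have close counterparts among the built-in array methods. The mapping below is a rough sketch rather than an exact equivalence (for example, `find` returns `undefined` when nothing matches, whereas `First` throws in C#), and the sample data is made up.

```javascript
const values = [5, 3, 8, 1, 6];
const isEven = v => v % 2 === 0;

const first = values.find(isEven);                 // First: 8
const skipped = values.slice(2);                   // Skip(2): [8, 1, 6]
const taken = values.slice(0, 2);                  // Take(2): [5, 3]
const any = values.some(isEven);                   // Any: true
const all = values.every(isEven);                  // All: false
const sum = values.reduce((s, v) => s + v, 0);     // Sum: 23
const average = sum / values.length;               // Average: 4.6
const min = Math.min(...values);                   // Min: 1
const max = Math.max(...values);                   // Max: 8
const ordered = [...values].sort((a, b) => a - b); // OrderBy: [1, 3, 5, 6, 8]
const count = values.filter(isEven).length;        // Count with a condition: 2
```

Note that `sort` works in place, which is why the sketch copies the array first.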

There are many others, you can look them up here.

Does this system of querying data seem familiar to you? To SQL developers it will feel second nature. In SQL the same result from above would be achieved by using something like:

SELECT 15+SUM(v*10) FROM table WHERE v%2=0

Note that, other than putting the source of the data in front, the LINQ syntax is almost identical.

However, in other languages this sort of data query is called map/reduce, and in fact there is a widely used programming model called MapReduce that is applied in big data processing. In Java, the function that filters data is called filter, the one that alters the values is called map and the one that aggregates data is called reduce. The names are the same in JavaScript. Here is the same code in JavaScript:

var arr=[1,2,3,4,5,6];
var result=arr
  .filter(v=>v%2==0) //get only even values
  .map(v=>v*10)  //return their values multiplied with 10
  .reduce((s,v)=>v+s,15); //aggregate their value into a sum that starts with a seed of 15
// result should be 15+2*10+4*10+6*10=135

Note that the lambda (arrow function) syntax used here is new in ECMAScript 6. Before that, you would have to use the function(x) { return [something with x]; } syntax.
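For comparison, this is roughly how the same query reads with the older function expression syntax; the behavior is the same, only the notation is longer.

```javascript
var arr = [1, 2, 3, 4, 5, 6];
var result = arr
  .filter(function (v) { return v % 2 == 0; })    // get only even values
  .map(function (v) { return v * 10; })           // multiply them by 10
  .reduce(function (s, v) { return v + s; }, 15); // sum, starting from 15
console.log(result); // 135
```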

In Haxe, the concept is achieved by using the Lambda library and the functions are again named differently: filter for filtering, map for altering and fold for aggregating.

There is another group of people who would instantly recognize this model of data querying: functional programmers. Indeed, SQL is a functional programming language at its core and the same approach to data querying is used very efficiently in functional programming languages, since they know whether a function is pure (has no side effects) or not. When only dealing with pure functions, some optimizations can be made on the query by the compiler before anything is even executed. Haskell, for example, uses almost the same naming as Haxe: filter, map and foldl/foldr.

So whenever I get to review other people's code, especially people that have little experience with either SQL or C#, I cringe to see stuff like this:

var max=-1;
for (var i=0; i<arr.length; i++) {
  if (max<arr[i]) max=arr[i];
}

In my head this should simply be arr.max(); And considering how easy it is to implement something like this in JavaScript, for example, it's a crime not to use it:

Array.prototype.max=function() { return Math.max.apply(null,this); }
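If extending a built-in prototype is not an option in your project, a standalone helper gives the same readability at the call site. This is one possible sketch; the name `arrayMax` is made up for illustration. Unlike the loop above, it does not assume that every value is greater than -1.

```javascript
// Hypothetical helper: returns the largest number in the array,
// or undefined when the array is empty.
function arrayMax(values) {
  if (values.length === 0) return undefined;
  return values.reduce((max, v) => (v > max ? v : max));
}

console.log(arrayMax([3, 9, 4]));    // 9
console.log(arrayMax([-7, -2, -5])); // -2
console.log(arrayMax([]));           // undefined
```

Using `reduce` also avoids passing every element as a separate function argument, which is what `Math.max.apply` does and which can fail on very large arrays.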

Yet there is more to this than my personal preference for reading code. Composition, for example. Because this works like a fluent API or a builder pattern, one can keep adding conditions to a query. Imagine you have to filter a list of strings based on a Google-like query string. At the very minimum you would need to split the query into words and filter repeatedly on each one. Something like this:

var arr=['this is my special query string','this is a string','my query string is this awesome','no query strings here, move along','these are not the strings you are looking for'];
var query="this is a query string";
var splits=query.split(/\s+/g);
var result=arr;
splits.forEach(s=>result=result.filter(a=>a.includes(s)));
console.log(result);
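The same composition can be written without reassigning a variable inside forEach. The sketch below shows two equivalent alternatives under the same sample data: chaining one filter per search term with `reduce`, and a single filter that requires `every` term to match. The variable names are only illustrative.

```javascript
const arr = [
  'this is my special query string',
  'this is a string',
  'my query string is this awesome',
  'no query strings here, move along',
  'these are not the strings you are looking for'
];
const splits = 'this is a query string'.split(/\s+/);

// One filter per term, chained with reduce.
const chained = splits.reduce(
  (acc, term) => acc.filter(text => text.includes(term)),
  arr
);

// A single filter that requires every term to match.
const combined = arr.filter(text => splits.every(term => text.includes(term)));

console.log(chained);
console.log(combined);
```

Both versions keep only the strings that contain all the terms; the second one walks the list once instead of once per term.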

There is a lot of stuff I could be saying about this subject, but I need to summarize it. It's all about inverting loops. Instead of having to go through a collection, a stream or some other data source, then executing some code for each element, this method allows you to encapsulate the operations you want to execute on those elements, pass them around, compose them, translate them, then use them on any data source in the same way. A common API means reusability, better readability of code, less written code and a simpler declaration of intent. Because we get out of the loop system, we can expand the use for other paradigms, such as infinite data streams or event buses.
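As a small illustration of encapsulating and composing operations, the sketch below treats each step of a query as a plain function that can be stored, passed around and combined. The names `evens`, `timesTen`, `sumFrom` and `pipe` are made up for this example, and it assumes array sources for simplicity.

```javascript
// Each step is a function from a source to a result.
const evens = source => source.filter(v => v % 2 === 0);
const timesTen = source => source.map(v => v * 10);
const sumFrom = seed => source => source.reduce((s, v) => s + v, seed);

// Hypothetical helper that chains steps left to right.
const pipe = (...steps) => source =>
  steps.reduce((acc, step) => step(acc), source);

// The query exists on its own, before any data is involved.
const query = pipe(evens, timesTen, sumFrom(15));

console.log(query([1, 2, 3, 4, 5, 6]));          // 135
console.log(query(Array.from(new Set([2, 4])))); // 75
```

The query is declared once and reused on different data, which is the inversion described above: the operations are the value being passed around, not the loop.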

Comments

September 9, 2018

The post wasn't specifically about adding a method to the Array prototype, but about reducing code that needs to be read by others in a way that makes it more readable. I don't really agree with the idea that Array is so standard that you should not modify its prototype. C#, for example, supports extension methods that can "modify the prototype" of any object or interface. In code reviews, I mark as bugs stuff like extension methods on integers. It's way better to say obj.SetId(5) than 5.SetIdTo(obj) and you really don't want any more methods on the String class (a pet peeve of mine, because a string is also an IEnumerable of char, so it gets all the extensions for enumerables, like Max ! :) ). However, a javascript array is just that: a list of objects. If you routinely look for the maximum, why shouldn't you add it? It's not like you are modifying the source code of Javascript. It's your own choice if you add to your project the piece of code adding the function to the array. Yet, the basic idea of the post was that for anyone reading your code it's way better to see arr.max() or even array_max(arr) than to have to understand an entire loop and its intended functionality. Thank you for your comment and I am really glad you like and find useful what I am writing here.

Siderite

September 9, 2018

Siderite - I am enjoying your blog - but I am trying to understand something that falls under the "whether you should just because you can" category. You advocate adding a .max method to the Array prototype. Here is some advice to the contrary: Only modify your own prototypes. Never modify the prototypes of standard JavaScript objects. And here is the source: https://www.w3schools.com/js/js_object_prototypes.asp. Why not just set up a function in your own custom library that simply returns the max value of an array that is passed into it? function array_max(arr) { return Math.max.apply(null,arr);} array_max(arr)

boldresolves
