When we connect to SQL Server we usually copy/paste some connection string, change the values we need and rarely consider what else we could change in it. That is mostly because of the arcane-looking syntax and the rarely read documentation for it. You want to connect to a database, give it the server, instance and credentials and be done with it. However, there are some parameters that, when set, can save us a lot of grief later on.

  Application Name is something that identifies the code executing SQL commands to SQL Server, which can then be seen in profilers and DMVs or used in SQL queries. It has a maximum length of 128 characters. Let's consider the common situation where your application is large enough to be segregated into different domains, each having their own data access layer, business rules and user interface. In this case, each domain can have its own connection string and it makes sense to specify a different Application Name for each. Later on, one can use SQL Profiler, for example, and filter on the specific area of interest.


  The application name can also be seen in some queries to SQL Server's Dynamic Management Views (quite normal, considering DMVs are used by SQL Profiler) like sys.dm_exec_sessions. Inside your own queries you can also get the value of the application name by simply calling APP_NAME(). For example, running SELECT APP_NAME(); in SQL Management Studio returns a nice "Microsoft SQL Server Management Studio - Query" value. In SQL Server Profiler the column is ApplicationName while in DMVs like sys.dm_exec_sessions the column is program_name.

  Example connection string: Server=localhost;Database=MyDatabase;User Id=Siderite;Password=P4ssword; Application Name=Greatest App Ever
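
  If you build the connection string in code, a small sketch like this (using SqlConnectionStringBuilder from Microsoft.Data.SqlClient or System.Data.SqlClient, with the same values as above) sets the application name and verifies what SQL Server sees through APP_NAME():

var builder = new SqlConnectionStringBuilder
{
    DataSource = "localhost",
    InitialCatalog = "MyDatabase",
    UserID = "Siderite",
    Password = "P4ssword",
    ApplicationName = "Greatest App Ever"
};

using var connection = new SqlConnection(builder.ConnectionString);
connection.Open();
using var command = new SqlCommand("SELECT APP_NAME()", connection);
// prints "Greatest App Ever" - the same value you can filter on in Profiler or DMVs
Console.WriteLine(command.ExecuteScalar());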

  Hope it helps!

  Interesting SQL table hint I found today: READPAST. It instructs SQL queries to ignore locked rows. This comes with advantages and disadvantages. For one it avoids deadlocks when trying to read or write an already locked row, but it can also produce wrong results. Just like NOLOCK, it works around the transaction mechanism: while NOLOCK allows dirty reads of information partially changed in transactions that have not been committed, READPAST ignores that information completely.

  There is one scenario where I think this works best: batched DELETE operations. You want to delete a lot of rows from a table, but without locking it. If you just do a delete for the entire table with some condition you will get these issues:

  • the operation will be slow, especially if you are deleting on a clustered index which moves data around in the table
  • if the number of deleted rows is too large (usually 5000 or more) then the operation will lock the entire table, not just the deleted rows
  • if there are many rows to be deleted, the operation will take a long while, increasing the possibility of deadlocks

  While there are several solutions for this, like partitioning the table and then truncating the partitions, soft deletes or designing your database to separate read and write operations, one type of implementation change that is small in scope and large in result is batched deletes. Basically, you run a flow like this:

  1. SELECT a small number of rows to be deleted (again, mind the 5000 lock escalation threshold that causes table locks; perhaps even use a ROWLOCK hint)
  2. DELETE the selected rows and their dependencies (DELETE TOP (x) should work as well for steps 1 and 2, but I understand in some cases this syntax automatically causes a table lock, so maybe also use a ROWLOCK hint)
  3. if the number of selected rows is larger than 0, go back to step 1

  This allows SQL to lock individual rows and, if your business logic is sound, no rows should be deleted while something is trying to read or write them. However, this is not always the case, especially in high stress cases with many concurrent reads and writes. But here, if you use READPAST, then locked rows will be ignored and the next loops will have the chance to delete them.

  But there is a catch. Let's take an example:

  1. Table has 2 rows: A and B
  2. Transaction 1 locks row A
  3. In a batched delete scenario, Transaction 2 gets the rows with READPAST and so only gets B
  4. Transaction 2 deletes row B and commits, and continues the loop
  5. Transaction 3 gets the rows with READPAST and gets no rows (A is still locked)
  6. Transaction 3 deletes nothing and exits the loop
  7. Transaction 1 unlocks row A
  8. Table now has 1 row: A, which should have been deleted, but it's not

  There is a way to solve this: SELECT with NOLOCK and DELETE with READPAST

  • this will allow to always select even locked and uncommitted rows
  • this will only delete rows that are not locked
  • this will never deadlock, but will loop forever as long as some rows remain locked

  One more gotcha is that READPAST allows for a NOWAIT syntax, which says to immediately ignore locked rows, without waiting for the period specified by LOCK_TIMEOUT to see if they unlock. Since you are doing a loop, it would be wise to wait, so that it doesn't go into a rapid loop while some rows are locked. Alternatively, you might want to use READPAST NOWAIT and then add a WAITFOR DELAY '00:00:00.010' at the end of the loop to add a 10 millisecond delay, but if you have a lot of rows to delete, it might make this too slow.

  Enough of this, let's see a code example:

DECLARE @batchSize INT = 1000
DECLARE @nrRows INT = 1

CREATE TABLE #temp (Id INT PRIMARY KEY)

WHILE (@nrRows>0)
BEGIN

  -- start every iteration with an empty batch table,
  -- otherwise Ids of rows that stayed locked would violate the primary key on the next insert
  TRUNCATE TABLE #temp

  BEGIN TRAN

    -- select a batch of candidate Ids, ignoring any locks
    INSERT INTO #temp
    SELECT TOP (@batchSize) Id
    FROM MyTable WITH (NOLOCK)
    WHERE Condition=1

    SET @nrRows = @@ROWCOUNT

    -- delete only the rows that are not locked
    DELETE FROM mt
    FROM MyTable mt WITH (READPAST NOWAIT)
    INNER JOIN #temp t
    ON mt.Id=t.Id

  COMMIT TRAN

  -- small delay outside the transaction, so locks are not held while waiting
  WAITFOR DELAY '00:00:00.010'

END

DROP TABLE #temp

Now the scenario goes like this:

  1. Table has 2 rows: A and B
  2. Transaction 1 locks row A
  3. Transaction 2 gets the rows with NOLOCK and so only gets A and B
  4. Transaction 2 deletes rows A and B with READPAST, but only B is deleted
  5. loop continues (2 rows selected)
  6. Transaction 3 gets the rows with NOLOCK and gets one row 
  7. Transaction 3 deletes with READPAST with no effect (A is still locked)
  8. loop continues (1 row selected)
  9. Transaction 1 unlocks row A
  10. Transaction 4 gets the rows with NOLOCK and gets row A (not locked)
  11. Transaction 4 deletes with READPAST and deletes row A
  12. loop continues (1 row selected), but the next iteration selects nothing, so the loop ends (0 rows selected)
  13. Table now has no rows and no deadlock occurred

Hope this helps.

 So I got assigned this bug where the date 1900-01-01 was displayed on the screen and, as I am lazy, I started to look into the code without reproducing the issue. The SQL stored procedure looked fine, it was returning:

SELECT
  CASE WHEN SpecialCase=1 THEN ''
  ELSE SomeDate
  END as DateFilteredBySpecialCase

Then the value was being passed around through various application layers, but it wasn't transformed into anything, then it was displayed. So where did this magical value come from? I was expecting some kind of ISNULL(SomeDate,'1900-01-01') or some change in the mapping code or maybe SomeDate was 1900-01-01 in some records, but I couldn't find anything like that.

Well, at a second glance, the selected column has to have a single return type, so what is it? The Microsoft documentation explains:

Returns the highest precedence type from the set of types in result_expressions and the optional else_result_expression. For more information, see Data Type Precedence.

If you follow that link you will see that strings are at the very bottom, while dates are close to the top. In other words, a CASE statement that returns strings and dates will always have a date as its return type!

SELECT CAST('' as DATETIME) -- selects 1900-01-01

Just a quickie. Hope it helps.


  Tracing and logging always seem simple, an afterthought, something to do when you've finished your code. Only then you realize that you would want to have it while you are testing your code or when an unexpected issue occurs in production. And all you have to work with is an exception, something that tells you something went wrong, but without any context. Here is a post that attempts to create a simple method to enhance exceptions without actually needing to switch logging level to Trace or anything like that and without great performance losses.

  Note that this is a proof of concept, not production ready code.

  First of all, here is an example of usage:

public string Execute4(DateTime now, string str, double dbl)
{
    using var _ = TraceContext.TraceMethod(new { now, str, dbl });
    throw new InvalidOperationException("Invalid operation");
}

  Obviously, the exception is something that would occur in a different way in real life. The magic, though, happens in the first line. I am using (heh!) the C# 8.0 using declaration syntax so that there is no extra indentation and, I might say, this is one of the few situations where I would want to use this syntax. In fact, this post started from me thinking of a good place to use it without confusing any reader of the code.

  Also, TraceContext is a static class. That might be OK, since it is a very special class and not part of the business logic. With the new Roslyn source generators, one could insert lines like this automatically, without having to write them by hand. That's another topic altogether, though.

  So, what is going on there? Since there is no metadata information about the names of the currently executing method (without huge performance issues), I am creating an anonymous object that has properties with the same names and values as the arguments of the method. This is the only thing that might differ from one place to another. Then, in TraceMethod I return an IDisposable which will be disposed at the end of the method. Thus, I am generating a context for the entire method run which will be cleared automatically at the end.

  Now for the TraceContext class:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Runtime.ExceptionServices;
using System.Text.Json;

/// <summary>
/// Enhances exceptions with information about their calling context
/// </summary>
public static class TraceContext
{
    static ConcurrentStack<MetaData> _stack = new();

    /// <summary>
    /// Bind to FirstChanceException, which occurs when an exception is thrown in managed code,
    /// before the runtime searches the call stack for an exception handler in the application domain.
    /// </summary>
    static TraceContext()
    {
        AppDomain.CurrentDomain.FirstChanceException += EnhanceException;
    }

    /// <summary>
    /// Add to the exception dictionary information about caller, arguments, source file and line number raising the exception
    /// </summary>
    /// <param name="sender"></param>
    /// <param name="e"></param>
    private static void EnhanceException(object? sender, FirstChanceExceptionEventArgs e)
    {
        if (!_stack.TryPeek(out var metadata)) return;
        var dict = e.Exception.Data;
        if (dict.IsReadOnly) return;
        dict[nameof(metadata.Arguments)] = Serialize(metadata.Arguments);
        dict[nameof(metadata.MemberName)] = metadata.MemberName;
        dict[nameof(metadata.SourceFilePath)] = metadata.SourceFilePath;
        dict[nameof(metadata.SourceLineNumber)] = metadata.SourceLineNumber;
    }

    /// <summary>
    /// Serialize the name and value of arguments received.
    /// </summary>
    /// <param name="arguments">It is assumed this is an anonymous object</param>
    /// <returns></returns>
    private static string? Serialize(object arguments)
    {
        if (arguments == null) return null;
        var fields = arguments.GetType().GetProperties();
        var result = new Dictionary<string, object>();
        foreach (var field in fields)
        {
            var name = field.Name;
            var value = field.GetValue(arguments);
            result[name] = SafeSerialize(value);
        }
        return JsonSerializer.Serialize(result);
    }

    /// <summary>
    /// This would require most effort, as one would like to serialize different types differently and skip some.
    /// </summary>
    /// <param name="value"></param>
    /// <returns></returns>
    private static string SafeSerialize(object? value)
    {
        // naive implementation
        try
        {
            return JsonSerializer.Serialize(value).Trim('\"');
        }
        catch (Exception ex1)
        {
            try
            {
                return value?.ToString() ?? "";
            }
            catch (Exception ex2)
            {
                return "Serialization error: " + ex1.Message + "/" + ex2.Message;
            }
        }
    }

    /// <summary>
    /// Prepare to enhance any thrown exception with the calling context information
    /// </summary>
    /// <param name="args"></param>
    /// <param name="memberName"></param>
    /// <param name="sourceFilePath"></param>
    /// <param name="sourceLineNumber"></param>
    /// <returns></returns>
    public static IDisposable TraceMethod(object args,
                                            [CallerMemberName] string memberName = "",
                                            [CallerFilePath] string sourceFilePath = "",
                                            [CallerLineNumber] int sourceLineNumber = 0)
    {
        _stack.Push(new MetaData(args, memberName, sourceFilePath, sourceLineNumber));
        return new DisposableWrapper(() =>
        {
            _stack.TryPop(out var _);
        });
    }

    /// <summary>
    /// Just a wrapper over a method which will be called on Dipose
    /// </summary>
    public class DisposableWrapper : IDisposable
    {
        private readonly Action _action;

        public DisposableWrapper(Action action)
        {
            _action = action;
        }

        public void Dispose()
        {
            _action();
        }
    }

    /// <summary>
    /// Holds information about the calling context
    /// </summary>
    public class MetaData
    {
        public object Arguments { get; }
        public string MemberName { get; }
        public string SourceFilePath { get; }
        public int SourceLineNumber { get; }

        public MetaData(object args, string memberName, string sourceFilePath, int sourceLineNumber)
        {
            Arguments = args;
            MemberName = memberName;
            SourceFilePath = sourceFilePath;
            SourceLineNumber = sourceLineNumber;
        }
    }
}

Every call to TraceMethod adds a new MetaData object to a stack and every time the method ends, the stack pops an item. The static constructor of TraceContext subscribes to the FirstChanceException event of the current application domain and, whenever an exception is thrown (caught or otherwise), its Data dictionary gets enhanced with:

  • name of the method called
  • source file name
  • source line number of the TraceMethod call
  • serialized arguments (remember Exceptions need to be serializable, including whatever you put in the Data dictionary, so that is why we serialize it all)

(I have written another post about how .NET uses code attributes to get the first three items of information during build time) 

This way, you get information which would normally be "traced" (detailed logging which is usually detrimental to performance) in any thrown exception, but without filling some trace log or having to change production configuration and reproduce the problem again. Assuming your application does not throw exceptions all over the place, this adds very little complexity to the executed code.

Moreover, this will enhance exception with the source code file name and line number even in Release mode!
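
For example, here is a minimal sketch of how a top-level handler might surface the added data (calling the Execute4 method from the beginning of the post on some hypothetical service instance):

try
{
    service.Execute4(DateTime.Now, "some string", 1.5);
}
catch (Exception ex)
{
    // DictionaryEntry comes from System.Collections; Data now holds Arguments, MemberName,
    // SourceFilePath and SourceLineNumber added by TraceContext
    foreach (DictionaryEntry entry in ex.Data)
    {
        Console.WriteLine($"{entry.Key}: {entry.Value}");
    }
}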

I am sure there are some issues, like code that might fail without being caught in a try/catch, and of course the serialization code is where people should put most of the effort, since different types need to be serialized differently for inspection (think async methods and the like). More methods could also be added so that people can trace whatever they like in thrown exceptions. Yet, as I said, this is a POC, so I hope it gets you inspired.

 T-SQL Querying is a very good overview of SQL Server queries, indexing, best practices, optimization and troubleshooting. I can't imagine someone can just read it and be done with it, as it is full of useful references, so it's good to keep it on the table. Also, it's relatively short, so one can peruse it in a day and then keep using it while doing SQL work.

What I didn't like so much was the inconsistent level of knowledge needed for the various chapters. It starts with tedious explanations of types of queries and what JOINs are and what ORDER BY is and so on, then moves on to the actual interesting stuff. Also, what the hell is that title and cover? :) You'd think it's a gardening book.

Another great thing about it is that it is available free online, from its publishers: Packt.

The need

  I don't know about you, but I've been living with ad blockers from the moment they arrived. Occasionally I get access to a new machine and experience the Internet without an ad blocker and I can't believe how bad it is. A long time ago I had a job where I was going by bike. After two years of not using public transport, I got in a bus and had to get out immediately. It smelled so bad! How had I ever used that before? It's the same thing.

  However, most of the companies we take for granted as pillars of the web need to bombard us with ads and generally push or skew our perceptions in order to make money, so they find ways to obfuscate their web sites, lock them in "apps" that you have no control of or plain manipulate the design of the things we use to write code so that it makes this more difficult.

Continuous war

  Let me give you an example of this arms race. You open your favorite web site and there it is, a garish, blinking, offending, annoying ad that is completely useless. Luckily, you have an ad blocker so you select the ad, see that it's enclosed in a div with class "annoyingAd" and you make a blocking rule for it. Your web site is clean again. But the site owner realizes people are not clicking on the ad anymore, so he dynamically changes the class name to something different every time you open the page. Now, you could try to decipher the JavaScript code that populates the div and write a script to get the correct class, but it would only work for this web site and you would have to know how to code in JavaScript and it would take a lot of effort you don't want to spend. But then you realize that above the horrid thing there is a title "Annoying ad you can't get rid of", so you write a simple thing to get rid of the div that contains a span with that content. Yay!

  At this point you already have some issues. The normal way people block ads is to create a quasi CSS rule for an element. Yet CSS doesn't easily let you select elements based on their inner text or select parents of elements with certain characteristics. In part it's a question of performance, but it's also a matter of the people who want to obfuscate your web site taking part in the decision process of what goes into CSS. So here, to get an element with certain content we had to use something that extends normal CSS, like the jQuery syntax or some extra JavaScript. This is, needless to say, something only a small number of people can do. But suspend your disbelief for a moment.

  Maybe your ad blocker is providing you with custom rules that you can make based on content, or you write a little script or even the ad blocker people write the code for you. So the site owner catches up and he does something: instead of having a span with the title, he puts many little spans, containing just a few letters, some of them hidden visually and filled with garbage, others visible. The title is now something like "Ann"+"xxx"+"oying"+"xxx"+" ad", where all "xxx" texts appear as part of the Document Object Model (the page's DOM) but they are somehow not visible to the naked eye. Now the inner text of the container is "Annxxxoyingxxx ad", with random letters instead of xxx. Beat that!

  And so it goes. You need to spend knowledge and effort to escalate this war that you might not even win. Facebook is the king of obfuscation, where even the items shared by people are mixed and remixed so that you cannot select them. So what's the solution?

Solution

  At first I wanted to go in the same direction, fight the same war. Let's create a tool that deobfuscates the DOM! Maybe using AI! Something that would, at the end, give me the simplest DOM possible that would create the visual output of the current page and, when I change one element in this simple DOM, it would apply the changes to the corresponding obfuscated DOM. And that IS a solution, if not THE solution, but it is incredibly hard to implement.

  There is another option, though, something that would progressively enhance the existing DOM with information that one could use in a CSS rule. Imagine a small script that, added to any page, would add attributes to elements like this: visibleText="Annoying ad" containingText="Annxxxoingxxx ad" innerText="" positionInPage="78%,30%-middle-right" positionInViewport="78%,5%-top-right". Now you can use a CSS rule for it, because CSS has syntax for attributes equal to, containing, starting or ending with something. This would somewhat slow the page, but not terribly so. One can use it as a one shot (no matter how long it takes, it only runs once) or continuous (where every time an element changes, it would recreate the attributes in it and its parents).

Feedback

  Now, I have not begun development on this yet, I've just had this idea of a domExplainer library that I could make available for everybody. I have to test how it works on difficult web sites like Facebook and try it as a general option in my browser. But I would really appreciate feedback first. What do you think? What else would you add to (or remove from) it? What else would you use it for?

  This blog post is about Table Value Constructors or Row Constructors. While they make intuitive sense (and that is how I will present them), they were only introduced in Microsoft SQL Server 2008 and, because they look like another very old feature, most database developers are not aware of them.

  So let's begin with a simple INSERT statement:

CREATE TABLE Position(X INT, Y INT)

INSERT INTO Position
VALUES (1,1),(1,2),(NULL,NULL),(2,1)

So we create a table and we insert some data using the VALUES expression. This is equivalent to

CREATE TABLE Position(X INT, Y INT)

INSERT INTO Position
SELECT 1,1
UNION ALL
SELECT 1,2
UNION ALL
SELECT NULL,NULL
UNION ALL
SELECT 2,1

I've certainly used this SELECT + UNION ALL construct to generate static data in my SQL statements. Sometimes, because it's such an annoying syntax, I've created a table or table variable and then inserted values into it in order to use data in a structured way. But could we use the VALUES expression in other contexts, not just for INSERT statements? And the answer is: Yes! (in Sql Server 2008 or newer)

Here is an example:

SELECT *
FROM (VALUES(1,1),(1,2),(NULL,NULL),(2,1)) as Position(X,Y)

This is not a disk table, nor is it a table variable, but an expression that will be treated as a table with columns X and Y, of type INT. As in a SELECT/UNION ALL construct, the type of the columns will be determined by the first set of values.

You can see a "real life" example in my previous post on how to solve Sudoku using an SQL statement.

Now, while I've explained how to remember the syntax and purpose of Table Value Constructors, there are differences between the VALUES expression used as a TVC and when used in an INSERT statement.

In an INSERT statement, VALUES is just a way to specify data to add and has been there since the beginning of SQL and therefore is subject to constraints from that era. For example, you cannot add more than 1000 rows in an INSERT/VALUES construct. But you can using an INSERT/SELECT/VALUES construct:

INSERT INTO Position
VALUES (1,2),
       (1,1),
       -- ... more than 1000 records
       (0,1)

-- Error 10738 is returned

INSERT INTO Position
SELECT x,y FROM (
VALUES (1,2),
       (1,1),
       -- ... more than 1000 records
       (0,1)
) as P(x,y)

-- Works like a charm

Hope it helps!


I got this exception at my work today, a System.ArgumentException with the message "Argument passed in is not serializable.", that I could not quite understand. Where does it come from, since the .NET source repository does not contain the string? How can I fix it?

The stack trace ended up at System.Collections.ListDictionaryInternal.set_Item(Object key, Object value) in a method where, indeed, I was setting a value in a dictionary. But this is not how dictionaries usually behave! The dictionary in question was the Exception.Data property. It makes sense, because Exception objects are supposed to be serializable, and I was adding a value of type HttpMethod which, even if extremely simple and almost always used like an enum, is actually a class of its own and it is not serializable!

So, there you have it, always make sure you add serializable objects in an exception's Data dictionary.
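
A minimal sketch of the kind of fix (the "HttpMethod" key and the surrounding try/catch are just illustrative):

try
{
    // some HTTP call that fails...
}
catch (Exception ex)
{
    // HttpMethod is not serializable, so store a string representation instead of the object itself
    ex.Data["HttpMethod"] = HttpMethod.Post.ToString(); // fine
    // ex.Data["HttpMethod"] = HttpMethod.Post;         // throws "Argument passed in is not serializable."
    throw;
}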

But why is this happening? The implementation of the Data property looks like this:

public virtual IDictionary Data { 
  [System.Security.SecuritySafeCritical]
  get {
    if (_data == null)
      if (IsImmutableAgileException(this))
        _data = new EmptyReadOnlyDictionaryInternal();
      else
        _data = new ListDictionaryInternal();
    return _data;
  }
}

Now, EmptyReadOnlyDictionaryInternal is just a dictionary you can't add to. The interesting class is ListDictionaryInternal. Besides being an actual linked list implementation (who does that in anything but C++ classrooms?) it contains this code:

#if FEATURE_SERIALIZATION
  if (!key.GetType().IsSerializable)                 
    throw new ArgumentException(Environment.GetResourceString("Argument_NotSerializable"), "key");                    
  if( (value != null) && (!value.GetType().IsSerializable ) )
    throw new ArgumentException(Environment.GetResourceString("Argument_NotSerializable"), "value");                    
#endif

So both key and value of the Data dictionary property in an Exception instance need to be serializable.

But why didn't I find the string in the source reference? While the Microsoft reference website doesn't seem to support simple string search, it seems Google does NOT index GitHub code pages either. You have to:

  • manually go to GitHub and search
  • get no results
  • notice that the "Code" section of the results has a question mark instead of a number next to it
  • click on it
  • then it asks you to log in
  • and only then you get results!

So bonus thing: if you are searching for some string in the .NET source code, first of all use the GitHub repo, then make sure you log in when you search.


Intro

When learning to code we get to these exercises and tests and katas and interview questions using some array and expecting some magical string or number and you hear they are called algorithms. And they are intellectual, complex, mathematical, abstract, annoying and feel completely random. But when you are actually doing something real, code doesn't look like that at all. It took me years to understand what the problem is and I am going to share that with you today.

The short version is this: If your program logic doesn't look like an algorithm you are probably doing something wrong. Programming katas are simple because they need to be able to check your answers and give an unequivocal result. It's good to know them, but you shouldn't need to know them, because they are not meant for the real world, but for controlled short term experiments. Unless you are going to work for a sorting company, if that's a thing.

Now for the long version.

What you expected versus what you get

You get your first job as a developer and your tasks sound like "fix the color of the submit button" and "the report page shows title in the right, move it to the left". And you think "why the hell did I go through those manual Bubble sort algorithms and learn Quicksort partitions if this is what programming looks like?!". The answer is that you will get to a point where your skills will make people feel confident enough to let you design and architect the things you write. Only then will the algorithmic thinking help, because you will have decided yourself what the button does and why its color or position are what they are.

When you start designing flows and entire systems and how they click together it helps a lot to see a component as an algorithm: inputs, rules and outputs. "But, Siderite, a button is not either of those!" you will say. And that is true, but also completely irrelevant. Your program logic should not care about a button, but about an input. And now you also see why the summing of distinct array items is a poor substitute for real life problems, because a click on a button is not a value in a properly contained list, but an event. And most programming exercises and even entire computer classes don't treat events as abstract inputs at all.

Lately this has started to change, both in how programming languages look at actions and events as first-class citizens, but also in theoretical and programmatic concepts like observables, streams, functional programming, reactivity, event buses and messaging, microservices, etc. It makes sense to not quite get it when you have not begun to touch these concepts and when everybody and their grandmother focus on the latest frontend framework, rapid application dev tools or extensions to VS Code, but at their very core all of these things are solutions to the same problem, following the same principles.

Breaking reality apart

As you start to climb toward seniority (and that does NOT mean going to Mexico so they call you "señor developer") you learn about Separation of Concerns, as a good strategy to isolate changes, improve readability and testing and ease maintainability and deployment. You learn about writing applications in layers: the UI, the business logic, the database access, etc, which is also about separating concerns. And as you go further and further on that path you realize...

Wait! This business logic thing looks like an algorithm! It abstracts all of its dependencies until all that remains is: inputs, rules, outputs.

But there are things to confound you: events, user input, parallel tasks, race conditions, heavy load use, the cloud. You can use the same tools, though! Abstract everything, separate concerns. What is an event but a signal coming from a source? Your input is the observable source object and the events themselves just values coming in. Or just a method that receives an event object and you handle sending the event someplace else. Everything coming from the user can be handled the same way. Concurrency is solved by maintaining as little internal state as possible and, when absolutely necessary, guarding it against concurrent access via clear established methods, like semaphores and transaction contexts.

Once your logic is clear, your data structured and every external dependency abstracted away, you can run and test every subsystem in isolation. You don't care something is supposed to be a click, or an error, or a network message or on Windows or Linux or how it's deployed or if the database is available and what kind it is, what UI is being used and what it does, where in the world you are and what time it is and so on. Your code is now an algorithm: a set of rules applied on predictable input which can then be tested for an expected output.

A new requirement comes: you change just the part responsible for the requirement. You can write unit tests before or after or test it manually without caring about anything outside that piece of code. A bug is reported: you write a test that reproduces the bug, you change the code, see the test pass and you never had to open a browser or an app or go to some external environment or ask some other team for user access or if you can use the database. How does it sound to be able to code without ever having to manually go through application scenarios?

Of course there will be an ugly user facing piece of code that you will have to write, but it should be minimal. Your logic is sound, almost mathematically provable to be correct, and how you plug it in is irrelevant. Yes, you will have to work with the graphical designer in your team and make it so the nicely colored card slides across the screen, but that is a meaningless process that you play with in complete isolation from your logic. End to end testing is sometimes necessary, but it's a human thing to do, as well. Just check the "feel" of things, how they look, how they move, if it works for you. The only reason why you are going through it is because you have not been able to completely abstract the end user, with their stupid requests and complicated needs and ideas of what beautiful means.

Yet that is beginning to change as well. Artificial Intelligence, of all things, has advanced so far that you can create minimal interfaces using human language requests. "Build me a web page with a list of items that can be scrolled and selected to be displayed in a details pane on the right". I can imagine this can be used in real life only when the logic of the application has already been written and one is able to just plug and play such a monstrosity without much effort, while also being prepared to change the requirements, recreate the entire things in a different way, but plug it in the same.

And there will be some sort of deployment framework, with people deploying stuff and checking stuff, with data in databases or other persistence mediums. Your code logic? Doesn't care.

Imposter syndrome

Does this sound like a pipe dream that a snake oil peddler is trying to sell you? Let me tell you that the only reason you are not working like that now is because someone thought it was too complicated and decided to cut corners. And they have been paying for it ever since, as well as you.

The only proven way of solving complex problems is Divide and Rule. Life is complex and real problems, too. Separation of Concerns, Inversion of Control, Domain Boundaries are the tools you use to break any problem into smaller manageable pieces. And that brings us back to interview questions and pointless algorithms.

When you go to a code test, you are the algorithm. They give you some input and an expected output and check to see if your internal rules are up to the task. Of course you could google for an easy solution. More than that, what kind of employee would you be if whenever the boss asked for something you would build it from scratch without seeing what others did? What hubris to believe that you could know the answer better than anyone else without even checking!

Test succeeded

The conclusion of this stream (heh!) of consciousness is that once you realize the algorithmic nature of any problem (once you abstract every interface with reality), you can see the actual value of being proficient in writing one. You might start with sorting and fizzbuzz and other bullcrap like that, but they are just steps on a larger ladder that will eventually make sense, just like learning the letters of the alphabet prepared you to read to the end of this post. Also, if you are trying to get a job as a book editor and the HR person is asking you if you know all the letters of the alphabet, maybe you don't want to work there.

P.S.

The links in this article are important, especially if you are just beginning your journey as a developer. Check out the concepts there and learn to use them in your life, it will get a whole lot easier!


The point of regular expression character classes is to simplify your expressions, but they can introduce subtle bugs or efficiency issues.

Let's check out this StackOverflow answer to the question \d less efficient than [0-9]:

\d checks all Unicode digits, while [0-9] is limited to these 10 characters. For example, Persian digits, ۱۲۳۴۵۶۷۸۹, are an example of Unicode digits which are matched with \d, but not [0-9].

This makes sense, only it has never occurred to me until this very moment. I would never use a [0-9] notation and I would replace it with a \d if found in code.

What does that mean?

One simple consequence of such a class would be performance: searching for a large list of characters is less efficient. Another would be introducing the possibility for bugs or even malicious attacks. Let's see the code for a calculator that adds two numbers. It's a silly piece of code, but imagine that a more complex one would take the user content and save it into a database, try to process it or display it.

static void Main(string[] args)
{
    Console.InputEncoding = Encoding.Unicode;
    var firstNumber = GetNumberString();
    var secondNumber = GetNumberString();
    Console.WriteLine("Sum = "+(int.Parse(firstNumber) + int.Parse(secondNumber)));
}

private static string GetNumberString()
{
    string result=null;
    var isNumber = false;
    while (!isNumber)
    {
        Console.Write("Enter a number: ");
        result = Console.ReadLine();
        isNumber = Regex.IsMatch(result, @"^\d+$");
        if (!isNumber)
        {
            Console.WriteLine($"{result} is not a number! Try again.");
        }
    }
    return result;
}

This will try to get numbers as strings and test them using the regular expression ^\d+$, which means the string has to consist of one or more digits. Note that I had to set the console input encoding to Unicode in order to be able to paste Persian numbers. This code works fine until I use Arabic or Persian digits, where it breaks in the int.Parse method. Using ^[0-9]+$ as the regular expression pattern would solve this issue.
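
A quick illustration of the difference (with Persian digits pasted as a string literal):

// \d matches the Persian digits, so the input passes validation, but int.Parse then throws
Console.WriteLine(Regex.IsMatch("۱۲۳", @"^\d+$"));    // True
Console.WriteLine(Regex.IsMatch("۱۲۳", @"^[0-9]+$")); // False
Console.WriteLine(Regex.IsMatch("123", @"^[0-9]+$")); // True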

The same issue will occur with \w (warning: \w matches letters, digits AND underscore) and [a-zA-Z] (or just [a-z] with RegexOptions.IgnoreCase).

If one uses code to determine the number of matches for each regular expression pattern

var regexPattern = @"\d";
var nr = 0;
for (int i = 0; i < ushort.MaxValue; i++)
{
    string str = Convert.ToChar(i).ToString();
    if (Regex.IsMatch(str, regexPattern))
        nr++;
}
Console.WriteLine(nr);

we get this:

  • for \d : 370
    • ALL digits
  • for \w : 50320
    • ALL word characters (including digits) 
  • for [^\W\d] : 49950
    • ALL word characters, but not the digits 
  • for \p{L} : 48909
    • ALL letters
  • for [A-Za-z] : 52
    • letters from a to z
  • for [0-9] : 10
    • digits from 0 to 9

I hope this helps.


Let's start with a simple requirement and the obvious first draft of the code. The requirement is "build a method that receives a string and writes to the console a URL using that string as a parameter".

The obvious first attempt and working in every version of C# would be:

private static void ShowUrl(string something)
{
    var url = "https://siderite.dev/blog/search/formattablestring?param1="+something;
    Console.WriteLine(url);
}

But then you get some problems. You want to use a parameter that has strange characters in it, like an email, another URL, arithmetic symbols, etc. So you realize that you need to URL encode the parameter. The next version is:

private static void ShowUrl(string something)
{
    var url = "https://siderite.dev/blog/search/formattablestring?param1="+Uri.EscapeDataString(something);
    Console.WriteLine(url);
}

Now that works for a second. One might argue about the difference between System.Web.HttpUtility.UrlEncode, System.Net.WebUtility.UrlEncode and System.Net.Uri.EscapeDataString. Yet, this is not the post to do that. Follow the above links for more information.

Then someone decides to use a null for the parameter and you get an ArgumentNullException and you rage in frustration: "This is complicated and ugly! The first version looked much better and handled nulls, too. And now I have to add this long Uri.EscapeDataString to every URL parameter in every URL building code I write and also check for null!".

But... what if you could write the method just like the first version and get rid of every problem?

First of all, let's get rid of that plus sign and use interpolated strings. They've been around for a while:

private static void ShowUrl(string something)
{
    var url = $"https://siderite.dev/blog/search/formattablestring?param1={something}";
    Console.WriteLine(url);
}

Now, this has the same problem as the first version: no URL encoding. And you don't want to add an escape function to every parameter (imagine there were 10 parameters in this URL). How about we use the fact that the URL is an interpolated string?

Check out this code:

private static void ShowUrl(string something)
{
    var url = UrlHelper.Url($"https://siderite.dev/blog/search/formattablestring?param1={something}");
    Console.WriteLine(url);
}

It uses a UrlHelper class to fix the problems with the URL in the string we generate. Or is it a string? A long time ago I wrote a post about FormattableString and how I didn't see a real use scenario for it. Well, this is it! Let's see what UrlHelper looks like:

public static class UrlHelper
{
    public static string Url(FormattableString url)
    {
        return string.Format(
            url.Format, 
            url.GetArguments()
                .Select(a => Uri.EscapeDataString(a?.ToString()??""))
                .ToArray());
    }
}

The Url method receives not a string, but a FormattableString. When an interpolated string is used as a FormattableString, it gives the developer access to a Format string and a list of arguments via GetArguments. In order to convert it to a string, one only has to do String.Format(url.Format, url.GetArguments()). In our case, the Format property will have the value "https://siderite.dev/blog/search/formattablestring?param1={0}" and the arguments list will contain one value: the value of the something parameter.
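
To make that concrete, here is a tiny sketch (with an assumed "a&b" parameter value) of what an interpolated string exposes when treated as a FormattableString:

FormattableString s = $"https://siderite.dev/blog/search/formattablestring?param1={"a&b"}";
Console.WriteLine(s.Format);            // https://siderite.dev/blog/search/formattablestring?param1={0}
Console.WriteLine(s.GetArguments()[0]); // a&b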

Therefore, the Url method will be able to format the string, but first URL encode every single parameter in it.

I admit, this might not be the most readable code in the world, which is my pet peeve with FormattableString. Reading the ShowUrl method you have no clue that UrlHelper.Url receives a FormattableString as parameter. One might even look at that string and say "Hey, wait a minute! That's a bug waiting to happen!" and then proceed to fix it. Perhaps the name of the method could be made more intuitive like EncodeUrlParameters.

But wait! We can do better. I don't know how it might help in the future, but perhaps someone might want to process the parameters of the URL further. By returning a string we eliminate some of the information of the incoming parameter and we don't want to do that.

Also, careful people will notice that using String.Format introduces a subtle bug (or feature): we format the string without specifying a culture. This might be fine, using the current one by default, but it might also cause some issues. The FormattableString class does provide the static CurrentCulture and Invariant methods and, of course, the ToString method would work just fine as well, but then we lose the ability to process the parameters.

So here is the final version of the UrlHelper class:

public static class UrlHelper
{
    public static FormattableString EncodeUrlParameters(FormattableString url)
    {
        return FormattableStringFactory.Create(
            url.Format,
            url.GetArguments()
                .Select(a => Uri.EscapeDataString(a?.ToString() ?? ""))
                .ToArray());
    }
}

By using FormattableStringFactory from System.Runtime.CompilerServices we get a FormattableString as an argument and we return one, with parameters URL encoded. Now, if the result of the method is used as a string, it will be automatically handled by ToString and if it will be used as a FormattableString it will contain all the information provided by the original parameters.
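
A quick usage sketch (again with an assumed parameter value) showing that only the parameters get encoded, not the rest of the template:

FormattableString encoded = UrlHelper.EncodeUrlParameters(
    $"https://siderite.dev/blog/search/formattablestring?param1={"a&b=c"}");
Console.WriteLine(encoded); // https://siderite.dev/blog/search/formattablestring?param1=a%26b%3Dc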

Hope that helps!

Bonus time! There's more!

What if you could make this behavior built in by introducing a new type? Methods from, let's say, a REST client class could use FormattableStrings as URL parameters. But what if we could specify the type of the parameters and get the implicit (hint, hint!) result already URL encoded?

public class UrlString
{
    private FormattableString _formattableString;

    public UrlString(FormattableString formattableString)
    {
        _formattableString = formattableString;
    }

    public static implicit operator FormattableString(UrlString urlString)
    {
        if (urlString == null) return null;
        return UrlHelper.EncodeUrlParameters(urlString.ToFormattableString());
    }

    public static implicit operator string(UrlString urlString)
    {
        if (urlString == null) return null;
        return ((FormattableString)urlString).ToString();
    }

    public static implicit operator UrlString(FormattableString formattableString)
    {
        return new UrlString(formattableString);
    }

    private FormattableString ToFormattableString()
    {
        return _formattableString;
    }
}

This UrlString class will implicitly be converted into a FormattableString or a string and a FormattableString will be converted into a UrlString. So now your code might look like this:

// used like this
RestClient.Get($"https://test.com?x={something}");

// this method does not URL encode anything
public RestResponse Get(string url) {
  ...
}

// this method does URL encode everything, just by changing the type
public RestResponse Get(UrlString url) {
  ...
}

Intro

Dependency injection is baked into ASP.Net Core projects (yes, I still call it Core), but it's missing from console app templates. And while it is easy to add, it's not that clear cut how to do it. I present here three ways to do it:

  1. The fast one: use the Worker Service template and tweak it to act like a console application
  2. The simple one: use the Console Application template and add dependency injection to it
  3. The hybrid: use the Console Application template and use the same system as in the Worker Service template

Tweak the Worker Service template

It makes sense that if you want a console application you would select the Console Application template when creating a new project, but as mentioned above, it's just the default template, as old as console apps are. Yet there is another default template, called Worker Service, which almost does the same thing, only it has all the dependency injection goodness baked in, just like an ASP.Net Core Web App template.

So start your Visual Studio, add a new project and choose Worker Service:

It will create a project containing a Program.cs, a Worker.cs and an appsettings.json file. Program.cs holds the setup and Worker.cs holds the code to be executed.

Worker.cs has an ExecuteAsync method that logs some stuff every second, but even if we remove the while loop and add our own code, the application doesn't stop. This might be a good thing, as sometimes we just want stuff to work until we press Ctrl-C, but it's not a console app per se.

In order to transform it into something that works just like a console application you need to follow these steps:

  1. inject an IHost instance into your worker
  2. specifically instruct the host to stop whenever your code has finished

So, you go from:

public class Worker : BackgroundService
{
    private readonly ILogger<Worker> _logger;

    public Worker(ILogger<Worker> logger)
    {
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            _logger.LogInformation("Worker running at: {time}", DateTimeOffset.Now);
            await Task.Delay(1000, stoppingToken);
        }
    }
}

to:

public class Worker : BackgroundService
{
    private readonly ILogger<Worker> _logger;
    private readonly IHost _host;

    public Worker(ILogger<Worker> logger, IHost host)
    {
        _logger = logger;
        _host = host;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        Console.WriteLine("Hello world!");
        _host.StopAsync();
    }
}

Note that I did not "await" the StopAsync method because I don't actually need to. You are telling the host to stop and it will do it whenever it will see fit.

If we look into the Program.cs code we will see this:

public class Program
{
    public static void Main(string[] args)
    {
        CreateHostBuilder(args).Build().Run();
    }

    public static IHostBuilder CreateHostBuilder(string[] args) =>
        Host.CreateDefaultBuilder(args)
            .ConfigureServices((hostContext, services) =>
            {
                services.AddHostedService<Worker>();
            });
}

I don't know why they bothered with creating a new method and then writing it as an expression body, but that's the template. You see that there is a lambda adding dependencies (by default just the Worker class), but everything starts with Host.CreateDefaultBuilder. In .NET source code, this leads to HostingHostBuilderExtensions.ConfigureDefaults, which adds a lot of stuff:

  • environment variables to config
  • command line arguments to config (via CommandLineConfigurationSource)
  • support for appsettings.json configuration
  • logging based on operating system

That is why, if you want these things by default, your best bet is to tweak the Worker Service template.
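
For example, here is a small sketch (with a hypothetical "MySetting" key in appsettings.json) showing that configuration is already wired up by the default builder and can simply be injected into the Worker:

using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Hosting;

public class Worker : BackgroundService
{
    private readonly IConfiguration _configuration;
    private readonly IHost _host;

    public Worker(IConfiguration configuration, IHost host)
    {
        _configuration = configuration;
        _host = host;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // the value can come from appsettings.json, environment variables or command line arguments
        Console.WriteLine($"MySetting = {_configuration["MySetting"]}");
        _host.StopAsync(); // not awaited, see the note above
    }
}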

Add Dependency Injection to an existing console application

Some people want to have total control over what their code is doing. Or maybe you already have a console application doing stuff and you just want to add Dependency Injection. In that case, these are the steps you want to follow:

  1. create a ServiceCollection instance (needs a dependency to Microsoft.Extensions.DependencyInjection)
  2. register all dependencies you want to use to it
  3. create a starting class that will execute stuff (just like Worker above)
  4. register starting class in the service collection
  5. build a service provider from the collection
  6. get an instance of the starting class and execute its one method

Here is an example:

class Program
{
    static void Main(string[] args)
    {
        var services = new ServiceCollection();
        ConfigureServices(services);
        services
            .AddSingleton<Executor,Executor>()
            .BuildServiceProvider()
            .GetService<Executor>()
            .Execute();
    }

    private static void ConfigureServices(IServiceCollection services)
    {
        services
            .AddSingleton<ITest, Test>();
    }
}

public class Executor
{
    private readonly ITest _test;

    public Executor(ITest test)
    {
        _test = test;
    }

    public void Execute()
    {
        _test.DoSomething();
    }
}

The only reason we register the Executor class is in order to get an instance of it later, complete with constructor injection support. You can even make Execute an async method, so you can get full async/await support. Of course, for this example appsettings.json configuration or logging won't work until you add them.
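
For example, a minimal async variant (assuming we give Executor a hypothetical ExecuteAsync method) could look like this:

using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;

class Program
{
    static async Task Main(string[] args)
    {
        var services = new ServiceCollection();
        services.AddSingleton<ITest, Test>();
        services.AddSingleton<Executor>();

        // GetRequiredService throws a clear exception if a registration is missing
        await services.BuildServiceProvider()
            .GetRequiredService<Executor>()
            .ExecuteAsync();
    }
}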

Mix them up

Of course, one could try to get the best of both worlds. This would work kind of like this:

  1. use Host.CreateDefaultBuilder() anyway (needs a dependency to Microsoft.Extensions.Hosting), but in a normal console app
  2. use the resulting service provider to instantiate a starting class
  3. start it

Here is an example:

class Program
{
    static void Main(string[] args)
    {
        Host.CreateDefaultBuilder()
            .ConfigureServices(ConfigureServices)
            .ConfigureServices(services => services.AddSingleton<Executor>())
            .Build()
            .Services
            .GetService<Executor>()
            .Execute();
    }

    private static void ConfigureServices(HostBuilderContext hostContext, IServiceCollection services)
    {
        services.AddSingleton<ITest,Test>();
    }
}

The Executor class would be just like in the section above, but now you get all the default logging and configuration options from the Worker Service section.

Conclusion

What the quickest and best solution is depends on your situation. One could just as well start with a Worker Service template, then tweak it to never Run the builder and instead configure it as above. One can create a Startup class, complete with Configure and ConfigureServices as in an ASP.Net Core Web App template or even start with such a template, then tweak it to work as a console app/web hybrid. In .NET Core everything is a console app anyway, it just depends on which packages you load and how you configure them.


  We have been told again and again that our code should be split into smaller code blocks that are easy to understand and read. The usual example is a linear large method that can easily be split into smaller methods that just execute one after the other. But in real life just selecting a logical bit of our method and applying an Extract Method refactoring doesn't often work because code rarely has a linear flow.

  Let's take a classic example of "exiting early", another software pattern that I personally enjoy. It says that you extract the functionality of retrieving some data into a method and, inside it, return from the method as soon as you fail a necessary condition for extracting that data. Something like this:

public Order OrderCakeForClient(int clientId) {
  var client = _clientRepo.GetClient(clientId);
  if (client == null) {
    _logger.Log("Client doesn't exist");
    return null;
  }
  var preferences = _clientRepo.GetPreferences(client);
  if (preferences == null) {
    _logger.Log("Client has not submitted preferences");
    return null;
  }
  var existingOrder = _orderRepo.GetOrder(client, preferences);
  if (existingOrder != null) {
    _logger.Log("Client has already ordered their cake");
    return existingOrder;
  }
  var cake = _inventoryRepo.GetCakeByPreferences(preferences);
  if (cake == null) {
    cake = MakeNewCake(preferences);
  }
  if (cake == null) {
    _logger.Log("Cannot make cake as requested by client");
    return null;
  }
  var order = _orderRepo.GenerateOrder(client, cake);
  if (order == null) {
    _logger.Log("Generating order failed");
    throw new OrderFailedException();
  }
  return order;
}

This is a long method that appears to be linear, but in fact it is not. At every step there is the opportunity to exit the method therefore this is like a very thin tree, not a line. And it is a contrived example. A real life flow would have multiple decision points, branching and then merging back, with asynchronous logic and so on and so on.

If you want to split it into smaller methods you will have some issues:

  • most of the code is not actual logic, but logging code - one could make a method called getPreferences(client), but it would just proxy the _clientRepo method and gain nothing
  • one could try to extract a method that gets the preferences, logs and decides to exit - one cannot do that directly, because you cannot make an exit decision in a child method

Here is a way that this could work. I personally like it, but as you will see later on, it's not sufficient. Let's encode the exit decision as a boolean and get the information we want as out parameters. Then the code would look like this:

public Order OrderCakeForClient(int clientId) {
  Order existingOrder = null;
  if (clientExists(clientId, out Client client)
    && clientSubmittedPreferences(client, out Preferences preferences)
    && orderDoesntExist(client, preferences, out existingOrder)
    && getACake(preferences, out Cake cake)) {
    return generateOrder(client, cake);
  }
  return existingOrder;
}

private bool clientExists(clientId, out Client client) {
  client = _clientRepo.GetClient(clientId);
  if (client == null) {
    _logger.Log("Client doesn't exist");
    return false;
  }
  return true; 
}

private bool clientSubmittedPreferences(Client client, out Preferences preferences) {
  preferences = _clientRepo.GetPreferences(client);
  if (preferences == null) {
    _logger.Log("Client has not submitted preferences");
    return false;
  }
  return true;
}

private bool orderDoesntExist(Client client, Preferences preferences,
                               out Order existingOrder) {
  existingOrder = _orderRepo.GetOrder(client, preferences);
  if (existingOrder != null) {
    _logger.Log("Client has already ordered their cake");
    return false;
  }
  return true;
}

private bool getACake(Preferences preferences, out Cake cake) {
  cake = _inventoryRepo.GetCakeByPreferences(preferences);
  if (cake == null) {
    cake = MakeNewCake(preferences);
  }
  if (cake == null) {
    _logger.Log("Cannot make cake as requested by client");
    return false;
  }
  return true;
}

private Order generateOrder(Client client, Cake cake) {
  var order = _orderRepo.GenerateOrder(client, cake);
  if (order == null) {
    _logger.Log("Generating order failed");
    throw new OrderFailedException();
  }
  return order;
}

Now reading OrderCakeForClient is a joy! Other than the slightly weird assignment of existingOrder inside a condition method (out variable declarations in conditions are valid since C# 7, but not that easy to read), one can grasp the logic of the method at a single glance, as a process that only runs if four conditions are met.

However, there are some problems here. One of them is that out parameters don't work with async methods and are generally frowned upon these days. The method also assumes a cake can just be made instantly, when we know it takes time, care, effort and kitchen cleaning. And modern code rarely features synchronous database access.

So how do we fix it? The classic solution for getting rid of the out syntax is to create custom return types that contain everything the method returns. Let's try to rewrite the method with async in mind:

public async Task<Order> OrderCakeForClient(int clientId) {
  var clientExistsResult = await clientExists(clientId);
  if (!clientExistsResult.Success) return null;
  var client = clientExistsResult.Client;
  var preferencesResult = await clientSubmittedPreferences(client);
  if (!preferencesResult.Success) return null;
  var preferences = preferencesResult.Preferences;
  var orderDoesntExistResult = await orderDoesntExist(client, preferences);
  if (!orderDoesntExistResult.Success) return orderDoesntExistResult.Order;
  var getACakeResult = await getACake(preferences);
  if (!getACakeResult.Success) return null;
  return await generateOrder(client, getACakeResult.Cake);
}

private async Task<ClientExistsResult> clientExists(int clientId) {
  var client = await _clientRepo.GetClient(clientId);
  if (client == null) {
    _logger.Log("Client doesn't exist");
    return new ClientExistsResult { Success = false };
  }
  return new ClientExistsResult { Client = client, Success = true };
}

// the rest omitted for brevity

Wait... this doesn't look good at all! We wrote a ton of extra code and we got... something that is more similar to the original code than the easy to read one. What happened? Several things:

  • because we couldn't return a bool, we had to revert to exiting early - not bad in itself, but it adds code that seems to duplicate the logic in the condition methods
  • we added a return type for each condition method - extra code that provides no extra functionality, not to mention it's ugly
  • returning bool values was already awkward; returning these result objects is even more so
  • the only reason we needed to return a result value (bool or something more complex) was to move the logging behavior into the smaller methods

"Step aside!" the software architect will say and quickly draw on the whiteboard a workflow framework that will allow us to rewrite any flow as a combination of classes that can be configured in XML. He is dragged away from the room, spitting and shouting.

The problem in the code is that we want to do three things at the same time:

  1. write a simple, easy to read, flow-like code
  2. make decisions based on partial results of that code
  3. store the partial results of that code

There is a common software pattern that is normally used for flow-like code: the builder pattern. However, it feels unintuitive to use it here. First of all, the builder pattern needs some masterful code writing to make it work with async/await. Plus, when most people think about a builder, they think of a separate reusable class that handles logic independent from the place where it is used. But it doesn't need to be so. Let's rewrite this code using builder-like logic:

public async Task<Order> OrderCakeForClient(int clientId)
{
    SetClientId(clientId);
    await IfClientExists();
    await IfClientSubmittedPreferences();
    await IfOrderDoesntExist();
    await IfGotACake();
    return await GenerateOrder();
}

private bool _continueFlow = true;
private int _clientId;
private Client _client;
private Preferences _preferences;
private Order _existingOrder;
private Cake _cake;

public void SetClientId(int clientId)
{
    _clientId = clientId;
    _existingOrder = null;
    _continueFlow = true;
}

public async Task IfClientExists()
{
    if (!_continueFlow) return;
    _client = await _clientRepo.GetClient(_clientId);
    test(_client != null, "Client doesn't exist");
}

public async Task IfClientSubmittedPreferences()
{
    if (!_continueFlow) return;
    _preferences = await _clientRepo.GetPreferences(_client);
    test(_preferences != null, "Client has not submitted preferences");
}

public async Task IfOrderDoesntExist()
{
    if (!_continueFlow) return;
    _existingOrder = await _orderRepo.GetOrder(_client, _preferences);
    test(_existingOrder == null, "Client has already ordered their cake");
}

public async Task IfGotACake()
{
    if (!_continueFlow) return;
    _cake = await _inventoryRepo.GetCakeByPreferences(_preferences)
        ?? await MakeNewCake(_preferences);
    test(_cake != null, "Cannot make cake as requested by client");
}

public async Task<Order> GenerateOrder()
{
    if (!_continueFlow) return _existingOrder;
    var order = await _orderRepo.GenerateOrder(_client, _cake);
    if (order == null)
    {
        _logger.Log("Generating order failed");
        throw new OrderFailedException();
    }
    return order;
}

private void test(bool condition, string logText)
{
    if (!condition)
    {
        _continueFlow = false;
        _logger.Log(logText);
    }
}

No need for a new builder class, just add builder-like methods to the existing class. No need to return the instance of the class, since it's only used inside it. No need for chaining, because we don't return anything and chaining with async is a nightmare.

The ugly secret is, of course, using fields of the class instead of local variables. That may add other problems, like multithread safety issues, but those are not in the scope of this post.

I've shown a pretty stupid piece of code that was nevertheless difficult to split properly into smaller methods, and I've demonstrated several ways to do that; which one to choose depends on the actual requirements of your code. Hope that helps!

  Every month or so I see another article posted by some dev, usually with a catchy title using words like "demystifying" or "understanding" or "N array methods you should be using" or "simplify your Javascript" or something similar. It has become so mundane and boring that it makes me mad someone is still trying to cash in on these tired ideas to appear smart. So stop doing it! There is no need to explain methods that were introduced in 2009!

  But it gets worse. These articles are partially misleading because Javascript has evolved past the need to receive or return data as arrays. Let me demystify the hell out of you.

  First of all, the methods we are discussing here are .filter and .map. There is of course .reduce, but that one doesn't necessarily return an array. Ironically, one can write both .filter and .map as a reduce function, so master that one and you can get far. There is also .sort, which for performance reasons sorts the array in place and returns the same, mutated array rather than a new one, so it doesn't chain as cleanly as the others. All of these methods of the Array object have something in common: they receive functions as parameters that are then applied to all of the items in the array. Read that again: all of the items.
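As a quick illustration, here is a minimal sketch of .filter and .map expressed through .reduce (the names filterViaReduce and mapViaReduce are just made up for this example):

// illustrative sketch: .filter and .map expressed through .reduce
const filterViaReduce = (arr, predicate) =>
  arr.reduce((acc, item) => predicate(item) ? [...acc, item] : acc, []);

const mapViaReduce = (arr, transform) =>
  arr.reduce((acc, item) => [...acc, transform(item)], []);

console.log(filterViaReduce([1, 2, 3, 4], n => n % 2 === 0)); // [2, 4]
console.log(mapViaReduce([1, 2, 3], n => n * 10)); // [10, 20, 30]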

  Functions have always been first-class citizens in Javascript, so that's not a great new thing to teach developers. And now, with arrow functions, these methods are even easier to use, because there are none of the scope issues that caused so many hidden errors in the past.
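For example, a small sketch of the kind of scope issue that arrow functions avoid (basket and discount are made-up names for illustration):

// with a regular function callback, `this` inside .map would not be basket;
// the arrow function keeps the `this` of the enclosing method
const basket = {
  discount: 0.1,
  applyAll(prices) {
    return prices.map(p => p * (1 - this.discount));
  }
};
console.log(basket.applyAll([10, 20])); // [9, 18]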

  Let's take a common usage example for these methods: data display. You have many data records that need to be displayed. You first have to filter them using some search parameters, then order them so you can take just a maximum of n records to display on a page. Because what you display is not necessarily what you have in the data source, you also apply a transformation function before returning something. The code would look like this:

// paging values and a hex helper assumed by this example
const page = 0;
const pageSize = 20;
const hex = n => n.toString(16).padStart(2, '0');

var colors = [
  { name: 'red',   R: 255, G: 0,   B: 0   },
  { name: 'blue',  R: 0,   G: 0,   B: 255 },
  { name: 'green', R: 0,   G: 255, B: 0   },
  { name: 'pink',  R: 255, G: 128, B: 128 }
];

// it would be more efficient to get the reddish colors in an array
// and sort only those, but we want to discuss chaining array methods
colors.sort((c1, c2) => c1.name > c2.name ? 1 : (c1.name < c2.name ? -1 : 0));

const result = colors
  .filter(c => c.R > c.G && c.R > c.B)
  .slice(page * pageSize, (page + 1) * pageSize)
  .map(c => ({
      name: c.name,
      color: `#${hex(c.R)}${hex(c.G)}${hex(c.B)}`
  }));

This code takes a bunch of colors that have RGB values and a name and returns a page (defined by page and pageSize) of the colors that are "reddish" (more red than blue and green), ordered by name. The resulting objects have a name and an HTML color string.

This works for an array of four elements and it works fine for arrays of thousands of elements too, but let's look at what it is doing:

  • we pushed the sort up, thus sorting all colors in order to get the nice syntax at the end, rather than sorting just the reddish colors
  • we filtered all colors, even if we needed just pageSize elements
  • we created an array at every step (three times), even if we only needed one with a max size of pageSize

Let's write this in a classical way, with loops, to see how it works:

const result = [];
let i = 0;
for (const c of colors) {
  if (c.R <= c.G || c.R <= c.B) continue; // not reddish: skip
  if (i++ < page * pageSize) continue;    // belongs to a previous page: skip, but don't store
  result.push({
    name: c.name,
    color: `#${hex(c.R)}${hex(c.G)}${hex(c.B)}`
  });
  if (result.length >= pageSize) break;    // the page is full: stop iterating
}

And it does this:

  • it iterates through the colors array, but it has an exit condition
  • it ignores colors that are not reddish
  • it ignores the colors of previous pages, but without storing them anywhere
  • it stores the reddish colors directly in the result, already transformed
  • it exits the loop as soon as the result reaches the size of a page, so it stops after finding (page+1)*pageSize reddish colors

No extra arrays, no extra iterations, only some ugly ass code. But what if we could write this as nicely as in the first example and make it work as efficiently as the second? Because of ECMAScript 6 we actually can!

Take a look at this:

const result = Enumerable.from(colors)
  .where(c => c.R > c.G && c.R > c.B)
  //.orderBy(c => c.name)
  .skip(page * pageSize)
  .take(pageSize)
  .select(c => ({
      name: c.name,
      color: `#${hex(c.R)}${hex(c.G)}${hex(c.B)}`
  }))
  .toArray();

What is this Enumerable thing? It's a class I made to encapsulate the methods .where, .skip, .take and .select, and we will examine it later. Why these names? Because they mirror similar method names in LINQ (Language Integrated Query from .NET) and because I wanted to clearly separate them from the array methods.

How does it all work? If you look at the "classical" version of the code you see the new for..of loop introduced in ES6. It uses the concept of "iterable" to go through all of the elements it contains. An array is an iterable, but so is a generator function, also an ES6 construct. A generator function is a function that generates values as it is iterated, the advantage being that it doesn't need to hold all of the items in memory (like an array) and any operation on the values is performed only for the ones actually requested by the consuming code.
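To make that concrete, here is a tiny generator sketch (naturals is just an illustrative name):

// a generator produces values lazily, one per iteration step
function* naturals() {
  let n = 0;
  while (true) {
    yield n++; // nothing runs until the consumer asks for the next value
  }
}

for (const n of naturals()) {
  if (n > 3) break;
  console.log(n); // 0, 1, 2, 3 - the infinite sequence is never materialized
}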

Here is what the code above does:

  • it creates an Enumerable wrapper over the array (performs no operation, just assignments)
  • it filters by defining a generator function that only returns reddish colors (but performs no operation) and returns an Enumerable wrapper over the function
  • it ignores the items from previous pages by defining a generator function that counts items and only returns items after the specified number (again, no operation) and returns an Enumerable wrapper over the function
  • it then takes a page full of items, stopping immediately after, by defining a generator function that does that (no operation) and returns an Enumerable wrapper over the function
  • it transforms the colors in output items by defining a generator function that iterates existing items and returns the transformed values (no operation) and returns an Enumerable wrapper over the function
  • it iterates the generator function in the current Enumerable and fills an array with the values (all the operations are performed here)

And here is the flow for each item:

  1. .toArray enumerates the generator function of .select
  2. .select enumerates the generator function of .take
  3. .take enumerates the generator function of .skip
  4. .skip enumerates the generator function of .where
  5. .where enumerates the generator function that iterates over the colors array
  6. after sorting, the first colors (blue, green) are not reddish, so .where skips them; pink is reddish, so .where "yields" it and it passes as the next item in the iteration
  7. the page is 0, let's say, so .skip has nothing to skip, it yields the color
  8. .take still has pageSize items to take, let's assume 20, so it yields the color
  9. .select yields the color transformed for output
  10. .toArray pushes the color in the result
  11. go to 1.

If for some reason you would only need the first item, not the entire page (imagine using a .first method instead of .toArray) only the steps from 1. to 10. would be executed. No extra arrays, no extra filtering, mapping or assigning.

Am I trying too hard to seem smart? Well, imagine that there are three million colors, a third of them are reddish. The first code would create an array of a million items, by iterating and checking all three million colors, then take a page slice from that (another array, however small), then create another array of mapped objects. This code? It is the equivalent of the classical one, but with extreme readability and ease of use.

OK, what is that .orderBy thing that I commented out? It's a possible method that orders items online, as they come, at the moment of execution (so when .toArray is executed). It is too complex for this blog post, but there is a full implementation of Enumerable that I wrote containing everything you will ever need. In that case .orderBy would only order the minimal number of items required to extract the page ((page+1) * pageSize). The implementation can use custom sorting algorithms that take into account .take and .skip operators, just like in LiNQer.

The purpose of this post was to raise awareness on how Javascript evolved and on how we can write code that is both readable AND efficient.

One actually doesn't need an Enumerable wrapper, and can add the methods to the prototype shared by all generator objects as well (see LINQ-like functions in JavaScript with deferred execution). As you can see, that was written 5 years ago, and still people "teach" others that .filter and .map are the Javascript equivalents of .Where and .Select from .NET. NO, they are NOT!
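A minimal sketch of that idea (extending built-in prototypes has its own caveats, so take this as an illustration rather than a recommendation):

// attach a LINQ-like operator to the prototype shared by all generator objects
const GeneratorProto = Object.getPrototypeOf(
  Object.getPrototypeOf((function* () { })()));

GeneratorProto.where = function (condition) {
  const source = this;
  return (function* () {
    for (const item of source) {
      if (condition(item)) yield item;
    }
  })();
};

// usage: any generator object now has .where
function* numbers() { yield* [1, 2, 3, 4, 5]; }
console.log([...numbers().where(n => n % 2 === 1)]); // [1, 3, 5]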

The immense advantage of using a dedicated object is that you can store information for each operator and use it in other operators to optimize things even further (like for orderBy). All the code is in one place, it can be unit tested and refined to perfection, while the code using it remains the same.

Here is the code for the simplified Enumerable object used for this post:

class Enumerable {
  constructor(generator) {
    // the wrapped generator function; its body only runs when the result is iterated
    this.generator = generator || function* () { };
  }

  static from(arr) {
    // wrap an existing iterable (here, an array) without copying it
    return new Enumerable(arr[Symbol.iterator].bind(arr));
  }

  where(condition) {
    const generator = this.generator(); // get the source iterator (no items consumed yet)
    const gen = function* () {
      // yield only the items that pass the condition
      let index = 0;
      for (const item of generator) {
        if (condition(item, index)) {
          yield item;
        }
        index++;
      }
    };
    return new Enumerable(gen);
  }

  take(nr) {
    const generator = this.generator();
    const gen = function* () {
      // yield at most nr items, then stop iterating the source entirely
      let nrLeft = nr;
      for (const item of generator) {
        if (nrLeft > 0) {
          yield item;
          nrLeft--;
        }
        if (nrLeft <= 0) {
          break;
        }
      }
    };
    return new Enumerable(gen);
  }

  skip(nr) {
    const generator = this.generator();
    const gen = function* () {
      // ignore the first nr items, yield everything after them
      let nrLeft = nr;
      for (const item of generator) {
        if (nrLeft > 0) {
          nrLeft--;
        } else {
          yield item;
        }
      }
    };
    return new Enumerable(gen);
  }

  select(transform) {
    const generator = this.generator();
    const gen = function* () {
      // yield the transformed version of every item
      for (const item of generator) {
        yield transform(item);
      }
    };
    return new Enumerable(gen);
  }

  toArray() {
    // iterating here is what actually pulls the items through the whole chain
    return Array.from(this.generator());
  }
}
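As a side note, the .first scenario mentioned earlier could be covered by adding a small method to this class, something along these lines:

// sketch: return only the first item of the chain (or undefined if there is none)
first() {
  for (const item of this.generator()) {
    return item; // each operator in the chain does the minimum work for a single item
  }
  return undefined;
}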

The post is filled with links; for whatever you don't understand, I urge you to search and learn.

I like Dave Farley's video channel about software industry quality, Continuous Delivery, so I will share this video about what tech interviews should be like. It is not a step by step tutorial as much as a look at what the actual purpose of a tech interview is and how to achieve it:

[youtube:osnOY5zgdMI]

Top Software Engineering Interview Tips - Dave Farley