Lab Girl should have been the kind of book I like: a deeply personal autobiography. Hope Jahren also writes well, and in 14 chapters she covers about 20 years of her life, from the moment she decided she would be a scientist to the moment academia finally accepted her as a full professor. She talks about her Norwegian family upbringing, about the tough mother who never gave her the kind of love she yearned for, about misogyny in science, about deep feelings for her friends, about her bipolar disorder and her pregnancy. Between chapters she interposes short pieces about plants, mostly trees, as metaphors for personal growth. And she is an introvert who works with and is best friends with a guy who is even more of an introvert than she is. What is not to like?

And the truth is that I did like the book, yet I couldn't empathize with her "character". Each chapter is almost self-contained, there is little continuity, and instead of feeling at one with the writer I got the impression that she overthinks things and that everything I read is a memory of a memory of a thought. I also felt there was little science in a book written by someone who loves science, although objectively there is plenty of material to rummage through. Perhaps I am not a plant person.

The bottom line is that I was expecting someone autopsying their daily life, not paper-wrapping the disjointed events that marked their life in general. As it usually goes with expectations, I felt a bit disappointed that the author had other plans for her book. It does talk about deep feelings, but I was more interested in the actual events than in her internal projection of them. However, if you are the kind of person who likes an emotional lens on life, you will probably like the book more than I did.

In this post I am going to discuss an interview question that pops up from time to time. The solution that is usually presented as the best one is the same regardless of the inputs, and I believe that to be a mistake. Let me explore this with you.

The problem



The problem is simple: given two sorted arrays of very large size, find the most efficient way to compute their intersection (the list of common items in both).

The solution that is usually given as correct is described here (you will have to excuse its Javiness), for example. The person who provided the answer made a great effort to enumerate various solutions along with their O complexity, and the answer inspires confidence, as coming from someone who knows what they are talking about. But how correct is it? Another blog post describing the problem and hinting at some extra information that might influence the result is here.

Implementation


Let's start with some code:

var rnd = new Random();
var n = 10000000; // 1e+7, so that the summed random values still fit into an int
int[] arr1, arr2;
generateArrays(rnd, n, out arr1, out arr2);
var sw = new Stopwatch();
sw.Start();
var count = intersect(arr1, arr2).Count();
sw.Stop();
Console.WriteLine($"{count} intersections in {sw.ElapsedMilliseconds}ms");

Here I am creating two arrays of size n, using a generateArrays method, then I am counting the number of intersections and displaying the time elapsed. In the intersect method I will also count the number of comparisons, so that we avoid for now the complexities of Big O notation (pardon the pun).

As for the generateArrays method, I will use a simple incremented value to make sure the values are sorted, but also randomly generated:

private static void generateArrays(Random rnd, int n, out int[] arr1, out int[] arr2)
{
    arr1 = new int[n];
    arr2 = new int[n];
    int s1 = 0;
    int s2 = 0;
    for (var i = 0; i < n; i++)
    {
        s1 += rnd.Next(1, 100);
        arr1[i] = s1;
        s2 += rnd.Next(1, 100);
        arr2[i] = s2;
    }
}


Note that n is 1e+7, so that the summed values fit into an integer. If you try a larger value the sum will overflow into negative values and the array would no longer be sorted.

Time to explore ways of intersecting the arrays. Let's start with the recommended implementation:

private static IEnumerable<int> intersect(int[] arr1, int[] arr2)
{
    var p1 = 0;
    var p2 = 0;
    var comparisons = 0;
    while (p1<arr1.Length && p2<arr2.Length)
    {
        var v1 = arr1[p1];
        var v2 = arr2[p2];
        comparisons++;
        switch(v1.CompareTo(v2))
        {
            case -1:
                p1++;
                break;
            case 0:
                p1++;
                p2++;
                yield return v1;
                break;
            case 1:
                p2++;
                break;
        }

    }
    Console.WriteLine($"{comparisons} comparisons");
}


Note that I am not counting the comparisons of the two pointers p1 and p2 with the Length of the arrays (which could be optimized by caching the lengths). They consume just as many resources as comparing the array values, yet we discount them in the name of calculating a fictitious growth rate complexity, and I will keep discounting them for the rest of the post. Optimizing the code itself is not the subject here.

Running the code I get the following output:

19797934 comparisons
199292 intersections in 832ms


The number of comparisons is directly proportional to the value of n, approximately 2n. That is because we walk through all the values in both arrays. If we populated one array with odd numbers and the other with even numbers, for example, so that there are no intersections, the number of comparisons would be almost exactly 2n.
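
For reference, here is one way that odds-and-evens scenario could be generated (the method name is mine, not from the original code; the arrays stay sorted because the values only grow):

private static void generateOddEvenArrays(int n, out int[] arr1, out int[] arr2)
{
    arr1 = new int[n];
    arr2 = new int[n];
    for (var i = 0; i < n; i++)
    {
        arr1[i] = 2 * i + 1; // 1, 3, 5, ... only odd values
        arr2[i] = 2 * i;     // 0, 2, 4, ... only even values
    }
}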

Experiments


Now let me change the intersect method, make it more general:

private static IEnumerable<int> intersect(int[] arr1, int[] arr2)
{
    var p1 = 0;
    var p2 = 0;
    var comparisons = 0;
    while (p1 < arr1.Length && p2 < arr2.Length)
    {
        var v1 = arr1[p1];
        var v2 = arr2[p2];
        comparisons++;
        switch (v1.CompareTo(v2))
        {
            case -1:
                p1 = findIndex(arr1, v2, p1, ref comparisons);
                break;
            case 0:
                p1++;
                p2++;
                yield return v1;
                break;
            case 1:
                p2 = findIndex(arr2, v1, p2, ref comparisons);
                break;
        }

    }
    Console.WriteLine($"{comparisons} comparisons");
}

private static int findIndex(int[] arr, int v, int p, ref int comparisons)
{
    p++;
    while (p < arr.Length)
    {
        comparisons++;
        if (arr[p] >= v) break;
        p++;
    }
    return p;
}

Here I've replaced the increment of the pointers with a findIndex method that keeps incrementing the pointer until either the end of the array is reached or a value larger than or equal to the one we are searching for is found. The functionality of the method remains the same, since the main loop would have achieved the same effect. But now we are free to tweak findIndex to obtain better results. Before we do that, though, I am going to P-hack the shit out of this science and generate the arrays differently.

Here is a method of generating two arrays that are different in that all of the elements of the first are smaller than those of the second. At the very end we put a single equal element, for the fun of it.

private static void generateArrays(Random rnd, int n, out int[] arr1, out int[] arr2)
{
    arr1 = new int[n];
    arr2 = new int[n];
    for (var i = 0; i < n - 1; i++)
    {
        arr1[i] = i;
        arr2[i] = i + n;
    }
    arr1[n - 1] = n * 3;
    arr2[n - 1] = n * 3;
}


This is the worst case scenario for the algorithm and the number of comparisons is promptly 2n. But what if we used binary search (which in the StackOverflow answer was dismissed as having O(n*log n) complexity instead of O(n))? Well, then... the output becomes

49 comparisons
1 intersections in 67ms

Here is the code for the findIndex method that would do that:

private static int findIndex(int[] arr, int v, int p, ref int comparisons)
{
    var start = p + 1;
    var end = arr.Length - 1;
    if (start > end) return start;
    while (true)
    {
        var mid = (start + end) / 2;
        var val = arr[mid];
        if (mid == start)
        {
            comparisons++;
            return val < v ? mid + 1 : mid;
        }
        comparisons++;
        switch (val.CompareTo(v))
        {
            case -1:
                start = mid + 1;
                break;
            case 0:
                return mid;
            case 1:
                end = mid - 1;
                break;
        }
    }
}


49 comparisons is smack on the value of 2*log2(n). Yeah, sure, the data we used was doctored, so let's return to the randomly generated one. In that case, the number of comparisons grows horribly:

304091112 comparisons
199712 intersections in 5095ms

which is larger than n*log2(n).

Why does that happen? Because with randomly generated data the binary search hits its worst case: the value it looks for is usually right next to the current position, yet each search starts from the middle of the remaining range and has to narrow down through roughly log2 of that range before landing on the neighboring element. Clearly we can't use this for the general scenario, even if it is fantastic for one specific case. And here is my qualm with the O notation: without specifying the type of input, the solution is only probabilistically the best. Is it?

Let's compare the results so far. We have three ways of generating data: randomly with increments from 1 to 100, odds and evens, small and large values. Then we have two ways of computing the next index to compare: linear and binary search. The approximate numbers of comparisons are as follows:

                 Random          OddsEvens       SmallLarge
Linear           2n              2n              2n
Binary search    3/2*n*log(n)    2*n*log(n)      2*log(n)

Alternatives


Can we create a hybrid findIndex that would have the best of both worlds? I will certainly try. Here is one possible solution:

private static int findIndex(int[] arr, int v, int p, ref int comparisons)
{
    var inc = 1;
    while (true)
    {
        if (p + inc >= arr.Length) inc = 1;
        if (p + inc >= arr.Length) return arr.Length;
        comparisons++;
        switch(arr[p+inc].CompareTo(v))
        {
            case -1:
                p += inc;
                inc *= 2;
                break;
            case 0:
                return p + inc;
            case 1:
                if (inc == 1) return p + inc;
                inc /= 2;
                break;
        }
    }
}


What am I doing here? If I find the value, I return the index; if the value at the probed position is smaller than the one I'm looking for, not only do I advance the index, but I also increase the speed of the next advance; if it is larger, I slow back down until the increment gets to 1 again. Warning: I do not claim that this is the optimal algorithm, this is just something that was annoying me and I had to explore it.

OK. Let's see some results. I will decrease the value of n even more, to a million. Then I will generate the values with random increments of up to 10, 100 and 1000. Let's see all of it in action! This time it is the actual count of comparisons (in millions):

                     Random10    Random100   Random1000   OddsEvens   SmallLarge
Linear               2           2           2            2           2
Binary search        30          30          30           40          0.00004
Accelerated search   3.4         3.9         3.9          4           0.0002


So for the general cases the number of comparisons at most doubles, while for specific cases the decrease can be four orders of magnitude!

Conclusions


Because I had all of this in my head, I made a fool of myself at a job interview. I couldn't reason through all of the things I wrote here in a few minutes, so I had to clear my head by composing this long monstrosity.

Is the best solution the one in O(n)? Most of the time. The algorithm is simple, with no hidden comparisons, and one can understand why it would be universally touted as a good solution. But it's not the best in every case. I have demonstrated here that I can keep the extra comparisons small in standard scenarios and get immense improvements for specific inputs, like arrays that have chunks of elements smaller than the next value in the other array. I would also risk saying that this findIndex version adapts to the conditions at hand, with improbable scenarios as its worst cases. It works reasonably well for normally distributed arrays, it does wonders for "chunky" arrays (which includes the case when one array is much smaller than the other) and thus is a contender for some kinds of uses.

What I wanted to explore, and now to express, is that finding the upper growth rate of an algorithm is just part of the story. Sometimes the best implementation fails by not adapting to the real input data. I will say this, though, for the default algorithm: it works with IEnumerables, since it never needs to jump forward over elements. This intuitively gives me reason to believe that it could be further optimized by using array/list indexing. Here it is, in IEnumerable fashion:

private static IEnumerable<int> intersect(IEnumerable<int> arr1, IEnumerable<int> arr2)
{
    var e1 = arr1.GetEnumerator();
    var e2 = arr2.GetEnumerator();
    var loop = e1.MoveNext() && e2.MoveNext();
    while (loop)
    {
        var v1 = e1.Current;
        var v2 = e2.Current;
        switch (v1.CompareTo(v2))
        {
            case -1:
                loop = e1.MoveNext();
                break;
            case 0:
                loop = e1.MoveNext() && e2.MoveNext();
                yield return v1;
                break;
            case 1:
                loop = e2.MoveNext();
                break;
        }

    }
}

Extra work


The source code for a project that tests my various ideas can be found on GitHub. There you can find the following algorithms:

  • Standard - the O(m+n) one described above
  • Reverse - same, but starting from the end of the arrays
  • Binary Search - looks for values in the other array using binary search. Complexity O(m*log(n))
  • Smart Choice - when m*log(n)<m+n, it uses the binary search, otherwise the standard one (a sketch of such a dispatch follows after this list)
  • Accelerating - the one that speeds up when looking for values
  • Divide et Impera - a recursive algorithm that splits the arrays by choosing the middle value of one and binary searching it in the other. Due to the overhead of the recursion it can't be taken seriously, but it sometimes gives surprising results
  • Middle out - it takes the middle value of one array and binary searches it in the other, then uses Standard and Reverse on the resulting arrays
  • Pair search - I had high hopes for this, as it looks at two positions ahead instead of one. Really good for some cases, though generally it does a bit more work than Standard
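
Purely as an illustration of the Smart Choice idea, the dispatch might look like the sketch below; intersectByBinarySearch is a stand-in name for an intersection that binary searches every element of the smaller array in the larger one and is not shown here.

private static IEnumerable<int> smartIntersect(int[] arr1, int[] arr2)
{
    // make arr1 the smaller array, so that m <= n
    if (arr1.Length > arr2.Length)
    {
        var tmp = arr1; arr1 = arr2; arr2 = tmp;
    }
    double m = arr1.Length;
    double n = arr2.Length;
    // m*log2(n) comparisons to binary search each item of the small array
    // versus roughly m+n comparisons for the standard two-pointer walk
    return m * Math.Log(n, 2) < m + n
        ? intersectByBinarySearch(arr1, arr2) // hypothetical method, described above
        : intersect(arr1, arr2);
}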


The testing tool takes all algorithms and runs them on randomly generated arrays:

  1. Lengths m and n are chosen randomly from 1 to 1e+6
  2. A random number s of up to 100 "spikes" is chosen
  3. m and n are split into s+1 equal parts
  4. For each spike a random integer range is chosen and filled with random integer values
  5. At the end, the rest of the list is filled with any random values

Results


For a really small first array, Binary Search is king. For equal size arrays, the Standard algorithm usually wins. However, there are plenty of cases when Divide et Impera and Pair Search win - usually not by much. Sometimes it happens that Accelerating Search is better than Standard, but Pair Search wins! I still have the nagging feeling that Pair Search can be improved. I feel it in my gut! However, I have too many other things to do to dwell on this.

Maybe one of you can find the solution! Your mission, should you choose to accept it, is to find a better algorithm for intersecting sorted arrays than the boring standard one.

While reading the book Introduction to Algorithms, Third Edition, by Thomas H. Cormen and Charles E. Leiserson, I found a little gem about simultaneously finding the minimum and maximum value in an array in 3*n/2 comparisons instead of the usual 2n. The trick is to take two numbers at a time, compare them with each other and only then compare the smallest one with the minimum and the largest with the maximum.

So instead of:
var min = int.MaxValue;
var max = int.MinValue;
for (var i = 0; i < arr.Length; i++) {
    var val = arr[i];
    if (val > max) max = val;
    if (val < min) min = val;
}
you can use this:
var min = int.MaxValue;
var max = int.MinValue;
for (var i = 0; i < arr.Length - 1; i += 2) {
    var v1 = arr[i];
    var v2 = arr[i + 1];
    if (v1 > v2) {
        if (v1 > max) max = v1;
        if (v2 < min) min = v2;
    } else {
        if (v2 > max) max = v2;
        if (v1 < min) min = v1;
    }
}
if (arr.Length % 2 == 1) {
    var v = arr[arr.Length - 1];
    if (v > max) max = v;
    if (v < min) min = v;
}

In the first case, we take all n values and compare them with the min and max values respectively, so n times 2. In the second example we take every two values (so n/2 times), compare them with each other (1 comparison) and then we compare the smaller value with min and the larger with max (another 2 comparisons), with a combined number of comparisons of n/2 times 3 (plus 2 extra ones if the number of items in the array is odd).

Update: Here is a variant for an IEnumerable<int>, the equivalent of a foreach:
var enumerator = enumerable.GetEnumerator();
var min = int.MaxValue;
var max = int.MinValue;
while (enumerator.MoveNext()) {
    var v1 = enumerator.Current;
    var v2 = enumerator.MoveNext() ? enumerator.Current : v1;
    if (v1 > v2) {
        if (v1 > max) max = v1;
        if (v2 < min) min = v2;
    } else {
        if (v2 > max) max = v2;
        if (v1 < min) min = v1;
    }
}

I found these cool websites where you can solve software challenges. Completely by accident I found out about a method I thought was pretty cool, called Array.ConvertAll. Imagine you want to transform a string containing a space separated list of integers into an actual array of integers. You would use Array.ConvertAll(line.Split(' '), int.Parse). That is it. I liked the simplicity of it and also the fact that it works out of the box without having to import any namespace. The same thing can be achieved with LINQ thus: line.Split(' ').Select(int.Parse).ToArray(), but you need to import the System.Linq namespace.
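
To make it concrete, a quick illustration (the input line is made up):

var line = "10 20 30 40";
// Array.ConvertAll only needs the System namespace, where Array lives
int[] viaConvertAll = Array.ConvertAll(line.Split(' '), int.Parse);
// the LINQ equivalent needs a using System.Linq;
int[] viaLinq = line.Split(' ').Select(int.Parse).ToArray();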

Unfortunately, the same day I found out about this method - which I had never used before, yet it has been there since .NET 2.0 - I noticed that it doesn't exist in .NET Core. Like a butterfly, it only lived one day in my development repertoire.

Learning ASP.Net MVC series:
  1. Setup
  2. MVC Concepts
  3. Authentication
  4. Entity Framework Fundamentals
  5. Upgrading project to .NET Core 1.1
  6. Dependency Injection and Services


The previous version of Entity Framework was 6 and the current one is Entity Framework Core 1.0, although for a few years it went under the name Entity Framework 7. It might seem that they changed the naming to be consistent with .NET Core, but according to them they did it to avoid confusion. The new version sprouted from the idea of "EF everywhere", just like .NET Core is ".NET everywhere", and is a rewrite - a port, as they chose to call it - with some extra features but also lacking some of the functionality EF6 had - or better said has, since they continue to support it for .NET proper. In this post I will examine the history and some of the basic concepts of working with Entity Framework, as opposed to a more direct approach (like opening a System.Data.SqlConnection and issuing SqlCommands).

Entity Framework history


Entity Framework started as an ORM, a class of software that abstracts database access. The term itself is either a bit obsolete, with the advent of databases that call themselves non-relational, or rebelliously exact, recognizing that anything that can be called a database needs to also encode relationships between data. But that's another topic altogether. When Entity Framework was designed it was all about abstracting SQL into an object oriented framework. How would that work? You would define entities, objects that inherited from an EntityBase class, and decorate their properties with attributes defining some restrictions that databases have, but objects don't, like the size of a field. You also had some default methods that could be overridden in order to control very specific custom requirements. In the background, objects would be mapped to tables, their simple properties to columns and their more complex properties to other tables that had a foreign key relationship with the table mapped to the owner object.

There were some issues with this system that quickly became apparent. With the data layer separation idea going strong, it was really cumbersome and ugly to move around objects that inherited from an entire hierarchy of Entity Framework classes and held state in ways that were almost opaque to the user. Users demanded the use of POCOs, a way to separate the functionality of EF from the data objects that were used through all the tiers of the application. At the time the solution was mostly to use simple objects within your application and then translate them to data access objects which were entities.

Microsoft also recognized this and in further iterations of EF, they went full POCO. But this enabled them to also move from one way of thinking to another. At the beginning the focus was on the database. You had your database structure and your data access layer and you wanted to add EF to your project, meaning you needed to map existing tables to C# objects. But now, you could go the other way around. You started with an application using plain objects and then just slapped EF on and asked it to create and maintain the database. The first way of thinking was coined "database first" and the other "code first".

In seven iterations of the framework, things have changed and been updated quite a lot. You can imagine that successfully adapting to legacy database structures, while seamlessly abstracting changes to that structure and completely managing the mapping of objects to the database, was no easy task. There were ups and downs, but Microsoft stuck to their guns and now they are making the strong argument that all your data manipulation should be done via EF. That's bold, and it would be really stupid if Entity Framework weren't a good product they have full confidence in. Which they do. They moved from a framework that was embedded in .NET, to one that was partially embedded with some extra code shipped separately, and then, with EF6, they went full open source. EF Core is also open source and .NET Core is free of EF specific classes.

Also, EF Core is more friendly towards non relational databases, so you either consider ORM an all encompassing term or EF is no longer just an ORM :)

In order to end this chapter, we also need to discuss alternatives.

Ironically, both the ancestor of and the main competitor for Entity Framework was LINQ to SQL. If you don't know what LINQ is, you should take the time to look it up, since it has been an integral part of .NET since version 3.5. In Linq2Sql you would manually map objects to tables, then use the mapping in your code. The management of the database and of the mapping was all on you. When EF came along, it was an improvement over this idea, with the major advantage (or flaw, depending on your political stance) that it handled schema mapping and management for you, as much as possible.

Another system that was, and still is, widely used is separating data access based on intent, not on structure. Basically, if you had the need to add/get the names of people from your People table, you would have another project with some object hierarchy that in the end had methods like AddPeople and GetPeople. You didn't need to delete or update people, so you didn't have the API for it. Since the intent was clear, so was the structure and the access to the database, all encapsulated - manually - in this data access layer project. If you wanted to get people by name, for example, you had to add that functionality and code all the intermediary access. This had the advantage (or flaw) that you had someone who was good with databases (and a bit with code) handling the maintenance of the data access layer, basically a database admin with some code writing permissions. Some people love the control over the entire process, while others hate that they need to understand the underlying database in order to access data.

From my perspective, it seems there is an argument between people who want more control over what is going on and people who want more ease of development. The rest is more of an architectural discussion, which is irrelevant as far as EF is concerned. However, it seems to me that the Entity Framework team has worked hard to please both sides of that argument, going for simplicity, but allowing very fine control down the line. It also means that this blog post cannot possibly cover everything about Entity Framework.

Getting started with Entity Framework


So, how do things look in EF Core 1.0? Things are still split down the middle in "code first" and "database first", but code first is the recommended way for starting new projects. Database first is something that must be supported in perpetuity just in case you want to migrate to EF from an existing database.

Database first


Imagine you have tables in an SQL Server database. You want to switch to EF, so you need to somehow map the existing data to entities. There is a tutorial for that: ASP.NET Core Application to Existing Database (Database First), so I will just quickly go over the essentials.

First thing is to use NuGet to install EF in your project:
Install-Package Microsoft.EntityFrameworkCore.SqlServer
and then add
"Microsoft.EntityFrameworkCore.Tools": "1.0.0-preview2-final"
to the project.json tools section. For the Database First approach we also need other stuff like:
Install-Package Microsoft.EntityFrameworkCore.Tools –Pre
Install-Package Microsoft.EntityFrameworkCore.SqlServer.Design
Final touch, running
Scaffold-DbContext "<Sql connection string>" Microsoft.EntityFrameworkCore.SqlServer -OutputDir Models

At this time alarm bells are sounding already. Wait! I only gave it my database connection string, how can it automagically turn this into C# code and work?

If we look at the code to create the sample database in the tutorial above, there are two tables: Blog and Post and they are related via primary key/foreign key as is recommended to create an SQL database. Columns are clearly defined as NULL or NOT NULL and the size of text fields is conveniently Max.



The process created some interesting classes. Besides the properties that map to fields, the Blog class has a property of type ICollection<Post> which is instantiated with a HashSet<Post>. The real fun is the BloggingContext class, which inherits from DbContext and in the override for ModelCreating configures the relationships in the database.
  • Enforcing the required status of the blog Url:
    modelBuilder.Entity<Blog>(entity =>
    {
        entity.Property(e => e.Url).IsRequired();
    });
  • Defining the one-to-many relationship between Blog and Post:
    modelBuilder.Entity<Post>(entity =>
    {
        entity.HasOne(d => d.Blog)
            .WithMany(p => p.Post)
            .HasForeignKey(d => d.BlogId);
    });
  • Having the root sets used to access entities:
    public virtual DbSet<Blog> Blog { get; set; }
    public virtual DbSet<Post> Post { get; set; }

The first thing to surprise me, honestly, is that the data model classes are as bare as possible. I would have expected some attributes on the properties defining their state as required, for example. EF Core lets you keep the classes free of data annotations, while also supporting an annotation based system. The collections are interfaces and they are only instantiated with a concrete implementation in the constructor. An interesting choice for the collection type is HashSet. As opposed to a List, it does not allow access via indexers, only enumerators. It is designed to optimize search: basically, finding an item in the HashSet does not depend on the size of the collection. Set operations like union and intersection can be used efficiently with HashSet as well.

HashSet also does not allow duplicates, and that may cause some confusion. How does one define a duplicate? It uses IEqualityComparer. However, a HashSet can be instantiated with a custom IEqualityComparer implementation. Alternatively, the Equals and GetHashCode methods can be overridden in the entities themselves. People are divided over whether one should use such mechanisms to optimize Entity Framework functionality, but keep in mind that normally EF only keeps in memory the stuff it immediately needs. Such optimizations are more likely to cause maintainability problems than save processing time.
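
For illustration only, here is what overriding equality on an entity might look like - a sketch, not a recommendation, assuming an entity whose identity is its PostId:

public class Post
{
    public int PostId { get; set; }
    public string Title { get; set; }

    // illustrative: two Post instances with the same PostId count as the same entity
    public override bool Equals(object obj)
    {
        var other = obj as Post;
        return other != null && other.PostId == PostId;
    }

    public override int GetHashCode()
    {
        return PostId.GetHashCode();
    }
}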

Database first seems to me just a way to work with Entity Framework after using a migration tool. It sounds great, but there are probably a lot of small issues that one has to gain experience with when dealing with real life databases. I will blog about it if I get to doing something like this.

Code first


The code first tutorial goes the other direction, obviously, but has some interesting differences that tell me that a better model of migrating even existing databases is to start code first, then find a way to migrate the data from the existing database to the new one. This has the advantage that it allows for refactoring the database as well as provide some sort of verification mechanism when comparing the old with the new structure.

The setup is similar: use NuGet to install EF in your project:
Install-Package Microsoft.EntityFrameworkCore.SqlServer
then add
"Microsoft.EntityFrameworkCore.Tools": "1.0.0-preview2-final"
to the project.json tools section.

Then we create the models: a simple DbContext inheritance, containing DbSets of Blog and Post, and the data models themselves: Blog and Post. Here is the code:
public class BloggingContext : DbContext
{
    public BloggingContext(DbContextOptions<BloggingContext> options)
        : base(options)
    { }

    public DbSet<Blog> Blogs { get; set; }
    public DbSet<Post> Posts { get; set; }
}

public class Blog
{
    public int BlogId { get; set; }
    public string Url { get; set; }

    public List<Post> Posts { get; set; }
}

public class Post
{
    public int PostId { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }

    public int BlogId { get; set; }
    public Blog Blog { get; set; }
}

Surprisingly, the tutorial doesn't go into any other changes to this code. There are no HashSets, there are no restrictions over what is required or not, nor over how the classes are related to each other. A video demo of this also shows the created database and it contains primary keys: a blog has a primary key on BlogId, for example. To me that suggests that convention over configuration is also used in the background: the SomethingId property of a class named Something will automatically be considered the primary key (as will a property simply named Id). Also, if you look at the code that EF executes when creating the database (these are called migrations and are pretty cool, I'll discuss them later in the post), Blogs are connected to Posts via foreign keys, so this thing works wonders if you name your entities right. I also created a small console application to test this and it worked as advertised.
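
Such a test might look roughly like this sketch (the connection string and values are placeholders, and it assumes the Microsoft.EntityFrameworkCore.SqlServer package):

var optionsBuilder = new DbContextOptionsBuilder<BloggingContext>();
optionsBuilder.UseSqlServer(@"Server=(localdb)\mssqllocaldb;Database=EFCodeFirst;Trusted_Connection=True;");

using (var context = new BloggingContext(optionsBuilder.Options))
{
    // creates the database from the model if it doesn't exist yet
    context.Database.EnsureCreated();

    var blog = new Blog { Url = "http://example.com" };
    blog.Posts = new List<Post> { new Post { Title = "Hello", Content = "World" } };
    context.Blogs.Add(blog);
    context.SaveChanges();

    Console.WriteLine(context.Posts.Count()); // requires using System.Linq;
}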

Obviously this will not work with every scenario and there will be attributes attached to models and novel ways of configuring mapping, but so far it seems pretty straightforward. If you want to go into the more detailed aspects of controlling your data model, try reading the documentation provided by Microsoft so far.

Entity Framework concepts


We could go right into the code fray, but I chose to first write some more boring conceptual stuff. Working with Entity Framework involves understanding concepts like persistence, caching, migrations, change saving and the underlying mechanisms that turn code into SQL, the Unit of Work and Repository patterns, etc. I'll try to be brief.

Context


As you have seen earlier, classes inheriting from DbContext are the root of all database access. I say classes, because more of them can be used: if you want to copy from one database to another you will need two contexts. The context defines a data model, differentiated from a database schema by being a purely programmatic concept. DbContext implements IDisposable, so for one-off operations it can be used just as one uses an open SQL connection. In fact, if you are tempted to reuse the same context, remember that its memory use increases with the quantity of data it accesses. It is recommended for performance reasons to dispose of a context as soon as it has finished its work. Also, a DbContext class is not thread safe. It stands to reason to use a context for as short a period as possible, inside single threaded operations.

DbContext provides two hooks called OnConfiguring and OnModelCreating that users can override to configure the context and the model, respectively. Careful, though: one can configure the context to use a specific implementation of IModel as the model, in which case OnModelCreating will not be called. The other most important functionality of DbContext is SaveChanges, which we will discuss later. Worth mentioning are Database and Model, properties that can be used to access database and model information and metadata. The rest are Add, Update, Remove, Attach, Find, etc., plus their async and range versions, allowing for the first time - EF6 did not - to dynamically send an object to a method like Add and let EF determine where to add it. It's nothing that sounds very safe to use, but probably there were scenarios where it was necessary.
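
As a small example of the first hook, a context like the BloggingContext above could also configure its own provider like this (the connection string is a placeholder):

public class BloggingContext : DbContext
{
    public DbSet<Blog> Blogs { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        // gives the context a chance to (further) configure its options, e.g. choose the provider
        optionsBuilder.UseSqlServer("<connection string>");
    }
}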

DbSet


For each entity type to be accessed, the context should have DbSet<TEntity> properties for that type. DbSet allows for the manipulation of entities via methods like Add, Update, Remove, Find, Attach, and it is an IEnumerable and IQueryable of TEntity. However, in order to persist any change, SaveChanges needs to be called on the context class.

SaveChanges


The SaveChanges method is the most important functionality of the context class, which otherwise caches the accessed objects and their state, waiting either for this method to be called or for the context to be disposed. An important improvement in EF Core is that these changes are now sent to the database in batches of commands. In EF6 and before, each change was sent separately, so, for example, adding two entities to a set and saving changes would take two database trips. From EF Core onward, that takes only one trip unless specifically configured with MaxBatchSize(number). You can revert to the EF6 behavior using MaxBatchSize(1). This applies to SQL Server only so far.
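
Configuring the batch size is done on the SQL Server provider options, something along these lines (the connection string is a placeholder):

optionsBuilder.UseSqlServer(
    "<connection string>",
    sqlOptions => sqlOptions.MaxBatchSize(1)); // 1 mimics the old EF6 one-command-per-change behavior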

This behavior is the reason why contexts need to be released as soon as their work is done. If you query all the items with a name starting with 'A', all of these items will be loaded in the context memory. If you then need to get the ones starting with 'B', the performance and memory will be affected if using the same context. It might be helpful, though, if then you need to query both items starting with 'A' and the ones starting with 'B'. It's your choice.

One particularity of working with Entity Framework is that in order to update or delete records, you first need to query them. Something like
context.Posts.RemoveRange(context.Posts.Where(p => p.Title.StartsWith("x")));
There is no .RemoveRange(predicate) because it would be impossible to resolve a query afterwards. Well, not impossible, but it would have to somehow remember the predicate, alter subsequent selects to gather all the information required and apply the deletion on the client side. Too complicated. There is a way to access the database by writing SQL directly, and again EF Core has some improvements here, but raw SQL changes are opaque to an already existing context.

Unit of Work and Repository patterns


The Repository pattern is an example of what I was calling before an alternative to Entity Framework: a separation of data access from business logic that improves testability and keeps distinct responsibilities apart. That doesn't mean you can't do it with EF, but sometimes it feels pretty shallow and developers may be tempted to skip this extra encapsulation.

A typical example is getting a list of items with a filter, like blog posts starting with something. So you create a repository class to take over from the Posts DbSet and create a method like GetPostsStartingWith. A naive implementation returns a List of items, but this actually hinders EF in what it tries to do. Let's assume your business logic requires you to return the first ten posts starting with 'A'. The initial code would look like this:
var posts=context.Posts.Where(p=>p.Title.StartsWith("A")).Take(10).ToList();
In this case the SQL code sent to the database is something like SELECT TOP 10 * FROM Posts WHERE Title LIKE 'A%'. However, code looking like this:
var repo=new PostsRepository();
var posts=repo.GetPostsStartingWith("A").Take(10).ToList();
will first pull all the posts starting with "A" and only then retrieve the first 10. Ouch! The solution is to return IQueryable instead of IEnumerable or a List, but then things start to feel fishy. Aren't you just shallowly proxying the DbSet?
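
For completeness, the IQueryable flavor of that repository method would be something like the following sketch (the PostsRepository class and its constructor are my own illustration):

public class PostsRepository
{
    private readonly BloggingContext _context;

    public PostsRepository(BloggingContext context)
    {
        _context = context;
    }

    public IQueryable<Post> GetPostsStartingWith(string prefix)
    {
        // no ToList here: the query stays composable, so a later Take(10) still ends up in the SQL
        return _context.Posts.Where(p => p.Title.StartsWith(prefix));
    }
}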

Unit of Work is some sort of encapsulation of similar activities using the same data, something akin to a transaction. Let's assume that we store the number of posts in the Counts table. So if we want to add a post we need to do the adding, then change the value of the count. The code might look like this:
var counts=new CountsRepository();
var blogs=new BlogRepository();
var blog=blogs.Where(b=>b.Name=="Siderite's Blog").First();
blog.Posts.Add(post);
counts.IncrementPostCount(blog);
blog.Save();
counts.Save();
Now, since this selects a blog, changes its posts, then updates the counts, there is no reason to use different contexts for the operations. So one could create a Unit of Work class that would look a bit like a common repository for blogs and counts. Let's ignore the silliness of the example, as well as the fact that we are doing post operations through the BlogRepository - which is something we are kind of forced to do in this situation unless we start to deconstruct EF operations and recreate them in our code. There is a bigger elephant in the room: there already exists a class that encapsulates access to the database, caches the items retrieved and creates one atomic operation for both changes. It's the context itself! If we instantiate the repositories with a constructor that accepts a context, then all one has to do to atomize the operations is to put the code inside a using block.
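
In code, that idea might look like this sketch (the repository classes and their methods are hypothetical, and post and options are assumed to exist already):

using (var context = new BloggingContext(options))
{
    var blogs = new BlogRepository(context);
    var counts = new CountsRepository(context);

    var blog = blogs.GetByName("Siderite's Blog");
    blog.Posts.Add(post);
    counts.IncrementPostCount(blog);

    // a single SaveChanges on the shared context commits both changes together
    context.SaveChanges();
}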

There are also controversies related to the use of these two patterns with EF. Rob Conery has a nice blog post suggesting Command/Query objects instead. His rationale is that if you have to pass a context object around, as above, there is not much decoupling involved.

I lean towards the idea that you need a Data Access Layer encapsulation no matter what. I would put the using block in a method in a class rather than pass the context around or drop the repository. Also, since we saw that entity type is not a good separation criterion for "repositories" - I feel I should name them differently in this situation - and the intent of the methods is already declared in their names (like GetPosts...), these encapsulation classes should be separated by some other criteria, like ContentRepository and ForumRepository, for example.

Migrations


Migrations are cool! The idea is that when making changes to the structure of the database, one can extract those changes into a .cs file that can be added to the project and to source control. This is one of the clear advantages of using Entity Framework.

First of all, there are a zillion tutorials on how to enable migrations, most of them wrong. Let's list the possible ways you could go wrong:
  • Enable-Migrations is obsolete - older tutorials recommended to use the Package Manager Console command Enable-Migrations. This is now obsolete and you should use Add-Migration <Name>
  • Trying to install EntityFramework.Commands - due to namespace changes, the correct namespace would be Microsoft.EntityFrameworkCore.Commands anyway, which doesn't exist. EntityFramework.Commands is version 7, so it shouldn't be used in .NET Core. However, at one point or another, this worked if you added some imports and changed stuff around. I tried all that only to understand the sad truth: you should not install it at all!
  • Having a DbContext inheriting class that doesn't have a default constructor or is not configured for dependency injection - the migration tool looks for such classes then creates instances of them. Unless it knows how to create these instances, the Add-Migration will fail.

The correct way to enable migrations is... to install the packages from the Database First section! Yes, that is right, if you want migrations you need to install
Install-Package Microsoft.EntityFrameworkCore.Tools –Pre
Install-Package Microsoft.EntityFrameworkCore.SqlServer.Design
Only then you may open the Package Manager Console and run
Add-Migration FirstMigration
Note that I am discussing an SQL Server example. It is possible you will need other packages if using a different type of database.

The result is a folder called Migrations in which you will find two files: a snapshot and the migration itself. Here is an example of the snapshot:
[DbContext(typeof(BloggingContext))]
partial class BloggingContextModelSnapshot : ModelSnapshot
{
    protected override void BuildModel(ModelBuilder modelBuilder)
    {
        modelBuilder
            .HasAnnotation("ProductVersion", "1.0.0-rtm-21431")
            .HasAnnotation("SqlServer:ValueGenerationStrategy", SqlServerValueGenerationStrategy.IdentityColumn);

        modelBuilder.Entity("EFCodeFirst.Blog", b =>
        {
            b.Property<int>("BlogId")
                .ValueGeneratedOnAdd();

            b.Property<string>("Url");

            b.HasKey("BlogId");

            b.ToTable("Blogs");
        });

        modelBuilder.Entity("EFCodeFirst.Post", b =>
        {
            b.Property<int>("PostId")
                .ValueGeneratedOnAdd();

            b.Property<int>("BlogId");

            b.Property<string>("Content");

            b.Property<string>("Title");

            b.HasKey("PostId");

            b.HasIndex("BlogId");

            b.ToTable("Posts");
        });

        modelBuilder.Entity("EFCodeFirst.Post", b =>
        {
            b.HasOne("EFCodeFirst.Blog", "Blog")
                .WithMany("Posts")
                .HasForeignKey("BlogId")
                .OnDelete(DeleteBehavior.Cascade);
        });
    }
}

And here is the migration itself:
public partial class First : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.CreateTable(
            name: "Blogs",
            columns: table => new
            {
                BlogId = table.Column<int>(nullable: false)
                    .Annotation("SqlServer:ValueGenerationStrategy", SqlServerValueGenerationStrategy.IdentityColumn),
                Url = table.Column<string>(nullable: true)
            },
            constraints: table =>
            {
                table.PrimaryKey("PK_Blogs", x => x.BlogId);
            });

        migrationBuilder.CreateTable(
            name: "Posts",
            columns: table => new
            {
                PostId = table.Column<int>(nullable: false)
                    .Annotation("SqlServer:ValueGenerationStrategy", SqlServerValueGenerationStrategy.IdentityColumn),
                BlogId = table.Column<int>(nullable: false),
                Content = table.Column<string>(nullable: true),
                Title = table.Column<string>(nullable: true)
            },
            constraints: table =>
            {
                table.PrimaryKey("PK_Posts", x => x.PostId);
                table.ForeignKey(
                    name: "FK_Posts_Blogs_BlogId",
                    column: x => x.BlogId,
                    principalTable: "Blogs",
                    principalColumn: "BlogId",
                    onDelete: ReferentialAction.Cascade);
            });

        migrationBuilder.CreateIndex(
            name: "IX_Posts_BlogId",
            table: "Posts",
            column: "BlogId");
    }

    protected override void Down(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.DropTable(
            name: "Posts");

        migrationBuilder.DropTable(
            name: "Blogs");
    }
}

Note that this is not something that copies the changes in data, only the ones in the database schema.

Conclusions


Yes, there is no code in this post. I wanted to explore Entity Framework in my project, but if I had continued like that, the post would have become too long. As you have seen, there are advantages and disadvantages to using Entity Framework, but at this point I find it more valuable to use it and meet any problems I find head on. Besides, the specifications of my project don't call for complex database operations, so the data access mechanism is fairly irrelevant.

Stay tuned for the next post in which we actually use EF in ContentAggregator!

About 25 years ago I got Compton's Multimedia Encyclopedia on CD-ROM as a gift from my father. Back then I had no Internet, so I delved into what now seems impossibly boring: looking up facts, weird pictures, reading about this and that.

At one point I remember finding a timeline feature that showed the main events of history on a scrolling bar. I am not much into history, I can tell you that, but for some reason I became fascinated with how the events of American history in particular were lining up. So I extracted only those and, at the end, I presented my findings to my grandmother: America was an expanding empire - conquering, bullying, destabilizing, buying territory. I was really adamant that I had stumbled onto something, since the United States was supposed to be moral and good. Funny how a childhood of watching contraband US movies can make you believe that. My grandmother was not impressed and I, with the typical attention span of a child, abandoned any future historical projects.

Fast forward to now, when, looking up Oliver Stone to see what movies he has done lately, I stumbled upon a TV documentary series called The Untold History of the United States. You can find it in video format, but also as a companion book or audio book. While listening to the audio book I realized that Stone was talking about my childhood discovery: he, too, was disillusioned after a youth of believing the American propaganda, then going through the Vietnam war and realizing that history doesn't tell the same story as the one circulated in classes and media today.

However, this is no childish project. The book takes us through US history, skirting the good stuff and focusing on the bad. Yet it is not done in malice, as far as I could see, but in the spirit that this part of history is "untold", hidden from the average eye, and has to be revealed to all. Stone is a bit extremist in his views, but this is not a conspiracy theory book. It is filled with historical facts, arranged in order, backed by quotes from the people of the era. Most of all, it doesn't provide answers, but rather questions that readers are invited to answer themselves. Critics call it biased, but Stone himself admits that this is by intent. Other materials and tons of propaganda - the history of which is also presented in the book - more than cover the positive aspect of things. This is supposed to be a balancing force in a story that is almost always told from only one side.

The introductory chapter alone was terrifying, not only because of the forgotten atrocities committed by the US in the name of the almighty dollar and God, but also because of the similarities with the present. Almost exactly a century after the American occupation of the Philippines, we find the same situation in the Middle-East. Romanians happy with the US military base at Deveselu should perhaps check what happened to other countries that welcomed US bases on their territory. People swallowing immigration horror stories by the ton should perhaps find out more about a little film called Birth of a Nation, revolutionary in its technical creation and controversial - now - for telling the story of the heroic Ku-Klux-Klan riding to save white folk - especially poor defenseless women - from the savage negroes.

By no means am I calling this a true, complete, objective history, but the facts it describes are chilling in their banal evil and unfortunately all true. The thesis of the series is that America is losing its republican founding-fathers roots by behaving like an empire, good and moral only in tightly controlled and highly financed media and school curricula. It's hard not to see the similarities between US history a century ago and today, including the presidential candidates and their speeches. The only things that have changed are the complete military and economic supremacy of the United States and the switch from territorial colonialism to economic colonialism. I am not usually interested in history, but this is a book worth reading.

I leave you with Oliver Stone's interview (the original video was removed by YouTube for some reason):

While researching the new .NET Core features and functionalities I've stumbled upon this pattern for hiding functionality, but also making it accessible when needed.

There is a long history of Microsoft writing code as closed as possible: classes and interfaces are internal, protected and sealed and all that jazz. If you have ever tried to copy-paste Microsoft .NET source code into your project in order to modify it to your needs, you know what I mean. More often than not I gave up because of the immense chain of dependencies that all had to be copy-pasted for a small piece of code to work.

Well, .NET Core is now open source and there is a strong current of moving away from such practices. One pattern that drew my attention is the IInfrastructure<T> interface and pattern used in EntityFramework. Basically, instead of exposing rarely used members directly, you hide them within a generic interface that can be retrieved at will.

Yes, it is possible to do the same thing with an explicitly implemented interface, but this is more of a two-step way of doing it (and also of uncluttering class signatures). The concrete example is DbContext, which is an explicit implementation of IInfrastructure<IServiceProvider>. IServiceProvider exposes a GetService<T> method that returns specific implementations of interfaces or base classes. Then, with a nice extension method called GetInfrastructure<T>, one can get the service provider. For example, one can retrieve the relational type mapper from a context using:
var serviceProvider=context.GetInfrastructure();
var mapper=serviceProvider.GetService<IRelationalTypeMapper>();

I find it interesting as a general pattern, allowing one to expose innumerable interface signatures without inheriting from them all. A class can enable any number of mechanisms for discovery and execution simply by implementing its own service provider. Moreover, if there is some sort of general Service Locator pattern in place, classes can locally override that mechanism while leaving the rest in place. Clearly there is potential for abuse, but I also see it as a way to clearly represent and separate concerns.
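
A minimal sketch of the same idea applied to one's own class might look like this (the class and interface here are mine, mirroring the EF shape, not EF's actual code):

public interface IInfrastructure<T>
{
    T Instance { get; }
}

public class PaymentProcessor : IInfrastructure<IServiceProvider>
{
    private readonly IServiceProvider _services;

    public PaymentProcessor(IServiceProvider services)
    {
        _services = services;
    }

    // the clean, everyday API
    public void Pay(decimal amount)
    {
        // ...
    }

    // the rarely needed internals, reachable only by explicitly asking for them, e.g.
    // ((IInfrastructure<IServiceProvider>)processor).Instance.GetService(typeof(ISomeRareService))
    IServiceProvider IInfrastructure<IServiceProvider>.Instance
    {
        get { return _services; }
    }
}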

I am writing this post to rant against subscription popups. I've been on the Internet long enough to remember when this was a thing: a window would open up and ask you to enter your email address. We went from that time, through all the technical, stylistic and cultural changes to the Internet, to this Web 3.0 thing, and the email subscription popups have emerged again. They are not ads, they are simply asking you to allow them into your already cluttered inbox because - even before you've had a chance to read anything - what they have to say is so fucking important. Sometimes they ask you to like them on Facebook or whatever crap like that.

Let me tell you how to get rid of these real quick. Install an ad blocker, like Adblock Plus or uBlock Origin. I recommend uBlock Origin, since it is faster and I feel it works better than the older AdBlock. This is something anyone should do anyway, just to get rid of ads. I've personally never browsed the Internet from a tablet or cell phone because they didn't allow ad blockers. I can't go on the web without them.

What you may not know, though, is that there are several lists of filters that you can choose from and that are not enabled by default when you install an ad blocker. One of my favourite lists is Fanboy's Annoyances list. It takes care of popups of all kinds, including subscriptions. But even so, if the default list doesn't contain the web site you are looking at, you have the option to pick elements and block them. A basic knowledge of CSS selectors helps, but here is the gist of it: ###something means the element with the id "something" and ##.something is the elements with the class name "something". Here is an example: <div id="divPopup" class="popup ad annoying"> is a div element that has id "divPopup" and class names "popup", "ad" and "annoying".

One of the reasons why subscription popups are not always blocked is that besides the elements they cover the page with, they also place some constraints on the page. For example, they place a big element over the screen (what is called an overlay), then a popup element in the center of the screen, and they also change the style of the entire page so it does not scroll. So if you only removed the overlay and the popup, the page would show just its upper part and not let you scroll down. This can be solved with another browser extension called Stylish, which allows you to save and apply your own style to pages you visit. The CSS rule that solves this very common scenario is html,body { overflow: auto !important; }. That is all. Just add a new style for the page and copy-paste this. 19 times out of 20 you will get the scroll back.

To conclude, whenever you see such a stupid, stupid thing appearing on the screen, consider blocking subscription popups rather than pressing on the closing button. Block it once and never see it again. Push the close button and chances are you will have to keep pressing it each time you visit a page.

Now, if I only had a similar option for jump scares in movies...

P.S. Yes, cookie consent popups are included in my rant. Did you know that you can block all cookie nagware from Blogspot within one fell swoop, rather than having to click OK at each blog individually, for example?

Learning ASP.Net MVC series:

  1. Setup
  2. MVC Concepts
  3. Authentication
  4. Entity Framework Fundamentals
  5. Upgrading project to .NET Core 1.1
  6. Dependency Injection and Services


In the setup part of the series I've created a set of specifications for the ASP.Net MVC app that I am building and I manufactured a blank project to start me up. There was quite a bit of confusion on how I would continue the series. Do I go towards the client side of things, defining the overall HTML structure and how I intend to style it in the future? Do I go towards the functionality of the application, like google search or extracting text and applying word analysis on it? What about the database where all the information is stored?

In the end I opted for authentication, mainly because I have no idea how it's done and also because it naturally leads into the database part of things. I was saying that I don't intend to have users of the application; they can easily connect with their Google account - which hopefully I will also use for searching (I hate that API!). However, that's not quite how it goes: there will still be an account for the user, only it will be connected to an outside service. While I completely skirt the part where I have to reset passwords or email the user and all that crap - which, BTW, was working rather well in the default project - I still have to set up the entities that identify the current user.

How was it done before?


In order to proceed, let's see how the original project did it. It was first setting a database context, then adding Identity using a class named ApplicationUser.

services.AddDbContext<ApplicationDbContext>(options =>
    options.UseSqlite(Configuration.GetConnectionString("DefaultConnection")));

services.AddIdentity<ApplicationUser, IdentityRole>()
    .AddEntityFrameworkStores<ApplicationDbContext>()
    .AddDefaultTokenProviders();


ApplicationUser is a class that inherits from IdentityUser, while ApplicationDbContext is something inheriting from IdentityDbContext<ApplicationUser>. Seems like we are out of luck and the identity and db context are coupled pretty strongly. Let's see if we can decouple them :) Our goal: using OAuth to connect with a Google account, while using no database.

Authentication via Google+ API


The starting point of any feature is coding with autocomplete and Intellisense until it works... I mean, reading the documentation. In our case, that's the ASP.NET Authentication section, particularly the authentication using Google part. It's pretty skimpy and it only covers Facebook. I found another link that actually covers Google, but it's for MVC 5.

Enable SSL


Both tutorials agree that first I need to enable SSL on my web project. This is done by going to the project properties, the Debug section, and checking Enable SSL. It's a good idea to copy the https URL and set it as the start URL of the project. Keep that URL in the clipboard, you are going to need it later, as well.



Install Secret Manager


Next step is installing the Secret Manager tool, which in our case is already installed, and specifying a userSecretsId, which should also be already configured.

Create Google OAuth credentials


Next let's create credentials for the Google API. Go to the Google Developer Dashboard, create a project, go to Credentials → OAuth consent screen and fill out the name of the application. Go to the Credentials tab and Create Credentials → OAuth client ID. Select Web Application, fill in a name as well as the two URLs below. We will use the localhost SSL URL for both like this:

  • Authorised JavaScript origins: https://localhost:[port] - the URL that you copied previously
  • Authorised redirect URIs: https://localhost:[port]/account/callback - TODO: create a callback action

Press Create. At this point a popup with the client ID and client secret appears. You can either copy the two values directly from this dialog or close the popup and download the JSON file containing all the data (project ID and authorised URLs among them).



Make sure to go back to the Dashboard section and enable the Google+ API, in the Social APIs group. There is a quota of 10,000 requests per day; I hope it's enough. ;)



Writing the authentication code


Let's use the 'dotnet user-secrets' tool to save the two credential values. Run the following two commands in the project folder:

dotnet user-secrets set Authentication:Google:ClientId <client-Id>
dotnet user-secrets set Authentication:Google:ClientSecret <client-Secret>

Use the values from the Google credentials, obviously. In order to get to the two values, all we need to do is call Configuration["Authentication:Google:ClientId"] in C#. For this to work we need to have the package Microsoft.Extensions.Configuration.UserSecrets loaded in project.json and, somewhere in Startup, code that looks like this: builder.AddUserSecrets();, where builder is the ConfigurationBuilder.
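To put that in context, here is a minimal sketch of how the Startup constructor might look in the 1.0-era default template, with the user secrets added to the configuration builder; your constructor may differ slightly, this is only an illustration:

public Startup(IHostingEnvironment env)
{
    var builder = new ConfigurationBuilder()
        .SetBasePath(env.ContentRootPath)
        .AddJsonFile("appsettings.json", optional: true, reloadOnChange: true);

    if (env.IsDevelopment())
    {
        // loads the values saved with the 'dotnet user-secrets' commands above
        builder.AddUserSecrets();
    }

    Configuration = builder.Build();
}

public IConfigurationRoot Configuration { get; }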

Next comes the installation of the middleware responsible for authenticating with Google, which is called Microsoft.AspNetCore.Authentication.Google. We can install it using NuGet: right click on References in Visual Studio, go to Manage NuGet packages, look for Microsoft.AspNetCore.Authentication.Google ("ASP.NET Core contains middleware to support Google's OpenId and OAuth 2.0 authentication workflows.") and install it.



Now we need to place this in Startup.cs:

app.UseCookieAuthentication(new CookieAuthenticationOptions
{
    AuthenticationScheme = "Cookies",
    AutomaticAuthenticate = true,
    AutomaticChallenge = true,
    LoginPath = new PathString("/Account/Login")
});

app.UseGoogleAuthentication(new GoogleOptions
{
    AuthenticationScheme = "Google",
    SignInScheme = "Cookies",
    ClientId = Configuration["Authentication:Google:ClientId"],
    ClientSecret = Configuration["Authentication:Google:ClientSecret"],
    CallbackPath = new PathString("/Account/Callback")
});

Yay! code!

Let's start the website. A useful popup appears with the message "This project is configured to use SSL. To avoid SSL warnings in the browser you can choose to trust the self-signed certificate that IIS Express has generated. Would you like to trust the IIS Express certificate?". Say Yes and click OK on the next dialog.



What did we do here? First, we used cookie authentication, which is not some gluttonous bodyguard with a sweet tooth, but a cookie middleware, of course, and our ticket for authentication without using Identity. Then we used another middleware, the Google authentication one, linked to the previous one through the "Cookies" SignInScheme. We used the ClientId and ClientSecret we saved previously in the Secret Manager. Note that we specified an AuthenticationScheme name for the Google authentication.

At this point the project still works just fine without asking for any login. I need to do one more thing for the application to require one, and that is to decorate our one action method with the [Authorize] attribute:

[Authorize]
public class HomeController : Controller
{
    public IActionResult Index()
    {
        return View();
    }
}

After we do that and restart the project, the start page will still look blank and empty, but if we look in the network activity we will see a redirect to a nonexistent /Account/Login, as configured:


The Account controller


Let's create this Account controller and see how we can finish the example. The controller will need a Login method. Let me first show you the code, then we can discuss it:

public class AccountController : Controller
{
    public IActionResult Login(string ReturnUrl)
    {
        return new ChallengeResult("Google", new AuthenticationProperties
        {
            RedirectUri = ReturnUrl ?? "/"
        });
    }
}


We simply return a ChallengeResult with the name of the authentication scheme we want and the redirect path that we get from the login ReturnUrl parameter. Now, when we restart the project, a Google prompt welcomes us:

After clicking Allow, we are returned to the home page.



What happened here? The home page redirected us to Login, which redirected us to the google authentication page, which then redirected us to /Account/Callback, which redirected us - now authenticated - to the home page. But what about the callback? We didn't write any callback method. (Actually I first did, complete with a complex object to receive all the parameters. The code within was never executed). The callback route was actually defined and handled by the Google middleware. In fact, if we call /Account/Callback, we get an authentication error:


One extra functionality that we might need is the logout. Let's add a Logout method:

public async Task<IActionResult> LogOut()
{
    await HttpContext.Authentication.SignOutAsync("Cookies");

    return RedirectToAction("index", "home");
}

Now, when we go to /Account/Logout we are redirected to the home page, where the whole authentication flow from above is being executed. We are not asked again if we want to give permission to the application to use our google credentials, though. In order to reset that part, go to Apps connected to your account.

What happens when we deny access to the application? Then the callback action will be called with a different set of parameters, triggering a RemoteFailure event. The source code on GitHub contains extra code that covers this scenario, redirecting the user to /Home/Error with the failure reason:

Events = new OAuthEvents
{
    OnRemoteFailure = ctx =>
    {
        ctx.Response.Redirect("/Home/Error?ErrorMessage=" + UrlEncoder.Default.Encode(ctx.Failure.Message));
        ctx.HandleResponse();
        return Task.FromResult(0);
    }
}

What about our user?


In order to check the results of our work, let's add some stuff to the home page. Mainly I want to show all the information we got about our user. Change the index.cshtml file to look like this:

<table class="table">
    @foreach (var claim in User.Claims)
    {
        <tr>
            <td>@claim.Type</td>
            <td>@claim.Value</td>
        </tr>
    }
</table>

Now, when I open the home page, this is what gets returned:

http://schemas.xmlsoap.org/ws/2005/05/identity/claims/nameidentifier 111601945496839159547
http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname Siderite
http://schemas.xmlsoap.org/ws/2005/05/identity/claims/surname Zackwehdex
http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name Siderite Zackwehdex
http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress sideritezaqwedcxs@gmail.com
urn:google:profile https://plus.google.com/111601945496839159547


User is a System.Security.Claims.ClaimsPrincipal object that contains not only a simple bag of Claims, but also a list of Identities. In our example I only have one identity and User.Claims are the same as User.Identities[0].Claims, but in other cases, who knows?
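If you need one specific claim rather than the whole list, a minimal sketch (inside a controller action, with a using System.Linq at the top) could look like this:

// pick out the email claim; ClaimTypes.Email matches the long schema URI shown above
var email = User.Claims
    .FirstOrDefault(c => c.Type == System.Security.Claims.ClaimTypes.Email)
    ?.Value;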

Acknowledgements


If you think it was easy to scrape together this simple example, think again. Before the OAuth2 system there was an OpenID based system that used almost the same method and class names. Then there is the way they did it in .NET proper and the way they do it in ASP.Net Core... which changed recently as well. Everyone and their grandmother have a blog about how to do Google authentication, but most of them either don't apply or are obsolete. So, without further ado, let me give you the links that inspired me to do it this way:

Final thoughts


By no means is this a comprehensive walkthrough for authentication in .NET Core; however, I am sure that I will cover a lot more ground in the posts to come. Stay tuned for more!

Source code for the project after this chapter can be found on GitHub.

Learning ASP.Net MVC series:
  1. Setup
  2. MVC Concepts
  3. Authentication
  4. Entity Framework Fundamentals
  5. Upgrading project to .NET Core 1.1
  6. Dependency Injection and Services

After I've spent a day of writing and working on the application, I realized that many of the concepts I take for granted have not been discussed. Consider this part an introduction to the things *I* know about ASP.Net MVC. :)

Emveesee


The Model View Controller pattern attempts to separate three different concerns of the application: the flow (Controller), the display and the user interface (View) and the various data objects that are passed, validated and manipulated (Model), which are also responsible for the logic and rules of the application. In ASP.Net, MVC means:
  • the models are POCOs, for which validation constraints, display options and other aspects of how they are intended to be used are expressed with attributes decorating the classes or their properties.
  • the controllers are classes inheriting from Controller, their names ending with "Controller". Their methods are called controller actions and represent endpoints for HTTP calls. Attributes are again used to configure these actions, like whether they need to be accessed by POST or PUT. The responsibility of controllers is to... well... control the action in the application.
  • the views are files with the .cshtml extension. They are found in the Views folder and the convention is that the view for a controller action is found in /Views/ControllerName(without "Controller")/ActionName. While I guess someone could hack MVC to use the old ASP.Net engine, the preferred engine for views is Razor (the one with the mustaches). The direction of the ASP.Net MVC views and templates is fine control over the generated markup, as opposed to the old ASP.Net way of encapsulating everything in server side user controls. The preferred way to encapsulate control behavior is now client side, with frameworks helping with it like AngularJS and ReactJS.

Add to this services and middleware. Services are encapsulations of logic. For example, a service may determine aspects of the flow of data from the controller to the view or validate the values of a model class. While these two services would be part of the Controller and Model parts, respectively, in code they are pieces of code, usually implementations of interfaces - for testing and dependency injection reasons - that declare their purpose. Middleware are components - they can be composed - that react to what happens in the HTTP stream (requests and responses). Most of what MVC does internally depends on middleware and services.

MVC controversies


The ugly truth is that MVC has been around for 30 years and people implement it differently every time. Because it spans a large domain (it handles everything, basically), the vague wording used to describe the pattern causes a lot of confusion. Why did Microsoft choose MVC for their new version of ASP.Net? Well, first of all because their first attempt - ASP.Net Forms, which tried to bring the desktop application development style to the web - failed miserably. Second, because at the time they were making the decision, Ruby on Rails was the coolest thing since man discovered fire. I mean, it made a shitty programming language like Ruby look useful! (Just kidding, irate Ruby developer. Was trying to see if you're paying attention). People are still fighting over whether the model is supposed to handle the application logic, or the controller, or even a separate part (services?).

My own interpretation is that models are mostly data classes. They should have code that handles their own internal state, but nothing else. The controller controls the flow of the data, meaning that if the user accesses an action, the method will direct execution towards the correct component handling that action. Therefore, for me, application logic is neither in the controller or the models.

Imagine a family: the wife tells the husband "go to the market and buy 10 eggs and some tomatoes!". Wifey is the user, the husband is the controller. He understands the intent of the user and directs execution towards its implementation. Now, the husband could go to the market and buy the eggs himself, but that would be bad form (heh heh), so he goes to his two sons and tells them "Frank and Joe, go to the market! Frank, get me some eggs, 10 of them. Joe, get me some tomatoes for a salad. Now, git!" (get it? git? I am on a roll). At this point the sons are confused: are they Model or are they Controller? Meanwhile the eggs and tomatoes are clearly part of the model. An egg may spoil, for example, and that is probably the responsibility of the egg. You may consider that the market basket containing eggs and tomatoes is the model, conveniently leaving aside the functionality when the user sees the quality of the purchases and chastises the poor controller for it.

Certainly, ASP.Net MVC leans towards my interpretation of things. Classes in the Models folder of the default application are just classes with decorated properties; the piece of code that interprets the attributes and their values and that binds parameters to properties is a service. The code that does stuff, after the controller determined it's OK to be executed, lives again in managers and services. For example there are sign-in and user managers in the code, which are implementations from the .NET code itself. If one inlined all of them, it would look as if the controller is taking care of the logic of the application, not the model.

Convention over configuration


ASP.Net MVC embraced the Convention over configuration paradigm. You don't need to hook up controllers anywhere, or define the dependency between views and controllers. A controller for movies will be a class called MoviesController, placed in the Controllers folder and the convention is that every call to its actions would start with /movies. A view for a List action would be placed in /Views/Movies/List.cshtml and expected to be called as http(s)://host:port/movies/list. A typical code would look like this:
public IActionResult List()
{
    return View();
}
View() is shorthand for rendering the view associated with this method by naming convention.

The pipeline for MVC is based on what they call middleware - what in the old ASP.Net were called handlers, I guess. The main component of an MVC application is its WebHostBuilder, which then uses a class to configure itself, which is usually named Startup. The methods and properties in Startup will be executed/populated using dependency injection, meaning that parameters will be interfaces: their specific implementation will be determined by ASP.Net MVC based on user configuration, if any.

Same thing applies to action parameters. Values are obtained from the body of the call or from the HTTP GET or POST parameters. An action like List(int id, string name) will get the correct parameters from a call like /list?id=1&name=Steven. Based on the routing (the default one: {controller=Home}/{action=Index}/{id?} being the one responsible for the most common REST conventions), the same result can be achieved with /list/1?name=Steven, for example. The values can be retrieved also for a method looking like this: List(User user), if the User class has Id and Name properties. This is called model binding and most of the interaction between the browser and the .NET code at the backend will be done through it.
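To make model binding concrete, here is a minimal sketch using entirely hypothetical names (MoviesController, MovieFilter); with the default route, both /movies/list?id=1&name=Steven and /movies/list/1?name=Steven fill the POCO's properties by name:

using Microsoft.AspNetCore.Mvc;

public class MovieFilter
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class MoviesController : Controller
{
    // the binder populates filter.Id and filter.Name from route values and query string
    public IActionResult List(MovieFilter filter)
    {
        return Content($"{filter.Id}: {filter.Name}");
    }
}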

Extension methods: dependency injection and middleware


The pattern of configuration for your application is to have it built by fluent methods, using the so called builder pattern. You start with the WebHostBuilder, for example, then you .UseKestrel, .UseIISIntegration, .UseStartup, etc. The default template code looks like this:
var host = new WebHostBuilder()
    .UseKestrel()
    .UseContentRoot(Directory.GetCurrentDirectory())
    .UseIISIntegration()
    .UseStartup<Startup>()
    .Build();

host.Run();
These methods are extension methods, their complex functionality hidden behind this simple pattern of use. Check out the simple .AddMvc() method and how deceptively it covers so much complexity. And in the source code, other extension methods, each with their own complexity, but eventually leading to either injecting some dependency or configuring and adding middleware to the MVC pipeline. It seems to me that dependency injection methods start with Add, while the middleware inserting methods start with Use.
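To illustrate the two conventions, here is a minimal sketch with entirely hypothetical names (IGreeter, GreetingMiddleware) - not code from the project, just the shape of an Add* extension that registers a service and a Use* extension that inserts a middleware:

using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;

public interface IGreeter
{
    string Greet();
}

public class Greeter : IGreeter
{
    public string Greet() => "hello";
}

public class GreetingMiddleware
{
    private readonly RequestDelegate _next;
    private readonly IGreeter _greeter;

    // the greeter is resolved by dependency injection, the next delegate by the pipeline
    public GreetingMiddleware(RequestDelegate next, IGreeter greeter)
    {
        _next = next;
        _greeter = greeter;
    }

    public async Task Invoke(HttpContext context)
    {
        // add a header, then let the rest of the pipeline run
        context.Response.Headers["X-Greeting"] = _greeter.Greet();
        await _next(context);
    }
}

public static class GreetingExtensions
{
    // Add* convention: registers dependencies
    public static IServiceCollection AddGreeter(this IServiceCollection services)
        => services.AddSingleton<IGreeter, Greeter>();

    // Use* convention: inserts middleware into the pipeline
    public static IApplicationBuilder UseGreeting(this IApplicationBuilder app)
        => app.UseMiddleware<GreetingMiddleware>();
}

In Startup, services.AddGreeter() would then go in ConfigureServices and app.UseGreeting() in Configure.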

As an example, let's take one of the lines inside .AddMvc(): builder.AddRazorViewEngine();. Following one of the many branches defined by this extension pattern (I am still not sure how much I like it) we get to MvcRazorMvcCoreBuilderExtensions.AddRazorViewEngineServices, which injects a lot of dependencies. Take a look at this line:

// This caches compilation related details that are valid across the lifetime of the application.
services.TryAddSingleton<ICompilationService, DefaultRoslynCompilationService>();

One can change the implementation of the compilation service! Alternately, let's look at .UseStaticFiles(). It's a wrapper over

app.UseMiddleware<StaticFileMiddleware>();

Open source .NET Core


As you've seen from the examples above, I've often linked to the source code of methods and classes. That is because Microsoft finally decided to open source the .NET Core code, thus letting people find and solve their problems and allowing developers to find where pesky, hard to explain bugs are coming from. The extension method pattern makes it difficult to explore what is going on, as you have to switch from one project to another on the GitHub interface (or your own file system, depending on how you decide to work). Dependency injection makes it even harder, as you have to first find the interface responsible for your current programming task, then find all implementations and what injected them in the first place. I tried to find some decent exploring tool, but found none and I am too busy to make one of my own. Homework? :)

Even so, it is a great boon that one can look into the innards of the Microsoft code. It not only helps pinpoint issues, but also teaches about how one of the biggest software companies writes code. I don't want to dissect middleware in this post, but I strongly suggest you take a look at how they are made and how they work. Whenever I find it useful, I will mention the middleware responsible for what I am discussing, so try to make an effort to look at its source code and see what it actually does.

Attributes


Attributes are used all over ASP.Net MVC. They tell what HTTP method to accept for controller actions, how to authorize access, how to validate models, how to bind the incoming parameters to models. Here is an example:
//The user needs to be authorized to access this method
[Authorize]
//only POST requests
[HttpPost]
//over HTTPS are accepted
[RequireHttps]
//The URL for this method will be /util not /Hardcore
[ActionName("util")]
//controller method
public IActionResult Hardcore([FromBody] /*the data will be taken from the body of the request*/ HardcoreData data)
{
    //only show the view if the model is valid
    if (ModelState.IsValid) return View();
    //otherwise return a bad request
    return BadRequest(ModelState);
}

public class HardcoreData
{
    //value needs to be set (not null)
    [Required]
    //the format of Id needs to be a URL
    [DataType(DataType.Url, ErrorMessage = "The Id needs to be a URL")]
    //at most 500 characters long
    [StringLength(500, ErrorMessage = "Maximum URL length is 500 characters")]
    public string Id { get; set; }

    //Range validation from 0 to 100
    [Range(0, 100, ErrorMessage = "Value needs to be between 0 and 100 and even")]
    //Custom validation using the class and method mentioned
    [CustomValidation(typeof(HardcoreDataValidator), "ValidateValue")]
    public int Value { get; set; }
}

public class HardcoreDataValidator
{
    public static ValidationResult ValidateValue(int value)
    {
        return value % 2 == 0
            ? ValidationResult.Success
            : new ValidationResult("Value needs to be even");
    }
}

These attributes will be read and used by various services injected at startup. Everything can be changed, so for example you may change the validation system to interpret the RangeAttribute values differently or ignore RequiredAttribute or use custom attributes. Attribute classes only mark the intent of the developer, but do almost nothing themselves.

Models


I've mentioned previously that models are used to move data back and forth. Model binding is responsible for taking HTTP requests and turning their parameters into C# classes. Those models are then used by services like Entity Framework, the validation system or the Razor views. You've seen in the previous example how an object may be read from the body of a request. Similarly, it can be read from the HTTP parameters sent to the action method. Read an example of an investigation to see how various methods of model binding can be used with different attributes.

An important use case for models is validation. Some of it was demonstrated above. Read more in the documentation. An interesting part of it is the client validation that is implemented out of the box with the right javascript imports and using the right attributes.

Views


In ASP.Net Forms, the code and the presentation were (somewhat) separated into .aspx markup and .cs codebehind. The aspx syntax is probably isomorphic with the Razor syntax and I remember that at one time you could use ASP.Net Forms with Razor. In MVC, views have code in them, using Razor, but they are not strongly coupled with a specific piece of code. In fact, one can reuse a view for multiple models, especially the partial ones - which take over from UserControls, I guess. So in fact there is quite an overlap between ASP.Net Forms and MVC, if you add a separate injection mechanism to Forms in order to decouple markup and codebehind.

For views, the biggest difference as far as I am concerned is the encapsulation of reusable content, what before were controls. Panels, Grids, UserControls, all of them inherited from a Control class that handled the various ASP.Net phases of their lifecycle. There was no job interview in which you weren't asked about the ASP.Net lifecycle and now it's irrelevant. Nowadays, you render things from the markup up, with focus on the client side. HTML helpers and Tag Helpers are what allow you to encapsulate some rendering logic.

What caused this switch to a new paradigm? Well, I would say HTML5 and Javascript frameworks. You would have a wonderful Grid control rendering a nice table layout and the developer would cry foul because he wants everything done with DIVs. You would have a nice Calendar extension control and the dev would dismiss it immediately because he wanted to use the latest jQueryUI client side calendar. Most of all, it would be because the web designer would use Microsoft-agnostic tools that create pure HTML and then the poor dev would have to reverse engineer that in order to get the same layout with default controls. Today a grid is only a DIV, a Razor @foreach and a template for the rows using the values of the items displayed. Certainly all of this can be encapsulated further into your own library of HTML helpers, but you would have complete control over it.

In ASP.Net MVC Core partial views will be superseded by View Components. If you thought the Microsoft interpretation of MVC was a little vague, this will make your head explode. View Components are most similar to controllers, only you can't call them directly via HTTP, they are not part of the controller lifecycle and they can't use filters. They have views associated with them, found in /Views/{ControllerName}/Components/{ViewComponentName}/{ViewName} or /Views/Shared/Components/{ViewComponentName}/{ViewName}. You may invoke them directly from a controller or from a view, using the wonderfully ridiculous syntax
@Component.InvokeAsync("Name of view component", <anonymous type containing parameters>)
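The component class itself is plain C#. A minimal sketch, using a hypothetical LatestNews component whose default view would be searched for in Views/Shared/Components/LatestNews/Default.cshtml:

using System.Linq;
using Microsoft.AspNetCore.Mvc;

public class LatestNewsViewComponent : ViewComponent
{
    public IViewComponentResult Invoke(int count)
    {
        var titles = new[] { "First title", "Second title", "Third title" }
            .Take(count)
            .ToList();
        // renders Default.cshtml with the list as its model
        return View(titles);
    }
}

It would then be invoked from a view with something like @await Component.InvokeAsync("LatestNews", new { count = 2 }).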

If not specified by the user, views are discovered by their location in the project. Specifying the view means specifying the exact path of the cshtml file, an ugly and not recommended solution. Otherwise, when you just return View();, MVC looks in /Views/ControllerName/ViewName.cshtml and then in /Views/Shared/ViewName.cshtml. As we are accustomed, this default behavior can be changed by implementing a different IViewLocationExpander.

You may specify a model type for a view with the @model directive, which helps a lot with Intellisense. Otherwise, you may render server side data using the ViewBag/ViewData dictionaries or declare the model as dynamic, which allows loose use of properties but doesn't help much with Intellisense: using the wrong property name will generate runtime errors.

Needless to say, before I actually go into the code, views are a bit of a mystery to me as well. They also clash a bit with the architecture of an application that makes most sense to me: API + client side code. I feel I need to discuss this, so...

Types of MVC application architecture


In .NET proper ASP.Net MVC and ASP.Net Web API were two different things, with huge overlap of functionality. In .NET Core, they are the same thing, gathered under the umbrella of MVC. A controller action can receive AJAX calls in JSON format and return properly formatted HTML5 markup, for example. It is very difficult to find a reasonable way to separate the two concepts in .NET. However, there are two major ways of using them, separating them by use, as it were. The MVC application that uses controllers and views is still a product of turning ASP.Net Forms into a Ruby on Rails clone. While the overall architecture of the application has changed, giving more control to the developers, it also constrains them into a type of functional architecture that may be - frankly - obsolete already.

There are three types of architectures that I will discuss for a very simple application that displays news items using their title, description, url and image:
  1. starting from an ASP.Net Forms page and a list of NewsItem objects in C#, we use an .aspx page that contains a Grid control. We define the way the title is rendered as a link, the description as a short text block and the image as a side thumbnail.
  2. starting from an ASP.Net MVC controller and a list of NewsItem objects in C#, we render a view which uses a Razor @foreach to display sections with a title link, a description and a thumbnail.
  3. starting from an HTML page, we fire an AJAX call to a .NET API that returns a Javascript array of NewsItem objects, that then we render as sections with a title link, a description and a thumbnail, maybe by using an MVC client-side library like AngularJS.

See what I mean? The first two versions are basically the same. Whether the mechanism for rendering comes from a Forms Control or from a Razor loop is irrelevant to the overall design of the app. The third, though, presents some very interesting ideas:
  • The website is not a .NET website. It's pure HTML. It can be served from any type of server, on any platform, can be created with any tools.
  • There is no visual interface on the actual .NET server side. It's a simple API that sends and receives data in serialized form.
  • The MVC architecture moves towards the client side, where views, models and controllers are just Javascript code, HTML and CSS.
  • There is a very clear functional separation of concerns. There is server side development: C#, serialization and persistence of data, sensitive or resource intensive processing, the good ole things that .NET developers love. And there is the client side development: HTML, Javascript, CSS, responsive design, native mobile apps and all that crap that designers and frontend developers do.

(Again, joking, dear frontend or mobile developer! Just making sure you weren't asleep)

The conclusions are staggering, actually. With no concern for presentation, the server side API framework can be incredibly tiny. Efforts can be turned towards making it efficient, fast, secure, using fewer resources, being scalable. There is no need for a Razor engine, HTML helpers, partial views, View Components; no one cares about them. Instead, what it enables is working with any kind of client side user interface. Mobile native apps from all platforms, multiple web sites, other APIs, they all could just attach to the API and present its functionality. Meanwhile, the client side interface developer is exempt from all the dependencies on Microsoft tools, from concerns over how many servers there are, where they are located, and the general background functionality of the application.
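To make the API-only flavor concrete, here is a minimal sketch using hypothetical names (NewsItem, NewsController); the controller returns serialized data and leaves all rendering to whatever client calls it:

using System.Collections.Generic;
using Microsoft.AspNetCore.Mvc;

public class NewsItem
{
    public string Title { get; set; }
    public string Description { get; set; }
    public string Url { get; set; }
    public string ImageUrl { get; set; }
}

[Route("api/[controller]")]
public class NewsController : Controller
{
    // GET api/news - returns JSON that any client (web, mobile, another API) can consume
    [HttpGet]
    public IEnumerable<NewsItem> Get()
    {
        // in a real application the items would come from a service or repository
        yield return new NewsItem
        {
            Title = "Example",
            Description = "An example news item",
            Url = "http://example.com",
            ImageUrl = "http://example.com/thumb.png"
        };
    }
}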

I've worked in such a way, it was great! People working on iOS, Android and web applications would just come to me and ask for an API that does this and that. After days of fighting over how the API signature should look :) everyone would just do what they are good at. Even more, because we were so different, bugs were easier to discover when we tried to connect our work over this simple interface.

The downsides are blessings in disguise. As an API call needs to finish quickly and return small quantities of data, the developer is forced to consider from the get go things like: pagination, chunking, asynchronous programming, concurrency, etc. Instead of importing a list of URLs for the news and then waiting for the output of the page while the server side is spidering the data, the app needs to show "importing data, please wait" and then periodically query the API if the import is finished. When hundreds of people try to do this, there is no problem, as the list of links to spider just grows and the same process extracts the data. If two users import the same links, they only get spidered once.

Even if the application that we will be working on is not based on this design, consider from the beginning if you even need to use ASP.Net in an MVC way. The world is moving away from "applications" and towards "services". Perhaps the API itself would only be a front that accesses other APIs in the background, microservices that are optimized perfectly for the tiny bit they perform.

Data Access Architecture


A small rant against ORMs


Just like with the overall design, one may use different ways of accessing data. The ASP.Net MVC guide for working with data suggests a single clear path: the Microsoft ORM Entity Framework. As I am still to use it in any serious capacity, I will not explain EF concepts here. I will ask you a question instead: do you even need an ORM?

Object Relational Mappers are tools that abstract the database from the viewpoint of a developer. They work with contexts and sets and strongly typed objects and have great Intellisense support. Started as a way to map an existing database to a .NET data framework, Entity Framework now goes the other direction: code first! You start writing your app, using Entity Framework just as if you already had everything you need, and it creates and maintains the database in the background. Switching from SQL Server to PostgreSQL or SQLite or even a custom data persistence method is a breeze. They sound great!

However, if you already know what persistence model you use and are proficient in designing and optimizing the data structure there, using an ORM starts to lose its appeal. Just as with ASP.Net Forms, you have no control over the way the ORM chooses to communicate with the database. It may do better than you or it may do horribly bad. You start developing your app, everything works fine, you add feature after feature and when you finally load the actual real life data something goes wrong and you have no idea what and where.

There already are patterns of abstracting the data access and usually it involves using the data from a separate library (or service) that encapsulates the desired behavior and is structured by intention. Why would I get all NewsItems, when there are millions of them and in no situation I can conceive would I need all of them? Why would I get a NewsItem by Id, when the Id means nothing to me and things like the URL are more relevant? Why would I choose to store in memory all the items I want to delete, when my condition for them to disappear is a simple WHERE condition?

OK, OK, I know that if you worked with Entity Framework you have a lot of (good) answers to all of these questions. Yet my original question still needs to be considered before you embark on your development journey: do you even need Entity Framework?

The main disadvantages I see for Entity Framework specifically are:
  • It diffuses the API for working with data. Instead of writing a NewsItemManager class that gives you items by url and by date, for example, the developer is tempted to write custom queries inside the logic of the application. This leads to difficulties refactoring the code or redesigning the application.
  • It hides the complexity of the database. Instead of working with the actual stored data, you work with an abstraction that may look good to you, but hides problems that you are tempted to ignore.
  • It forces switches of competencies. If you want to debug and optimize your data access you now need an Entity Framework expert, rather than a database expert.
  • It causes technical debt that you are not even aware of. From this list, I believe this to be the most insidious. There is a chance, which may be very small, that your application needs a functionality that Entity Framework was not designed for. EF works great in every other area except that one. And when you try to fix it, you have several options that are all horrible: create a separate system for it, hack Entity Framework into submission, or leave it slow and bad because everything else works so well. At this point, when you notice there is a problem, it's already too late.

In our application I will gladly use Entity Framework. It seems some of the basic functionality of MVC, like identity, is strongly designed to work together with the Entity Framework data abstraction. Yet even so, I will try to abstract the data layer - mostly because I have no need to implement it fully for this demo. This will probably lead to an interesting consequence: the default MVC modules will use EF in one way, while I will use it for my application in another.

Entity Framework concepts


An actual advantage of EF that I think is great is the concept of migrations. EF is able to save modifications to the data layer as C# code files that can be added to source control. This helps a lot when working in teams of multiple people.
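Assuming the Entity Framework Core tools are referenced in project.json, the typical workflow looks something like this, run in the project folder (the Package Manager Console equivalents are Add-Migration and Update-Database):

dotnet ef migrations add InitialCreate
dotnet ef database update

The first command generates a C# migration file that can be committed to source control; the second applies any pending migrations to the configured database.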

As an aside, I was working for a project that used stored procedures to access the database. The data access layer was getting and changing data using these functions and procedures that were saved in a folder as .SQL create files. It was easy during deployment to delete all procedures and functions and then recreate them, but how about database schema or data changes? For this we used a folder of .SQL changes. For each file we also needed to create a rollback file, to undo whatever the change was doing. They were difficult to manage at first, but after a while you got the hang of it. I wonder if Entity Framework allows for this kind of workflow. That would be great. Aside over.

The root of an EF model seems to be the context. Inheriting from DbContext (or, as in the default template app, from IdentityDbContext<ApplicationUser>, coupling it inexorably with the identity of the user), this class need not have a lot of code of its own at first. As time goes by, changes to the data mechanism will probably be hooked here. The DbContext will have properties of type DbSet<SomeEntity> which will be used to query said entities. A simple services.AddDbContext<MyDbContext>() in the startup class declares your willingness to work with one context or another.
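A minimal sketch, using hypothetical entity and context names, just to make those concepts concrete:

using Microsoft.EntityFrameworkCore;

public class NewsItem
{
    public int Id { get; set; }
    public string Url { get; set; }
    public string Title { get; set; }
}

public class ContentDbContext : DbContext
{
    public ContentDbContext(DbContextOptions<ContentDbContext> options)
        : base(options) { }

    // one DbSet per entity type you want to query
    public DbSet<NewsItem> NewsItems { get; set; }
}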

A mix of conventions and attributes defines the mapping between your context and the underlying database. A good link exploring this can be found here.

Another interesting quality of Entity Framework is that you can use it in memory, very useful with automated testing. Here is a link that explains it.

Using LINQ to Entities and the DbSet properties of the context, one can create, read, update and delete records, but there are some differences from what you may be used to. The delete or update operations by default need to first retrieve the items, then alter them. A good intro to the changes in Entity Framework 7 can be found here.
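For example, a minimal sketch of that retrieve-then-alter behavior, reusing the hypothetical ContentDbContext/NewsItem from the previous sketch:

using System.Linq;

public static class NewsItemCleanup
{
    public static void DeleteByUrl(ContentDbContext context, string url)
    {
        var item = context.NewsItems.FirstOrDefault(n => n.Url == url);
        if (item == null) return;

        // the entity has to be loaded (or attached) before it can be removed
        context.NewsItems.Remove(item);
        context.SaveChanges();
    }
}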

The pattern used by Entity Framework is called "unit of work". If you want to go down the rabbit hole, look it up. A nice article about it and some possible improvements can be found here.

An interesting reason for using Entity Framework would be when you don't have a lot of control over your persistence medium. I haven't worked with "the cloud" yet, but basically they give you some services and tax you for using them. If EF can abstract that away and minimize cost, it would be a boon, but I have no information about this.

Miscellaneous


The post cannot be complete without mentioning a few more concepts. In this series I will not go into details for many of them, so read the info the .NET team has prepared for each subject.

Leaving so soon?


The next post will be about authentication, more exploratory and with code examples.


The Brain that Changes Itself is a remarkable book for several reasons. M.D. Norman Doidge presents several cases of extraordinary events that constitute proof for the book's thesis: that the brain is plastic, easy to remold, to adapt to the data you feed it. What is astonishing is that, while these cases are not new and are by far not the only ones out there, the medical community is clinging to the old belief that the brain is made of clearly localized parts that have specific roles. Doidge is trying to change that.

The ramifications of brain plasticity are widespread: the way we learn or unlearn things, how we fall in love, how we adapt to new things and keep our minds active and young, the way we would educate our children, the minimal requirements for a computer brain interface and so much more. The book is structured in 11 chapters and some addendums that seem to be extra material the author didn't know how to properly format. A huge part is acknowledgements and references, so the book is not that large.

These are the chapters, in order:

  • Chapter 1 - A Woman Perpetually Falling. Describes a woman who lost her sense of balance. She feels she is falling at all times and barely manages to walk using her sight. Put her in front of a weirdly patterned rug and she falls down. When sensors fed information to an electrode plate on her tongue she was able to have balance again. The wonder comes from the fact that for a time after removing the device she would retain her sense. The hypothesis is that the receptors in her inner ear were not destroyed, but damaged, leaving some in working order and some sending incorrect information to the brain. Once a method to separate the good receptors from the bad was found, the brain immediately adapted itself to use only the good ones. The doctor who spearheaded her recovery learned the hard way that the brain is plastic, when his father was almost paralyzed by a stroke. He pushed his father to crawl on the ground and try to move the hand that wouldn't move, the leg that wouldn't hold him, the tongue that wouldn't speak. In the end, his father recovered. Later, after he died from another stroke while hiking on a mountain, the doctor had a chance to see the extent of the damage done by the first stroke: 97% of the nerves that run from the cerebral cortex to the spine had been destroyed.
  • Chapter 2 - Building Herself a Better Brain. Barbara was born in the '50s with a brain "asymmetry". While living a relatively normal life she had some mental disabilities that branded her as "retarded". It took two decades to stumble upon studies that showed that the brain was plastic and could adapt. She trained her weakest traits, the ones that doctors were sure would remain inadequate because the part of the brain "associated" with them was missing, and found out that her mind adapted to compensate. She and her husband opened a school for children with disabilities, but her astonishing results came when she was over 20 years old, after years of doctors telling her there was nothing to be done.
  • Chapter 3 - Redesigning the Brain. Michael Merzenich designs a program to train the brain against cognitive impairments or brain injuries. Just tens of hours help improve - and teach people how to keep improving on their own - from things like strokes, learning disabilities, even conditions like autism and schizophrenia. His work is based on scientific experiments that, when presented to the wider community, were ridiculed and actively attacked for the only reason that they went against the accepted dogma.
  • Chapter 4 - Acquiring Tastes and Loves. A very interesting chapter about how our experiences shape our sense of normalcy, the things we like or dislike, the people we fall for and the things we like to do with them. The chapter also talks about Freud, in a light that truly explains how ahead of his time he was, about pornography and its effects on the brain, about how our pleasure system affects both learning and unlearning, and has a very interesting theory about oxytocin, seeing it not as a "commitment neuromodulator", but as a "demodulator", a way to replastify the part of the brain responsible for attachments, allowing us to let go of them and create new ones. It all culminates with the story of Bob Flanagan, a "supermasochist" who did horrible things to his body on stage because he had associated pain with pleasure.
  • Chapter 5 - Midnight Resurrection. A surgeon has a stroke that affects half of his body. Through brain training and physiotherapy, he manages to recover - and not gain magical powers. The rest of the chapter talks about experiments on monkeys that show how the feedback from sensors rewires the brain and how what is not used gets weaker and what is used gets stronger, finer and bigger in the brain.
  • Chapter 6 - Brain Lock Unlocked. This chapter discusses obsessions and bad habits and defective associations in the brain and how they can be broken.
  • Chapter 7 - Pain: The Dark Side of Plasticity. A plastic brain is also the reason why we strongly remember painful moments. A specific case is phantom limbs, where people continue to feel sensations - often the most traumatic ones - after limbs have been removed. The chapter discusses causes and solutions.
  • Chapter 8 - Imagination: How Thinking Makes It So. The brain maps for skills that we imagine we perform change almost as much as when we are actually doing them. This applies to mental activities, but also physical ones. Visualising doing sports prepared people for the moment when they actually did it. The chapter also discusses how easily the brain adapts to using external tools. Brain activity recorders were wired to various tools and monkeys quickly learned to use them without the need for direct electric feedback.
  • Chapter 9 - Turning Our Ghosts into Ancestors. Discussing the actual brain mechanisms behind psychotherapy, in the light of what the book teaches about brain plasticity, makes it more efficient as well as easier to use and understand. The case of Mr. L., Freud's patient, who couldn't keep a stable relationship as he was always looking for another and couldn't remember his childhood and adolescence, sheds light on how brain associates trauma with day to day life and how simply separating the two brain maps fixes problems.
  • Chapter 10 - Rejuvenation. A chapter talking about the neural stem cells and how they can be activated. Yes, they exist and they can be employed without surgical procedures.
  • Chapter 11 - More than the Sum of Her Parts. A girl born without her left hemisphere learns that her disabilities are just untrained parts of her brain. After decades of doctors telling her there is nothing to be done because the parts of her brain that were needed for this and that were not present, she learns that her brain can actually adapt and improve, with the right training. An even more extreme case than what we saw in Chapter 2.


There is much more in the book. I am afraid I am not doing it justice with these meager descriptions. It is not a self-help book and it is not popularising science; it discusses actual cases and the experiments done to back them up, and it puts forward theories about the amazing plasticity of the brain. Some things I took from it are that we can train our brain to do almost anything, but the training has to follow some rules. Also that what we do not use gets discarded in time, while what is used gets reinforced, albeit with diminishing efficiency. That is a great argument to do new things and train at things that we are bad at, rather than cement a single-scenario brain. The book made me hungry for new senses, which in light of what I have read, are trivial to hook up to one's consciousness.

If you are not into reading, there is a one hour video on YouTube that covers about the same subjects:

[youtube:sK51nv8mo-o]

Enjoy!

Following my post about things I need to learn, I've decided to start a series about writing an ASP.Net MVC Core application, covering as much ground as possible. As a result, this experience will cover .NET Core subjects and a thorough exploration of ASP.Net MVC, plus some concepts related to Visual Studio, project structure, Entity Framework, HTML5, ECMAScript 6, Angular 2, ReactJs, CSS (LESS/SASS), responsive design, OAuth, OData, shadow DOM, etc.

Learning ASP.Net MVC series:
  1. Setup
  2. MVC Concepts
  3. Authentication
  4. Entity Framework Fundamentals
  5. Upgrading project to .NET Core 1.1
  6. Dependency Injection and Services

Specifications


In order to start any project, some specifications need to be addressed. What will the app do and how will it be implemented? I've decided on a simplified rewrite of my WPF newsletter maker project. It gathers subjects from Google by searching for configurable queries, spiders the contents, displays them, filters them, sorts them, extracts text and analyzes content. It remembers the already loaded URLs and allows for marking them as deleted and setting a category. It will be possible to extract the items that have a category into a newsletter containing links, titles, short descriptions and maybe a picture.

The project will be using ASP.Net Core MVC, combining the API and the display in a single web site (at least for now). Data will be stored in SQLite via Entity Framework. Later on the project will be switched to SQL Server to see how easy that is. The web site itself will have HTML5 structure, using the latest semantic elements, with the simplest possible CSS. All project owned Javascript will be ECMAScript6. OAuth might be needed for using the Google Search API, and I intend to use Google/Facebook/Twitter accounts to log in the application, with a specific account marked in the configuration as Administrator. The IDE will be Visual Studio (not Code). The Javascript needs to be clean, with no CSS or HTML in it, by using CSS classes and HTML templates. The HTML needs to be clean, with no Javascript or styling in it; moreover it needs to be semantically unambiguous, so as to be easily molded with CSS. While starting with a desktop only design, a later phase of the project will revamp the CSS, try to make the interface beautiful and work for all screen formats.

Not the simplest of projects, so let's get started.

Creating the project


I will be using Visual Studio 2015 Update 3 with the .NET Core Preview2 tooling. Personally I had a problem installing the Core tools for Visual Studio, but this link solved it for me with a command line switch (short version: DotNetCore.1.0.0-VS2015Tools.Preview2.exe SKIP_VSU_CHECK=1). First step is to create a New Project → Visual C# → .NET Core → ASP.NET Core Web Application. I will name it ContentAggregator. At the prompt asking which type of project template I want to choose, I will select Web Application, deselect the Microsoft Azure Host in Cloud checkbox, which for whatever reason is checked by default, and click on Change Authentication to select Individual User Accounts.



Close the "Welcome to ASP.Net Core" page, because the template will be heavily changed by the time we finish this chapter.

The default template project


For a more detailed analysis of a .NET Core web project, try reading my previous post of the dotnet default template for web apps. This one will be quick and dirty.

Things to notice:
  • There is a global.json file that lists two projects, src and test. So this is how .NET Core solutions were supposed to work. Since the json format will be abandoned by Microsoft, there is no point in exploring this too much. Interestingly, though, while there is a "src" folder, there is no "test" folder.
  • The root folder contains a ContentAggregator.sln file and the src folder contains a ContentAggregator folder with a ContentAggregator.xproj file. Core seems to have abandoned the programming language dependent extension for project files.
  • The rest of the project seems to be pretty much the default dotnet tool one, with the following differences:
  • the template uses SQL Server by default
  • the lib folder in wwwroot is already populated with libraries

So far so good. There is also the little issue of the database. As you may remember from the post about the dotnet tool template, there were some files that needed to initialize the database. The error message then said "In Visual Studio, you can use the Package Manager Console to apply pending migrations to the database: PM> Update-Database". Is that what I have to do? Also, we need to check what the database settings are. While I do have an SQL Server instance on this computer, I haven't configured anything yet. The Project_Readme.html page is not very useful, as the link Run tools such as EF migrations and more goes to an obsolete link on github.io (the documentation seems to have moved to a microsoft.com server now).

I *could* read/watch a tutorial, but what the hell? Let's run the website, see what it does! Amazingly, the web site starts, using IIS Express, so I go to Register, to see how the database works and I get the same error about the migrations. I click on the Apply Migrations button and it says the migrations have been applied and that I need to refresh. I do that and voila, it works!

So, where is the database? It is not in the bin folder as WebApplication.db like in the Sqlite version. It's not in the SQL Server, the service wasn't even running. The DefaultConnection string looks like "Server=(localdb)\\mssqllocaldb;Database=aspnet-ContentAggregator-7fafe484-d38b-4230-b8ed-cf4a5a8df5e1;Trusted_Connection=True;MultipleActiveResultSets=true". What's going on? The answer lies in the SQL Server Express LocalDB instance that Visual Studio comes with.

Changing and removing shit


To paraphrase Antoine de Saint-Exupéry, this project will be set up not when I have nothing else to add, but when I have nothing else to remove.

First order of business is to remove SQL Server and use SQLite instead. Quite the opposite of how I had pictured it, but hey, you do what you must! In theory all I have to do is replace .UseSqlServer with .UseSqlite and then adjust the DefaultConnection string in appsettings.json to something like "Data Source=WebApplication.db". Done that, fixed the namespaces and imported packages, ran the solution. Migration error, Apply Migrations, re-register and everything is working. WebApplication.db and everything.
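For clarity, and assuming the Microsoft.EntityFrameworkCore.Sqlite package is referenced, the change in Startup.ConfigureServices amounts to something like this:

// swap the SQL Server provider for SQLite; the DefaultConnection string in
// appsettings.json becomes something like "Data Source=WebApplication.db"
services.AddDbContext<ApplicationDbContext>(options =>
    options.UseSqlite(Configuration.GetConnectionString("DefaultConnection")));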

Second is to remove all crap that I don't need right now. I may need it later, so at this point I am backing up my project. I want to remove:
  • Database - yeah, I know I just recreated it, but I need to know what those migrations contained and if I even need them, considering I want to register with OAuth only
  • Controllers - probably I will end up recreating them, but we need to understand why things are how they are
  • Models - we'll do those from scratch, too
  • Services - they were specific to the default web site, so poof! they're gone.
  • Views - the views will be redesigned completely, so we delete them also
  • Client libraries - we keep jQuery and jQuery validation, but we remove bootstrap
  • CSS - we keep the site.css file, but remove everything in it
  • Javascript - keep site.js, but empty
  • Other assets like images - removed

"What the hell, I read so much of this blog post just for you to remove everything you did until now?" Yes! This part is the project set up and before its end we will have a clean white slate on which to create our masterpiece.

So, action! Close Visual Studio. Delete bin (with the db file in it) and obj, delete all files in Controllers, Data, Models, Services, Views. Empty files site.css and site.js, while deleting the .min versions, delete all images, Project_Readme.html and project.lock.json. In order to cleanly remove bootstrap we need to use bower. Run
bower uninstall bootstrap
which will remove bootstrap, but won't remove it from bower.json, so remove it from there. Reopen Visual Studio and the project, wait until it restores the packages.

When trying to compile the project, there are some errors, obviously. First, namespaces that don't exist anymore, like Controllers, Models, Data, Services. Remove the offending usings. Then there are services that we wanted to inject, like SMS and Email, which for now we don't need. Remove the lines that try to use them under // Add application services. The rest of the errors are about ApplicationDbContext and ApplicationUser. Comment them out. These are needed for when we figure out how the project is going to preserve data. Later on a line in Startup.cs will throw an exception ( app.UseIdentity(); ) so comment it out as well.

Finishing touches


Now the project compiles, but it does nothing. Let's finish up by adding a default Controller and a View.

In Visual Studio right click on the Controllers folder in the Solution Explorer and choose Add → Controller → MVC Controller - Empty. Let's continue to name it HomeController. Go to the Views folder, create a new folder called Home. Now you might think that right clicking on it and selecting Add → View would work, but it doesn't. The Add button stubbornly remains disabled unless you specify a template and a model and other stuff. It may be useful later on, but at this stage ignore it. The way to add a view now is go to Add → New Item → MVC View Page. Create an Index.cshtml view and empty its contents.

Now run the application. It should show a wonderfully empty page with no console errors. That's it for our blank project setup. Check out the source code for this point of the exploration. Stay tuned for the real fun!

I've run into a very interesting discussion on StackOverflow regarding the significant decrease in execution time when using 'use strict' in Javascript.

Basically, there was a simple function added to the prototype of string to count the occurrences of a specific character in the string. Adding
'use strict';
to the function would make it run ten times faster. The answer boiled down to the fact that in normal (non-strict) Javascript the keyword this gets boxed to an object: typeof(this)==='object', while in strict mode it stays whatever the value is (in this case a string). In other words, to emulate the same behavior without using strict mode we need to "cast" this to the type we need, using a variable
var str=this+'';

It was a cool thing to learn, with the added bonus that I found out about using console.time to measure Javascript performance as well.

Update: oh, by the way, the performance decreased 4 times(!) when using .charAt instead of the indexer in order to get the character at a certain position.

After a month of trial with no one to complain of HTTPS issues, I've decided to set the blog to redirect normal connections to the secure URL. Let me know if you experience any problems.

In September last year I was leaving my job and starting a sabbatical year, with many plans for what seemed then like a lot of time in which to do everything. I was severely underestimating my ability to waste time. Now the year is almost over and I need to start thinking about the technologies in my field of choice that I need to catch up with; and, boy, there is a lot of them! I leave the IT industry alone for one year and kaboom! it blows up like an angry volcano. To be honest, not all of these things that are new for me are just one year old, some I was just ignoring as I didn't need them for my various jobs. Learn from this, as especially in the software business it gets harder and harder to keep up to date and easier and easier to live in a bubble of your own or your employer's creation.

This post is about a list of programming topics that I would like to learn or at least learn to recognize. It's a work in progress and I will probably keep updating it for a while. While on my break I created a folder of software development stuff that I would "leave for later". As you can imagine, it got quite large. Today I am opening it for the first time. Be afraid. Be very afraid. I also have a lot of people, either friends or just casual blog or Twitter follows, who constantly let me know what they are working on. As such, the list will not be very structured, but it will be large. Let's begin.

A simple list would look like this. Let me set the list style to ordered list so you can count them:
  1. Typescript 2
  2. ReactJS
  3. JSX
  4. SignalR
  5. Javascript ES6
  6. Xamarin
  7. PhoneGap
  8. Ionic
  9. NodeJS
  10. .NET Core
  11. ASP.Net MVC
  12. R
  13. Python
  14. Unity
  15. Tensorflow
  16. DMTK/CNTK
  17. Visual Studio Code
  18. Jetbrains Project Rider
  19. npm
  20. Bower
  21. Docker
  22. Webpack
  23. Kubernetes
  24. Deep Learning
  25. Statistics
  26. Data mining
  27. Cloud computing
  28. LESS
  29. SASS
  30. CSSX
  31. Responsive design
  32. Multiplatform mobile apps
  33. Blockchains
  34. Parallel programming
  35. Entity Framework
  36. HTML5
  37. AngularJS 2
  38. Cryptography
  39. OpenCV
  40. ZeroNet
  41. Riffle
  42. Bots
  43. Slack
  44. OAuth
  45. OData
  46. DNS
  47. Bittorrent
  48. Roslyn
  49. Universal Windows Platform / Windows 10 development
  50. Katana
  51. Shadow DOM
  52. Serverless architecture
  53. D3 and D4 (d3-like in ReactJs)
  54. KnockoutJs
  55. Caliburn Micro
  56. Fluent Validation
  57. Electron

Yup, there are more than 50 general concepts, major frameworks, programming languages, tools and whatnot, some of them already researched but maybe not completely. That is not including various miscellaneous small frameworks, pieces of code, projects I want to study or things I want to do. I also need to prioritize them so that I can have at least the semblance of a study plan. It being July 21st, I have about one full month in which to cover the basic minimum. Clearly, almost two subjects a day, every day, is too ambitious a task. Note to self: ignore that little shrieky voice in your head that says it's not!

Being a .NET developer by trade I imagine my next job will be in that area. Also, while I hate this state of affairs, notice there is nothing related to WPF up there. The blogs about the technology that I was reading a few years ago have all dried up, with many of those folks moving to the bloody web. So, I have to start with:

  1. ASP.Net MVC Core - the Model View Controller way of making .NET web applications. I've worked with it, but I am not the expert that I need to become. Some quickly googled material:
  2. .NET Core - the new version of .NET, redesigned to be cross platform. There is no point in learning .NET Core as a standalone topic: it will be used all over this plan.
  3. Entity Framework Core - honestly, I've moved away from ORMs, but clearly Microsoft is moving full steam ahead on using EF everywhere, so I need to learn it well. As resources, everything written or recommended by Julie Lerman should be good, but a quick google later:
  4. OData - an OASIS standard that defines a set of best practices for building and consuming RESTful APIs. When Microsoft adopts an open standard, you pretty much know it will enter the common use vocabulary as a word used more often than "mother". Some resources:
  5. OAuth - An open protocol to allow secure authorization in a simple and standard method from web, mobile and desktop applications. It is increasingly used as "the" authentication method, mostly because it allows for third party integration with Facebook, Twitter, Google and other online identity providers. Some resources:
  6. Typescript 2 - a strict superset of JavaScript from Microsoft, it adds optional static typing and class-based object-oriented programming to the language. Javascript is great, but you can use it in any way you want. There is no way to take advantage of so many cool features of modern IDEs like Visual Studio + ReSharper without some sort of structure. I hope Typescript provides that for me. Resources:
  7. NodeJS - just when I started liking Javascript as a programming language, here comes NodeJs and brings it everywhere! And that made me like it less. Does that make sense? Anyway, with Microsoft tools needing NodeJs for various reasons, I need to look into it. Resources:
  8. Javascript ES6 - the explosion of Javascript put a lot of pressure on the language itself. ECMAScript6 is the newest iteration, adding support for a lot of features that we take for granted in more advanced languages, like classes, block variable scope, lambdas, multiline literals, better regular expressions, modules, etc. I intend to rewrite my browser extension in ES6 Javascript for version 3, among other things. Here are some resources:
  9. npm - npm is a package manager for Javascript. Everybody likes to use it so much that I believe it will soon become an antipattern. Functions like extracting the left side of a string, for example, are considered "packages".
  10. Bower - Bower is a package manager for the web, an attempt to maintain control over a complex ecosystem of web frameworks and libraries and various assets.
  11. Docker - The world’s leading software containerization platform - I don't even know what that means right now - Docker is a tool that I hear more and more about. In August I will even attend an ASP.Net Core + Docker presentation by a Microsoft guy.
  12. Parallel programming - I have built programs that take advantage of parallel programming, but never in a systematic way. I usually write stuff as a single thread, switching to multithreaded work to solve particular problems or to optimize run time. I believe that I need to write everything with parallelism in mind, so I need to train myself in that regard.
  13. Universal Windows Platform - frankly, I don't even know what it means. I am guessing something that brings application development closer to the mobile device/store system, which so far I don't enjoy much, but hey, I need to find out at least what the hell this is all about. The purpose of this software platform is to help develop Metro-style apps that run on both Windows 10 and Windows 10 Mobile without the need to be re-written for each. Resources:
  14. HTML5 - HTML5 is more than a simple rebuttal of the XHTML concept and the addition of a few extra tags and attributes. It is a new way of looking at web pages. While I've used HTML5 features already, I feel like I need to understand the entire concept as a whole.
  15. Responsive design - the bane of my existence was web development. Now I have to do it so it works on any shape, size or DPI screen. It has gone beyond baneful; yet every recruiter seems to have learned the expression "responsive design" by heart and my answer to that needs to be more evolved than a simple "fuck you, too!"
  16. LESS and SASS - CSS is all nice and functional, but it, just like HTML and Javascript, lacks structure. I hope that these compilable-to-CSS frameworks will help me understand a complex stylesheet like I do a complex piece of code.
  17. AngularJS 2 - I hear that Angular 2 is confusing users of Angular 1, which is funny, because I used Angular just for a few weeks without caring too much about it. I've read a book about it, but I forgot everything. It is probably for the best, as I will be learning the framework starting directly with version 2.

So there you have it: less than 20 items, almost two days each. Still bloody tight, but I don't really need to explore things in depth, just to know what they are and how to use them. The in-depth learning needs to come after that, with weeks if not months dedicated to each.

What about the remaining items, nearly 40 of them? Well, the list is still important as a reference. I intend to go through each of them; however, some of the concepts are there just because I am interested in them, like DNS, Riffle, Bitcoin and Bittorrent, not because they would be useful at my job or even for my current side projects. Data mining and artificial intelligence are a fucking tsunami, but I can't become an expert in something like this without first becoming a beginner, and that takes time - in which the bubble might burst, heh heh. Mobile devices are all nice and cool, but the current trend is for tablets to remain a whim, while people divide their attention between laptops and big screen smartphones. The web takes over everything and I dread that the future is less about native apps and more about web sites. What are native mobile apps for? Speed and access to stuff a browser doesn't usually have access to. Two years and a new API later, and a web page does that better. APIs move faster nowadays and, if they don't, there are browser extensions that can inject anything and work with a locally installed app that provides just the basic functionality.

What do you think of my list? What needs to be added? What needs to be removed? Learning often goes far smoother when you have partners. Anyone interested in going through some of the subjects and then discussing them over a laptop and a beer?

Wish me luck!