I stumbled upon a very funny compiler error today. Basically, I was creating a constant string by concatenating other constant strings, and it worked. The moment I added an integer, though, I got The expression being assigned to 'Program.x2' must be constant. The code that generated this error is simple:
const string x2 = "string" + 2;
Note that
const string x2 = "string" + "2";
is perfectly valid. I got the same result using VS2010 and VS2015, so it's not a compiler bug; it's intended behavior.

So, what's going on? Well, behind the scenes my code transforms into
const string x2 = "string" + 2.ToString();
which is not a constant expression because of the ToString call!

The only way to solve it was to declare the numeric constant as a string as well.
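A minimal sketch of the fix (the constant name two is mine):
// declaring the number as a string constant keeps the whole
// expression a compile-time constant: string + string is allowed
const string two = "2";
const string x2 = "string" + two; // compiles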

This clause is so obscure that it took me a few minutes to even find the Microsoft reference page for it, so no wonder I didn't know about it. Introduced in SQL Server 2005, the TABLESAMPLE clause limits the rows returned from a table in the FROM clause to a sample: either a number of rows or a percentage.

Usage:
TABLESAMPLE (sample_number [ PERCENT | ROWS ] ) [ REPEATABLE (repeat_seed) ]

REPEATABLE sets the seed of the random number generator, so that running the query again returns the same sample.
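For example, a minimal query (it reuses the Sales.SalesOrderDetail table from the snippet below; the seed value 42 is arbitrary):
SELECT * FROM Sales.SalesOrderDetail
TABLESAMPLE (10 PERCENT) REPEATABLE (42)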

It sounds great at the beginning, until you start seeing the limitations:
  • it cannot be applied to derived tables, tables from linked servers, and tables derived from table-valued functions, rowset functions, or OPENXML
  • the number of rows returned is approximate: 10 ROWS doesn't necessarily return 10 records. In fact, the implementation underneath first converts 10 into a percentage
  • a join of two tables is likely to return a match for each row in both tables; however, if TABLESAMPLE is specified for either of the two tables, some rows returned from the unsampled table are unlikely to have a matching row in the sampled table.
  • it isn't even that random!

Funny enough, even the reference page recommends a different way of getting a random sample of rows from a table:
SELECT * FROM Sales.SalesOrderDetail
WHERE 0.01 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float) / CAST (0x7fffffff AS int)

Even if it's probably not something I'll actually use, at least I've learned something new about SQL.

Update:
More about getting random samples from a table can be found here, in an article that explains why ORDER BY NEWID() is not the way to do it and hints at what really happens in the background when we invoke TABLESAMPLE.
Another interesting article on the subject, focused more on statistical probability, can be found here; it also shows how TABLESAMPLE's cluster sampling can fail in spectacular ways.

I am often left dumbfounded by the motivations other people assign to my actions. Most of the time it is caused by their self-centeredness, their assumption that whatever I do is somehow related more to them than to me. And it made me think: am I a good or bad person, or is it all a matter of perception by others?

I rarely feel like I do something out of the ordinary for other people; instead I do it because that's who I am. I help a colleague because I like to help or I refuse to do so because I feel that what I am doing is more important. Same with friends or romantic relationships. Sometimes I need to make an effort to do something, but it's still my choice, my assessment of the situation and my decision to go a certain way. It's not a value judgment on the person, it's not an asshole move or some out of my way effort to improve their life. What I do IS me.

It's also a weird direction of reasoning, since I am aware of the physical impossibility of "free will" and I subscribe to the school of thought that it is all an illusion. I mean, logic dictates that either the world works top-down, with some central power of will trickling down into reality, or it is merely a manifestation of low level forces and laws of physics that lead inexorably towards the reality we perceive. In other words, if you believe in free will, you have to believe in some sort of god, and I don't. Yet living my life as if I have no free will makes no sense either. I need to play the game if I am to play the game. It's kind of circular.

Getting back to my original question: Isn't good or bad just a label I (and other people) assign to a pattern of behavior that belongs to me? And not before I do things, but always afterwards. Just like the illusion of free will there is the illusion of moral quality that guides my path. While one cannot quantify free will, they can measure the effect my behavior has on their life and goals and determine a value. But then is my "goodness" something like an average? Because then the number of people I affect would matter more than the absolute value of the effect per person. Who cares if I help a colleague or pay attention to my wife? In the big sea of people, I am just a small fish that affects a few other small fish. We could all die tomorrow in the belly of a whale, all that goodness pointless.

So here I am, asking essentially a "who am I" question - painfully aware it has no final answer - in a world I think is determined by tiny laws of physics that create the illusion of self and with a quantity of consequence that is irrelevant even if it weren't so. I am torturing myself for no good reason, ain't I?

Yet the essence of the question still intrigues me. Is it necessary that I feel a good drive for my actions to be a good person, or is it an after-the-fact calculation of their effect that determines that? If I work really well and fast for a month and then do less the next, is it that I did good work in the first month or that I am a lazy bastard in the second? If I pay attention to someone or make a nice gesture, is it something to be lauded, or something to be criticized when I don't do it all the time? Is this a statistical problem or an issue of causality?

And I have to ask this question because if I feel no particular drive to do something and just "am myself", I don't think people should assign all kinds of stupid motivations to my actions. And if I need to make a sustained effort to go outside my routine just to gain moral value... well, it just feels like a lot of bother. And I have to ask it because the same reasoning can be applied to other people. Is my father making terrible efforts to take care of just about everybody in his life, making him some sort of saint, or is it just what he does and he can't help himself, in which case he's just a regular dude?

Personally I feel that I am just an amalgamation of experiences that led to the way I behave. I am neither good nor evil and my actions define me more than my intentions. While there is some sort of consistency that can be statistically assessed, it is highly dependent on the environment and any inference would go down the drain the moment that environment changes. But then, how can I be a good person? And does it even matter?

.NET Core Web API uses Newtonsoft's Json.NET for JSON serialization. In other situations, when you wanted to control Json.NET options, you would do something like
JsonConvert.DefaultSettings = () =>
{
    var settings = new JsonSerializerSettings();
    // do something with settings
    return settings;
};
but in this case it doesn't work. The way to do it is to use the fluent interface and hook into the ConfigureServices(IServiceCollection services) method, after the call to .AddMvc(), like this:
services
    .AddMvc()
    .AddJsonOptions(options =>
    {
        var settings = options.SerializerSettings;
        // do something with settings
    });

In my particular case I wanted to serialize enums as strings, not as integers. To do that, you need to use the StringEnumConverter class. For example, if you wanted to serialize the Gender property of a person as a string, you could define the entity like this:
public class Person
{
    public string Name { get; set; }

    [JsonConverter(typeof(StringEnumConverter))]
    public GenderEnum Gender { get; set; }
}

In order to do this globally, add the converter to the settings converter list:
services
    .AddMvc()
    .AddJsonOptions(options =>
    {
        options.SerializerSettings.Converters.Add(new StringEnumConverter
        {
            CamelCaseText = true
        });
    });

Note that in this case, I also instructed the converter to use camel case. The result of the serialization ends up as:
{"name":"James Carpenter","age":51,"gender":"male"}

I was doing this silly HackerRank algorithm challenge and I got the solution correct, but it would always time out on test 7. I wracked my brain on all sorts of different ideas, but to no avail. I was ready to throw in the towel and check out other people's solutions, only they were all in C++ and seemed pretty similar to my own. And then I made a code change and the test passed: I had replaced LINQ's OrderBy with Array.Sort.

Intrigued, I started investigating. The idea was to create a sorted integer array from a space delimited string of values. I had used Console.ReadLine().Split(' ').Select(s=>int.Parse(s)).OrderBy(v=>v); and it consumed above 7% of the total CPU of the test. Now I was using var arr=Console.ReadLine().Split(' ').Select(s=>int.Parse(s)).ToArray(); Array.Sort(arr); and the CPU usage for that piece of the code was 1.5%. So the LINQ version was almost five times slower. How do the two implementations differ?

Array.Sort should be simple: an in-place quicksort, the best general solution for this sort (heh heh heh) of problem. How about Enumerable.OrderBy? It returns an OrderedEnumerable, which internally uses a Buffer<T> to gather all the values into a container, then uses an EnumerableSorter to... quicksort the values. Hmm...

Let's get back to Array.Sort. It's not as straightforward as it seems. First of all it "tries" a SZSort. If it works, fine, return that. This is an external native code implementation of QuickSort on several native value types. (More on that here) Then it goes to a SorterObjectArray that chooses, based on framework target, to use either an IntrospectiveSort or a DepthLimitedQuickSort. Even the implementation of this DepthLimitedQuickSort is much, much more complex than the quicksort used by OrderBy. IntrospectiveSort seems to be the one preferred for the future and is also heavily optimized, but less complex and easier to understand, perhaps. It uses quicksort, heapsort and insertionsort together.

Now, before you go all "OrderBy sucks!", read more about it. This StackOverflow list of answers seems to indicate that, in the case of strings at least, the performance is similar. A lot of other interesting things there, as well. OrderBy uses a "stable" QuickSort, meaning that two items that compare as equal will appear in their original order. Array.Sort does not guarantee that.

Anyway, the performance difference in my particular case seems to come from the native code implementation of the sort for integers, rather than algorithmic improvements, although I don't have the time right now to grab the various implementations and test them properly. However, just from the way the code reads, I would bet the IntrospectiveSort will compare favorably to the simple Quicksort implementation used in OrderBy.
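For anyone who wants to test it themselves, here is a minimal benchmark sketch; the array size and the seed are arbitrary and the numbers will vary by machine and runtime:

using System;
using System.Diagnostics;
using System.Linq;

class SortBenchmark
{
    static void Main()
    {
        var rnd = new Random(42);
        var data = Enumerable.Range(0, 1000000).Select(_ => rnd.Next()).ToArray();

        var sw = Stopwatch.StartNew();
        var sortedByLinq = data.OrderBy(v => v).ToArray(); // stable sort, works on a copy
        sw.Stop();
        Console.WriteLine($"OrderBy:    {sw.ElapsedMilliseconds} ms");

        var copy = (int[])data.Clone();
        sw.Restart();
        Array.Sort(copy); // in-place, native fast path for primitive types
        sw.Stop();
        Console.WriteLine($"Array.Sort: {sw.ElapsedMilliseconds} ms");
    }
}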

Today I received two DMCA notices. One of them might have been true, but the second was for a file which started with
/*
Copyright (c) 2010, Yahoo! Inc. All rights reserved.
Code licensed under the BSD License:
http://developer.yahoo.com/yui/license.html
version: 2.8.1
*/
Nice, huh?

The funny part is that these are files on my Google Drive, which are not used anywhere anymore and are accessible only by people with a direct link to them. Well, I removed the sharing on them, just in case. The DMCA is even more horrid than I thought. The links in it are general links towards a search engine for notices (not the link to the actual notice) and some legalese documents, the email it comes from is noreply-6b094097@google.com, and any hope that I might fight this is quashed, quite intentionally, by the way the document is worded.

So remember: Google Drive is not yours, it's Google's. I wonder if I would have gotten the DMCA notice even if the file had not been shared. There is a high chance I would have, since no one should have been using the link directly.

Bleah, lawyers!

I have enabled Disqus comments on this blog and it is supposed to work like this: every old comment from Blogger has to be imported into Disqus and every new comment from Disqus also needs to be saved in the Blogger system. Importing works just fine, but "syncing" does not. Every time someone posts a comment I receive this email:
Hi siderite,
 
You are receiving this email because you've chosen to sync your
comments on Disqus with your Blogger blog. Unfortunately, we were not
able to access this blog.
 
This may happen if you've revoked access to Disqus. To re-enable,
please visit:
https://siderite.disqus.com/admin/discussions/import/platform/blogger/
 
Thanks,
The Disqus Team
Of course, I have not revoked any access, but I "re-enable" just the same, only to be presented with a link to resync that doesn't work. I mean, it is so crappy that it returns the JavaScript error "e._ajax is undefined" on a line where e._ajax is used instead of e.ajax, and even if that had worked, it uses a config object that is not defined.

It doesn't really matter, because the ajax call just accesses (well, it should access) https://siderite.disqus.com/admin/discussions/import/platform/blogger/resync/. And guess what happens when I go there: I receive an email that the Disqus access in Blogger has been revoked.

No reply from the Disqus team for months, for me or anybody else having this problem. They have a silly page that explains that, of course, they are not at fault: Blogger did some refactoring and broke their system. Yeah, I believe that. They probably renamed the ajax function in jQuery as well. Damn Google!

I ran into an interesting case today in which we needed to manipulate data from tens of thousands of people daily. Assuming we use table rows for the information, we get a table in which rows are constantly added, updated and deleted. The issue is with the space allocated in table pages.

SQL Server works like this: if it needs space, it allocates a "page", which can contain multiple records. When you delete records, the space is not reclaimed; it remains as is (this is called ghosting). The exception is when all records in a page are deleted, in which case the page is reused as an empty page. When you update a record with more data than it held before (as with a variable length column), the page is split, with the rest of the records on the page moved to a new page.

In a heap table (no clustered index) the space inside pages is reused for new records or for updated records that don't fit in their allocated space. However, if you use a clustered index, like a primary key, the space is not reused, since there needs to be a correlation between the value of the column and its position in the page. And here lies the problem. You may end up with a lot of pages with very few records in them. A typical page is 8 kilobytes, so a table whose records hold just a few integers can fit hundreds of records on a single page.

Fragmentation can be internal (within a page, as described above) or external (between pages, when recycled pages are used for data that is out of order). To read a large swathe of records the disk might be worked hard jumping from page to page to gather what is logically a continuous blob of data. It is disk input/output that kills a database.

OK, back to our case. A possible solution was to store all the data for a user in a "blob", a VARBINARY column. For reads or changes, only the disk space occupied by the blob would be touched, with C# code handling everything else. It's what is called trading CPU for IO, which is generally good. However, this NoSQL-like idea itself smelled bad to me. We are supposed to trust our databases, not work against them. The solution I chose is monitoring index fragmentation and occasionally issuing clustered index rebuilds or reorganizations. I am willing to bet that reading/writing the data equivalent of several table pages is going to be more expensive than selecting the changes I want to make. Also, rebuilding the index will end up storing all the data per user in the same space anyway.
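For reference, a minimal sketch of the monitoring and maintenance part; the table and index names (dbo.UserData, IX_UserData) are made up for illustration:

-- check logical fragmentation of the indexes on the table
SELECT index_id, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.UserData'), NULL, NULL, 'LIMITED');

-- reorganize for light fragmentation, rebuild for heavy fragmentation
ALTER INDEX IX_UserData ON dbo.UserData REORGANIZE;
ALTER INDEX IX_UserData ON dbo.UserData REBUILD;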

However, this case made me think. Here is a situation in which the solution might have been (and, in a similar case implemented by someone else, actually was) to micromanage the way the database works. It made me question using a clustered index/primary key on a table.


I had this problem with Perforce where I accidentally Reconciled my offline work with all the files in the /bin and /obj folders, resulting in a huge changelist of over 6000 files. OK, a simple one-button mistake; surely there must be some one-button way of undoing what I just did. It appears there is not.

In order to fix this I have to follow these steps:
  1. Change the settings of Perforce to show files even in changelists larger than 1000 items (the default value)
  2. Select by hand, in the changelist window, the files from the obj and bin folders and use Revert on them
  3. Revert the few other files that were unwanted in the changelist, like .suo and .user files - note that Revert on added files doesn't delete them, it just unadds them
  4. Create a file with paths to ignore and then use p4 set P4IGNORE=<filename> for future reconcile work

What didn't work was adding a filename or path filter when visualizing the changelist, since that is a changelist filter, not a file filter. It will show you changelists containing files that match the pattern, but it will not filter the files inside the changelists themselves.

For reference, the p4ignore file I used looked like this:
p4ignore
bin/Debug
obj/Debug
*.suo
*.user
Note that I also added the p4ignore file itself, although the file was not in any Perforce repository (yet).

"But, Siderite, you should use Git (or whatever source control is the newest fad at the moment)!" Wish that could, my friend, wish that I could.

Having been so pleasantly surprised by First Light, the first book in The Red series, I quickly read the next two books: The Trials and Going Dark. However, possibly due to my high expectations, I have been disappointed by the continuation. Linda Nagata seemed to have reached that sweet spot between current tech trends and emergent future that makes stories feel both hard sci-fi and realistic. The integration between man and machine, the politics run by shadowy megarich "dragons" from the background, nuclear bombs detonated in major US cities, artificial intelligence and so on. The potential was immense!

Yet the author chose to continue the story on the same flat note, like an ode to Stockholm Syndrome, where the hero is repeatedly coerced into running missions that at first seem bullshit, but that in the end he rationalizes as necessary and even dutiful. The constant intrusion of external forces into his emotional balance by way of direct brain stimulation also isolates his feelings and motivations completely from those of the reader. A strange choice, considering the vast possibilities opened by the first book. Frankly, it felt like Nagata enjoyed writing the first book and then was forced by publishers to make it "a trilogy", since that is the norm for fantasy and science fiction, but her heart wasn't really in it.

I don't want to spoil the ending, such as it is, but I will say it was disappointing as well, with no real closure for the reader of any of the important questions raised in First Light. Too bad, since I felt the story was beginning to touch on important subjects that needed to be discussed at a deeper level than just "boots on the ground".

I was listening to this Software Engineering Daily podcast about Facebook Relationship Algorithms and I had this weird idea. The more I was thinking about it, the more realistic it felt (as well as more than a bit creepy). Let me lay it out for you. The podcast is interesting in its own right, so go listen to it, it's instructive.

So they describe these metrics of your Facebook connections that they reached while trying to find an algorithm to detect your romantic relationship. They took a large sample of users who had declared their significant other and tried to find an automated way of predicting that from the other information Facebook had. The first idea was to use connectivity, an often used idea in sociology: the more friends you have in common with someone, the closer you are. But it didn't quite work; one clear counterexample would be coworkers connected on social media. Instead, they realized that such functional connections often cluster, so you would have the cluster of coworkers, your family, your club friends, the people who share your hobby, etc. The romantic partner would be connected with many of these friends, but across clusters; in other words, you would have many common friends that are not really connected with each other. Using this metric, which they called dispersion, they could guess who your partner is, from a list of hundreds of friends, with more than 50% accuracy on the first try.

And here is my idea: why not reverse engineer it? Imagine there is someone you want to hook up with, but you have no idea how to proceed. Maybe asking them on a date is not an option, or maybe, like any engineer out there, you want to maximize the chances that your experiment will work. Why not find the smallest subset of that person's friends that has the largest value of dispersion? Here is what this "hook me up" algorithm would do (a naive sketch of the dispersion scoring follows the list):
  1. Collect the list of friends of your target
  2. Find a sample that are well connected to the target, but less connected with each other
  3. Approach each of them and befriend them
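Here is a naive sketch of the dispersion scoring from step 2; the toy graph and the exact scoring formula are my own simplifications for illustration, not Facebook's actual algorithm:

using System;
using System.Collections.Generic;
using System.Linq;

class DispersionSketch
{
    static void Main()
    {
        // toy friendship graph: ann+bob form one cluster, cat+dan another,
        // while eve bridges both clusters without her contacts knowing each other
        var friends = new Dictionary<string, HashSet<string>>
        {
            ["target"] = new HashSet<string> { "ann", "bob", "cat", "dan", "eve" },
            ["ann"] = new HashSet<string> { "target", "bob", "eve" },
            ["bob"] = new HashSet<string> { "target", "ann" },
            ["cat"] = new HashSet<string> { "target", "dan", "eve" },
            ["dan"] = new HashSet<string> { "target", "cat", "eve" },
            ["eve"] = new HashSet<string> { "target", "ann", "cat", "dan" }
        };

        foreach (var candidate in friends["target"])
        {
            // mutual friends of the target and this candidate
            var mutual = friends["target"].Intersect(friends[candidate]).ToList();

            // dispersion: count pairs of mutual friends NOT connected to each other
            int dispersion = 0;
            for (int i = 0; i < mutual.Count; i++)
                for (int j = i + 1; j < mutual.Count; j++)
                    if (!friends[mutual[i]].Contains(mutual[j]))
                        dispersion++;

            Console.WriteLine($"{candidate}: dispersion {dispersion}");
        }
        // eve scores highest: her shared friends span otherwise unlinked clusters
    }
}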

The result would be that you become the "natural" choice for a relationship, by going backwards and reversing the direction of causality. Automatic stalking, courtesy of your friendly neighborhood software developer: Siderite! We live in an age in which information about us can be used or abused in innumerable ways, and we have become addicted to and stuck with this way of relating. It's not a bad thing, but it has its drawbacks. It is good to know of them.

First Light is Linda Nagata's first book in The Red series, which follows a military man landing right into the middle of an emergence event. Stuck between his duty as a soldier, his love for his girlfriend and father, the maniacal ambitions of an all powerful defense contractor queen and a mysterious God-like entity which seems to like him, our hero does what he can to survive and do good by his own principles.

At first I thought it was going to be one of those cheap soldiering books. It was short, written by a woman, and frankly I expected a standard pulp fiction "read it on a train" kind of thing. Instead I was blown away by the subtlety with which the characters are explored and the way the story is constructed. I loved the book and I plan to read the whole series. I started reading The Dread Hammer, another Nagata book, this time fantasy, but it doesn't even come close to First Light. I may even end up disliking it.

Anyway, I can't say much about the plot without spoiling it, but I can certainly recommend this book. As I said, it is short enough to read and see if it evokes the same feelings. Instead of hurting it, the female perspective of the author enhances the experience and makes it unique. The technical aspects are spot on and the writing style is fluid and easy to read. Top marks!


Update May 2020: I used this on a web site and the body was white. It may be that the bug in Chrome was solved in the meantime.

Update October 2019: a CSS media feature (prefers-color-scheme) can be used in conjunction with this. A recent development, it's a media query that allows a browser to activate CSS code based on the theme set in the operating system. You set your preference in Windows or MacOS or wherever and then sites that use prefers-color-scheme will take advantage of that. Something like this:

@media (prefers-color-scheme: dark) {
  html,img, video, object, [style*=url] {
    -webkit-filter:invert(100%) !important;
    filter:invert(100%) !important;
  }

  /* this was solving a bug in Chrome that seems to have been fixed
  body {
    background:black;
  }
  */
}

  I am a very light-sensitive person. Shine a light in my eyes and you limit my productivity immensely. Not to mention it makes me irritable. Therefore I often have the desire to turn cheerful black-on-white sites into a dark theme, where the colors are reversed. I am sure other people have the same problem, so I thought of building a browser extension with a button for switching between the two.

  The first problem is that I would need to interrogate all the elements in a page, including the ones that will be created later. The second is that even so, I would have problems determining the dominant color of images. But there is something I can use that makes all of this unnecessary: the invert CSS filter! Since I already use a browser extension that injects my own styles into any site - it's called Stylish and I highly recommend it - all I have to do is apply a filter to the entire site, right?

  Wrong! The problem is that when you invert an entire site, all images on the site get inverted, too. That also includes videos and Flash objects. The worst offenders here are the elements that sport a background image that is declared via CSS, since you can't create a CSS selector for them. I am going to present my partial solution and maybe you can help me find a more elegant or more complete one. Here is a general dark theme stylesheet, without the elements that have a background image declared via CSS (it does include those with a background image declared inline, though):

 html,img, video, object, [style*=url] {
    -webkit-filter:invert(100%) !important;
    filter:invert(100%) !important;
  }

 /* this was solving a bug in Chrome that seems to have been fixed
 body {
   background:black;
 }
 */

  What it does is invert the entire page (html), then re-invert video, img, and object elements, as well as those with "url" in the style attribute. In Chrome, at least, there seemed to be a bug where the backgrounds of the direct child elements were not inverted, which meant body, as the first child, needed its background set to black specifically (this seems to have been solved by May 2020). The hack to invert elements with "url" in style is pretty ugly, too.

  What I would consider a real solution is this:

  • inject Javascript to enumerate all elements present and future using document.createTreeWalker and Mutation Observers, check if they have a background image and if so, add a class to them
  • inject the CSS above with an additional rule for the class for the elements with background image

  However, this doesn't completely solve the problem. One of the major issues is that the inverted colors sometimes look dumb. For example, a red background turns cyan, and white text on a light gray background turns into black text on a dark gray background, which makes it hard to read. I've tried various other filters, like hue-rotate or contrast, but they don't really help. Detecting individual color patterns doesn't really work either, as the filter attribute affects an element and all of its children. The CSS above only works because the images are inverted again after the entire page has been inverted.

  The good news is that most of the time you can use the CSS above as a template, then add rules (manually) to fix small issues with background colors. Even if I don't package this in an extension, you have the power to create your own themes for various sites. Never again will you be subject to the tyranny of the happy bright shiny people!

I was glad to attend the 565th SQLSaturday in Bucharest yesterday and, while all presentations were cool, I wanted to share with you some specific points that I found very revealing. Without further ado, here is the list:
  • SQL execution plans are read from right to left - such a simple thing, but I remember trying to read them from left to right and not understanding anything. In SQL Server Management Studio 2016 you also get a "live" version, which shows you the execution plan while it's executing. Really useful for seeing where the blocking operations are.
  • Manually control your statistics updates - execution plans are calculated based on statistics, but the default condition for updating the statistics is that the changes amount to 20% of the table's rows plus 500. This default is completely arbitrary and may cause a lot of pain. Not only does updating the statistics block your table (which means more chances that the table will be locked when it is most used), but sometimes the statistics are not even useful. One example is reports which receive a startdate/enddate range or a count or something like that, which makes the number of rows affected vary immensely with different parameters. Use OPTION (RECOMPILE) for that (see the sketch after this list).
  • Look for a difference between estimated and actual rows in a query plan, which leads to tempdb spills, which lead to unwanted IO operations - before a query runs, an execution plan is created or reused, based on statistics, as I was saying above. Once a plan has been chosen, though, it doesn't change during execution. Basically this means the structure of the plan remains unchanged between the estimated and actual plan. Memory is also requested based on the plan and never adjusted. So if the plan asks for 10KB of memory and you actually need 1000KB, the remaining 990KB will be stored in and served from tempdb, even if there is enough memory available, since the memory requirements don't change from estimated to actual. The reverse is not much better, since a plan may ask for a lot of memory when it only needs a little, thus leaving everything else on that machine with fewer available resources.
  • SQL default settings suck - there was an entire presentation about that, it is useful to think a little about it. So many settings are legacy things that make no sense, like the initial database size, the autogrow size, index fill factors, maxdop (degree of parallelism), parallelism threshold, used memory (ironically, using all of it may be hurtful as it takes it away from other processes which leads to using the swap file), etc.
  • Look for hard page faults - this counter is much more useful than soft page faults, which are faults the system can fix cheaply. A hard page fault is indicative of unnecessary IO operations, which are orders of magnitude slower than memory access.
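As an illustration of the OPTION (RECOMPILE) hint mentioned in the list above (the table and parameters are hypothetical):

-- a report query whose affected row count varies wildly with the parameter range;
-- RECOMPILE trades plan reuse for a plan tailored to the actual parameter values
SELECT SUM(TotalDue)
FROM dbo.Orders
WHERE OrderDate BETWEEN @startdate AND @enddate
OPTION (RECOMPILE);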

There are a lot more things that I want to explore now that I have participated in the event. You may find the files for the presentations in the same place as the full list of talks at SQLSaturday.


Tad Williams probably fancies himself another Tolkien: he writes long descriptions of lands and peoples and languages, shows us poetry and songs, tells us about the rich history of the land. And all of this while we follow yet another common but good boy with a mysterious ancestry, while he and his merry band of helpers fight THE DARK ONE. It's the same old story, with the hapless youth guided by wise but not forthcoming people who tend to die, leave or otherwise shut up before the hero gets the whole story and can do anything about it. Regardless, he is young and lucky, so it's OK.

If The Dragonbone Chair had been fun, or interesting, or had at least shown us a character we could care about, this book would have been readable. As such, it's a trope-filled, boring and sleep-inducing thing. You have to wait until half the book has passed to see the things you predicted from the first few chapters. I couldn't even finish it. More than two thirds into the book, there is no significant part of the story that involves either dragons or chairs.

Bottom line: it sucks!