Intro

  This is part of a series that I plan to build on as time goes on: technical interview questions, dissected and laid bare for both interviewers and interviewees. You can also check out the previous one: Interview question: all items in table A but not in B.

  This question is a bit more complex and, at the same time, more abstract. The post is written more for interviewers this time; candidates who do not know the concepts it covers will need to read the links in it. It is also not a question with a single correct answer. It comes after asking about Dependency Injection as a whole and the candidate answering correctly.

  I expect senior developers to be able to go through this successfully. It is not a test for junior developers, although, depending on their previous experience, juniors might get through it and seniors might be forced to reason their way through it.

The test

Bonus introduction question: why use DI at all? Expected answers would be separation of concerns and testability. 

  The question has two steps.

Step 1: given the following code in a legacy application, improve it to use Dependency Injection:

public class SomeClass {
  public List<Item> GetItems(int days, string filter) {
    var service = new ItemService();
    return service.GetItems()
      .Where(i => i.Time >= DateTime.Now.AddDays(-days))
      .ToList();
  }
}

Bonus questions:

  • has the candidate worked with LINQ before?
  • what does the code do?

Now, this question is as much about attention as it is about programming knowledge. There are three irregularities that should attract that attention:

  • the most obvious: the service is being instantiated by calling the constructor
    • the interviewer expects at the very least for the candidate to notice it and suggest injecting the service through the constructor of the SomeClass class instead of creating it with new
    • there is also the possibility of passing the service as a parameter of the method, but the interviewer can get around it by requiring that the signature of the method remain the same. Either way, one can discuss the idea of moving all dependencies to the constructor and/or the calling methods and get insight into the way the candidate is thinking.
  • the unexplained string filter in the signature of the method
    • the interviewer can either tell the candidate that it will become relevant later, because it will, or that this is a method that implements an interface, to which a more snarky candidate might reply that SomeClass implements nothing (bonus for attention)
  • the use of DateTime.Now
    • it is a static property that gives a different output every time so it should be taken into account for Dependency Injection or at least for unit testing

By now you have filtered out the majority of failing candidates and you are left with someone who used or at least understands DI, can read and understand code, has used or at least understood basic LINQ and you have also gauged their level of attention to detail.

If the candidate only talked about the service and decided to create an interface for ItemService and then add it as a parameter to the constructor of SomeClass, ask them to write a unit test for the method. Explain to them that testability is one of the goals of DI, if you didn't cover this already.

  • bonus: see if they do unit testing or at least understand the concept
  • if they do attempt to write the unit test, ask them what would happen if the test were run on different days

The expected result of this part is that the candidate understands the need to abstract DateTime.Now. It is interesting to note how they intend to abstract it, since it is a static property and they do not have access to its code.

Whether the candidate figured it out by themselves or it was explained to them, the expected answer is that DateTime.Now is abstracted by creating an IDateTimeService interface that is implemented as a wrapper over DateTime.
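A minimal sketch of what such a wrapper could look like, assuming the only member the method needs is Now. The names SystemDateTimeService and FixedDateTimeService are illustrative and not part of the original code:

```csharp
using System;

// Abstraction over the system clock; exposes only what GetItems needs.
public interface IDateTimeService
{
    DateTime Now { get; }
}

// Production implementation (hypothetical name): a thin wrapper over the static DateTime.Now.
public sealed class SystemDateTimeService : IDateTimeService
{
    public DateTime Now => DateTime.Now;
}

// Test double (hypothetical name): always returns the instant it was constructed with.
public sealed class FixedDateTimeService : IDateTimeService
{
    private readonly DateTime _now;

    public FixedDateTimeService(DateTime now)
    {
        _now = now;
    }

    public DateTime Now => _now;
}
```

The production class would be registered with the DI container, while tests would pass the fixed one so that results do not depend on the day the test runs.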

At this point the code should look like this:

public class SomeClass {
  private readonly IItemService _itemService;
  private readonly IDateTimeService _dateTimeService;

  public SomeClass(IItemService itemService, IDateTimeService dateTimeService) {
    _itemService = itemService;
    _dateTimeService = dateTimeService;
  }

  public List<Item> GetItems(int days, string filter) {
    return _itemService.GetItems()
      .Where(i => i.Time >= _dateTimeService.Now.AddDays(-days))
      .ToList();
  }
}

Also, the candidate should be asked to write a unit test, for bonus points, just to see that they know how. Note whether the candidate understands isolation for unit testing or does something that would work but is silly, like generating the test data based on the current date or duplicating the code logic in the test, instead of working with static data.
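As an illustration of a test that works with static data, here is a sketch that uses hand-written fakes instead of a test or mocking framework. The Item class, the shape of IItemService, the fake classes and the test method name are all assumptions made so the example is self-contained:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shapes of the types used by the method under test.
public class Item { public DateTime Time { get; set; } }
public interface IItemService { IEnumerable<Item> GetItems(); }
public interface IDateTimeService { DateTime Now { get; } }

public class SomeClass
{
    private readonly IItemService _itemService;
    private readonly IDateTimeService _dateTimeService;

    public SomeClass(IItemService itemService, IDateTimeService dateTimeService)
    {
        _itemService = itemService;
        _dateTimeService = dateTimeService;
    }

    public List<Item> GetItems(int days, string filter)
    {
        return _itemService.GetItems()
            .Where(i => i.Time >= _dateTimeService.Now.AddDays(-days))
            .ToList();
    }
}

// Hypothetical fake: returns exactly the items it was given.
public class FakeItemService : IItemService
{
    private readonly List<Item> _items;
    public FakeItemService(params Item[] items) { _items = items.ToList(); }
    public IEnumerable<Item> GetItems() => _items;
}

// Hypothetical fake: the test decides what "now" is.
public class FakeDateTimeService : IDateTimeService
{
    public DateTime Now { get; set; }
}

public static class SomeClassTests
{
    public static void GetItems_ReturnsOnlyItemsFromTheLastNDays()
    {
        // Arrange: a fixed clock and fixed dates, nothing derived from the real current date.
        var clock = new FakeDateTimeService { Now = new DateTime(2020, 1, 10, 12, 0, 0) };
        var recent = new Item { Time = new DateTime(2020, 1, 9, 12, 0, 0) };
        var boundary = new Item { Time = new DateTime(2020, 1, 8, 12, 0, 0) };
        var old = new Item { Time = new DateTime(2020, 1, 1, 12, 0, 0) };
        var sut = new SomeClass(new FakeItemService(recent, boundary, old), clock);

        // Act: ask for the items of the last two days.
        var result = sut.GetItems(2, "");

        // Assert: the item exactly two days old is included, the older one is not.
        if (result.Count != 2 || !result.Contains(recent) || !result.Contains(boundary))
            throw new Exception("Expected only the recent and boundary items.");
    }
}
```

The expected items are written out by hand rather than computed with the same Where expression, which is the difference between testing the logic and merely repeating it.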

Step 2: tell the candidate that the legacy code they need to fix looks a bit different:

public class SomeClass {
  public List<Item> GetItems(int days, string filter) {
    var service = new ItemService(filter);
    return service.GetItems()
      .Where(i => i.Time >= DateTime.Now.AddDays(-days))
      .ToList();
  }
}

The ItemService now receives the filter as a constructor parameter. Ask them what to do in this case.

The expected answer is a factory injected instead of the service, which will then be used to instantiate an IItemService with a parameter. Bonus discussion about software patterns can be inserted here.
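One way the factory answer could look is sketched below. IItemServiceFactory, ItemServiceFactory and SystemDateTimeService are hypothetical names, and the ItemService here is only a stand-in for the legacy class so that the example compiles on its own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Item { public DateTime Time { get; set; } }
public interface IItemService { IEnumerable<Item> GetItems(); }
public interface IDateTimeService { DateTime Now { get; } }

public class SystemDateTimeService : IDateTimeService
{
    public DateTime Now => DateTime.Now;
}

// Stand-in for the legacy service: it needs the filter at construction time.
public class ItemService : IItemService
{
    public string Filter { get; }
    public ItemService(string filter) { Filter = filter; }
    public IEnumerable<Item> GetItems() => Enumerable.Empty<Item>();
}

// Hypothetical factory abstraction: the runtime parameter goes through Create.
public interface IItemServiceFactory
{
    IItemService Create(string filter);
}

public class ItemServiceFactory : IItemServiceFactory
{
    public IItemService Create(string filter) => new ItemService(filter);
}

public class SomeClass
{
    private readonly IItemServiceFactory _itemServiceFactory;
    private readonly IDateTimeService _dateTimeService;

    public SomeClass(IItemServiceFactory itemServiceFactory, IDateTimeService dateTimeService)
    {
        _itemServiceFactory = itemServiceFactory;
        _dateTimeService = dateTimeService;
    }

    public List<Item> GetItems(int days, string filter)
    {
        // The factory is injected; the service is created per call with the runtime filter.
        var service = _itemServiceFactory.Create(filter);
        return service.GetItems()
            .Where(i => i.Time >= _dateTimeService.Now.AddDays(-days))
            .ToList();
    }
}
```

The method signature stays the same, and a test can inject a factory that returns a fake service and records the filter it was asked for.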

There are other valid answers here, like using the DI container itself as a factory for the service, which might provoke interesting discussions in itself, like weighing constructor injection versus service provider in dependency injection and whether hybrid solutions might be better.

Bonus question: what if you cannot control the code of ItemService in step 1 and it neither implements an interface nor inherits from a base class?

  • warning, this might give a hint for the second part of the interview, so use it at the end 
  • correct answer 1: use the class as the type of the parameter and let the dependency container decide how to instantiate it
  • correct answer 2: use a wrapper over the class that implements the interface and proxies to the instance methods.
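The second answer could be sketched as follows. The ItemService below is a stand-in for the class that cannot be changed, and ItemServiceWrapper is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;

public class Item { public DateTime Time { get; set; } }

// Stand-in for the class we do not control: sealed, no interface, no base class.
public sealed class ItemService
{
    public List<Item> GetItems()
    {
        return new List<Item> { new Item { Time = new DateTime(2020, 1, 1) } };
    }
}

// The interface we own and want the rest of the code to depend on.
public interface IItemService
{
    IEnumerable<Item> GetItems();
}

// Wrapper (hypothetical name): implements our interface and proxies to the real instance.
public sealed class ItemServiceWrapper : IItemService
{
    private readonly ItemService _inner;

    public ItemServiceWrapper(ItemService inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    public IEnumerable<Item> GetItems() => _inner.GetItems();
}
```

SomeClass would then depend only on IItemService, and unit tests could replace the wrapper with a fake without touching the original class.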

Conclusion

This test told me a lot more about the candidates than just their dependency injection knowledge. We got to talking and I became aware of how their minds worked. I was both pleasantly surprised when they came up with alternative solutions that kind of worked and a bit irked that they went that far and didn't see the superior option. Most of the time this made me think about the differences between what I would answer and what they did, and this resulted in interesting discussions that enriched not only their experience, but also mine.

Dependency injection, separation of concerns and unit testing are important concepts for any modern developer. I hope this helps devs evolve and interviewers find the best candidates... at least until all of them get to read my blog.

As with all the programmer questions, I will update the post with the answer after people comment on this. Today's question is:

Here is a question for programmers. I will wait for your comments before answering.

A blog reader asked me to help him get rid of the ugly effect of a large background image getting loaded. I thought of several solutions, each more complicated than the last, but in the end settled on one that seems to be working well and doesn't require complicated libraries or difficult implementation: using the img onload event.

Let's assume that the background image is on the body element of the page. The solution involves setting a style on the body to hide it (style="display:none"), then adding as a child of the body an image that is also hidden and that, when it completes loading, shows the body element. Here is the initial code:
<style>
body {
background: url(bg.jpg) no-repeat center center fixed;
}
</style>
<body>

And after:

<style>
body {
background: url(bg.jpg) no-repeat center center fixed;
}
</style>
<body style="display:none">
<img src="bg.jpg" onload="document.body.style.display=''" style="display:none;" />

This loads the image in a hidden img element and shows the body element when the image has finished loading.

The solution might have some problems with Internet Explorer 9, as it seems the load event is not fired for images retrieved from the cache. In that case, a slightly more complex JavaScript solution is needed, as detailed in this blog post: How to Fix the IE9 Image Onload Bug. Also, in Internet Explorer 5-7 the load event fires for animated GIFs at every loop. I am sure you know it's a bad idea to have an animated GIF as a page background, though :)

Warning: While this hides the effect of slow loading background images, it also hides the page until the image is loaded. This makes the page appear blank until then. More complex solutions would show some simple html content while the page is loading rather than hiding the entire page, but this post is about the simplest solution for the question asked.

A more comprehensive analysis of image preloading, complete with some very nice JavaScript code that covers a lot of cases, can be found at Preloading images using javascript, the right way and without frameworks.

I am starting a new blog series called Blog Question, due to the successful incorporation of a blog chat that works, is free, and does exactly what it should do: Chatango. All except letting me know in real time when a question has been posted on the chat :( . Because of that, many times I don't realize that someone is asking me things and so I fail to answer. As a solution, I will try to answer questions in blog posts, after I do my research. The new label associated with these posts is 'question'.

First off, some assumptions. I will assume that the person who said "I'm working on this project of address validation. Using crf models is my concern." was talking about Conditional Random Fields and that he meant postal addresses. If you are reading this, please let me know if that is correct. Also, since I am a .NET developer, I will use concepts related to .NET.

I knew nothing about CRFs before writing this post, so bear with me. The Wikipedia article about them is hard to understand for anyone without mathematical (specifically probabilities and statistics) training. However, the first paragraph is pretty clear: "Conditional random fields (CRFs) are a class of statistical modelling method often applied in pattern recognition and machine learning, where they are used for structured prediction. Whereas an ordinary classifier predicts a label for a single sample without regard to "neighboring" samples, a CRF can take context into account." In other words, it is a process that classifies data by taking neighboring samples into account.

A blog post that clarified the concept much better was Introduction to Conditional Random Fields. It describes how one uses so-called feature functions to extract a score from a data sample, then aggregates the scores using weights. It also explains how those weights can be automatically computed (machine learning).

In the context of postal address parsing, one would create an interface for feature functions, implement a few of them based on domain specific knowledge, like "if it's an English or American address, the word before St. is a street name", then compute the weighting of the features by training the system using a manually tagged series of addresses. I guess the feature functions can ignore the neighboring words and also do stuff like "If this regular expression matches the address, then this fragment is a street name".
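To make the idea more concrete, here is a rough sketch of what such an interface and a couple of feature functions might look like. Everything in it is an assumption for illustration: the interface shape, the label names, the class names and especially the weights, which a real system would learn from the tagged addresses rather than hard-code. It only shows how weighted feature scores are summed for one candidate labeling; training and turning scores into probabilities are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical interface: scores giving `label` to the token at `position`,
// with access to the whole address and to the previous token's label.
public interface IFeatureFunction
{
    double Score(IReadOnlyList<string> tokens, int position, string previousLabel, string label);
}

// "If it's an English or American address, the word before St. is a street name."
public sealed class WordBeforeStIsStreetName : IFeatureFunction
{
    public double Score(IReadOnlyList<string> tokens, int position, string previousLabel, string label)
    {
        bool nextIsSt = position + 1 < tokens.Count && tokens[position + 1] == "St.";
        return nextIsSt && label == "STREET" ? 1.0 : 0.0;
    }
}

// Illustrative extra rule: a number at the very start is a house number.
public sealed class LeadingNumberIsHouseNumber : IFeatureFunction
{
    public double Score(IReadOnlyList<string> tokens, int position, string previousLabel, string label)
    {
        bool isNumber = tokens[position].Length > 0 && tokens[position].All(char.IsDigit);
        return position == 0 && isNumber && label == "NUMBER" ? 1.0 : 0.0;
    }
}

public sealed class WeightedScorer
{
    private readonly (IFeatureFunction Feature, double Weight)[] _features;

    public WeightedScorer(params (IFeatureFunction Feature, double Weight)[] features)
    {
        _features = features;
    }

    // Unnormalized score of one candidate labeling: the weighted sum of all
    // feature functions evaluated at every position.
    public double Score(IReadOnlyList<string> tokens, IReadOnlyList<string> labels)
    {
        double total = 0;
        for (int i = 0; i < tokens.Count; i++)
        {
            string previous = i == 0 ? "START" : labels[i - 1];
            foreach (var (feature, weight) in _features)
            {
                total += weight * feature.Score(tokens, i, previous, labels[i]);
            }
        }
        return total;
    }
}
```

With the tokens "12", "Baker", "St." and made-up weights, the labeling NUMBER, STREET, SUFFIX would score higher than STREET, NUMBER, SUFFIX, because both rules fire only for the first one.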

I find this concept really interesting (thanks for pointing it out to me) since it successfully combines feature extraction as defined by an expert and machine learning. Actually, the expert part is not that relevant, since the automated weighting will just give a weight close to 0 to all stupid or buggy feature functions.

Of course, one doesn't need to do it from scratch; other people have done it in the past. One blog post that discusses this and also applies several probabilistic methods specifically to postal addresses can be found here: Probabilistic Postal Address Elementalization. From Hidden Markov Models, Maximum-Entropy Markov Models, Transformation-Based Learning and Conditional Random Fields, she found that the Maximum-Entropy Markov model and the Conditional Random Field taggers consistently had the highest overall accuracy of the group. Both consistently had accuracies over 98%, even on partial addresses. The code for this is public on GitHub, but it's written in Java.

When looking around for this post, I found a lot of references to a specific piece of software called the Stanford Named Entity Recognizer, also written in Java, but which has a .NET port. I haven't used the software, but it seems to be a very thorough implementation of a Named Entity Recognizer. Named Entity Recognition (NER) labels sequences of words in a text which are the names of things, such as person and company names, or gene and protein names. It comes with well-engineered feature extractors for Named Entity Recognition, and many options for defining feature extractors. Included with the download are good named entity recognizers for English, particularly for the 3 classes (PERSON, ORGANIZATION, LOCATION). Perhaps this would also come in handy.

This is as far as I am willing to go without discussing existing code or actually writing some. For more details, contact me and we can work on it.

More random stuff:
The primary advantage of CRFs over hidden Markov models is their conditional nature, resulting in the relaxation of the independence assumptions required by HMMs in order to ensure tractable inference. Additionally, CRFs avoid the label bias problem, a weakness exhibited by maximum entropy Markov models (MEMMs) and other conditional Markov models based on directed graphical models. CRFs outperform both MEMMs and HMMs on a number of real-world sequence labeling tasks. - from Conditional Random Fields: An Introduction

Tutorial on Conditional Random Fields for Sequence Prediction

CRFsuite - Documentation

Extracting named entities in C# using the Stanford NLP Parser

Tutorial: Conditional Random Field (CRF)