# Interview question: generate all valid combinations of n pairs of parentheses

## Definition

So, the task at hand is the subject of a common interview question: Implement an algorithm to get all valid (opened and closed) combinations of n pairs of parentheses. This means that for n=1 there is only one solution: "()". "((" or "))" are not valid, for 2 you will have "(())" and "()()" and so on. The question is trying to test how the interviewee handles recursion and what is commonly called backtracking. But as usual, there's more than one way to skin a cat, although for the life of me I can't see why you would want to do that.

The solutions here will be in C# and the expected result is an enumeration of strings containing open and closed parentheses. The code can be easily translated into other languages, including Javascript (ECMAScript 2015 introduced iterators and generator functions), but that's left to the reader. Let's begin.

## Analysis

Before we solve any problem we need to analyse it and see what are the constraints and the expected results. In this case there are several observations that can be made:

• the resulting strings will be of length n*2 (n pairs)
• they will contain n '(' characters and n ')' characters
• they cannot start with a ')' or end in a '('
• in order to generate such a string, we can start with a smaller string to which we add '(' or ')'
• we cannot add a ')' if there isn't at least one corresponding unclosed '('
• if we add a '(' we need to have enough characters left to close the parenthesis, so the number of unclosed parentheses cannot exceed the characters left to fill
• we could count the open and closed parentheses, but we only care about the number of unclosed ones, so instead of "closed" and "open" values, we can only use "open" to represent unclosed parentheses

Let's go for some variables and calculations:

• n = number of pairs
• open = number of unclosed parentheses in a string
• open cannot be negative
• one cannot add ')' if open = 0
• one cannot add '(' if open >= n*2 - substring.length

## Recursive solution

A simple implementation of these requirements can done with recursion:

``````public IEnumerable<string> GenerateRecursive(int n, string root = "", int open = 0)
{
// substring is long enough, return it and exit
if (root.Length == n * 2)
{
yield return root;
yield break;
}
// if we can add '(' to existing substring, continue the process with the result
if (open < n * 2 - root.Length)
{
// if only C# could just 'yield IteratorFunction()' this would look sleeker
foreach (var s in GenerateRecursive(n, root + "(", open + 1))
{
yield return s;
}
}
// if we can add ')' to existing substring, continue the process with the result
if (open > 0)
{
foreach (var s in GenerateRecursive(n, root + ")", open - 1))
{
yield return s;
}
}
}
``````

However, every time you see recursion you have to ask yourself: could n be large enough to cause a stack overflow? For example this fails for n=3000. The nice thing about this method, though, is that it can be limited to the number of items you want to see. For example `var firstTen = GenerateRecursive(1000).Take(10)` is very fast, as the generation is depth first and only computes the first ten values and exits.

So, can we replace the recursion with iteration?

## Iterative solution

In order to do thing iteratively, we need to store the results of the previous step and use them in the current step. This means breadth first generation, which has its own problems. Let's see some code:

``````public IEnumerable<string> GenerateIteration(int n)
{
// using named tuples to store the unclosed parentheses count with the substring
var results = new List<(string Value,int Open)>() { ("",0) };
for (int i = 0; i < n*2; i++)
{
// each step we compute the list of new strings from the list in the previous step
var newResults = new List<(string Value, int Open)>();
foreach (var (Value, Open) in results)
{
if (Open < n * 2 - Value.Length)
{
newResults.Add((Value + "(", Open + 1));
}
if (Open > 0)
{
newResults.Add((Value + ")", Open - 1));
}
}
results = newResults;
}
return results.Select(r=>r.Value);
}``````

It's pretty sleek, but if you try something like `var firstTen = GenerateRecursive(1000).Take(10)` now it will take forever since all combinations of 1000 parentheses need to be computed and stored before taking the first 10! BTW, we can write this much nicer with LINQ, but be careful at the gotcha in the comment:

``````public IEnumerable<string> GenerateLinq(int n)
{
// looks much nicer with LINQ
IEnumerable<(string Value, int Open)> results = new[] { ("", 0) };
for (var i = 0; i < n * 2; i++)
{
results =
results
.Where(r => r.Open < n * 2 - r.Value.Length)
.Select(r => (Value: r.Value + "(", Open: r.Open + 1))
.Concat(results
.Where(r => r.Open > 0)
.Select(r => (Value: r.Value + ")", Open: r.Open - 1))
);  // but if you do not end this with a .ToList()
// it will generate a huge expression that then will be evaluated at runtime! Oops!
}
return results.Select(r => r.Value);
}``````

But can't we do better? One is going to stack overflow, the other memory overflow and the last one kind of does both.

## Incremental solution

They did say this requires an incremental solution, right? So why don't we take this literally? '(' and ')' are like 0 and 1, as ')' must always follow a '('. If you view a parenthesis string as a binary number, then all possible combinations can be encoded as numbers. This means that we could conceivably write a very fast function that would compute all possible combinations using bit operations, maybe even special processor instructions that count bits and so on. However, this would work only for n<=32 or 64 depending on the processor architecture and we don't want to get into that. But we can still use the concept!

If a string represents a fictional number, then you can start with the smallest one, increment it and check for validity. If you combine the incremental operation with the validity check you don't need to go through 2n operations to get the result. It doesn't use any memory except the current string and it is depth first generation. The best of both worlds! Let's see some code:

``````public IEnumerable<string> GenerateIncrement(int n)
{
// the starting point is n open parentheses and n closing ones
// we use the same array of characters to generate the strings we display
var arr = (new string('(', n) + new string(')', n)).ToCharArray();
// iteration will stop when incrementation reaches the "maximum" valid combination
var success = true;
while (success)
{
yield return new string(arr);
success = Increment(arr, n);
}
}

private bool Increment(char[] arr, int n)
{
// we begin with a valid string, which means there are no unclosed parentheses
var open = 0;
// we start from the end of the string
for (var i = arr.Length - 1; i >= 0; i--)
{
// ')' is equivalent to a 1. To "increment" this string we need to go to the previous position
// incrementing 01 in binary results in 10
if (arr[i] == ')')
{
open++;
continue;
}

// '(' is equivalent to a 0. We will turn it into a ')' to increment it,
// but only if there are unclosed parentheses to close
open--;
if (open == 0) continue;

// we 'increment' the value
arr[i] = ')';
// now we need to reset the rest of the array
var k = n - (open + i) / 2;
// as many opening parenthesis as possible
for (var j = i + 1; j < i + 1 + k; j++)
{
arr[j] = '(';
}
// the rest are closing parentheses
for (var j = i + 1 + k; j < n * 2; j++)
{
arr[j] = ')';
}
return true;
}
// if we reached this point it means we got to a maximum
return false;
}``````

Now doing `GenerateIncrement(1000000).Take(10)` took more to display the results than to actually compute them.

## More solutions

As this is a classic interview question, there are a billion solutions to it at LeetCode. Yet the purpose of interview questions is to find out how one thinks, not what the solution of the problem actually is. I hope this helps.

# Interview question: write a CSV exporter

## Intro

So the requirement is "Write a class that would export data into a CSV format". This would be different from "Write a CSV parser", which I think could be interesting, but not as wildly complex as this. The difference comes from the fact that a CSV parser brings a number of problems for the interviewed person to think about right away, but then it quickly dries up as a source for intelligent debate. A CSV exporter seems much simpler, because the developer controls the output, but it increases in complexity as the interview progresses.

This post is written from the viewpoint of the interviewer.

## Setup

First of all, you start with the most basic question: Do you know what CSV is? I was going to try this question out on a guy who came interviewing for senior developer and I was excited to see how it would go. He answered he didn't know what CSV was. Bummer! I was incredulous, but then I quickly found out he didn't know much else either. CSV is a text format for exporting a number of unidimensional records. The name comes from Comma Separated Values and might at first glance appear to be a tabular data format, an idea made even more credible by Excel being able to open and export .csv files. But it is not. As the name says, it has values separated by a comma. It might even be just one record. It might be containing multiple records of different types. In some cases, the separator for value and record are not even commas or newline.

It is important to see how the interviewee explains what CSV is, because it is a concept that looks deceivingly simple. Someone who first considers the complexity of the format before starting writing the code works very differently in a team than someone who throws themselves into the code, confident (or just unrealistically optimistic) that they would solve any problem down the line.

Some ideas to explore, although it pays off to not bring them up yourself:

• What data do you need to export: arrays, data tables, list of records?
• Are the records of the same type?
• Are there restrictions on the type of record?
• What separators will there be used? How to escape values that contain chosen separators?
• Do values have restrictions, like not containing separators?
• CSV header: do we support that? What does it mean in the context of different types of input?
• Text encoding, unicode, non-ASCII characters
• How to handle null values?
• Number and date formatting
• Is there an RFC or a specification document for the CSV export format?

## Implementation

In this particular interview I have chosen that the CSV exporter class will only support an input of `IEnumerable<T>` (this is .NET speak for a bunch of objects of the same type).

Give ample opportunities for questions from the person interviewed. This is not a speed test. It is important if the candidate considers by themselves issues like:

• are the object properties simple types? Like string, long, integer, decimal, double, float, datetime?
• since the requirement is any T, what about objects that are arrays, or self referencing, or having complex objects as properties?
• how to extract the values of any object (discussions about .NET reflection or Javascript object property discovery show a lot about the interviewee, especially if they start said discussions)

Go through the code with the candidate. This shows their ability to develop software. How will they name the class, what signature will they use for export method, how they structure the code and how readable it is.

At this stage you should have a pretty good idea if the candidate is intelligent, competent and how they handle a complex problem from requirement to implementation.

## Dig deeper

This is the time to ask the questions yourself and see how they react to new information, the knowledge that they should have asked themselves the same questions and the stress of changing their design:

• are comma and newline the only supported separators?
• are separators characters or strings?
• what if an exported value is a string containing a comma?
• do you support values containing newline?
• if you use quotes to hold a value containing commas and newlines, what happens if values contain quotes
• empty or null values. Any difference? How to export them? What if the object itself is null?
• how to handle the header of the CSV, where do you get the name of the properties?
• what if the record type is an array or IEnumerable?
• what will be the numeric and date formatting used for export?
• does the candidate know what text encoding is? Which one will they use and why?

How have the answers to these questions changed the design? Did the candidate redesign the work or held tight to the original idea and tried to fix everything as it comes?

At this point you should know how the person being interviewed responds to new information, even scope creep and, maybe most importantly, to stress. But we're not done, are we?

## Bring the pain

Bring up the concept of unit testing. If you are lucky, the candidate already brought it up. Either way, now it is time to:

• split the code into components: the reflection code, the export code, the file system code (if any).
• abstract components into interfaces in order to mock them in unit tests
• collect all the specifications gathered so far in order to cover all the cases
• ask the candidate to write one unit test

## Conclusion

A seemingly simple question will take you and the interview candidate through:

• finding out how the other person thinks
• specification gathering
• code design
• implementation
• technical knowledge in a multitude of directions
• development process
• separation of concerns, unit testing
• human interaction in a variety of circumstances
• determining how the candidate would fit in a team

Not bad for a one line question, right?

# Question: background jobs from a for loop using the index variable

I am going to do this in Javascript, because it's easier to write and easier for you to test (just press F12 in this page and write it in the console), but it applies to any programming language. The issue arises when you want to execute a background job (a setTimeout, an async method that you do not wait, a Task.Run, anything that runs on another execution path than the current one) inside a for loop. You have the index variable (from 0 to 10, for example) and you want to use it as a parameter for the background job. The result is not as expected, as all the background jobs use the same value for some reason.

Let's see a bit of code:

``````// just write in the console numbers from 0 to 9
// but delayed for a second
for (var i=0; i<10; i++)
{
setTimeout(function() { console.log(i); },1000);
}

// the result: 10 values of 10 after a second!``````

But why? The reason is the "scope" of the i variable. In this case, classic (EcmaScript 5) code that uses var generates a value that exists everywhere in the current scope, which for ES5 is defined as the function this code is running from or the global scope if executed directly. If after this loop we write `console.log(i)` we get 10, because the loop has incremented i, got it to 10 - which is not less than 10, and exited the loop. The variable is still available. That explains why, a second later, all the functions executed in setTimeout will display 10: that is the current value of the same variable.

Now, we can solve this by introducing a local scope inside the for. In ES5 it looked really cumbersome:

``````for (var i=0; i<10; i++)
{
(function() {
var li = i;
setTimeout(function() { console.log(li); },1000);
})();
}``````

The result is the expected one, values from 0 to 9.

What happened here? We added an anonymous function and executed it. This generates a function scope. Inside it, we added a li variable (local i) and then set executing the timeout using that variable. For each value from 0 to 9, another scope is created with another li variable. If after this code we write console.log(li) we get an error because li is undefined in this scope. It is a bit confusing, but there are 10 li variables in 10 different scopes.

Now, EcmaScript 6 wanted to align Javascript with other common use modern languages so they introduced local scope for variables by defining them differently. We can now use `let` and `const` to define variables that are either going to get modified or remain constant. They also exist only in the scope of an execution block (between curly brackets).

We can write the same code from above like this:

``````for (let i=0; i<10; i++) {
const li = i;
setTimeout(()=>console.log(li),1000);
}``````

In fact, this is more complex than it has to be, but that's because another Javascript quirk. We can simplify this for the same result as:

``````for (let i=0; i<10; i++) {
setTimeout(()=>console.log(i),1000);
}``````

Why? Because we "let" the index variable, so it only exists in the context of the loop execution block, but apparently it creates one version of the variable for each loop run. Strangely enough, though, it doesn't work if we define it as "const".

Just as an aside, this is way less confusing with for...of loops because you can declare the item as const. Don't use "var", though, or you get the same problem we started with!

``````const arr=[1,2,3,4,5];
for (const item of arr) setTimeout(()=>console.log(item),1000);``````

In other languages, like C#, variables exist in the scope of their execution block by default, but using a for loop will not generate multiple versions of the same variable, so you need to define a local variable inside the loop to avoid this issue. Here is an example in C#:

``````for (var i=0; i<10; i++)
{
var li = i;
}

Note that in the case above we added a Thread.Sleep to make sure the app doesn't close while the tasks are running and that the values of the loop will not necessarily be written in order, but that's beside the point here. Also, var is the way variables are defined in C# when the type can be inferred by the compiler, it is not the same as the one in Javascript.

I hope you now have a better understanding of variable scope.

# Interview question: Dependency Injection

## Intro

This is part of a series that I plan to build on as time goes on: technical interview questions, dissected and laid bare for both interviewers and interviewees. You can also check out the previous one: Interview question: all items in table A but not in B.

This question is a little bit more complex and abstract at the same time. The post is written more for interviewers this time, because as candidates go, you need to read the links in it if you didn't know the concepts in it. This also is not a question with a single correct answer. It comes after asking about Dependency Injection as a whole and the candidate answering correctly.

I expect senior developers to be able to go through this successfully, it is not a test for junior developers, although depending on their previous experience juniors might be able to go through it and seniors be force to reason through it.

## The test

Bonus introduction question: why use DI at all? Expected answers would be separation of concerns and testability.

The question has two steps.

### Step 1: given the following code in a legacy application, improve it to use Dependency Injection:

``````public SomeClass {
public List<Item> GetItems(int days, string filter) {
var service = new ItemService();
return service.GetItems()
}
}``````

Bonus questions:

• has the candidate worked with LINQ before?
• what does the code do?

Now, this question is about programming knowledge as it is for attention. There are three irregularities that can attract that attention:

• the most obvious: the service is being instantiated by calling the constructor
• the interviewer expects at the very least for the candidate to notice it and suggest moving the instantiation of the service in the constructor of the SomeClass class and inject it instead of using new
• there is the possibility of passing the service as a parameter, as well, but suggest that the signature of the method should remain the same to get around it. Anyway, one can discuss the idea of moving all dependencies to the constructor and/or the calling methods and get insight in the way the candidate is thinking.
• the unexplained string filter in the signature of the method
• the interviewer can either tell the candidate that it will become relevant later, because it will, or that this is a method that implements an interface, to which a more snarky candidate might reply that SomeClass implements nothing (bonus for attention)
• the use of DateTime.Now
• it is a static property that gives a different output every time so it should be taken into account for Dependency Injection or at least for unit testing

By now you have filtered out the majority of failing candidates and you are left with someone who used or at least understands DI, can read and understand code, has used or at least understood basic LINQ and you have also gauged their level of attention to detail.

If the candidate only talked about the service and they decided to create an interface for ItemService and then add it as a parameter for the constructor of SomeClass, ask them to write a unit test for the method, explain to them that testability is one of the goals of DI if you didn't cover this already

• bonus: see if they do unit testing or at least understand the concept
• if they do attempt to write the unit test, ask them what would happen if you would run the test in different days

The expected result of this part is that the candidate understands the need of abstracting DateTime.Now. It is interesting to note how they intend to abstract it, since they do not have access to the code and it is a static method/property to abstract.

Whether the candidate figured it out by themselves or it was explained to them, the expected answer is that DateTime.Now is abstracted by creating an IDateTimeService interface that is implemented as a wrapper over DateTime.

At this point the code should look like this:

``````public SomeClass {
private IItemService _itemService;
private IDateTimeService _dateTimeService;

public SomeClass(IItemService itemService, IDateTimeService dateTimeService) {
_itemService = itemService;
_dateTimeService = dateTimeService;
}

public List<Item> GetItems(int days, string filter) {
return _itemService.GetItems()
}
}``````

Also, the candidate should be asked to write a unit test, just to see they know how, for bonus points. Note if the candidate understands isolation for unit testing or does something that would work but be silly like generate the test data based on current date or duplicate the code logic in the test instead of working with static data.

### Step 2: tell the candidate that the legacy code they need to fix looks a bit different:

``````public SomeClass {
public List<Item> GetItems(int days, string filter) {
var service = new ItemService(filter);
return service.GetItems()
}
}``````

The ItemService now receives the filter as the parameter. Ask them what to do in this case.

The expected answer is a factory injected instead of the service, which will then be used to instantiate an IItemService with a parameter. Bonus discussion about software patterns can be inserted here.

There are other valid answers here, like using the DI container itself as a factory for the service, which might provoke interesting discussions in itself, like weighing constructor injection versus service provider in dependency injection and whether hybrid solutions might be better.

Bonus question: what if you cannot control the code of ItemService in step 1 and it does not implement an interface or a base class?

• warning, this might give a hint for the second part of the interview, so use it at the end
• correct answer 1: use the class as the type of the parameter and let the dependency container decide how to instantiate it
• correct answer 2: use a wrapper over the class that implements the interface and proxies to the instance methods.

## Conclusion

For me this test told me a lot more about the candidate than just their dependency injection knowledge. We got to talking, I became aware of how their minds worked and I was both pleasantly surprised when they came with alternate solutions that kind of worked and a bit irked that they went that far and didn't see the superior option. Most of the time this made me think about the differences between what I would answer and what they did and this resulted in interesting discussions that enriched not only their experience, but also mine.

Dependency injection, separation of concerns and unit testing are important concepts for any modern developer. I hope this helps devs evolve and interviewers find the best candidates... at least until all of them get to read my blog.

# Question for programmers [Regular Expressions 2]

and has 1 comment

As with all the programmer questions, I will update the post with the answer after people comment on this. Today's question is:
You have a list of regular expressions for strings to be matched against. You need to turn them into a single regular expression. How can you do it so that a string needs to match any of the initial regular expressions? How can you do it to match them all at the same time?

# Blog Question: How to use a large image as a page background, without showing it loading slowly?

A blog reader asked me to help him get rid of the ugly effect of a large background image getting loaded. I thought of several solutions, all more complicated than the rest, but in the end settled on one that seems to be working well and doesn't require complicated libraries or difficult implementation: using the img onload event.

Let's assume that the background image is on the body element of the page. The solution involves setting a style on the body to hide it (style="display:none") then adding as child of the body an image that also is hidden and that, when completing loading, shows the body element. Here is the initial code:

`<style>body {    background: url(bg.jpg) no-repeat center center fixed;}</style><body>`

And after:

`<style>body {    background: url(bg.jpg) no-repeat center center fixed;}</style><body style="display:none"><img src="bg.jpg" onload="document.body.style.display=''" style="display:none;" />`

This loads the image in a hidden img element and shows the body element when the image finished loading.

The solution might have some problems with Internet Explorer 9, as it seems the load event is not fired for images retrieved from the cache. In that case, a slightly more complex Javascript solution is needed as detailed in this blog post: How to Fix the IE9 Image Onload Bug. Also, in Internet Explorer 5-7 the load event fires for animated GIFs at every loop. I am sure you know it's a bad idea to have an animated GIF as a page background, though :)

Warning: While this hides the effect of slow loading background images, it also hides the page until the image is loaded. This makes the page appear blank until then. More complex solutions would show some simple html content while the page is loading rather than hiding the entire page, but this post is about the simplest solution for the question asked.

A more comprehensive analysis of image preloading, complete with a very nice Javascript code that covers a lot of cases, can be found at Preloading images using javascript, the right way and without frameworks

# Blog question: Address validation using CRF models

I am starting a new blog series called Blog Question, due to the successful incorporation of a blog chat that works, is free, and does exactly what it should do: Chatango. All except letting me know in real time when a question has been posted on the chat :( . Because of that, many times I don't realize that someone is asking me things and so I fail to answer. As a solution, I will try to answer questions in blog posts, after I do my research. The new label associated with these posts is 'question'.

First off, some assumptions. I will assume that the person who said I'm working on this project of address validation. Using crf models is my concern. was talking about Conditional Random Fields and he meant postal addresses. If you are reading this, please let me know if that is correct. Also, since I am .NET developer, I will use concepts related to .NET.

I knew nothing about CRFs before writing this posts, so bear with me. The Wikipedia article about them is hard to understand by anyone without mathematical (specifically probabilities and statistics) training. However the first paragraph is pretty clear: Conditional random fields (CRFs) are a class of statistical modelling method often applied in pattern recognition and machine learning, where they are used for structured prediction. Whereas an ordinary classifier predicts a label for a single sample without regard to "neighboring" samples, a CRF can take context into account. It involves a process that classifies data by taking into account neighboring samples.

A blog post that clarified the concept much better was Introduction to Conditional Random Fields. It describes how one uses so called feature functions to extract a score from a data sample, then aggregates scores using weights. It also explains how those weights can be automatically computed (machine learning).

In the context of postal address parsing, one would create an interface for feature functions, implement a few of them based on domain specific knowledge, like "if it's an English or American address, the word before St. is a street name", then compute the weighting of the features by training the system using a manually tagged series of addresses. I guess the feature functions can ignore the neighboring words and also do stuff like "If this regular expression matches the address, then this fragment is a street name".

I find this concept really interesting (thanks for pointing it out to me) since it successfully combines feature extraction as defined by an expert and machine learning. Actually, the expert part is not that relevant, since the automated weighing will just give a score close to 0 to all stupid or buggy feature functions.

Of course, one doesn't need to do it from scratch, other people have done it in the past. One blog post that discusses this and also uses more probabilistic methods specifically to postal addresses can be found here: Probabilistic Postal Address Elementalization. From Hidden Markov Models, Maximum-Entropy Markov Models, Transformation-Based Learning and Conditional Random Fields, she found that the Maximum-Entropy Markov model and the Conditional Random Field taggers consistently had the highest overall accuracy of the group. Both consistently had accuracies over 98%, even on partial addresses. The code for this is public at GitHub, but it's written in Java.

When looking around for this post, I found a lot of references to a specific software called the Stanford Named Entity Recognizer, also written in Java, but which has a .NET port. I haven't used the software, but it seems as it is a very thorough implementation of a Named Entity Recognizer. Named Entity Recognition (NER) labels sequences of words in a text which are the names of things, such as person and company names, or gene and protein names. It comes with well-engineered feature extractors for Named Entity Recognition, and many options for defining feature extractors. Included with the download are good named entity recognizers for English, particularly for the 3 classes (PERSON, ORGANIZATION, LOCATION). Perhaps this would also come in handy.

This is as far as I am willing to go without discussing existing code or actually writing some. For more details, contact me and we can work on it.

More random stuff:
The primary advantage of CRFs over hidden Markov models is their conditional nature, resulting in the relaxation of the independence assumptions required by HMMs in order to ensure tractable inference. Additionally, CRFs avoid the label bias problem, a weakness exhibited by maximum entropy Markov models (MEMMs) and other conditional Markov models based on directed graphical models. CRFs outperform both MEMMs and HMMs on a number of real-world sequence labeling tasks. - from Conditional Random Fields: An Introduction

Tutorial on Conditional Random Fields for Sequence Prediction

CRFsuite - Documentation

Extracting named entities in C# using the Stanford NLP Parser

Tutorial: Conditional Random Field (CRF)