Interview question: write a CSV exporter
So the requirement is "Write a class that exports data into CSV format". This is different from "Write a CSV parser", which I think could be interesting, but not as wildly complex as this. A CSV parser confronts the candidate with a number of problems right away, but then it quickly dries up as a source of intelligent debate. A CSV exporter seems much simpler, because the developer controls the output, but it grows in complexity as the interview progresses.
This post is written from the viewpoint of the interviewer.
First of all, you start with the most basic question: Do you know what CSV is? I was going to try this question out on a guy who came to interview for a senior developer position and I was excited to see how it would go. He answered that he didn't know what CSV was. Bummer! I was incredulous, but then I quickly found out he didn't know much else either. CSV is a text format for exporting a number of unidimensional records. The name comes from Comma Separated Values, and at first glance it might appear to be a tabular data format, an idea made even more credible by Excel being able to open and export .csv files. But it is not. As the name says, it holds values separated by commas. It might be just one record. It might contain multiple records of different types. In some cases, the separators for values and records are not even commas or newlines.
It is important to see how the interviewee explains what CSV is, because it is a concept that looks deceptively simple. Someone who first considers the complexity of the format before starting to write the code works very differently in a team than someone who throws themselves into the code, confident (or just unrealistically optimistic) that they will solve any problem down the line.
Some ideas to explore, although it pays off to not bring them up yourself:
- What data do you need to export: arrays, data tables, list of records?
- Are the records of the same type?
- Are there restrictions on the type of record?
- What separators will be used? How do we escape values that contain the chosen separators?
- Do values have restrictions, like not containing separators?
- CSV header: do we support that? What does it mean in the context of different types of input?
- Text encoding, unicode, non-ASCII characters
- How to handle null values?
- Number and date formatting
- Is there an RFC or a specification document for the CSV export format?
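On the separator and escaping questions, one possible answer can be sketched in a few lines. This is only an illustration, written in Python rather than .NET to keep it short; the function names `escape_field` and `format_record` are mine, and the quoting rule is the one described in RFC 4180 (wrap the field in double quotes, double any embedded quote), which is an informational document rather than a binding standard.

```python
def escape_field(value: str, separator: str = ",", quote: str = '"') -> str:
    """Quote a value only when it would otherwise break the record structure."""
    needs_quoting = (
        separator in value
        or quote in value
        or "\n" in value
        or "\r" in value
    )
    if not needs_quoting:
        return value
    # RFC 4180 style: double every embedded quote, then wrap the whole field.
    return quote + value.replace(quote, quote + quote) + quote


def format_record(values: list[str], separator: str = ",") -> str:
    """Join already-stringified values into one CSV record (no line ending)."""
    return separator.join(escape_field(v, separator) for v in values)


if __name__ == "__main__":
    print(format_record(["plain", "a,b", 'say "hi"']))
    print(format_record(["a,b", "c;d"], separator=";"))
```

With this rule `a,b` becomes `"a,b"` and `say "hi"` becomes `"say ""hi"""`. Note that the choice of separator changes which values need quoting, which is exactly the kind of detail a good candidate asks about.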
In this particular interview I have chosen that the CSV exporter class will only support an input of IEnumerable<T> (this is .NET speak for a bunch of objects of the same type).
Give the candidate ample opportunity to ask questions. This is not a speed test. It is important to see whether the candidate raises, unprompted, issues like:
- are the object properties simple types? Like string, long, integer, decimal, double, float, datetime?
- since the requirement is any T, what about objects that are arrays, or self-referencing, or that have complex objects as properties?
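To make these two questions concrete, here is a rough sketch of the "discover properties, then refuse anything that is not a simple value" approach. It is Python standing in for the .NET reflection code, so dataclass fields play the role of public properties. The names `SIMPLE_TYPES`, `property_names`, `rows` and the `Person` record are hypothetical, and rejecting complex values outright is just one possible policy, not the only reasonable one.

```python
import dataclasses
from datetime import date, datetime
from decimal import Decimal
from typing import Any, Iterable, Iterator

# Assumption: only these "simple" types are exportable; anything else is rejected.
SIMPLE_TYPES = (str, int, float, bool, Decimal, date, datetime, type(None))


def property_names(record_type: type) -> list[str]:
    """Discover exportable property names; here via dataclass fields."""
    if not dataclasses.is_dataclass(record_type):
        raise TypeError(f"{record_type.__name__} is not a dataclass")
    return [f.name for f in dataclasses.fields(record_type)]


def rows(records: Iterable[Any], record_type: type) -> Iterator[list[Any]]:
    """Yield the raw property values of each record, rejecting complex values."""
    names = property_names(record_type)
    for record in records:
        if not isinstance(record, record_type):
            raise TypeError(
                f"expected {record_type.__name__}, got {type(record).__name__}"
            )
        values = [getattr(record, name) for name in names]
        for name, value in zip(names, values):
            if not isinstance(value, SIMPLE_TYPES):
                raise TypeError(
                    f"property {name!r} holds unsupported type {type(value).__name__}"
                )
        yield values


@dataclasses.dataclass
class Person:  # hypothetical record type used only for illustration
    name: str
    age: int


if __name__ == "__main__":
    print(property_names(Person))
    print(list(rows([Person("Ann", 41)], Person)))
```

A record whose property holds a list, or another object, raises a `TypeError` here. Whether to fail, skip the property or flatten it is a design decision worth hearing the candidate argue about.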
Go through the code with the candidate. This shows their ability to develop software: how they name the class, what signature they use for the export method, how they structure the code and how readable it is.
At this stage you should have a pretty good idea of whether the candidate is intelligent and competent, and of how they handle a complex problem from requirement to implementation.
This is the time to ask the questions yourself and see how they react to new information, to the knowledge that they should have asked the same questions themselves, and to the stress of changing their design:
- are comma and newline the only supported separators?
- are separators characters or strings?
- what if an exported value is a string containing a comma?
- do you support values containing newline?
- if you use quotes to hold a value containing commas and newlines, what happens if values contain quotes?
- empty or null values. Any difference? How to export them? What if the object itself is null?
- how to handle the header of the CSV, where do you get the name of the properties?
- what if the record type is an array or IEnumerable?
- what will be the numeric and date formatting used for export?
- does the candidate know what text encoding is? Which one will they use and why?
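The null, formatting and encoding questions tend to get hand-waved, so it helps to have a concrete answer in mind. The following is a sketch of one set of choices, again in Python instead of .NET: nulls become empty fields, dates use ISO 8601, numbers never depend on the machine's locale, and the output is UTF-8. The names `to_text` and `encode_csv` and every default in them are my assumptions, not a standard.

```python
from datetime import date, datetime
from decimal import Decimal
from typing import Any


def to_text(value: Any, null_text: str = "") -> str:
    """Turn one property value into text, independent of the machine's locale."""
    if value is None:
        # Assumption: null becomes an empty field. An empty string then needs
        # another representation (for example a quoted "") to stay distinguishable.
        return null_text
    if isinstance(value, bool):  # check bool before numbers: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (datetime, date)):
        return value.isoformat()  # ISO 8601, e.g. 2024-03-01T13:05:00
    if isinstance(value, float):
        return repr(value)  # round-trippable, always "." as the decimal mark
    return str(value)  # int, Decimal and str


def encode_csv(text: str, with_bom: bool = False) -> bytes:
    """Encode finished CSV text as UTF-8; "utf-8-sig" prepends a byte order mark."""
    return text.encode("utf-8-sig" if with_bom else "utf-8")


if __name__ == "__main__":
    sample = [None, True, date(2024, 3, 1), 1.5, Decimal("10.50"), "é"]
    print([to_text(v) for v in sample])
    print(encode_csv("é", with_bom=True))
```

The byte order mark is optional in UTF-8, and whether to emit it depends on who reads the file, so a candidate who asks about the consumer of the export is on the right track.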
How have the answers to these questions changed the design? Did the candidate redesign the work, or hold tight to the original idea and try to fix everything as it came up?
At this point you should know how the person being interviewed responds to new information, even scope creep and, maybe most importantly, to stress. But we're not done, are we?
Bring the pain
Bring up the concept of unit testing. If you are lucky, the candidate already brought it up. Either way, now it is time to:
- split the code into components: the reflection code, the export code, the file system code (if any).
- abstract components into interfaces in order to mock them in unit tests
- collect all the specifications gathered so far in order to cover all the cases
- ask the candidate to write one unit test
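As a reference point for what "good" might look like, here is a small sketch of that split, in Python rather than .NET. The reflection code sits behind a hypothetical `PropertyReader` interface, the exporter writes to any text sink instead of touching the file system, and the test swaps in a fake reader. All class names are invented for the example, and escaping is deliberately left out to keep the test focused on one behavior.

```python
import io
import unittest
from typing import Any, Iterable, Protocol, TextIO


class PropertyReader(Protocol):
    """Hypothetical interface hiding the reflection code."""

    def names(self) -> list[str]: ...

    def values(self, record: Any) -> list[Any]: ...


class CsvExporter:
    """Export code only: depends on the interface and a text sink, not on files."""

    def __init__(self, reader: PropertyReader, separator: str = ",") -> None:
        self._reader = reader
        self._separator = separator

    def export(self, records: Iterable[Any], sink: TextIO) -> None:
        # Header first, then one line per record; no escaping in this sketch.
        sink.write(self._separator.join(self._reader.names()) + "\r\n")
        for record in records:
            cells = ["" if v is None else str(v) for v in self._reader.values(record)]
            sink.write(self._separator.join(cells) + "\r\n")


class FakeReader:
    """Test double standing in for the reflection component."""

    def names(self) -> list[str]:
        return ["name", "age"]

    def values(self, record: Any) -> list[Any]:
        return list(record)


class CsvExporterTests(unittest.TestCase):
    def test_writes_header_then_one_line_per_record(self) -> None:
        sink = io.StringIO()  # in-memory sink instead of the file system
        CsvExporter(FakeReader()).export([("Ann", 41), ("Bob", None)], sink)
        self.assertEqual(sink.getvalue(), "name,age\r\nAnn,41\r\nBob,\r\n")
```

Saved in a file, the test can be run with `python -m unittest`. The point is less the code itself than seeing whether the candidate reaches for a fake reader and an in-memory sink on their own.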
A seemingly simple question will take you and the interview candidate through:
- finding out how the other person thinks
- specification gathering
- code design
- technical knowledge in a multitude of directions
- development process
- separation of concerns, unit testing
- human interaction in a variety of circumstances
- determining how the candidate would fit in a team
Not bad for a one line question, right?