Algorithms - Tree Traversals and Depth First Search - Connectivity and Exploration problems

Intro
At the moment I am looking for a job, so I've been researching coding test platforms. The problems that cannot be solved with just a little thinking and some loops are 80% covered by just 8 categories of problems:
- Optimization with overlapping subproblems (coin change, longest increasing subsequence, knapsack variants, matrix chain) - solved with Dynamic Programming (memoization/tabulation).
- Pathfinding, connectivity, or level-order processing in grids/graphs (number of islands, shortest path in maze, rotten oranges) - solved with Graph Traversal: BFS and DFS.
- Hierarchical data processing (tree diameter, in order traversal, BST validation, lowest common ancestor) - solved with Tree Traversals and Properties (pre/in/post-order, level-order, recursion/DFS on trees).
- Exhaustive search with pruning (permutations, subsets, N-Queens, sudoku solver, word search) - solved with Backtracking.
- Local optimal choices that work globally (jump game, task scheduler, fractional knapsack) - solved by Greedy Algorithms.
- Efficient "top-k" or priority processing (merge k sorted lists, kth largest element, task scheduler with cooldown) - solved with Heaps / Priority Queues.
- Efficient union of sets / cycle detection in graphs (redundant connection, number of provinces, Kruskal’s MST) - solved with Union-Find (Disjoint Set Union) with path compression & union-by-rank.
- Searching in sorted/monotonic spaces (search in rotated sorted array, minimum in rotated array, binary search on answer for capacity problems) - solved by Binary Search (advanced variants).
Today I will be covering depth first search (DFS) in tree/graph traversals.
When to use DFS
Both DFS and BFS (Breadth First Search) are ways of exploring connected elements, but DFS comes more naturally to people: you go as deep as you can, then backtrack when you get stuck. Unlike BFS, DFS doesn't process elements level by level. DFS is mostly used to find tree properties (depth, diameter), for pathfinding, for validation, and for most graph connectivity problems.
Recognize a problem that can be solved with DFS by these signs:
- it has a natural recursive structure (do the same thing for child/neighbor)
- you need to find a path or check the existence of one (and not necessarily the shortest one)
- the solution requires processing children/subtrees before their parents (height, diameter, subtree sum, bottom-up computation)
- the solution requires computing or validating a property along a path (a path sum exists, a binary search tree (BST) is valid, a tree is balanced, a cycle exists)
- the input is a tree, binary tree, BST, N-ary tree, trie, or graph, and we do not need level-by-level processing or shortest-path distance
Basically you've got some kind of connected structure and you don't care about what happens on specific levels or you need to compare all paths to find the best.
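All of these signs reduce to one recursive pattern: visit, recurse on neighbors, backtrack. Here is a minimal sketch (the tiny adjacency list is just an illustration, not from any particular problem):

```csharp
using System;
using System.Collections.Generic;

// The bare recursive DFS pattern. Everything else in this post
// is a variation on these few lines.
void Dfs(int node, List<List<int>> adjacency, bool[] visited)
{
    if (visited[node]) return;   // already explored, avoid cycles
    visited[node] = true;        // "visit" the current node
    foreach (int neighbor in adjacency[node])
        Dfs(neighbor, adjacency, visited); // go as deep as possible first
    // code placed here runs while backtracking (useful for bottom-up values)
}

List<List<int>> adjacency = [[1], [2], []]; // 0 -> 1 -> 2
bool[] visited = new bool[3];
Dfs(0, adjacency, visited);
Console.WriteLine(visited[2]); // True: node 2 was reached from node 0
```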
Example 1
The poster problem for DFS is maze pathfinding, as it lends itself perfectly to the "go as deep as possible and backtrack" idea. Here is the problem statement:
Given a 2D grid representing a maze (0 = open path, 1 = wall), determine if there is a path from the top-left cell (0,0) to the bottom-right cell (m-1, n-1). You can move in 4 directions (up, down, left, right), but cannot go through walls or outside the grid. Assume (0,0) and (m-1,n-1) are open (0).
Example grid:
var maze = new int[][]
{
[0, 0, 1, 0],
[0, 0, 0, 0],
[0, 0, 1, 0],
[1, 0, 0, 0]
};
Let's look at the naive solution (recursion without visited marking):
bool HasPath(int[][] maze, int row, int col)
{
int m = maze.Length;
int n = maze[0].Length;
if (row < 0 || row >= m || col < 0 || col >= n || maze[row][col] == 1)
return false;
if (row == m - 1 && col == n - 1)
return true;
// Try all 4 directions
return HasPath(maze, row - 1, col) || // up
HasPath(maze, row + 1, col) || // down
HasPath(maze, row, col - 1) || // left
HasPath(maze, row, col + 1); // right
}
var result = HasPath(maze, 0, 0);
Console.WriteLine(result);
Here we try all possible directions to get to a solution and the result is, because there are clearly several paths to the end... a StackOverflowException! This is the error thrown when the stack used internally for function recursion has been used up. In other words, we recursed ourselves out of stack space. The problem is that we go in loops forever: up, down, then up again and so on. The solution is marking visited cells so that we don't do that:
bool HasPath(int[][] maze, int row, int col,
/* marking visited */bool[][] visited = null)
{
int m = maze.Length;
int n = maze[0].Length;
// Initialize visited array on the first call
if (visited == null)
{
visited = new bool[m][];
for (int i = 0; i < m; i++)
visited[i] = new bool[n];
}
if (row < 0 || row >= m || col < 0 || col >= n || maze[row][col] == 1)
return false;
if (row == m - 1 && col == n - 1)
return true;
// avoid cycles by not visiting the same cell again
if (visited[row][col])
return false;
// mark the cell as visited
visited[row][col] = true;
// Try all 4 directions (with visited tracking)
return HasPath(maze, row - 1, col, visited) || // up
HasPath(maze, row + 1, col, visited) || // down
HasPath(maze, row, col - 1, visited) || // left
HasPath(maze, row, col + 1, visited); // right
}
var result = HasPath(maze, 0, 0);
Console.WriteLine(result);
The result now is True, as expected.
Now, the exploration this version of the code performs is really inefficient, too, because of the way the direction choice is made; however, in "big O" terms, the complexity is still O(m*n) time and O(m*n) space (because of the visited array). So remember that efficiency in theoretical terms is not the same as efficiency in engineering terms.
It's a nice idea to ponder how you would alter the algorithm so that it doesn't always try to go back the way it came.
This code guarantees termination (a critical issue in recursive algorithms), avoids redundant work and finds a path - which is not necessarily optimal. Classic DFS. And it is depth first because it goes as far as it can before "backtracking" to a previous position and trying a new path. A breadth first search is obviously possible, but it would take more resources here. It would be the best solution for a requirement of finding the shortest path out of the maze, though.
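As a side note, the same depth first exploration can be driven by an explicit stack instead of recursion, which also sidesteps recursion depth limits on large grids. A sketch (HasPathIterative is my own name for it):

```csharp
using System;
using System.Collections.Generic;

// Iterative DFS over the maze: the same algorithm, but with an explicit
// stack instead of the call stack. 0 = open path, 1 = wall.
bool HasPathIterative(int[][] maze)
{
    int m = maze.Length, n = maze[0].Length;
    var visited = new bool[m, n];
    var stack = new Stack<(int Row, int Col)>();
    stack.Push((0, 0));
    int[][] directions = [[-1, 0], [1, 0], [0, -1], [0, 1]];
    while (stack.Count > 0)
    {
        var (row, col) = stack.Pop();
        if (row < 0 || row >= m || col < 0 || col >= n) continue; // outside grid
        if (maze[row][col] == 1 || visited[row, col]) continue;   // wall or seen
        if (row == m - 1 && col == n - 1) return true;            // reached exit
        visited[row, col] = true;
        foreach (var d in directions)
            stack.Push((row + d[0], col + d[1])); // defer the neighbors
    }
    return false;
}

var maze = new int[][]
{
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0]
};
Console.WriteLine(HasPathIterative(maze)); // True for the example grid
```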
Theory
Now you may have noticed that in this post about trees and graphs I presented something that works with arrays. That's because there is a lot - and I mean a lot - of theory related to graphs in computer science. Trees, too, but they are a special case of graphs, so... remember that when you walk outside looking for shade. OK, enough shade thrown on the computer scientists! It's just very easy to make fun of their hard work.
So let's delve a little into that theory by describing a few terms that will help us understand a true graph problem:
- A graph is a collection of points (called vertices or nodes) connected by lines (called edges)
- Vertices / Nodes are the things you care about (people, cities, web pages, computers, tasks, etc.)
- Edges are the relationships or connections between them (friendship, road, hyperlink, network cable, dependency, etc.)
- An undirected graph is a graph where the edges don't have a direction or rather they are always bidirectional. If A is connected to B, then B is connected to A.
- A directed graph (or a digraph) is a graph where edges have a direction. If an edge connects A to B, you need another edge to connect B to A.
- A connected graph is a graph where every node can be reached from every other node by following edges.
- A cycle is a path that starts and ends at the same node.
- An acyclic graph is a graph with no cycles.
- A tree is a connected graph with no cycles. Trees are found everywhere because they represent hierarchies: parent-child relationships.
- every tree is an acyclic graph, but not the other way around. An acyclic graph can have more than one component (separate groups of connected nodes). A tree has N-1 edges, where N is the number of nodes.
- A binary tree is a tree where a node has at most two children, usually called the left and the right child. Binary tree nodes are assholes, they think of one child as the right one and the other is just left out.
- they are used a lot in computer science and algorithms, but we're not going to explain them here. As usual, follow the links in the blog to find out more, if you're interested.
- we will need to know what a binary tree is when we hear the term BST (binary search tree), a data structure with fast search, insert and delete.
- A binary search tree (BST) is a tree where:
- the values of all the nodes in the left subtree are smaller than the value of the root node
- the values of all the nodes in the right subtree are larger than the value of the root node
- all subtrees are valid BSTs
Usually in these problems we get a class called TreeNode which holds an integer value and the two child nodes, something like this:
class TreeNode {
public int Value {get;set;}
public TreeNode Left {get;set;}
public TreeNode Right {get;set;}
}
Use a struct or a record to show you're fancy, but then you will have to explain the difference between structs, records and classes.
Other problems use adjacency lists or some other ways of storing graph data.
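As a quick sketch of the adjacency list form (the graph here is just an illustration): the index in the outer list identifies the node, and the inner list holds its neighbors.

```csharp
using System;
using System.Collections.Generic;

// A directed graph with edges 0->1, 0->2, 1->2 as an adjacency list:
// graph[i] holds the nodes reachable from node i in one step.
List<List<int>> graph =
[
    [1, 2], // node 0 points to nodes 1 and 2
    [2],    // node 1 points to node 2
    []      // node 2 has no outgoing edges
];
Console.WriteLine(graph[0].Count); // 2 outgoing edges from node 0
```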
With this nomenclature in mind, let's see some examples.
Example 2
Problem statement: Given the root of a binary tree, determine if it is a valid binary search tree.
The solution is simple:
bool IsValidBSTDfs(TreeNode node, long min = long.MinValue, long max = long.MaxValue)
{
if (node == null) return true;
if (node.Value <= min || node.Value >= max) return false;
return IsValidBSTDfs(node.Left, min, node.Value)
&& IsValidBSTDfs(node.Right, node.Value, max);
}
Note the long bounds: with int bounds, a node holding int.MinValue or int.MaxValue would be rejected incorrectly. I didn't provide a naive solution, because this is a very simple problem, but one of the common errors developers make in trying to solve it is to assume that if the value of the current node is between the values of its children and then you recurse, the tree is a valid BST. However, that misses cases where a value deeper in the left subtree is larger than the root, or one deeper in the right subtree is smaller. Always check the requirements!
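To make that trap concrete, here is a tree (my own example, reusing the TreeNode class and IsValidBSTDfs from above) where every node is on the correct side of its direct parent, yet the whole thing is not a valid BST:

```csharp
using System;

// Every parent-child pair looks fine locally (1 < 5 < 6 and 3 < 6 < 7),
// but 3 lives in the right subtree of 5 while being smaller than 5.
//       5
//      / \
//     1   6
//        / \
//       3   7
var root = new TreeNode
{
    Value = 5,
    Left = new TreeNode { Value = 1 },
    Right = new TreeNode
    {
        Value = 6,
        Left = new TreeNode { Value = 3 },
        Right = new TreeNode { Value = 7 }
    }
};
Console.WriteLine(IsValidBSTDfs(root)); // False: 3 violates the lower bound of 5
```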
The time complexity of this is O(n), because it goes through all of the nodes once. The space complexity is O(h), where h is the height of the tree, with the worst case scenario being O(n), if the tree is composed of nodes with just one child and h=n.
Now you might ask how this is O(h) in space if there is a class instance for each node. Remember, the O notation here measures the extra space used by the solution, not the size of the input. By recursing through h levels (even if that means it adds and removes n frames on the stack over the whole run), the stack never holds more than h frames at any one time.
Since we are talking about checking all nodes, a breadth first search is also possible. I will discuss this in the BFS post, but also in another post that I plan to do on how to NOT use the native language recursion and control the entire stack/queue yourself. When you do that, the code gets a little bit more complex, but then the difference between DFS and BFS becomes the choice between a stack and a queue. Stay tuned, it's going to be an instructive post.
Example 3
Problem statement: Given a directed graph (as adjacency list: List<List<int>> graph), detect if it contains a cycle. Nodes are labeled 0 to V-1, where V = graph.Count. A cycle exists if there's a path that starts and ends at the same node (following directions).
In this situation the data structure changes again. A node doesn't have a value; it is identified by its index and only has edges towards other nodes. The structure holding this is a list of lists, where the index in the outer list identifies the node and the inner list holds the indexes of the nodes it points to.
Let's try some code:
bool HasCycle(List<List<int>> graph, int node, bool[] visited=null)
{
visited ??= new bool[graph.Count];
if (visited[node]) return true;
visited[node] = true;
foreach (int neighbor in graph[node])
{
if (HasCycle(graph, neighbor, visited))
{
return true;
}
}
return false;
}
List<List<int>> graph = [
[1,2],
[2],
[]
];
Console.WriteLine(HasCycle(graph, 0)); // Output: True
The idea here is simple: start with each neighbor of each node, then recurse. If you find any node that has been visited before, you have a cycle. Or do you? The code above has a conceptual bug. Can you find it? Look at the example and tell me: is the output correct?
The answer is no. One can reach the same node in multiple ways (in our case the node with index 2), but because it's a directed graph, that's not necessarily a cycle. In our case, node 0 connects to node 1, which then connects to node 2, but node 0 also connects to 2 directly. There is no way to get from node 2 back to 0, though, so there is no cycle.
How would you solve it? The solution - in terms of code - is very simple, but one has to have seen the problem before or have a lot of time to find it. That's why I think these kinds of problems are terrible for determining whether you're a good developer, but I digress. Here is the working solution:
bool HasCycle(List<List<int>> graph, int node, int[] visited=null)
{
visited ??= new int[graph.Count];
visited[node] = 1;
foreach (int neighbor in graph[node])
{
if (visited[neighbor] == 1) return true;
if (visited[neighbor] == 0 && HasCycle(graph, neighbor, visited))
{
return true;
}
}
visited[node] = 2;
return false;
}
List<List<int>> graph = [
[1,2],
[2],
[]
];
Console.WriteLine(HasCycle(graph, 0)); // Output: False
graph =
[
[1],
[2],
[0]
];
Console.WriteLine(HasCycle(graph, 0)); // Output: True
Simply change the type of the visited array to int, mark a node as "visiting" with 1, then as "visited" with 2. You ignore visited nodes, but if you find a neighbor that is still in the visiting state, then you're in a cycle.
Note: this would be a lot more readable with a three state Enum, and if you feel like it, go for it. People will appreciate you making the code more readable. But it will take you some extra time to write up the Enum definition, so this is - in my eyes - another reason why these tests measure the wrong thing.
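If you do take the Enum route, a sketch might look like this (the HasCycleEnum and VisitState names are mine):

```csharp
using System;
using System.Collections.Generic;

// Same cycle detection as above, with the three states named explicitly.
bool HasCycleEnum(List<List<int>> graph, int node, VisitState[] state = null)
{
    state ??= new VisitState[graph.Count];
    state[node] = VisitState.Visiting; // node is on the current DFS path
    foreach (int neighbor in graph[node])
    {
        if (state[neighbor] == VisitState.Visiting) return true; // back edge = cycle
        if (state[neighbor] == VisitState.Unvisited
            && HasCycleEnum(graph, neighbor, state))
            return true;
    }
    state[node] = VisitState.Visited; // fully explored, safe to skip later
    return false;
}

List<List<int>> cyclic = [[1], [2], [0]]; // 0 -> 1 -> 2 -> 0
Console.WriteLine(HasCycleEnum(cyclic, 0)); // True

// Unvisited is the default value, so a fresh array starts in the right state.
enum VisitState { Unvisited, Visiting, Visited }
```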
The time complexity of this algorithm is O(V+E) where V is the number of vertices/nodes and E is the number of edges. The space complexity is O(V), as we create an array of size V as part of the solution.
Example 4
This one is very hard to figure out, but easy to implement. You just have to have a feeling for these things to get past the more obvious gotchas.
Problem statement:
- You are given a directed acyclic graph (DAG) with n nodes numbered from 0 to n-1.
- The graph is given as an adjacency list: graph[i] = list of nodes you can go to directly from node i.
- Each node has a value given in array values[0..n-1].
- Return the maximum sum of node values you can collect by following any path in the graph
Example:
int[] values = [1, -2, 3, 4];
int[][] graph = [[1, 2], [3], [3], []];
This problem is deceptive for multiple reasons:
- It looks like "longest path" so people think NP-hard
- But it's a DAG, so longest path is solvable in linear time
- Many people waste time trying brute force / backtracking / permutations
- Many try to keep track of visited nodes unnecessarily (not needed in a DAG)
- The correct insight is subtle for many: "the longest path starting at each node" is easy to compute with DP + DFS
- Negative values make greedy impossible
- Large n forbids O(n²) solutions
- You must memoize properly or you'll exceed the accepted time limit
- The DP state "maximum path sum starting at node X" is not the first thing most people think of
The post is long enough already, so here is the code:
int[] values = [1, -2, 3, 4];
int[][] graph = [[1, 2], [3], [3], []];
int MaxPathSum(int[] values, IList<IList<int>> graph)
{
int n = values.Length;
int?[] memo = new int?[n]; // best path sum starting at node i
int globalMax = int.MinValue;
for (int i = 0; i < n; i++)
{
globalMax = Math.Max(globalMax, Dfs(i, values, graph, memo));
}
return globalMax;
}
int Dfs(int node, int[] values, IList<IList<int>> graph, int?[] memo)
{
if (memo[node].HasValue)
return memo[node].Value;
int best = values[node]; // just this node
foreach (int nei in graph[node])
{
// best path starting at nei, plus the current node
int candidate = values[node] + Dfs(nei, values, graph, memo);
best = Math.Max(best, candidate);
}
memo[node] = best;
return best;
}
Console.WriteLine(MaxPathSum(values, graph)); // Output: 8 (path 0 -> 2 -> 3)
A bonus in this implementation is the methods declaring IList parameters while accepting arrays. You forgot that arrays implement IList, didn't you?
Conclusion
Graph theory is a very large and important category in computer science, so explaining all of it simply in a post or two is impossible. But I hope I've clarified a lot of stuff related to the field, at least in regards to interview questions. DFS is one of the most versatile tools in your interview toolbox. Once you get comfortable with the recursive pattern (visit node -> recurse on children/neighbors -> backtrack), you’ll start seeing it everywhere: trees, graphs, backtracking, path problems, cycle detection, and more.




