Algorithms - Greedy Algorithms - Local optimal choices that work globally
Intro
At the moment I am looking for a job, so I have been researching coding test platforms. The problems there that cannot be solved with just a little thinking and some loops are about 80% covered by just 8 categories of problems:
- Optimization with overlapping subproblems (coin change, longest increasing subsequence, knapsack variants, matrix chain) - solved with Dynamic Programming (memoization/tabulation).
- Pathfinding, connectivity, or level-order processing in grids/graphs (number of islands, shortest path in maze, rotten oranges) - solved with Graph Traversal: BFS and DFS.
- Hierarchical data processing (tree diameter, in order traversal, BST validation, lowest common ancestor) - solved with Tree Traversals and Properties (pre/in/post-order, level-order, recursion/DFS on trees).
- Exhaustive search with pruning (permutations, subsets, N-Queens, sudoku solver, word search) - solved with Backtracking.
- Local optimal choices that work globally (jump game, task scheduler, fractional knapsack) - solved by Greedy Algorithms.
- Efficient "top-k" or priority processing (merge k sorted lists, kth largest element, task scheduler with cooldown) - solved with Heaps / Priority Queues.
- Efficient union of sets / cycle detection in graphs (redundant connection, number of provinces, Kruskal's MST) - solved with Union-Find (Disjoint Set Union) with path compression & union-by-rank.
- Searching in sorted/monotonic spaces (search in rotated sorted array, minimum in rotated array, binary search on answer for capacity problems) - solved by Binary Search (advanced variants).
Bonus:
- Efficient prefix and pattern matching - solved by Trie and KMP String Search
- Efficient range queries and updates - solved by Segment Tree and Fenwick Tree
- Shortest paths with negative weights and all-pairs - solved by Bellman-Ford and Floyd-Warshall
Today I cover greedy algorithms.
When to use Greedy algorithms
Recognize the problems that can be solved by greedy algorithms by the following signs:
- The problem asks for a maximum or minimum value under constraints (e.g., max profit, min jumps, min number of something, max items selected).
- A locally optimal choice at each step leads to a globally optimal solution (greedy choice property).
- Making the best immediate decision never forces you to regret it later (no need to backtrack or reconsider past choices).
- The problem has optimal substructure but greedy avoids full dynamic programming (DP) by using a simple rule or priority.
- Sorting or prioritizing by a specific key (end time, value/weight ratio, farthest reach, density) unlocks the optimal solution.
- Brute force or DP would be too slow (exponential or quadratic), but a greedy rule gives linear or n log n time.
Classic patterns appear: activity selection, interval scheduling, fractional knapsack, coin change (canonical coins), jump game reachability, task scheduling with cooldown.
An algorithm is greedy when it follows the problem-solving heuristic of making the locally optimal choice at each stage. In many problems, a greedy strategy does not produce an optimal solution, but a greedy heuristic can yield locally optimal solutions that approximate a globally optimal solution in a reasonable amount of time.
Classic greedy algorithm problems:
- Jump Game I & II
- Task Scheduler
- Fractional Knapsack
- Activity Selection / Non-overlapping Intervals
- Huffman Coding
- Minimum Number of Platforms (train scheduling)
One sentence rule: "If I can always pick the 'best' available option right now according to some clear criterion, and doing so never hurts the final answer, it's probably greedy."
Example - Activity selection
The classic "poster problem" for greedy algorithms is Activity Selection (also known as Interval Scheduling).
Problem statement: You are given n activities with their start time and finish time. Select the maximum number of activities that can be performed by a single person, assuming that a person can only work on a single activity at a time. Return the maximum number of activities that can be selected.
Example: activities = [[1,4], [3,5], [0,6], [5,7], [3,8], [5,9], [6,10], [8,11], [8,12], [2,14], [12,16]]
Output: 4 (one possible selection: [1,4], [5,7], [8,11], [12,16])
Constraints:
- 1 <= n <= 10^5
- 0 <= start < finish <= 10^9
Let's see what a naive brute-force solution would look like:
var countOps = 0;
// Extremely slow - exponential time
int MaxActivitiesNaive(int[][] activities)
{
var n = activities.Length;
var max = 0;
// Generate all 2^n subsets (impossible for n>20)
for (var mask = 0; mask < (1 << n); mask++)
{
List<int[]> selected = [];
for (var i = 0; i < n; i++)
{
if ((mask & (1 << i)) != 0)
{
selected.Add(activities[i]);
}
}
countOps++;
if (IsNonOverlapping(selected))
{
max = Math.Max(max, selected.Count);
}
}
return max;
}
bool IsNonOverlapping(List<int[]> acts)
{
if (acts.Count <= 1) return true;
// Sort by start time
acts.Sort((a, b) => a[0].CompareTo(b[0]));
// Check consecutive pairs
for (var i = 1; i < acts.Count; i++)
{
var (prevEnd, currStart) = (acts[i - 1][1], acts[i][0]);
// If previous ends strictly after current starts -> overlap (touching endpoints are fine)
if (prevEnd > currStart)
{
return false;
}
}
return true;
}
int[][] activities = [[1, 4], [3, 5], [0, 6], [5, 7], [3, 8], [5, 9],
[6, 10], [8, 11], [8, 12], [2, 14], [12, 16]];
Console.WriteLine(MaxActivitiesNaive(activities));
Console.WriteLine(countOps);
This code will display 4, as expected, and then display 2048, the number of IsNonOverlapping checks (one per subset).
Why it fails:
- Time complexity O(2^n * n log n), which is useless for n > 20
- No smart way to prune invalid subsets early
Now let's check the optimal Greedy Solution:
- Sort activities by finish time (earliest ending first).
- Then greedily pick the next activity that starts after the previous one ends.
var countOps = 0;
int MaxActivities(int[][] activities)
{
if (activities == null || activities.Length == 0) return 0;
// Sort by finish time - this is the greedy choice
Array.Sort(activities, (a, b) => a[1].CompareTo(b[1]));
countOps += (int)Math.Ceiling(activities.Length * Math.Log(activities.Length));
var count = 1;
var lastEnd = activities[0][1]; // finish time of first selected
for (var i = 1; i < activities.Length; i++)
{
countOps++;
// Can take this activity if it starts after last selected ends
if (activities[i][0] >= lastEnd)
{
count++;
lastEnd = activities[i][1];
}
}
return count;
}
int[][] activities = [[1, 4], [3, 5], [0, 6], [5, 7], [3, 8], [5, 9],
[6, 10], [8, 11], [8, 12], [2, 14], [12, 16]];
Console.WriteLine(MaxActivities(activities));
Console.WriteLine(countOps);
This displays a countOps of 37.
This is optimal because:
- Time: O(n log n) due to sorting + O(n) pass
- Space: O(1) extra (or O(log n) for sort)
- Correctness: Proof relies on "greedy stays ahead" or "exchange argument"
- Earliest finish leaves maximum room for future activities
- Any optimal solution can be transformed into this greedy one without decreasing size
This problem is the "hello world" of greedy algorithms:
- Simple to understand
- Shows why greedy works when many people think dynamic programming (DP) is needed
- Easy to prove correctness in interviews
- Appears in countless variations (meeting rooms, non-overlapping intervals, weighted interval scheduling -> DP)
Theory
OK, you will hate me for this, but I found it an interesting tidbit. And if your interviewer knows what a matroid is, you're either applying to a really good company or you're totally screwed. Anyway...
Greedy algorithms and matroids are very closely related in a precise mathematical way, as matroids provide one of the cleanest and most general explanations for when and why greedy algorithms work. But first we need some quick definitions:
Greedy algorithm: At each step, make the locally optimal choice (pick the best-looking option right now) according to some criterion, hoping it leads to a globally optimal solution.
Matroid: A mathematical structure that generalizes the notion of "independence" (like linear independence in vector spaces or acyclic sets in graphs).
It has two key axioms:
- Hereditary property: If a set is independent, all its subsets are independent.
- Exchange property: If two independent sets A and B exist and |A| < |B|, then there exists an element of B not in A that can be added to A while keeping it independent.
The fundamental theorem (the relationship between them) was developed by Jack Edmonds and others in 1971: A greedy algorithm correctly finds the maximum-weight independent set in a weighted set system if and only if the set system is a matroid.
In other words:
- If the problem can be modeled as finding a maximum-weight independent set in a matroid, then the greedy algorithm (sort by weight descending, add if it keeps independence) is guaranteed to give the optimal solution.
- If greedy works correctly on a problem, then the underlying structure is usually a matroid (or can be reduced to one).
Greedy fails when not a matroid:
- 0/1 Knapsack: Greedy by value/weight ratio can fail - no matroid structure (doesn't satisfy exchange)
- General coin change (non-canonical denominations): Greedy can give suboptimal - not a matroid
- Traveling Salesman Problem: Greedy nearest neighbor fails badly - no matroid
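To see the coin change failure concretely, here is a minimal sketch (MinCoinsGreedy is a hypothetical helper of mine, not from any library) of the take-the-largest-coin-that-fits strategy, applied to canonical and non-canonical denominations:

```csharp
// Hypothetical helper: the classic greedy for coin change -
// always take the largest coin that still fits the remaining amount.
int MinCoinsGreedy(int[] coins, int amount)
{
    Array.Sort(coins);
    Array.Reverse(coins); // largest denomination first
    var count = 0;
    foreach (var c in coins)
    {
        count += amount / c; // take as many of this coin as fit
        amount %= c;
    }
    return amount == 0 ? count : -1; // -1: greedy could not reach the amount
}

// Canonical denominations (like most real currencies): greedy is optimal.
Console.WriteLine(MinCoinsGreedy([25, 10, 5, 1], 30)); // 2 (25 + 5)

// Non-canonical denominations {1, 3, 4}, amount 6: greedy takes
// 4 + 1 + 1 = 3 coins, but the optimum is 3 + 3 = 2 coins.
Console.WriteLine(MinCoinsGreedy([1, 3, 4], 6)); // 3, not the optimal 2
```

The greedy answer is not wrong as in "invalid", it is just not minimal - exactly the kind of silent suboptimality that makes these problems treacherous.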
The matroid is the set of rules, not anything in the problem statement, so that might be confusing. Let's simplify that for the five-year-old mind of a software developer. You have a pile of toys, and a matroid is a set of rules that tells you:
- if any handful of toys is considered "good", then any smaller handful taken from it is also "good". There is no way you can split a "good" pile of toys and get a "bad" one.
- if one "good" handful is smaller than another "good" handful, you can always find a toy in the bigger one that you can add to the smaller one and keep it "good".
A word of caution about our example above: the toys are the activities and the "good" handfuls are the sets of non-overlapping activities, but this system actually fails the exchange property (compare one long activity against two short ones that both overlap it - nothing can be moved over), so interval scheduling is not a matroid. The earliest-finish greedy is instead proven correct with the exchange argument / "greedy stays ahead" technique mentioned earlier. Matroids cover cases like Kruskal's minimum spanning tree, where the "good" sets are the acyclic edge sets.
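The theorem above translates into one short generic algorithm. Here is a sketch of it (MatroidGreedy and its isIndependent predicate are mine, purely illustrative): sort elements by weight descending, keep each one only if the set stays independent.

```csharp
// Generic matroid greedy: sort by weight descending, add each element,
// drop it again if it breaks independence. isIndependent stands in for
// the matroid's rules.
List<T> MatroidGreedy<T>(IEnumerable<T> elements, Func<T, int> weight,
    Func<List<T>, bool> isIndependent)
{
    var result = new List<T>();
    foreach (var e in elements.OrderByDescending(weight))
    {
        result.Add(e);
        if (!isIndependent(result))
        {
            result.RemoveAt(result.Count - 1); // e breaks independence - skip it
        }
    }
    return result;
}

// Toy independence rule (a "uniform matroid"): any set of at most 2
// elements is independent. Greedy then keeps the 2 heaviest elements.
var picked = MatroidGreedy(new[] { 5, 1, 9, 3 }, x => x, set => set.Count <= 2);
Console.WriteLine(string.Join(",", picked)); // 9,5
```

With graph edges as elements, weights sorted ascending instead, and "independent" meaning acyclic, this same skeleton becomes Kruskal's MST.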
Greedy algorithm template
And even if the name indicates something very generic, there is in fact a "template" of greedy algorithms:
- define the greedy choice criterion - in our case, the activity finish time. Other examples:
- Earliest finish time
- Highest value/weight ratio
- Maximum jump distance
- Smallest end time
- sort or prioritize the elements - this is the most common first step
- sort arrays or use a priority queue if dynamic selection is needed
- initialize result/tracking variables:
- result count = 0
- current end time / current capacity / current position / etc. = starting value
- sometimes a "last selected" pointer or index
- Iterate through the elements and for each:
- Check if adding it is safe / possible / non-conflicting (greedy check: does it fit with previous choices?)
- If yes, add it to solution (increment count, update end time/remaining capacity/position)
- If no, skip it
- Return the result
Variations of the template
- No sorting needed (pure linear pass)
- Example: Jump Game I & II
- Track farthest reachable, update when you hit current end
- Priority queue instead of sort
- Example: Task Scheduler, Huffman coding
- Repeatedly pick the max/min element from heap
- Two pointers or sliding window
- Some greedy problems (longest substring, min platforms) use greedy + two pointers instead of explicit sort
In other words: prioritize by the magic key, then greedily take every valid next item that doesn't break the constraints.
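As a sketch of the first variation (no sorting, pure linear pass), here is Jump Game II, assuming the usual formulation: nums[i] is the maximum jump length from index i, and we want the minimum number of jumps to reach the last index.

```csharp
// Jump Game II: greedy linear pass, no sorting. Track the farthest index
// reachable with one more jump; when we hit the end of the current jump's
// window, we commit a jump and extend the window.
int MinJumps(int[] nums)
{
    var jumps = 0;
    var currentEnd = 0; // end of the range reachable with `jumps` jumps
    var farthest = 0;   // farthest index reachable with one more jump
    for (var i = 0; i < nums.Length - 1; i++)
    {
        farthest = Math.Max(farthest, i + nums[i]);
        if (i == currentEnd) // current window exhausted: take a jump
        {
            jumps++;
            currentEnd = farthest;
        }
    }
    return jumps;
}

Console.WriteLine(MinJumps([2, 3, 1, 1, 4])); // 2 (index 0 -> 1 -> 4)
```

Note that the greedy decision here is implicit: we never pick which index to jump to; we only commit to "some jump happened" once the current window runs out, which is why no sorting or heap is needed.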
Example - Task scheduler
Let's try to solve a more difficult problem. Let's call it Task Scheduler.
Problem statement: You are given n tasks represented by an array tasks, where tasks[i] identifies the type of task i (equal values mean the same task type). You must schedule all tasks on a single CPU with these rules:
- The CPU can only work on one task at a time, and each task takes one unit of time to execute.
- After finishing a task, there must be at least cooldown units of time before you can start the same task again (if there are identical tasks).
- Tasks are not all unique - there can be duplicates (same task type).
You want to find the minimum total time needed to complete all tasks (including idle time). Return the minimum time units required.
Example: tasks = [1,1,2,2,3], cooldown = 2 - Output: 5
One possible schedule: 1,2,3,1,2
Total time = 5 (no idle slots)
int TaskScheduler(int[] tasks, int cooldown)
{
// Step 1: Count the frequency of each task type (since same value = same type)
// We use a dictionary to store how many instances of each task value there are
var freq = new Dictionary<int, int>();
foreach (var t in tasks)
{
freq[t] = freq.GetValueOrDefault(t, 0) + 1;
}
// Step 2: Create a max-heap (priority queue) to always pick the task type with
// the most remaining instances. We use a custom comparer to make it max-heap
// based on count (higher count first)
var pq = new PriorityQueue<(int count, int task), int>(
// Max-heap: higher count has higher priority
Comparer<int>.Create((a, b) => b.CompareTo(a))
);
// Enqueue all task types with their initial counts
foreach (var kv in freq)
{
// Enqueue (count, task), prioritized by count
pq.Enqueue((kv.Value, kv.Key), kv.Value);
}
// Step 3: Create a min-heap for tasks on cooldown
// This tracks when a task type becomes available again after cooldown
// (timeAvailable, task), prioritized by timeAvailable (earliest first)
var cooldownQueue = new PriorityQueue<(int timeAvailable, int task), int>();
// Step 4: Simulate time starting from 0
var time = 0;
// Loop until no more tasks in ready queue or cooldown queue
while (pq.Count > 0 || cooldownQueue.Count > 0)
{
// Increment time by 1 each cycle (each unit represents a potential work or idle slot)
time++;
// Step 5: Move any tasks whose cooldown has ended back to the ready priority queue
// Check the earliest cooldown task and dequeue if its timeAvailable <= current time
while (cooldownQueue.Count > 0 && cooldownQueue.Peek().timeAvailable <= time)
{
var (availTime, task) = cooldownQueue.Dequeue(); // Remove from cooldown
var remaining = freq[task]; // Get current remaining count for this task
if (remaining > 0)
{
// Re-add to ready queue if still work left
pq.Enqueue((remaining, task), remaining);
}
}
// Step 6: If there is a ready task (pq not empty),
// execute one instance of the most frequent one
if (pq.Count > 0)
{
var (count, task) = pq.Dequeue(); // Get the task with highest remaining count
count--; // Execute one instance (decrement remaining)
freq[task] = count; // Update frequency
// If still remaining instances, put it back on cooldown
// Available again after current time + cooldown + 1
// (since time already includes this execution)
if (count > 0)
{
cooldownQueue.Enqueue((time + cooldown + 1, task), time + cooldown + 1);
}
}
// Else: No ready task -> CPU idles this time unit
// (time already incremented, so idle is counted)
}
// The final time is the minimum total units needed (work + idle)
return time;
}
int[] tasks = [1, 1, 2, 2, 3];
int cooldown = 2;
Console.WriteLine(TaskScheduler(tasks, cooldown));
Here's what makes this solution greedy:
- It makes locally optimal choices at each step - at every time unit where the CPU can work (when pq is not empty), the code always picks the task type with the highest remaining count (the "best" available choice right now).
- This is done via the max-heap (pq):
var (count, task) = pq.Dequeue(); - Picking the task with the most remaining instances is the greedy decision; it focuses on the current bottleneck (the type that still has the most work left).
- The greedy choice property holds (or is strongly believed to) - the key insight is: always working on the task type that has the most unfinished work (when it's ready) minimizes the total time, because it reduces the chance of being stuck waiting for cooldowns later.
- This is the "never regret" part: choosing the highest-remaining task early tends to balance the load and avoid long idle periods caused by waiting for one type to become available again.
- In practice, this heuristic works very well for this problem (and is the standard accepted greedy strategy).
- Irreversible decisions, no backtracking - once a task instance is executed at a certain time, the code never undoes it or reconsiders it.
- No recursion, no trying alternative branches, no memoization table, just forward progress.
- After executing:
count--; freq[task] = count; if (count > 0) cooldownQueue.Enqueue(...); - the choice is committed, and the task goes on cooldown if needed.
- Uses a priority queue to enforce the greedy rule. The max-heap is the classic greedy tool here:
- It guarantees that at every decision point, you get the "best" (highest remaining count) ready task instantly.
- This replaces sorting the entire input upfront (which wouldn't work well because tasks become ready dynamically over time).
- Overall algorithm is greedy-driven simulation - the loop simulates time tick by tick.
- At each tick:
- Move expired cooldown tasks back to ready (pq)
- If anything is ready, greedily execute the one with most remaining work
- If nothing ready, idle (time++ anyway)
- This is a greedy event-driven simulation: always do the currently best possible action.
So it repeatedly chooses the currently best option (task with most remaining instances) using a max-heap. It relies on the greedy intuition that prioritizing high-remaining tasks leads to the minimum total time (the greedy choice property). There is no exhaustive search, no backtracking, no DP table, just forward greedy decisions.
However, if you were working on a real piece of software, this would probably not get you promoted. And the reason is that this entire thing can be calculated mathematically.
int[] tasks = [1, 1, 2, 2, 3];
int cooldown = 2;
var maxFreq = 0;
var countMax = 0;
foreach (var f in tasks.GroupBy(t=>t).Select(t=>t.Count()))
{
if (f > maxFreq)
{
maxFreq = f;
countMax = 1;
}
else if (f == maxFreq)
{
countMax++;
}
}
var result = Math.Max(tasks.Length,(maxFreq - 1) * (cooldown + 1) + countMax);
Console.WriteLine(result);
maxFreq = the highest frequency of any single task type
countMax = how many task types have exactly that max frequency
Think of the schedule as consisting of "frames" or "cycles" separated by cooldown gaps. The task that appears most often creates the backbone of the schedule. Each pair of occurrences of that task needs at least cooldown idle slots between them. So for maxFreq occurrences you need (maxFreq - 1) gaps between them and each gap must be at least cooldown units long.
Therefore, you need (maxFreq - 1) * cooldown mandatory idle slots. Then add the maxFreq work slots for that frequent task, plus the work done by other tasks that can fill the gaps or after the last occurrence.
The tightest lower bound becomes: (maxFreq - 1) * (cooldown + 1) + countMax:
- (maxFreq - 1) gaps, each at least cooldown + 1 long (cooldown idles + 1 work slot for the next occurrence of the same task)
- plus countMax: the number of task types that share the max frequency (they all finish in the last cycle)
Finally take the max with total number of tasks (because you can't finish faster than the number of tasks if no idles are needed).
Conclusion
Greedy algorithms are useful when you can make an optimal decision at every step without fear of invalidating previous choices. In other words, independent choices. You need an insight into what the requirement actually is: sometimes "greedy algorithm" is just another way of saying "Oh! That's what you really want!". And if you think hard enough, you might not even need any greed, just computation.
That is why when you go on these algorithm interview sites, the problems in the greedy algorithm category have the most confusing descriptions and the most comments from people who didn't get what was asked of them.
Remember the matroid. In case of emergency, bring that up and see the other guy go silent.