
Summary
Patterns
One of the biggest fears candidates have is that coding interview questions rely on “blind insights”—fancy tricks you couldn’t possibly know unless you’d seen the answer already. But these insights aren’t blind. They come from knowledge of a small set of patterns that come up again and again. If you can learn the patterns, you can unlock the insights. Today, I’m going to teach you the most common pattern: hashing. First, a couple disclaimers:- To really understand what we’re doing here, you’ll want a working understanding of big O notation. If you’re a bit rusty, you might want to read up on big O.
- We’ll be writing Python code, but we’ll avoid some fancy “Pythonisms” in favor of making our procedure as transparent as possible. Think of it as pseudocode that happens to be runnable Python.
Problem 1: Mode
Given an array of numbers, return the mode—the number that appears the most times. This is a common programming interview question. And as with many great interview questions, there are at least three ways to solve it.Solution 1: Brute Force
By “brute force” we mean “looking at each possible answer.“ In this case, that’s looking at each number and getting its count by looping through the array again. From all these counts, we return the number with the max count.def get_mode_brute_force(nums):This’ll work, and it’s an okay place to start. It’ll take O(n²) time, since we’re nesting two loops that each go through all n items in our array. How could we bring down that time cost? We’d need to avoid looping through the array twice… But how can we know how many times each number appears, without looping through the whole array once for each number?mode = None
max_count = 0
for potential_mode in nums:
#get its count
count = 0
for num in nums:
if num == potential_mode:
count += 1
#is it a new mode?
if count > max_count:
max_count = count
mode = potential_mode
return mode
Solution 2: Sorting
If we sort the array, all occurrences of a number will be next to each other. So in just one pass, we can count the occurrences for each number as we encounter it.def get_mode_sorting(nums):#edge cases
if len(nums) == 0:
return None
elif len(nums) == 1:
return nums[0]
This’ll cost O(nlogn) time in total. That’s faster than O(n²), so it’s an improvement. But the driver of our time cost is that first sorting step, which costs O(nlogn) time. If we avoid sorting, we can bring the time cost all the way down to O(n).sorted_nums = sorted(nums)
mode, previous_num = None, None
max_count, current_count = 0, 0
for current_num in sorted_nums:
#update count
if current_num == previous_num:
current_count += 1
#new mode?
if current_count > max_count:
max_count = current_count
mode = current_num
# reset for next iteration
if current_num != previous_num:
current_count = 1
previous_num = current_num
return mode
Solution 3: Hashing
Let’s take one walk through the array, collecting the counts for each number in a dictionary (called a “hash map,” “hash table,” or “hash” in many non-Python languages). That’s the key—using a dictionary. They have O(1)-time insertions and lookups (on average). So we can collect the counts for all of our numbers in just one pass, by updating each number’s count in our dictionary as we go.def get_mode_hashing(nums):This takes O(n) time, which is the optimal runtime for this problem! And we unlocked it by exploiting the O(1)-time insertions and lookups that dictionaries give us. Using a dictionary or similar data structure to save time is so often the “trick” or “blind insight” needed to get the optimal answer that it should just always be our first thought in a coding interview. To help us build an intuition for how to apply hashing to other problems, we’ll look at a couple more examples.# keys: numbers in our input array
# values: the number of times that key number appears
# e.g.: if 5 appears 3 times, then counts[5] == 3
counts = {}
# step 1: get the counts
for num in nums:
if num in counts:
counts[num] += 1
else:
counts[num] = 1
# step 2: get the number with the max count
max_count = 0
mode = None
for num, count in counts.iteritems():
if count > max_count:
max_count = count
mode = num
return mode
Problem 2: Repeat Number
Given an array of numbers where one number appears twice, find the repeat number. If we make hashing our first thought, we can derive the optimal solution pretty quickly! We could even use the exact same “counts” dictionary again, returning a number as soon as its count crept above 1. But we don’t really need the count for each number—we just need to know whether it has appeared at all. So it’s a bit cleaner to use a set in this case. The idea is similar—under the hood, sets use “hashing” to get O(1)-time insertions and lookups, just like hash maps. In fact, the common set data structure in Java is called “HashSet.”def get_repeat_number(nums):This takes O(n) time, which is the best we can do for this problem! And just as with our first problem, we could have used a brute force approach (nesting two loops) to get O(n²) time. Or we could have started off by sorting, to get O(nlgn) time. See? There are repeating patterns here! If the brute force and/or sorting solutions don’t come to you right away, try to derive them. Part of being great at algorithm design is being able to see the different ways to solve the same problem—including the less efficient ways!nums_seen = set()
for num in nums:
if num in nums_seen:
return num
nums_seen.add(num)
Problem 3: Sorting
Given an array of numbers in the range 1..1000, return a new array with those same numbers, in sorted order. There may be repeats in the input array. If there are, you should include those repeats in your sorted answer. First thought: hash maps! Let’s use that same “counts” dictionary we used in problem 1. Once we have the count for each number (which we can get in O(n) time) we just need to walk through the range 1..1000 once, adding each number to our “sorted” array x times, where x is its count from the counts dictionary.def sort_nums(nums):This is called a counting sort. Importantly, it only works well if we can “bound” our numbers into a certain small-ish range (in this case, 1..1000). Otherwise step 2 above could be quite inefficient. As with the previous two problems, we could use a “brute force” approach, by nesting two loops—we’d end up with a selection sort or an insertion sort, taking O(n²) time. See? Brute force is a pattern, too!counts = {}
# step 1: get the counts
for num in nums:
if num in counts:
counts[num] += 1
else:
counts[num] = 1
# step 2: generate the sorted array
sorted_nums = []
for num in range(1, 1001):
if num in counts:
for _ in range(counts[num]):
sorted_nums.append(num)
return sorted_nums