Published on

Topological Sort

Authors
  • avatar
    Name
    Jyotir Sai
    Twitter
    Engineering Student

Let's say you have to take a number of courses and you are given the prerequisites for each course. Is it possible to finish all of the courses? This problem, known as Course Schedule on Leetcode is an excellent problem to learn how to implement the Topological Sort algorithm. Before learning Topological Sort, you should probably understand the basics of graphs as well as breadth-first search.

Why Topological Sort?

Topological sort is an algorithm that orders the nodes in a graph. The graph must be directed i.e. the edges that connect nodes should have a direction. In the case of the Course Schedule problem, we can make a directed graph to represent the problem. Let's say I have 4 courses and an array of prerequisites as follows: [[1,2], [1,3], [2,4]]. For each entry in the array, we are first given the course followed by the required prerequisite. For example, course 1's prerequisites are courses 2 and 3. Course 2's prerequisite is course 4. Course 3 and 4 do not have any prerequisites. We can easily represent this array as a directed graph:

When the tip of an arrow touches a node, we'll call that node the child node. The node that touches the tail of an arrow will be called the parent. In this example, the parent node represents the prerequisite to a course (child node). As we can see in the above graph, if we want to take course 1, we'll first have to satisfy the prerequisite courses 2 and 3. If we need to take course 2, we first need to complete course 4. So in order to take course 1, I must first take course 4 -> course 2 (or course 3) -> course 3 (or course 2) -> course 1. This can be represented in an array as: [4, 2, 3, 1] or [4, 3, 2, 1].

What about the case where it isn't possible to finish all of the courses? This happens when there is a circular dependency, for example let's say we have 2 courses and the following prerequisite array: [[1,0],[0,1]]. Course 1 depends on Course 0 but Course 0 also depends on Course 1. We have a cycle so there isn't a correct order. Therefore, our result is just an empty array [].

Why topological sort? Applying topological sort to this problem allows us to generate this result array.

How does it work?

Here are the steps in the topological sort algorithm:

  1. Create a graph of all the nodes and count how many parents they have
  2. Add nodes with no parents into a queue
  3. Pop node from queue and add to our result array
  4. Subtract the parent count of the children
  5. If parent count of children is 0, add to queue
  6. Repeat steps (2-5) until queue is empty

In the context of the Course Schedule problem, the parent nodes are the prerequisite courses. In step 1, we want to build the graph and count how many parents each node has. We can use a hash map to make the following:

As you can see, graph is a hash map where the key represents each course while the value is an array of all of the nodes it points to. So in this problem, the keys of the hashmap are the prerequisites and the values in the array are the courses they are a prerequisite for (ex. course 4 is a prereq for course 2). How can we make this with Python? It's pretty simple. I'll first make a hashmap where each key represents all of our courses, along with an empty array where we will add the node they point to.

graph = {node: [] for node in range(numCourses)}

The numCourses variable simply states how many courses there are, so if we have 4 courses, there will be 4 keys in the graph hash map. Now, for each prerequisite we'll add the courses they are a prerequisite for. I can do this with a for loop:

graph = {node: [] for node in range(numCourses)}

for (node, parent) in prerequisites:
    graph[parent].append(node)

To recall, each entry in the prerequisites array represents the course along with its prerequisite. That's all the code we need to make the graph. In step 1, we also need the count of how many parents each node has. We'll first initialize another hash map called count_parents where the key will be each course and the value will be how many parents they have.

...
count_parents = {node: 0 for node in graph.keys()}

How do we find the parent count for each node?

...
count_parents = {node: 0 for node in graph.keys()}
for parent in graph.keys():
    for child in graph[parent]:
        count_parents[child] += 1

In our graph hash map, the keys represent the parent. So we set up one for loop to go through each parent. The inner for loop then loops through each of the parent's child nodes and increments their parent count by 1. We then get a count_parents hash map that looks as follows:

count_parents = { 1: 2, 2: 1, 3: 0, 4: 0 }

That's it for step 1.

For step 2, we need to add in nodes with no parents into our queue. In our case, that is nodes 3 and 4. They are the courses with no prerequisites.

from collections import deque

...

q = deque()
for node in count_parents.keys():
    if count_parents[node] == 0:
        q.append(node)

We are using Python's deque data structure to implement our queue. It allows O(1) popping from the beginning of the queue using the popleft method. To add nodes with no parents into our queue, we simply iterate through the keys of the count_parents hash map and check which nodes have a count of 0.

For the next couple of steps, we need to repeat them inside a while loop that runs as long as our queue isn't empty. Inside the while loop, we will pop a node out of the queue and add it to our result array.

...
res = [] # we will store the correct ordering of nodes in here
while q:
    parent = q.popleft()
    res.append(parent)

Now, we need to subtract the parent count of the parent's children. If the parent count is 0, add it to our queue.

...
res = [] # we will store the correct ordering of nodes in here
while q:
    parent = q.popleft()
    res.append(parent)

    for child in graph[parent]:
        count_parents[child] -= 1

        if count_parents[child] == 0:
            q.append(child)

In this problem, when the length of the result array is equal to the number of courses it means it is indeed possible to finish all of the courses. We have a correct ordering that includes all of the courses so we can return True. If our result array is less than the number of courses, it means we were not able to find an ordering that includes all of the courses. Therefore, it is not possible to finish all of the given courses.

So to finish the problem, we simply need to check if the results array equals the number of courses. If it does, then we can return True otherwise we return False.

...
res = [] # we will store the correct ordering of nodes in here
while q:
    parent = q.popleft()
    res.append(parent)

    for child in graph[parent]:
        count_parents[child] -= 1

        if count_parents[child] == 0:
            q.append(child)

return len(res) == numCourses

Here's the full version of the code.

from collections import deque

graph = {node: [] for node in range(numCourses)}

for (node, parent) in prerequisites:
    graph[parent].append(node)

count_parents = {node: 0 for node in graph.keys()}
for parent in graph.keys():
    for child in graph[parent]:
        count_parents[child] += 1

q = deque()
for node in count_parents.keys():
    if count_parents[node] == 0:
        q.append(node)

res = [] # we will store the correct ordering of nodes in here
while q:
    parent = q.popleft()
    res.append(parent)

    for child in graph[parent]:
        count_parents[child] -= 1

        if count_parents[child] == 0:
            q.append(child)

return len(res) == numCourses

Here's a graphical explanation of what's going on.