Depth-first search

You can think of BFS like a "layered" search, where we visit the start vertex at distance 0, then the next "layer" of vertices at distance 1, then the next layer at distance 2, and so on. We really need the queue to keep track of which vertex we search from next.

Instead, you can search the graph as if it were a maze. That's depth-first search, or DFS. You keep going deeper and deeper in the maze, making choices at the various forks in the road, until you hit a dead-end. Then you back up to one of the choice points and make a different choice. If you're fairy-tale clever, you'll scatter some bread crumbs behind you, so that you can see if you've come back to the same point and are just walking in circles. Then you'd want to do the same thing as if you hit a dead end.

Whereas BFS keeps track of vertices on a queue, DFS uses a stack. We can either use our own stack, or take advantage of the run-time stack and write DFS recursively. I prefer to use the run-time stack, because I find recursive code more natural to understand than maintaining a data structure. So here is recursive pseudocode for DFS:

visit(s);

void visit(u) {
  do whatever it means to start a visit of u;
  for (each vertex v that is adjacent to u)
    if (v has not been visited yet)
       visit(v);
  do whatever it means to finish a visit of u;
}

Now, we can use a Set to keep track of which vertices have been visited. Or we can record depth-first distances, but they're not very interesting.

What's more interesting are discovery times times and finishing times for vertices. Keep a static integer time, and when we start or finish a visit of a vertex, record the current value of time and then increment time:

time = 0;
visit(s);

void visit(u) {
  u's discovery time = time;
  time++;
  for (each vertex v that is adjacent to u)
    if (v has not been visited yet)
       visit(v);
  u's finishing time = time;
  time++;
}

I'll go through an example in class.

Like BFS, DFS visits each vertex and edge at most once, and so its running time is O(n + m).

Why are discovery and finishing times helpful? For one thing, we can use them to perform a topological sort of a directed acyclic graph.

Topological sorting

A directed graph is acyclic if it contains no directed cycles. We call a directed acyclic graph a dag. Sometimes, we want to find a linear order of the vertices of a dag such that all edges in the dag go in the direction of the linear order. That is a topological sort of the dag. For example, if we order all the vertices from left to right, then all directed edges should go from left to right.

Here's a dag that represents dependencies for putting on goalie equipment in ice hockey:

An edge (u, v) indicates that you have to put on article u before putting on article v. For example, the edge from chest pad to sweater says that you have to put on the chest pad before the sweater. On the other hand, some articles have no particular relative order. For example, it doesn't matter whether you put on compression shorts before or after the T-shirt.

You can put on only one article at a time, and so you'd like to find a linear order so that you don't have to undo having put on one article in order to put on another. In other words, you'd like to topologically sort this dag of goalie gear. Here's one possible order, given by the numbers next to the vertices:

So undershorts would go on first, then socks, compression shorts, hose, cup, pants, skates, leg pads, T-shirt, chest pad, sweater, mask, catch glove, and finally the blocker. That's not the only possible order. Here's another: T-shirt, socks, chest pad, undershorts, compression shorts, cup, hose, pants, sweater, mask, skates, leg pads, catch glove, blocker.

Here is an incredibly simple algorithm to topologically sort a dag:

  1. Run DFS on the dag to compute the finishing times for all vertices.
  2. As each vertex is finished, insert it onto the front of a linked list.
  3. The linked list contains the topologically sorted vertices, in order.

This algorithm runs in O(n + m) time. Why? Step 1, DFS, takes O(n + m) time. Step 2, inserting each vertex onto the front of a linked list, takes constant time per vertex, or O(n) time altogether. Step 3 is really just saying that the result of step 2 is the answer. So the entire algorithm takes O(n + m) time.

I'll go through an example for the goalie-gear dag in class.