Direction-Optimizing BFS

Jul 31, 2020
#cs


Breadth-First Search (BFS) is an algorithm that is probably covered in every introductory algorithms course. It solves the single-source shortest path (SSSP) problem for unweighted graphs - given a graph with unweighted edges and a source node \(S\), the algorithm computes the shortest distance from \(S\) to each of the other nodes. It does this by exploring all the nodes that are a distance \(1\) away from \(S\), then all the nodes that are a distance \(2\), and so on.

Given its simplicity, the idea of BFS has been known for quite some time. According to Wikipedia, it was first proposed by Konrad Zuse in 1945, and later rediscovered numerous times.

Despite its simplicity, BFS has a wide range of applications. As a result, there has been a ton of research into optimizing BFS. These days, all of the optimizations take the form of improving cache locality, using multi-processing, and other minute optimizations to squeeze out as much performance as possible. Namely, there haven't been any innovations upon the algorithm itself. As you might imagine, all of the low-hanging fruit were eaten up a long time ago.

Or were they?

In 2012, a paper titled Direction-Optimizing Breadth-First Search was published by researchers at UC Berkeley. They were able to speed up BFS on real social network graphs by up to a factor of 4.6x! Did they quadruple the number of processors used in previous papers? Did they optimize the hell out of cache performance? No, they made a fundamental change to the algorithm itself.

At a high level, the BFS algorithm iteratively computes a sequence of frontiers, where the \(i\)th frontier contains all the nodes that are a distance \(i\) from \(S\). Given the \(i\)th frontier, there are two approaches to compute the \(i+1\)th frontier: top-down and bottom-up.

The top-down approach looks at all the neighbors of the nodes in the current frontier. Every neighbor that has not yet been visited must be part of the next frontier.

On the other hand, the bottom-up approach looks at all the nodes that have not been visited yet. For each node, if at least one of its neighbors is in the current frontier, then the node must be part of the next frontier.

The key insight the authors had was that a hybrid of these two approaches would be much faster than either of them individually. When the size of the current frontier is relatively small compared to the size of the entire graph, the bottom-up approach performs poorly because it visits a lot of nodes that do not end up in the next frontier. In other words, a lot of the work is wasted. However, when the current fronter contains a large percentage of all the nodes, then the top-down approach performs poorly because it revisits nodes that are neighbors of multiple nodes in the current frontier. In other words, a lot of work is redundant. So, the authors thought, why not use the top-down approach when the frontier is small and the bottom-up approach when the frontier is large?

That's (basically) what they did, and based on this algorithmic insight they were able to achieve a \(2-4x\) speed-up on real world graphs like the Facebook friend graph! What's just as impressive is that the direction-optmizing approach can also be applied to other graph algorithms, not just BFS.

I think that this paper is a great inspiration, because it shows that there are still fundamental advances that can be made in established topics and fields, even ones like BFS that are over 50 years old.

Update (8/23/2020)

I implemented the direction-optimizing BFS algorithm and compiled it to WebAssembly so you can run it in the browser (source code).

The code generates a synthetic Kronecker graph with the desired number of nodes and edges, and then uses each of the three BFS approaches to compute the distance from a node to all the other nodes.

You can play around with the number of nodes and edges in the graph, and compare how each BFS algorithm performs. Observe for which distributions of distances does direction-optimizing perform better than top-down, and vice versa.

Loading WebAssembly...may be unsupported by your browser.




Comment