In this month's column, I'll present an advanced solution to the graph maximum clique problem. The solution uses what's called a tabu algorithm, and I'll discuss how to design and test these algorithms. The idea of the maximum clique problem is to find the largest group of nodes in a graph that are all connected to one another. Take a look at the simple graph in Figure 1. The graph has nine nodes and 13 edges. The graph is unweighted (there are no priorities associated with the edges) and undirected (all edges are bidirectional). Nodes 2, 4 and 5 form a clique of size three. The maximum clique is the node set 0, 1, 3 and 4, which forms a clique of size four.
Figure 1 Graph for a Tabu Maximum Clique Algorithm
The maximum clique problem is interesting for several reasons. Although it's not apparent from the simple graph in Figure 1, the maximum clique problem is one of the most challenging in computer science. It arises in many important practical problems, such as social network communication analysis, where nodes represent people and edges represent messages or relationships. And the problem uses techniques such as greedy algorithms and tabu algorithms, which are important advanced programming techniques. Having a solution to the maximum clique problem in your personal library can be a useful practical tool, and understanding the algorithm used can add new techniques to your skill set.
This is the third column in a series that uses the maximum clique problem to illustrate advanced coding and testing techniques, but this column can be read without direct reference to the previous two. In my October column, "Graph Structures and Maximum Clique" (msdn.microsoft.com/magazine/hh456397), I described how to code an efficient data structure to hold a graph data structure in memory. In my November column, "Greedy Algorithms and Maximum Clique" (msdn.microsoft.com/magazine/hh547104), I described how a relatively simple algorithm can be used to find a solution to the difficult maximum clique problem.
In informal terms, a greedy algorithm is an algorithm that starts with a simple, incomplete solution to a difficult problem and then iteratively looks for the best way to improve the solution. The process is repeated until some stopping condition is reached. However, greedy algorithms typically have a weakness: They'll repeatedly generate the same solution over and over. Tabu algorithms are designed to deal with this weakness. The word tabu, sometimes spelled taboo, means forbidden. In simple terms, tabu algorithms maintain a list of forbidden data. The processing part of the algorithm isn't permitted to use tabu data until some prohibit time has passed. Simple forms of tabu algorithms use a fixed prohibit time. Advanced tabu algorithms adapt the prohibit time dynamically while the algorithm executes.
Figure 2 illustrates the important ideas of a tabu algorithm applied to the maximum clique problem and shows you where I'm headed in this column.
Figure 2 Tabu Maximum Clique Demo Run
I have a console application that begins by loading into memory the graph corresponding to the one shown in Figure 1. The file uses a standard graph file format called DIMACS. Designing and coding an efficient graph data structure is a significant problem in itself. I explained the graph data structure code in my October column.
The algorithm begins by selecting a single node at random and initializing a clique with that node. The algorithm then iteratively tries to find and add the best node that will increase the size of the clique. I'll explain what best node means later. If the algorithm can't find a node to add, it attempts to find the best node to drop from the clique. Behind the scenes, the algorithm remembers the last time each node was moved into the current solution clique or moved out from the clique. The algorithm uses this information to prohibit adding or dropping recently used nodes for a certain amount of time. This is the tabu part of the algorithm.
The algorithm restarts itself every so often when no progress has been made in finding a better (larger) clique. The algorithm silently stores the solutions (cliques) it has previously seen. The algorithm uses that solution history information to dynamically adapt the prohibit time. If the algorithm keeps encountering the same solutions, it increases the prohibit time to discourage recently used nodes from being used. If the algorithm doesn't see the same solutions, it decreases the prohibit time so there's a larger pool of nodes from which to choose. This is the adaptive part of the tabu algorithm.
In most situations where a greedy algorithm is used, the optimal solution is not known, so one or more stopping conditions must be specified. When the algorithm hits a stopping condition, the best solution found is displayed.
In the sections that follow, I'll walk you through the code that produced the screenshot in Figure 2. The complete source code is available at code.msdn.microsoft.com/mag201112TestRun. This column assumes you have intermediate-level programming skills with a C-family language or the Visual Basic .NET language. I use C#, but I've written the code so that you'll be able to refactor to another language such as F# or Python without too much difficulty if you want.