Spielman and Teng’s 2004 paper [ST04] on an almost-linear-time Laplacian linear solver has been split and rewritten into three very technically involved papers: [ST08a], [ST08b], and [ST08c]. I plan to write memos for all of them; this post is Part I for the first paper [ST08a: A Local Clustering Algorithm for Massive Graphs and its Application to Nearly-Linear Time Graph Partitioning].
The algorithm in this paper is, at this moment, no longer the most efficient one; see for instance [Andersen-Peres’09], which uses evolving sets. Also, the earlier version [ST04] provided an algorithm with a provably weaker guarantee, but that one is much easier to understand. This post of mine, unfortunately, shoots for the harder [ST08a].
The goal of [ST08a] is to construct a local clustering algorithm, Nibble, with the following guarantee: for any sparse cut $S$, if one starts a random walk from a (typical) vertex inside $S$, one can almost recover the set, and the running time is almost linear in $\mathrm{vol}(S)$. One can then apply this local algorithm multiple times to construct Partition, which is essentially a pseudo-approximation to balanced cut. Partition is also of further use in [ST08b] to construct spectral sparsifiers for a graph. Formally,
Theorem 2.1 (Nibble). Nibble$(G, v, \phi, b)$ — taking a graph $G$, a starting vertex $v$, a target conductance $\phi$, and a size parameter $b$ — is an algorithm that runs in time $2^b \cdot \mathrm{poly}(\log m, 1/\phi)$, roughly proportional to the volume of the set it may output, and moreover (where $\mathrm{vol}(S) = \sum_{u \in S} d(u)$ is the volume function):
- (N.1) If the output $C$ is non-empty, then $\Phi(C) \le \phi$ and $\mathrm{vol}(C)$ is at most a constant fraction of $\mathrm{vol}(V)$.
- (N.2) Each set $S$ whose volume is at most a constant fraction of $\mathrm{vol}(V)$ and whose conductance is sufficiently small compared to $\phi$ has a subset $S^g$ such that, (a) $\mathrm{vol}(S^g) \ge \mathrm{vol}(S)/2$, and (b) for every starting vertex $v \in S^g$, if the output $C$ of Nibble is nonempty, then $\mathrm{vol}(C \cap S) \ge \mathrm{vol}(C)/2$.
- (N.3) Each such $S^g$ can be partitioned into subsets $S^g_1, \dots, S^g_{\ell}$ such that if $v \in S^g_b$, then the output of Nibble$(G, v, \phi, b)$ will not be empty.
In the earliest paper [ST04], the requirement on the conductance of $S$ was stronger; this has been improved in [ST08a].
The Algorithm for Nibble
Basically, Nibble computes the random-walk probability vectors for some number of initial steps. It starts with the indicator vector of $v$, applies the lazy-walk transition matrix at each step $t$, and then zeroes out every entry $u$ whose probability is smaller than $\epsilon \cdot d(u)$, where $\epsilon$ is a truncation parameter depending on $b$ and $\phi$. This truncation is what keeps the computation local.
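In code, one way to implement a single truncated lazy-walk step looks as follows (a sketch of my own; the name `truncated_walk_step` and the choice of `eps` are mine, not the paper's):

```python
import numpy as np

def truncated_walk_step(p, A, deg, eps):
    """One lazy random-walk step followed by truncation.

    p   : current probability vector (length n)
    A   : dense symmetric 0/1 adjacency matrix (n x n)
    deg : degree vector
    eps : truncation parameter (hypothetical value here; in the paper
          it depends on the parameters b and phi)
    Entries with p(u) < eps * d(u) are zeroed out; this is what keeps
    the walk, and hence the algorithm, local.
    """
    # lazy step: stay put with prob 1/2, move to a uniform neighbor otherwise
    p = 0.5 * p + 0.5 * (A @ (p / deg))
    p = p.copy()
    p[p < eps * deg] = 0.0
    return p
```

Note that after truncation the vector no longer sums to 1; the analysis of the paper has to account for this lost mass.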
At each step $t$, it sorts all vertices according to their normalized probabilities $p_t(u)/d(u)$ from large to small, and lets $S_j(p_t)$ denote the set of the $j$ vertices with the largest normalized probabilities. Then, Nibble will terminate and output some $S_j(p_t)$ if all of the following are satisfied:
- (C.1) $\Phi(S_j(p_t)) \le \phi$, i.e., the candidate cut is sparse,
- (C.2) $\mathrm{vol}(S_j(p_t))$ is at most a constant fraction of $\mathrm{vol}(V)$,
- (C.3) $\mathrm{vol}(S_j(p_t))$ is not too small (on the order of $2^b$), and
- (C.4) the first $2^b$ (sorted by normalized probabilities) (in terms of volume) vertices all have normalized probabilities at least a certain threshold (on the order of $2^{-b}$, up to polylog factors).
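To make the sorting step concrete, here is a small sketch (my own, not the paper's code) of the sweep that produces the candidate sets $S_j(p_t)$ and their conductances, which is what conditions (C.1)-(C.4) are then tested against:

```python
import numpy as np

def sweep_cut(p, A, deg):
    """Sort vertices by normalized probability p(u)/d(u) and return,
    for each prefix set S_j (the j vertices of largest normalized
    probability), the conductance cut(S_j) / min(vol(S_j), vol(V - S_j)).
    """
    order = np.argsort(-p / deg)
    total_vol = deg.sum()
    in_S = np.zeros(len(p), dtype=bool)
    cut, vol, conds = 0.0, 0.0, []
    for u in order[:-1]:              # the full vertex set is not a cut
        # edges from u into the current S become internal;
        # the rest of u's edges join the cut
        cut += deg[u] - 2 * A[u, in_S].sum()
        vol += deg[u]
        in_S[u] = True
        conds.append(cut / min(vol, total_vol - vol))
    return order, conds
```

Nibble would stop at the first time step and prefix where the conductance and volume conditions all hold simultaneously; the actual thresholds are paper-specific, so this sketch only computes the sweep.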
Of course, it won’t be an easy job to show that all of these conditions are likely to be satisfied; that is the duty of (N.3). Let us do the two easy things first, namely (N.1) and (N.2); since both of them assume that a non-empty set is returned, we automatically get (C.1-4) as part of the assumptions for free.
Proof of (N.1) and (N.2)
In fact, (N.1) immediately follows from (C.1) and (C.2), so there is nothing to say. It remains to show that (N.2) follows from (C.3) and (C.4).
It is quite famous in random-walk theory that, starting from the stationary distribution restricted to $S$, after $t$ steps of the lazy walk, the amount of probability mass that stays inside $S$ is lower bounded by $1 - t \cdot \Phi(S)/2$. (In fact, it is not hard to extend this to a bound of the form $(1 - \Phi(S)/2)^t$, which was used by [Gharan-Trevisan’12] to achieve a different local partitioning algorithm.) As a consequence, by Markov's inequality, for at least half of $S$ (measured by volume), if we start from that single vertex (rather than the stationary distribution on $S$), the probability mass that leaves $S$ is at most $t \cdot \Phi(S)$. Define this set of good starting vertices to be $S^g$.
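The retention bound can be checked numerically on a toy graph (my construction, not from the paper): two triangles joined by a single edge, with $S$ the left triangle, so $\mathrm{vol}(S) = 7$ and $\Phi(S) = 1/7$.

```python
import numpy as np

# Two triangles joined by one edge; S = {0, 1, 2} is the left triangle.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for u, w in edges:
    A[u, w] = A[w, u] = 1.0
deg = A.sum(axis=1)
S = [0, 1, 2]
vol_S = deg[S].sum()                 # 7
phi_S = 1.0 / vol_S                  # one cut edge, so Phi(S) = 1/7

# start from the stationary distribution restricted to S
p = np.zeros(n)
p[S] = deg[S] / vol_S

for t in range(1, 11):
    p = 0.5 * p + 0.5 * (A @ (p / deg))      # lazy walk step
    retained = p[S].sum()
    # classical lower bound on the mass still inside S after t steps
    assert retained >= 1 - t * phi_S / 2 - 1e-12
```

At $t = 1$ the bound is in fact tight for this graph: exactly $\Phi(S)/2$ of the mass escapes in the first step.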
Next, we show (N.2). If we start from such a vertex $v \in S^g$, then because there is a gap between the conductance requirement on $S$ in (N.2) and the total number of random-walk steps, the probability mass that leaves $S$ is upper bounded by a small constant. However, by (C.4), the first $2^b$ (sorted by normalized probabilities) (in terms of volume) vertices output by Nibble each carry a decent amount of probability mass; this means at most a constant fraction of them can come from outside $S$. By playing with the constants, one can ensure that at most half of the output's volume comes from outside, i.e., (N.2) is satisfied: $\mathrm{vol}(C \cap S) \ge \mathrm{vol}(C)/2$.
High Level Idea for (N.3)
Before going into the details of proving (N.3), I’d love to provide my own overview of the proof. Since this is a very technically involved proof, I am not sure how helpful my overview will be, but let me give it a try:
- First of all, the total number of random-walk steps should be decomposed into consecutive epochs, each containing the same number of steps.
- Claim 1: there exists a good epoch such that for all times $t$ in that epoch, (C.4) is satisfied. Intuitively, this is an epoch in which all of the first $2^b$ (in terms of volume) vertices are reached with decent probabilities, and at the same time, the epoch is chosen so that the probability mass is moving slowly at this moment. In other words, two things are satisfied simultaneously: the random walk mixes well inside some range, and it slows down in terms of leaking from inside to outside.
- Claim 2: within such an epoch, there are still plenty of choices of time steps at which to stop. One can show that if (C.1-3) all fail throughout this entire epoch, then the random walk mixes very quickly on the entire graph. This will contradict the choice of the epoch, which in effect says: you should have stopped the random walk earlier.
Notation on Integrated Probability
Now we need to make a definition for the integrated probability. A similar notion was first introduced by [Lovasz-Simonovits’90,’93]. For a probability vector $p_t$, assume for notational simplicity that the vertices are already sorted according to normalized probabilities, i.e., $p_t(1)/d(1) \ge p_t(2)/d(2) \ge \cdots$. I remark here that I will not distinguish between truncated probabilities and the real probabilities from now on.
Then, define $I_t(x)$ to be the piecewise-linear (concave) function with end points $I_t(0) = 0$, $I_t(d(1)) = p_t(1)$, $I_t(d(1) + d(2)) = p_t(1) + p_t(2)$, and so on. We would always have $I_t(\mathrm{vol}(V)) = 1$ if we never did truncations. As an exercise, (C.4) above can be rewritten as a lower bound on the derivative $I_t'(x)$ for all $x \le 2^b$.
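Concretely, computing the breakpoints of this curve from a probability vector takes only a few lines (my own sketch):

```python
import numpy as np

def ls_curve(p, deg):
    """Breakpoints (x_j, I(x_j)) of the Lovasz-Simonovits curve: sort
    vertices by normalized probability p(u)/d(u), then take running
    sums of degrees (x-axis, volume) and of probabilities (y-axis, mass)."""
    order = np.argsort(-p / deg)
    xs = np.concatenate(([0.0], np.cumsum(deg[order])))
    ys = np.concatenate(([0.0], np.cumsum(p[order])))
    return xs, ys
```

Evaluating the curve between breakpoints is just linear interpolation (`np.interp`), and concavity is automatic because the per-unit-volume increments $p(u)/d(u)$ are taken in decreasing order.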
It is not hard to verify that $I_{t+1}(x) \le I_t(x)$ for any $x$ (the lazy walk only flattens the curve, and truncation only lowers it). More interestingly,
Lemma 2.9 (informal). For “any” $x$, if we denote $\hat{x} = \min(x, 2m - x)$, then $I_{t+1}(x) \le \frac{1}{2}\big(I_t(x - \phi_x \hat{x}) + I_t(x + \phi_x \hat{x})\big)$.
Here by “any” I mean those values of $x$ that correspond to end points of the piecewise-linear function $I_t$, and $\phi_x$ is the conductance of the level set at the vertex corresponding to that endpoint.
This lemma was used in [Lovasz-Simonovits] to prove that, if every level set of the walk has conductance at least $\phi$, then the curve $I_t$ converges quickly to the straight line from $(0, 0)$ to $(2m, 1)$; the deviation shrinks geometrically in $t$, at a rate governed by $\phi^2$.
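As a sanity check (my toy example, not the paper's): on the complete graph $K_6$, where every cut has constant conductance, the curve collapses onto the straight line within a handful of lazy steps.

```python
import numpy as np

# Lazy random walk on the complete graph K6 (every cut has constant
# conductance), starting from a single vertex; watch the deviation of
# the Lovasz-Simonovits curve from the line x / 2m shrink geometrically.
n = 6
A = np.ones((n, n)) - np.eye(n)
deg = A.sum(axis=1)
two_m = deg.sum()
p = np.zeros(n)
p[0] = 1.0

def curve_deviation(p):
    """Max gap between the LS curve and the straight line x / 2m,
    measured at the curve's breakpoints."""
    order = np.argsort(-p / deg)
    xs = np.cumsum(deg[order])
    ys = np.cumsum(p[order])
    return np.max(ys - xs / two_m)

devs = [curve_deviation(p)]
for _ in range(5):
    p = 0.5 * p + 0.5 * (A @ (p / deg))   # lazy walk step
    devs.append(curve_deviation(p))
```

After five steps the deviation has already dropped from about $0.83$ to below $0.01$.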
Now, Spielman-Teng are going to do the following. Divide the random-walk steps into $\ell$ epochs of equal length, with boundary times $t_0 < t_1 < \cdots < t_\ell$; the steps between two consecutive $t_j$'s form an epoch.
For each $j$, define $x_j$ to be the real number such that $I_{t_j}(x_j)$ equals the $j$-th probability benchmark $\epsilon_j$; the benchmarks grow with $j$, the last one being an absolute constant. In other words, we have $\ell$ epochs, and in the $j$-th epoch we shoot for the probability benchmark $\epsilon_j$, with $x_j$ the corresponding cut-off volume.
It is by definition that $x_1 \le x_2 \le \cdots \le x_\ell$. Now, if there is some epoch $j$ in which the cut-off volume at most doubles, i.e., $x_{j+1} \le 2 x_j$, take such a $j$ as the special epoch; otherwise take the last epoch. It is not hard to show that this choice is well defined. We will soon see that this is the good epoch (for vertex $v$) that we are looking for.
Now, for each vertex $v$, we can compute its good epoch, and if that epoch is the $j$-th one, we put the vertex into $S^g_j$. See the picture below for an illustration of this definition.
Starting from vertex $v$, to show that the chosen epoch is a good epoch, we want to show that for all times $t$ in it, condition (C.4) is satisfied. (Recall that (C.4) does not depend on the choice of the candidate index $j$.) There are two possibilities:
- If the good epoch is one in which moving up to the next benchmark requires at most doubling the volume, then the slope of the curve must be large in this range, which further implies that the normalized probabilities there are large, i.e., (C.4) is satisfied. In detail, the slope is at least the benchmark gap divided by the volume gap; the numerator is large almost by the definition of the benchmarks, while the denominator is small almost by the definition of the good epoch.
- Otherwise, the last benchmark, which is a constant fraction of the probability mass, is reached only at a very large volume. By concavity of the curve, the slope at every smaller volume is at least the average slope up to that point. This easily implies (C.4).
So far, I hope that I have convinced you that (C.4) is always true within some good epoch.
Claim 2 is the more involved one. Basically, if (C.1) is never true, then by using Lemma 2.9 one can show that the random walk mixes very quickly during the good epoch, so that by the end of this epoch the curve has dropped close to the straight line. This will contradict many things. For instance, in the first of the two cases above, plugging the cut-off volume into the curve yields a probability value that contradicts the definition of the benchmark for that epoch. (The other case is similar.)
Even though (C.1) can sometimes be true, one can similarly show that letting (C.2) or (C.3) be false is not a good idea either, because then the random walk again mixes quickly. I don’t want to cover those two cases.
The proof of Nibble in [ST08a] is, in my understanding, twice as difficult as that in [ST04], while the latter provides a weaker guarantee on the conductance. I don’t have time to separately write a post on [ST04].
The high-level idea of Nibble, as a sum-up, is to perform a random walk; half of the starting vertices in $S$ (the set $S^g$) are good. For each such vertex there exists an epoch in which the forward movement of the probability mass slows down, and in this epoch, (C.4) is satisfied. One then uses the fact that this epoch is long enough: if the conductance were large all the time, the probability mass could not have moved that slowly, resulting in a contradiction.
The last remark here is that we do not know $S$, or even $\mathrm{vol}(S)$, in advance. Therefore, by randomly selecting a starting vertex from the whole graph (with probability proportional to its degree), and randomly selecting the size parameter $b$ (with probability proportional to $2^{-b}$), and then running Nibble, one can construct a new procedure RandomNibble. RandomNibble guarantees to cover each set $S$ satisfying the requirement of (N.2) with a constant expected number of output vertices, and the $2^{-b}$ weighting cancels the $2^b$ factor in Nibble's running time, so the expected running time of RandomNibble is small. Next, by running RandomNibble again and again, shaving off sparse cuts and then continuing with the remaining graph, one obtains a good balanced-cut pseudo-approximation, known as the procedure Partition. For the detailed definition of Partition and its application to sparsifiers, see my other posts for [ST08b]: part I, part II.
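A sketch of just the sampling step of RandomNibble (the distributions here are how I read the construction; treat them as assumptions):

```python
import numpy as np

def sample_nibble_params(deg, b_max, rng):
    """Sample RandomNibble's random inputs:
      - a starting vertex v, chosen with probability proportional to
        its degree (i.e., from the stationary distribution), and
      - a size parameter b in {1, ..., b_max}, chosen with probability
        proportional to 2^{-b}, so that the 2^b factor in Nibble's
        running time is cancelled in expectation.
    """
    v = int(rng.choice(len(deg), p=deg / deg.sum()))
    weights = 2.0 ** -np.arange(1, b_max + 1)
    b = 1 + int(rng.choice(b_max, p=weights / weights.sum()))
    return v, b
```

Because small values of $b$ dominate, most invocations of Nibble are cheap, small-scale runs, which is exactly why repeated calls inside Partition stay affordable.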