Spielman-Teng-A: A Local Clustering Algorithm for Massive Graphs and its Application to Nearly-Linear Time Graph Partitioning

Spielman-Teng’s 2004 paper [ST04] on a nearly-linear-time Laplacian linear solver has since been divided and rewritten into three very technically involved papers: [ST08a], [ST08b] and [ST08c]. I am planning to write memos for all of them; this post is Part I, covering the first paper [ST08a: A Local Clustering Algorithm for Massive Graphs and its Application to Nearly-Linear Time Graph Partitioning].

This paper is no longer the most efficient one for local clustering; see for instance [Andersen-Peres’09], which uses evolving sets. Also, the earlier version [ST04] provides an algorithm with a provably weaker guarantee, but one that is much easier to understand. This post, unfortunately, aims for the harder [ST08a].

Outline

The goal of [ST08a] is to construct a local clustering algorithm Nibble with the following guarantee: for any sparse cut (S,V-S), if one starts a random walk from a (typical) vertex v\in S, one can almost recover the set S, and the running time is nearly linear in the size of S. One can then apply this local algorithm multiple times to construct Partition, which is essentially a pseudo-approximation algorithm for balanced cut. Partition is of further use in [ST08b] to construct spectral sparsifiers of a graph. Formally,

Theorem 2.1 (Nibble). Nibble(G,v, \phi, b) is an algorithm that runs in time O(2^b (\log^6 m)/\phi^4), and moreover (where \mu denotes the volume function, \mu(S)=\sum_{i\in S} deg(i)):

  • (N.1) If the output C\subset V is non-empty, then \Phi(C)\leq \phi and \mu(C)\leq (5/6)\mu(V).
  • (N.2) Each set S\subseteq V satisfying \mu(S)\leq (2/3)\mu(V) and \Phi(S)\leq O(\phi^2/\log^3 m), has a subset S^g\subseteq S such that, (a) \mu(S^g)\geq \mu(S)/2, and (b) \forall v\in S^g, if the output C=Nibble(G,v,\phi,b) is nonempty, then \mu(C\cap S)\geq 2^{b-1}.
  • (N.3) Each such S^g can be partitioned into subsets S_0^g,\dots,S_\ell^g such that if v\in S_b^g, then Nibble(G,v,\phi,b) returns a non-empty set.

In the earliest paper [ST04], the requirement on conductance of S was actually \Phi(S)\leq O(\phi^3/\log^2 m). This has been improved in [ST08a].

The Algorithm for Nibble

Basically, Nibble computes (truncated) random walk probability vectors for the first \log^2 m / \phi^2 steps. It starts with the vector q_0=\chi_v, and at each step applies the transition matrix M, i.e. q_t=M q_{t-1}, but then zeroes out every index i whose probability is smaller than \varepsilon \cdot deg(i), where \varepsilon=\phi^2 / (\log^3 m \cdot 2^b).
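
To make the truncation concrete, here is a minimal Python sketch of one truncated walk step; the names (adj, deg, truncated_step, eps) are mine and not from the paper, and I assume the lazy-walk convention for M.

    # A minimal sketch (not the paper's code) of one truncated lazy random-walk step.
    # `adj` maps a vertex to its list of neighbors, `deg[u] = len(adj[u])`, and `q` is
    # a sparse probability vector stored as a dict {vertex: probability}.
    def truncated_step(adj, deg, q, eps):
        """One step q <- M q of the lazy walk, then round small entries down to zero."""
        nxt = {}
        for u, mass in q.items():
            nxt[u] = nxt.get(u, 0.0) + mass / 2.0        # lazy part: stay put with prob 1/2
            share = mass / (2.0 * deg[u])                 # spread the rest over the neighbors
            for w in adj[u]:
                nxt[w] = nxt.get(w, 0.0) + share
        # truncation: drop every vertex whose probability fell below eps * deg(i)
        return {u: p for u, p in nxt.items() if p >= eps * deg[u]}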

At each step t, it sorts all vertices according to their normalized probabilities q_t(i)/deg(i) from large to small, and lets S_j(q_t) denote the set of the j vertices with largest normalized probabilities. Then, Nibble terminates and outputs S_j(q_t) if, for some j, all of the following are satisfied (a sketch of this test appears right after the list):

  • (C.1) \Phi(S_j(q_t)) \leq \phi,
  • (C.2) \mu(S_j(q_t)) \leq (5/6) \mu(V),
  • (C.3) \mu(S_j(q_t)) \geq 2^b, and
  • (C.4) every vertex \tilde{j} within the first 2^b units of volume (in the sorted order) has normalized probability \frac{q_t(\tilde{j})}{deg(\tilde{j})} \geq \frac{1}{\log m \cdot 2^b}.
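
To see how the four conditions fit together, here is a hedged Python sketch of the stopping test at a single step t; the helper structure and constants are illustrative only and not the paper's actual (more efficient) implementation.

    import math

    # Given the truncated vector q at some step t, check (C.4) once, then scan the
    # prefixes S_j(q_t) in decreasing order of q[u]/deg[u] for one satisfying (C.1)-(C.3).
    def stopping_cut(q, deg, adj, phi, b, m):
        logm = math.log(m, 2)
        order = sorted(q, key=lambda u: q[u] / deg[u], reverse=True)

        # (C.4): every vertex within the first 2^b units of volume has
        # normalized probability at least 1 / (log m * 2^b).
        vol = 0
        for u in order:
            if vol >= 2 ** b:
                break
            if q[u] / deg[u] < 1.0 / (logm * 2 ** b):
                return None
            vol += deg[u]

        # Scan prefixes, maintaining vol(S_j) and the number of cut edges incrementally.
        S, vol, cut_edges = set(), 0, 0
        for u in order:
            cut_edges += deg[u] - 2 * sum(1 for w in adj[u] if w in S)
            vol += deg[u]
            S.add(u)
            if vol >= 2 * m:
                break
            cond = cut_edges / min(vol, 2 * m - vol)        # conductance of S_j(q_t)
            if cond <= phi and vol <= (5.0 / 6.0) * 2 * m and vol >= 2 ** b:
                return S                                    # (C.1), (C.2), (C.3) all hold
        return None

Nibble itself would run this test once per step of the truncated walk and return the first non-empty answer.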

Of course, it is not easy to show that all of these conditions are likely to be satisfied; that is the job of (N.3). Let us do the two easy things first and show (N.1) and (N.2). Since both of them assume that a non-empty set is returned, we automatically get (C.1-4) as assumptions for free.

Proof of (N.1) and (N.2)

In fact, (N.1) follows immediately from (C.1) and (C.2), so there is nothing to prove. It remains to show that (N.2) follows from (C.3) and (C.4).

It is a well-known fact in random walk theory that, starting from the stationary distribution restricted to S, after t steps the amount of probability mass that remains inside S is lower bounded by 1-t\Phi_G(S)/2. (In fact, it is not hard to strengthen this to (1-\Phi_G(S)/2)^t, which is what [Gharan-Trevisan’12] use to obtain a different local partitioning algorithm.) As a consequence, by Markov’s inequality, for at least half of S (measured by volume), if we start from that single vertex (rather than from the stationary distribution on S), the probability mass that leaves S within t steps is at most t\Phi_G(S). Define this set of starting vertices to be S^g.

Next, we show (N.2). If we start from such a vertex v\in S^g, then because there is a \log m gap between the conductance requirement on S in (N.2) and the total number of random walk steps, the probability mass that escapes S is upper bounded by O(1/\log m). On the other hand, by (C.4), the vertices within the first 2^b units of volume output by Nibble all have normalized probability at least 1/(\log m \cdot 2^b); this means at most a constant fraction of that volume can come from \overline{S}. By playing with the constants, one can ensure that at most half of it comes from outside, i.e., (N.2) is satisfied: \mu(C\cap S)\geq 2^{b-1}.
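
Spelled out with hedged constants (the paper's exact constants differ), the calculation is:

    \text{escaped mass} \;\le\; t\cdot\Phi_G(S) \;\le\; \frac{\log^2 m}{\phi^2}\cdot O\Big(\frac{\phi^2}{\log^3 m}\Big) \;=\; O\Big(\frac{1}{\log m}\Big),

    \mu\big(\text{first } 2^b \text{ volume of } C \text{ lying in } \overline{S}\big) \;\le\; \frac{O(1/\log m)}{1/(\log m \cdot 2^b)} \;=\; O(2^b).

With the right constants the second bound is at most 2^{b-1}, so \mu(C\cap S)\geq 2^b - 2^{b-1} = 2^{b-1}.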

High Level Idea for (N.3)

Before going into the details of proving (N.3), I would like to provide my own overview of the proof. Since the proof is very technically involved, I am not sure how helpful my overview will be, but let me give it a try:

  • First of all, the \log^2 m / \phi^2 random walk steps are decomposed into \log m epochs, each consisting of \log m/\phi^2 steps.
  • Claim 1: there exists a good epoch such that (C.4) is satisfied for all t in that epoch. Intuitively, this is an epoch in which all of the first (in terms of volume) 2^b vertices are reached with decent probability, and at the same time b is chosen so that the probability mass is spreading slowly at that moment. In other words, two things hold simultaneously: the random walk mixes well within some 2^b range of volume, and it has slowed down in terms of moving from inside to outside.
  • Claim 2: within such an epoch, there are still \log m/\phi^2 candidate steps at which to stop. One can show that if (C.1-3) never simultaneously hold at any step of the entire epoch, then the random walk mixes very quickly over the whole graph. This contradicts the choice of the epoch; informally, it blames the algorithm by saying: you should have stopped the random walk earlier.

Notation on Integrated Probability

Now we need a definition of the integrated probability; a similar notion was first introduced by [Lovasz-Simonovits’90,’93]. For a probability vector q_t, assume for notational simplicity that the vertices are already sorted according to normalized probabilities, i.e., q_t(1)/deg(1) \geq q_t(2)/deg(2) \geq \cdots. I remark that from now on I will not distinguish between truncated probabilities and the real probabilities.

Then, define I(q_t,\cdot) to be the piecewise-linear (concave) function with breakpoints I(q_t,0)=0, I(q_t,deg(1))=q_t(1), I(q_t,deg(1)+deg(2))=q_t(1)+q_t(2), and so on. If we never did truncations, we would always have I(q_t,\mu(V))=1. As an exercise, (C.4) above can be rewritten as I_x(q_t,2^b)\geq \frac{1}{\log m \cdot 2^b}, where I_x is the derivative of I with respect to the second argument.
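
To make the definition concrete, here is a small Python sketch (my own naming) that evaluates the Lovasz-Simonovits curve I(q,x) at a given volume x:

    # Evaluate I(q, x): sort vertices by normalized probability q[u]/deg[u] and add up
    # probability over the first x units of volume, interpolating linearly inside the
    # degree of the vertex that straddles volume x.
    def ls_curve(q, deg, x):
        order = sorted(q, key=lambda u: q[u] / deg[u], reverse=True)
        total, vol = 0.0, 0.0
        for u in order:
            if vol + deg[u] <= x:
                total += q[u]
                vol += deg[u]
            else:
                total += q[u] * (x - vol) / deg[u]   # fractional part of the last vertex
                return total
        return total   # x exceeds the support's volume; the curve is flat beyond it

By concavity, (C.4) in particular forces I(q_t,2^b)\geq 2^b\cdot I_x(q_t,2^b)\geq 1/\log m, i.e., ls_curve(q, deg, 2**b) should evaluate to at least 1/\log m.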

It is not hard to verify that I(q_{t+1},x)\leq I(q_t,x) for any x. More interestingly,

Lemma 2.9 (informal). For “any” x, if we denote \hat{x}=\min\{x,2m-x\}, then I(q_{t+1},x)\leq \frac{1}{2}\big( I(q_t,x-2\phi \hat{x}) + I(q_t,x+2\phi \hat{x})\big).

Here by “any” I mean those values of x that correspond to breakpoints of the piecewise-linear function I, and \phi is the conductance of the level set S_j(q_t) corresponding to that breakpoint.

This lemma was used in [Lovasz-Simonovits] to prove that, if every cut in the graph has conductance at least \phi, then the integrated curve I converges quickly to the straight line from (0,0) to (2m,1). The convergence rate is (1-\phi^2/2)^t, up to the exact constant in the exponent.
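
To see where this rate comes from, here is a hedged sketch of the induction (my constants, and I gloss over the fact that Lemma 2.9 only applies at breakpoints). Suppose inductively that I(q_t,x)-\frac{x}{2m}\leq\sqrt{\hat{x}}\,(1-\phi^2/2)^t for all x. The linear part x/2m is unchanged when we average the two chord endpoints, so Lemma 2.9 gives

    I(q_{t+1},x)-\frac{x}{2m} \;\le\; \frac{1}{2}\Big(\sqrt{\hat{x}(1-2\phi)}+\sqrt{\hat{x}(1+2\phi)}\Big)\big(1-\phi^2/2\big)^t \;\le\; \sqrt{\hat{x}}\,\big(1-\phi^2/2\big)^{t+1},

where the last step uses \frac{1}{2}\big(\sqrt{1-2\phi}+\sqrt{1+2\phi}\big)\leq 1-\phi^2/2.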

Refining S^g

Now, Spielman-Teng do the following. Let t_1:=\log m/\phi^2 and define t_h := h\cdot t_1 for h = 0,1,\dots,\ell+1, where \ell:=\Theta(\log m); each interval [t_{h-1}, t_h] between two consecutive breakpoints is an epoch.

For each h, define x_h to be the real number such that I(q_{t_h},x_h) = \frac{h+1}{\log m}. In other words, we have about \log m epochs; for the h-th epoch we shoot for a probability benchmark of \frac{h+1}{\log m}, and x_h is the corresponding cut-off volume.

Note that x_h > x_{h-1}, since I(q_t,\cdot) is non-increasing in t while the benchmarks increase with h. Now, if x_\ell\geq 2m/\log m then define h_v := \ell+1; otherwise define h_v := \min\{ h : x_h \leq 2 x_{h-1}\}. It is not hard to show that h_v is well defined. We will soon see that [t_{h_v-1}, t_{h_v}] is the good epoch (for vertex v) that we are looking for.

Now, for each vertex v\in S^g, we compute x_{h_v-1}; if this real value falls in [2^b, 2^{b+1}), we put v into S^g_b. A small code sketch of the definitions of x_h and h_v follows below.
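
To pin these definitions down, here is a hedged Python sketch (names are mine) computing the cut-off volumes x_h and the index h_v; it assumes curves[h] is a function x \mapsto I(q_{t_h},x), e.g. built from ls_curve above.

    import math

    # Binary-search each cut-off volume x_h with I(q_{t_h}, x_h) = (h+1)/log m.
    def cutoff_volumes(curves, m, ell):
        logm = math.log(m, 2)
        xs = []
        for h in range(ell + 1):
            target = (h + 1) / logm
            lo, hi = 0.0, 2.0 * m
            for _ in range(60):                    # I(q, .) is non-decreasing in x
                mid = (lo + hi) / 2
                if curves[h](mid) < target:
                    lo = mid
                else:
                    hi = mid
            xs.append(hi)
        return xs

    # h_v := ell+1 if x_ell >= 2m/log m, else the first h whose cut-off at most doubles.
    def good_epoch_index(xs, m, ell):
        logm = math.log(m, 2)
        if xs[ell] >= 2 * m / logm:
            return ell + 1
        return min(h for h in range(1, ell + 1) if xs[h] <= 2 * xs[h - 1])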

Claim 1

Starting from a vertex v\in S_b^g, to show that [t_{h_v-1}, t_{h_v}] is a good epoch, we want to show that condition (C.4) is satisfied for all t\in (t_{h_v-1},t_{h_v}]. (Recall that (C.4) is independent of the choice of j.) There are two possibilities:

  • If h_v satisfies x_{h_v}\leq 2x_{h_v-1}, then to move to the next higher benchmark one only needs to at most double the volume. This means the slope I_x(q_t,\cdot) must be at least \frac{1}{\log m \cdot 2^b} in this range, which in turn implies I_x(q_t,2^b) is large, i.e., (C.4) is satisfied. In detail, I_x (q_t, x_{h_v-1}) \approx \big( I(q_t, x_{h_v}) - I(q_t,x_{h_v-1}) \big) / (x_{h_v}-x_{h_v-1}); the numerator is roughly \frac{1}{\log m} by the definition of the x_h, while the denominator is at most x_{h_v-1}\approx 2^b by the definition of S_b^g.
  • If x_\ell \geq 2m/\log m, this means the last benchmark, which is a constant fraction O(1)<1 of the probability mass, is only reached at the very large volume x_\ell. By concavity, the slope there is at least the remaining mass over the remaining volume: I_x(q_t,x_\ell) \geq (1-O(1)) / 2m = \Omega(1/m). This easily implies (C.4).

So far, I hope I have convinced you that (C.4) holds throughout the good epoch, i.e., for all t\in (t_{h_v-1},t_{h_v}].

Claim 2

This is the more involved claim. Basically, if (C.1) is never true during the epoch [t_{h-1},t_h] (letting h=h_v), then using Lemma 2.9 one can show that the random walk mixes very quickly during this epoch, and at its end, I(q_{t_h},x)<\frac{3x}{5m}+\sqrt{\hat{x}}\big(1-\phi^2/2\big)^{t_1}. This contradicts several things. For instance, if x_\ell<2m/\log m, then by plugging in x=x_{h}\leq x_\ell < 2m/\log m, one sees that I(q_{t_h},x) is O(1/\log m), contradicting the definition of x_{h}, which says that I(q_{t_{h}},x_{h}) \geq (h+1)/\log m. (The other case, h=\ell+1, is similar.)

Even though (C.1) may hold at some steps, one can similarly show that letting (C.2) or (C.3) fail throughout the epoch is not a good idea either, because the random walk would again mix too quickly. I will not cover those two cases here.

Summary

In my understanding, the proof of Nibble in [ST08a] is roughly twice as difficult as that in [ST04], while the latter provides a weaker conductance guarantee. I do not have time to write a separate post on [ST04].

To sum up, the high-level idea of Nibble is to run a truncated random walk; half of the starting vertices in S (by volume; this is the set S^g) are good. For each such vertex there exists an epoch in which the probability mass spreads slowly, and throughout this epoch (C.4) is satisfied. One then uses the fact that this epoch is long enough: if the conductance of the level sets were large the entire time, the probability mass could not have spread that slowly, resulting in a contradiction.

The last remark is that we do not know S, or even S_b^g, in advance. Therefore, by randomly selecting a vertex v\in V from the whole graph, randomly selecting a scale b with probability proportional to 2^{-b}, and then running Nibble, one constructs a new procedure RandomNibble. RandomNibble guarantees to cover each set S satisfying the hypotheses of (N.2) with a constant expected number of vertices, and its expected running time is only O(\log^7 m / \phi^4). Next, by running RandomNibble again and again, shaving off the sparse cuts it finds and continuing on the remaining graph, one obtains a good pseudo-approximation to balanced cut, known as the procedure Partition. For the detailed definition of Partition and its application to sparsifiers, see my other posts on [ST08b]: part I, part II.
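
As a final illustration, here is a hedged sketch of RandomNibble's sampling step; the names are mine, and the degree-proportional choice of v reflects my reading of [ST08a] rather than a quote from it.

    import math, random

    # Sample a starting vertex v (proportionally to its degree) and a scale b
    # (proportionally to 2^{-b}); one would then run Nibble(G, v, phi, b).
    def random_nibble_sample(deg):
        m = sum(deg.values()) / 2.0
        b_max = int(math.ceil(math.log(m, 2)))
        vertices = list(deg)
        v = random.choices(vertices, weights=[deg[u] for u in vertices])[0]
        bs = list(range(1, b_max + 1))
        b = random.choices(bs, weights=[2.0 ** (-x) for x in bs])[0]
        return v, b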
