Spielman-Teng-B: Spectral Sparsification of Graphs (Part II)

Spielman-Teng’s 2004 paper [ST04] on almost-linear Laplacian linear solver has been divided and re-written into three very technically involved papers [ST08a], [ST08b] and [ST08c]. I am planning to write memos for all of them, while this post is the Part II for the second paper [ST08b: Spectral Sparsification of Graphs].

Recap & Today’s Goal

Last time I’ve covered the high level idea to construct a spectral sparsifier, that is to decompose the graph recursively into well-connected components, and then for each component do subsampling. I’ve presented Theorem 7.1 which is an ideal proof for the existance of good sparsifiers. It relies on an exact sparsest cut subroutine, and I am going to relax this assumption to an approximate almost-linaer time sparsest cut subroutine, provided in [ST08a].

The ultimate goal of today is to show the following Theorem:

Theorem 8.1 (ApproxCut). Given G=(V,E) with m edges, parameter \phi,p\in(0,1), there is a procedure ApproxCut(G,\phi,p) that will output a vertex set D\subset V such that:

  1. vol(D) \leq (23/25)vol(V),
  2. if D\neq \emptyset then \Phi_G(D)\leq \phi, and
  3. w.p. 1-p, either (a) vol(D)\geq (1/29) vol(V), or (b) there exists W\supseteq V-D such that the set conductance \Phi_W^G\geq \Omega(\phi^2/\log^4m).
  4. In addition, ApproxCut runs in time O(\phi^{-4}m \log^9 m\log(1/p)).

Theorem 8.1 is here in place of Lemma 7.2 last time. Recall that Lemma 7.2 will ensure that the other side of the cut, that is V-D, to have a high set conductance \Omega(\phi). However, since we only have approximate solver, one cannot hope this to be better than \Omega(\phi^2). Also, we have a bunch of \log factors loss in the approximate solver. More importantly, it is non-trivial to show that the other side has high set conductance, and this is the main topic of today.

Partition from [ST08a]

The proof of Theorem 8.1 is built upon an approximate sparsest cut solver called Partition, proved in [ST08a] which I plan to cover, and has recently been improved in Andersen-Peres which I also plan to cover soon.

Theorem 8.2 (Partition). Let D be the output of Partition(G,\tau,p), where G=(V,E) is the graph with m edges, then:

  1. vol(D) \leq (7/8)vol(V),
  2. if D\neq \emptyset then \Phi_G(D)\leq \tau, and
  3. For every set S satisfying vol(S)\leq vol(V)/2, and \Phi_G(S)\leq O(\tau^2/\log^3 m):
    w.p. 1-p, either (a) vol(D)\geq (1/4) vol(V), or (b) vol(D\cap S) \geq vol(S)/2.
  4. In addition, ApproxCut runs in time O(\tau^{-4}m \log^7 m\log(1/p)).

Let us now compare Theorem 8.1 and Theorem 8.2 by setting \phi = \tau (and by the way the final proof will set \phi=\Theta(\tau)). Then item 1,2 and 3a are equivalent up to constants. Item 4 about complexity differs by two log factors. The only major difference is on Item 3b: Theorem 8.2 says that it will touch any good cut by a large fraction, while Theorem 8.1 says something about the other half of the cut. The relationship between them is not clear yet at this moment, and will be revisited and hopefully clear in the Summary section of this post.

ApproxCut Overview and Partition2

The final algorithm for ApproxCut, is to continuously call Partition on the graph and get set D, if D is large then Item 3a is satisfied and we stop, otherwise we continue calling Partition on V-D until the graph is small. One only needs to call Partition \log m \cdot \log \log m times but need to set up the parameters for Partition carefully.

One of the factors, in fact \log \log m, is due to a strengthening to Partition. Let \varepsilon=\min\{1/\log m, 1/5\}, then by calling Partition at most \log(1/\varepsilon) number of times, one can achieve a slightly stronger result than Theorem 8.2, which is called Partition2:

Lemma 8.3 (Partition2). One can achieve Theorem 8.2 with the only major difference been on Item 3b: vol(D\cap S)\geq (1-\delta)vol(S), where \delta = \max\{\varepsilon, \Phi_G(S)/ (\tau^2/\log^3 m)\}.

The proof of Lemma 8.3 is only routine. Basically Partition2 calls Partition at most \log (1/\varepsilon) times until D constitutes a constant 1/5 fraction of the whole graph. Item 1 in Partition2 is satisfied due to Item 1 in Partition, up to a constant. Item 2 is satisfied because of the definition of sparsity, adding up two components of the same conductance do not change the overall conductance. If Item 3a in Partition holds then it also holds for Partition2 because the set D already constitutes a constant fraction of the whole graph. For the last Item 3b, if it is always true for all \log (1/\varepsilon) calls to Partition, then every time the volume of S is shaved off by a factor of 2 (see Theorem 8.3 item 3b), so cannot happen more than \log(1/\varepsilon) times. BTW, to show this last claim, one needs to check that the same S from Partition2 always satisfies the conductance requirement of Item 3 for each call of Partition.

To sum up, one can simply view Partition2 as a variant of Partition, except that it either (3a) covers a constant 1/5 fraction of the graph, or (3b) covers 1-\delta fraction of vol(S).

Proof Overview and Notations for ApproxCut

Now comes to the hardest proof of today, that is to show Theorem 8.1 using logarithmic number of calls to Partition2. Let D_i be the i-th cut returned from Partition2, and V_i=V-D_1\cup\cdots\cup D_i be the left-over graph after the first i cuts are removed. In other words, D_j = Partition2(G[V_{j-1}], \Theta(\phi), p/r, \varepsilon).

Similar to the proof of Lemma 8.3, one only needs to show that Item 3b is satisfied for the final graph V_r where r=\log m. Item 1 and 2 are automatically satisfied, while if Item 3a ever happens from Partition2, the same will happen to ApproxCut and there is nothing more to prove. Thus, we assume that Partition2 always returns a set D_j that covers (1-\delta) fraction of any set S satisfying Item 3.

The key idea of the proof is to define an imagined procedure (which is not efficient, but for the analysis only) to construct W. I remark here that W does not need to be known algorithmically. This is because back to Part I of this post, recall that the high level idea to use decomposition to construct sparsifiers, requires that if the cut S returned by ApproxCut is small, then we will do subsampling on V-S and recurse only on the smaller side S. In the ideal decomposition we have that V-S is of high set conductance, but using ApproxCut we only know for some superset W\supseteq V-S it is of high set conductance. This doesn’t actually matter because one can do subsampling on a subgraph of a graph with high set conductance, which was also proved in Theorem 6.1 but I have not quite described last time.

Now let me describe this imagined procedure. At each step i, we will maintain W_i\supseteq V_i, as a set of high conductance U_{i-1} plus some vertices in V_i. In other words, W_i = V_i \cup U_{i-1} and see the figure below. The key idea is to show that vol(V_i - U_{i-1}) decreases by a factor of 2 each time, so eventually U_{i-1} will cover the entire V_i and W_i for that level is of high conductance.

Details in the Proof

The first step of the proof, is to assume that vol(V_i)\geq (16/17)vol(V) is always true. If not, then the cut found by ApproxCut will already satisfy Item 3a.

Next, let us define S_i to be the biggest set of conductance at most \sigma_i, where \sigma_i is a factor of (1-2\varepsilon) smaller than the conductance of U_{i-1}, which we call \theta_i. I remark here that \sigma_i is almost the same as the give \phi, but decreases slightly every step. Now, according to Lemma 7.2 (recall Part I of this post), U_i := W_i - S_i must be of high conductance, and we set W_{i+1}:=V_{i+1}\cup U_i = (S_i\cap V_{i+1}) \cup U_i.

There are three claims to show, and see the picture below for illustration:

  • Claim 1: vol(S_i) \leq (2/17)vol(V) is always true.
  • Claim 2: vol(S_i \cap V_i -U_{i-1}) = vol(S_i \cap (S_{i-1}\cap V_i)) \geq 2\varepsilon vol(S_i) is always true.
  • Claim 3: vol(S_i \cap V_{i+1}) \leq \varepsilon vol(S_i) is always true.

If all of the claims are correct, then this means vol(V_i - U_{i-1}) \geq 2\varepsilon vol(S_i) \geq 2 vol(V_{i+1} \cap S_i) = 2 vol(V_{i+1}-U_i), so the volume of V_i - U_{i-1} indeed shrinks by a factor of 2, and this cannot happen too frequently. (See Lemma 8.7 for the rigorous argument.)

To show Claim 1, one can actually argue that since vol(V_i)\geq (16/17)vol(V), if Claim 1 is ever false, i.e., vol(S_i)\geq (2/17)vol(V), then S_i\cap V_i (or its inverse) is large and satisfies the conductance requirement in Item 3 of Partition2 in subgraph V_i. This means, the D_{i+1} of that step will be of large volume, and Item 3a of ApproxCut is already satisfied. Therefore, the hardest case is when Claim 1 is always true. (This is proved in Lemma 8.11 + Lemma 8.12.)

To show Claim 2, one just need to realize that by definition of U_{i-1} being of high set conductance, the number of edges between S_i and U_{i-1}-S_i should be large (based on \theta_i), but S_i is a small cut in W_i (in fact, of conductance \sigma_i\leq (1-2\varepsilon)\theta_i). This means, S_i must go out of U_{i-1} by at most a volume factor of 2\varepsilon. (This is proved in Lemma 8.6.)

To show Claim 3, define locally \mu=vol(S_i\cap V_i)/vol(S_i) (where in the paper this is defined as \delta, overloading the definition). One needs to verify that S_i \cap V_i satisfies the conductance requirement in Item 3, i.e., \Phi_G(S)/ (\tau^2/\log^3 m) \leq \frac{\varepsilon}{\mu}. This means, Partition2 will shave of (1-\frac{\varepsilon}{\mu}) fraction of its volume, and therefore vol(V_{i+1}\cap S_i) \leq \frac{\varepsilon}{\mu} vol(S_i\cap V_i) = \varepsilon vol(S_i).

Summary

In sum, [ST08b] has provided a good almost-linear time clustering algorithm, which is known as the graph decomposition or multiway partitioning. It relies on a crucial subroutine ApproxCut, which either gives a cut of good balance (i.e., covers a constant fraction of the graph), or satisfies that the larger size is well-connected. This subroutine, further relies on an almost-linear time approximate sparsest cut algorithm Partition2, which either gives a cut of good balance, or covers almost all sparse cuts.

To build an equivalence relationship between the two subroutines, one needs to argue in the following way. Suppose the ideal decomposition gives you cut (S_i, U_i), where S_i is the smaller side and U_i is very well connected. Now, even if we only have an approximate solver and cannot detect S_i, because (S_i,U_i) itself is a sparse cut, a large portion of S_i will be shaved (Claim 3).

This would have solved the problem if S_i were big (so Claim 1 is false), or S_1=S_2=\cdots were always the same. However, the later is unrealistic. The right argument is in fact, showing that vol(S_i \cap S_{i-1}) \geq 2\varepsilon vol(S_i) (which is Claim 2). Then, although we cannot shave off a 1-\varepsilon portion of something, however, the part of S_i outside U_{i-1} must be shaved by a factor of at least 2, so this cannot happen with more than logarithmic number of layers.

Thanks for following this post if you are here, and I hope this summary gives a good understanding to [ST08b].

Advertisements
This entry was posted in Algorithm, Readings. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s