## Spielman-Teng-B: Spectral Sparsification of Graphs (Part II)

Spielman-Teng’s 2004 paper [ST04] on an almost-linear-time Laplacian linear solver has been divided and re-written into three very technically involved papers [ST08a], [ST08b] and [ST08c]. I am planning to write memos for all of them, and this post is Part II for the second paper [ST08b: Spectral Sparsification of Graphs].

## Recap & Today’s Goal

Last time I covered the high-level idea for constructing a spectral sparsifier: recursively decompose the graph into well-connected components, and then subsample within each component. I presented Theorem 7.1, which is an ideal proof for the existence of good sparsifiers. It relies on an exact sparsest cut subroutine, and I am going to relax this assumption to an approximate, almost-linear-time sparsest cut subroutine, provided in [ST08a].

The ultimate goal of today is to show the following Theorem:

Theorem 8.1 (ApproxCut). Given $G=(V,E)$ with $m$ edges and parameters $\phi,p\in(0,1)$, there is a procedure ApproxCut($G,\phi,p$) that will output a vertex set $D\subset V$ such that:

1. $vol(D) \leq (23/25)vol(V)$,
2. if $D\neq \emptyset$ then $\Phi_G(D)\leq \phi$, and
3. w.p. $1-p$, either (a) $vol(D)\geq (1/29) vol(V)$, or (b) there exists $W\supseteq V-D$ such that the set conductance $\Phi_W^G\geq \Omega(\phi^2/\log^4m)$.
4. In addition, ApproxCut runs in time $O(\phi^{-4}m \log^9 m\log(1/p))$.

Theorem 8.1 is here in place of Lemma 7.2 from last time. Recall that Lemma 7.2 ensures that the other side of the cut, that is $V-D$, has a high set conductance $\Omega(\phi)$. However, since we only have an approximate solver, one cannot hope for anything better than $\Omega(\phi^2)$. Also, we lose a bunch of $\log$ factors from the approximate solver. More importantly, it is non-trivial to show that the other side has high set conductance, and this is the main topic of today.

## Partition from [ST08a]

The proof of Theorem 8.1 is built upon an approximate sparsest cut solver called Partition, proved in [ST08a], which I plan to cover; it has recently been improved by Andersen-Peres, which I also plan to cover soon.

Theorem 8.2 (Partition). Let $D$ be the output of Partition($G,\tau,p$), where $G=(V,E)$ is the graph with $m$ edges, then:

1. $vol(D) \leq (7/8)vol(V)$,
2. if $D\neq \emptyset$ then $\Phi_G(D)\leq \tau$, and
3. For every set $S$ satisfying $vol(S)\leq vol(V)/2$, and $\Phi_G(S)\leq O(\tau^2/\log^3 m)$:
w.p. $1-p$, either (a) $vol(D)\geq (1/4) vol(V)$, or (b) $vol(D\cap S) \geq vol(S)/2$.
4. In addition, Partition runs in time $O(\tau^{-4}m \log^7 m\log(1/p))$.

Let us now compare Theorem 8.1 and Theorem 8.2 by setting $\phi = \tau$ (by the way, the final proof will set $\phi=\Theta(\tau)$). Items 1, 2 and 3a are equivalent up to constants. Item 4 about complexity differs by two log factors. The only major difference is in Item 3b: Theorem 8.2 says that the output will touch any good cut by a large fraction, while Theorem 8.1 says something about the other half of the cut. The relationship between them is not clear at this moment, and will be revisited and hopefully made clear in the Summary section of this post.

## ApproxCut Overview and Partition2

The final algorithm for ApproxCut is to repeatedly call Partition on the graph and get a set $D$: if $D$ is large then Item 3a is satisfied and we stop; otherwise we continue calling Partition on $V-D$ until the graph is small. One only needs to call Partition $\log m \cdot \log \log m$ times, but the parameters for Partition need to be set up carefully.
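The driver loop above can be sketched as follows. This is only an illustration of the control flow, not the actual algorithm: `partition` is a hypothetical stand-in for a Partition2 call that simply reports the volume of the cut it removes, and all names and thresholds below (other than the $1/29$ balance constant from Theorem 8.1) are mine.

```python
# Hypothetical sketch of the ApproxCut driver loop: repeatedly call
# Partition on the leftover graph; stop early once the removed volume
# is large (Item 3a of Theorem 8.1), otherwise run for a bounded
# number of rounds and certify the remainder (Item 3b).

def approx_cut(total_vol, partition, max_rounds):
    removed = 0.0
    for _ in range(max_rounds):
        # partition() stands in for Partition2 on the leftover graph V - D
        cut = partition(total_vol - removed)
        removed += cut
        if removed >= total_vol / 29:      # Item 3a: a balanced cut, stop
            return removed, "balanced"
    # after all rounds, the leftover side is certified well-connected
    return removed, "remainder well-connected"
```

For instance, if every call cuts off a constant fraction of the leftover volume, the balance condition triggers quickly; if every call cuts off almost nothing, the loop exits with the well-connected certificate instead.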

One of the factors, in fact the $\log \log m$, is due to a strengthening of Partition. Let $\varepsilon=\min\{1/\log m, 1/5\}$; then by calling Partition at most $\log(1/\varepsilon)$ times, one can achieve a slightly stronger result than Theorem 8.2, which is called Partition2:

Lemma 8.3 (Partition2). One can achieve Theorem 8.2 with the only major difference being in Item 3b: $vol(D\cap S)\geq (1-\delta)vol(S)$, where $\delta = \max\{\varepsilon, \Phi_G(S)/ (\tau^2/\log^3 m)\}$.

The proof of Lemma 8.3 is routine. Basically, Partition2 calls Partition at most $\log (1/\varepsilon)$ times, until $D$ constitutes a constant $1/5$ fraction of the whole graph. Item 1 in Partition2 is satisfied due to Item 1 in Partition, up to a constant. Item 2 is satisfied because of the definition of conductance: adding up two components of the same conductance does not change the overall conductance. If Item 3a in Partition ever holds, then it also holds for Partition2, because the set $D$ already constitutes a constant fraction of the whole graph. For the last Item 3b, if it holds for all $\log (1/\varepsilon)$ calls to Partition, then every time the volume of $S$ is shaved off by a factor of $2$ (see Theorem 8.2, Item 3b), and this cannot happen more than $\log(1/\varepsilon)$ times. BTW, to show this last claim, one needs to check that the same $S$ from Partition2 always satisfies the conductance requirement of Item 3 for each call of Partition.
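The counting at the end of this argument can be sanity-checked numerically: if every call to Partition removes at least half of what remains of $S$ (Theorem 8.2, Item 3b), then after $\log(1/\varepsilon)$ calls at most an $\varepsilon$ fraction of $vol(S)$ survives. A toy check, with a concrete $\varepsilon$ standing in for $\min\{1/\log m, 1/5\}$:

```python
import math

eps = 1 / 32                      # stands in for min{1/log m, 1/5}
rounds = math.ceil(math.log2(1 / eps))

surviving = 1.0                   # fraction of vol(S) not yet covered by D
for _ in range(rounds):
    surviving /= 2                # each Partition call halves the leftover of S

assert surviving <= eps           # after log(1/eps) calls, only eps survives
```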

To sum up, one can simply view Partition2 as a variant of Partition, except that it either (3a) covers a constant $1/5$ fraction of the graph, or (3b) covers a $1-\delta$ fraction of $vol(S)$.

## Proof Overview and Notations for ApproxCut

Now comes the hardest proof of today: showing Theorem 8.1 using a logarithmic number of calls to Partition2. Let $D_i$ be the $i$-th cut returned from Partition2, and let $V_i=V-(D_1\cup\cdots\cup D_i)$ be the left-over graph after the first $i$ cuts are removed. In other words, $D_j =$ Partition2$(G[V_{j-1}], \Theta(\phi), p/r, \varepsilon)$.

Similar to the proof of Lemma 8.3, one only needs to show that Item 3b is satisfied for the final graph $V_r$ where $r=\log m$. Items 1 and 2 are automatically satisfied, and if Item 3a ever happens in Partition2, the same happens for ApproxCut and there is nothing more to prove. Thus, we assume that Partition2 always returns a set $D_j$ that covers a $(1-\delta)$ fraction of any set $S$ satisfying Item 3.

The key idea of the proof is to define an imagined procedure (which is not efficient, and is for the analysis only) to construct $W$. I remark here that $W$ does not need to be known algorithmically. Recall from Part I of this post that the high-level idea of using a decomposition to construct sparsifiers requires the following: if the cut $S$ returned by ApproxCut is small, then we subsample on $V-S$ and recurse only on the smaller side $S$. In the ideal decomposition we have that $V-S$ is of high set conductance, but using ApproxCut we only know that some superset $W\supseteq V-S$ is of high set conductance. This does not actually matter, because one can subsample on a subgraph of a graph with high set conductance; this was also proved in Theorem 6.1, although I did not quite describe it last time.

Now let me describe this imagined procedure. At each step $i$, we maintain $W_i\supseteq V_i$, consisting of a high-conductance set $U_{i-1}$ plus some vertices in $V_i$. In other words, $W_i = V_i \cup U_{i-1}$; see the figure below. The key idea is to show that $vol(V_i - U_{i-1})$ decreases by a factor of $2$ each time, so eventually $U_{i-1}$ will cover the entire $V_i$, and the $W_i$ at that level is of high conductance.

## Details in the Proof

The first step of the proof is to assume that $vol(V_i)\geq (16/17)vol(V)$ always holds. If not, then the cut found by ApproxCut already satisfies Item 3a.

Next, let us define $S_i$ to be the biggest set of conductance at most $\sigma_i$, where $\sigma_i$ is a factor of $(1-2\varepsilon)$ smaller than the conductance of $U_{i-1}$, which we call $\theta_i$. I remark here that $\sigma_i$ is almost the same as the given $\phi$, but decreases slightly every step. Now, according to Lemma 7.2 (recall Part I of this post), $U_i := W_i - S_i$ must be of high conductance, and we set $W_{i+1}:=V_{i+1}\cup U_i = (S_i\cap V_{i+1}) \cup U_i$.
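One detail worth a sanity check (and, presumably, the reason for the choice $\varepsilon=\min\{1/\log m, 1/5\}$): if $\sigma_i$ loses a $(1-2\varepsilon)$ factor at every one of the $r=\log m$ levels, as the recursion above suggests, it still stays within a constant factor of where it started, since $(1-2\varepsilon)^{\log m}\approx e^{-2}$ when $\varepsilon\le 1/\log m$. A quick numerical check under this assumption:

```python
import math

m = 10**6                          # a concrete graph size for illustration
eps = min(1 / math.log2(m), 1 / 5) # eps = min{1/log m, 1/5}
r = int(math.log2(m))              # number of levels, r = log m

decay = (1 - 2 * eps) ** r         # total multiplicative loss in sigma
assert 0.1 <= decay < 1            # roughly e^{-2}: still a constant factor
```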

There are three claims to show, and see the picture below for illustration:

• Claim 1: $vol(S_i) \leq (2/17)vol(V)$ is always true.
• Claim 2: $vol(S_i \cap (V_i -U_{i-1})) = vol(S_i \cap (S_{i-1}\cap V_i)) \geq 2\varepsilon vol(S_i)$ is always true.
• Claim 3: $vol(S_i \cap V_{i+1}) \leq \varepsilon vol(S_i)$ is always true.

If all of the claims are correct, then this means $vol(V_i - U_{i-1}) \geq 2\varepsilon vol(S_i) \geq 2 vol(V_{i+1} \cap S_i) = 2 vol(V_{i+1}-U_i)$, so the volume of $V_i - U_{i-1}$ indeed shrinks by a factor of $2$, and this cannot happen too frequently. (See Lemma 8.7 for the rigorous argument.)
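To spell out why the shrinking cannot happen too frequently (the rigorous version is Lemma 8.7), iterating the factor-of-$2$ step gives

$$vol(V_{i+1}-U_i)\;\leq\;\frac{1}{2}\,vol(V_i-U_{i-1})\;\leq\;\cdots\;\leq\;2^{-i}\,vol(V_1-U_0)\;\leq\;2^{-i}\,vol(V),$$

which drops below $1$ once $i > \log_2 vol(V) = O(\log m)$. Since volumes are integral, $V_{i+1}-U_i$ is then empty, i.e., $U_i\supseteq V_{i+1}$, and $W_{i+1}=U_i$ is of high conductance as desired.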

To show Claim 1, one can argue that since $vol(V_i)\geq (16/17)vol(V)$, if Claim 1 is ever false, i.e., $vol(S_i) > (2/17)vol(V)$, then $S_i\cap V_i$ (or its complement) is large and satisfies the conductance requirement in Item 3 of Partition2 in the subgraph $V_i$. This means the $D_{i+1}$ of that step will be of large volume, and Item 3a of ApproxCut is already satisfied. Therefore, the hardest case is when Claim 1 is always true. (This is proved in Lemma 8.11 + Lemma 8.12.)

To show Claim 2, one just needs to realize that, by $U_{i-1}$ being of high set conductance, the number of edges between $S_i\cap U_{i-1}$ and $U_{i-1}-S_i$ should be large (based on $\theta_i$), but $S_i$ is a small cut in $W_i$ (in fact, of conductance $\sigma_i\leq (1-2\varepsilon)\theta_i$). This means $S_i$ must stick out of $U_{i-1}$ by at least a $2\varepsilon$ fraction of its volume. (This is proved in Lemma 8.6.)

To show Claim 3, define locally $\mu=vol(S_i\cap V_i)/vol(S_i)$ (in the paper this is defined as $\delta$, overloading the definition). One needs to verify that $S_i \cap V_i$ satisfies the conductance requirement in Item 3, i.e., $\Phi(S_i\cap V_i)/ (\tau^2/\log^3 m) \leq \frac{\varepsilon}{\mu}$, so that $\delta\leq\frac{\varepsilon}{\mu}$ in Lemma 8.3. This means Partition2 will shave off a $(1-\frac{\varepsilon}{\mu})$ fraction of its volume, and therefore $vol(V_{i+1}\cap S_i) \leq \frac{\varepsilon}{\mu} vol(S_i\cap V_i) = \varepsilon vol(S_i)$.
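Spelled out, and writing $\delta\leq\varepsilon/\mu$ for the Partition2 parameter as above, the volume arithmetic behind Claim 3 is just

$$vol(V_{i+1}\cap S_i)\;\leq\;\delta\cdot vol(S_i\cap V_i)\;\leq\;\frac{\varepsilon}{\mu}\cdot \mu\cdot vol(S_i)\;=\;\varepsilon\, vol(S_i).$$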

## Summary

In sum, [ST08b] has provided a good almost-linear time clustering algorithm, known as graph decomposition or multiway partitioning. It relies on a crucial subroutine ApproxCut, which either gives a cut of good balance (i.e., covers a constant fraction of the graph), or guarantees that the larger side is well-connected. This subroutine, in turn, relies on an almost-linear time approximate sparsest cut algorithm Partition2, which either gives a cut of good balance, or covers almost all sparse cuts.

To build an equivalence relationship between the two subroutines, one argues as follows. Suppose the ideal decomposition gives you a cut $(S_i, U_i)$, where $S_i$ is the smaller side and $U_i$ is very well connected. Now, even though we only have an approximate solver and cannot detect $S_i$ exactly, because $(S_i,U_i)$ itself is a sparse cut, a large portion of $S_i$ will be shaved off (Claim 3).

This would have solved the problem if $S_i$ were big (so Claim 1 is false), or if $S_1=S_2=\cdots$ were always the same. However, the latter is unrealistic. The right argument in fact shows that $vol(S_i \cap S_{i-1}\cap V_i) \geq 2\varepsilon vol(S_i)$ (which is Claim 2). Then, although we cannot shave off a $1-\varepsilon$ portion of the same set each time, the part of $S_i$ outside $U_{i-1}$ must shrink by a factor of at least $2$, so this cannot happen for more than a logarithmic number of layers.

Thanks for following this post if you made it here, and I hope this summary gives a good understanding of [ST08b].