Spielman-Teng’s 2004 paper [ST04] on an almost-linear-time Laplacian linear solver has been divided and rewritten into three very technically involved papers: [ST08a], [ST08b] and [ST08c]. I am planning to write memos for all of them; this post is Part II, for the second paper [ST08b: Spectral Sparsification of Graphs].
Recap & Today’s Goal
Last time I covered the high-level idea for constructing a spectral sparsifier: recursively decompose the graph into well-connected components, and then subsample within each component. I presented Theorem 7.1, which gives an idealized proof of the existence of good sparsifiers. It relies on an exact sparsest cut subroutine, and today I am going to relax this assumption to an approximate, almost-linear-time sparsest cut subroutine, provided in [ST08a].
The ultimate goal of today is to show the following Theorem:
Theorem 8.1 (ApproxCut). Given with edges, parameter , there is a procedure ApproxCut() that will output a vertex set such that:
- if then , and
- w.p. , either (a) , or (b) there exists such that the set conductance .
- In addition, ApproxCut runs in time .
Theorem 8.1 here takes the place of Lemma 7.2 from last time. Recall that Lemma 7.2 ensured that the other side of the cut, that is , has a high set conductance . However, since we only have an approximate solver, one cannot hope for anything better than . Also, we lose a bunch of factors in the approximate solver. More importantly, it is non-trivial to show that the other side has high set conductance, and this is the main topic of today.
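Throughout, the set conductance of a vertex set S is the number of edges leaving S divided by the smaller of the two side volumes (a volume being a sum of degrees). Here is a minimal sketch of this quantity for an unweighted graph stored as an adjacency list; the helper names are my own, not the paper's:

```python
def volume(adj, S):
    """vol(S): sum of the degrees of the vertices in S."""
    return sum(len(adj[v]) for v in S)

def set_conductance(adj, S):
    """Conductance of the cut (S, V \\ S) in an unweighted graph:
    (# edges leaving S) / min(vol(S), vol(V \\ S))."""
    S = set(S)
    cut_edges = sum(1 for u in S for v in adj[u] if v not in S)
    total_vol = sum(len(nbrs) for nbrs in adj.values())  # equals 2m
    return cut_edges / min(volume(adj, S), total_vol - volume(adj, S))
```

For instance, for two triangles joined by a single edge, either triangle has set conductance 1/7: one crossing edge over a side volume of 7.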
Partition from [ST08a]
The proof of Theorem 8.1 is built upon an approximate sparsest cut solver called Partition, proved in [ST08a] (which I plan to cover), and recently improved by Andersen-Peres (which I also plan to cover soon).
Theorem 8.2 (Partition). Let be the output of Partition(), where is the graph with edges, then:
- if then , and
- For every set satisfying , and :
w.p. , either (a) , or (b) .
- In addition, Partition runs in time .
Let us now compare Theorem 8.1 and Theorem 8.2 by setting (the final proof will in fact set ). Then Items 1, 2 and 3a are equivalent up to constants. Item 4, about the running time, differs by two log factors. The only major difference is Item 3b: Theorem 8.2 says that will touch any good cut by a large fraction, while Theorem 8.1 says something about the other side of the cut. The relationship between the two is not yet clear at this moment; it will be revisited and hopefully clarified in the Summary section of this post.
ApproxCut Overview and Partition2
The final algorithm for ApproxCut repeatedly calls Partition on the graph to obtain a set : if is large then Item 3a is satisfied and we stop; otherwise we continue calling Partition on until the remaining graph is small. One only needs to call Partition times, but the parameters for Partition need to be set up carefully.
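Schematically, the outer loop just described might look as follows. This is an analysis-level sketch, not the paper's pseudocode: `partition` is a stand-in for the Partition subroutine of Theorem 8.2, and the 1/2 balance threshold and round bound are illustrative placeholders rather than the paper's constants.

```python
import math

def induced(adj, keep):
    """Induced subgraph of `adj` on the vertex set `keep`."""
    return {v: [u for u in adj[v] if u in keep] for v in adj if v in keep}

def approx_cut(adj, phi, partition):
    """Sketch of the ApproxCut outer loop (Theorem 8.1).

    `partition(adj, phi)` stands in for the Partition subroutine of
    Theorem 8.2 and returns a low-conductance vertex set.  We repeatedly
    remove the returned set and continue on the remainder, stopping once
    the removed part reaches a constant fraction of the total volume
    (Item 3a); the threshold 1/2 below is a placeholder constant.
    """
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    rest, removed = adj, set()
    # A logarithmic number of rounds suffices: each round either stops
    # with a balanced cut or shaves a chunk off the remainder.
    for _ in range(2 * max(1, math.ceil(math.log2(total_vol + 1)))):
        D = partition(rest, phi)
        if not D:
            break
        removed |= D
        rest = induced(rest, set(rest) - D)
        rest_vol = sum(len(nbrs) for nbrs in rest.values())
        if total_vol - rest_vol >= total_vol / 2:  # balanced: Item 3a
            break
    return removed
```

Plugging in any toy `partition` shows the control flow: cuts accumulate until the removed side is balanced or the remainder is exhausted.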
One of the factors, in fact , is due to a strengthening of Partition. Let ; then by calling Partition at most times, one can achieve a slightly stronger result than Theorem 8.2, which is called Partition2:
Lemma 8.3 (Partition2). One can achieve Theorem 8.2 with the only major difference being Item 3b: , where .
The proof of Lemma 8.3 is routine. Basically, Partition2 calls Partition at most times, until constitutes a constant fraction of the whole graph. Item 1 of Partition2 is satisfied due to Item 1 of Partition, up to a constant. Item 2 is satisfied by the definition of conductance: merging two components of the same conductance does not change the overall conductance. If Item 3a of Partition holds, then it also holds for Partition2, because the set already constitutes a constant fraction of the whole graph. For the last Item 3b: if it holds in every call to Partition, then each call shaves off a factor of the volume of (see Item 3b of Theorem 8.2), so this cannot happen more than times. By the way, to show this last claim, one needs to check that the same from Partition2 always satisfies the conductance requirement of Item 3 in each call to Partition.
To sum up, one can simply view Partition2 as a variant of Partition that either (3a) covers a constant fraction of the graph, or (3b) covers a fraction of .
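The step in the proof of Lemma 8.3 that merges low-conductance cuts rests on a mediant inequality for cut ratios, which is easy to sanity-check numerically. The helper below is my own illustration; it ignores the min over the two side volumes in the true conductance denominator, which the balance condition in the actual proof takes care of.

```python
def merged_cut_ratio(c1, v1, c2, v2):
    """Cut ratio after merging two disjoint cuts with cut sizes c1, c2
    and volumes v1, v2.  Edges running between the two sets only make
    the merged cut smaller, so (c1 + c2) / (v1 + v2) upper-bounds it."""
    return (c1 + c2) / (v1 + v2)

def check_mediant(c1, v1, c2, v2):
    """Mediant inequality: the merged ratio never exceeds the worse of
    the two individual ratios c1/v1 and c2/v2."""
    return merged_cut_ratio(c1, v1, c2, v2) <= max(c1 / v1, c2 / v2)
```

So if both pieces have conductance at most some threshold, their union does too, which is exactly what Item 2 of Partition2 needs.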
Proof Overview and Notations for ApproxCut
Now comes the hardest proof of today: showing Theorem 8.1 using a logarithmic number of calls to Partition2. Let be the -th cut returned by Partition2, and let be the left-over graph after the first cuts are removed. In other words, Partition2.
Similar to the proof of Lemma 8.3, one only needs to show that Item 3b is satisfied for the final graph , where . Items 1 and 2 are automatically satisfied, and if Item 3a ever occurs in a call to Partition2, the same happens for ApproxCut and there is nothing more to prove. Thus, we assume from now on that Partition2 always returns a set that covers a fraction of any set satisfying Item 3.
The key idea of the proof is to define an imagined procedure (not efficient, and for the analysis only) to construct . I remark here that does not need to be known algorithmically. Recall from Part I of this post that the high-level plan of using decomposition to construct sparsifiers requires that, if the cut returned by ApproxCut is small, we subsample on and recurse only on the smaller side . In the ideal decomposition we have that has high set conductance, but using ApproxCut we only know that some superset has high set conductance. This does not actually matter, because one can subsample on a subgraph of a graph with high set conductance; this was also proved in Theorem 6.1 but not quite described last time.
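The subsampling primitive alluded to here keeps each edge independently with some probability and reweights the survivors so that every edge weight is preserved in expectation. A minimal sketch with a flat probability p, which is an illustrative simplification of mine (the actual sampling probabilities in [ST08b] are not flat):

```python
import random

def subsample_edges(edges, p, rng=None):
    """Keep each weighted edge (u, v, w) independently with probability
    p, reweighting kept edges to w / p so that the expected weight of
    every edge is unchanged."""
    rng = rng or random.Random(0)
    return [(u, v, w / p) for (u, v, w) in edges if rng.random() < p]
```

The point of the high-set-conductance guarantee is precisely that this kind of reweighted sampling preserves all cuts of the component up to small error.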
Now let me describe this imagined procedure. At each step , we maintain as a set of high conductance plus some vertices in . In other words, ; see the figure below. The key idea is to show that decreases by a factor of each time, so eventually covers the entire , and the at that level has high conductance.
Details in the Proof
The first step of the proof is to assume that always holds. If not, then the cut found by ApproxCut already satisfies Item 3a.
Next, let us define to be the biggest set of conductance at most , where is a factor of smaller than the conductance of , which we call . I remark here that is almost the same as the given , but decreases slightly at every step. Now, according to Lemma 7.2 (recall Part I of this post), must have high conductance, and we set .
There are three claims to show; see the picture below for an illustration:
- Claim 1: is always true.
- Claim 2: is always true.
- Claim 3: is always true.
If all of the claims are correct, then this means , so the volume of indeed shrinks by a factor of , and this cannot happen too many times. (See Lemma 8.7 for the rigorous argument.)
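The counting behind "cannot happen too many times" (Lemma 8.7) is plain geometric decay: a quantity that loses a constant fraction per step survives only logarithmically many steps. A toy check, with the factor 1/2 purely illustrative:

```python
import math

def steps_until_exhausted(vol, shrink=0.5):
    """Number of steps before a volume that loses a `shrink` fraction
    per step drops below 1 (no vertices left, in an unweighted graph)."""
    steps = 0
    while vol >= 1:
        vol *= (1 - shrink)
        steps += 1
    return steps
```

With shrink 1/2 the count is at most ceil(log2(vol)) + 1, which is why a logarithmic number of levels suffices in the argument.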
To show Claim 1, one can actually argue as follows: since , if Claim 1 were ever false, i.e., , then (or its complement) would be large and would satisfy the conductance requirement in Item 3 of Partition2 in the subgraph . This means the of that step would have large volume, and Item 3a of ApproxCut would already be satisfied. Therefore, the hardest case is when Claim 1 is always true. (This is proved in Lemma 8.11 + Lemma 8.12.)
To show Claim 2, one just needs to realize that, by the definition of having high set conductance, the number of edges between and should be large (based on ), but is a small cut in (in fact, of conductance ). This means must go out of by at most a volume factor of . (This is proved in Lemma 8.6.)
To show Claim 3, define locally (in the paper this is defined as , overloading the notation). One needs to verify that satisfies the conductance requirement in Item 3, i.e., . This means Partition2 will shave off a fraction of its volume, and therefore .
Summary
In sum, [ST08b] provides a good almost-linear-time clustering algorithm, known as graph decomposition or multiway partitioning. It relies on a crucial subroutine, ApproxCut, which either gives a cut of good balance (i.e., one covering a constant fraction of the graph), or guarantees that the larger side is well-connected. This subroutine in turn relies on an almost-linear-time approximate sparsest cut algorithm, Partition2, which either gives a cut of good balance, or covers almost all sparse cuts.
To relate the two subroutines, one needs to argue in the following way. Suppose the ideal decomposition gives you a cut , where is the smaller side and is very well connected. Now, even though we only have an approximate solver and cannot detect , because itself is a sparse cut, a large portion of will be shaved off (Claim 3).
This would already solve the problem if were big (so that Claim 1 is false), or if were always the same. However, the latter is unrealistic. The right argument is in fact to show that (which is Claim 2). Then, although we cannot shave off a portion of itself, the part of lying outside must be shaved by a factor of at least each time, so this cannot happen for more than a logarithmic number of layers.
If you have followed all the way here, thanks for reading, and I hope this summary gives a good understanding of [ST08b].