Corrections and Discussions for my ICML 2017 tutorial

Thanks to everyone for attending. In this post I’d like to provide a platform for future discussion of my talk on “Recent Advances in Stochastic Convex and Non-Convex Optimization”. The talk website is here: http://people.csail.mit.edu/zeyuan/topics/icml-2017, and the video (my personal recording) is already online.  Continue reading

Posted in Uncategorized | 2 Comments

SVD and Eigenvectors for Big Data

People generally believe that a PC can handle SVD only for matrices up to thousands by thousands. Textbooks also suggest that it is not wise to compute singular vectors one by one. In this post, I’ll refute both statements. In particular, on a PC, within 10 seconds and without any parallelism, we can find the top 20 eigenvectors of a 260k x 260k matrix with 1.2 million nonzero entries. Moreover, the algorithm simply invokes rank-1 SVD 20 times. Continue reading
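To make the “invoke rank-1 SVD 20 times” idea concrete, here is a minimal sketch of computing the top eigenvectors of a sparse symmetric PSD matrix one at a time, using plain power iteration with deflation as the rank-1 subroutine. This is an illustrative simplification of the strategy, not the (much faster) method from the post, and the interface below is my own.

    import numpy as np
    import scipy.sparse as sp

    def topk_eigs(A, k, iters=200, seed=0):
        # Find the top-k eigenvectors of a symmetric PSD sparse matrix A by
        # calling a rank-1 routine (plain power iteration here) k times,
        # deflating the previously found components on the fly.
        rng = np.random.default_rng(seed)
        n = A.shape[0]
        vals, vecs = [], []
        for _ in range(k):
            v = rng.standard_normal(n)
            v /= np.linalg.norm(v)
            for _ in range(iters):
                w = A @ v
                for lam, u in zip(vals, vecs):   # apply the deflated matrix
                    w -= lam * (u @ v) * u       # A - sum_j lam_j u_j u_j^T
                v = w / np.linalg.norm(w)
            w = A @ v
            for lam, u in zip(vals, vecs):
                w -= lam * (u @ v) * u
            vals.append(v @ w)                   # Rayleigh quotient on the deflated matrix
            vecs.append(v)
        return np.array(vals), np.column_stack(vecs)

    # Toy usage on a random sparse PSD matrix (B @ B.T is always PSD)
    B = sp.random(2000, 2000, density=3e-4, format="csr")
    A = B @ B.T
    vals, vecs = topk_eigs(A, 5)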

Posted in Algorithm, Optimization | Leave a comment

Faster Than SGD 2: the Katyusha Acceleration

SGD is well-known for large-scale optimization. In my mind, there are so far two fundamentally different improvements since the original introduction of SGD: (1) variance reduction, and (2) acceleration. The following picture compares, in theory, the performance of naive SGD, variance-reduced SGD, and accelerated SGD:
[Figure: theoretical performance comparison of naive SGD, variance-reduced SGD, and accelerated SGD] Continue reading
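As a rough summary of what that comparison shows (standard rates under simplifying assumptions of my own, not a restatement of the paper’s theorems): to minimize the average of n functions that are L-smooth and \sigma-strongly convex, with condition number \kappa = L/\sigma, the number of stochastic gradient computations needed to reach accuracy \epsilon scales roughly as

    naive SGD: \tilde{O}\big(1 / (\sigma \epsilon)\big)
    variance-reduced SGD (e.g. SVRG): O\big((n + \kappa) \log(1/\epsilon)\big)
    accelerated SGD (Katyusha): O\big((n + \sqrt{n \kappa}) \log(1/\epsilon)\big)

In particular, acceleration pays off mainly in the regime where \kappa exceeds n.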

Posted in Optimization | 1 Comment

Faster Than SGD 1: Variance Reduction

SGD is well-known for large-scale optimization. In my mind, there are two (and only two) fundamental improvements since the original introduction of SGD: (1) variance reduction, and (2) acceleration. In this guest post at Princeton’s OptiML group, I survey (1), and I’d like to especially thank those ICML’16 participants who pushed me to write this post 🙂
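To give a flavor of what variance reduction means, here is a minimal sketch of an SVRG-style gradient estimator (one of several variance-reduction schemes), written with an interface of my own choosing rather than taken from the post:

    import numpy as np

    def svrg(grad_i, full_grad, x0, n, eta=0.1, epochs=20, m=None, seed=0):
        # grad_i(x, i): gradient of the i-th component f_i at x.
        # full_grad(x): gradient of the full average f at x.
        rng = np.random.default_rng(seed)
        m = m or 2 * n                      # inner-loop length, a common heuristic
        x = x0.copy()
        for _ in range(epochs):
            snapshot = x.copy()
            mu = full_grad(snapshot)        # one full gradient per epoch
            for _ in range(m):
                i = rng.integers(n)
                # Variance-reduced estimator: unbiased, and its variance shrinks
                # as x and the snapshot both approach the minimizer.
                g = grad_i(x, i) - grad_i(snapshot, i) + mu
                x -= eta * g
        return x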

Posted in Algorithm, Learning Theory, Optimization | Leave a comment

Faster Comp-Geometry via Optimization

If you want to compute the minimum enclosing ball (MinEB) of a set of points, would you believe that the running time can be improved by a significant factor if we randomly rotate the space? While this seems very counter-intuitive, since a ball is still a ball after rotation, my coauthors and I proved that it is TRUE!

More specifically, we connect MinEB (and some other problems) to optimization, and develop much faster algorithms based on stochastic gradient ideas originally used for SVM or Lasso in machine learning. One ingredient of the method is the aforementioned random rotation. This is another big surprise to me, because we can now declare one more victory for optimization geeks over classical (e.g., geometry-based) methods.
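To give a sense of the kind of reformulation involved (my paraphrase in standard notation, not necessarily the paper’s exact formulation): the center c of the minimum enclosing ball of points a_1, \dots, a_n solves

    \min_{c} \max_{i \in [n]} \|c - a_i\|^2 \;=\; \min_{c} \max_{p \in \Delta_n} \sum_{i=1}^n p_i \|c - a_i\|^2,

where \Delta_n is the probability simplex and the optimal value is the squared radius. The right-hand side is a convex-concave saddle-point problem of the same flavor as the formulations behind SVM, which is why stochastic gradient machinery applies.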

Continue reading

Posted in Algorithm, Optimization | Leave a comment

ICML posters available!

The third one is mentioned in this blog post. The other two are coming!

PS: the videos of the three talks are on YouTube now.

Posted in Optimization | 1 Comment

Coordinate Descent in One Line, or Three if Accelerated

When minimizing a convex function f(x) using first-order methods, if full gradients \nabla f(x) are too costly to compute at each iteration, there are two alternatives that can reduce this per-iteration cost. One is to use a (random) coordinate gradient \nabla_i f(x), and the other is to use a stochastic gradient \tilde{\nabla} f(x) satisfying \mathbb{E}[\tilde{\nabla} f(x)] = \nabla f(x).

I focus on coordinate (gradient) descent (CD) in this tutorial, and leave stochastic gradient descent (SGD) to another post in the future. Part of this survey has also appeared in my conference talk at ICML. Continue reading
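As a concrete illustration of the title, here is a minimal sketch of randomized coordinate descent with the standard 1/L_i step size; the interface and the toy least-squares example are mine, chosen for the sketch rather than taken from the tutorial.

    import numpy as np

    def coordinate_descent(grad_i, L, x0, n_coords, iters=10000, seed=0):
        # grad_i(x, i): the i-th partial derivative of f at x.
        # L[i]: coordinate-wise smoothness constant of that partial derivative.
        rng = np.random.default_rng(seed)
        x = x0.copy()
        for _ in range(iters):
            i = rng.integers(n_coords)
            x[i] -= grad_i(x, i) / L[i]     # the advertised one line
        return x

    # Toy usage: least squares f(x) = 0.5 * ||A x - b||^2
    A = np.random.randn(200, 50)
    b = np.random.randn(200)
    grad_i = lambda x, i: A[:, i] @ (A @ x - b)
    L = (A ** 2).sum(axis=0)                # ||A[:, i]||^2 is the i-th constant
    x = coordinate_descent(grad_i, L, np.zeros(50), 50)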

Posted in Optimization | Leave a comment

How to solve classification and regression fast, faster, and fastest

I am often asked what the best algorithm is for solving SVM, Lasso regression, logistic regression, and so on. At the same time, a growing number of first-order methods have been proposed recently, making it hard even for experts to keep track of the state of the art.

In this guest post on Princeton University’s OptiML blog, I finally have a chance to answer all these questions properly and simultaneously.

http://www.minimizingregret.com/2016/06/how-to-solve-classification-and-regression.html

Posted in Optimization | Leave a comment

The Complexity Zoo and Reductions in Optimization

The following dilemma is encountered by many of my friends when teaching basic optimization: which variant (and which proof) of gradient descent should one start with? Of course, one needs to decide how deeply to dive into convex analysis, and settle questions such as “should I define strong convexity?”, “should I discuss smoothness?”, “should I cover Nesterov acceleration?”, and so on.
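For reference, the two standard definitions in question (stated in my own wording): a differentiable function f is L-smooth if \|\nabla f(x) - \nabla f(y)\| \le L \|x - y\| for all x, y, and it is \sigma-strongly convex if f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{\sigma}{2} \|y - x\|^2 for all x, y. Their ratio \kappa = L/\sigma is the condition number that shows up in essentially every convergence rate.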

The following blog post, by Elad Hazan and me, makes this pedagogical question much easier to answer through optimal reductions. Surprisingly, it has some non-trivial research impact as well.

http://www.minimizingregret.com/2016/05/the-complexity-zoo-and-reductions-in.html

Posted in Optimization, Story | Leave a comment

Spielman-Teng-A: A Local Clustering Algorithm for Massive Graphs and its Application to Nearly-Linear Time Graph Partitioning

Spielman and Teng’s 2004 paper [ST04] on a nearly-linear-time Laplacian linear solver has been divided and rewritten into three very technically involved papers: [ST08a], [ST08b], and [ST08c]. I am planning to write memos for all of them; this post is Part I, covering the first paper [ST08a: A Local Clustering Algorithm for Massive Graphs and its Application to Nearly-Linear Time Graph Partitioning]. Continue reading

Posted in Algorithm, Readings | Leave a comment