Faster Than SGD 1: Variance Reduction

SGD is well-known for large-scale optimization. In my mind, there are two (and only two) fundamental improvements since the original introduction of SGD: (1) variance reduction, and (2) acceleration. In this guest post at Princeton’s OptiML group, I’d love to conduct a survey regarding (1), and I’d like to especially thank those ICML’16 participants who pushed me to write this post 🙂

Advertisements
This entry was posted in Algorithm, Learning Theory, Optimization. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s