Learning Zero-One Loss Objective – The Experiment

(Continued from the previous post.)

The complexity depends crucially on L; still, when L is a constant (e.g. 1), there is reason to believe that the 0-1 loss approximated by \varphi_{\mathrm{sig}} is more accurate than the hinge loss. However, even with constant L, the sample complexity m=\Omega(B/\epsilon^2)=\Omega(1/\epsilon^3) depends cubically on \epsilon. At the same time, the kernelized stochastic gradient descent algorithm (in its distributed form) requires O(m^2) time. This creates a dilemma: we cannot set m too large, since otherwise the empirical optimization cannot finish in tolerable time; nor can we set m too small, since otherwise the generalization guarantee is not good enough.
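To make the comparison concrete, here is a minimal sketch of how a sigmoidal surrogate tracks the 0-1 step function more closely than the hinge loss on moderate margins. The exact form of \varphi_{\mathrm{sig}} in the paper may differ; the version below, 1/(1+e^{Lz}) on the margin z, is only an assumed common parameterization.

```python
import math

def zero_one(z):
    # The 0-1 loss on the margin z = y * f(x): 1 on a mistake, 0 otherwise.
    return 1.0 if z <= 0 else 0.0

def hinge(z):
    # Hinge loss: a convex upper bound on the 0-1 loss.
    return max(0.0, 1.0 - z)

def phi_sig(z, L=1.0):
    # Assumed sigmoidal surrogate (the post's exact phi_sig may differ);
    # a larger Lipschitz-like constant L tracks the 0-1 step more sharply.
    return 1.0 / (1.0 + math.exp(L * z))

for z in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"z={z:+.1f}  0-1={zero_one(z):.0f}  "
          f"sigmoid={phi_sig(z):.3f}  hinge={hinge(z):.3f}")
```

At z = -0.5, for instance, the hinge loss already overshoots the 0-1 loss by 0.5, while the sigmoid surrogate stays within about 0.38 of it; this gap is why the surrogate can be the more faithful objective even though it sacrifices convexity.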

Consider some m of medium size, say just large enough that training finishes in several minutes. Although the zero-one loss lacks a strong generalization guarantee, it is a more desirable objective than the hinge loss. So, given a training set of medium size, if we pit an SVM against a zero-one loss minimizer, who wins this tug-of-war?


In the interest of fairness, the same stochastic gradient descent routine, P-packSVM [ZCW+09], was adopted for all experiments, run on an Intel quad-core CPU machine (a 4x speed-up from parallelism). For the empirical optimization of the zero-one loss, two approaches are implemented: ZeroOne refers to classical mirror descent, and ZeroOne-reg refers to the regularized zero-one loss objective optimized with strongly convex mirror descent. Finally, PEGASOS refers to the empirical optimizer for the regularized SVM objective.
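For readers unfamiliar with the PEGASOS baseline, a minimal sketch of its update rule on a linear model follows. The actual experiment used the kernelized P-packSVM solver; this toy version only illustrates the regularized hinge-loss SGD step, with all names and parameters chosen here for illustration.

```python
import random

def pegasos(data, lam=0.01, T=1000, seed=0):
    # Minimal linear Pegasos sketch.
    # data: list of (x, y) pairs, x a list of floats, y in {-1, +1}.
    rng = random.Random(seed)
    d = len(data[0][0])
    w = [0.0] * d
    for t in range(1, T + 1):
        x, y = rng.choice(data)
        eta = 1.0 / (lam * t)  # 1/t step size from strong convexity
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        # Shrink w by the regularizer's gradient...
        w = [(1.0 - eta * lam) * wi for wi in w]
        # ...and, on a margin violation, step along the hinge subgradient.
        if margin < 1.0:
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

# Toy usage on a linearly separable set.
toy = [([1.0, 0.0], 1), ([-1.0, 0.0], -1),
       ([0.8, 0.2], 1), ([-0.9, 0.1], -1)]
w = pegasos(toy, lam=0.1, T=2000)
print(w)
```

The mirror-descent variants for the zero-one loss differ mainly in the (non-convex) loss term and the step-size schedule, not in this overall SGD skeleton.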

The three datasets Splice, Web, and Adult, provided by the libSVM project team [Fan], are used.

Dataset   # Training / # Testing Samples   # Features
Splice    1000 / 2175                      60
Web       2477 / 47272                     300
Adult     1605 / 30956                     123


From Table 1, one can see that when m is on the order of 1000, it is still unclear which method outperforms the others. On the Adult dataset, the traditional SVM trainer PEGASOS significantly beats the zero-one loss optimizers, while on Splice and Web, the zero-one loss does slightly better.

Another experiment worth conducting concerns the convergence rates of the three methods. From Figure 2 one can see that because ZeroOne uses a learning rate of 1/\sqrt{T}, where T is the total number of iterations, its accuracy curve is smoother than those of PEGASOS and ZeroOne-reg, which use a learning rate of 1/t to exploit strong convexity. However, because the generalization guarantee for the 0-1 loss is weaker than that for the SVM, ZeroOne-reg converges much more slowly than PEGASOS.
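The smoothness difference comes down to the two step-size schedules. A small sketch of both, under the assumption that ZeroOne holds a constant 1/\sqrt{T} step while the strongly convex objectives decay as 1/t:

```python
import math

def step_sizes(T, kind):
    # Assumed schedules from the post: a constant 1/sqrt(T) step for plain
    # mirror descent (ZeroOne) vs. a decaying 1/t step for the strongly
    # convex objectives (PEGASOS, ZeroOne-reg).
    if kind == "mirror":
        return [1.0 / math.sqrt(T)] * T
    if kind == "strongly_convex":
        return [1.0 / t for t in range(1, T + 1)]
    raise ValueError(kind)

T = 10000
mirror = step_sizes(T, "mirror")
sc = step_sizes(T, "strongly_convex")
# Early on, the 1/t steps are far larger (hence the jumpier accuracy
# curve); late in training they fall below the constant 1/sqrt(T) step.
print(mirror[0], sc[0], sc[-1])
```

So the jagged early iterations of PEGASOS and ZeroOne-reg and the smooth curve of ZeroOne in Figure 2 are exactly what these schedules predict.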

This entry was posted in Learning Theory, Readings.
