(Continued from the previous post.)
Though the sample complexity depends crucially on this parameter, when it is held constant there is still reason to believe that the 0-1 loss approximated with it is more accurate than the hinge loss. However, even with the parameter constant, the sample complexity still carries a cubic dependency, and at the same time the kernel stochastic gradient descent algorithm (in the distributed setting) has a time complexity that grows quickly with the number of training samples. So there comes a dilemma: we can neither set $m$ too large, since otherwise the empirical optimization cannot finish in a tolerable amount of time, nor set $m$ too small, since otherwise the generalization guarantee is not good enough.
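To make the tension explicit, here is a rough sketch in symbols; the exponents $a$ and $b$ are placeholders, since the exact dependencies are not restated in this post:

```latex
% Placeholder exponents a, b > 0 (not the exact values from the analysis):
% the generalization error shrinks as the sample size m grows, while the
% kernel SGD training time grows with m, so m must balance the two.
\varepsilon_{\text{gen}}(m) = O\!\left(m^{-a}\right),
\qquad
T_{\text{train}}(m) = \Omega\!\left(m^{\,b}\right)
```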
Consider some $m$ of medium size, just large enough that training finishes in several minutes, say. Though the zero-one loss does not come with as good a generalization guarantee, it is a more desirable loss function than the hinge loss. So, given a training set of medium size, if we pit SVM against a zero-one loss minimizer, who is to win this tug-of-war?
In the interest of fairness, the same stochastic gradient descent routine, P-packSVM [ZCW+09], has been adopted for the experiments, run on an Intel quad-core CPU machine (a 4x speed-up from parallelism). For the empirical optimization of the zero-one loss, two different approaches are implemented: ZeroOne refers to classical mirror descent, and ZeroOne-reg refers to the regularized zero-one loss objective optimized with strongly-convex mirror descent. Finally, PEGASOS refers to the empirical optimizer for the regularized SVM objective.
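The solvers themselves are not reproduced here, but the following minimal Python sketch illustrates the kind of per-sample updates the three methods perform. It rests on my own assumptions: a sigmoid surrogate for the 0-1 loss, linear (rather than kernelized) predictors, and textbook step sizes; it is not the P-packSVM implementation.

```python
import numpy as np

def sigmoid_loss_grad(w, x, y):
    """Gradient in w of the smooth surrogate sigma(-y<w,x>) of the 0-1 loss.
    (The sigmoid surrogate is an assumption, not necessarily the one in the post.)"""
    margin = y * np.dot(w, x)
    s = 1.0 / (1.0 + np.exp(margin))   # s = sigma(-margin) = surrogate loss value
    return -s * (1.0 - s) * y * x      # chain rule: sigma'(-margin) * d(-margin)/dw

def zero_one_sgd(X, Y, T):
    """'ZeroOne'-style solver: plain SGD / mirror descent on the surrogate loss,
    fixed step size tied to the horizon T, averaged iterate returned."""
    n, d = X.shape
    eta = 1.0 / np.sqrt(T)
    w, w_avg = np.zeros(d), np.zeros(d)
    for _ in range(T):
        i = np.random.randint(n)
        w -= eta * sigmoid_loss_grad(w, X[i], Y[i])
        w_avg += w / T
    return w_avg

def zero_one_reg(X, Y, T, lam=1e-4):
    """'ZeroOne-reg'-style solver: surrogate loss plus an L2 regularizer,
    optimized with the strongly-convex step size 1/(lam * t)."""
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, T + 1):
        i = np.random.randint(n)
        w -= (1.0 / (lam * t)) * (lam * w + sigmoid_loss_grad(w, X[i], Y[i]))
    return w

def pegasos(X, Y, T, lam=1e-4):
    """'PEGASOS'-style solver: SGD on the lam-strongly-convex regularized hinge loss."""
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, T + 1):
        i = np.random.randint(n)
        grad = lam * w
        if Y[i] * np.dot(w, X[i]) < 1.0:   # hinge loss is active for this sample
            grad -= Y[i] * X[i]
        w -= (1.0 / (lam * t)) * grad
    return w
```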
The three datasets Splice, Web, and Adult, made available by the libSVM project team [Fan], are used (see Table 1; a loading sketch follows the table).
*Table 1: Dataset statistics.*

| Dataset | # Training / Testing Samples | # Features |
|---------|------------------------------|------------|
| Splice  | 1000 / 2175                  | 60         |
| Web     | 2477 / 47272                 | 300        |
| Adult   | 1605 / 30956                 | 123        |
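For reference, these datasets are distributed in LIBSVM's sparse text format; a minimal sketch of loading one of them with scikit-learn follows (the local file names are placeholders for copies downloaded from the LIBSVM data page):

```python
from sklearn.datasets import load_svmlight_file

# File names below are assumptions about how the downloaded files are saved locally.
X_train, y_train = load_svmlight_file("splice_train.txt")
X_test, y_test = load_svmlight_file("splice_test.txt", n_features=X_train.shape[1])

print(X_train.shape, X_test.shape)   # e.g. (1000, 60) and (2175, 60) for Splice
```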
From Table 1, one can see that when $m$ is on the order of a few thousand, it is still unclear which method outperforms the others. On the Adult dataset, the traditional SVM trainer PEGASOS significantly beats the zero-one loss optimizers, while on Splice and Web, the zero-one loss does slightly better.
Another experiment worth conducting compares the convergence rates of the three methods. From Figure 2 one can see that, because ZeroOne uses a fixed learning rate chosen in terms of the total number of iterations $T$, its accuracy curve is smoother than those of PEGASOS and ZeroOne-reg, which use a learning rate that decays over iterations, as appropriate for strongly convex objectives. However, because the generalization guarantee for the 0-1 loss is weaker than that of SVM, ZeroOne-reg converges much more slowly than PEGASOS.
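For context, and assuming the solvers follow the textbook schedules (the exact formulas are not reproduced above), the standard choices behind this contrast are:

```latex
% Non-strongly-convex mirror descent (ZeroOne): a fixed step size tied to the
% horizon T, with the averaged iterate converging at the slow O(1/sqrt(T)) rate.
\eta_t = \Theta\!\left(\tfrac{1}{\sqrt{T}}\right), \qquad
\mathbb{E}\big[F(\bar w_T)\big] - F(w^\star) = O\!\left(\tfrac{1}{\sqrt{T}}\right)

% lambda-strongly-convex objectives (PEGASOS, ZeroOne-reg): a decaying step size
% and the faster, but noisier, ~O(1/(lambda T)) rate.
\eta_t = \tfrac{1}{\lambda t}, \qquad
\mathbb{E}\big[F(w_T)\big] - F(w^\star) = \tilde O\!\left(\tfrac{1}{\lambda T}\right)
```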