Complexities of convex combinations and bounding the generalization error in classification

May 18, 2004

On the Generalization of the C-Bound to Structured Output Ensemble Methods

August 6, 2014

85% Match

François LHC Laviolette, Emilie LHC Morvant, ... , Roy Jean-Francis

Machine Learning

This paper generalizes an important result from the PAC-Bayesian literature for binary classification to the case of ensemble methods for structured outputs. We prove a generic version of the \Cbound, an upper bound over the risk of models expressed as a weighted majority vote that is based on the first and second statistical moments of the vote's margin. This bound may advantageously $(i)$ be applied on more complex outputs such as multiclass labels and multilabel, and $(ii)...

Find SimilarView on arXiv

Voted Kernel Regularization

September 14, 2015

85% Match

Corinna Cortes, Prasoon Goyal, ... , Mohri Mehryar

Machine Learning

This paper presents an algorithm, Voted Kernel Regularization , that provides the flexibility of using potentially very complex kernel functions such as predictors based on much higher-degree polynomial kernels, while benefitting from strong learning guarantees. The success of our algorithm arises from derived bounds that suggest a new regularization penalty in terms of the Rademacher complexities of the corresponding families of kernel maps. In a series of experiments we dem...

Find SimilarView on arXiv

On the Insufficiency of the Large Margins Theory in Explaining the Performance of Ensemble Methods

June 10, 2019

85% Match

Waldyn Martinez, J. Brian Gray

Machine Learning

Computation

Boosting and other ensemble methods combine a large number of weak classifiers through weighted voting to produce stronger predictive models. To explain the successful performance of boosting algorithms, Schapire et al. (1998) showed that AdaBoost is especially effective at increasing the margins of the training data. Schapire et al. (1998) also developed an upper bound on the generalization error of any ensemble based on the margins of the training data, from which it was co...

Find SimilarView on arXiv

Optimal Binary Classifier Aggregation for General Losses

October 2, 2015

85% Match

Akshay Balsubramani, Yoav Freund

Machine Learning

We address the problem of aggregating an ensemble of predictors with known loss bounds in a semi-supervised binary classification setting, to minimize prediction loss incurred on the unlabeled data. We find the minimax optimal predictions for a very general class of loss functions including all convex and many non-convex losses, extending a recent analysis of the problem for misclassification error. The result is a family of semi-supervised ensemble aggregation algorithms whi...

Find SimilarView on arXiv

On Margins and Generalisation for Voting Classifiers

June 9, 2022

85% Match

Felix Biggs, Valentina Zantedeschi, Benjamin Guedj

Machine Learning

Statistics Theory

Machine Learning

Statistics Theory

We study the generalisation properties of majority voting on finite ensembles of classifiers, proving margin-based generalisation bounds via the PAC-Bayes theory. These provide state-of-the-art guarantees on a number of classification tasks. Our central results leverage the Dirichlet posteriors studied recently by Zantedeschi et al. [2021] for training voting classifiers; in contrast to that work our bounds apply to non-randomised votes via the use of margins. Our contributio...

Find SimilarView on arXiv

Tight Risk Bounds for Multi-Class Margin Classifiers

July 10, 2015

85% Match

Yury Maximov, Daria Reshetova

Machine Learning

We consider a problem of risk estimation for large-margin multi-class classifiers. We propose a novel risk bound for the multi-class classification problem. The bound involves the marginal distribution of the classifier and the Rademacher complexity of the hypothesis class. We prove that our bound is tight in the number of classes. Finally, we compare our bound with the related ones and provide a simplified version of the bound for the multi-class classification with kernel b...

Find SimilarView on arXiv

Approximate is Good Enough: Probabilistic Variants of Dimensional and Margin Complexity

March 9, 2020

85% Match

Pritish Kamath, Omar Montasser, Nathan Srebro

Machine Learning

We present and study approximate notions of dimensional and margin complexity, which correspond to the minimal dimension or norm of an embedding required to approximate, rather then exactly represent, a given hypothesis class. We show that such notions are not only sufficient for learning using linear predictors or a kernel, but unlike the exact variants, are also necessary. Thus they are better suited for discussing limitations of linear or kernel methods.

Find SimilarView on arXiv

Linear and Order Statistics Combiners for Pattern Classification

May 20, 1999

85% Match

Kagan Tumer, Joydeep Ghosh

Neural and Evolutionary Comp...

Machine Learning

Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This chapter provides an analytical framework to quantify the improvements in classification results due to combining. The results apply to both linear combiners and order statistics combiners. We first show that to a first order approximation, the error rate obtained over and abo...

Find SimilarView on arXiv

Learning without Concentration for General Loss Functions

October 13, 2014

84% Match

Shahar Mendelson

Machine Learning

We study prediction and estimation problems using empirical risk minimization, relative to a general convex loss function. We obtain sharp error rates even when concentration is false or is very restricted, for example, in heavy-tailed scenarios. Our results show that the error rate depends on two parameters: one captures the intrinsic complexity of the class, and essentially leads to the error rate in a noise-free (or realizable) problem; the other measures interactions betw...

Find SimilarView on arXiv

Boosting, Voting Classifiers and Randomized Sample Compression Schemes

February 5, 2024

84% Match

Cunha Arthur da, Kasper Green Larsen, Martin Ritzert

Machine Learning

In boosting, we aim to leverage multiple weak learners to produce a strong learner. At the center of this paradigm lies the concept of building the strong learner as a voting classifier, which outputs a weighted majority vote of the weak learners. While many successful boosting algorithms, such as the iconic AdaBoost, produce voting classifiers, their theoretical performance has long remained sub-optimal: the best known bounds on the number of training examples necessary for ...

Find SimilarView on arXiv