ICML 2011
TechTalks from event: ICML 2011
Ensemble Methods
-
Efficient Rule Ensemble Learning using Hierarchical KernelsThis paper addresses the problem of Rule Ensemble Learning (REL), where the goal is simultaneous discovery of a small set of simple rules and their optimal weights that lead to good generalization. Rules are assumed to be conjunctions of basic propositions concerning the values taken by the input features. From the perspectives of interpretability as well as generalization, it is highly desirable to construct rule ensembles with low training error, having rules that are i) simple, {em i.e.}, involve few conjunctions and ii) few in number. We propose to explore the (exponentially) large feature space of all possible conjunctions optimally and efficiently by employing the recently introduced Hierarchical Kernel Learning (HKL) framework. The regularizer employed in the HKL formulation can be interpreted as a potential for discouraging selection of rules involving large number of conjunctions -- justifying its suitability for constructing rule ensembles. Simulation results show that the proposed approach improves over state-of-the-art REL algorithms in terms of generalization and indeed learns simple rules. Although this is encouraging, it can be shown that HKL selects a conjunction only if all its subsets are selected, and this is highly undesirable. We propose a novel convex formulation which alleviates this problem and generalizes the HKL framework. The main technical contribution of this paper is an efficient mirror-descent based active set algorithm for solving the new formulation. Empirical evaluations on REL problems illustrate the utility of generalized HKL.
-
Boosting on a Budget: Sampling for Feature-Efficient PredictionIn this paper, we tackle the problem of feature-efficient prediction: classification using a limited number of features per test example. We show that modifying an ensemble classifier such as AdaBoost, by sampling hypotheses from its final weighted predictor, is well-suited for this task. We further consider an extension of this problem, where the costs of examining the various features can differ from one another, and we give an algorithm for this more general setting. We prove the correctness of our algorithms and derive bounds for the number of samples needed for given error rates. We also experimentally verify the effectiveness of our methods.
-
Multiclass Boosting with Hinge Loss based on Output CodingMulticlass classification is an important and fundamental problem in machine learning. A popular family of multiclass classification methods belongs to reducing multiclass to binary based on output coding. Several multiclass boosting algorithms have been proposed to learn the coding matrix and the associated binary classifiers in a problem-dependent way. These algorithms can be unified under a sum-of-exponential loss function defined in the domain of margins. Instead, multiclass SVM uses another type of loss function based on hinge loss. In this paper, we present a new output-coding-based multiclass boosting algorithm using the multiclass hinge loss, which we call HingeBoost.OC. HingeBoost.OC is tested on various real world datasets and shows better performance than the existing multiclass boosting algorithm AdaBoost.ERP, one-vs-one, one-vs-all, ECOC and multiclass SVM in a majority of different cases.
-
Generalized Boosting Algorithms for Convex OptimizationBoosting is a popular way to derive powerful learners from simpler hypothesis classes. Following previous work (Mason et al., 1999; Friedman, 2000) on general boosting frameworks, we analyze gradient-based descent algorithms for boosting with respect to any convex objective and introduce a new measure of weak learner performance into this setting which generalizes existing work. We present the first weak to strong learning guarantees for the existing gradient boosting work for smooth convex objectives, and also demonstrate that this work fails for non-smooth objectives. To address this issue, we present new algorithms which extend this boosting approach to arbitrary convex loss functions and give corresponding weak to strong convergence results. In addition, we demonstrate experimental results that support our analysis and demonstrate the need for the new algorithms we present.
- All Sessions
- Keynotes
- Bandits and Online Learning
- Structured Output
- Reinforcement Learning
- Graphical Models and Optimization
- Recommendation and Matrix Factorization
- Neural Networks and Statistical Methods
- Invited Cross-Conference Track
- Feature Selection, Dimensionality Reduction
- Learning Theory
- Large-Scale Learning
- Latent-Variable Models
- Active and Online Learning
- Latent-Variable Models
- Neural Networks and Deep Learning
- Tutorial: Learning Kernels
- Tutorial: Machine Learning for Large Scale Recommender Systems
- Ensemble Methods
- Tutorial: Introduction to Bandits: Algorithms and Theory
- Tutorial: Machine Learning and Robotics
- Tutorial: Machine Learning in Ecological Science and Environmental Policy
- Tutorial : Collective Intelligence and Machine Learning
- Test-of-Time
- Best Paper
- Robotics and Reinforcement Learning
- Transfer Learning
- Kernel Methods
- Optimization
- Learning Theory
- Invited Cross-Conference Session
- Neural Networks and Deep Learning
- Reinforcement Learning
- Bayesian Inference and Probabilistic Models
- Supervised Learning
- Social Networks
- Evaluation Metrics
- statistical relational learning
- Outlier Detection
- Time Series
- Graphical Models and Bayesian Inference
- Sparsity and Compressed Sensing
- Clustering
- Game Theory and Planning and Control
- Semi-Supervised Learning
- Kernel Methods and Optimization
- Neural Networks and NLP
- Probabilistic Models & MCMC
- Online Learning
- Ranking and Information Retrieval