TechTalks from event: ICML 2011
On the Necessity of Irrelevant VariablesThis work explores the effects of relevant and irrelevant boolean variables on the accuracy of classifiers. The analysis uses the assumption that the variables are conditionally independent given the class, and focuses on a natural family of learning algorithms for such sources when the relevant variables have a small advantage over random guessing. The main result is that algorithms relying predominately on irrelevant variables have error probabilities that quickly go to 0 in situations where algorithms that limit the use of irrelevant variables have errors bounded below by a positive constant. We also show that accurate learning is possible even when there are so few examples that one cannot determine with high confidence whether or not any individual variable is relevant.
A PAC-Bayes Sample-compression Approach to Kernel MethodsWe propose a PAC-Bayes sample compression approach to kernel methods that can accommodate any bounded similarity function and show that the support vector machine (SVM) classifier is a particular case of a more general class of data-dependent classifiers known as majority votes of sample-compressed classifiers. We provide novel risk bounds for these majority votes and learning algorithms that minimize these bounds.
Simultaneous Learning and Covering with Adversarial NoiseWe study simultaneous learning and covering problems: submodular set cover problems that depend on the solution to an active (query) learning problem. The goal is to jointly minimize the cost of both learning and covering. We extend recent work in this setting to allow for a limited amount of adversarial noise. Certain noisy query learning problems are a special case of our problem. Crucial to our analysis is a lemma showing the logical OR of two submodular cover constraints can be reduced to a single submodular set cover constraint. Combined with known results, this new lemma allows for arbitrary monotone circuits of submodular cover constraints to be reduced to a single constraint. As an example practical application, we present a movie recommendation website that minimizes the total cost of learning what the user wants to watch and recommending a set of movies.
Risk-Based Generalizations of f-divergencesWe derive a generalized notion of f-divergences, called (f,l)-divergences. We show that this generalization enjoys many of the nice properties of f-divergences, although it is a richer family. It also provides alternative definitions of standard divergences in terms of surrogate risks. As a first practical application of this theory, we derive a new estimator for the Kulback-Leibler divergence that we use for clustering sets of vectors.