TechTalks from event: IEEE FOCS 2010

51st Annual IEEE Symposium on Foundations of Computer Science

• Geometric complexity theory (GCT) Authors: Ketan Mulmuley
Geometric complexity theory (GCT) is an approach towards the fundamental lower bound problems of complexity theory through algebraic geometry and representation theory. This talk will give a brief introduction to GCT without assuming any background in algebraic geometry or representation theory.
• How to Grow Your Lower Bounds Authors: Mihai Patrascu
I will survey the state of the art in proving hardness for data structure problems in the cell-probe model of computation.
• How To Think About Mechanism Design Authors: Tim Roughgarden, Stanford University
Mechanism design studies optimization problems where the underlying data --- such as the value of a good or the cost of performing a task --- is initially unknown to the algorithm designer. Auction settings are canonical examples, where the private data is the willingness to pay the bidders for the goods on sale, and the optimization problem is to allocate the goods to maximize some objective, such as revenue or overall value to society. A "mechanism" is a protocol that interacts with participants and determines a solution to the underlying optimization problem. We first explain why designing mechanisms with good game-theoretic properties boils down to algorithm design in a certain "constrained computational model." We differentiate between single-parameter problems, where this computational model is well understood, and multi-parameter problems, where it is not. We then study two specific problem domains: revenue-maximizing auctions, and the recent idea of analyzing auctions via a "Bayesian thought experiment"; and welfare-maximizing mechanisms, with an emphasis on black-box reductions that transmute approximation algorithms into truthful approximation mechanisms.
• Constructive Algorithms for Discrepancy Minimization Authors: Nikhil Bansal
Given a set system (V,S), V=[n] and S={S_1,ldots,S_m}, the minimum discrepancy problem is to find a 2-coloring X:V -> {-1,+1}, such that each set is colored as evenly as possible, i.e. find X to minimize max_{j in [m]} |sum_{i in S_j} X(i)|. In this paper we give the first polynomial time algorithms for discrepancy minimization, that achieve bounds similar to those known existentially using the so-called Entropy Method. We also give a first approximation-like result for discrepancy. Specifically we give efficient randomized algorithms to: 1. Construct an \$O(sqrt{n})\$ discrepancy coloring for general sets systems when \$m=O(n)\$, matching the celebrated result of Spencer up to constant factors. Previously, no algorithmic guarantee better than the random coloring bound, i.e. O((n log n)^{1/2}), was known. More generally, for \$mgeq n\$, we obtain a discrepancy bound of O(n^{1/2} log (2m/n)). 2. Construct a coloring with discrepancy \$O(t^{1/2} log n)\$, if each element lies in at most \$t\$ sets. This matches the (non-constructive) result of Srinivasan cite{Sr}. 3. Construct a coloring with discrepancy O(lambdalog(nm)), where lambda is the hereditary discrepancy of the set system. In particular, this implies a logarithmic approximation for hereditary discrepancy. The main idea in our algorithms is to gradually produce a coloring by solving a sequence of semidefinite programs, while using the entropy method to guide the choice of the semidefinite program at each stage.
• Bounded Independence Fools Degree-2 Threshold Functions Authors: Ilias Diakonikolas and Daniel M. Kane and Jelani Nelson
Let x be a random vector coming from any k-wise independent distribution over {-1,1}^n. For an n-variate degree-2 polynomial p, we prove that E[sgn(p(x))] is determined up to an additive eps for k = poly(1/eps). This gives a large class of explicit pseudo-random generators against such functions and answers an open question of Diakonikolas et al. (FOCS 2009).

In the process, we develop a novel analytic technique we dub multivariate FT-mollification. This provides a generic tool to approximate bounded (multivariate) functions by low-degree polynomials (with respect to several different notions of approximation). A univariate version of the method was introduced by Kane et al. (SODA 2010) in the context of streaming algorithms. In this work, we refine it and generalize it to the multivariate setting. We believe that our technique is of independent mathematical interest. To illustrate its generality, we note that it implies a multidimensional generalization of Jackson's classical result in approximation theory due to (Ganzburg 1979).

To obtain our main result, we combine the FT-mollification technique with several linear algebraic and probabilistic tools. These include the invariance principle of of Mossell, O'Donnell and Oleszkiewicz, anti-concentration bounds for low-degree polynomials, an appropriate decomposition of degree-2 polynomials, and a generalized hyper-contractive inequality for quadratic forms which takes the operator norm of the associated matrix into account. Our analysis is quite modular; it readily adapts to show that intersections of halfspaces and degree-2 threshold functions are fooled by bounded independence. From this it follows that Omega(1/eps^2)-wise independence derandomizes the Goemans-Williamson hyperplane rounding scheme.

Our techniques unify, simplify, and in some cases improve several recent results in the literature concerning threshold functions. For the case of ``regular'' halfspaces we give a simple proof of an optimal independence bound of Th

• The Coin Problem, and Pseudorandomness for Branching Programs / Pseudorandom Generators for Regular Branching Programs Authors: Joshua Brody and Elad Verbin / Mark Braverman and Anup Rao and Ran Raz and Amir Yehudayoff
The emph{Coin Problem} is the following problem: a coin is given, which lands on head with probability either \$1/2 + eta\$ or \$1/2 - eta\$. We are given the outcome of \$n\$ independent tosses of this coin, and the goal is to guess which way the coin is biased, and to be correct with probability \$ge 2/3\$. When our computational model is unrestricted, the majority function is optimal, and succeeds when \$eta ge c /sqrt{n}\$ for a large enough constant \$c\$. The coin problem is open and interesting in models that cannot compute the majority function.

In this paper we study the coin problem in the model of emph{read-once width-\$w\$ branching programs}. We prove that in order to succeed in this model, \$eta\$ must be at least \$1/ (log n)^{Theta(w)}\$. For constant \$w\$ this is tight by considering the recursive tribes function.

We generalize this to a emph{Dice Problem}, where instead of independent tosses of a coin we are given independent tosses of one of two \$m\$-sided dice. We prove that if the distributions are too close, then the dice cannot be distinguished by a small-width read-once branching program.

We suggest one application for this kind of theorems: we prove that Nisan's Generator fools width-\$w\$ read-once emph{permutation} branching programs, using seed length \$O(w^4 log n log log n + log n log (1/eps))\$. For \$w=eps=Theta(1)\$, this seedlength is \$O(log n log log n)\$. The coin theorem and its relatives might have other connections to PRGs. This application is related to the independent, but chronologically-earlier, work of Braverman, Rao, Raz and Yehudayoff (which might be submitted to this FOCS).

• Settling the Polynomial Learnability of Mixtures of Gaussians Authors: Ankur Moitra and Gregory Valiant
Given data drawn from a mixture of multivariate Gaussians, a basic problem is to accurately estimate the mixture parameters. We give an algorithm for this problem that has a running time, and data requirement polynomial in the dimension and the inverse of the desired accuracy, with provably minimal assumptions on the Gaussians. As simple consequences of our learning algorithm, we can perform near-optimal clustering of the sample points and density estimation for mixtures of \$k\$ Gaussians, efficiently.

The building blocks of our algorithm are based on the work (Kalai emph{et al}, STOC 2010)~cite{2Gs} that gives an efficient algorithm for learning mixtures of two Gaussians by considering a series of projections down to one dimension, and applying the emph{method of moments} to each univariate projection. A major technical hurdle in~cite{2Gs} is showing that one can efficiently learn emph{univariate} mixtures of two Gaussians. In contrast, because pathological scenarios can arise when considering univariate projections of mixtures of more than two Gaussians, the bulk of the work in this paper concerns how to leverage an algorithm for learning univariate mixtures (of many Gaussians) to yield an efficient algorithm for learning in high dimensions. Our algorithm employs emph{hierarchical clustering} and rescaling, together with delicate methods for backtracking and recovering from failures that can occur in our univariate algorithm.

Finally, while the running time and data requirements of our algorithm depend exponentially on the number of Gaussians in the mixture, we prove that such a dependence in necessary.

• Polynomial Learning of Distribution Families Authors: Mikhail Belkin and Kaushik Sinha
The question of polynomial learnability of probability distributions, particularly Gaussian mixture distributions, has recently received significant attention in theoretical computer science and machine learning. However, despite major progress, the general question of polynomial learnability of Gaussian mixture distributions still remained open. The current work resolves the question of polynomial learnability for Gaussian mixtures in high dimension with an arbitrary but fixed number of components.

The result for Gaussian distributions relies on a very general result of independent interest on learning parameters of distributions belonging to what we call {it polynomial families}. These families are characterized by their moments being polynomial of parameters and, perhaps surprisingly, include almost all common probability distributions as well as their mixtures and products. Using tools from real algebraic geometry, we show that parameters of any distribution belonging to such a family can be learned in polynomial time.

To estimate parameters of a Gaussian mixture distribution the general results on polynomial families are combined with a certain deterministic dimensionality reduction allowing learning a high-dimensional mixture to be reduced to a polynomial number of parameter estimation problems in low dimension.

• Agnostically learning under permutation invariant distributions Authors: Karl Wimmer
We generalize algorithms from computational learning theory that are successful under the uniform distribution on the Boolean hypercube \${0,1}^n\$ to algorithms successful on permutation invariant distributions, distributions where the probability mass remains constant upon permutations in the instances. While the tools in our generalization mimic those used for the Boolean hypercube, the fact that permutation invariant distributions are not product distributions presents a significant obstacle.

Under the uniform distribution, halfspaces can be agnostically learned in polynomial time for constant \$eps\$. The main tools used are a theorem of Peres~cite{Peres04} bounding the {it noise sensitivity} of a halfspace, a result of~cite{KOS04} that this theorem this implies Fourier concentration, and a modification of the Low-Degree algorithm of Linial, Mansour, Nisan~cite{LMN:93} made by Kalai et. al.~cite{KKMS08}. These results are extended to arbitrary product distributions in~cite{BOWi08}.

We prove analogous results for permutation invariant distributions; more generally, we work in the domain of the symmetric group. We define noise sensitivity in this setting, and show that noise sensitivity has a nice combinatorial interpretation in terms of Young tableaux. The main technical innovations involve techniques from the representation theory of the symmetric group, especially the combinatorics of Young tableaux. We show that low noise sensitivity implies concentration on ``simple'' components of the Fourier spectrum, and that this fact will allow us to agnostically learn halfspaces under permutation invariant distributions to constant accuracy in roughly the same time as in the uniform distribution over the Boolean hypercube case.

• Learning Convex Concepts from Gaussian Distributions with PCA Authors: Santosh Vempala
We present a new algorithm for learning a convex set in \$R^n\$ given labeled examples drawn from any Gaussian distribution. The efficiency of the algorithm depends on the dimension of the {em normal subspace}, the span of vectors orthogonal to supporting hyperplanes of the convex set. The key step of the algorithm uses a Singular Value Decomposition (SVD) to approximate the relevant normal subspace. The complexity of the resulting algorithm is \$poly(n)2^{ ilde{O}(k)}\$ for an arbitrary convex set with normal subspace of dimension \$k\$. For the important special case of the intersection of \$k\$ halfspaces, the complexity is [ poly(n,k,1/eps) + n cdot min , k^{O(log k/eps^4)}, (k/eps)^{O(klog (1/eps))} ] to learn a hypothesis that correctly classifies \$1-eps\$ of the unknown Gaussian distribution. This improves substantially on existing algorithms and is the first algorithm to achieve a fixed polynomial dependence on \$n\$. The proof of our main result is based on a monotonicity property of Gaussian space.