TechTalks from event: IEEE FOCS 2010

51st Annual IEEE Symposium on Foundations of Computer Science

  • The Geometry of Manipulation -- a Quantitative Proof of the Gibbard-Satterthwaite Theorem Authors: Marcus Isaksson and Guy Kindler and Elchanan Mossel
    We prove a quantitative version of the Gibbard-Satterthwaite theorem. We show that a uniformly chosen voter profile for a neutral social choice function f of $q > 3$ alternatives and n voters will be manipulable with probability at least $10^{-4} \epsilon^{2} n^{-3} q^{-30}$, where $\epsilon$ is the minimal statistical distance between f and the family of dictator functions.

    Our results extend those of Friedgut, Kalai and Naor [FKN09], which were obtained for the case of 3 alternatives, and imply that the approach of masking manipulations behind computational hardness (as considered in [BO91, CS03, EL05, PR06, CS06]) cannot hide manipulations completely.
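
    To make the notion of manipulation concrete, here is a small brute-force check (not from the paper) of how often a voter can gain by misreporting under a toy plurality rule with lexicographic tie-breaking; that tie-breaking is not neutral, so the sketch only illustrates the definition of manipulability, not the theorem's hypotheses.

        from itertools import permutations, product

        def plurality(profile):
            # profile: tuple of rankings; each ranking lists the alternatives, best first
            counts = {}
            for ranking in profile:
                counts[ranking[0]] = counts.get(ranking[0], 0) + 1
            best = max(counts.values())
            return min(a for a, c in counts.items() if c == best)  # lexicographic tie-break

        def can_manipulate(f, profile, voter):
            truth = profile[voter]
            honest_outcome = f(profile)
            for lie in permutations(truth):
                outcome = f(profile[:voter] + (lie,) + profile[voter + 1:])
                if truth.index(outcome) < truth.index(honest_outcome):
                    return True  # the voter strictly prefers the misreport's outcome
            return False

        q, n = 4, 2  # q > 3 alternatives, n voters
        rankings = list(permutations(range(q)))
        profiles = list(product(rankings, repeat=n))
        manipulable = sum(can_manipulate(plurality, p, i) for p in profiles for i in range(n))
        print(f"{manipulable} of {len(profiles) * n} (profile, voter) pairs are manipulable")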

  • Efficient volume sampling for row/column subset selection Authors: Amit Deshpande and Luis Rademacher
    We give efficient algorithms for volume sampling, i.e., for picking $k$-subsets of the rows of any given matrix with probabilities proportional to the squared volumes of the simplices defined by them and the origin (or the squared volumes of the parallelepipeds defined by these subsets of rows). This solves an open problem from the monograph on spectral algorithms by Kannan and Vempala (see Section 7.4 of \cite{KV}, also implicit in \cite{BDM, DRVW}).

    Our first algorithm for volume sampling $k$-subsets of rows from an $m$-by-$n$ matrix runs in $O(k m n^{\omega} \log n)$ arithmetic operations, and a second variant of it for $(1+\epsilon)$-approximate volume sampling runs in $O(mn \log m \cdot k^{2}/\epsilon^{2} + m \log^{\omega} m \cdot k^{2\omega+1}/\epsilon^{2\omega} \cdot \log(k \epsilon^{-1} \log m))$ arithmetic operations, which is almost linear in the size of the input (i.e., the number of entries) for small $k$.
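
    Concretely, the distribution defined above assigns a $k$-subset $S$ of rows probability proportional to $\det(A_S A_S^T)$, the squared volume spanned by those rows. The brute-force sampler below (assuming numpy) makes this explicit; it enumerates all $k$-subsets, so it is exponential in $m$, purely illustrative, and not the paper's efficient algorithm.

        import numpy as np
        from itertools import combinations

        def volume_sample(A, k, rng=None):
            """Pick a k-subset S of rows of A with P(S) proportional to det(A_S A_S^T),
            i.e. the squared volume spanned by those k rows.  Brute force over all
            k-subsets, so only usable on tiny matrices."""
            rng = rng or np.random.default_rng()
            subsets = list(combinations(range(A.shape[0]), k))
            sq_vols = np.array([max(np.linalg.det(A[list(S)] @ A[list(S)].T), 0.0)
                                for S in subsets])
            probs = sq_vols / sq_vols.sum()
            return subsets[rng.choice(len(subsets), p=probs)]

        A = np.random.default_rng(0).standard_normal((6, 4))
        print(volume_sample(A, k=2))        # a random pair of row indices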

    Our efficient volume sampling algorithms imply the following results for low-rank matrix approximation:

    1. Given $A \in \mathbb{R}^{m \times n}$, in $O(k m n^{\omega} \log n)$ arithmetic operations we can find $k$ of its rows such that projecting onto their span gives a $\sqrt{k+1}$-approximation to the matrix of rank $k$ closest to $A$ under the Frobenius norm. This improves the $O(k \sqrt{\log k})$-approximation of Boutsidis, Drineas and Mahoney \cite{BDM} and matches the lower bound shown in \cite{DRVW}. The method of conditional expectations gives a \emph{deterministic} algorithm with the same complexity. The running time can be improved to $O(mn \log m \cdot k^{2}/\epsilon^{2} + m \log^{\omega} m \cdot k^{2\omega+1}/\epsilon^{2\omega} \cdot \log(k \epsilon^{-1} \log m))$ at the cost of losing an extra $(1+\epsilon)$ in the approximation factor.

    2. The same rows and projection as in the previous point give a $\sqrt{(k+1)(n-k)}$-approximation to the matrix of rank $k$ closest to $A$ under the spectral norm. In this paper, we show an almost matching lower bound of

  • Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity Authors: Alexandr Andoni and Robert Krauthgamer and Krzysztof Onak
    We present a near-linear time algorithm that approximates the edit distance between two strings within a polylogarithmic factor; specifically, for strings of length $n$ and every fixed $\epsilon > 0$, it can compute a $(\log n)^{O(1/\epsilon)}$-approximation in $n^{1+\epsilon}$ time. This is an {\em exponential} improvement over the previously known factor, $2^{\tilde{O}(\sqrt{\log n})}$, with a comparable running time~\cite{OR-edit,AO-edit}. Previously, no efficient polylogarithmic approximation algorithm was known for any computational task involving edit distance (e.g., nearest neighbor search or sketching).

    This result arises naturally in the study of a new \emph{asymmetric query} model. In this model, the input consists of two strings $x$ and $y$, and an algorithm can access $y$ in an unrestricted manner, while being charged for querying every symbol of $x$. Indeed, we obtain our main result by designing an algorithm that makes a small number of queries in this model. We also provide a nearly-matching lower bound on the number of queries.

    Our lower bound is the first to expose hardness of edit distance stemming from the input strings being ``repetitive'', which means that many of their substrings are approximately identical. Consequently, our lower bound provides the first rigorous separation between edit distance and Ulam distance, which is edit distance on non-repetitive strings, i.e., permutations.
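
    For context, exact edit distance has a textbook $O(n^2)$-time dynamic program, sketched below; the point of the result above is to get within a polylogarithmic factor in near-linear time, far below this quadratic baseline. (This is background only, not the paper's algorithm.)

        def edit_distance(x: str, y: str) -> int:
            """Exact edit distance (insertions, deletions, substitutions) in O(|x|*|y|) time."""
            prev = list(range(len(y) + 1))
            for i in range(1, len(x) + 1):
                cur = [i] + [0] * len(y)
                for j in range(1, len(y) + 1):
                    cur[j] = min(prev[j] + 1,                            # delete x[i-1]
                                 cur[j - 1] + 1,                         # insert y[j-1]
                                 prev[j - 1] + (x[i - 1] != y[j - 1]))   # substitute / match
                prev = cur
            return prev[len(y)]

        print(edit_distance("kitten", "sitting"))  # 3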

  • New Constructive Aspects of the Lovasz Local Lemma Authors: Bernhard Haeupler and Barna Saha and Aravind Srinivasan
    The Lov\'{a}sz Local Lemma (LLL) is a powerful tool that gives sufficient conditions for avoiding all of a given set of ``bad'' events, with positive probability. A series of results have provided algorithms to efficiently construct structures whose existence is non-constructively guaranteed by the LLL, culminating in the recent breakthrough of Moser & Tardos for the full asymmetric LLL. We show that the output distribution of the Moser-Tardos algorithm well-approximates the \emph{conditional LLL-distribution} -- the distribution obtained by conditioning on all bad events being avoided. We show how a known bound on the probabilities of events in this distribution can be used for further probabilistic analysis and give new constructive and non-constructive results.

    We also show that when an LLL application provides a small amount of slack, the number of resamplings of the Moser-Tardos algorithm is nearly linear in the number of underlying independent variables (not events!), and can thus be used to give efficient constructions in cases where the underlying proof applies the LLL to super-polynomially many events. Even in cases where finding a bad event that holds is computationally hard, we show that applying the algorithm to avoid a polynomial-sized ``core'' subset of bad events leads to a desired outcome with high probability. This is shown via a simple union bound over the probabilities of non-core events in the conditional LLL-distribution, and automatically leads to simple and efficient Monte-Carlo (and in most cases $RNC$) algorithms.
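
    For readers unfamiliar with the Moser-Tardos algorithm discussed above, the sketch below shows its generic resampling loop on a toy $k$-SAT instance, where the bad events are unsatisfied clauses and the independent variables are the Boolean variables. This is only the generic algorithm; the paper's specific constructions (core events, conditional LLL-distribution arguments) are not reproduced here.

        import random

        def moser_tardos_ksat(clauses, num_vars, seed=0):
            """clauses: list of clauses, each a list of nonzero ints; literal v asserts
            x_|v| == (v > 0).  Bad event = 'clause c is unsatisfied'."""
            rng = random.Random(seed)
            assign = [rng.random() < 0.5 for _ in range(num_vars + 1)]   # 1-indexed variables

            def violated(clause):
                return all(assign[abs(lit)] != (lit > 0) for lit in clause)

            while True:
                bad = next((c for c in clauses if violated(c)), None)
                if bad is None:
                    return assign[1:]                    # all bad events avoided
                for lit in bad:                          # resample only this event's variables
                    assign[abs(lit)] = rng.random() < 0.5

        clauses = [[1, 2, 3], [-1, 2, 4], [-2, -3, -4], [1, -3, 4]]
        print(moser_tardos_ksat(clauses, num_vars=4))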

    We demonstrate this idea on several applications. We give the first constant-factor approximation algorithm for the Santa Claus problem by making an LLL-based proof of Feige constructive. We provide Monte Carlo algorithms for acyclic edge coloring, non-repetitive graph colorings, and Ramsey-type graphs. In all these applications the algorithm falls directly out of the non-constructive LLL-based proof. Our algorithms are very simple, often provide bet

  • The Geometry of Scheduling Authors: Nikhil Bansal and Kirk Pruhs
    We consider the following general scheduling problem: There are $n$ jobs with arbitrary release times and sizes. In addition, each job has an associated arbitrary monotone function specifying the cost incurred when the job is completed at a particular time. This problem formulation is general enough to include many natural scheduling objectives, such as weighted flow time, weighted tardiness, and sum of flow time squared. The main contribution of this paper is a randomized polynomial-time algorithm with approximation ratio $O(\log \log (nP))$, where $P$ is the maximum job size. This general result improves the best known approximation ratios by at least an exponential factor (and much more in some cases) for essentially {\em all} of the nontrivial common special cases of this problem.

    Our result is based on a novel connection between scheduling and geometry. We start with a certain strong linear programming relaxation for this problem, based on exponentially many knapsack cover inequalities. We show that the problem of rounding this linear program can be reduced to geometric weighted set multicover problems, where the sets/objects have near-linear union complexity. We then show how to apply Varadarajan's quasi-uniform sampling technique to obtain solutions for these geometric set multicover problems. We believe that this geometric interpretation of scheduling is of independent interest, and will likely find numerous future applications.
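
    In symbols, the general problem above can be written as follows (a sketch; $C_j$ denotes the completion time of job $j$ with release time $r_j$, size $p_j$, and monotone cost function $g_j$):

        \min \; \sum_{j=1}^{n} g_j(C_j)
        \qquad \text{over feasible schedules, so in particular } C_j \ge r_j + p_j \text{ for every job } j.

    Weighted flow time, for instance, corresponds to the choice $g_j(C_j) = w_j (C_j - r_j)$.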

  • Sublinear Time Optimization for Machine Learning Authors: Kenneth L. Clarkson and Elad Hazan and David P. Woodruff
    We give sublinear-time approximation algorithms for some optimization problems arising in machine learning, such as training linear classifiers and finding minimum enclosing balls. Our algorithms can be extended to some kernelized versions of these problems, such as SVDD, hard margin SVM, and $L_2$-SVM, for which sublinear-time algorithms were not known before. These new algorithms use a combination of novel sampling techniques and the randomized implementation of online learning algorithms. We give lower bounds which show the running times of our algorithms to be nearly best possible in the unit-cost RAM model. We also give implementations of our algorithms in the semi-streaming setting, obtaining the first low pass sublinear time algorithms with polylogarithmic space and arbitrary approximation factor. Finally we show that our algorithms, which are Monte Carlo, can be made Las Vegas with an additional linear-time step, and we show that such linear work is necessary for Las Vegas results.
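
    For reference, the two core problems mentioned above have the following standard formulations over data points $x_1, \dots, x_m$ with labels $y_i \in \{-1,+1\}$ (a sketch; the paper's exact normalizations and kernelized variants may differ):

        \text{linear classification (margin maximization):} \quad \max_{\|w\|_2 \le 1} \; \min_{1 \le i \le m} \; y_i \langle w, x_i \rangle,
        \qquad
        \text{minimum enclosing ball:} \quad \min_{c} \; \max_{1 \le i \le m} \; \|x_i - c\|_2 .
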
  • Estimating the longest increasing sequence in sublinear time Authors: Michael Saks and C. Seshadhri
    Finding the longest increasing subsequence (LIS) is a classic algorithmic problem. Let $n$ denote the size of the array. Simple $O(n \log n)$ algorithms are known for this problem. What can a sublinear time algorithm achieve? We develop a polylogarithmic time randomized algorithm that, for any constant $\delta > 0$, given the input array $f$ outputs an estimate of the length of the LIS that, with high probability, is accurate to within an additive $\delta n$. More precisely, the running time of the algorithm is $(\log n)^c (1/\delta)^{O(1/\delta)}$ where the exponent $c$ is independent of $\delta$. Previously, the best known polylogarithmic time algorithms could only achieve an additive $n/2$ approximation.

    The LIS problem can be formulated as a dynamic program. Our overall approach seems very general, and might be applicable to approximating other dynamic programs.
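
    For contrast with the sublinear estimator above, here is one of the simple $O(n \log n)$ exact algorithms mentioned in the abstract (patience sorting via binary search); it necessarily reads the entire array.

        from bisect import bisect_left

        def lis_length(arr):
            """Length of the longest strictly increasing subsequence in O(n log n) time.
            tails[k] is the smallest possible tail of an increasing subsequence of length k+1."""
            tails = []
            for x in arr:
                i = bisect_left(tails, x)
                if i == len(tails):
                    tails.append(x)
                else:
                    tails[i] = x
            return len(tails)

        print(lis_length([3, 1, 4, 1, 5, 9, 2, 6]))  # 4, e.g. 1, 4, 5, 9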

  • Testing Properties of Sparse Images Authors: Dana Ron and Gilad Tsur
    We initiate the study of testing properties of images that correspond to {\em sparse\/} $0/1$-valued matrices of size $n \times n$. Our study is related to but different from the study initiated by Raskhodnikova ({\em Proceedings of RANDOM, 2003\/}), where the images correspond to {\em dense\/} $0/1$-valued matrices. Specifically, while the distance between images in the model studied by Raskhodnikova is the fraction of entries on which the images differ taken with respect to all $n^2$ entries, the distance measure in our model is defined by the fraction of such entries taken with respect to the actual number of $1$'s in the matrix. We study several natural properties: connectivity, convexity, monotonicity, and being a line. In all cases we give testing algorithms with sublinear complexity, and in some of the cases we also provide corresponding lower bounds.
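
    The difference between the two distance measures is easy to see on a small example (assuming numpy; normalizing by the number of $1$'s of the first image is a simplification for illustration, and the paper fixes the exact normalization):

        import numpy as np

        def dense_distance(A, B):
            """Dense-image model: differing entries over all n^2 entries."""
            return np.sum(A != B) / A.size

        def sparse_distance(A, B):
            """Sparse-image model (sketch): differing entries over the number of 1-entries."""
            return np.sum(A != B) / max(int(np.sum(A)), 1)

        A = np.zeros((100, 100), dtype=int)
        A[0, :10] = 1                      # an image with only ten 1-pixels
        B = A.copy()
        B[0, 0] = 0                        # erase one of them
        print(dense_distance(A, B), sparse_distance(A, B))   # 0.0001 vs 0.1
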
  • A Unified Framework for Testing Linear-Invariant Properties Authors: Arnab Bhattacharyya and Elena Grigorescu and Asaf Shapira
    In a sequence of recent papers, Sudan and coauthors have investigated the relation between testability of properties of Boolean functions and the invariance of the properties with respect to transformations of the domain. Linear-invariance is arguably the most common such symmetry for natural properties of Boolean functions on the hypercube. Hence, it is an important goal to find necessary and sufficient conditions for testability of linear-invariant properties. This is explicitly posed as an open problem in a recent survey of Sudan. We obtain the following results:

    1. We show that every linear-invariant property that can be characterized by forbidding induced solutions to a (possibly infinite) set of linear equations can be tested with one-sided error.

    2. We show that every linear-invariant property that can be tested with one-sided error can be characterized by forbidding induced solutions to a (possibly infinite) set of systems of linear equations.

    We conjecture that our result from item (1) can be extended to cover systems of linear equations. We further show that the validity of this conjecture would have the following implications:

    1. It would imply that every linear-invariant property that is closed under restrictions to linear subspaces is testable with one-sided error. Such a result would unify several previous results on testing Boolean functions, such as the results on testing low-degree polynomials and results on testing Fourier dimensionality.

    2. It would imply that a linear-invariant property P is testable with one-sided error *if and only if* P is closed under restrictions to linear subspaces, thus resolving Sudan's problem.
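
    For concreteness, one common formulation of the linear-invariance used throughout this item (a sketch of the definition, not quoted from the paper or the survey) requires, for a property $P$ of Boolean functions on $F_2^n$:

        f \in P \ \text{ and } \ L : F_2^n \to F_2^n \ \text{linear} \quad \Longrightarrow \quad f \circ L \in P .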

  • Optimal Testing of Reed-Muller Codes Authors: Arnab Bhattacharyya and Swastik Kopparty and Grant Schoenebeck and Madhu Sudan and David Zuckerman
    We consider the problem of testing if a given function $f : F_2^n \rightarrow F_2$ is close to any degree $d$ polynomial in $n$ variables, also known as the Reed-Muller testing problem. The Gowers norm is based on a natural $2^{d+1}$-query test for this property. Alon et al.~\cite{AKKLR} rediscovered this test and showed that it accepts every degree $d$ polynomial with probability $1$, while it rejects functions that are $\Omega(1)$-far with probability $\Omega(1/(d 2^{d}))$. We give an asymptotically optimal analysis of this test, and show that it rejects functions that are (even only) $\Omega(2^{-d})$-far with $\Omega(1)$ probability (so the rejection probability is a universal constant independent of $d$ and $n$). This implies a tight relationship between the $(d+1)^{\mathrm{st}}$ Gowers norm of a function and its maximal correlation with degree $d$ polynomials, when the correlation is close to 1.

    Our proof works by induction on $n$ and yields a new analysis of even the classical Blum-Luby-Rubinfeld~\cite{BLR} linearity test, for the setting of functions mapping $F_2^n$ to $F_2$. The optimality follows from a tighter analysis of counterexamples to the ``inverse conjecture for the Gowers norm'' constructed by \cite{GT07,LMS}.

    Our result has several implications. First, it shows that the Gowers norm test is tolerant, in that it also accepts close codewords. Second, it improves the parameters of an XOR lemma for polynomials given by Viola and Wigderson~\cite{VW}. Third, it implies a ``query hierarchy'' result for property testing of affine-invariant properties. That is, for every function $q(n)$, it gives an affine-invariant property that is testable with $O(q(n))$ queries, but not with $o(q(n))$ queries, complementing an analogous result of \cite{GKNR08} for graph properties.
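
    The natural $2^{d+1}$-query test referred to above can be sketched very directly: pick a random point $x$ and random directions $y_1, \dots, y_{d+1}$, and accept iff the sum of $f$ over all $2^{d+1}$ points $x + \sum_{i \in S} y_i$ vanishes over $F_2$. The code below illustrates the test itself (not the paper's analysis).

        import random
        from itertools import combinations

        def degree_test(f, n, d):
            """One run of the natural 2^(d+1)-query test for degree-d polynomials over F_2."""
            rand_pt = lambda: [random.randrange(2) for _ in range(n)]
            x, ys = rand_pt(), [rand_pt() for _ in range(d + 1)]
            total = 0
            for r in range(d + 2):
                for S in combinations(range(d + 1), r):          # all subsets of {1,...,d+1}
                    pt = tuple((x[j] + sum(ys[i][j] for i in S)) % 2 for j in range(n))
                    total ^= f(pt)                               # sum of f over the cube, mod 2
            return total == 0                                    # accept iff the sum vanishes

        # f(x) = x_0 x_1 has degree 2, so the d = 2 test always accepts it; the d = 1
        # case corresponds to the (affine) Blum-Luby-Rubinfeld linearity test.
        f = lambda p: p[0] * p[1]
        print(all(degree_test(f, n=6, d=2) for _ in range(200)))   # True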