
Description
We consider $k$-median clustering in finite metric spaces and $k$-means clustering in Euclidean spaces, in the setting where $k$ is part of the input (not a constant). For the $k$-means problem, Ostrovsky et al.~\cite{Ostrovsky06} show that if the input satisfies the condition that the optimal $(k-1)$-means clustering is more expensive than the optimal $k$-means clustering by a factor of $\max\{100, 1/\alpha^2\}$, then one can achieve a $(1+f(\alpha))$-approximation to the $k$-means optimal in time polynomial in $n$ and $k$ by using a variant of Lloyd's algorithm. In this work we substantially improve this approximation guarantee. We show that given only the condition that the $(k-1)$-means optimal is more expensive than the $k$-means optimal by a factor $1+\alpha$ for {\em some} constant $\alpha>0$, we can obtain a PTAS. In particular, under this assumption, for any $\epsilon>0$ we can achieve a $(1+\epsilon)$-approximation to the $k$-means optimal in time polynomial in $n$ and $k$, and exponential in $1/\epsilon$ and $1/\alpha$. We thus decouple the strength of the assumption from the quality of the approximation ratio. We also give a PTAS for the $k$-median problem in finite metrics under the analogous assumption. For $k$-means, we additionally give a randomized algorithm with improved running time $n^{O(1)}(k\log n)^{\mathrm{poly}(1/\epsilon,1/\alpha)}$.
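The separation condition these results rest on is easy to state operationally: an instance is stable if the optimal $(k-1)$-means cost exceeds $(1+\alpha)$ times the optimal $k$-means cost. A minimal illustrative sketch of checking that condition (in Python; all names here are mine, and multi-restart Lloyd's local optima stand in for the true optima, on which the paper's condition is actually defined):

```python
import random

def _dist2(p, q):
    """Squared Euclidean distance between two points of any dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def lloyd_kmeans(points, k, iters=50, seed=0):
    """One run of plain Lloyd's algorithm; returns the k-means cost
    (sum of squared distances to the nearest center).  Lloyd's only
    finds a local optimum, so this merely approximates the true OPT_k."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: _dist2(p, centers[i]))].append(p)
        # Move each center to the centroid of its cluster (keep it if empty).
        centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return sum(min(_dist2(p, c) for c in centers) for p in points)

def best_cost(points, k, restarts=5):
    """Cheapest clustering found over several random restarts."""
    return min(lloyd_kmeans(points, k, seed=s) for s in range(restarts))

def is_weakly_stable(points, k, alpha):
    """Check the separation condition OPT_{k-1} > (1 + alpha) * OPT_k,
    with both optima approximated by multi-restart Lloyd's."""
    return best_cost(points, k - 1) > (1 + alpha) * best_cost(points, k)
```

For example, two tight, well-separated planar clusters pass the check for $k=2$ even with a large $\alpha$, since merging them into one cluster blows up the cost by far more than a $(1+\alpha)$ factor.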
We also use our technique to obtain a PTAS under the assumption, considered by Balcan et al.~\cite{Balcan09}, that all $(1+\alpha)$-approximations are $\delta$-close to a desired target clustering, when all target clusters have size greater than $2\delta n$. Both results are based on a new notion of clustering stability that extends both the notion of~\cite{Ostrovsky06} and that of~\cite{Balcan09}. No FPTAS for the $k$-median problem over such stable instances exists unless $P=NP$; thus our algorithm is, in a sense, best possible for such instances.