Given a unlabelled set of points X \in R^N belonging to k groups, we propose a method to identify cluster assignments that provides maximum separating margin among the clusters. We address this problem by exploiting sparsity in data points inherent to margin regions, which a max-margin classifier would produce under a supervised setting to separate points belonging to different groups. By analyzing the projections of X on the set of all possible lines L in R^N, we first establish some basic results that are satisfied only by those line intervals lying outside a cluster, under assumptions of linear separability of clusters and absence of outliers. We then encode these results into a pair-wise similarity measure to determine cluster assignments, where we accommodate non-linearly separable clusters using the kernel trick. We validate our method on several UCI datasets and on some computer vision problems, and empirically show its robustness to outliers, and in cases where the exact number of clusters is not available. The proposed approach offers an improvement in clustering accuracy of about 6% on the average, and up to 15% when compared with several existing methods.
Questions and AnswersYou need to be logged in to be able to post here.