In this work we address the problem of feature extraction for object recognition with cameras that provide both RGB and depth information (RGB-D data). We consider this problem in a bag-of-features-like setting and propose a new, learned, local feature descriptor for RGB-D images, the <i>convolutional k-means descriptor</i>. The descriptor builds on recent results from the machine learning community. It automatically learns feature responses in the neighborhood of detected interest points and combines all available information, such as color and depth, into one concise representation. To demonstrate the strength of this approach, we show its applicability to different recognition problems. We evaluate the quality of the descriptor on the <i>RGB-D Object Dataset</i>, where it is competitive with previously published results, and propose an embedding into an image processing pipeline for object recognition and pose estimation.
