TechTalks from event: Technical session talks from ICRA 2012

The conference registration code needed to access these videos is available via this link: PaperPlaza. Step-by-step instructions for accessing the videos are here: step-by-step process.
Why are some videos missing? If you provided a consent form for your video to be published and it is still missing, please contact support@techtalks.tv

Human Detection and Tracking

  • Iterative Pedestrian Segmentation and Pose Tracking under a Probabilistic Framework Authors: Li, Yanli
    This paper presents a unified probabilistic framework to tackle two closely related visual tasks: pedestrian segmentation and pose tracking in monocular videos. Although the two tasks are complementary in nature, most previous approaches focus on them individually. Here, we resolve the two problems simultaneously by building and inferring a single body model. More specifically, pedestrian segmentation is performed by optimizing the body region under the constraint of body pose in a Markov Random Field (MRF), and pose parameters are reasoned about through Bayesian filtering, which takes the body silhouette as an observation cue. Since the two processes are inter-related, we resort to an Expectation-Maximization (EM) algorithm to refine them alternately. Additionally, a template matching scheme is utilized for initialization. Experimental results on challenging videos verify the framework's robustness for non-rigid human segmentation under cluttered backgrounds and moving cameras.
  • A Connectionist-Based Approach for Human Action Identification Authors: Alazrai, Rami; Lee, C. S. George
    This paper presents a hierarchical, two-layer, connectionist-based human-action recognition system (CHARS) as a first step towards developing socially intelligent robots. The first layer is a K-nearest neighbor (K-NN) classifier that categorizes human actions into two classes based on the existence of locomotion, and the second layer consists of two multi-layer recurrent neural networks that distinguish between subclasses within each class. A pyramid of histograms of oriented gradients (PHOG) descriptor is proposed for extracting local and spatial features. The PHOG descriptor drastically reduces the dimensionality of the input space, which results in better convergence for the learning and classification processes. Computer simulations were conducted to illustrate the performance of the proposed CHARS and the role of the temporal factor in solving this problem. The widely used KTH human-action database and a human-action dataset from our lab were utilized for performance evaluation. The proposed CHARS was found to perform better than other existing human-action recognition methods and achieved a 95.55% recognition rate.
  • Using Dempster’s Rule of Combination to Robustly Estimate Pointed Targets Authors: Pateraki, Maria; Baltzakis, Haris; Trahanias, Panos
    In this paper we address an important issue in human-robot interaction, that of accurately deriving pointing information from a corresponding gesture. Based on the fact that in most applications it is the pointed object rather than the actual pointing direction which is important, we formulate a novel approach which takes into account prior information about the location of possible pointed targets. To decide about the pointed object, the proposed approach uses the Dempster-Shafer theory of evidence to fuse information from two different input streams: head pose, estimated by visually tracking the off-plane rotations of the face, and hand pointing orientation. Detailed experimental results are presented that validate the effectiveness of the method in realistic application setups.
  • Head-To-Shoulder Signature for Person Recognition Authors: Kirchner, Nathan; Alempijevic, Alen; Virgona, Alexander Joseph
    Ensuring that an interaction is initiated with a particular and unsuspecting member of a group is a complex task. As a first step, the robot must effectively, expediently and reliably recognise humans as they carry on with their typical behaviours (in situ). A method is presented for constructing a scale- and viewing-angle-robust feature vector (from analysing a 3D pointcloud) designed to encapsulate the inter-person variations in the size and shape of people's head-to-shoulder region (the Head-to-Shoulder Signature, HSS). Furthermore, a method for utilising said feature vector as the basis of person recognition via a Support Vector Machine is detailed. An empirical study was performed in which person recognition was attempted on in situ data collected from 25 participants over 5 days in an office environment. The results report a mean accuracy over the 5 days of 78.15% and a peak accuracy of 100% for 9 participants. Further, the results show a considerably better-than-random (1/23 = 4.5%) result when the participants were: in motion and unaware they were being scanned (52.11%), in motion and facing directly away from the sensor (36.04%), and after variations in their general appearance. Finally, the results show the HSS has considerable ability to accommodate a person's head, shoulder and body rotation relative to the sensor - even in cases where the person is facing directly away from the robot.
  • Bigram-Based Natural Language Model and Statistical Motion Symbol Model for Scalable Language of Humanoid Robots Authors: Takano, Wataru; Nakamura, Yoshihiko
    Language is a symbolic system unique to human beings. The acquisition of language, which has its meanings in the real world, is important for robots to understand the environment and communicate with us in our daily life. This paper proposes a novel approach to establish a fundamental framework for robots that can understand language through their whole-body motions. The proposed framework is composed of three modules: a "motion symbol" module, a "motion language model", and a "natural language model". In the motion symbol module, motion data is symbolized by Hidden Markov Models (HMMs). Each HMM represents an abstract motion pattern, and the HMMs are defined as motion symbols. The motion language model is stochastically designed to link motion symbols and words. This model consists of three layers: motion symbols, latent variables and words. The connections between the motion symbols and the latent states, and between the latent states and the words, are denoted by two kinds of probabilities: one is the probability that a motion symbol generates a latent state, and the other is the probability that a latent state generates words. Therefore, the motion language model can connect the motion symbols to the words through the latent states. The natural language model stochastically represents sequences of words. In this paper, a bigram, which is a special case of the N-gram model, is adopted as the natural language model.
  • Cognitive Active Vision for Human Identification Authors: Utsumi, Yuzuko; Sommerlade, Eric; Bellotto, Nicola; Reid, Ian
    We describe an integrated, real-time multi-camera surveillance system that is able to find and track individuals, acquire and archive facial image sequences, and perform face recognition. The system is based around an inference engine that can extract high-level information from an observed scene and generate appropriate commands for a set of pan-tilt-zoom (PTZ) cameras. The incorporation of reliable facial recognition into the high-level feedback is a main novelty of our work, showing how high-level understanding of a scene can be used to deploy PTZ sensing resources effectively. The system comprises a distributed camera system using SQL tables as virtual communication channels, Situation Graph Trees for knowledge representation, inference and high-level camera control, and a variety of visual processing algorithms including on-line acquisition of facial images and on-line recognition of faces by comparing image sets using subspace distance. We provide an extensive evaluation of this method, using our system both for acquisition of training data and for later recognition. A set of experiments in a surveillance scenario shows the effectiveness of our approach and its potential for real applications of cognitive vision.
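The EM-style alternation in "Iterative Pedestrian Segmentation and Pose Tracking" above can be caricatured as coordinate descent on a coupled objective: fix one block of variables (the pose), solve for the other (the segmentation), and repeat. A minimal sketch; the quadratic objective and closed-form updates below are invented for illustration and are not the paper's MRF or filtering energies:

```python
def alternate_minimize(x, y, iters=50):
    """Coordinate descent on f(x, y) = (x - y)^2 + (x - 2)^2 + (y - 4)^2.
    Each step solves one block in closed form while the other is fixed,
    mirroring how the paper alternates segmentation and pose refinement."""
    for _ in range(iters):
        x = (y + 2) / 2.0   # argmin over x with y held fixed
        y = (x + 4) / 2.0   # argmin over y with x held fixed
    return x, y

x, y = alternate_minimize(0.0, 0.0)
# converges toward the joint minimum (8/3, 10/3)
```

Each update can only lower the joint cost, which is why the alternation converges; the paper's segmentation and pose steps play the roles of the two closed-form updates.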
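The PHOG descriptor used in the CHARS paper above concatenates orientation histograms over a spatial pyramid of cells, so the feature length depends on the bin count and pyramid depth rather than the image size. A minimal sketch; the grid-of-angles input format and parameter values are assumptions, not the paper's exact configuration:

```python
def orientation_histogram(angles, bins):
    """Histogram a flat list of gradient orientations (degrees) into bins."""
    hist = [0] * bins
    for a in angles:
        hist[int((a % 360) / (360.0 / bins))] += 1
    return hist

def phog(angle_grid, levels=2, bins=8):
    """Concatenate orientation histograms over a pyramid of 2^l x 2^l cells.
    angle_grid: 2D list of per-pixel gradient orientations in degrees."""
    h, w = len(angle_grid), len(angle_grid[0])
    feature = []
    for level in range(levels + 1):
        cells = 2 ** level
        for ci in range(cells):
            for cj in range(cells):
                patch = [angle_grid[r][c]
                         for r in range(ci * h // cells, (ci + 1) * h // cells)
                         for c in range(cj * w // cells, (cj + 1) * w // cells)]
                feature.extend(orientation_histogram(patch, bins))
    return feature
```

The descriptor length is bins x (1 + 4 + ... + 4^levels), independent of image size, which is the dimensionality reduction the abstract credits for faster convergence.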
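Dempster's rule of combination, used in the pointed-target paper above, fuses two basic mass assignments by multiplying the masses of intersecting focal elements and renormalizing away the conflicting mass. A minimal sketch; the two-target frame and the specific mass values are invented, not the paper's head-pose and hand-orientation models:

```python
def dempster_combine(m1, m2):
    """Combine two mass functions whose focal elements are frozensets
    over the same frame of discernment."""
    combined = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb  # mass falling on the empty set
    norm = 1.0 - conflict
    return {s: v / norm for s, v in combined.items()}

A, B = frozenset({'A'}), frozenset({'B'})
AB = frozenset({'A', 'B'})
m_head = {A: 0.6, AB: 0.4}          # head pose favors target A
m_hand = {A: 0.5, B: 0.2, AB: 0.3}  # hand orientation is less certain
fused = dempster_combine(m_head, m_hand)
# fused mass on {'A'} dominates, so A is chosen as the pointed target
```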
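The HSS paper above classifies feature vectors with a Support Vector Machine; as a much simpler stand-in for that classifier, a nearest-centroid rule over labeled signature vectors illustrates the recognition step. The 3-D toy "signatures" and names below are entirely invented:

```python
def train_centroids(labeled_vectors):
    """labeled_vectors: {person_id: [feature_vector, ...]} -> per-person centroids."""
    centroids = {}
    for person, vecs in labeled_vectors.items():
        dim = len(vecs[0])
        centroids[person] = [sum(v[i] for v in vecs) / len(vecs)
                             for i in range(dim)]
    return centroids

def recognize(centroids, vec):
    """Return the person whose centroid is nearest in squared Euclidean distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(c, vec))
    return min(centroids, key=lambda p: dist2(centroids[p]))

training = {'alice': [[1.0, 2.0, 0.5], [1.1, 1.9, 0.6]],
            'bob':   [[3.0, 1.0, 1.5], [2.9, 1.2, 1.4]]}
centroids = train_centroids(training)
# a new scan close to alice's stored signatures is recognized as alice
```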
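The bigram natural language model in the motion-language paper above estimates P(w2 | w1) from adjacent-word counts, and a sentence's likelihood is the product of its bigram probabilities. A minimal maximum-likelihood sketch with an invented toy corpus and no smoothing:

```python
from collections import defaultdict

def train_bigram(corpus):
    """corpus: list of sentences (strings). Returns P(w2 | w1) as nested dicts,
    with <s> and </s> marking sentence boundaries."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        tokens = ['<s>'] + sentence.split() + ['</s>']
        for w1, w2 in zip(tokens, tokens[1:]):
            counts[w1][w2] += 1
    return {w1: {w2: c / sum(nxt.values()) for w2, c in nxt.items()}
            for w1, nxt in counts.items()}

def sentence_prob(model, sentence):
    """Product of bigram probabilities; 0 if an unseen bigram occurs."""
    tokens = ['<s>'] + sentence.split() + ['</s>']
    p = 1.0
    for w1, w2 in zip(tokens, tokens[1:]):
        p *= model.get(w1, {}).get(w2, 0.0)
    return p

model = train_bigram(['the robot walks', 'the robot runs', 'the human walks'])
```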

SLAM I

  • On the Number of Local Minima to the Point Feature Based SLAM Problem Authors: Huang, Shoudong; Wang, Heng; Frese, Udo; Dissanayake, Gamini
    Map joining is an efficient strategy for solving feature-based SLAM problems. This paper demonstrates that the joining of two 2D local maps, formulated as a nonlinear least squares problem, has at most two local minima when the associated uncertainties can be described using spherical covariance matrices. A necessary and sufficient condition for the existence of two minima is derived, and it is shown that more than one minimum exists only when the quality of the local maps used for map joining is extremely poor. The analysis explains to some extent why a number of optimization-based SLAM algorithms proposed in the recent literature that rely on local search strategies are successful in converging to the globally optimal solution from poor initial conditions, particularly when covariance matrices are spherical. It also demonstrates that the map joining problem has special properties that may be exploited to reliably obtain globally optimal solutions to the SLAM problem.
  • On the Comparison of Uncertainty Criteria for Active SLAM Authors: Carrillo, Henry; Reid, Ian; Castellanos, Jose A.
    In this paper, we consider the computation of the D-optimality criterion as a metric for the uncertainty of a SLAM system. Properties regarding the use of this uncertainty criterion in the active SLAM context are highlighted, and comparisons against the A-optimality criterion and entropy are presented. This paper shows that contrary to what has been previously reported, the D-optimality criterion is indeed capable of giving fruitful information as a metric for the uncertainty of a robot performing SLAM. Finally, through various experiments with simulated and real robots, we support our claims and show that the use of D-opt has desirable effects in various SLAM related tasks such as active mapping and exploration.
  • Continuous-Time Batch Estimation Using Temporal Basis Functions Authors: Furgale, Paul Timothy; Barfoot, Timothy; Sibley, Gabe
    Roboticists often formulate estimation problems in discrete time for the practical reason of keeping the state size tractable. However, the discrete-time approach does not scale well for use with high-rate sensors, such as inertial measurement units or sweeping laser imaging sensors. The difficulty lies in the fact that a pose variable is typically included for every time at which a measurement is acquired, rendering the dimension of the state impractically large for large numbers of measurements. This issue is exacerbated for the simultaneous localization and mapping (SLAM) problem, which further augments the state to include landmark variables. To address this tractability issue, we propose to move the full maximum likelihood estimation (MLE) problem into continuous time and use temporal basis functions to keep the state size manageable. We present a full probabilistic derivation of the continuous-time estimation problem, derive an estimator based on the assumption that the densities and processes involved are Gaussian, and show how coefficients of a relatively small number of basis functions can form the state to be estimated, making the solution efficient. Our derivation is presented in steps of increasingly specific assumptions, opening the door to the development of other novel continuous-time estimation algorithms using different assumptions. Results from a self-calibration experiment involving a camera and a high-rate IMU are provided to validate the approach.
  • SLAM with Single Cluster PHD Filters Authors: Lee, Chee Sing; Clark, Daniel; Salvi, Joaquim
    Recent work by Mullane, Vo, and Adams has re-examined the probabilistic foundations of feature-based Simultaneous Localization and Mapping (SLAM), casting the problem in terms of filtering with random finite sets. Algorithms were developed based on Probability Hypothesis Density (PHD) filtering techniques that provided superior performance to leading feature-based SLAM algorithms in challenging measurement scenarios with high false alarm rates, high missed detection rates, and high levels of measurement noise. We investigate this approach further by considering a hierarchical point process, or single-cluster multi-object, model, where we consider the state to consist of a map of landmarks conditioned on a vehicle state. Using Finite Set Statistics, we are able to find tractable formulae to approximate the joint vehicle-landmark state based on a single Poisson multi-object assumption on the predicted density. We describe the single-cluster PHD filter and the practical implementation developed based on a particle-system representation of the vehicle state and a Gaussian mixture approximation of the map for each particle. Synthetic simulation results are presented to compare the novel algorithm against the previous PHD filter SLAM algorithm. Results presented indicate a superior performance in vehicle and map landmark localization, and comparable performance in landmark cardinality estimation.
  • Simultaneous Localization and Scene Reconstruction with Monocular Camera Authors: Huang, Kuo- Chen; Tseng, Shih-Huan; Mou, Wei-Hao; Fu, Li-Chen
    In this paper, we propose an online scene reconstruction algorithm with a monocular camera, since there are many advantages in modeling and visualizing an environment with physical scene reconstruction instead of resorting to sparse 3D points. The goal of this algorithm is to simultaneously track the camera position and map the 3D environment, which is close to the spirit of visual SLAM. There are plenty of visual SLAM algorithms in the current literature that can provide high accuracy, but many of them rely on stereo cameras. Accomplishing this task with a monocular camera poses many more challenges; however, the cheaper and more easily deployable hardware makes the monocular approach attractive. Specifically, we apply a maximum a posteriori Bayesian approach with optimization techniques to simultaneously track the camera and build a dense point cloud. We also propose a feature expansion method to increase the density of points, and then reconstruct the scene online with a delayed approach. Furthermore, we utilize the reconstructed model to accomplish the visual localization task without extracting features. Finally, a number of experiments have been conducted to validate the proposed approach, and promising performance can be observed.
  • Rhythm-based Adaptive Localization in Incomplete RFID Landmark Environments Authors: Kodaka, Kenri; Ogata, Tetsuya; Sugano, Shigeki
    This paper proposes a novel hybrid-structured model for the adaptive localization of robots that combines a stochastic localization model and a rhythmic action model, for efficiently avoiding vacant spaces between landmarks. In regularly arranged landmark environments, robots may not detect any landmarks for a long time during straight-line movement. Consequently, locally diverse and smooth movement patterns need to be generated to keep the position estimate stable. Conventional approaches aiming at probabilistic optimization cannot rapidly generate such detailed movement patterns due to their huge computational cost; therefore a simple but diverse movement structure needs to be introduced as an alternative. We solve this problem by combining a particle filter as the stochastic localization module with a dynamical action model that generates a zig-zagging motion. Validation experiments, in which virtual line-tracing tasks are performed in a floor-installed RFID environment, show that introducing the proposed rhythm pattern improves the minimum error boundary, and that the velocity performance for arbitrary tolerance errors can be improved by rhythm-amplitude adaptation fed back from the localization deviation.
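The map-joining analysis in the first paper above studies the minima of a nonlinear least-squares alignment cost. Its flavor can be illustrated by substituting the optimal translation in closed form and scanning the remaining cost over the relative rotation angle; the toy, noise-free feature coordinates below are invented, so the scan recovers the true rotation up to grid resolution:

```python
import math

def joining_cost(theta, feats1, feats2):
    """Squared alignment error after rotating map-2 features by theta and
    applying the optimal (mean-difference) translation."""
    rot = [(math.cos(theta) * x - math.sin(theta) * y,
            math.sin(theta) * x + math.cos(theta) * y) for x, y in feats2]
    n = len(rot)
    tx = sum(a - b for (a, _), (b, _) in zip(feats1, rot)) / n
    ty = sum(a - b for (_, a), (_, b) in zip(feats1, rot)) / n
    return sum((a - (b + tx)) ** 2 + (c - (d + ty)) ** 2
               for (a, c), (b, d) in zip(feats1, rot))

# map-2 features, and the same features expressed in map-1's frame
theta_true, t = 0.5, (1.0, 2.0)
feats2 = [(0.0, 0.0), (2.0, 0.5), (1.0, 3.0)]
feats1 = [(math.cos(theta_true) * x - math.sin(theta_true) * y + t[0],
           math.sin(theta_true) * x + math.cos(theta_true) * y + t[1])
          for x, y in feats2]
grid = [-math.pi + 2 * math.pi * k / 2000 for k in range(2000)]
best = min(grid, key=lambda th: joining_cost(th, feats1, feats2))
```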
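For a covariance matrix of dimension n, A-optimality is usually taken as the trace and D-optimality as the determinant raised to 1/n. A 2x2 sketch with invented covariance values shows the point made in the active-SLAM comparison above: D-opt reacts to correlation between the states while the trace does not:

```python
def a_opt(cov):
    """A-optimality: trace of the covariance matrix."""
    return cov[0][0] + cov[1][1]

def d_opt(cov):
    """D-optimality for a 2x2 covariance: det(cov)^(1/2)."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    return det ** 0.5

uncorrelated = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 0.9], [0.9, 1.0]]
# same trace, but the correlated estimate occupies a much smaller
# uncertainty-ellipse volume, which only D-opt detects
```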
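The state-size argument of the continuous-time batch estimation paper above can be sketched with a scalar trajectory: instead of estimating one value per measurement time, estimate the coefficients of a few temporal basis functions by least squares. The Gaussian RBF basis and the sine "trajectory" below are illustrative assumptions, not the paper's basis choice:

```python
import numpy as np

def rbf_basis(times, centers, width):
    """Evaluate Gaussian radial basis functions at the given times."""
    return np.exp(-((times[:, None] - centers[None, :]) ** 2)
                  / (2.0 * width ** 2))

# 200 'measurements' of a trajectory, represented by only 12 coefficients
times = np.linspace(0.0, 2.0 * np.pi, 200)
measurements = np.sin(times)
centers = np.linspace(0.0, 2.0 * np.pi, 12)
B = rbf_basis(times, centers, width=0.6)
coeffs, *_ = np.linalg.lstsq(B, measurements, rcond=None)
reconstruction = B @ coeffs   # continuous-time estimate at the sample times
```

Here 200 measurements are summarized by 12 coefficients; adding measurements grows the least-squares system but not the state dimension.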
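The stochastic localization module of the RFID-landmark paper above can be sketched as a one-dimensional particle filter: propagate particles through the motion model, reweight them when a tag at a known position is detected, and resample. Every number below, from the noise levels to the tag position, is an invented toy setup:

```python
import math
import random

def predict(particles, motion, noise):
    """Propagate each particle through the motion model plus Gaussian noise."""
    return [p + motion + random.gauss(0.0, noise) for p in particles]

def correct(particles, tag_position, sigma):
    """Reweight by a Gaussian likelihood around a detected tag and resample."""
    weights = [math.exp(-(p - tag_position) ** 2 / (2.0 * sigma ** 2))
               for p in particles]
    total = sum(weights)
    return random.choices(particles,
                          weights=[w / total for w in weights],
                          k=len(particles))

random.seed(0)
particles = [random.uniform(0.0, 10.0) for _ in range(500)]
particles = predict(particles, motion=0.0, noise=0.1)
particles = correct(particles, tag_position=3.0, sigma=0.5)
estimate = sum(particles) / len(particles)
```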

Image-Guided Interventions

  • Full state visual forceps tracking under a microscope using projective contour models Authors: Baek, Young Min; Tanaka, Shinichi; Harada, Kanako; Sugita, Naohiko; Morita, Akio; Sora, Shigeo; Mochizuki, Ryo; Mitsuishi, Mamoru
    Forceps tracking is an important element of high-level surgical assistance such as visual servoing and surgical motion analysis. In many computer vision algorithms, artificial markers are used to enable robust tracking; however, markerless tracking methods are more appropriate in surgical applications because markers complicate sterilization. This paper describes a robust, efficient tracking algorithm capable of estimating the full state parameters of a robotic surgical instrument on the basis of projective contour modeling using a 3-D CAD model of the forceps. Thus, the proposed method does not require any artificial markers. The likelihood of the contour model was measured using an edge distance transform to evaluate the similarity of the projected CAD model to the microscopic image, followed by particle filtering to estimate the full state of the forceps. Experimental results in simulated surgical environments indicate that the proposed method is robust and time-efficient, and fulfills real-time processing requirements.
  • MARVEL: A Wireless Miniature Anchored Robotic Videoscope for Expedited Laparoscopy Authors: Castro, Cristian; Smith, Sara; Alqassis, Adham; Ketterl, Thomas; Sun, Yu; Ross, Sharona; Rosemurgy, Alexander; Savage, Peter; Gitlin, Richard
    This paper describes the design and implementation of a Miniature Anchored Robotic Videoscope for Expedited Laparoscopy (MARVEL) camera module that features wireless communications and control. This device decreases the surgical-tool bottleneck experienced by surgeons in state-of-the-art Laparoscopic Endoscopic Single-Site (LESS) procedures for minimally invasive abdominal surgery. The system includes: (1) a near-zero-latency wireless communications link, (2) a pan/tilt camera platform, actuated by two tiny motors, that gives surgeons a full field of view inside the abdominal cavity, (3) a small wireless camera, (4) a wireless luminosity control system, and (5) a wireless human-machine interface to control the device. An in-vivo experiment on a porcine subject was carried out to test the general performance of the system. The robotic design is a proof of concept, which creates a research platform for a broad range of experiments in a range of domains for faculty and students in the Colleges of Engineering and Medicine and at Tampa General Hospital. This research is the first step in developing semi-autonomous, wirelessly controllable and observable, communicating and networked laparoscopic devices to enable a paradigm shift in minimally invasive surgery.
  • Motion Planning for the Virtual Bronchoscopy Authors: Rosell, Jan; Pérez, Alexander; Cabras, Paolo; Rosell, Antoni
    Bronchoscopy is an interventional medical procedure employed to inspect the interior of the human airways, clear possible obstructions, and take biopsies. Using a 3D reconstruction of the tracheobronchial tree, Virtual Bronchoscopy (VB) may help physicians explore peripheral lung lesions. We are developing a haptic-based navigation system for VB that allows navigation within the airways using a haptic device whose permitted motions mimic those of a real bronchoscope. This paper describes the motion planning module of the system, devoted to planning a path from the trachea to small peripheral pulmonary lesions, that takes into account the geometry and the kinematic constraints of the bronchoscope. The motion planner's output is used to visually and haptically guide the navigation during the virtual exploration using the haptic device. Moreover, physicians can obtain useful information about whether a peripheral lesion can effectively be reached with a given bronchoscope, or which reachable point is nearest to the lesion.
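The edge-distance-transform likelihood in the forceps-tracking paper above can be sketched in one dimension: a two-pass sweep yields each pixel's distance to the nearest edge, and a projected contour scores highly when its pixels fall near observed edges. The 1-D edge map and exponential scoring are simplifications of the paper's 2-D pipeline:

```python
import math

def distance_transform(edges):
    """Two-pass 1-D distance transform: distance to the nearest edge pixel."""
    inf = float('inf')
    d = [0.0 if e else inf for e in edges]
    for i in range(1, len(d)):                 # forward sweep
        d[i] = min(d[i], d[i - 1] + 1.0)
    for i in range(len(d) - 2, -1, -1):        # backward sweep
        d[i] = min(d[i], d[i + 1] + 1.0)
    return d

def contour_likelihood(dist, contour_pixels):
    """Score a projected contour: high when its pixels lie near edges."""
    mean_dist = sum(dist[i] for i in contour_pixels) / len(contour_pixels)
    return math.exp(-mean_dist)

edges = [0, 0, 1, 0, 0, 0, 1, 0]
dist = distance_transform(edges)
# a hypothesis whose contour lands on the edge pixels scores highest
```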