TechTalks from event: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics


  • Machine Hearing: Audio Analysis by Emulation of Human Hearing Authors: Richard F. Lyon, Google Inc
    While many approaches to audio analysis are based on elegant mathematical models, an approach based on emulation of human hearing is becoming a strong challenger. The difference is subtle, as it involves extending such mathematically nice signal-processing concepts as linear systems, transforms, and second-order statistics to include the messier nonlinear, adaptive, and evolved aspects of hearing. Essentially, the goal is to form representations that do a good job of capturing what a signal “sounds like”, so that we can make systems that react accordingly. Some of our recent experimental systems, such as sound retrieval from text queries, melody matching, and music recommendation, employ a four-layer machine-hearing architecture that attempts to simplify and systematize some of the methods used to emulate hearing. The peripheral level uses nonlinear filter cascades to model wave propagation in the nonlinear cochlea. The second level computes one or more types of auditory image, as an abstraction of what goes on in the auditory brainstem, which projects to cortical sheets much as visual images do. The third level is where application-dependent features are extracted from the auditory images, abstractly modeling what likely happens in auditory cortex. Finally, and most abstractly, any appropriate machine-learning system is used to address the needs of an application, the brain-motivated neural network being a prototypical example. Each layer involves different disciplines and can leverage the experience of different fields, including hearing science, signal processing, machine vision, and machine learning.
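The peripheral level can be illustrated with a toy cascade of second-order resonators. This is a minimal numpy sketch, not Lyon's actual cochlear model: the filter design (RBJ-cookbook bandpass biquads), the stage count, and the frequency spacing are illustrative assumptions, and the nonlinear, adaptive behaviour of the real cascade is omitted.

```python
import numpy as np

def resonator(x, f0, fs, q=4.0):
    # One second-order bandpass biquad (RBJ cookbook, constant skirt gain).
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b0, b2 = q * alpha, -q * alpha          # b1 is zero for this design
    a0, a1, a2 = 1.0 + alpha, -2.0 * np.cos(w0), 1.0 - alpha
    y = np.empty_like(x)
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):              # direct form I
        yn = (b0 * xn + b2 * x2 - a1 * y1 - a2 * y2) / a0
        y[n] = yn
        x2, x1, y2, y1 = x1, xn, y1, yn
    return y

def filter_cascade(x, fs, n_stages=8, f_hi=4000.0, f_lo=100.0):
    # Centre frequencies spaced logarithmically from high to low, loosely
    # mimicking base-to-apex wave propagation along the cochlea.
    freqs = np.geomspace(f_hi, f_lo, n_stages)
    outputs = []
    for f0 in freqs:
        x = resonator(x, f0, fs)            # each stage feeds the next...
        outputs.append(x)                   # ...and is tapped as a channel
    return freqs, np.array(outputs)

fs = 16000
t = np.arange(fs // 4) / fs
tone = np.sin(2 * np.pi * 1000.0 * t)       # 1 kHz probe tone
freqs, channels = filter_cascade(tone, fs)
rms = np.sqrt(np.mean(channels ** 2, axis=1))
best = freqs[np.argmax(rms)]                # channel responding most strongly
```

A pure tone excites most strongly the channel tuned near its frequency, a crude stand-in for the cochlea's place code.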

Multichannel Audio

  • Upscaling Ambisonic Sound Scenes Using Compressed Sensing Techniques Authors: Andrew Wabnitz, Nicolas Epain, Alistair McEwan, Craig Jin
    This paper considers the application of compressed sensing to spherical acoustics in order to improve spatial sound field reconstruction. More specifically, we apply compressed sensing techniques to a set of Ambisonic sound signals to obtain a super-resolution plane-wave decomposition of the original sound field. That is to say, we investigate using the plane-wave decomposition to increase the spherical harmonic order of the Ambisonic sound scene. We refer to this as upscaling the Ambisonic sound scene. A focus of the paper is using sub-band analysis to make the plane-wave decomposition more robust. Results show that the sub-band analysis does indeed improve the robustness of the plane-wave decomposition when dominant overlapping sources are present or in noisy or diffuse sound conditions. Upscaling Ambisonic sound scenes allows more loudspeakers to be used for spatial sound field reconstruction, resulting in a larger sweet spot and improved sound quality.
  • Design of Transform Filter for Sound Field Reproduction Using Microphone Array and Loudspeaker Array Authors: Shoichi Koyama, Ken'ichi Furuya, Yusuke Hiwasaki, Yoichi Haneda
    In this paper, we propose a novel method of sound field reproduction using a microphone array and loudspeaker array. Our objective is to obtain the driving signal of a planar or linear loudspeaker array only from the sound pressure distribution acquired by the planar or linear microphone array. In this study, we derive a formulation of the transform from the received signals of the microphone array to the driving signals of the loudspeaker array. The transform is achieved by means of a filter in the spatio-temporal frequency domain. Numerical simulation results are presented to compare the proposed method with a method based on the conventional least-squares algorithm. The reproduction accuracies were found to be almost the same; however, the filter size and the amount of computation required for the proposed method were much smaller than those for the least-squares-based method.
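The mechanics of filtering in the spatio-temporal frequency domain can be sketched as a 2-D FFT, a pointwise multiplication, and an inverse 2-D FFT. The actual transform filter is derived in the paper; the filter below is a hypothetical all-pass placeholder that only exercises the pipeline.

```python
import numpy as np

def spatio_temporal_filter(p, h_kw):
    # p: sound pressure on a linear mic array, shape (n_mics, n_samples).
    # h_kw: filter defined in the wavenumber-frequency (kx, omega) domain.
    P = np.fft.fft2(p)              # axis 0 -> wavenumber, axis 1 -> frequency
    return np.fft.ifft2(P * h_kw).real

rng = np.random.default_rng(1)
n_mics, n_samples = 16, 64
p = rng.normal(size=(n_mics, n_samples))

# Placeholder all-pass filter (H = 1 everywhere); a real design would
# encode the microphone-to-driving-signal transform in this domain.
h_allpass = np.ones((n_mics, n_samples))
drive = spatio_temporal_filter(p, h_allpass)   # with H = 1, drive equals p
```

Because the filter acts pointwise in the transform domain, its size is fixed by the array geometry rather than by a large least-squares system, which is the source of the computational saving the abstract describes.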
  • Robust combined crosstalk cancellation and listening-room compensation Authors: Jan Ole Jungmann, Radoslaw Mazur, Markus Kallinger, Alfred Mertins

Signal Enhancement

  • A Time-Domain Widely Linear MVDR Filter for Binaural Noise Reduction Authors: Jingdong Chen, Northwestern Polytechnical University and Jacob Benesty, INRS-EMT, University of Quebec
    This paper deals with the problem of binaural noise reduction in the time domain with a stereophonic sound system. We first form a complex signal from the stereo inputs with one channel being its real part and the other being its imaginary part. By doing so, the binaural noise reduction problem is converted to a single-channel noise reduction problem via the widely linear (WL) model. The WL estimation theory is then used to derive the minimum variance distortionless response (MVDR) noise reduction filter that can fully take advantage of the noncircularity of the complex speech signal to achieve noise reduction while preserving the desired signal (speech) and spatial information. Experiments are provided to justify the effectiveness of this MVDR filter.
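The complex-signal construction from the stereo inputs, and the noncircularity that the WL MVDR filter exploits, can be shown with synthetic data. The channel statistics below are purely illustrative, and the full MVDR filter derivation is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Two strongly correlated "binaural" channels: a common source plus
# small independent sensor noise (illustrative statistics only).
s = rng.normal(size=n)
left = s + 0.1 * rng.normal(size=n)
right = 0.8 * s + 0.1 * rng.normal(size=n)

# One complex signal, as in the paper: the left channel is the real part,
# the right channel the imaginary part.
z = left + 1j * right

# For a circular signal the pseudo-variance E[z^2] vanishes; a large value
# relative to the variance E[|z|^2] is the noncircularity that widely
# linear (WL) estimation can exploit.
noncirc = np.abs(np.mean(z ** 2)) / np.mean(np.abs(z) ** 2)

# Reference: an independent circular complex signal scores near zero.
z_circ = (rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2)
noncirc_ref = np.abs(np.mean(z_circ ** 2)) / np.mean(np.abs(z_circ) ** 2)
```

Because binaural speech channels are highly correlated, the resulting complex signal is strongly noncircular, which is why the WL model recovers information that a strictly linear complex filter would ignore.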
  • Noise Estimation with Low Complexity for Speech Enhancement Authors: Pei Chee Yong, Sven Nordholm, and Hai Huyen Dam, Curtin University
    A noise estimation algorithm is proposed for single-channel speech enhancement. By comparing the noise estimate with the short-term noise and speech at every time frame, the noise estimate is efficiently updated using a fixed step size. The step size is optimized based on the speech quality performance and the noise tracking capability. The proposed technique is capable of tracking noise spectrum variations while remaining robust to speech onsets. In addition, the noise estimator requires low computational complexity, which makes it effective for real-time implementation in battery-operated equipment. Simulation results show that the proposed method can achieve good speech quality and efficient noise tracking performance when compared to existing methods.
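The fixed-step update idea can be sketched with a minimal noise-floor tracker. This is a deliberately simplified stand-in for the paper's estimator: the multiplicative step sizes, frame powers, and speech burst below are all illustrative assumptions.

```python
import numpy as np

def track_noise(power, up=1.02, down=0.98):
    # Fixed-step noise-floor tracker: each frame nudges the estimate up or
    # down by a fixed multiplicative step. Small steps follow slow noise
    # variations while barely reacting to short speech bursts.
    est = power[0]
    out = np.empty_like(power)
    for i, p in enumerate(power):
        est *= up if p > est else down
        out[i] = est
    return out

rng = np.random.default_rng(0)
frames = 400
noise = (1.0 + 0.05 * rng.normal(size=frames)) ** 2   # stationary noise floor
speech = np.zeros(frames)
speech[150:170] = 20.0                                # brief speech onset
power = noise + speech

est = track_noise(power)
```

During the 20-frame burst the estimate rises only by about 1.02^20, roughly 50 percent, instead of jumping to the speech power, which is the robustness-to-onsets property the abstract claims; the per-frame cost is a single comparison and multiply.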
  • Application of Channel Shortening to Acoustic Channel Equalization in the Presence of Noise and Estimation Error Authors: Mark R. P. Thomas, Nikolay Gaubitch, and Patrick A. Naylor, Imperial College London
    The inverse-filtering of acoustic impulse responses (AIRs) can be achieved with existing methods provided a good estimate of the channel is available and the observed signals contain little or no noise. Such assumptions are not generally valid in practical scenarios, leading to much interest in the issue of robustness. In particular, channel shortening (CS) techniques have been shown to be more robust to channel estimation error than existing approaches. In this paper we investigate CS using the relaxed multichannel least-squares (RMCLS) algorithm in the presence of both channel error and additive noise. It is shown quantitatively that shortening the acoustic channel to a duration of a few milliseconds is more robust than attempting to equalize the channel fully, giving better resultant sound quality for dereverberation. A key point of this paper is to provide an explanation for this added robustness in terms of the equalization filter gain. We provide simulation results and results for practical settings using speech recordings and room impulse response measurements from a real acoustic environment.
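An RMCLS-style shortening design can be sketched in a few lines of numpy. This is a simplified sketch, not the paper's algorithm: channel lengths, filter lengths, and window size are toy values, and the solution is taken as the smallest singular direction of the weighted multichannel convolution matrix.

```python
import numpy as np

def conv_matrix(h, lg):
    # Toeplitz convolution matrix: conv_matrix(h, lg) @ g == np.convolve(h, g)
    C = np.zeros((len(h) + lg - 1, lg))
    for j in range(lg):
        C[j:j + len(h), j] = h
    return C

def rmcls_shorten(channels, lg, win):
    # RMCLS-style shortening, sketched: stack the per-channel convolution
    # matrices, zero the weights on the first `win` taps of the equalized
    # response (the relaxed, unconstrained window), and take the unit-norm
    # filter minimizing the remaining late-tap energy via the SVD.
    C = np.hstack([conv_matrix(h, lg) for h in channels])
    w = np.ones(C.shape[0])
    w[:win] = 0.0                              # relaxed (don't-care) region
    _, _, Vt = np.linalg.svd(w[:, None] * C)
    return Vt[-1].reshape(len(channels), lg)   # smallest singular direction

rng = np.random.default_rng(0)
lh, lg, win = 32, 30, 4
channels = [rng.normal(size=lh) * 0.9 ** np.arange(lh) for _ in range(2)]

g = rmcls_shorten(channels, lg, win)
eir = sum(np.convolve(h, gi) for h, gi in zip(channels, g))
tail = np.sum(eir[win:] ** 2) / np.sum(eir ** 2)   # energy outside window
```

Leaving the first few taps unconstrained, rather than forcing a single unit impulse as full equalization does, is what gives the design its extra degrees of freedom and, per the paper, its robustness to channel error.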
  • Non-Linear Acoustic Echo Cancellation using Online Loudspeaker Linearization Authors: Moctar I. Mossi, Christelle Yemdji, Nicholas Evans, EURECOM, and Christophe Beaugeant, Intel Mobile Communications
    This paper presents an approach to non-linear acoustic echo cancellation (AEC). We first present the model of a loudspeaker-enclosure-microphone system, which is divided into two blocks: a non-linear, power filter model for the downlink path (loudspeaker and amplifiers) and a linear model for the acoustic channel and uplink path. Using this model we propose an approach that uses loudspeaker linearization and linear AEC to improve the performance of an otherwise classical approach to linear AEC. The novel contribution in this paper relates to a new on-line linearization pre-processing algorithm that adapts to long-term variations in the loudspeaker characteristics. This feature contrasts with the fixed pre-processor algorithms which have been reported previously.
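The power filter model for the downlink path can be illustrated by fitting monomial terms to a saturating loudspeaker. This sketch uses a hypothetical memoryless cubic nonlinearity and a batch least-squares fit; the paper's model additionally gives each power of the input its own linear filter and adapts on-line.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                         # far-end (downlink) signal

# Hypothetical loudspeaker with a soft, odd-order saturation.
y = x - 0.3 * x ** 3 + 0.01 * rng.normal(size=n)

# Power-filter model, memoryless here for brevity: y ~= c1*x + c2*x^2 + c3*x^3.
B = np.column_stack([x, x ** 2, x ** 3])
c_pf, *_ = np.linalg.lstsq(B, y, rcond=None)
c_lin, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)

res_pf = np.mean((y - B @ c_pf) ** 2)          # power-filter model residual
res_lin = np.mean((y - x * c_lin[0]) ** 2)     # purely linear model residual
```

The order-of-magnitude gap between the two residuals shows why a linear AEC alone leaves residual echo through a saturating loudspeaker, and why linearizing the downlink first helps.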
  • A System Approach to Acoustic Echo Cancellation in Robust Hands-Free Teleconferencing Authors: Jason Wung, Ted Wada, Biing-Hwang Juang, Georgia Institute of Technology, Bowon Lee, Mitchell Trott, and Ronald Schafer, Hewlett-Packard
    This paper presents a system approach to the acoustic echo cancellation (AEC) problem in a noisy acoustic environment. We propose a method that makes use of the estimated near-end signal from a postfilter to further improve the AEC system performance. The cancellation performance is enhanced especially during strong near-end interference (e.g., double talk). Simulation results show that our stereophonic AEC based on the system approach with postfilter integration outperforms the original robust AEC system by itself without postfilter integration. The improved performance is noted especially during double talk, where simulation results show that the echo return loss enhancement can be boosted by as much as 10 dB.
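As background for the quantities discussed above, a bare-bones linear AEC and its echo return loss enhancement (ERLE) can be sketched with an NLMS adaptive filter. This is only the classical baseline that such systems build on, not the paper's stereophonic system with postfilter integration; the echo path and signal lengths are toy assumptions.

```python
import numpy as np

def nlms_aec(far, mic, taps=32, mu=0.5, eps=1e-6):
    # Normalized LMS adaptive filter: estimate the echo path from the
    # far-end signal and subtract the echo estimate from the mic signal.
    w = np.zeros(taps)
    err = np.zeros(len(mic))
    for n in range(taps - 1, len(mic)):
        x = far[n - taps + 1:n + 1][::-1]      # newest far-end sample first
        e = mic[n] - w @ x
        w += mu * e * x / (x @ x + eps)
        err[n] = e
    return err, w

rng = np.random.default_rng(0)
n = 20000
far = rng.normal(size=n)
echo_path = rng.normal(size=32) * 0.8 ** np.arange(32)  # toy echo path
mic = np.convolve(far, echo_path)[:n]          # echo only, no near-end talk

err, w = nlms_aec(far, mic)
# Echo return loss enhancement over the final quarter of the signal.
erle_db = 10 * np.log10(np.mean(mic[-n // 4:] ** 2)
                        / np.mean(err[-n // 4:] ** 2))
```

During double talk the near-end speech appears in `mic` and disturbs the adaptation, which is exactly where the paper's postfilter-estimated near-end signal is fed back to keep the canceller robust.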