TechTalks from event: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

Music Signal Analysis

  • Optimizing the Mapping from a Symbolic to an Audio Representation for Music-to-Score Alignment Authors: Cyril Joder, Slim Essid, and Gaël Richard, Telecom ParisTech
    A key processing step in music-to-score alignment systems is the estimation of the instantaneous match between an audio observation and the score. Here we propose a general formulation of this matching measure, using a linear transformation from the symbolic domain to any time-frequency representation of the audio. We investigate the learning of the mapping for several common audio representations, based on a best-fit criterion. We evaluate the effectiveness of our mapping approach with two different alignment systems, on a large database of popular and classical polyphonic music. The results show that the learning procedure significantly improves the precision of the alignments obtained, compared to common heuristic templates used in the literature.
  • Polyphonic Pitch Tracking by Example Authors: Paris Smaragdis, University of Illinois / Adobe Systems
    We introduce a novel approach for pitch tracking of multiple sources in mixture signals. Unlike traditional approaches to pitch tracking, which explicitly attempt to detect periodicities, this approach uses a learning framework in which previously pitch-tagged recordings serve as training data to teach spectrum/pitch associations. We show how the mixture case of this task is a nearest subspace search problem which is efficiently solved by transforming it to an overcomplete sparse coding formulation. We demonstrate the use of this algorithm with real mixtures ranging from solo to quintet recordings.
  • Scale-Invariant Probabilistic Latent Component Analysis Authors: Romain Hennequin, Roland Badeau, and Bertrand David, Telecom ParisTech
    In this paper, we present a new method for decomposing musical spectrograms. This method is similar to shift-invariant Probabilistic Latent Component Analysis, but whereas the latter works with constant-Q spectrograms (i.e. with a logarithmic frequency resolution), our technique is designed to decompose standard short-time Fourier transform spectrograms (i.e. with a linear frequency resolution). This makes it possible to easily reconstruct the latent signals (which can be useful for source separation).
  • A Temporally-constrained Convolutive Probabilistic Model for Pitch Detection Authors: Emmanouil Benetos and Simon Dixon, Queen Mary University of London
    A method for pitch detection which models the temporal evolution of musical sounds is presented in this paper. The proposed model is based on shift-invariant probabilistic latent component analysis, constrained by a hidden Markov model. The time-frequency representation of a produced musical note can be expressed by the model as a temporal sequence of spectral templates which can also be shifted over log-frequency. Thus, this approach can be effectively used for pitch detection in music signals that contain amplitude and frequency modulations. Experiments were performed using extracted sequences of spectral templates on monophonic music excerpts, where the proposed model outperforms a non-temporally constrained convolutive model for pitch detection. Finally, future directions are given for multipitch extensions of the proposed model.
  • Probabilistic Latent Tensor Factorization Framework for Audio Modeling Authors: Ali Taylan Cemgil, Umut Simsekli, and Yusuf Cem Subakan, Bogazici University
    This paper introduces probabilistic latent tensor factorization (PLTF) as a general framework for hierarchical modeling of audio. This framework combines practical aspects of graphical modeling of machine learning with tensor factorization models. Once a model is constructed in the PLTF framework, the estimation algorithm is immediately available. We illustrate our approach using several popular models such as NMF or NMF2D and provide extensions with simulation results on real data for key audio processing tasks such as restoration and source separation.
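
For readers unfamiliar with NMF, which the last abstract names as one of the models expressible in the PLTF framework, the following is a minimal sketch of plain NMF on a magnitude spectrogram. It uses the standard multiplicative updates for the generalized Kullback-Leibler divergence and is not the PLTF estimation algorithm from the paper. The function names (nmf_kl, kl_divergence), the parameters, and the toy data are assumptions made for illustration only.

```python
import numpy as np


def nmf_kl(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Approximate a nonnegative matrix V (frequency x time) as W @ H.

    W holds `rank` spectral templates (columns) and H their time-varying
    gains (rows). Uses multiplicative updates for the generalized KL
    divergence; `eps` guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_time)) + eps
    for _ in range(n_iter):
        # Update the gains with the templates held fixed.
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # Update the templates with the gains held fixed.
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H


def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized KL divergence between V and its approximation V_hat."""
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))


# Toy "spectrogram" (hypothetical data): two spectral templates with
# random nonnegative gains over 100 frames.
rng = np.random.default_rng(1)
W_true = rng.random((64, 2))
H_true = rng.random((2, 100))
V = W_true @ H_true

W, H = nmf_kl(V, rank=2)
print("KL divergence after fitting:", kl_divergence(V, W @ H))
```

In a source separation setting, each column of W paired with the matching row of H gives one component, and the components can be grouped by source. The multiplicative form keeps W and H nonnegative without an explicit projection step.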