Machine Listening
Julius O. Smith
Associate Professor of Music and Electrical Engineering
Stanford University, CCRMA
Description
Since audio signals are interpreted by the human ear-brain system, software for
"machine listening" should simulate that complex perceptual mechanism.
In other words, to perform on par with humans, the computer
should hear and understand audio content much as humans do. Analyzing audio
accurately involves several fields: electrical engineering (spectrum analysis,
filtering, and audio transforms); psychoacoustics (sound perception); cognitive
sciences (neuroscience and artificial intelligence); acoustics (physics of sound
production); and music (harmony, rhythm, and timbre). Furthermore, audio transformations
such as pitch shifting, time stretching, and sound object filtering should
be perceptually and musically meaningful. For best results, these transformations
require perceptual understanding of spectral models, high-level feature extraction,
and sound analysis/synthesis. Finally, structuring and coding the content of
an audio file (sound and metadata) stand to benefit from efficient compression
schemes, which discard inaudible information in the sound.
Written Requirement
The written requirement for this area will consist of a 24-hour take-home exam
to be evaluated by Professor Julius O. Smith.
Reading List
Books:
Applications of Digital Signal Processing to Audio and Acoustics, edited by
Mark Kahrs and Karlheinz Brandenburg, Kluwer Academic Publishers, 1998.
T. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice,
Prentice Hall Signal Processing Series, Alan V. Oppenheim, Series Editor.
M. Bosi and R. Goldberg, Introduction to Digital Audio Coding: Basic Principles
and Audio Coding Standards, (Unpublished Manuscript).
DAFX: Digital Audio Effects, edited by Udo Zoelzer, John Wiley & Sons,
May 2002.
B. Moore, An Introduction to the Psychology of Hearing, Academic Press, 1997.
Musical Signal Processing, edited by Curtis Roads, Stephen Pope, Aldo Piccialli,
and Giovanni De Poli, Swets & Zeitlinger Publishers, 1997.
E. Zwicker and H. Fastl, Psychoacoustics: Facts and Models, Springer Verlag,
1999.
Theses:
J. O. Smith, Techniques for Digital Filter Design & System Identification
with Application to the Violin, PhD/EE Dissertation, Stanford University, June
1983.
S. Levine, Audio Representations for Data Compression and Compressed Domain
Processing, PhD Dissertation, Stanford University, 1998.
T. Verma, A Perceptually Based Audio Signal Model With Application to Scalable
Audio Compression, PhD Dissertation, Stanford University, 2000.
X. Serra, A system for sound analysis/transformation/synthesis based on a deterministic
plus stochastic decomposition, PhD Dissertation, Stanford University, Oct. 1989.
E. Scheirer, Music-Listening Systems, PhD Dissertation, Massachusetts Institute
of Technology, Media Laboratory, April 2000.
D. Ellis, Prediction-driven computational auditory scene analysis, PhD
Dissertation, Massachusetts Institute of Technology, Media Laboratory, April 1996.
M. Casey, Auditory Group Theory: with Applications to Statistical Basis Methods
for Structured Audio, PhD Dissertation, Massachusetts Institute of Technology,
Media Laboratory, February 1998.
P. Smaragdis, Redundancy Reduction for Computational Audition, a Unifying
Approach, PhD Dissertation, Massachusetts Institute of Technology, Media
Laboratory, May 2001.
D. Robinson, Perceptual Model for Assessment of Coded Audio, PhD Dissertation,
University of Essex, Department of Electronic Systems Engineering, March 2002.
Relevant publications:
X. Serra, Musical Sound Modeling with Sinusoids plus Noise, Musical Signal
Processing, C. Roads et al., Editors. Swets & Zeitlinger Publishers, 1997.
R. J. McAulay and T. F. Quatieri, Speech analysis/synthesis based on a sinusoidal
representation, IEEE Trans. on Acoust., Speech and Signal Proc., vol. ASSP-34,
pp. 744-754, 1986.
K. Brandenburg, MP3 and AAC Explained, in Proceedings of the AES 17th
International Conference, Florence, Italy, 1999.
K. Brandenburg and H. Popp, An Introduction to MPEG Layer-3, Fraunhofer-Institut
für Integrierte Schaltungen (IIS), EBU Technical Review, June 2000.
K. Brandenburg and M. Bosi, Overview of MPEG Audio: Current and Future Standards
for Low Bit Rate Audio Coding, J. Audio Eng. Soc., Vol. 45, No. 1/2, pp. 4-21,
Jan./Feb. 1997.
E. Scheirer and B. Vercoe, SAOL: The MPEG-4 Structured Audio Orchestra Language,
Computer Music Journal 23:2 (Summer 1999), pp. 31-51.
E. Scheirer, The MPEG-4 Structured Audio Standard, Proc. 1998 IEEE ICASSP
(invited paper), Seattle, May 1998.
D. Robinson and M. Hawksford, Psychoacoustic Models and Non-linear Human Hearing,
Proceedings of the 109th Convention of the Audio Engineering Society,
Los Angeles, September 2000.
G. Todd et al., AC-3: Flexible Perceptual Coding for Audio Transmission and
Storage, Proceedings of the 96th Convention of the Audio Engineering Society,
February 1994.
D. Pan, A Tutorial on MPEG Audio Compression, IEEE Multimedia Journal,
Summer 1995.