APPENDIX A
“Skeleton”

 

A score is like a skeleton.
      – John Zorn

 

 




Figure A-1: Skeleton software screenshot.


The author has developed this work within his own environment, named "Skeleton." Skeleton is both a stand-alone Mac OS X application with a simple GUI (Figure A-1) and an API designed primarily to speed up, standardize, and simplify the development of new applications for the analysis of musical signals. Grounded in fundamentals of perception and learning, the environment consists of machine-listening and machine-learning tools, supported by flexible data structures and fast visualizations. It is being developed as an alternative to more generic and slower tools such as MATLAB. It is composed of a set of original Objective-C frameworks, together with open-source C libraries encapsulated in Objective-C frameworks. The software architecture of Skeleton is depicted in Figure A-2 and described below:

A.1 Machine Listening

The machine listening software includes: pitch, loudness, brightness, and noisiness estimation; Bark-band decomposition; frequency-domain and time-domain masking; outer-ear filtering; the auditory spectrogram; segmentation; tatum, beat, and pattern analysis; and the chromagram. Most of the listening software is also implemented for real-time use in the Max/MSP environment and is available at: http://www.media.mit.edu/~tristan/.
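Two of the simpler listening features named above can be sketched in a few lines. The following is a minimal illustration, not Skeleton's actual implementation: brightness is taken here as the spectral centroid and noisiness as the spectral flatness, both computed from a single windowed frame.

```python
import numpy as np

def frame_features(frame, sample_rate):
    """Brightness (spectral centroid, in Hz) and noisiness (spectral
    flatness, in [0, 1]) of one audio frame. Illustrative sketch only."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    power = spectrum ** 2 + 1e-12                 # floor avoids log(0)
    brightness = np.sum(freqs * power) / np.sum(power)
    noisiness = np.exp(np.mean(np.log(power))) / np.mean(power)
    return brightness, noisiness

# A 1 kHz sine should be "bright" near 1000 Hz and nearly tonal
# (flatness close to 0); white noise should be much flatter.
sr = 22050
t = np.arange(2048) / sr
b_sine, n_sine = frame_features(np.sin(2 * np.pi * 1000 * t), sr)
b_noise, n_noise = frame_features(
    np.random.default_rng(0).standard_normal(2048), sr)
```

In practice these measures would be computed over the Bark-warped auditory spectrogram rather than a raw FFT, but the per-frame structure is the same.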

A.2 Machine Learning

The machine learning software includes: dynamic programming, matrix manipulations, distance measures, support vector machines, artificial neural networks, cluster-weighted modeling (a mixture of Gaussians), k-means, downbeat prediction, and segment, beat, and pattern similarities.
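The dynamic-programming and distance-measure components combine in alignments of the kind used for segment, beat, and pattern similarities. A hedged sketch of such an alignment, here a plain dynamic time warping with Euclidean local cost (not Skeleton's exact algorithm):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping cost between two feature sequences
    (one row per frame). Illustrative dynamic-programming sketch."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # local distance
            D[i, j] = cost + min(D[i - 1, j],           # insertion
                                 D[i, j - 1],           # deletion
                                 D[i - 1, j - 1])       # match
    return D[n, m]

x = np.array([[0.], [1.], [2.], [3.]])
y = np.array([[0.], [1.], [1.], [2.], [3.]])  # same contour, stretched
```

Because the warping path may repeat frames, `x` and `y` align perfectly despite their different lengths, which is the property that makes such measures useful for comparing segments of unequal duration.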

A.3 Music Synthesis

The applications running in the GUI include: scrambled music, reversed music, compression, cross-synthesis, music texture, beat-matching, and cross-fading.
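The "scrambled music" application is the simplest of these to illustrate: the signal is cut at segment boundaries found by the listening stage, and the pieces are concatenated in random order. A toy sketch of the idea (the boundaries here are hypothetical, not output of the actual segmenter):

```python
import numpy as np

def scramble(signal, boundaries, seed=0):
    """Cut `signal` at sample indices `boundaries` and concatenate
    the segments in random order. Illustrative sketch only."""
    pieces = np.split(signal, boundaries)        # list of segments
    rng = np.random.default_rng(seed)
    rng.shuffle(pieces)                          # in-place reorder
    return np.concatenate(pieces)

audio = np.arange(8, dtype=float)                # stand-in for samples
out = scramble(audio, boundaries=[2, 5])         # segments [0:2],[2:5],[5:8]
```

Reversed music is the same operation with the segment order reversed rather than shuffled; because cuts fall on perceptual segment boundaries, both remain surprisingly listenable.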

A.4 Software

The software is written for Macintosh in the native Objective-C language. It includes a GUI built with Interface Builder, an audio player using Core Audio, fast and flexible displays using OpenGL, fast linear algebra with BLAS, FFTs and convolutions with AltiVec, audio-file reading and writing with sndLib, MP3 decoding with LibMAD, database management with MySQL, and machine learning with the packages SVMlight, CWM, and nodeLib. The hierarchical clustering, as well as certain graphical representations (dendrogram, downbeat, state-space reconstruction), is currently implemented in MATLAB.

A.5 Database

The software automatically creates and connects to a MySQL server, which can store and efficiently retrieve the analyzed data. Songs can be pre-analyzed, and the results of their analysis are stored in the database. The various applications typically retrieve the metadata directly from the database. The database initially contains four tables, namely SONGS, SEGMENTS, BEATS, and PATTERNS, with the following fields:

SONGS:
fileName, ID, path, sampleRate, numSamples, numFrames, hopSize, numSegments, numBeats, meanTempo, signature, pitchSignal, loudnessSignal, brightnessSignal, noisinessSignal, segmentSignal, barkSpectrogram
SEGMENTS:
songID, ID, startInSec, lengthInMs, sampleIndex, numSamples, loudness, phase, c0 ... c11
BEATS:
songID, ID, startInSec, tempo, sampleIndex, numSamples, segmentIndex, numSegments, phase, c0 ... c11
PATTERNS:
songID, ID, sampleIndex, numSamples, frameIndex, numFrames, segmentIndex, numSegments, beatIndex, numBeats, c0 ... c11
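The schema above can be sketched as follows. This illustration uses SQLite for self-containedness (Skeleton itself talks to a MySQL server), shows only a few representative fields per table, and the column types are assumptions:

```python
import sqlite3

# Hypothetical, abbreviated version of the four tables; Skeleton's
# actual column types and full field lists are given in the text above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SONGS    (ID INTEGER PRIMARY KEY, fileName TEXT,
                       sampleRate INTEGER, numSegments INTEGER,
                       numBeats INTEGER, meanTempo REAL);
CREATE TABLE SEGMENTS (ID INTEGER, songID INTEGER REFERENCES SONGS(ID),
                       startInSec REAL, lengthInMs REAL, loudness REAL);
CREATE TABLE BEATS    (ID INTEGER, songID INTEGER REFERENCES SONGS(ID),
                       startInSec REAL, tempo REAL);
CREATE TABLE PATTERNS (ID INTEGER, songID INTEGER REFERENCES SONGS(ID),
                       beatIndex INTEGER, numBeats INTEGER);
""")
conn.execute("INSERT INTO SONGS VALUES (1, 'song.wav', 44100, 120, 200, 118.5)")
rows = conn.execute("SELECT fileName, meanTempo FROM SONGS").fetchall()
```

The `songID` column in the child tables links every segment, beat, and pattern row back to its song, which is what lets the applications fetch all metadata for a song with simple queries.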

Each time a new song is analyzed, one new row is added to the SONGS table and multiple rows are appended to the other tables. The software can also create a series of self-similarity matrix tables; each of these tables contains as many columns as there are segments, beats, or patterns in the song, and as many rows.
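Such a self-similarity matrix is square and symmetric: entry (i, j) compares unit i to unit j of the same song. A minimal sketch, assuming Euclidean distance between per-segment feature vectors (e.g., the c0 ... c11 coefficients stored in the tables above):

```python
import numpy as np

def self_similarity(features):
    """NxN matrix of pairwise Euclidean distances between the N rows
    of `features` (one feature vector per segment, beat, or pattern).
    Illustrative sketch of the stored self-similarity tables."""
    d = features[:, None, :] - features[None, :, :]   # pairwise differences
    return np.sqrt((d ** 2).sum(axis=-1))

feats = np.array([[0., 0.],
                  [3., 4.],
                  [0., 0.]])        # segments 0 and 2 are identical
S = self_similarity(feats)
```

Zeros on the diagonal (and between identical units) and symmetry across it are what give these matrices their characteristic checkerboard look when visualized.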




Figure A-2: Skeleton software architecture.


 

I’m saying: to be continued, until we meet again. Meanwhile, keep on listening and tapping your feet.
      – Count Basie