Sound examples from Tristan Jehan's PhD thesis:
Creating Music by Listening
These sound examples were all generated with default settings, and run within a single application written in Cocoa under Mac OS X, with the Skeleton library.
This basic experiment relies on segmentation based on an auditory spectrogram. Sounds are scrambled and juxtaposed randomly. No cross-fading or overlap was used. The results aim to be artifact-free.
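The scrambling step itself can be sketched as follows. This is not the thesis implementation: the auditory-spectrogram segmenter is not reproduced here, so the segment boundaries (sample indices) are simply assumed to be given.

```python
import numpy as np

def scramble(audio: np.ndarray, boundaries: list[int], seed: int = 0) -> np.ndarray:
    """Cut audio at the given sample boundaries and concatenate the
    segments in random order, with no cross-fade or overlap."""
    edges = [0] + sorted(boundaries) + [len(audio)]
    segments = [audio[a:b] for a, b in zip(edges[:-1], edges[1:])]
    rng = np.random.default_rng(seed)
    rng.shuffle(segments)
    return np.concatenate(segments)
```

Because segments are cut at perceptual boundaries, simple concatenation can already sound artifact-free; the quality rests entirely on the segmenter.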
A simple structure is adopted: the segments are reordered from the last one to the first one. The audio itself, however, is not reversed. This is roughly equivalent to playing the score backwards.
By overlapping segments with each other, it is possible to speed up the music without processing the audio, or changing its pitch and timbre.
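A minimal sketch of that idea, assuming the piece is already cut into segments: each junction is shortened by summing a fixed number of overlapping samples, so the total duration shrinks while every individual segment keeps its original pitch and timbre. (A windowed cross-fade at each junction would be the more careful choice; the raw overlap-add here is just the simplest form.)

```python
import numpy as np

def overlap_speedup(segments: list[np.ndarray], overlap: int) -> np.ndarray:
    """Concatenate segments, overlapping each pair of neighbours by
    `overlap` samples (summed). The result is shorter than the sum of
    the segment lengths, with no resampling of any segment."""
    out = segments[0].astype(float).copy()
    for seg in segments[1:]:
        n = min(overlap, len(out), len(seg))
        out[-n:] += seg[:n]                      # overlap-add the junction
        out = np.concatenate([out, seg[n:]])
    return out
```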
Slowing down the audio, however, is much more complex. Two approaches were implemented, one based on phase vocoding and the other on a time-domain approach. Bugs remain; this simply demonstrates the difficulty of the problem.
A beat tracker was implemented in order to impose a metrical grid on the music, into which audio segments fit. A click track was added to the original music.
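The click-track overlay can be sketched like this. The beat tracker itself is not reproduced; the sketch takes the estimated beat times as given and mixes a short sine burst at each one (the click shape and gain are arbitrary choices, not taken from the thesis).

```python
import numpy as np

def add_click_track(audio, beat_times, sr=22050,
                    click_hz=1000.0, click_len=0.01, gain=0.5):
    """Mix a short windowed sine-burst click at each beat time (seconds),
    leaving the input array untouched."""
    out = audio.astype(float).copy()
    n = int(click_len * sr)
    t = np.arange(n) / sr
    click = gain * np.sin(2 * np.pi * click_hz * t) * np.hanning(n)
    for bt in beat_times:
        i = int(bt * sr)
        if i < len(out):
            j = min(i + n, len(out))
            out[i:j] += click[: j - i]
    return out
```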
Similarly to "Scrambled Music", beat segments (as opposed to sound segments) are scrambled. The sensation of beat remains while the rest of the musical parameters are randomized.
Typically, sounds repeat within a song and often generate a sensation of pattern. Audio segments are analyzed recursively using dynamic programming. This allows us to compare the timbre similarity of sounds and beats, and to find musical patterns.
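The core dynamic-programming comparison is in the family of dynamic time warping; a generic sketch follows. It is not the thesis's exact recursion or feature set (which is auditory-spectrogram based) — it only illustrates how two segments of different lengths can be compared frame by frame.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic-time-warping distance between two feature sequences
    (one row per frame), filled in by the standard DP recurrence."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```

Running the same comparison at segment, beat, and pattern level gives the recursive analysis described above.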
The similarity measure between sound segments allows us to cluster them and compress the audio in the time domain. Artifacts, if any, appear in the music domain as opposed to the audio domain.
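One way to realize that compression, sketched here with a toy k-means stand-in for whatever clustering the similarity measure supports: cluster the segment features, store one exemplar per cluster, and rebuild the piece from exemplars only. Any error is a substituted segment (a music-domain artifact), never a damaged waveform.

```python
import numpy as np

def compress_by_clustering(segments, features, k, seed=0):
    """Cluster segment feature vectors with a tiny k-means, then rebuild
    the piece using one stored exemplar per cluster, so only k distinct
    segments need to be kept."""
    feats = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(20):                           # fixed-iteration k-means
        labels = np.argmin(((feats[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = feats[labels == c].mean(axis=0)
    # exemplar = first member of each cluster
    exemplar = {c: int(np.argmax(labels == c)) for c in range(k)}
    rebuilt = [segments[exemplar[int(l)]] for l in labels]
    return rebuilt, labels
```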
When the similarity measure takes into account only the dynamics of the music, variations at resynthesis become possible, with 'interesting' unexpected spectral manipulations.
A database of sound segments is generated from the segmentation of a "source" piece. The database is used to resynthesize a "target" piece. This is somewhat equivalent to the mosaicing process in the visual domain.
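Reduced to its simplest form, the mosaicing step is a nearest-neighbour lookup: each target segment is replaced by the source-database segment closest to it in feature space. The feature vectors are assumed given; the thesis's actual matching is richer than this sketch.

```python
import numpy as np

def mosaic(source_segments, source_feats, target_feats):
    """Resynthesize a 'target' by replacing each of its segments with
    the nearest (in feature space) segment from a 'source' database."""
    S = np.asarray(source_feats, dtype=float)
    out = []
    for f in np.asarray(target_feats, dtype=float):
        i = int(np.argmin(((S - f) ** 2).sum(axis=1)))
        out.append(source_segments[i])
    return out
```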
There are no obvious heuristics that define where the downbeat falls in all cases. Often a change of chord is enough; other times, prior knowledge of a given rhythm is necessary. We use machine learning to generalize the process.
After beat tracking, downbeat estimation, and time stretching, an obvious application consists of transitioning between two arbitrary songs by beat-matching and cross-fading.
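The final cross-fade can be sketched as an equal-power fade, under the assumption that beat-matching (tempo alignment by time stretching) has already been done, so the fade region of both songs shares the same beat grid.

```python
import numpy as np

def crossfade(a: np.ndarray, b: np.ndarray, n: int) -> np.ndarray:
    """Equal-power cross-fade of the last n samples of `a` into the
    first n samples of `b` (cosine/sine gain curves)."""
    t = np.linspace(0.0, np.pi / 2, n)
    fade = a[-n:] * np.cos(t) + b[:n] * np.sin(t)
    return np.concatenate([a[:-n], fade, b[n:]])
```

An equal-power curve is used rather than a linear one so the perceived loudness stays roughly constant through the transition.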
Music Texture (or music stretching)
Given a window of only a few measures, we generate an arbitrarily long musical excerpt that never repeats, yet keeps playing as if on hold.
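One way to sketch this idea is a random walk over similar segments: after playing segment i, instead of always playing i+1, jump to a randomly chosen segment whose features are close to i+1's. The stream then follows the excerpt's local structure without ever literally looping. This is only an illustration of the principle, not the thesis's generator.

```python
import numpy as np

def texture(segments, feats, n_steps, seed=0):
    """Generate an open-ended playback order from a short excerpt by
    randomly substituting each 'next' segment with one of its nearest
    neighbours in feature space."""
    F = np.asarray(feats, dtype=float)
    rng = np.random.default_rng(seed)
    order, i = [], 0
    for _ in range(n_steps):
        order.append(i)
        nxt = (i + 1) % len(segments)            # the "expected" successor
        d = ((F - F[nxt]) ** 2).sum(axis=1)
        cand = np.argsort(d)[:3]                 # its 3 most similar segments
        i = int(rng.choice(cand))
    return [segments[j] for j in order], order
```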
Music files corrupted by large, loud noise sections several seconds long are repaired automatically by resynthesizing the corrupt sections from surrounding audio segments.
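A toy version of that repair, under strong simplifying assumptions: the corrupt segments are already located, per-segment feature vectors exist, and the missing span's features are estimated as the mean of its nearest clean neighbours, after which the closest clean segment is substituted in. The actual resynthesis in the thesis is more elaborate than this nearest-neighbour stand-in.

```python
import numpy as np

def repair(segments, feats, corrupt):
    """Replace each corrupt segment with the clean segment closest to a
    feature estimate interpolated from its clean neighbours."""
    F = np.asarray(feats, dtype=float)
    fixed = list(segments)
    clean = [i for i in range(len(segments)) if i not in corrupt]
    for i in sorted(corrupt):
        prev = max((c for c in clean if c < i), default=None)
        nxt = min((c for c in clean if c > i), default=None)
        neighbours = [c for c in (prev, nxt) if c is not None]
        target = F[neighbours].mean(axis=0)       # guess the missing features
        j = min(clean, key=lambda c: float(((F[c] - target) ** 2).sum()))
        fixed[i] = segments[j]
    return fixed
```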