For the course SOUND: PAST & FUTURE in the Spring of 2020, I worked on a piece of music, and a way of representing it for listeners on the web.
Piece
voicecoil is a fixed-media piece, partly inspired by Vladimir Ussachevsky’s beautiful Wireless Fantasy (1960), though it explores synthetic signals with a much colder and harsher “digital” edge. These signals are taken apart into phrases and patterns, which are recombined through motion and superposition. Despite their range of textures, densities, and shapes, the sounds in this piece share a precise and rigid character, softened only by their relation to one another. Its duration is just under four minutes.
Representation
Notice above that the piece’s audio, hosted by Soundcloud, is represented by a waveform image, indicating the trajectory of the audio signal’s peak amplitude over the course of its duration.
We use waveform representations in audio editing all the time; often our edits correspond to the locations, shapes, and durations of signal amplitude events. For example, if you’re trying to edit background noise out of a voice part, substantial increases in amplitude likely correspond to note “onsets”, i.e. where the singer begins or resumes singing, since the voice is likely nearer to the microphone than sources of background noise.
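As a minimal sketch of that idea (not part of this project), here’s roughly how one might estimate onsets from amplitude increases using librosa; the filename is a placeholder.

```python
# Illustrative only: find likely note onsets from increases in amplitude.
import librosa
import numpy as np

# "voice.wav" is a hypothetical input file
y, sr = librosa.load("voice.wav", sr=None, mono=True)

# Short-time RMS energy: roughly the "height" of the waveform over time
rms = librosa.feature.rms(y=y)[0]

# Onset strength emphasizes increases in energy; its peaks suggest onsets
onset_env = librosa.onset.onset_strength(y=y, sr=sr)
onset_frames = librosa.onset.onset_detect(onset_envelope=onset_env, sr=sr)
onset_times = librosa.frames_to_time(onset_frames, sr=sr)

print("estimated onsets (s):", np.round(onset_times, 2))
```

The peaks this picks out correspond, more or less, to the jumps you would already spot by eye in the waveform display.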
Platforms like Soundcloud have added a dimension to the traditional (and still common) timeline display for audio listening, giving users a graphical accompaniment to the sounds they hear. The waveform provides the same awareness of amplitude-related events, which can support “seeking” behaviors, i.e. looking for a specific moment or section based on either expectation or memory attached to visual anchors in the waveform.
I think there’s something interesting about waveforms even for purely linear listening. They’re ambiguous figures, obscuring as much information as they reveal about the underlying content. Take this for example:
The waveform is one homogeneous block, but the audio consists of changes in a parameter not tracked by a purely time-domain representation. Conversely, a spectrogram representation adds yet another dimension (frequency) that would show these changes, but it is largely applied as an analytical tool because of its literal and detailed nature (though you can find a number of fun spectrogram listening videos of electroacoustic music here). Additionally, when applied to a longer structure, actually seeing the detail requires scrolling through the spectrogram more gradually, so some of the ability to see later events before hearing them is lost from the experience.
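For reference, here’s a quick sketch of that frequency dimension in code, using librosa and matplotlib; the filename is a placeholder and the display choices are arbitrary.

```python
# Illustrative only: a magnitude spectrogram in dB,
# with time on x, frequency on y, and intensity as color.
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# "clip.wav" is a hypothetical input file
y, sr = librosa.load("clip.wav", sr=None, mono=True)

S = np.abs(librosa.stft(y))
S_db = librosa.amplitude_to_db(S, ref=np.max)

fig, ax = plt.subplots(figsize=(10, 4))
img = librosa.display.specshow(S_db, sr=sr, x_axis="time", y_axis="hz", ax=ax)
fig.colorbar(img, ax=ax, format="%+2.0f dB")
plt.show()
```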
In looking for a representation strategy somewhere between these, one that could offer a suggestive but not literal graphical complement to my piece, I was reminded of Rainer Wehinger’s hörpartitur for Ligeti’s Artikulation (1958).
Wehinger’s illustration is still quite an analytical, precise transcription of the piece, but it does offer an interesting and artistic interpretive layer. There have been a significant number of representation schemes since, most of which require extensive manual effort and range from the very analysis-oriented to the very abstract.
For this project, I worked on building a very simple example of a system that could partially automate the production of a hörpartitur, in such a way that it could function as a waveform replacement for listening to voicecoil on the web. This representation can be found here (click anywhere on the visualization to play it), and is pictured below.
The high-level process I followed is described in this diagram:
The drawing part of the process aggregates some features across events (energy for the waveform, spectral centroid for the blue line), and uses primitives to indicate some events (circles for brief ones, triangles for longer ones, size indicating energy on both counts). These, and ultimately much of the process, are somewhat arbitrary, based on my own feelings about what might work well as an initial test.
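To make that description concrete, here’s a rough, illustrative sketch of how the drawing step could be prototyped in Python with librosa and matplotlib. This is not the project’s actual code: the filename, the 0.25-second “brief event” threshold, the marker scaling, and the exact feature-to-shape mappings are all assumptions standing in for the choices mentioned above.

```python
# Illustrative only: segment the audio into events, aggregate a couple of
# features per event, and map them onto simple primitives.
import librosa
import numpy as np
import matplotlib.pyplot as plt

# "voicecoil.wav" is a hypothetical input file
y, sr = librosa.load("voicecoil.wav", sr=None, mono=True)

# Segment: treat spans between detected onsets as "events"
onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time", backtrack=True)
bounds = np.concatenate([[0.0], onsets, [librosa.get_duration(y=y, sr=sr)]])

# Frame-level features to aggregate within each event
rms = librosa.feature.rms(y=y)[0]
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]
times = librosa.times_like(rms, sr=sr)

fig, ax = plt.subplots(figsize=(12, 3))
for start, end in zip(bounds[:-1], bounds[1:]):
    mask = (times >= start) & (times < end)
    if not mask.any():
        continue
    energy = rms[mask].mean()           # -> size of the primitive
    brightness = centroid[mask].mean()  # -> vertical position
    x = (start + end) / 2
    size = 2000 * energy                # arbitrary scaling for visibility
    # Arbitrary rule, echoing the text: circles for brief events, triangles for longer ones
    marker = "o" if (end - start) < 0.25 else "^"
    ax.scatter(x, brightness, s=size, marker=marker, alpha=0.6)

# A running spectral-centroid curve, standing in for the "blue line"
ax.plot(times, centroid, color="tab:blue", linewidth=0.8)
ax.set_xlabel("time (s)")
ax.set_yticks([])
plt.show()
```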
I think the visual catches a few interesting moments in the audio, but it also misses many and ultimately doesn’t work terribly well. I do think it points in an interesting direction, though; my feeling is that some hybrid of partial automation and visual parametric and gestural control (beyond shape and color, as here) could allow users to have more expressive and diverse representations of their music for streaming online.