Traveling Microphones

Following Sonic Phenomena Around the World

This fall term (2020), I had the wonderful opportunity to serve as TA for the course ARTS@ML: Why here? What's next? at the Media Lab. Seeing the diversely beautiful and interesting things my colleagues in the class had been building toward, I was inspired to work on a little something of my own.

I've always been interested in semi-automated, non-linear explorations of content on Freesound. There's such an amazing range of content uploaded there, but also just so much of it; it's hard to know where to begin exploring. Previously, I've worked on projects that use visual object detection to retrieve related sounds for soundscape composition (as part of which I wrote a simple Swift API), pre-aggregated sounds to distribute onto an interactive street-level map, and more.

In this case, I built on past work in which I had tried to "sample the world", so to speak. I worked to find an interesting way to explore diversity and commonality in specific sonic materials, in as many different places as I could find sounds to represent. For example, here are kitchens across the planet:

Here's one for horns, where small melodic fragments sometimes emerge:

Following Sound

Given a search term (e.g. "water" or "voice"), I query the Freesound API for sounds tagged with it and geotagged near each geographic coordinate (at integer, i.e. one-degree, increments of latitude and longitude). I select at most one sound from a limited radius around each of these points, and store its preview.
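To make the grid search concrete, here's a minimal sketch of how such queries could be built. The geospatial filter follows the Solr-style `geofilt` syntax in Freesound's API documentation, but the radius, field list, and page size are illustrative assumptions, not the values used in the project:

```python
# Sketch: build one Freesound text-search query per one-degree grid point.
# The geofilt filter follows Freesound's documented Solr-style geospatial
# filter; radius and field choices here are assumptions.

def grid_queries(term, radius_km=25, fields="id,name,previews,geotag"):
    """Yield (lat, lon, params) for every integer lat/lon grid point."""
    for lat in range(-90, 91):
        for lon in range(-180, 180):  # -180 and 180 are the same meridian
            params = {
                "query": term,
                "filter": f"{{!geofilt sfield=geotag pt={lat},{lon} d={radius_km}}}",
                "fields": fields,
                "page_size": 1,  # at most one sound per grid point
            }
            yield lat, lon, params

# Usage: send each params dict (with an API token) to
# GET https://freesound.org/apiserver/search/text/ and keep the
# first hit's preview URL, if any.
```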

To synthesize sequences of sounds, I wanted to produce a path through the audio that allowed for a diverse trajectory while preserving some continuity between successive samples. More specifically, I wanted to order the sounds by their content (what they sound like) rather than by any concept (such as their coordinates).

Getting Started

Obtaining representative segments of found sounds is an important pre-processing step: concatenating whole files would have produced long pieces, in which each element's individual character would likely fade from memory before the next had a chance to make any salient impression.

I used the simplest of all heuristics for auditory salience: peak amplitude. It's quick and easy to seek to the loudest sample and excerpt material from there. This is, of course, hugely noisy and approximate; it really just retrieves something likely to be noticeable, and thus usable. The duration of the excerpt is decided by multiplying a provided base duration (e.g. 0.1 seconds) by a randomly selected low power of two, with the exponent biased toward 0 (yielding 1, i.e. the base duration). This allows a kind of stochastic syncopation when all segments are sequenced, just to keep things more interesting than a uniform rate might otherwise be.
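In code, the heuristic might look something like this. The exact bias toward exponent 0 used in the project is an assumption here (weights halving with each higher exponent):

```python
import random

def excerpt_bounds(samples, sr, base_dur=0.1, max_exp=2, rng=None):
    """Pick an excerpt window centred on the loudest sample.

    Duration = base_dur * 2**k, with k drawn from a distribution biased
    toward 0 (the precise bias used in the project is an assumption).
    """
    rng = rng or random.Random()
    # Seek to the peak-amplitude sample.
    peak = max(range(len(samples)), key=lambda i: abs(samples[i]))
    # Bias toward exponent 0: e.g. weights 4, 2, 1 for k = 0, 1, 2.
    weights = [2 ** (max_exp - e) for e in range(max_exp + 1)]
    k = rng.choices(range(max_exp + 1), weights=weights)[0]
    n = int(base_dur * (2 ** k) * sr)
    start = max(0, peak - n // 2)
    return start, min(len(samples), start + n)
```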

Sampling Aesthetics

Much of my favorite music relies, in some form or another, on audio sampling, or more generally the manipulation of recorded sound (some examples: Parmegiani, Aaliyah/Timbaland, Noisia, etc.). Sampling constitutes a great diversity of recombinative practices.

For the past few years, I've really enjoyed music that stretches this concept a little by using tiny segments in large numbers. In a way I think it speaks to a combination of modern conditions: we're dealing with unprecedented quantities of media and information, increasing summarization and skimming, and impossible juxtapositions of content facilitated by new interactions. This brings to mind work like John Oswald's classic Plunderphonics and Akufen's microsampling; barely identifiable chunks in rapid succession, not so small as to be rid of any perceived material or causality (e.g. Vaggione's micromontage and granular work), but not so large that they can independently be attributed to their sources. I was also pointed by a student to Noah Creshevsky's hyperrealist music, another really interesting example. I chose this as my starting point: an approach wherein each sample depends on its neighbours' contextual value to collectively produce narrative and meaning.

Sequencing Sounds

There are a few ways to create a "perceptual path" through a set of audio clips. My initial strategy involved extracting common features, specifically MFCCs, identifying the two clips furthest apart (by Euclidean distance), and sorting the rest between them. In a second iteration, I re-formulated this as an instance of the classic Traveling Salesman Problem (TSP). The TSP, first posed in 1934 by Hassler Whitney (and later described by Julia Robinson), can be thought of as: given a list of N cities, find the shortest tour that visits each city exactly once and returns to its starting point.
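The initial endpoint-sorting strategy is simple enough to sketch directly. Here, each clip is assumed to be summarized by one MFCC vector (e.g. a per-clip mean over frames):

```python
import math

def order_by_endpoints(features):
    """Order clips along a perceptual axis: find the two feature vectors
    furthest apart (Euclidean), then sort every clip by its distance
    from one endpoint. Each entry in `features` is assumed to be one
    summary MFCC vector per clip (e.g. a mean over analysis frames)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    n = len(features)
    # Furthest pair by brute force (fine for a few hundred clips).
    i0, _ = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                key=lambda p: dist(features[p[0]], features[p[1]]))
    return sorted(range(n), key=lambda i: dist(features[i], features[i0]))
```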

TSP

Although the sounds come with geographic coordinates, their "locations" in this problem are actually perceptual features; the geography is left to fall where it may, yielding to perceptual connections. Treating it this way admits a number of approximate solutions, for example using stochastic optimization methods like simulated annealing. In this case, I used mlrose's implementation of the TSP with a genetic algorithm, and a subset of MFCC features. Empirically, I found this consistently worked pretty well and was also approximate enough to sometimes be quite surprising.
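For intuition, here's a dependency-free sketch of an approximate tour over feature vectors. It uses the nearest-neighbour heuristic as a stand-in for the mlrose genetic-algorithm solver the project actually used, so its tours will generally differ from (and be cruder than) the real pipeline's:

```python
import math

def tsp_order(features, start=0):
    """Approximate a TSP tour over feature vectors with the
    nearest-neighbour heuristic: repeatedly hop to the closest
    unvisited clip. A stand-in for mlrose's genetic-algorithm TSP."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    unvisited = set(range(len(features)))
    tour = [start]
    unvisited.remove(start)
    while unvisited:
        cur = tour[-1]
        nxt = min(unvisited, key=lambda j: dist(features[cur], features[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour
```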

Once organized, all that remains is to concatenate these sounds. Given an overlap argument (between 0 and 1), I retrieve the appropriate segments, apply fades on either side, and overlap them sequentially as appropriate.
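A minimal version of that overlap-and-fade concatenation, operating on plain sample lists (the linear fade shape is an assumption; the project's fades may differ):

```python
def crossfade_concat(segments, overlap=0.5):
    """Concatenate sample lists, overlapping each incoming segment by
    `overlap` (0-1) of its length, with linear crossfades over the
    overlapped region (the fade shape here is an assumption)."""
    out = list(segments[0])
    for seg in segments[1:]:
        n = min(int(overlap * len(seg)), len(out))
        # Linear fade-out on the tail of `out`, fade-in on the head of `seg`.
        for i in range(n):
            w = (i + 1) / (n + 1)
            out[len(out) - n + i] = out[len(out) - n + i] * (1 - w) + seg[i] * w
        out.extend(seg[n:])
    return out
```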

Following this synthesis, I write out a sequence of filenames with coordinates and timestamps as a CSV file.

Visualizing Distributions

Interestingly, it seems there aren't any particularly inspired tools for this kind of mapping (time-varying with video output). I followed a seemingly common procedure, which is to use cartopy to write out a sequence of individual frames, and then apply the ffmpeg concat filter to turn them into a video. The ffmpeg command also takes the sequenced audio as an input, and that's it!
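As a sketch of that last step, here's one way to assemble the ffmpeg invocation in Python. This variant feeds the frames in as an image sequence rather than via the concat filter mentioned above, and the codec and pixel-format flags are assumptions, not necessarily the project's settings:

```python
def ffmpeg_command(frame_pattern, audio_path, out_path, fps=25):
    """Build an ffmpeg argument list that muxes a rendered frame
    sequence with the sequenced audio into one video. Flag choices
    (codec, pixel format) are illustrative assumptions."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-i", frame_pattern,    # e.g. "frames/%05d.png"
        "-i", audio_path,
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",  # broad player compatibility
        "-shortest",            # stop when the shorter stream ends
        out_path,
    ]

# Usage:
# subprocess.run(ffmpeg_command("frames/%05d.png", "seq.wav", "out.mp4"),
#                check=True)
```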

Here's just one more example with the search term "music", and a little quicker:

Several more examples here. I'd like to build on this work to produce more shapeable and varied expressions. I hope, though, that it suggests just a little of the potential for creativity, connection, and conversation between the many different acoustic environments we inhabit. Thanks to all in the class for a great experience, and for all of your thoughts along the way!

— Nikhil Singh