This is a collection of SuperCollider and Python code to create a realtime Ambisonic control system for arbitrary monophonic audio inputs. Audio I/O is built for Linux with Pipewire-jack. Using a Focusrite Scarlett 18i8 2nd Gen Pro USB audio interface, I get ~10ms latency from microphone input, through the Ambisonic controls, to headphone output.
The file ambisonic-panner.sc is a prototype of the full stack needed to build a demo with various hardware inputs. It consists of the following (a minimal signal-path sketch follows the list):
- Hard-coded Pipewire-jack connections from the Scarlett to buses 0 and 1 in SC. Pipewire quantum settings are documented in the comments.
- Initialization of a SerialOSCClient for a Monome Arc
- Encoder/Transformer/Decoder from the Ambisonic Toolkit (ATK)
- A MIDI input device that can send note on/off events
- Binaural headphones (most over-the-ear will work). I have this AKG device.
- A microphone connected to the first input of the Scarlett
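For reference, here is a minimal sketch of that mic-to-headphones path, assuming the ATK quark (and its kernel/matrix downloads) is installed. The names `~decoder` and `~panner`, and the control names, are illustrative rather than the ones used in ambisonic-panner.sc:

```supercollider
// Boot the server first (Pipewire-jack exposes the Scarlett as the default device).
s.boot;

// Once the server is up, load a binaural decoder kernel (synthetic HRTF).
~decoder = FoaDecoderKernel.newSpherical;

// Once the kernel has loaded, build the mic -> FOA -> binaural chain.
(
~panner = {
    |azimuth = 0, elevation = 0, angle = 0.5pi, gainDb = 0|
    var mono = SoundIn.ar(0) * gainDb.dbamp;                    // Scarlett input 1
    var bf   = FoaEncode.ar(mono, FoaEncoderMatrix.newOmni);    // omni soundfield
    bf       = FoaTransform.ar(bf, 'push', angle, azimuth, elevation); // steer it
    FoaDecode.ar(bf, ~decoder);                                  // 2-channel binaural out
}.play;
)

~panner.set(\azimuth, 0.5pi);   // move the voice to the left
```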
The demo depends on a Monome Arc for angle/azimuth/elevation/distance control of the microphone input:
- Enc1 is angle, which is how directional the soundfield is perceived. Hard right is in front, hard left is behind.
- Enc2 is azimuth, which is where the sound is relative to your nose (roughly).
- Enc3 is elevation, which is roughly how high or low the sound energy sits relative to your nose.
- Enc4 is distance from you, which here simply scales the level of the sound source.
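Below is a hypothetical sketch of how raw Arc encoder deltas could be scaled onto those four parameters with ControlSpecs. The function name `~handleArcDelta`, the tick counts, and the spec ranges are assumptions for illustration; hook it into whatever encoder-delta callback SerialOSCClient provides:

```supercollider
(
~specs = [
    ControlSpec(0, 0.5pi),        // Enc1: angle (directivity of the push transform)
    ControlSpec(-pi, pi),         // Enc2: azimuth
    ControlSpec(-0.5pi, 0.5pi),   // Enc3: elevation
    ControlSpec(-40, 0)           // Enc4: distance, expressed here as gain in dB
];
~params = [\angle, \azimuth, \elevation, \gainDb];
~values = [0.5, 0.5, 0.5, 1.0];   // normalised 0..1 position per encoder

// enc: 0..3 (Enc1..Enc4), delta: signed ticks from the Arc
~handleArcDelta = { |enc, delta|
    ~values[enc] = (~values[enc] + (delta / 256)).clip(0, 1);   // assume ~256 ticks per full sweep
    ~panner.set(~params[enc], ~specs[enc].map(~values[enc]));
};
)

~handleArcDelta.(1, 16);   // nudge the azimuth a little to the left
```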
The MIDI keyboard can play a simple polyphonic synthesizer. Speak into the microphone and move your voice around to compare its location to the synth's. Fun!
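A generic SuperCollider pattern for a keyboard-driven polysynth (not necessarily the SynthDef in ambisonic-panner.sc): one Synth per held note, tracked by note number and released on note-off:

```supercollider
(
SynthDef(\polySaw, { |freq = 440, amp = 0.2, gate = 1, out = 0|
    var env = EnvGen.kr(Env.adsr(0.01, 0.2, 0.7, 0.3), gate, doneAction: 2);
    Out.ar(out, Saw.ar(freq) * env * amp ! 2);
}).add;

MIDIClient.init;
MIDIIn.connectAll;

~notes = Array.newClear(128);
MIDIdef.noteOn(\demoNoteOn, { |vel, num|
    ~notes[num] = Synth(\polySaw, [\freq, num.midicps, \amp, vel / 127 * 0.3]);
});
MIDIdef.noteOff(\demoNoteOff, { |vel, num|
    if(~notes[num].notNil) { ~notes[num].release; ~notes[num] = nil };
});
)
```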
Planned additions (TODO):
- Arc LED indication
- Use the ControlSpec from SerialOSCClient for param scaling
- Create an Ambisonic bus in SC for the MIDI synth so each note doesn't create a whole new soundfield pipeline (see the shared-bus sketch after this list)
- Enable hardware controls for all channels
- Monome Grid positioning of channels by mapping a 2D plane onto the 3D soundfield (see the grid-mapping sketch after this list)
- Gen AI speech synthesis with the XTTS v2 model
- Gen AI music input from Facebook's open-source MusicGen model
- Higher Order Ambisonics with Near-Field Controlled HOA (NFC-HOA)
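A sketch of the shared-bus idea, reusing `~decoder` from the first sketch; the bus and SynthDef names are hypothetical. Notes sum onto one mono bus and a single encode/decode synth reads it, so only one soundfield pipeline ever exists:

```supercollider
(
~synthBus = Bus.audio(s, 1);

SynthDef(\polySawToBus, { |freq = 440, amp = 0.2, gate = 1, out|
    var env = EnvGen.kr(Env.adsr(0.01, 0.2, 0.7, 0.3), gate, doneAction: 2);
    Out.ar(out, Saw.ar(freq) * env * amp);
}).add;

// One soundfield pipeline for all notes.
~synthField = { |azimuth = 0, elevation = 0|
    var mono = In.ar(~synthBus.index, 1);
    var bf   = FoaPanB.ar(mono, azimuth, elevation);   // mono -> B-format
    FoaDecode.ar(bf, ~decoder);                         // shared binaural decode
}.play;
)

// Each note then writes to the bus, placed before the pipeline in the node tree:
Synth(\polySawToBus, [\freq, 60.midicps, \out, ~synthBus],
    target: ~synthField, addAction: \addBefore);
```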
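And a hypothetical helper for the Grid item: map a key press on a 16x8 Grid, treated as a top-down plane centred on the listener, to azimuth and distance (the grid size and the positive-azimuth-to-the-left convention are assumptions):

```supercollider
(
~gridToField = { |x, y, cols = 16, rows = 8|
    var dx = (x / (cols - 1) * 2) - 1;    // -1 (left)   .. 1 (right)
    var dy = 1 - (y / (rows - 1) * 2);    // -1 (behind) .. 1 (in front)
    var azimuth  = atan2(dx.neg, dy);     // radians, positive to the listener's left
    var distance = hypot(dx, dy).clip(0.1, sqrt(2));
    [azimuth, distance];
};
)

~gridToField.(15, 0);   // far front-right key -> negative azimuth, distance ~1.4
```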
The Python code was a first attempt to reach the same place as the SuperCollider code, but using the Pyo DSP framework; Python also offers PyTorch for gen AI model inference. I envision two controllers: one doing text-to-speech with gen AI and sending audio over a Pipewire source (or analog out from a second host with a bigger GPU) into a Pipewire sink on the same host, and the other doing the Ambisonic DSP and hardware controls.