Table of Contents
The Audio Industry
To be completely honest, I had no idea this field even existed, let alone any curiosity about it, but it just so happened that YouTube recommended this talk1 from CppCon 2015 by Timur Doumler.
This area intrigues me because, like a lot of other fields, it isn't really "well known". Almost all the software you use daily has a sound engine in it: watching YouTube videos in your browser, streaming music on your phone, even the small quality-of-life audio feedback you get when interacting with your OS has been carefully engineered to talk to the actual audio hardware. This software is key to our lives, and to livelihoods: music producers, DJs, musicians, and artists, just to name a few.
What really interests me are the new and upcoming "digital" instruments and synthesizers that have come into production in the last few years.
One example is a 3D touch surface with an audio engine running on an embedded system. It plays back extremely high-quality instrument samples in real time, with < 2 ms of latency.
Working with audio
Anyway, coming back to what I can actually explain: it all comes down to a few key requirements:
- Fast & Efficient Digital Signal Processing
- Lock-free thread synchronization
- Cross-platform support
Hence, the obvious question is: What is the best way to go about it?
C++ is a good choice: most of these drivers and APIs are written in C, which makes them easy to interface with, and that is what I will be talking about.
There are conventions and best practices for using these APIs to suit your needs. Audio data can be represented in many ways, but this is the most common one I have come across:
Audio data is represented as amplitude values over time, one stream per channel, played back at the sample rate.
We interact with this data through audio callbacks that usually run on a separate, high-priority thread. These are used both to generate and to process the audio, and they typically have the following signature:
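The exact signature isn't reproduced in this text, but as a sketch (the names and the interleaved-float layout are my assumptions, loosely modeled on PortAudio-style C APIs), a callback might look like:

```cpp
#include <cmath>
#include <cstddef>

// Sketch of a typical audio callback type: the host calls a function with
// this shape from a high-priority audio thread, asking it to fill `output`
// with `numFrames` frames of interleaved samples.
using AudioCallback = void (*)(float* output, std::size_t numFrames,
                               std::size_t numChannels, void* userData);

// Example callback: writes a 440 Hz sine wave into an interleaved buffer.
struct SineState {
    double phase = 0.0;
    double sampleRate = 44100.0;
};

void sineCallback(float* output, std::size_t numFrames,
                  std::size_t numChannels, void* userData) {
    auto* state = static_cast<SineState*>(userData);
    constexpr double kTwoPi = 6.283185307179586;
    const double step = kTwoPi * 440.0 / state->sampleRate;
    for (std::size_t frame = 0; frame < numFrames; ++frame) {
        const float sample = static_cast<float>(std::sin(state->phase));
        for (std::size_t ch = 0; ch < numChannels; ++ch)
            output[frame * numChannels + ch] = sample;  // interleaved layout
        state->phase += step;
    }
}
```

Real APIs differ in the details (some pass separate input and output buffers, a timestamp, or status flags), but the shape is the same: here is a buffer, fill it before the deadline.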
The sample rate is on the order of kilohertz, commonly 44.1 kHz or 96 kHz. This means our audio callback is invoked roughly 100 times per second, each time with a buffer of samples (typically somewhere between 32 and 1024 frames, and not necessarily the same size every call). Hence, our callback cannot block on the CPU.
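To put rough numbers on that budget (my own arithmetic, not from the talk):

```cpp
// Time budget for one callback: the buffer must be filled before the
// hardware finishes consuming the previous one.
double callbackBudgetMs(double sampleRate, int framesPerBuffer) {
    return 1000.0 * framesPerBuffer / sampleRate;
}

// How often the callback fires.
double callbacksPerSecond(double sampleRate, int framesPerBuffer) {
    return sampleRate / framesPerBuffer;
}
```

At 44.1 kHz with a 512-frame buffer this works out to roughly 11.6 ms per callback, about 86 callbacks per second; with smaller buffers the deadline gets tighter still.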
Exceeding this budget causes an audio dropout, or glitch, which you hear as a crackle or a short gap of silence. It happens because the hardware consumes the buffer faster than the callback fills it. It's immediately audible: even dropping a single buffer is something you can hear.
So here are the rules of audio code that Timur Doumler mentioned in his talk:
- Rule #0 of audio code: The audio callback waits for nothing.
- Rule #1 of audio code: You never want your audio callbacks to cause a dropout.
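Rule #0 rules out mutexes in the callback: if the UI thread holds the lock at the wrong moment, the audio thread misses its deadline. A common pattern is to hand parameters to the audio thread through a lock-free atomic instead. A minimal sketch (the names are mine, not from the talk):

```cpp
#include <atomic>
#include <cstddef>

// The UI thread writes, the audio thread reads -- no locks, so the
// callback can never block waiting on the UI.
std::atomic<float> gain{1.0f};

// Called from the UI thread (e.g. when the user moves a slider).
void setGain(float g) {
    gain.store(g, std::memory_order_relaxed);
}

// Called from the audio thread. Reads the parameter once per buffer;
// relaxed ordering is enough here because we only need *a* recent value,
// not synchronization with other data.
void audioCallback(float* buffer, std::size_t numSamples) {
    const float g = gain.load(std::memory_order_relaxed);
    for (std::size_t i = 0; i < numSamples; ++i)
        buffer[i] *= g;
}
```

For anything larger than a single scalar (whole buffers, parameter structs), real-time code typically reaches for a lock-free ring buffer instead, but the principle is the same: the audio callback waits for nothing.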
The anatomy of audio samples
Audio samples have a butt-ton of terminology associated with them. So let’s define the following terms:
- Number of channels (e.g. 2 for stereo)
- Number of samples per buffer
- The sample rate is the number of samples per second
- The stream size can be represented as this matrix:
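The matrix itself didn't survive into this text, but the idea is that a buffer is a channels-by-frames grid of samples, so its total size is the product of those dimensions with the sample width. A small sketch (the function name is mine):

```cpp
#include <cstddef>

// Total size in bytes of one audio buffer: a channels-by-frames grid of
// samples, each `bytesPerSample` wide.
std::size_t bufferSizeBytes(std::size_t numChannels,
                            std::size_t framesPerBuffer,
                            std::size_t bytesPerSample) {
    return numChannels * framesPerBuffer * bytesPerSample;
}
```

For example, a stereo buffer of 512 frames of 32-bit floats is 2 × 512 × 4 = 4096 bytes.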
I gloss over a lot of explanation here, but to give the gist of it: each sample is stored in a specific format that your OS or API supports. Common data types are int16_t and normalized floating point (typically in the range [-1.0, 1.0]).
This fact alone makes a lot of code platform-dependent, just to parse these buffers correctly.
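As an example of that format dependence, here is a hedged sketch of converting between int16_t PCM and normalized floats. Dividing by 32768 so that INT16_MIN maps exactly to -1.0 is one common convention, not the only one:

```cpp
#include <cstdint>

// Signed 16-bit PCM sample -> normalized float in [-1.0, 1.0].
float int16ToFloat(std::int16_t s) {
    return static_cast<float>(s) / 32768.0f;
}

// Normalized float -> signed 16-bit PCM, clamped to the representable range.
std::int16_t floatToInt16(float f) {
    if (f >= 1.0f)  return 32767;
    if (f <= -1.0f) return -32768;
    return static_cast<std::int16_t>(f * 32768.0f);
}
```

Every format your code accepts (8-bit unsigned, 24-bit packed, 32-bit float, little- vs. big-endian) needs its own pair of conversions like this, which is exactly where the platform-specific code piles up.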
My visualizer - Murl
I decided to create a visualizer for these audio buffers. Murl3 (Music Uptake Rendering Library) is open source, and the code is available under the GPLv3 license.
I'm using things here that I have learned from a lot of places. I got the idea from our beloved programming outcast Zozin4, along with a lot of other stuff that will definitely make him very angry XD.
It taught me many things, and I am happy to share it with you. I know that I can improve it a lot more, and I’ll definitely post updates in the future.
It is very simple for now: you can drag and drop any audio file, and it will visualize it.
Here5 is a demo of the visualizer compiled to WebAssembly in action:
- R to reload the fragment shader (when testing on desktop).
- Space to play/pause, or
- Tap the screen on mobile.
If you aren’t able to interact with the demo above, here is a small video6 of it visualizing a sine wave:
I would appreciate it if you gave the repo3 a star and followed me on GitHub. I would also love to hear your feedback, suggestions, or contributions. And I hope I've made you just a little more excited for the next time you are listening to or recording some music.