Audio has been a neglected area of our game engine for a long time. Except for a few tweaks, we have essentially used the exact same code for all our games: a pretty simple channel pool with per-channel pitch and volume, an OpenAL back-end for iOS, Windows and OSX, and an OpenSL back-end for Android.
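
For reference, the old setup was roughly along these lines - a minimal sketch of the idea, not the actual engine code (the names are mine):

```cpp
#include <cstdint>

// Rough sketch of a per-channel voice - playback position, pitch and volume.
struct Channel {
    const int16_t* data = nullptr; // source PCM samples (mono for simplicity)
    int frames = 0;                // total frames in the source
    float cursor = 0.0f;           // fractional read position, advanced by pitch
    float pitch = 1.0f;            // playback rate multiplier
    float volume = 1.0f;           // linear gain
    bool active = false;
};

struct ChannelPool {
    Channel channels[32];

    // Grab the first free channel, or fail if the pool is full.
    int play(const int16_t* data, int frames, float volume, float pitch) {
        for (int i = 0; i < 32; i++) {
            if (!channels[i].active) {
                channels[i] = { data, frames, 0.0f, pitch, volume, true };
                return i;
            }
        }
        return -1;
    }
};
```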

OpenAL is quite okay for iOS and OSX, but requires a third-party installation on Windows. That’s not a big issue since we’re only using Windows for development, but still annoying. A bigger issue is the lack of render-to-file support in almost every audio API I have seen, including OpenAL and OpenSL. For a game developer that’s a pretty essential feature, and I can’t believe there is so little support for it. We always render out gameplay sequences directly from the game engine to use in trailers, teasers, etc. Because we render in high resolution with motion blur there is no way to capture the audio in real-time along with the video, so audio has to be added manually later in video editing software - obviously a very time-consuming procedure. A better way would be to render out the game audio frame by frame along with the video, but this requires software mixing and direct access to the final mix - something a hardware-centric audio API cannot offer. Does anyone know if OpenAL and OpenSL actually use hardware mixing these days, or do they just mix in software anyway?
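
To make the frame-by-frame idea concrete, here is a rough sketch of what an offline render loop could look like, assuming a software mixer that can be pulled for an arbitrary number of samples. The renderVideoFrame and mixMore hooks are hypothetical placeholders, not real engine calls:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical hooks into the engine: "render one image" and
// "advance the final mix by N stereo frames".
void renderVideoFrame(int frame);
void mixMore(int16_t* interleavedStereo, int frames);

// Render audio in lock-step with the video: exactly one video frame's worth
// of samples per image, appended to a raw PCM file (wrap in a WAV header later).
void renderSequence(int totalVideoFrames, FILE* pcmOut)
{
    const int sampleRate = 44100;
    const int videoFps = 30;
    const int framesPerImage = sampleRate / videoFps; // 1470 stereo frames per image

    int16_t buffer[framesPerImage * 2]; // interleaved left/right

    for (int frame = 0; frame < totalVideoFrames; frame++) {
        renderVideoFrame(frame);         // high-res image with motion blur
        mixMore(buffer, framesPerImage); // advance the mix deterministically
        fwrite(buffer, sizeof(buffer), 1, pcmOut);
    }
}
```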

I decided to write my own software mixer with a very thin hardware abstraction layer per platform. Partly because I would get the render-to-file feature basically for free, but also because it opens up the possibility of adding effects like echo and reverb.
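
The mixer core itself is not much code. Here is a rough sketch of the kind of thing I mean, building on the channel pool sketch above - accumulate the active channels with their volume and pitch into a float buffer, run a simple feedback-delay echo, then convert to interleaved 16-bit stereo. Again, not the actual engine code:

```cpp
#include <cstring>
#include <cmath>

// Sketch of the mix loop. Assumes frames <= 4096 per call and mono sources.
struct Mixer {
    ChannelPool pool;
    float echoBuffer[44100] = {}; // one second of delay at 44.1 kHz
    int echoPos = 0;

    void mix(int16_t* out, int frames) {
        float accum[4096];
        std::memset(accum, 0, frames * sizeof(float));

        // Accumulate all active channels, applying volume and pitch.
        for (Channel& c : pool.channels) {
            if (!c.active) continue;
            for (int i = 0; i < frames; i++) {
                int idx = (int)c.cursor;
                if (idx >= c.frames) { c.active = false; break; }
                accum[i] += c.data[idx] * c.volume;
                c.cursor += c.pitch; // nearest-neighbor resampling, good enough for a sketch
            }
        }

        // Feedback-delay echo, then clamp and write interleaved 16-bit stereo.
        for (int i = 0; i < frames; i++) {
            float wet = echoBuffer[echoPos];
            float sample = accum[i] + wet * 0.3f; // mix in the delayed signal
            echoBuffer[echoPos] = sample;         // feed it back
            echoPos = (echoPos + 1) % 44100;

            float clamped = std::fmax(-32767.0f, std::fmin(32767.0f, sample));
            out[i * 2 + 0] = (int16_t)clamped; // left
            out[i * 2 + 1] = (int16_t)clamped; // right (same mono mix)
        }
    }
};
```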

I thought for sure that the hardest part of the mixer project would be to write the actual mixer. I couldn’t have been more wrong. It turns out researching appropriate low-level audio interfaces for each platform took way more time. There are at least four or five different APIs on Windows and just as many on OSX, yet it’s surprisingly hard to just stream raw PCM data to the left and right speaker. Extremely frustrating, because this is exactly what the driver wants in the end anyway. Why not just offer a simple callback from a high-priority system thread - feedMoreDataPlease(void* buffer, int size)? I get that some people want 3D positioning, Internet streaming and what-not, but that’s not a reason to hide low-level access from the rest of us who just want to submit a raw mix to the driver.
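
In other words, the entire platform-facing surface of the engine could be a single pull function, something like this (using the Mixer sketch above as the engine side):

```cpp
// What I actually want from each platform: call this from a high-priority
// audio thread whenever the driver needs more data. gMixer is the Mixer
// sketched above - everything behind this function is platform-independent.
static Mixer gMixer;

void feedMoreDataPlease(void* buffer, int size)
{
    // size is in bytes; the mixer works in stereo frames of two 16-bit samples.
    gMixer.mix((int16_t*)buffer, size / (int)(2 * sizeof(int16_t)));
}
```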

Kudos to Apple for exposing exactly this type of interface through the Audio Unit API (part of Core Audio). It can actually be configured to do what I’m asking for, and offers really good latency too - around 10 ms on both OSX and iOS.
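
For reference, this is roughly what the OSX setup looks like - a condensed sketch with no error handling, assuming the feedMoreDataPlease function from above as the engine side. On iOS the component subtype would be kAudioUnitSubType_RemoteIO instead of kAudioUnitSubType_DefaultOutput:

```cpp
#include <AudioUnit/AudioUnit.h>

void feedMoreDataPlease(void* buffer, int size); // engine-side pull from the sketch above

// Called by Core Audio on its own high-priority thread when it needs more data.
static OSStatus renderCallback(void* inRefCon, AudioUnitRenderActionFlags* ioActionFlags,
                               const AudioTimeStamp* inTimeStamp, UInt32 inBusNumber,
                               UInt32 inNumberFrames, AudioBufferList* ioData)
{
    feedMoreDataPlease(ioData->mBuffers[0].mData, (int)ioData->mBuffers[0].mDataByteSize);
    return noErr;
}

void initCoreAudioOutput()
{
    // Find the default output unit.
    AudioComponentDescription desc = {};
    desc.componentType = kAudioUnitType_Output;
    desc.componentSubType = kAudioUnitSubType_DefaultOutput;
    desc.componentManufacturer = kAudioUnitManufacturer_Apple;
    AudioComponent comp = AudioComponentFindNext(NULL, &desc);

    AudioUnit unit;
    AudioComponentInstanceNew(comp, &unit);

    // Interleaved 16-bit stereo at 44.1 kHz.
    AudioStreamBasicDescription fmt = {};
    fmt.mSampleRate = 44100;
    fmt.mFormatID = kAudioFormatLinearPCM;
    fmt.mFormatFlags = kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked;
    fmt.mChannelsPerFrame = 2;
    fmt.mBitsPerChannel = 16;
    fmt.mBytesPerFrame = 4;
    fmt.mFramesPerPacket = 1;
    fmt.mBytesPerPacket = 4;
    AudioUnitSetProperty(unit, kAudioUnitProperty_StreamFormat, kAudioUnitScope_Input,
                         0, &fmt, sizeof(fmt));

    // Hook up the pull callback and start the unit.
    AURenderCallbackStruct cb = {};
    cb.inputProc = renderCallback;
    AudioUnitSetProperty(unit, kAudioUnitProperty_SetRenderCallback, kAudioUnitScope_Input,
                         0, &cb, sizeof(cb));

    AudioUnitInitialize(unit);
    AudioOutputUnitStart(unit);
}
```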

Android doesn’t seem to offer any viable alternative to OpenSL through the NDK. Fortunately OpenSL is also reasonably sane when it comes to streaming and offers latency in the 50 ms range (plus some hidden latency within OpenSL or the drivers - it’s unclear exactly how much). It’s not perfect, but acceptable. OpenSL is a beast to set up though. Why on earth would anyone design an API like that? And there is very little help available online except for the 600-page reference manual. I finally got it to work, and I hope I never ever need to touch it again.
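
To give a taste of what “a beast to set up” means, here is a heavily condensed sketch of the buffer-queue path - no error checking, and the real thing needs more care around shutdown and buffer ownership:

```cpp
#include <SLES/OpenSLES.h>
#include <SLES/OpenSLES_Android.h>
#include <cstdint>

void feedMoreDataPlease(void* buffer, int size); // engine-side pull from the sketch above

static SLObjectItf engineObject, outputMixObject, playerObject;
static SLEngineItf engine;
static SLPlayItf player;
static SLAndroidSimpleBufferQueueItf bufferQueue;
static int16_t buffers[2][1024 * 2]; // two interleaved stereo buffers, double buffered
static int currentBuffer = 0;

// Called by OpenSL when a buffer has finished playing - refill and re-enqueue.
static void bufferCallback(SLAndroidSimpleBufferQueueItf bq, void* context)
{
    int16_t* buf = buffers[currentBuffer];
    currentBuffer = 1 - currentBuffer;
    feedMoreDataPlease(buf, (int)sizeof(buffers[0]));
    (*bq)->Enqueue(bq, buf, sizeof(buffers[0]));
}

void initOpenSLOutput()
{
    slCreateEngine(&engineObject, 0, NULL, 0, NULL, NULL);
    (*engineObject)->Realize(engineObject, SL_BOOLEAN_FALSE);
    (*engineObject)->GetInterface(engineObject, SL_IID_ENGINE, &engine);

    (*engine)->CreateOutputMix(engine, &outputMixObject, 0, NULL, NULL);
    (*outputMixObject)->Realize(outputMixObject, SL_BOOLEAN_FALSE);

    // Source: a two-deep Android simple buffer queue of 16-bit stereo PCM at 44.1 kHz.
    SLDataLocator_AndroidSimpleBufferQueue locBufq = { SL_DATALOCATOR_ANDROIDSIMPLEBUFFERQUEUE, 2 };
    SLDataFormat_PCM format = { SL_DATAFORMAT_PCM, 2, SL_SAMPLINGRATE_44_1,
                                SL_PCMSAMPLEFORMAT_FIXED_16, SL_PCMSAMPLEFORMAT_FIXED_16,
                                SL_SPEAKER_FRONT_LEFT | SL_SPEAKER_FRONT_RIGHT,
                                SL_BYTEORDER_LITTLEENDIAN };
    SLDataSource source = { &locBufq, &format };

    // Sink: the output mix.
    SLDataLocator_OutputMix locOutmix = { SL_DATALOCATOR_OUTPUTMIX, outputMixObject };
    SLDataSink sink = { &locOutmix, NULL };

    const SLInterfaceID ids[] = { SL_IID_ANDROIDSIMPLEBUFFERQUEUE };
    const SLboolean req[] = { SL_BOOLEAN_TRUE };
    (*engine)->CreateAudioPlayer(engine, &playerObject, &source, &sink, 1, ids, req);
    (*playerObject)->Realize(playerObject, SL_BOOLEAN_FALSE);
    (*playerObject)->GetInterface(playerObject, SL_IID_PLAY, &player);
    (*playerObject)->GetInterface(playerObject, SL_IID_ANDROIDSIMPLEBUFFERQUEUE, &bufferQueue);

    (*bufferQueue)->RegisterCallback(bufferQueue, bufferCallback, NULL);
    (*player)->SetPlayState(player, SL_PLAYSTATE_PLAYING);

    // Prime both buffers so playback can start immediately.
    bufferCallback(bufferQueue, NULL);
    bufferCallback(bufferQueue, NULL);
}
```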

Windows is the only platform where I still haven’t found a decent low-level audio API. There are Media Foundation and WASAPI, which I haven’t looked at because they are Vista and Windows 7 only. DirectSound is probably the most widely used, but it doesn’t seem to offer a direct callback. Instead it relies on a user thread to feed new buffers periodically, making it virtually useless for low-latency stuff due to the horrible Windows scheduler. There is also the old waveOut interface in WinMM, which at a first glance looks like a perfect match - it offers a callback when the driver needs more data. But here’s the catch: you are not allowed to feed audio data from the callback itself because it may cause the process to deadlock! You are supposed to do that from a separate thread, and can only communicate through certain “safe” system functions. I’m totally ignoring that for the time being, feeding audio data directly from the callback, and it seems to work great in Windows 7 at least (I suppose this is because the waveOut interface is deprecated and wrapped in fifteen layers of user-mode code at this point…). The latency with the waveOut method ended up being in the 30 ms range.
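
For completeness, this is roughly the waveOut setup I described - double buffered, refilling directly from the callback against the documentation’s advice. Again a condensed sketch without error handling:

```cpp
#include <windows.h>
#include <mmsystem.h>
#include <cstdint>
#pragma comment(lib, "winmm.lib")

void feedMoreDataPlease(void* buffer, int size); // engine-side pull from the sketch above

static HWAVEOUT waveOutHandle;
static WAVEHDR headers[2];
static int16_t buffers[2][512 * 2]; // two interleaved stereo buffers

// The documentation says not to call waveOut functions from here - doing it anyway.
static void CALLBACK waveOutProc(HWAVEOUT hwo, UINT msg, DWORD_PTR instance,
                                 DWORD_PTR param1, DWORD_PTR param2)
{
    if (msg != WOM_DONE)
        return;
    WAVEHDR* hdr = (WAVEHDR*)param1;
    feedMoreDataPlease(hdr->lpData, (int)hdr->dwBufferLength);
    waveOutWrite(hwo, hdr, sizeof(WAVEHDR));
}

void initWaveOutOutput()
{
    // Interleaved 16-bit stereo at 44.1 kHz.
    WAVEFORMATEX fmt = {};
    fmt.wFormatTag = WAVE_FORMAT_PCM;
    fmt.nChannels = 2;
    fmt.nSamplesPerSec = 44100;
    fmt.wBitsPerSample = 16;
    fmt.nBlockAlign = fmt.nChannels * fmt.wBitsPerSample / 8;
    fmt.nAvgBytesPerSec = fmt.nSamplesPerSec * fmt.nBlockAlign;

    waveOutOpen(&waveOutHandle, WAVE_MAPPER, &fmt,
                (DWORD_PTR)waveOutProc, 0, CALLBACK_FUNCTION);

    // Prepare and submit both buffers; from here on the callback keeps them rotating.
    for (int i = 0; i < 2; i++) {
        headers[i] = {};
        headers[i].lpData = (LPSTR)buffers[i];
        headers[i].dwBufferLength = sizeof(buffers[i]);
        waveOutPrepareHeader(waveOutHandle, &headers[i], sizeof(WAVEHDR));
        feedMoreDataPlease(buffers[i], (int)sizeof(buffers[i]));
        waveOutWrite(waveOutHandle, &headers[i], sizeof(WAVEHDR));
    }
}
```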

It took some experimenting to get decent performance out of the software mixer, but since it all happens on a separate thread I’m not too concerned - these days even mid-range mobile phones have multiple cores…