An Average WiiU Bug: SDL Audio

Heya! It’s about time for another blog post, since I sure do write them super frequently. I’m tossing up committing to monthly or, uh, fortmonthly updates like my mate Alex did for a while; mostly for the random small projects I churn through (there’s a broken 3DS sitting on my desk) and to talk about things like this. As you may know, I’ve been co-writing a port of SDL to the Wii U with rw, which is what this post is about! Specifically, one of the many trials of audio.

Note: I’m writing this at 23:00 on my laptop, in bed, and there’s nothing I’d like less than to make diagrams. So even though this post desperately needs them... sorry.

Pretty early on, SDL got a nice little audio driver that can play mono 16-bit audio. This is absurdly easy to do with AX, Cafe’s audio API (Cafe is the Wii U’s OS). AX works in terms of voices - you ask the system for a voice, provide it with a data location and the offsets into that data you want to work with, set it as playing, and away you go. This will get you one channel of audio that sounds like it’s coming from both the left and right speakers, which you can then stream audio to by messing with the offsets I mentioned before (they’re very deliberately not bounds-checked).
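To make the offset trick a little more concrete, here’s a plain C sketch of one common streaming pattern, double buffering. It assumes the voice loops over a single buffer split into two halves, and that we refill whichever half it isn’t currently playing. None of this is AX API: RING_SAMPLES, half_to_refill and refill_half are hypothetical names, and on real hardware the play position would come from the voice’s offsets rather than a plain integer. It’s a sketch of the general idea, not necessarily what SDL’s driver does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size of the looping buffer the voice plays from, in samples. */
#define RING_SAMPLES 1024
#define HALF_SAMPLES (RING_SAMPLES / 2)

/* Mono 16-bit samples; the voice is assumed to loop over this whole buffer. */
static int16_t ring[RING_SAMPLES];

/* Given the voice's current play position (in samples from the start of the
 * buffer, possibly past the end if it has looped), return which half (0 or 1)
 * is safe to overwrite: the one the voice is NOT currently playing. */
static int half_to_refill(size_t play_pos)
{
    size_t playing = (play_pos % RING_SAMPLES) / HALF_SAMPLES;
    return playing == 0 ? 1 : 0;
}

/* Copy the next HALF_SAMPLES samples of audio into the given half. */
static void refill_half(int half, const int16_t *src)
{
    assert(half == 0 || half == 1);
    memcpy(&ring[(size_t)half * HALF_SAMPLES], src,
           HALF_SAMPLES * sizeof ring[0]);
}
```

For example, if the voice is 100 samples into the 1024-sample ring, it’s playing the first half, so half_to_refill(100) returns 1 and the second half is free to refill.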

This method worked well enough, but there was a strange bug in SDL: the Gamepad speakers would make awful screeching noises unless you disabled the virtual surround sound in System Settings. Eventually we figured out a way to disable it from SDL’s code, and the problem was... wallpapered over for now. That’s how SDL’s audio driver stayed for many months and many releases.

Let’s take a quick look at how the per-device volume, or “device mix” was set in these versions of SDL:

AXVoiceDeviceMixData drcmix = {
    .bus = {
        { .volume = 0x8000 }, //bus 0
        { .volume = 0x0000 }, //bus 1
        { .volume = 0x0000 }, //bus 2
        { .volume = 0x0000 }, //bus 3
    },
};
AXVoiceDeviceMixData tvmix = drcmix;
//...
AXSetVoiceDeviceMix(this->hidden->voice, AX_DEVICE_TYPE_DRC, 0, &drcmix);
AXSetVoiceDeviceMix(this->hidden->voice, AX_DEVICE_TYPE_TV, 0, &tvmix);

This seems reasonable enough. We make some device mix data, delve in and set the volume on bus 0 to 0x8000. That’s halfway up the 16-bit range, but in the unsigned 1.15 fixed-point format these volumes use (decaf reads them as ufixed_1_15_t, as we’ll see), it works out to a gain of 1.0, i.e. full volume. There’s also a “delta” property on each bus which, as far as I can tell, is used for volume sweeps and the like (though I haven’t actually looked into it, and am frequently wrong about AX!), but since C sets struct members to 0 if you don’t name them in an initializer, it’s no big deal. We then duplicate this for the TV, and pass both into the AX function for setting the device mix of the voice we’re using for SDL audio. All seems well!
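Two details from that paragraph are easy to check in plain C: designated initializers zero every member you don’t name, and a raw volume of 0x8000 read as unsigned 1.15 fixed point is exactly 1.0. The structs below are an assumed mirror of AXVoiceDeviceMixData’s shape (four buses, each with a 16-bit volume and a delta), not the real SDK definition, and volume_to_gain is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mirror of one bus entry: a 16-bit volume and a delta.
 * The exact field types are an assumption; check your SDK's headers. */
typedef struct {
    uint16_t volume;
    int16_t delta;
} BusMix;

/* Assumed mirror of AXVoiceDeviceMixData: one entry per bus, four buses. */
typedef struct {
    BusMix bus[4];
} DeviceMix;

/* Interpret a raw volume as unsigned 1.15 fixed point
 * (1 integer bit, 15 fractional bits), so 0x8000 == 1.0. */
static double volume_to_gain(uint16_t raw)
{
    return raw / 32768.0;
}

/* Like the SDL snippet, but only bus 0's volume is named: every other
 * volume, and every delta, is zero-initialized by the language. */
static const DeviceMix example_mix = {
    .bus = {
        { .volume = 0x8000 }, /* bus 0 */
    },
};
```

Under that reading, volume_to_gain(0x4000) gives 0.5, so 0x4000 rather than 0x8000 would be the half-volume setting.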

However, there’s no clear way to set up stereo sound here: buses 1, 2 and 3 are used for effects and get mixed right into the master, with no left/right panning in sight. As far as I could tell, everything came out of both speakers no matter what. The only potential help was RetroArch, which does have working stereo, but it uses an unusual and undocumented API that doesn’t seem to be used in any other homebrew. I really didn’t want to switch SDL to this API, as I had (and still have) serious doubts about where the knowledge of how it works came from. In any case, there was only one place left I could check for help.

Decaf.

I do not exaggerate when I say that the work of exjam and brett19 may well be the single best reference for a Wii U homebrew developer there is (at the time of writing, anyway!~). Currently, the emulator takes an HLE (high-level emulation) approach, which means they need to have code for most of Cafe’s API functions so the emulated game can call them. Code that I can read and reference! Digging up decaf’s implementation of AXSetVoiceDeviceMix, we find something a little odd in how it reads out the data we passed in…

for (auto c = 0; c < AXNumTvChannels; ++c) {
    for (auto b = 0; b < AXNumTvBus; ++b) {
        extras->tvVolume[deviceId][c][b].volume = ufixed_1_15_t::from_data(mixData[c].bus[b].volume);
        extras->tvVolume[deviceId][c][b].delta = ufixed_1_15_t::from_data(mixData[c].bus[b].delta);
    }
}

Here, mixData points to the AXVoiceDeviceMixData we built before, and extras->tvVolume is where decaf keeps track of, well, the volume. We see it delving into mixData and looping over all the buses with b as the index. That’s expected: we set up four buses in the mixData struct, so all is well. However... what’s c doing? It goes from 0 to AXNumTvChannels... Channels, as in left/right audio channels? (A quick dip into decaf’s device mixing function, where the emulator mixes all the voices together and applies these volume modifiers, confirms this, as well as my assertions about the other buses from before.)

In C, indexing mixData with square brackets treats it as an array, which implies there are several AXVoiceDeviceMixData structures laid out one after another. The use of AXNumTvChannels (confirmed in the mixing function) shows that each of these structures applies to a different audio channel: left, right, then the various surrounds (depending on the device). However, our code only provides one, so when Cafe reads the other channels from the memory directly after what we passed in, it’s reading garbage!
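Here’s a small C model of the read pattern in decaf’s loop, to show what the caller is expected to supply. DeviceMix mirrors the shape of AXVoiceDeviceMixData as an assumption, read_device_mix is a hypothetical stand-in for Cafe’s side, and NUM_TV_CHANNELS = 6 is an illustrative value, not one taken from this post; check the real constant in your own headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BUSES 4
#define NUM_TV_CHANNELS 6 /* illustrative assumption, not the SDK's constant */

/* Assumed shape: one DeviceMix per channel, four buses each. */
typedef struct { uint16_t volume; int16_t delta; } BusMix;
typedef struct { BusMix bus[NUM_BUSES]; } DeviceMix;

/* Where our pretend "Cafe" stores what it read, indexed [channel][bus]. */
static uint16_t stored_volume[NUM_TV_CHANNELS][NUM_BUSES];

/* Hypothetical reader modelled on the decaf loop: mix is treated as an
 * array holding NUM_TV_CHANNELS DeviceMix structs back to back. Passing
 * the address of a single struct makes mix[1] onwards read past its end,
 * which is undefined behaviour. */
static void read_device_mix(const DeviceMix *mix)
{
    assert(mix != NULL);
    for (size_t c = 0; c < NUM_TV_CHANNELS; ++c)
        for (size_t b = 0; b < NUM_BUSES; ++b)
            stored_volume[c][b] = mix[c].bus[b].volume;
}
```

Called with a full array such as DeviceMix tvmix[NUM_TV_CHANNELS] with only the entries you care about filled in, every channel the loop touches exists and everything you didn’t name is zeroed. Called with the address of a single DeviceMix, as the old SDL code effectively did, it isn’t.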

I had to make a somewhat embarrassing commit so that the mix data gets filled out properly. In that version, all the channels and buses are explicitly laid out, but later this got simplified, taking advantage of the “unspecified members are 0” behaviour I mentioned before. Now that tvmix and drcmix are arrays with the correct number of AXVoiceDeviceMixData structures laid out back-to-back, the Gamepad surround sound issue suddenly vanishes, and I get the control I need over left/right panning for each voice.
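Once the mix data is a proper per-channel array, panning falls out naturally. A minimal sketch, assuming each output channel is fed by its own voice (an assumption, not something this post spells out): make_channel_mix is a hypothetical helper that zeroes the whole array and gives only the target channel’s bus 0 a volume, and CHANNEL_LEFT/CHANNEL_RIGHT follow the left-then-right order described above. The struct shapes are again an assumed mirror of AXVoiceDeviceMixData.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed shape of AXVoiceDeviceMixData: four buses per channel. */
typedef struct { uint16_t volume; int16_t delta; } BusMix;
typedef struct { BusMix bus[4]; } DeviceMix;

/* Channel order as described in the post: left first, then right. */
enum { CHANNEL_LEFT = 0, CHANNEL_RIGHT = 1 };

/* Hypothetical helper: clear the whole per-channel array, then give only
 * the target channel's main bus (bus 0) a volume, so a voice set up this
 * way should come out of just that speaker. */
static void make_channel_mix(DeviceMix *mix, size_t num_channels,
                             size_t target, uint16_t volume)
{
    assert(target < num_channels);
    memset(mix, 0, num_channels * sizeof *mix);
    mix[target].bus[0].volume = volume;
}
```

A left-only voice would get make_channel_mix(mix, n, CHANNEL_LEFT, 0x8000); a centred mono voice would instead set bus 0 on both the left and right entries.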

So, why didn’t this get caught earlier? I think the critical factor in covering this whole issue up is a small quirk in the way the older, broken code was laid out:

    //defining drcmix...
    },
};
AXVoiceDeviceMixData tvmix = drcmix;

As we now know, when Cafe sees drcmix, it first reads the left channel from the pointer we supplied, then looks at the memory directly after that for the right channel. Now, I can’t say whether compiler optimisations invalidate my idea here (the compiler is free to arrange local variables however it likes), or whether some other factor was at play, but looking at the code, what comes right after drcmix?

Riding on this layout choice, if tvmix (a copy of drcmix) landed right after drcmix in memory, both the left and right channels would get the same values, and the voice would sound centred. This only applies to the Gamepad, where I’m using headphones, so any left/right imbalance would have been really obvious; with both channels matching, though, everything sounded fine. In the TV’s case, if the value the console read for the right channel (after tvmix) happened to be low or 0, I doubt I would notice: I never tested with a proper surround system, just an average TV whose speakers sit more or less on top of each other. If some other way of doing this had been chosen, like calling AXSetVoiceDeviceMix right after drcmix was set up, the right channel would likely have been different, and I probably would have found this out much earlier! It be like that sometimes, I guess.
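The lucky layout can be modelled deterministically by putting the two mixes in one array, since C guarantees array elements are contiguous while it makes no such promise about separate local variables. lucky_layout and channel_volume are hypothetical names for illustration, and the struct is an assumed mirror of AXVoiceDeviceMixData; reading index 1 here stands in for what Cafe would have seen as the Gamepad’s right channel if tvmix really did sit right after drcmix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed shape of AXVoiceDeviceMixData: four buses per channel. */
typedef struct { uint16_t volume; int16_t delta; } BusMix;
typedef struct { BusMix bus[4]; } DeviceMix;

/* Hypothetical reader: the bus 0 volume used for a given channel, treating
 * mix as one DeviceMix per channel, like the decaf loop. */
static uint16_t channel_volume(const DeviceMix *mix, int channel)
{
    assert(mix != NULL && channel >= 0);
    return mix[channel].bus[0].volume;
}

/* drcmix followed directly by tvmix, which is a copy of it. In one array
 * the adjacency is guaranteed; as two separate locals it was luck. */
static DeviceMix lucky_layout[2] = {
    { .bus = { { .volume = 0x8000 } } }, /* drcmix: read as the left channel  */
    { .bus = { { .volume = 0x8000 } } }, /* tvmix: read as the right channel */
};
```

Since both entries hold the same values, channel_volume returns 0x8000 for either index, which is exactly the "centred" result that hid the bug.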

You’ll be glad to know that once that got fixed, rw and I quickly implemented stereo audio, and in theory audio with any number of channels, though that came with its own set of fun bugs and challenges that could get blog posts of their own. It’s not done yet, but you can look forward to, uh, hearing Slimers in stereo soon?