You can see the mp4 file on the left, some unimportant space in the middle, and on the right is what’s called the stack. The vast majority of programs are built using sections of code called subroutines - the idea is that you have a subroutine that performs a specific task (doing some math, allocating memory, drawing the screen) that you jump to as necessary. One caveat of this programming model is that your program needs to know where to go back to after a subroutine has completed its task. This is solved using a stack - a special section of memory where the program stores addresses that tell a subroutine where to go when it’s finished.
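If it helps, here's a toy model of that idea in Python. It is only a sketch: a real stack lives in raw memory and is managed by the CPU, and the names `call`, `draw_screen` and the address `0x1004` are made up for illustration.

```python
# Toy model of a call stack. The list stands in for the special section
# of memory that holds return addresses.
return_stack = []
trace = []


def call(subroutine, return_address):
    """Run a subroutine, remembering where to resume afterwards."""
    return_stack.append(return_address)  # save where to go back to
    subroutine()                         # do the actual task
    return return_stack.pop()            # "return": resume at the saved address


def draw_screen():
    # Hypothetical subroutine; it just records that it ran.
    trace.append("drawing")


# 0x1004 is a made-up address for the instruction following the call.
resume_at = call(draw_screen, return_address=0x1004)
```

Whatever value sits in that saved slot when the subroutine finishes is where the program goes next, which is why the stack is such an attractive thing to overwrite.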
Anyway! Now that the program is ready to start decoding our mp4, the stage is set for our exploit. Our mp4 has a ‘tx3g’ section, which causes the decoder to do two things:
So, what’s the problem here? Well, when figuring out how much memory to allocate, the decoder adds the size from the mp4 to another number from the mp4. By carefully controlling these two numbers, we can abuse the fact that computers have an upper limit on the size of their numbers to perform what’s called an integer overflow - to cut a long story short, the decoder ends up with a number that’s smaller than the ones it started with. The decoder will now allocate a buffer that’s too small for the data it intends to write. We got it!
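To make the overflow concrete, here's a minimal sketch of the arithmetic. Python's integers never overflow by themselves, so the wrap-around is simulated with a mask. I'm assuming 32-bit unsigned sizes, and the two input values are invented for the example rather than taken from a real exploit file.

```python
MASK32 = 0xFFFFFFFF  # assumption: the decoder keeps sizes in 32-bit unsigned integers


def alloc_size(chunk_size: int, extra: int) -> int:
    """Add two sizes the way 32-bit hardware would, wrapping past 2**32."""
    return (chunk_size + extra) & MASK32


# Hypothetical attacker-chosen values read from the mp4.
chunk_size = 0xFFFFFFF0
extra = 0x20

# The true sum is 0x100000010, but only the low 32 bits survive.
buffer_len = alloc_size(chunk_size, extra)  # 0x10, i.e. 16 bytes
```

The decoder would then allocate 16 bytes while still believing it has far more data than that to copy.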
Here’s that in pictures. The program allocates a section of memory in response to the tx3g section in the mp4…
Our integer overflow tricks from before mean that the program starts to write too much data, sailing beyond the end of the allocated section…
Ta-da! The program has unknowingly overwritten the stack with data from our mp4 file. If we craft this data correctly, we can make subroutines in the process of returning jump wherever we’d like instead of where they’re supposed to! From there it’s a case of ROP, JIT and finally code exec (subjects for another day).
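Here's the same thing as a tiny simulation, with a `bytearray` standing in for the process's memory. The layout, the offsets and both addresses are made up, and I'm assuming big-endian 32-bit addresses; it only shows the mechanism, not the real memory map.

```python
import struct

# Hypothetical layout: a 16-byte buffer at offset 0 and a saved
# return address 32 bytes into the same block of memory.
BUF_START, BUF_LEN = 0, 16
RET_ADDR_OFFSET = 32
memory = bytearray(48)

# The legitimate return address (made-up value), stored big-endian.
struct.pack_into(">I", memory, RET_ADDR_OFFSET, 0x0D000000)


def unchecked_copy(mem: bytearray, dest: int, data: bytes) -> None:
    """Copy data into memory without checking it fits in the buffer."""
    mem[dest:dest + len(data)] = data


# 32 bytes of filler to reach the saved address, then the value we want there.
payload = b"A" * 32 + struct.pack(">I", 0xDEADBEEF)
unchecked_copy(memory, BUF_START, payload)

(hijacked,) = struct.unpack_from(">I", memory, RET_ADDR_OFFSET)
```

After the copy, `hijacked` holds the value from our payload rather than the original address, so a subroutine returning through that slot would jump wherever we chose.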
This, of course, is an ideal case. It’s more or less what happened in the browser (as far as I know, the tx3g buffer actually ended up inside the stack) and is an example of how it works when everything goes to plan.
Crunchyroll is an example of when everything does not go to plan.
In the process of trying to apply this technique to Crunchyroll’s video player, I’ve learned of several key differences between it and the Internet Browser. For example, when Crunchyroll loads up our video (just before the buffer overflows) the memory looks like this:
Since the decoder copies data from inside our mp4, it grows along with the amount of memory we want to overwrite. This quickly reveals another problem: there’s an upper limit on how much memory we can overwrite. The app will just give up if an mp4 is too big. If we overwrite as much as we can, we’ll end up with something like this:
So close! We can’t quite get to the stack with this overflow, which means we can’t get easy code execution. Instead, we have to rely on the random guff… Not ideal. Sure, there are still code addresses in there which we can overwrite, but it’s not nearly as reliable as the stack. There’s also the small matter of finding the stuff to overwrite in the first place, along with picking one that will be used before some other side effect of the overflow crashes everything, and dealing with the constantly-changing nature of the memory layout… It’s a lot to overcome, and we haven’t even talked about JIT yet.
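The reach problem boils down to a bit of arithmetic, sketched below. Every number here is a placeholder I picked for illustration; I'm not quoting the real buffer address, stack address or size limit.

```python
def overflow_reaches(buffer_addr: int, max_write: int, target_addr: int) -> bool:
    """True if writing max_write bytes from buffer_addr covers target_addr."""
    return buffer_addr <= target_addr < buffer_addr + max_write


# Hypothetical values only.
BUFFER_ADDR = 0x10000000   # where the undersized buffer ends up
STACK_ADDR = 0x10800000    # where the saved return addresses live
MAX_WRITE = 0x00400000     # most data the app accepts before giving up

reaches_stack = overflow_reaches(BUFFER_ADDR, MAX_WRITE, STACK_ADDR)

# Something closer to the buffer is still within range.
NEARBY_ADDR = 0x10200000
reaches_nearby = overflow_reaches(BUFFER_ADDR, MAX_WRITE, NEARBY_ADDR)
```

With these placeholder numbers the write stops short of the stack but still covers the nearby address, which is the situation described above: the only targets in range are whatever happens to sit between the buffer and the limit.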
Hopefully this post gave you a bit of insight into how these exploits work, and the problems one faces when trying to exploit them in weird and wonderful applications. Just because there’s a vulnerability, bug or crash doesn’t mean it’s easy to exploit, or even that it can be exploited at all. I intend to keep working on Crunchyroll at least until 5.5.2 is hacked one way or another, though it may not be this specific vulnerability. Only time will tell.
If you have a question or comment on this post, please email me or contact me!