Fixing old bugs, without the source

John Tsiombikas nuclear@mutantstargoat.com

22 April 2014

Once upon a time, I made a special kind of demoscene production: a wedtro. Which is a kind of small demo, made as a present to some other member of the demoscene, who is getting married. This wedtro, turned out to be the buggiest piece of shit I've ever released, and it’s been bugging me for the past decade. Until today I decided to fix it.

Retrospective

The original release of this wedtro was during the summer of 2003, in the hayday of the "second coming of the Greek demoscene". Our friend Cybernoid was getting married, and we (Nuclear / The Lab and Imak / Quadra) decided to make him a wedtro as a wedding present.

Imak came to my place, with his perlin noise procedural texture generation code, while pulled in my Direct3D 8.0 3D engine, that I had used for Absence a few months earlier. I also made a couple of crappy objects in 3d studio max, and we dug up an old fasttracker module by Thor/Quadra, which seemed cute and cuddly enough for the occasion.

My 3D engine code had rotted away apparently. Probably I was making half-assed changes and attempts to add new features after the release of Absence, and the wedtro effort was plagued with bugs from the start. I seem to remember that even trying to do a fade-out at some point with a blended fullscreen quad took us a long time to make it work. On top of everything else, Raoul/nlogn also dropped by, insisting to show us games on his dreamcast for some reason (Ikaruga and Rez if I remember correctly).

We didn't manage to finish the prod before the "deadline" (wedding day), but we managed to send it to them, the day after: http://www.pouet.net/prod.php?which=10431.

Z-buffering bug

A few years later, no computer in the known universe managed to run this wedtro correctly. I first noticed the bug on an ATI card, and my first reflex was to blame their crappy drivers as usual, but it turned out that no modern graphics card was able to render it correctly.

wedtroid-before

I finally decided to look into it around 2011, and see if I can find out why it broke over the years. I had since lost the source code of the demo itself, in the NFS-symlink rm -rf fiasco of 2008, so a fix was out of the question. But I had the code of the 3D engine (or at least a version of it) still, so I could at least satisfy my curiosity.

It turns out, I had two code-paths in the high-level Object::Render code:

if(TexUnits < 4) {
    Render2TexUnits();
} else {
    Render4TexUnits();
}

These calls managed the hellish fixed-function multitexturing API, to render the Objects with all their materials and textures. Render2TexUnits would be called on graphics cards with less than 4 texture units, and would use more render passes to blend all the textures correctly, while Render4TexUnits was able to render everything in just one or two passes, on capable hardware.

Problem is, when I was writing that code, I only had a Geforce2 MX, which as was typical of that era, only had 2 texture units. So, the 4-texture-unit code path was never actually tested. And it turns out, there was a lurking copy-paste bug in there, to ruin my decade.

There was a check in the begining of both these functions: if the caller requested to draw without writing to the zbuffer, they masked zbuffer writes for the duration of the call. But while Render2TexUnits did:

void Object::Render2TexUnits() {
    if(!rendp.ZWrite) gc->SetZWrite(false);
    ... do all the setup and drawing ...
    if(!rendp.ZWrite) gc->SetZWrite(true);
}

In Render4TexUnits, the following copy-paste bug went unnoticed:

void Object::Render4TexUnits() {
    if(!rendp.ZWrite) gc->SetZWrite(false);
    ... do all the setup and drawing ...
    if(!rendp.ZWrite) gc->SetZWrite(false);
}

As soon as one object was rendered with zwrites masked, nothing from that point on could ever write to the zbuffer again. So when the blended firework particles started being rendered, with zwrites disabled, in the wedtro, the cake and everything else from that point on became messed up...

Fixing it the hard way

Fast-forward a few years (2014), and the wedtro bug still bugs me. It's bugging me for a decade now... so I decided to try and do something about it. Fuck the source code; I download a disassembler (IDA Pro freeware), make a copy of wedtro.exe and start digging.

My initial plan was to locate Object::Render and make the call to Render2TexUnits unconditional. But it's kinda tricky going through vtables to find which virtual function is which (of course I had no debug symbols in the binary, which was compiled with msvc in release mode). So I started looking for the initialization code, which was simpler to locate since it referred to a lot of error strings like "Could not create Direct3D device", etc.

As luck would have it, immediately after two successive instances of throw EngineInitException("Could not create Direct3D device"), there was a block of code pulling the relevant bits from the D3DCAPS8 structure and putting them in class variables:

gc->WindowHandle = WindowHandle;
gc->D3DDevice->GetRenderTarget(&gc->MainRenderTarget.ColorSurface);
gc->D3DDevice->GetDepthStencilSurface(&gc->MainRenderTarget.DepthStencilSurface);
gc->ContextParams = *GCParams;
gc->ZFormat = zfmt;
gc->AASamples = AASamples;
gc->ColorFormat = FinalColorFormat;
gc->MaxTextureStages = adapters[AdapterID].Capabilities.MaxSimultaneousTextures;

gc->MaxTextureStages is exactly what we needed to stay less than 4, to avoid taking the branch into Object::Render4TexUnits during drawing.

I followed the call instructions in the debugger, matching the number of arguments with the push instructions preceding them (2 pushes for 1 argument in the code, because of the hidden this pointer argument). Then I went over a rep movsd which was obviously the gc->ContextParams = *GCParams line.

I proceeded to set a breakpoint on the next instruction after that:

mov ecx, [esp+1b0h+var_194]
mov [ebp+42d0h], ecx

and ran the program to see if it is plausible that it's the gc->ZFormat = zfmt line of the C++ source. I saw that it moved the value 75, which according to the D3D docs, corresponds to the D3DFMT_D24S8 enumeration, a zbuffer format!

The layout of the member variables of any class instance in memory, follow exactly the order in which they where declared in the source. GraphicsContext, according to the "3dengine.h" header file has:

D3DFORMAT ColorFormat, ZFormat;
int AASamples;
int MaxTextureStages;

So, since ZFormat was stored at [ebp+42d0h], it follows that MaxTextureStages should be at offset 42d8h. Indeed a few lines later, we encounter the instructions:

mov eax, [edx+ecx+0d0h]
mov [ebp+42d8h], eax

Step a couple of times to reach these lines, and see what gets loaded into eax, expecting to see the value 8 for any modern graphics card, and bingo: the magic value 8 appears in eax! Manually zero-out the contents of eax and let the program continue, while praying to Odin... and victory! The beauty of a perfectly z-ordered wedding cake almost blinds me.

wedtroid patching

From that point on, everything is straightforward. Going to hex-view shows that the instruction mov eax, [edx+ecx+0d0h] corresponds to the 7 opcode bytes: 8B 84 0A D0 00 00 00. Not wanting to be buggered by complex move instructions with mod-r/m plus displacement bytes in the opcode, I decided to just replace it with xor eax,eax (setting eax to 0), which is a very simple 2-byte opcode. Going through the intel manuals, I found that xor eax,eax is encoded as: 31 C0, leaving 5 more bytes of the original instruction to be filled with NOPs (opcode 90). So I made the change with a hex editor, and the result is a fixed wedtroid executable, which everyone can enjoy :)

wedtroid fixed

Feel free to download the fixed repack of wedtroid if you want to watch it (windows only unfortunately, though wine should have no difficulty running it too).


This was initially posted in my old wordpress blog. Visit the original version to see any comments.


Discuss this post

Back to my blog