Topics - tarpman

#1
Misc. GS Hacking / Camelot's C compiler
10, January, 2021, 10:55:23 PM
Has anyone done any research into identifying or reproducing Camelot's C compiler?

I've been looking at agbcc, which I guess is supposed to be a reproduction of (or at least very close to) Nintendo's actual toolchain (apparently Cygnus GNUPro). The output is similar enough that I'm willing to believe Camelot's compiler is also a gcc, but there are some major differences too. These same patterns occur in Camelot's other games as well. I have not looked at many non-Camelot games, but the few I checked all look much more like agbcc.

I. Calls clobber r4

Camelot's code departs from the standard ARM calling convention by making r4 a call-clobbered (caller-saved) register. This makes me think they modified the compiler themselves (or engaged Cygnus to do it for them), as I haven't found any commercial compiler offering this ability. Getting gcc/agbcc to do this is a trivial change in the Thumb config:

--- a/gcc/thumb.h
+++ b/gcc/thumb.h
@@ -405,7 +405,7 @@
#define CALL_USED_REGISTERS    \
{                              \
   1,1,1,1,                     \
-  0,0,0,0,                     \
+  1,0,0,0,                     \
   0,0,0,1,                     \
   1,1,1,1,1                    \
}
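
To make the effect of that one-entry change concrete, here is a minimal sketch in plain C (not compiler source) that copies the two tables from the diff and reports which entries differ. The names stock_call_used, camelot_call_used and count_changed_regs are made up for illustration; the 17 entries are taken straight from the diff.

```c
#define NUM_REGS 17  /* entry count of the table in the diff above */

/* agbcc's stock Thumb table: 1 = call-clobbered (caller-saved). */
static const char stock_call_used[NUM_REGS] = {
    1,1,1,1,
    0,0,0,0,
    0,0,0,1,
    1,1,1,1,1
};

/* The same table with the r4 entry flipped, as in the patch. */
static const char camelot_call_used[NUM_REGS] = {
    1,1,1,1,
    1,0,0,0,
    0,0,0,1,
    1,1,1,1,1
};

/* Count the entries that differ between two tables.
   *first receives the index of the first difference, or -1 if none. */
static int count_changed_regs(const char *a, const char *b, int *first)
{
    int count = 0;
    *first = -1;
    for (int i = 0; i < NUM_REGS; i++) {
        if (a[i] != b[i]) {
            if (*first < 0)
                *first = i;
            count++;
        }
    }
    return count;
}
```

In practice this means a function can use r4 as scratch without pushing it, while a caller that wants a value in r4 to survive a call has to save it itself.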


II. Register allocation order

I think it's fairly clear from reading the code that the low registers are allocated in reverse: r3, r2, r1, r0. This looks suspiciously similar to what agbcc's ARM compiler does. Prioritizing ip (r12) looks similar too, but it's not a straightforward comparison because r12 is a high register.

/* The order in which register should be allocated.  It is good to use ip
   since no saving is required (though calls clobber it) and it never contains
   function parameters.  It is quite good to use lr since other calls may
   clobber it anyway.  Allocate r0 through r3 in reverse order since r3 is
   least likely to contain a function parameter; in addition results are
   returned in r0.
   */
#define REG_ALLOC_ORDER      \
{                                   \
     3,  2,  1,  0, 12, 14,  4,  5, \
     6,  7,  8, 10,  9, 11, 13, 15, \
    16, 17, 18, 19, 20, 21, 22, 23, \
    24, 25, 26     \
}


agbcc's Thumb compiler does not define REG_ALLOC_ORDER at all, so the registers are just allocated in order.
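
As a rough illustration of what an allocation order does, here is a toy sketch that hands out the first free register from a preference list. It is a deliberate simplification: gcc's real allocator also weighs live ranges, register classes and costs. pick_reg, alloc_regs, natural_order and arm_order are hypothetical names, and arm_order holds only the first eight entries of the REG_ALLOC_ORDER quoted above.

```c
/* Return the first register in `order` whose bit is clear in `in_use`,
   or -1 if every listed register is taken. */
static int pick_reg(const int *order, int n, unsigned in_use)
{
    for (int i = 0; i < n; i++)
        if (!(in_use & (1u << order[i])))
            return order[i];
    return -1;
}

/* Allocate `count` registers one after another, writing them to out[]. */
static void alloc_regs(const int *order, int n, int count, int *out)
{
    unsigned in_use = 0;
    for (int k = 0; k < count; k++) {
        out[k] = pick_reg(order, n, in_use);
        if (out[k] >= 0)
            in_use |= 1u << out[k];
    }
}

/* No REG_ALLOC_ORDER defined: registers are tried in numeric order. */
static const int natural_order[] = { 0, 1, 2, 3, 4, 5, 6, 7 };

/* First eight entries of the ARM REG_ALLOC_ORDER quoted above. */
static const int arm_order[] = { 3, 2, 1, 0, 12, 14, 4, 5 };
```

With three temporaries and nothing else live, the natural order yields r0, r1, r2 and the ARM-style order yields r3, r2, r1, which is the kind of mirror image visible when comparing Camelot's output with agbcc's.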

III. Instruction scheduling

This one is harder to explain, and might be evidence that the compiler is not gcc/agbcc based.

agbcc's Thumb compiler does not support delay slots or function units for instruction scheduling. However, Camelot's compiler clearly has something of the sort. Here's a simple example, GS1 at 0x08004458:

ldr r1, =0x03001CB4
ldr r3, =0x41C64E6D
ldr r2, [r1]
add r0, r2, #0
mul r0, r3
ldr r3, =12345
add r0, r0, r3
str r0, [r1]
lsl r0, r0, #8
lsr r0, r0, #16
bx lr


My attempt at the corresponding C code, and agbcc's output (Compiler Explorer -- edited to inline the constants):

extern volatile unsigned rng_state;

unsigned short random(void) {
    unsigned new_state = rng_state * 0x41C64E6D + 12345;
    rng_state = new_state;
    return new_state >> 8;
}


ldr r2, =0x03001CB4
ldr r1, [r2]
ldr r0, =0x41C64E6D
mul r0, r0, r1
ldr r1, =12345
add r0, r0, r1
str r0, [r2]
lsl r0, r0, #8
lsr r0, r0, #16
bx lr


The register allocation is reversed like I mentioned above (r2 → r1, r1 → r2, r0 → r3). Ignoring that, the interesting thing is the order of the first three instructions. You see that Camelot's compiler pipelined a second LDR while waiting for the first, but agbcc just lets the stall happen. This is a tiny example, but the same pattern exists throughout. Here's GS1 at 0x0808B25C:

push {r5, r6, lr}
ldr r2, =0x02000240
mov r3, #224
mov r12, r2
lsl r3, r3, #1
ldr r4, =0x0809E270
add r3, r12
mov r2, #0
ldrsh r0, [r3, r2]
ldmia r4!, {r2}


It consistently tries to insert at least one instruction (and I think it prefers two) after any load, before using the result. I don't think it's aware of memory regions -- it looks like it uses the same delay no matter whether the load is from IRAM, ERAM, or ROM.
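
One way to check this observation across more functions would be to measure the gap between each load and the first instruction that reads its result. The sketch below does that on a hand-decoded model of the opening instructions of the two RNG listings above. struct insn, load_use_gap and the two tables are hypothetical; a real tool would fill them from a disassembler.

```c
/* One instruction, reduced to what the check needs. */
struct insn {
    const char *text;  /* disassembly, for reference only */
    int is_load;       /* nonzero for ldr and friends */
    int dst;           /* register written, or -1 */
    int src[2];        /* registers read, -1 when unused */
};

/* Number of instructions between the load at index i and the first later
   instruction that reads its destination. Returns -1 if code[i] is not a
   load, or if the result is overwritten or never read in this window. */
static int load_use_gap(const struct insn *code, int n, int i)
{
    if (!code[i].is_load || code[i].dst < 0)
        return -1;
    for (int j = i + 1; j < n; j++) {
        if (code[j].src[0] == code[i].dst || code[j].src[1] == code[i].dst)
            return j - i - 1;
        if (code[j].dst == code[i].dst)
            return -1;  /* overwritten before any read */
    }
    return -1;
}

/* GS1 at 0x08004458, first five instructions. */
static const struct insn camelot[] = {
    { "ldr r1, =0x03001CB4", 1, 1, { -1, -1 } },
    { "ldr r3, =0x41C64E6D", 1, 3, { -1, -1 } },
    { "ldr r2, [r1]",        1, 2, {  1, -1 } },
    { "add r0, r2, #0",      0, 0, {  2, -1 } },
    { "mul r0, r3",          0, 0, {  0,  3 } },
};

/* agbcc output for the same function, first four instructions. */
static const struct insn agbcc[] = {
    { "ldr r2, =0x03001CB4", 1, 2, { -1, -1 } },
    { "ldr r1, [r2]",        1, 1, {  2, -1 } },
    { "ldr r0, =0x41C64E6D", 1, 0, { -1, -1 } },
    { "mul r0, r0, r1",      0, 0, {  0,  1 } },
};
```

For the address load at the top of each listing, load_use_gap gives 1 for Camelot's code and 0 for agbcc's. Running something like this over whole functions would give a histogram of gaps to compare against the "one, preferably two" guess.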

IV. Strange literal pools

The above all make sense as performance improvements. This last one has me a bit stumped.

Usually a literal pool holds the literals in the order they're used. In some cases, though, a few literals are stored at the beginning of the pool instead. These early ones are also strange because sometimes they hold small numbers which could be encoded as immediates or synthesized -- sometimes even zero.

Here's an example from GS1 at 0x0800C1BA:

add r2, r6, #0
ldr r3, [pc, #4]
add r2, #84
strb r3, [r2, #0]
b 0x0800C230
.word 0
.word 0x0FFF


There's no need for a literal here -- mov r3, #0 is perfectly valid Thumb. The zero is also out of order: the second literal is referenced earlier, at 0x0800C172.
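
For anyone checking which pool entry a pc-relative load refers to, here is a small helper that applies the Thumb rule: the PC reads as the instruction's address plus 4, its low two bits are cleared, and the immediate is added. thumb_literal_addr is a made-up name. Placing the ldr at 0x0800C1BC is my assumption, based on the listing starting at 0x0800C1BA and every instruction in it being 2 bytes.

```c
#include <stdint.h>

/* Address read by a Thumb `ldr rN, [pc, #imm]` located at insn_addr.
   The PC value is insn_addr + 4 with bits 1:0 forced to zero. */
static uint32_t thumb_literal_addr(uint32_t insn_addr, uint32_t imm)
{
    return ((insn_addr + 4) & ~(uint32_t)3) + imm;
}
```

Under that assumption, thumb_literal_addr(0x0800C1BC, 4) comes out to 0x0800C1C4, which is the word immediately after the branch at 0x0800C1C2, i.e. the `.word 0` entry.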

Using literal values this way has to be bad for performance. In this specific case the pool is nearby, but in most other cases it falls well outside of prefetch range (8 opcodes according to endrift). I assume, then, that these are compiled as placeholders, and are filled in by something else in the toolchain. The linker doesn't usually fill in values inside functions as far as I know, but if they modified the compiler, they could have modified the linker too...



Has anyone else done research in this area? Please post links!
#2
Introductions / Hello!
14, February, 2020, 01:22:21 AM
Hello, everyone. I'm tarpman, from Canada.

I've recently started disassembling and reverse-engineering Golden Sun as a fun learning project. I understand there's existing documentation out there but I'm trying to figure it out on my own as far as possible.

Early days still. My best achievement so far is writing programs to export the map and world-map backgrounds. (Just the static backgrounds, no effects or animations yet.) That's something though! Examples attached below.

I understand there's still interest in working on/improving editors for GS and GS2. I think I'd be interested in contributing to that sort of thing once I've learned more.

Looking forward to getting to know you all, and sharing my progress as I go.

Cheers!