Beginning decompilation: the Init function and the Actor struct
Open the C file and the H file with your actor's name from the appropriate directory in src/overlays/actors/. These will be the main files we work with. We will be using EnRecepgirl (the rather forward Mayor's receptionist in the Mayor's residence in East Clock Town) as our example: it is a nice simple NPC with most of the common features of an NPC.
Each actor has an associated data file and one assembly file per function. During the process, we will transfer the contents of all or most of these into the main C file. VSCode's search feature usually makes it quite easy to find the appropriate files without troubling the directory tree.
Anatomy of the C file
The actor file starts off looking like:
// --------------- 1 ---------------
// --------------- 2 ---------------
#include "z_en_recepgirl.h"
#define FLAGS 0x00000009
#define THIS ((EnRecepgirl*)thisx)
// --------------- 3 ---------------
void EnRecepgirl_Init(Actor* thisx, GlobalContext* globalCtx);
void EnRecepgirl_Destroy(Actor* thisx, GlobalContext* globalCtx);
void EnRecepgirl_Update(Actor* thisx, GlobalContext* globalCtx);
void EnRecepgirl_Draw(Actor* thisx, GlobalContext* globalCtx);
// --------------- 4 ---------------
#if 0
const ActorInit En_Recepgirl_InitVars = {
ACTOR_EN_RECEPGIRL,
ACTORCAT_NPC,
FLAGS,
OBJECT_BG,
sizeof(EnRecepgirl),
(ActorFunc)EnRecepgirl_Init,
(ActorFunc)EnRecepgirl_Destroy,
(ActorFunc)EnRecepgirl_Update,
(ActorFunc)EnRecepgirl_Draw,
};
// static InitChainEntry sInitChain[] = {
static InitChainEntry D_80C106C0[] = {
ICHAIN_U8(targetMode, 6, ICHAIN_CONTINUE),
ICHAIN_F32(targetArrowOffset, 1000, ICHAIN_STOP),
};
#endif
// --------------- 5 ---------------
extern InitChainEntry D_80C106C0[];
extern UNK_TYPE D_06001384;
extern UNK_TYPE D_06009890;
extern UNK_TYPE D_0600A280;
// --------------- 6 ---------------
#pragma GLOBAL_ASM("asm/non_matchings/overlays/ovl_En_Recepgirl/EnRecepgirl_Init.s")
#pragma GLOBAL_ASM("asm/non_matchings/overlays/ovl_En_Recepgirl/EnRecepgirl_Destroy.s")
#pragma GLOBAL_ASM("asm/non_matchings/overlays/ovl_En_Recepgirl/func_80C100DC.s")
#pragma GLOBAL_ASM("asm/non_matchings/overlays/ovl_En_Recepgirl/func_80C10148.s")
#pragma GLOBAL_ASM("asm/non_matchings/overlays/ovl_En_Recepgirl/func_80C1019C.s")
#pragma GLOBAL_ASM("asm/non_matchings/overlays/ovl_En_Recepgirl/func_80C10290.s")
#pragma GLOBAL_ASM("asm/non_matchings/overlays/ovl_En_Recepgirl/func_80C102D4.s")
#pragma GLOBAL_ASM("asm/non_matchings/overlays/ovl_En_Recepgirl/EnRecepgirl_Update.s")
#pragma GLOBAL_ASM("asm/non_matchings/overlays/ovl_En_Recepgirl/func_80C10558.s")
#pragma GLOBAL_ASM("asm/non_matchings/overlays/ovl_En_Recepgirl/func_80C10590.s")
#pragma GLOBAL_ASM("asm/non_matchings/overlays/ovl_En_Recepgirl/EnRecepgirl_Draw.s")
It is currently divided into six sections as follows:
- Description of the actor. This is not present for all actors (and indeed, is not present here), but gives a short description based on what we know about the actor already. It may be inaccurate, so feel free to correct it after you understand the actor better, or add it if it is missing. It currently has the form
/*
* File: z_en_recepgirl.c
* Overlay: ovl_En_Recepgirl
* Description: Mayor's receptionist
*/
- Specific #includes and #defines for the actor. You may need to add more header files, but otherwise this section is unlikely to change.
- These are prototypes for the "main four" functions that almost every actor has. You add more functions here if they need to be declared above their first use.
- An #if'd-out section containing the InitVars and a few other common pieces of data. This can be ignored until we import the data.
- A set of externs. These refer to the data in the previous section, and to data that comes from other files, usually in the actor's corresponding object file. The latter point to addresses in the ROM where assets are stored (usually collision data, animations or display lists). Once the corresponding object files have been decompiled, these will simply be replaced by including the object file (see Object Decompilation for how this process works). These symbols have been automatically extracted from the MIPS code. There may turn out to be some that were not caught by the script, in which case they need to be placed in the file called undefined_syms.txt in the root directory of the project. Ask in Discord for how to do this: it is simple, but rare enough to not be worth covering here.
- List of functions. Each #pragma GLOBAL_ASM is letting the compiler use the corresponding assembly file while we do not have decompiled C code for that function. The majority of the decompilation work is converting these functions into C that looks like a human wrote it.
Header file
The header file looks like this at the moment:
#ifndef Z_EN_RECEPGIRL_H
#define Z_EN_RECEPGIRL_H
#include "global.h"
struct EnRecepgirl;
typedef void (*EnRecepgirlActionFunc)(struct EnRecepgirl*, GlobalContext*);
typedef struct EnRecepgirl {
/* 0x0000 */ Actor actor;
/* 0x0144 */ char unk_144[0x164];
/* 0x02A8 */ EnRecepgirlActionFunc actionFunc;
/* 0x02AC */ char unk_2AC[0x8];
} EnRecepgirl; // size = 0x2B4
extern const ActorInit En_Recepgirl_InitVars;
#endif // Z_EN_RECEPGIRL_H
The struct currently contains a variable that is the Actor struct, which all actors use one way or another, plus other items. Currently we don't know what most of those items are, so we have arrays of chars as padding instead, just so the struct is the right size. As we understand the actor better, we will be able to gradually replace this padding with the actual variables that the actor uses.
The header file is also used to declare structs and other information about the actor that is needed by other files (e.g. by other actors): one can simply #include the header rather than externing it.
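The padding idea can be checked on any machine with a small stand-alone sketch. Everything below is a stand-in rather than project code: ActorStandIn is just 0x144 opaque bytes, and actionFunc is represented by a 32-bit integer on the assumption that pointers on the N64 are 4 bytes wide, which is not true of most modern hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the real Actor struct: only its size (0x144) matters here. */
typedef struct {
    char pad[0x144];
} ActorStandIn;

/* Mirrors the layout of the header above, with unknown regions kept as char padding. */
typedef struct {
    /* 0x0000 */ ActorStandIn actor;
    /* 0x0144 */ char unk_144[0x164];
    /* 0x02A8 */ uint32_t actionFunc; /* assumption: a 4-byte pointer on the N64 */
    /* 0x02AC */ char unk_2AC[0x8];
} EnRecepgirlLayout; /* size = 0x2B4 */

/* Returns 1 if every member sits at the offset claimed in the comments. */
int EnRecepgirlLayout_Check(void) {
    return offsetof(EnRecepgirlLayout, unk_144) == 0x144 &&
           offsetof(EnRecepgirlLayout, actionFunc) == 0x2A8 &&
           offsetof(EnRecepgirlLayout, unk_2AC) == 0x2AC &&
           sizeof(EnRecepgirlLayout) == 0x2B4;
}
```

Replacing part of a padding array with a real member keeps the check passing as long as the sizes still add up, which is the property we rely on while filling in the struct.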
Order of decompilation
The general rule for order of decompilation is
- Start with Init, because it usually contains the most information about the structure of the actor. You can also do Destroy, which is generally simpler than Init.
- Next, decompile any other functions from the actor you have found in Init. You generally start with the action functions, because they return nothing and all take the same arguments,
void func_80whatever(EnRecepgirl* this, GlobalContext* globalCtx);
- Decompile each action function in turn until you run out. Along the way, do any other functions in the actor for which you have discovered the argument types. (You are probably better doing depth-first on action functions than breadth-first: it's normally easier to follow along one branch of the actions than be thinking about several at once.)
- After you've run out, do Update. This usually provides the rest of the function tree, apart from possibly some draw functions.
- Finally, do the draw functions.
The above is a rough ordering for the beginner. As you become more experienced, you can deviate from this scheme, but the general principle remains that you should work on functions that you already know something about. (This is why it's good to start on actors: they are self-contained, we already know a lot about some of the functions, and the function flow tends to be both logical and provide information about every function.)
Data
Associated to each actor is a .data file, containing data that the actor uses. This ranges from spawn positions, to animation information, to even assets that we have to extract from the ROM. Since the structure of the data is very inconsistent between actors, automatic importing has been very limited, so the vast majority must be done manually.
There are two ways of transferring the data into an actor: we can either
- import it all naively as words (s32s), which will still allow it to compile, and sort out the actual types later, or
- extern each piece of data as we come across it, and come back to it later when we have a better idea of what it is.
We will concentrate on the second here; the other is covered in the document about data. Thankfully this means we essentially don't have to do anything to the data yet. Nevertheless, it is often quite helpful to copy over at least some of the data and leave it commented out for later replacement. Data must go in the same order as in the data file, and data is "all or nothing": you cannot only import some of it.
WARNING The way in which the data was extracted from the ROM means that there are sometimes "fake symbols" in the data, which have to be removed to avoid confusing the compiler. Thankfully it will turn out that this is not the case here.
(Sometimes it is useful to import the data in the middle of doing functions: you just have to choose an appropriate moment.)
Some actors also have a .bss file. This is just data that is initialised to 0, and can be imported immediately once you know what type it is, by declaring it without giving it a value. (bss is a significant problem for code files, but not usually for actors.)
Init
The Init function sets up the various components of the actor when it is first loaded. It is hence usually very useful for finding out what is in the actor struct, and so we usually start with it. (Some people like starting with Destroy, which is usually shorter and simpler yet still gives some basic information about the actor, but Init is probably best for beginners.)
mips2c
The first stage of decompilation is done by a program called mips_to_c, often referred to as mips2c, which constructs a C interpretation of the assembly code based on reading it very literally. This means that considerable cleanup will be required to turn it into something that firstly compiles at all, and secondly looks like a human wrote it, let alone a Zelda developer from the late '90s.
The web version of mips2c can be found here. This was covered in the OoT tutorial. We shall instead use the repository. Clone the mips_to_c repository into a separate directory (we will assume on the same level as the mm/ directory). Since it's Python, we don't have to do any compilation or anything in the mips_to_c directory.
Since the actor depends on the rest of the codebase, we can't expect to get much intelligible out of mips2c without giving it some context. We make this using a Python script in the tools directory called m2ctx.py, so run
$ ./tools/m2ctx.py <path_to_c_file>
from the main directory of the repository. In this case, the C file is src/overlays/actors/ovl_En_Recepgirl/z_en_recepgirl.c. This generates a file called ctx.c in the main directory of the repository.
To get mips_to_c to decompile a function, the bare minimum is to run
$ ../mips_to_c/mips_to_c.py <path_to_function_assembly_file>
(from the root directory of mm). We can tell mips2c to use the context file we just generated by adding --context ctx.c. If we have data, mips2c may be able to assist with that as well.
In this case, we want the assembly file for EnRecepgirl_Init. You can copy the path to the file in VSCode or similar, or just tab-complete it once you know the directory structure well enough: it turns out to be asm/non_matchings/overlays/ovl_En_Recepgirl/EnRecepgirl_Init.s.
N.B. You want the file in non_matchings! The files in the other directories in asm/ are the unsplit asm, which can be used, but is less convenient (you would need to include the rodata, for example, and it will do the whole file at once. This is sometimes useful, but we'll go one function at a time today to keep things simple).
We shall also include the data file, which is located at data/ovl_En_Recepgirl/ovl_En_Recepgirl.data.s. Hence the whole command will be
$ ../mips_to_c/mips_to_c.py asm/non_matchings/overlays/ovl_En_Recepgirl/EnRecepgirl_Init.s data/ovl_En_Recepgirl/ovl_En_Recepgirl.data.s --context ctx.c
? func_80C10148(EnRecepgirl *); // extern
extern FlexSkeletonHeader D_06011B60;
static void *D_80C106B0[4] = {(void *)0x600F8F0, (void *)0x600FCF0, (void *)0x60100F0, (void *)0x600FCF0};
static s32 D_80C106C8 = 0;
InitChainEntry D_80C106C0[2]; // unable to generate initializer
void EnRecepgirl_Init(Actor *thisx, GlobalContext *globalCtx) {
EnRecepgirl* this = (EnRecepgirl *) thisx;
void **temp_s0;
void **phi_s0;
Actor_ProcessInitChain((Actor *) this, D_80C106C0);
ActorShape_Init(&this->actor.shape, -60.0f, NULL, 0.0f);
SkelAnime_InitFlex(globalCtx, (SkelAnime *) this->unk_144, &D_06011B60, (AnimationHeader *) &D_06009890, this + 0x188, this + 0x218, 0x18);
phi_s0 = D_80C106B0;
if (D_80C106C8 == 0) {
do {
temp_s0 = phi_s0 + 4;
temp_s0->unk-4 = Lib_SegmentedToVirtual(*phi_s0);
phi_s0 = temp_s0;
} while (temp_s0 != D_80C106C0);
D_80C106C8 = 1;
}
this->unk_2AC = 2;
if (Flags_GetSwitch(globalCtx, (s32) this->actor.params) != 0) {
this->actor.textId = 0x2ADC;
} else {
this->actor.textId = 0x2AD9;
}
func_80C10148(this);
}
Comment out the GLOBAL_ASM line for Init, and paste all of this into the file just underneath it:
[...]
// #pragma GLOBAL_ASM("asm/non_matchings/overlays/ovl_En_Recepgirl/EnRecepgirl_Init.s")
? func_80C10148(EnRecepgirl *); // extern
extern FlexSkeletonHeader D_06011B60;
static void *D_80C106B0[4] = {(void *)0x600F8F0, (void *)0x600FCF0, (void *)0x60100F0, (void *)0x600FCF0};
static s32 D_80C106C8 = 0;
InitChainEntry D_80C106C0[2]; // unable to generate initializer
void EnRecepgirl_Init(Actor *thisx, GlobalContext *globalCtx) {
EnRecepgirl* this = (EnRecepgirl *) thisx;
void **temp_s0;
void **phi_s0;
Actor_ProcessInitChain((Actor *) this, D_80C106C0);
ActorShape_Init(&this->actor.shape, -60.0f, NULL, 0.0f);
SkelAnime_InitFlex(globalCtx, (SkelAnime *) this->unk_144, &D_06011B60, (AnimationHeader *) &D_06009890, this + 0x188, this + 0x218, 0x18);
phi_s0 = D_80C106B0;
if (D_80C106C8 == 0) {
do {
temp_s0 = phi_s0 + 4;
temp_s0->unk-4 = Lib_SegmentedToVirtual(*phi_s0);
phi_s0 = temp_s0;
} while (temp_s0 != D_80C106C0);
D_80C106C8 = 1;
}
this->unk_2AC = 2;
if (Flags_GetSwitch(globalCtx, (s32) this->actor.params) != 0) {
this->actor.textId = 0x2ADC;
} else {
this->actor.textId = 0x2AD9;
}
func_80C10148(this);
}
[...]
Typically for all but the simplest functions, there is a lot that needs fixing before we are anywhere near seeing how close we are to the original code. You will notice that mips2c creates a lot of temporary variables. Usually most of these will turn out to not be real, and we need to remove the right ones to get the code to match.
To allow the function to find the variables, we need another correction. Half of this has already been done at the top of the file, where we have
#define THIS ((EnRecepgirl*)thisx)
To do the other half, replace the recast at the beginning of the function, before any declarations:
EnRecepgirl* this = THIS;
Now everything points to the right place, even though the argument of the function seems inconsistent with the contents.
(Again: this step is only necessary for the "main four" functions, and sometimes functions that are used by these: it relates to how such functions are used outside the actor.)
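A minimal sketch of why this recast works, using cut-down stand-in types rather than the project's real Actor. The names EnExample and EnExample_Init are hypothetical; the only property being illustrated is that a struct whose first member is an Actor starts at the same address as that Actor.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the real types; only the "Actor comes first" layout matters. */
typedef struct {
    int16_t params;
    uint16_t textId;
} Actor;

typedef struct {
    Actor actor; /* must be the first member for the cast to be valid */
    uint8_t unk_2AC;
} EnExample;

#define THIS ((EnExample*)thisx)

/* Takes a generic Actor*, as the actor system would pass it. */
void EnExample_Init(Actor* thisx) {
    EnExample* this = THIS;

    this->unk_2AC = 2;           /* actor-specific member */
    this->actor.textId = 0x2AD9; /* generic member, same memory as thisx->textId */
}
```

The caller only ever sees an Actor*, while the body gets to use the full actor-specific struct through this.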
While we are carrying out initial changes, you can also find-and-replace any instances of (Actor *) this by &this->actor. The function now looks like this:
? func_80C10148(EnRecepgirl *); // extern
extern FlexSkeletonHeader D_06011B60;
static void *D_80C106B0[4] = {(void *)0x600F8F0, (void *)0x600FCF0, (void *)0x60100F0, (void *)0x600FCF0};
static s32 D_80C106C8 = 0;
InitChainEntry D_80C106C0[2]; // unable to generate initializer
void EnRecepgirl_Init(Actor *thisx, GlobalContext *globalCtx) {
EnRecepgirl* this = THIS;
void **temp_s0;
void **phi_s0;
Actor_ProcessInitChain(&this->actor, D_80C106C0);
ActorShape_Init(&this->actor.shape, -60.0f, NULL, 0.0f);
SkelAnime_InitFlex(globalCtx, (SkelAnime *) this->unk_144, &D_06011B60, (AnimationHeader *) &D_06009890, this + 0x188, this + 0x218, 0x18);
phi_s0 = D_80C106B0;
if (D_80C106C8 == 0) {
do {
temp_s0 = phi_s0 + 4;
temp_s0->unk-4 = Lib_SegmentedToVirtual(*phi_s0);
phi_s0 = temp_s0;
} while (temp_s0 != D_80C106C0);
D_80C106C8 = 1;
}
this->unk_2AC = 2;
if (Flags_GetSwitch(globalCtx, (s32) this->actor.params) != 0) {
this->actor.textId = 0x2ADC;
} else {
this->actor.textId = 0x2AD9;
}
func_80C10148(this);
}
(Not) dealing with Data
For now, we do not want to consider the data that mips2c has kindly imported for us: it will only get in the way when we want to rebuild the file to check for OK (diff.py will not care, but make will complain if it notices a symbol defined twice, and if some data is included twice the ROM will not match anyway). Therefore, put it in the #if'd out section and add some externs with the types:
#if 0
const ActorInit En_Recepgirl_InitVars = {
ACTOR_EN_RECEPGIRL,
ACTORCAT_NPC,
FLAGS,
OBJECT_BG,
sizeof(EnRecepgirl),
(ActorFunc)EnRecepgirl_Init,
(ActorFunc)EnRecepgirl_Destroy,
(ActorFunc)EnRecepgirl_Update,
(ActorFunc)EnRecepgirl_Draw,
};
static void* D_80C106B0[4] = { (void*)0x600F8F0, (void*)0x600FCF0, (void*)0x60100F0, (void*)0x600FCF0 };
// static InitChainEntry sInitChain[] = {
static InitChainEntry D_80C106C0[] = {
ICHAIN_U8(targetMode, 6, ICHAIN_CONTINUE),
ICHAIN_F32(targetArrowOffset, 1000, ICHAIN_STOP),
};
static s32 D_80C106C8 = 0;
#endif
extern void* D_80C106B0[];
extern InitChainEntry D_80C106C0[];
extern s32 D_80C106C8;
N.B. As is covered in more detail in the document about data, the data must be declared in the same order in C as it was in the data assembly file: notice that the order in this example is En_Recepgirl_InitVars, D_80C106B0, D_80C106C0, D_80C106C8, the same as in data/ovl_En_Recepgirl/ovl_En_Recepgirl.data.s.
In the next sections, we shall sort out the various initialisation functions that occur in Init. This actor contains several of the most common ones, but it does not have, for example, a collider. The process is similar to what we discuss below, or you can check the OoT tutorial.
Init chains
Almost always, one of the first items in Init is a function that looks like
Actor_ProcessInitChain(&this->actor, D_80C106C0);
which initialises common properties of the actor using an InitChain, which is usually somewhere near the top of the data, in this case in the variable D_80C106C0. This is already included in the #if'd out data at the top of the file, so we don't have to do anything for now. We can correct the mips2c output for the extern, though: I actually did this when moving the rest of the data in the previous section.
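To give a feel for what processing an init chain means, here is a deliberately simplified sketch. It is not the project's implementation: the real InitChainEntry packs its fields into a single word, whereas the hypothetical ChainEntry below keeps them unpacked, and ActorStandIn contains only the two members that this actor's chain touches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in actor with the two members the chain above touches. */
typedef struct {
    uint8_t targetMode;
    float targetArrowOffset;
} ActorStandIn;

typedef enum { CHAIN_U8, CHAIN_F32 } ChainType;

/* Hypothetical, unpacked entry: the real InitChainEntry packs this into one word. */
typedef struct {
    ChainType type;
    size_t offset; /* where in the actor to write */
    int value;     /* what to write */
    int cont;      /* 1 = more entries follow, 0 = stop */
} ChainEntry;

/* Walks the table, writing each value at its offset, until an entry says stop. */
void ProcessChain(ActorStandIn* actor, const ChainEntry* entry) {
    do {
        char* dst = (char*)actor + entry->offset;

        if (entry->type == CHAIN_U8) {
            uint8_t v = (uint8_t)entry->value;
            memcpy(dst, &v, sizeof(v));
        } else {
            float v = (float)entry->value;
            memcpy(dst, &v, sizeof(v));
        }
    } while ((entry++)->cont);
}

/* Same two settings as the chain in this actor: targetMode = 6, targetArrowOffset = 1000. */
static const ChainEntry sExampleChain[] = {
    { CHAIN_U8, offsetof(ActorStandIn, targetMode), 6, 1 },
    { CHAIN_F32, offsetof(ActorStandIn, targetArrowOffset), 1000, 0 },
};
```

The appeal of the table form is that many actors can share one small routine and differ only in their data, which is why the chain lives in the data section rather than in Init itself.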
SkelAnime
This is the combined system that handles actors' skeletons and their animations. It is the other significant part of most actor structs. We see its initialisation in this part of the code:
Actor_ProcessInitChain(&this->actor, D_80C106C0);
ActorShape_Init(&this->actor.shape, -60.0f, NULL, 0.0f);
SkelAnime_InitFlex(globalCtx, (SkelAnime *) this->unk_144, &D_06011B60, (AnimationHeader *) &D_06009890, this + 0x188, this + 0x218, 0x18);
phi_s0 = D_80C106B0;
An actor with SkelAnime has three structs in the actor struct that handle it: one called SkelAnime, and two arrays of Vec3s, called jointTable and morphTable. Usually, although not always, they are next to one another.
There are two different sorts of SkelAnime, although for decompilation purposes there is not much difference between them. Looking at the prototype of SkelAnime_InitFlex from functions.h (or even the definition in z_skelanime.c),
void SkelAnime_InitFlex(GlobalContext* globalCtx, SkelAnime* skelAnime, FlexSkeletonHeader* skeletonHeaderSeg,
AnimationHeader* animationSeg, Vec3s* jointTable, Vec3s* morphTable, s32 limbCount);
we can read off the types of the various arguments:
- The SkelAnime struct is at this + 0x144
- The jointTable is at this + 0x188
- The morphTable is at this + 0x218
- The number of limbs is 0x18 = 24 (we use dec for the number of limbs)
- Because of how SkelAnime works, this means that the jointTable and morphTable both have 24 elements
Looking in z64animation.h, we find that SkelAnime has size 0x44, and looking in z64math.h, that Vec3s has size 0x6. Since 0x144 + 0x44 = 0x188, jointTable is immediately after the SkelAnime, and since 0x188 + 0x6 * 0x18 = 0x218, morphTable is immediately after the jointTable. Finally, 0x218 + 0x6 * 0x18 = 0x2A8, and we have filled all the space between the actor and actionFunc. Therefore the struct now looks like
typedef struct EnRecepgirl {
/* 0x0000 */ Actor actor;
/* 0x0144 */ SkelAnime skelAnime;
/* 0x0188 */ Vec3s jointTable[24];
/* 0x0218 */ Vec3s morphTable[24];
/* 0x02A8 */ EnRecepgirlActionFunc actionFunc;
/* 0x02AC */ char unk_2AC[0x8];
} EnRecepgirl; // size = 0x2B4
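The offset arithmetic used to fill in the struct can be written out as a short sketch. The helper names are hypothetical; the sizes are the ones quoted from z64animation.h and z64math.h in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes quoted in the text: SkelAnime is 0x44 bytes, Vec3s is 0x6 bytes. */
#define SKELANIME_SIZE 0x44
#define VEC3S_SIZE 0x6
#define LIMB_COUNT 24

/* Offset of the first byte after an array that starts at `start`. */
size_t EndOfArray(size_t start, size_t elemSize, size_t count) {
    return start + elemSize * count;
}

/* The SkelAnime starts at 0x144 and counts as one element of 0x44 bytes. */
size_t JointTableOffset(void) {
    return EndOfArray(0x144, SKELANIME_SIZE, 1);
}

/* morphTable starts where the 24-element jointTable ends. */
size_t MorphTableOffset(void) {
    return EndOfArray(JointTableOffset(), VEC3S_SIZE, LIMB_COUNT);
}

/* actionFunc starts where the 24-element morphTable ends. */
size_t ActionFuncOffset(void) {
    return EndOfArray(MorphTableOffset(), VEC3S_SIZE, LIMB_COUNT);
}
```

If any of these had come out short of the next known offset, the difference would be the size of the padding still needed between the two members.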
The last information we get from the SkelAnime function is the types of two of the externed symbols: D_06011B60 is a FlexSkeletonHeader, and D_06009890 is an AnimationHeader. So we can change/add these at the top of the C file:
extern InitChainEntry D_80C106C0[];
extern UNK_TYPE D_06001384;
extern AnimationHeader D_06009890;
extern UNK_TYPE D_0600A280;
extern FlexSkeletonHeader D_06011B60;
As with the data, these externed symbols should be kept in increasing address order.
They are both passed to the function as pointers, so need & to pass the address instead of the actual data. Hence we end up with
SkelAnime_InitFlex(globalCtx, &this->skelAnime, &D_06011B60, &D_06009890, this->jointTable, this->morphTable, 24);
Note that this->jointTable and this->morphTable are arrays, so are already effectively pointers and don't need a &.
More struct variables: a brief detour into reading some assembly
This function also gives us information about other things in the struct. The only other reference to this (rather than this->actor or similar) is in
this->unk_2AC = 2;
This doesn't tell us much except that at this + 0x2AC is a number of some kind. What sort of number? For that we will have to look in the assembly code. This will probably look quite intimidating the first time, but it's usually not too bad if you use functions as signposts: IDO will never change the order of function calls, and tends to keep code between functions in roughly the same place, so you can usually guess where you are.
In this case, we are looking for this + 0x2AC. 0x2AC is not a very common number, so hopefully the only mention of it is in referring to this struct variable. Indeed, if we search the file, we find that the only instruction mentioning 0x2AC is here:
/* 0000B0 80C10080 24090002 */ addiu $t1, $zero, 2
/* 0000B4 80C10084 A24902AC */ sb $t1, 0x2ac($s2)
addiu ("add immediate unsigned") adds the last two things and puts the result in the register in the first position. So this says $t1 = 0 + 2. The next instruction, sb ("store byte") puts the value in the register in the first position in the memory location in the second, which in this case says *($s2 + 0x2ac) = $t1. We can go and find out what is in $s2: it is set all the way at the top of the function, in this line:
/* 000008 80C0FFD8 00809025 */ move $s2, $a0
This simply copies the contents of the second register into the first one. In this case, it is copying the contents of the function's first argument into $s2 (because it wants to use it later, and the $a registers are assumed to be cleared after a function call). In this case, the first argument is a pointer to this (well, thisx, but the struct starts with an Actor, so it's the same address). So line B4 of the asm really is saving 2 into the memory location this + 0x2AC.
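As a rough illustration, the three instructions can be mimicked in C on a plain block of bytes standing in for the actor instance. The function name EmulateInitStore and the local variables named after registers are purely illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Emulates the three instructions discussed above on a block of memory
 * standing in for the actor instance. `a0` plays the role of the first argument. */
void EmulateInitStore(uint8_t* a0) {
    uint8_t* s2;
    uint32_t t1;

    s2 = a0;                 /* move  $s2, $a0        */
    t1 = 0 + 2;              /* addiu $t1, $zero, 2   */
    s2[0x2AC] = (uint8_t)t1; /* sb    $t1, 0x2ac($s2) */
}
```

Only the single byte at offset 0x2AC changes, which is the observation the next paragraph builds on.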
Anyway, this tells us that the variable is a byte of some kind, so s8 or u8: if it was an s16/u16 it would have said sh, and if it was an s32/u32 it would have said sw. Unfortunately this is all we can determine from this function: MIPS does not have separate instructions for saving signed and unsigned bytes.
At this point you have two options: guess based on statistics/heuristics, or go and look in the other functions in the actor to find out more information. The useful statistic here is that u8 is far more common than s8, but let's look in the other functions, since we're pretty confident after finding 0x2ac so easily in Init. So, let us grep the actor's assembly folder:
$ grep -r '0x2ac' asm/non_matchings/overlays/ovl_En_Recepgirl/
asm/non_matchings/overlays/ovl_En_Recepgirl/EnRecepgirl_Draw.s:/* 00065C 80C1062C 921902AC */ lbu $t9, 0x2ac($s0)
asm/non_matchings/overlays/ovl_En_Recepgirl/func_80C100DC.s:/* 000114 80C100E4 908202AC */ lbu $v0, 0x2ac($a0)
asm/non_matchings/overlays/ovl_En_Recepgirl/func_80C100DC.s:/* 00012C 80C100FC A08E02AC */ sb $t6, 0x2ac($a0)
asm/non_matchings/overlays/ovl_En_Recepgirl/func_80C100DC.s:/* 000134 80C10104 A08002AC */ sb $zero, 0x2ac($a0)
asm/non_matchings/overlays/ovl_En_Recepgirl/func_80C100DC.s:/* 00015C 80C1012C 909802AC */ lbu $t8, 0x2ac($a0)
asm/non_matchings/overlays/ovl_En_Recepgirl/func_80C100DC.s:/* 000164 80C10134 A09902AC */ sb $t9, 0x2ac($a0)
asm/non_matchings/overlays/ovl_En_Recepgirl/EnRecepgirl_Init.s:/* 0000B4 80C10084 A24902AC */ sb $t1, 0x2ac($s2)
in which we clearly see lbu ("load byte unsigned"), and hence this variable really is a u8. Hence we can add this to the actor struct too:
typedef struct EnRecepgirl {
/* 0x0000 */ Actor actor;
/* 0x0144 */ SkelAnime skelAnime;
/* 0x0188 */ Vec3s jointTable[24];
/* 0x0218 */ Vec3s morphTable[24];
/* 0x02A8 */ EnRecepgirlActionFunc actionFunc;
/* 0x02AC */ u8 unk_2AC;
/* 0x02AD */ char unk_2AD[0x7];
} EnRecepgirl; // size = 0x2B4
You might think that was a lot of work for one variable, but it's pretty quick when you know what to do. Obviously this would be more difficult with a more common number, but it's often still worth trying.
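The reason loads reveal signedness while stores do not can be seen in a small sketch. The two helper functions are hypothetical stand-ins for what the lbu and lb instructions do to the byte once it has been read into a 32-bit register.

```c
#include <assert.h>
#include <stdint.h>

/* What lbu does: the byte is zero-extended, so the result is always 0..255. */
int32_t LoadByteUnsigned(const uint8_t* addr) {
    return *addr;
}

/* What lb does: the byte is sign-extended, so 0x80..0xFF become negative. */
int32_t LoadByteSigned(const uint8_t* addr) {
    return *(const int8_t*)addr;
}
```

For small values such as 2 the two agree, but for a byte like 0xFE one gives 254 and the other gives -2, so the compiler has to pick the load that matches the declared type.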
Removing some of the declarations for data that we have accounted for, the function now looks like this:
? func_80C10148(EnRecepgirl *); // extern
void EnRecepgirl_Init(Actor *thisx, GlobalContext *globalCtx) {
EnRecepgirl* this = THIS;
void **temp_s0;
void **phi_s0;
Actor_ProcessInitChain(&this->actor, D_80C106C0);
ActorShape_Init(&this->actor.shape, -60.0f, NULL, 0.0f);
SkelAnime_InitFlex(globalCtx, &this->skelAnime, &D_06011B60, &D_06009890, this->jointTable, this->morphTable, 24);
phi_s0 = D_80C106B0;
if (D_80C106C8 == 0) {
do {
temp_s0 = phi_s0 + 4;
temp_s0->unk-4 = Lib_SegmentedToVirtual(*phi_s0);
phi_s0 = temp_s0;
} while (temp_s0 != D_80C106C0);
D_80C106C8 = 1;
}
this->unk_2AC = 2;
if (Flags_GetSwitch(globalCtx, (s32) this->actor.params) != 0) {
this->actor.textId = 0x2ADC;
} else {
this->actor.textId = 0x2AD9;
}
func_80C10148(this);
}
We have one significant problem and a few minor ones left.
Casts and boolean functions
mips2c likes casting a lot: this is useful for getting types, less so when the type is changed automatically, such as in Flags_GetSwitch(globalCtx, (s32) this->actor.params). Also, if we look at this function's definition, we discover it will only return true or false, so we can remove the != 0.
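A small sketch of why both removals are safe in C. ExampleFlags_Get is a hypothetical stand-in for a flag lookup and is not the game's Flags_GetSwitch; the two TextId functions differ only in the cast and the comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a flag lookup: takes an s32 and returns only 0 or 1. */
int32_t ExampleFlags_Get(int32_t flag) {
    return (flag & 1) ? 1 : 0;
}

/* mips2c style: explicit cast on the argument and explicit comparison with 0. */
uint16_t TextId_Verbose(int16_t params) {
    if (ExampleFlags_Get((int32_t)params) != 0) {
        return 0x2ADC;
    }
    return 0x2AD9;
}

/* Cleaned up: the prototype already converts s16 to s32,
 * and the return value is already a truth value. */
uint16_t TextId_Clean(int16_t params) {
    if (ExampleFlags_Get(params)) {
        return 0x2ADC;
    }
    return 0x2AD9;
}
```

The two versions mean the same thing to the compiler, so the shorter one is preferred for readability. Whether a given cleanup still matches is something only a rebuild and diff can confirm.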
Functions called
One minor problem is what func_80C10148 is: C needs a prototype to compile it properly. mips2c has offered us ? func_80C10148(EnRecepgirl *); // extern, but this is obviously incomplete: there's no ? type in C! We shall guess for now that this function returns void, for two reasons:
- It's not used as a condition in a conditional or anything
- It's not used to assign a value
To these, experience will add a third reason:
3. This is probably a setup function for an actionFunc, which are usually either void (*)(ActorType*) or void (*)(ActorType*, GlobalContext*).
The upshot of all this is to remove mips2c's ? func_80C10148(EnRecepgirl *); // extern, and add a void func_80C10148(EnRecepgirl* this); underneath the declarations for the main four functions:
void EnRecepgirl_Init(Actor* thisx, GlobalContext* globalCtx);
void EnRecepgirl_Destroy(Actor* thisx, GlobalContext* globalCtx);
void EnRecepgirl_Update(Actor* thisx, GlobalContext* globalCtx);
void EnRecepgirl_Draw(Actor* thisx, GlobalContext* globalCtx);
void func_80C10148(EnRecepgirl* this);
(we usually leave a blank line after the main four, and put all further declarations in address order).
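For readers who have not met the action function pattern before, here is a minimal sketch of the shape being guessed at. All names (EnExample, EnExample_Wait and so on) and the timer member are hypothetical, and the one-argument form of the action function is used to keep it short.

```c
#include <assert.h>
#include <stddef.h>

struct EnExample;
typedef void (*EnExampleActionFunc)(struct EnExample*);

typedef struct EnExample {
    EnExampleActionFunc actionFunc;
    int timer; /* hypothetical state, only here so the action has something to do */
} EnExample;

/* An action function: returns nothing and takes the actor instance. */
void EnExample_Wait(EnExample* this) {
    this->timer++;
}

/* A setup function: also returns void, and mainly chooses the next action. */
void EnExample_SetupWait(EnExample* this) {
    this->timer = 0;
    this->actionFunc = EnExample_Wait;
}

/* Update typically just runs whatever action is currently selected. */
void EnExample_Update(EnExample* this) {
    this->actionFunc(this);
}
```

A call at the end of Init that takes only the actor and whose result is ignored fits the setup role here, which is why void is a reasonable first guess for its return type.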
Loops
Loops are often some of the hardest things to decompile, because there are many ways to write a loop, only some of which will generate the same assembly. mips2c has had a go at the one in this function, but it usually struggles with loops: don't expect it to get a loop correct, well, at all.
The code in question is
void **temp_s0;
void **phi_s0;
[...]
phi_s0 = D_80C106B0;
if (D_80C106C8 == 0) {
do {
temp_s0 = phi_s0 + 4;
temp_s0->unk-4 = Lib_SegmentedToVirtual(*phi_s0);
phi_s0 = temp_s0;
} while (temp_s0 != D_80C106C0);
D_80C106C8 = 1;
}
D_80C106B0 is the array that mips2c has declared above the function, a set of 8-digit hex numbers starting 0x06. These are likely to be segmented pointers, but this is not a very useful piece of information yet. D_80C106C0 is the InitChain, though, and it seems pretty unlikely that it would be seriously involved in any sort of loop. Indeed, if you tried to compile this now, you would get an error:
cfe: Error: src/overlays/actors/ovl_En_Recepgirl/z_en_recepgirl.c, line 61: Unacceptable operand of == or !=
} while (temp_s0 != D_80C106C0);
-------------------------^
so this can't possibly be right.
So what on earth is this loop doing? Probably the best thing to do is manually unroll it and see what it's doing each time.
- The first pass starts from phi_s0 = D_80C106B0, aka &D_80C106B0[0], and sets temp_s0 = D_80C106B0 + 4, i.e. &D_80C106B0[1] (mips2c is counting in bytes here, and each pointer in the array is 4 bytes long). But then temp_s0->unk-4 is 4 backwards from &D_80C106B0[1], which is back at &D_80C106B0[0]; the -> means to look at what is at this address, so temp_s0->unk-4 is D_80C106B0[0]. Equally, *phi_s0 is the thing at &D_80C106B0[0], i.e. D_80C106B0[0]. So the actual thing the first pass does is
D_80C106B0[0] = Lib_SegmentedToVirtual(D_80C106B0[0]);
It then proceeds to set phi_s0 = &D_80C106B0[1] for the next iteration.
- We go through the same reasoning for the remaining three passes and find that the inside of the loop is
temp_s0 = &D_80C106B0[2];
D_80C106B0[1] = Lib_SegmentedToVirtual(D_80C106B0[1]);
phi_s0 = &D_80C106B0[2];
temp_s0 = &D_80C106B0[3];
D_80C106B0[2] = Lib_SegmentedToVirtual(D_80C106B0[2]);
phi_s0 = &D_80C106B0[3];
temp_s0 = &D_80C106B0[4];
D_80C106B0[3] = Lib_SegmentedToVirtual(D_80C106B0[3]);
phi_s0 = &D_80C106B0[4];
But now, &D_80C106B0[4] = D_80C106B0 + 4 * 4 = D_80C106B0 + 0x10, and 0x10 after this array's starting address is D_80C106C0, i.e. the InitChain. Hence at this point the looping ends.
So what this loop actually does is run Lib_SegmentedToVirtual on each element of the array D_80C106B0.
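For orientation, here is a rough sketch of what converting a segmented pointer involves. It is not the game's Lib_SegmentedToVirtual: the bit layout is stated as an assumption in the code, and the segmentBases table and the helper names are hypothetical stand-ins for however the game tracks where each segment is loaded.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a segmented address keeps the segment number in bits 24-27
 * and the offset into that segment in the low 24 bits. */
uint32_t SegmentNumber(uint32_t segAddr) {
    return (segAddr >> 24) & 0xF;
}

uint32_t SegmentOffset(uint32_t segAddr) {
    return segAddr & 0x00FFFFFF;
}

/* Hypothetical resolver: `segmentBases` stands in for the game's table of
 * where each segment is currently loaded. */
uint32_t ResolveSegmented(const uint32_t segmentBases[16], uint32_t segAddr) {
    return segmentBases[SegmentNumber(segAddr)] + SegmentOffset(segAddr);
}
```

Under this reading, 0x0600F8F0 means offset 0xF8F0 into segment 6, which is consistent with the other object symbols in this file, all of which start with D_06.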
At this point, I confess that I guessed what this loop does, and rewrote it how I would have written it, namely how one usually iterates over an array:
s32 i;
[...]
for (i = 0; i < 4; i++) {
D_80C106B0[i] = Lib_SegmentedToVirtual(D_80C106B0[i]);
}
This is a dangerous game, since there is no guarantee that what you think is the right way to write something bears any relation to either what the original was like, or more importantly, what will give the same codegen as the original. This is a significant leap, since the original appears to be using a pointer iterator!
However, this is certainly at least equivalent to the original (or at least, to what mips2c gave us: it's not infallible): we can be certain of this because we wrote the thing out in its entirety to understand it! This also allows us to eliminate one of the temps: you'll find with even simple loops mips2c will usually make two temps for the loop variable.
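The equivalence claim can be checked on a host machine with a sketch like the following. Convert is a hypothetical stand-in for Lib_SegmentedToVirtual, and the array holds plain 32-bit values rather than pointers so that the example does not depend on the host's pointer size. Note that this only shows the two loops compute the same result; it says nothing about whether they compile to the same assembly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define COUNT 4

/* Hypothetical stand-in for Lib_SegmentedToVirtual: any pure function works here. */
uint32_t Convert(uint32_t value) {
    return (value & 0x00FFFFFF) + 0x80000000;
}

/* The loop as mips2c presented it: a pointer that runs until it hits the end of the array. */
void ConvertAll_PointerLoop(uint32_t* array) {
    uint32_t* ptr = array;
    uint32_t* end = array + COUNT;

    do {
        ptr++;
        ptr[-1] = Convert(ptr[-1]);
    } while (ptr != end);
}

/* The loop as rewritten: an ordinary index over the same elements. */
void ConvertAll_IndexLoop(uint32_t* array) {
    int32_t i;

    for (i = 0; i < COUNT; i++) {
        array[i] = Convert(array[i]);
    }
}
```

Running both on copies of the same four values and comparing the arrays is a cheap sanity check before spending time on the diff.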
Hence we end up with
void func_80C10148(EnRecepgirl* this);
[...]
void EnRecepgirl_Init(Actor *thisx, GlobalContext *globalCtx) {
EnRecepgirl* this = THIS;
s32 i;
Actor_ProcessInitChain(&this->actor, D_80C106C0);
ActorShape_Init(&this->actor.shape, -60.0f, NULL, 0.0f);
SkelAnime_InitFlex(globalCtx, &this->skelAnime, &D_06011B60, &D_06009890, this->jointTable, this->morphTable, 24);
if (D_80C106C8 == 0) {
for (i = 0; i < 4; i++) {
D_80C106B0[i] = Lib_SegmentedToVirtual(D_80C106B0[i]);
}
D_80C106C8 = 1;
}
this->unk_2AC = 2;
if (Flags_GetSwitch(globalCtx, this->actor.params)) {
this->actor.textId = 0x2ADC;
} else {
this->actor.textId = 0x2AD9;
}
func_80C10148(this);
}
as our first guess. This doesn't look unreasonable... the question is, does it match?
Diff
Once preliminary cleanup and struct filling is done, most of the time spent matching functions goes on comparing the original code with the code you have compiled. This is aided by a program called diff.py.
In order to use diff.py with the symbol names, we need a copy of the code to compare against. In MM this is done as part of make init, and you can regenerate the expected directory (which is simply a known-good copy of the build directory) by running make diff-init, which will check for an OK ROM and copy the build directory over. (Of course you need an OK ROM to do this; worst-case, you can checkout master and do a complete rebuild to get it.) (You need to remake expected if you want to diff a function you have renamed: diff.py looks in the mapfiles for the function name, which won't work if the name has changed!)
Now, we run diff on the function name: in the main directory,
$ ./diff.py -mwo3 EnRecepgirl_Init
(To see what these arguments do, run it with ./diff.py -h or look in the scripts documentation.)
And err, well, everything is white, so it matches. Whoops. Guess we'll cover diff.py properly next time! (Notice that even though the diff is completely white, there are some differences in the %his and %los that access data, because it is now accessed with a relative address rather than an absolute one. If you have the data in the file in the right order, this shouldn't matter.)
And with that, we have successfully matched our first function.
N.B. Notice that we don't yet have much idea of what this code actually does: this should be clarified by going through the rest of the actor's functions, which is discussed in the next document.