Using the -sf switch

Boomerang is still very immature. When decompiling MSVC-compiled programs, we find we need to work on one procedure at a time. For this, you really need -nm (don't decompile "main") and -e <entry> to decompile from a given entry point. (<entry> can be given as a decimal integer or in hex; hex values need an explicit "0x" prefix, as it is not implied.)

But this still decompiles the callees of the current procedure. A few callees are OK, but if the target procedure happens to have a lot of them, decompilation slows down considerably. So we use the -sf <filename> switch to supply a symbol file. In this file, you can have entries like

0x00430D10 __nodecode __incomplete void recognise();

which says that there is a procedure called "recognise" at address 0x00430D10, and that Boomerang should not attempt to decompile it. The __incomplete means that the signature (the set of parameter and return types) is not yet known.
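If a procedure has many callees, typing one stub entry per callee gets tedious. Here is a minimal sketch in C of a helper that formats such an entry from an address and a name. The function name format_stub_entry is our own invention for illustration and is not part of Boomerang; it only reproduces the entry layout shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Boomerang): write a stub entry of the form
 *   0xADDRESS __nodecode __incomplete void name();
 * into buf. Returns the number of characters written (excluding the NUL),
 * or -1 if the name is empty or the buffer is too small. */
int format_stub_entry(char *buf, size_t size, unsigned long addr, const char *name)
{
    assert(buf != NULL && name != NULL);
    if (strlen(name) == 0)
        return -1;
    /* %08lX pads the address to eight upper-case hex digits. */
    int n = snprintf(buf, size, "0x%08lX __nodecode __incomplete void %s();",
                     addr, name);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Calling format_stub_entry(buf, sizeof buf, 0x00430D10UL, "recognise") fills buf with the entry shown earlier; you would write one such line per callee into the symbol file.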

If the signature is known, you can give it, like this:

0x00430530 __nodecode void scaleLines(int line, void* mem);

You can do the same for library functions:

0x00450E1E __nodecode int  CString_find(char *str);

Here the address 0x00450E1E is the address of the import table entry (a jump instruction). Note that Boomerang can't handle :: in the name as yet, so we just use an underscore to separate the class and function name. You have to fix this manually with an editor after decompiling. (There are likely to be several other things that need manual editing as well.)
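If you are producing the symbol file from a list of C++ names, the renaming can be done mechanically. The following is a sketch only; flatten_scope is a hypothetical helper of ours, not something Boomerang provides, and it simply replaces each "::" with a single underscore.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of Boomerang): copy src into dst, replacing
 * every "::" with a single '_'. Returns 0 on success, or -1 if dst is too
 * small to hold the result and its terminating NUL. */
int flatten_scope(char *dst, size_t size, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t out = 0;
    for (size_t i = 0; src[i] != '\0'; i++) {
        char c = src[i];
        if (c == ':' && src[i + 1] == ':') {
            c = '_';
            i++;                /* skip the second ':' of the pair */
        }
        if (out + 1 >= size)
            return -1;          /* no room for this char plus the NUL */
        dst[out++] = c;
    }
    if (size == 0)
        return -1;
    dst[out] = '\0';
    return 0;
}
```

With this, "CString::find" becomes "CString_find". Going the other way after decompiling is still a manual job, since an underscore in the output does not always stand for a scope separator.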

Now, you may know that MSVC uses the "thiscall" calling convention a lot, where C++ procedures pass the hidden "this" parameter in the ecx register. In Boomerang (so far), no parameters are hidden. We don't fully support thiscall yet, but in the meantime, you can use this syntax:

0x00447570 __nodecode __custom  __withstack(28) void
 toolbarRecord(r[25]: CWavePreview* this);

The __withstack(28) says that the stack pointer is r28; as far as I can tell, this was only ever needed for __custom functions that return void, and is not needed any more.

When you use __custom, every parameter has to be preceded by an expression and a colon. r[25] represents register ecx, r[28] represents esp. The first stack parameter is m[r[28]+4], the second m[r[28]+8] and so on. For example:

0x00450BD2 __nodecode __custom void
CWnd_MoveWindow(r[25]: void* this, m[r[28] + 4]: int x,
m[r[28] + 8]: int y,
m[r[28] + 12]: int nWidth,
m[r[28] + 16]: int nHeight,
m[r[28] + 20]: int bRepaint);

Painful, isn't it? This will be a lot easier when we handle __thiscall.
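Until then, the stack locations follow a simple pattern: with the return address at m[r[28]] and 4-byte stack slots, stack parameter number n (counting from 0) lives at m[r[28] + 4*(n+1)]. A small sketch in C that formats one such parameter is shown here. The helper name format_stack_param is hypothetical, and the 4-byte slot size is an assumption that holds for the int and pointer parameters used in the examples above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Boomerang): format one __custom stack
 * parameter, e.g. index 0 and "int x" give "m[r[28] + 4]: int x".
 * Assumes every stack parameter occupies one 4-byte slot.
 * Returns the length written, or -1 on an empty decl or a short buffer. */
int format_stack_param(char *buf, size_t size, unsigned index, const char *decl)
{
    assert(buf != NULL && decl != NULL);
    if (strlen(decl) == 0)
        return -1;
    /* Slot 0 is the return address, so parameters start at offset 4. */
    int n = snprintf(buf, size, "m[r[28] + %u]: %s", 4u * (index + 1u), decl);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

For the CWnd_MoveWindow entry, indices 0 to 4 give the offsets 4, 8, 12, 16 and 20. The "this" parameter is not a stack parameter, so it keeps its r[25] prefix and is not counted.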

Here is an example that returns a non-void type:

0x004503AA __nodecode __custom int
CFrameWnd_OnBarCheck(r[25]: void* this,
m[r[28] + 4]: int a);

You can also enter structs and use these in signatures. For example:

typedef struct {
    void    *vt;
    char    filler[28];
    int     hWnd;
} CWnd;

typedef struct {
    CWnd    wnd;
    char    field_24[28];
    CWaveDoc* pDoc;
} CView;

typedef struct {
    CView   vw;
    char    field_44[24];
    int     bShowLine1;
    int     changeView;
} CFooView;

Note the class hierarchy here. You can paste these from a .h file; Boomerang has a C parser that's not too bad.
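The filler names above (field_24, field_44) appear to encode the hex offset at which each filler starts, which makes it easy to check the padding sizes with an ordinary compiler. The sketch below does that in C. Since the target is a 32-bit program and your host compiler may well be 64-bit, pointers are replaced by a 4-byte stand-in type (ptr32, our own name), and int by int32_t; those substitutions are assumptions for checking purposes only, not something to paste into the symbol file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ptr32;     /* stand-in for a 4-byte pointer on the target */

typedef struct {
    ptr32   vt;
    char    filler[28];
    int32_t hWnd;
} CWnd;

typedef struct {
    CWnd    wnd;
    char    field_24[28];
    ptr32   pDoc;           /* CWaveDoc* in the symbol file */
} CView;

typedef struct {
    CView   vw;
    char    field_44[24];
    int32_t bShowLine1;
    int32_t changeView;
} CFooView;

/* Aborts if a filler does not start at the offset its name suggests. */
void check_layout(void)
{
    assert(sizeof(CWnd) == 0x24);
    assert(offsetof(CView, field_24) == 0x24);
    assert(sizeof(CView) == 0x44);
    assert(offsetof(CFooView, field_44) == 0x44);
    assert(offsetof(CFooView, bShowLine1) == 0x5C);
}
```

If you change a filler size while working out a class layout, a check like this catches the case where a later member silently moves.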

So then signatures can have real types, like CDC*:

0x004224B0  __cdecl void CRealTimeLine_PlotCaptions(
    CDC *pDC,
    /*POINT ptOrigin*/ int ptOrigin_x, int ptOrigin_y,
    /*SIZE sizePixelsPerTick*/ int sizePixelsPerTick_cx,
                               int sizePixelsPerTick_cy,
    int nValue,
    int bVert,
    int index);

One day we hope to be able to handle multi-word parameters, such as POINTs and SIZEs. Those get quite tricky to handle.

You can also declare globals:

/* Globals */
0x00472308 void* parBuf;
0x0047231C int x;
0x00472320 int y;
0x004721DA int bGraph1;
0x0046C598 int *validPlotGraduations;

The last one is actually an array of ints.

Of course, structures can contain floats, doubles, shorts, etc:

typedef struct {
    short nmix;
    mixture mix[3];
    float   meanSegLen;
} snape;

Here, mix is an array of 3 structures (struct mixture), and there is a float and a short. Yes, this program had unaligned structure members.
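To see what "unaligned" means here, this sketch compares the same structure with natural alignment and with 1-byte packing, which is one way a compiler can produce such a layout. The contents of mixture are hypothetical (the real struct is not shown on this page), and we do not know which packing the original program actually used; the offsets in the checks assume a typical compiler with a 2-byte short and a 4-byte float.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical contents; the real struct mixture is not shown here. */
typedef struct {
    float weight;
    float mean;
} mixture;

/* Natural alignment: the compiler pads after nmix so mix starts at 4. */
typedef struct {
    short   nmix;
    mixture mix[3];
    float   meanSegLen;
} snape_aligned;

/* Packed to 1 byte: no padding, so mix starts at offset 2. */
#pragma pack(push, 1)
typedef struct {
    short   nmix;
    mixture mix[3];
    float   meanSegLen;
} snape_packed;
#pragma pack(pop)

/* Checks the offsets under the assumptions stated above. */
void check_snape_layout(void)
{
    assert(offsetof(snape_aligned, mix) == 4);
    assert(offsetof(snape_packed, mix) == 2);
    assert(offsetof(snape_packed, meanSegLen) == 26);
}
```

In the packed version the floats inside mix sit at addresses that are not multiples of 4, which is what you see as odd-looking offsets in the disassembly.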

As well as __cdecl and __custom, you can use __pascal (for the Pascal calling convention, where parameters are popped off the stack by the callee) and __stdcall (a synonym for __pascal).

So you could well end up with a lot of command line parameters, e.g.

./boomerang \
-sf mydll/symbols.h \
-nm -nR -dl -ds -v -o output mybinary/FOO.EXE

You don't need the -e switch when you have a -sf switch. In symbols.h, almost every function will have __nodecode on it, except the ones you want to decompile this run. Obviously, a script (or .bat) file helps to save typing. Even doing one procedure at a time, on a decent PC, it can still take several minutes to decompile a typical function.

Last modified: 19/Feb/2004: created.