SourceForge.net Logo

News from 2004-2002

This page retains some older "news", for interest and/or the historical record.

01/Dec/04: The What Boomerang can do page has been revamped.

22/Nov/04: Boomerang featured in a paper presented to the 2004 Working Conference on Reverse Engineering on the 10th November. See What Boomerang has done for more.

24/Sep/04: Some significant changes have recently been booked in. In particular, the propensity to propagate too much has been curbed with a heuristic. See the What Boomerang can do page (last example) for the cleaner output.

17/Jul/04: A binary release of version 0.1 alpha has been made, for both Linux/X86 and Windows.

7/Jul/04: Most of the Sparc test programs are working again, with the current exception of recursive tests such as Fibonacci.

28/Jun/04: Boomerang now requires the eXpat XML parser; this will be used for persistence of partial decompilations. See Making Boomerang for details.

18/Apr/04: Added a link to the What Boomerang can do page. As you can see, Boomerang can now handle arrays.

18/Feb/04: We now have a page on using the -sf switch.

31/Jan/04: Anatoli Koutsevol has submitted a patch which permits Boomerang to compile in Visual C++ 7. During this time anonymous CVS access has been horribly broken and we apologise for the inconvenience. Other MSVC compilers have project and workspace files contributed, but there are serious STL issues; see Making Boomerang.

15/Jan/04: There was furious development during the consultation project (see below); many bugs were fixed, and improvements were made. Boomerang is still not ready for serious work, but it was helpful with the project. Boomerang now has the ability to handle structure members, using information passed in a file (-s switch).

During the project, unit and functional testing were neglected. Unit testing is back now and works properly. Functional testing has been revamped so that a given executable is tested by decompiling it, then recompiling it and comparing the text result to what is expected (simply comparing code generated by the decompiler against previous decompiler output failed for lots of annoying reasons, e.g. the name of a local variable changed). The horrible old switch code is gone, replaced by a propagation-based algorithm. The new algorithm doesn't handle as many odd cases as the old code did, but it works for most cases, and is at least maintainable.

Most of the SPARC handling is broken, so 90% of the SPARC functional tests fail. This will have to wait till time is available.

24/Nov/03: The main authors of Boomerang have been using it for some consulting work. The clients already had some source code, but for an earlier version of their product. (If this were not the case, Boomerang would certainly not be ready for such real-world code.) As a result, Boomerang has had many improvements. Booked in Makefile for the qtgui (see Making Boomerang for more details).

1/Oct/03: We now use and require the Boehm garbage collector. Over time, we will remove code for freeing objects.

12/Aug/03: Boomerang is working well enough now to correctly decompile most of the simple test programs in the test/ directory (including the frustrating recursive Fibonacci programs). To those who have been waiting for Boomerang to settle down and become usable again, the long wait is over. It's still being changed rapidly, but the design should remain fairly stable now. There is still a long way to go before Boomerang is useful for real-world programs.

5/Aug/03: The combination of global dataflow analysis and SSA didn't work out. (That paper was not accepted.) We've decided that SSA by itself has enough power to do what we need, at least in terms of dataflow analysis. It also doesn't need global analysis, which reduces memory requirements. The old dataflow code is gone, as is the "implicit" SSA, replaced by more standard SSA code (using dominance frontiers and all that). The global optimisation (see comments for 30/May) therefore no longer happens.
There has also been a redesign. The multiple inheritance (e.g. HLCall from Statement and RTL) has gone. Now, an RTL is a list of Statements (previously, a list of expressions (class Exp)). Assignments are no longer expressions, but statements. This has cleaned up a lot of code that iterates through statements. A lot of old commented out code has been removed as well.
We also have a theorem prover now. This is powerful enough to prove whether a register is saved, even in the presence of recursion.
It is expected that parameters and return location(s) will be working fairly well soon. BoolStatements (e.g. created from the Pentium setz instruction) work now.

30/May/03: There has been a lot of development behind the scenes. Boomerang actually works slightly worse than it did in February, but that's just temporary while we experiment with the best design. The latest idea is to combine global dataflow analysis with Static Single Assignment (SSA) form. We've come across an interesting way to represent SSA implicitly. It's so neat that we're writing a paper on its implementation.

In the meantime, Boomerang is not all that usable. One interesting result is for the SPARC twoproc program (see below for source).

void main() {
    proc1();
    printf("%i\n", 7);
    return ;
}
int proc1() {
    return %o0;
}
As you can see, return locations are broken; they are not a priority at present. Parameters are also gone, but not for long. Note how the global dataflow has actually propagated the entire semantics for proc1() into the printf statement in main()!

Some may argue that this is not a "faithful" decompilation. This shows a fundamental misunderstanding of our goals and therefore needs to be addressed. It is often stated that the goal of a decompiler is to reconstruct the original source code of the program that was compiled into a given binary. This may well be the case, but it is not the goal of this project. We are interested in finding the simplest possible program that has equivalent functionality to a given binary. In doing this we ignore things that are not explicitly represented in the high-level language, like run time and memory usage. The above is an example of this fundamental difference of opinion on the goals of a decompiler. Here's another one. So let's hear no more about what the original program did. Who cares? What's important is that the decompiler has best utilized the semantics of the output language to present the simplest possible program.

3/February/03: The very slow dataflow is now just somewhat slow; it has been sped up by at least an order of magnitude. With a couple of other changes, it can now translate twoproc.c (pentium or sparc):
int proc1(int a, int b) {
    return a + b;
}
int main() {
    printf("%i\n", proc1(3, 4));
}

to this:

int main() {
int local0;
    local0 = proc1(3, 4) ;
    local0 = printf("%i\n", local0) ;
    return local0;
}
int proc1(int arg1, int arg2) {
    return arg1+arg2;
}

There is a lot of debug output; pipe to less and look at the end.

4/December/02: In response to university being finished for another year, my girlfriend being overseas for a few weeks and me coming down with the flu, I've managed to get some work done. Boomerang can now decompile hello world on both Pentium and Sparc architectures. A detailed account of this achievement is available here. The techniques used are general, so this achievement represents only a subset of the programs that Boomerang can now decompile. Investigations into branching and multi-procedure programs are next on the agenda.

8/August/02: New code has been checked in, including a GUI written in MFC. Obviously this is for Windows only (use Wine, whatever); it's for prototyping purposes and will eventually be replaced/complemented with cross-platform GUIs. If you check out the new code on Unix, please ensure that you run ./configure before trying to make, as I doubt you are building on the same kind of Sparc box as I am.
8/August/02: A design document outlining the (existing) internal representation has been added. You can look at it here or grab the source here. It was created using the Dia drawing tool. Some other design documents would be nice, but they don't exist at present, and some of us can only tolerate so much UML before we start banging our heads into our keyboards uncontrollably.

27/June/02: Nothing more booked in, but there is quiet progress behind the scenes. One of the developers has some GUI code going; unfortunately, with our current lack of knowledge of toolkits, this is Windows only code for now. We may have a sort of console mode decompiler (same features, no GUI) that will compile on Unices.

31/May/02: UQBT is finally released! (See this page if you are interested.) See where the code is at.

17/May/02: Frustratingly, the delay continues. We have the frontend of UQBT modified and updated for Boomerang, and it is decoding instructions now. The next stage is removing source machine features like delay slots.

23/Apr/02: There is very little code booked into CVS. Code from UQBT, which should provide a good part of at least the frontend of the decompiler framework, was delayed but should be released shortly.


Last modified: 05/Jul/06: Moved 2004 news to here