Frequently Asked Questions
1. Isn't decompilation impossible, or at least a waste of time?
2. Isn't decompilation illegal, or at least immoral?
3. How good will the decompiled code be?
4. How easy is it to use? I just give it the binary file, right?
5. Just tell me where to find an automatic decompiler.
6. What about obfuscated binary code?
7. Is decompiling from binaries as easy as decompiling Java or Visual Basic programs?
8. Won't the availability of a good decompiler threaten the software industry?
9. How do I go about contributing to Boomerang?
10. Your anonymous CVS access appears to be broken or annoyingly slow. What can I do?
11. How do I use the fancy -sf switch? What goes into that symbol file?
12. Where do I download the sourcecode/binaries?
13. What's the command to check out a copy of Boomerang using CVS?
14. Why can't I decompile /bin/ls?
15. I've just built Boomerang under Linux. Why does it just segfault all the time?
16. Why do you use that ancient version of flex++ instead of the modern GNU flex++?
17. Why am I suddenly getting link errors after updating (Linux)?
18. Why am I suddenly getting link errors after updating (Windows)?
19. Why do I keep getting some message about "no slash in argv"?
20. What sort of binary files can I decompile with Boomerang?
21. Does Boomerang need files in special directories?
22. Are there other machine code decompilers I can try?
23. Boomerang has been running a long time. Is it in an infinite loop?
24. Why do I get "The application failed to initialize properly (0xc0000022)"?
1. Isn't decompilation impossible, or at least a waste of time?
The completely automatic decompilation of arbitrary binary files to equivalent source files in any language is equivalent to the halting problem, so we will never find an algorithm to achieve it. But then, many tasks such as routing printed circuit boards are impossible in general too, yet they are so useful that we do them anyway, and put up with some manual tweaking to get a good (not necessarily perfect) result. A few of us believe that a practical decompiler is possible with the current state of the art. Boomerang is an attempt to prove that assertion.
Note that disassembling a file well enough to be able to assemble the generated file is almost as difficult, and yet disassemblers are considered useful tools. Very few people use them at present (at least, this is what I believe) to generate assemblable code. One of the reasons for this is that apart from being hard work, the result isn't portable to another architecture. This is where a decompiler is useful: it's just as much work as disassembling, but the output is hopefully portable (with perhaps a bit more effort for each target architecture).
Those that compare decompilation to reconstructing an egg after it has been broken are wrong; a lot of information is lost in compilation (actually, most of it is lost in the assembling and linking processes), but not that much.
See also the Wiki DeCompilation page "Is it possible?".
2. Isn't decompilation illegal, or at least immoral?
This is a big issue. Certainly, decompiling commercial code for other than a few purposes (interoperability, checking for malware, etc.) is illegal. The legal situation may have changed in the USA with the Digital Millennium Copyright Act, and similar acts in countries that follow the US lead in this. However, there are plenty of legal uses. The three main ones are:
- The recovery of lost source code (some estimates put all lost source code worldwide at 5%).
- Reengineering of legacy code (where the old code is in a language that is no longer supported on modern architectures, for example). Where there is some source code, source to source translation would almost certainly be preferable.
- Checking for malware (viruses, back doors, etc.). In this case, it's used as a glorified disassembler, but the effort of looking through 100 lines of C is presumably less than looking through 500 lines of assembler. Plus, the checker doesn't have to be aware of the details of the source machine.
See also the Wiki: DeCompilation pages Why Decompilation and Legality of Decompilation.
3. How good will the decompiled code be?
If you want the original source code back again, then you will certainly be disappointed. The only meaningful variable names, and useful comments, will be ones that you enter yourself, or (in the distant future) perhaps from some nifty analysis that might be able to identify some design patterns from binary code.
So the real answer is "as good as you want it, but you'll have to put in a lot of work to get it to that stage".
4. How easy is it to use? I just give it the binary file, right?
For small programs, you may be able to just give the name of a binary file, and out will come a source file in some high level language (initially C, but there may be other choices later). However, there will be no (or minimal) comments, the variable names (depending on how much debugging information is in the input binary) will mostly be meaningless (var1, var2, that sort of thing). Most of the gotos should be gone, though you may find some for loops emitted as while loops, or other details of the control flow structure altered (but still valid).
If you put a lot of effort into it, you should be able to emit well documented code (to your understanding of how it works; it can't think for you), with decent names for variables and functions, structures and arrays as they would have been in the original program, and perhaps even classes recovered.
Depending on what's in the input file, there may be many tricky problems to solve. It may even be that decompiling this particular input file is impossible, or beyond your ability to achieve. For example, a particular register jump or call may fail to be automatically recognised; in that case, the quality of the output depends on your ability to tell the decompiler where the jump or call might end up. (This problem is undecidable in general, but a clever person could in principle solve any arbitrarily difficult example).
5. Just tell me where to find an automatic decompiler.
A number of people believe that they can just find a free decompiler that will recreate their source code for their binary file, much like they can find a free decompiler for almost any Java program or applet. Well, it just doesn't exist at present. The best you can do at present is to use a good disassembler, and even that is a lot more work than many people realise. See the AutomaticDecompiler wiki page.
However, decompilers are getting better. See also the answer to Q22.
6. What about obfuscated binary code?
If the input file is obfuscated (deliberately arranged to make decompilation difficult), then obviously it's harder again. In my opinion (yet to be tested), no obfuscation scheme is impossible to defeat, and in fact I think that any obfuscation will be about as successful as the copy protection schemes of the 1980s (i.e. not very successful at all).
Of course, that's just my opinion. The reality of the matter at present is that Boomerang can't even decompile non-obfuscated programs with a very high degree of success. Trivial obfuscation techniques, like encrypting the whole program and decrypting it in memory, are beyond our intended aims for this project (self modifying code is evil, we don't claim to be able to decompile it and we don't really ever intend to). The more sophisticated code obfuscation techniques, like polymorphic encryptors/decryptors, are an order of magnitude harder. Maybe one day someone will write a "deobfuscator" for binary programs and you'll be able to run that over your binary before you try decompiling it, but it won't be us.
7. Is decompiling from binaries as easy as decompiling Java or Visual Basic programs?
No. Java bytecode programs have a lot of information in them, such as the name (including class name) of every method. In fact, about all that is missing in a bytecode program are the comments, and the names (and types) of local variables. Visual Basic programs, for some reason that some regard as suspicious, have even more information in the executable file (more than seems to be needed, for example). There are a number of successful Java and Visual Basic decompilers, both free and commercial, that do a good job of recovering usable source code for the program.
Binary files are a totally different situation. Usually, there is no debugging information in them. However, dynamically linked library functions are usually referenced by name, so they are available. Often, library function parameter types are known, so there is more information. Statically linked library functions can sometimes be recognised using pattern matching. Binary files can also include difficult to handle instructions, such as register (or memory indirect) jumps and calls. Overall there is far less information in a binary file; this is why decompilation of binary files has stayed in its infancy for so long.
8. Won't the availability of a good decompiler threaten the software industry?
I doubt it very much. First of all, it's a lot of work to decompile a program, and that won't change unless artificial intelligence improves dramatically (and then it would probably be easier to get a computer program to write a new program than to decompile an old one!) Secondly, source code isn't everything. A high level representation of a program isn't the same as understanding that program (well, it may be if you put a very large effort into the decompilation).
9. How do I go about contributing to Boomerang?
First, and most obviously, you should check the source code out of CVS, compile it and try using it for a while. I'm sure you'll find plenty of things that are "broken" or not done the way you would like. Also read the things to be done page. If you would like to fix something we would be interested in hearing from you, after you've fixed it. For new developers we request that you submit a patch. Patches should be submitted by email and should include:
- A meaningful subject (very short description of patch)
- A long (paragraph) description of what was wrong and what is now better (and now broken)
- Change Log: A short description of what was changed.
- Your contact information (name/handle and e-mail)
- The patch in diff -u format.
Your changes (and the items above describing them) should be for one thing. Don't try to submit a patch to us with all your favourite little hacks because we won't even look at it. If in the course of using Boomerang you find and fix 10 bugs we would much prefer 10 patches than 1 big one. Essentially, the smaller your patch and the better your description of your patch, the more likely we are to accept it.
Should you submit a patch to us which has what we consider glaring omissions or can be done better, we may request that you rework it. This is more relevant to new features than it is to bug fixes. Obviously we're not going to accept bug fixes regardless of their state, but if you say you've fixed a bug we'll at least look at it! On the other hand, new features that are abandoned by the programmer as soon as they're submitted will not be accepted.
Submit enough patches and we'll add you to the project's developers list and give you CVS write access. Note that we won't accept patches from you forever. If you intend on contributing a lot please keep in mind that we will eventually ask you to become a trusted developer (and use CVS). If at that point you refuse we'll have no choice but to stop accepting your patches.
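As a rough illustration of the "diff -u format" item in the list above, the sketch below wraps the standard diff tool. The function name make_patch and the file names in the usage comment are hypothetical and not part of Boomerang; it is only one way a single-file patch might be produced.

```shell
#!/bin/sh
# Hypothetical helper (not part of Boomerang): write a unified diff of
# OLD -> NEW to OUT, in the "diff -u" format requested for patches.
make_patch() {
    old=$1
    new=$2
    out=$3
    diff -u "$old" "$new" > "$out"
    status=$?
    # diff exits 0 when the files are identical, 1 when they differ,
    # and 2 (or more) when something went wrong.
    if [ "$status" -ge 2 ]; then
        echo "make_patch: diff failed (status $status)" >&2
        return 2
    fi
    if [ "$status" -eq 0 ]; then
        echo "make_patch: no differences, nothing to submit" >&2
        return 1
    fi
    return 0
}

# Example use (file names are placeholders):
#   make_patch db/exp.cpp.orig db/exp.cpp exp-fix.patch
```

Inside a CVS working copy, cvs diff -u produces output in the same unified format without needing a saved copy of the original file.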
10. Your anonymous CVS access appears to be broken or annoyingly slow. What can I do?
Be aware that, up until about June 2004, the anonymous CVS server was different from the one the developers use, so changes took some time to appear at the anonymous server after they were made to the developers' server.
The slowness appears to have been a problem at the beginning of 2004, so I looked for a solution. An account was created with read-only access; this annoyed the admins and doesn't seem to be necessary any more. Sourceforge have been working very hard to fix problems like this, so please give anonymous CVS a try.
If you want, we can add your Sourceforge account to our list of developers and give you read-only access to the CVS repository. If you want to check something in you can send us a patch (see previous question) and we'll check it in for you. This also means that we can easily turn a developer into a trusted developer, simply by turning on CVS write access (whereas if you continue with grabbing the source with anonymous CVS you'll need to do at least one brand new check out with SSH before you can check in).
11. How do I use the fancy -sf switch? What goes into that symbol file?
See this page: using the -sf switch. A few other switches are documented there too.
12. Where do I download the sourcecode/binaries?
There are no source downloads at present, and there will only be very infrequent binary releases for some time. We think that this is best, considering that Boomerang is still in the alpha stage. (In other words, we don't want to give potential users the false impression that most of the bugs are worked out of the product as yet.) So to access the source code, you need to use CVS. For Linux and Unix, just use the cvs client. For Windows, use a CVS client such as Tortoise CVS. To browse one or two source files, you can also use the Sourceforge Web CVS facility. (NOTE: URL will change in May 2006). There is Sourceforge help on CVS as well.
You deserve to be able to cooperate openly and freely with other people who use software, to learn how the software works, or to hire your favorite programmer to fix it when it breaks. That's why we supply source code and we put a minimum of restrictions on what you can do with that source code.
13. What's the command to check out a copy of Boomerang using CVS?
The definitive details are at the Sourceforge help on CVS page. For anonymous access (most users):
% cvs -d:pserver:firstname.lastname@example.org:/cvsroot/boomerang login
% cvs -z3 -d:pserver:email@example.com:/cvsroot/boomerang co -d dirname boomerang
You will be asked for a password; just press enter. The -d dirname is optional; if not given, it defaults to boomerang. For developers with a username (if you are a developer on any other Sourceforge project, you may have read only access already):
% cvs -z3 -d :ext:firstname.lastname@example.org:/cvsroot/boomerang co -d dirname boomerang
You obviously need internet access, and must have CVS set up to use SSH (Secure SHell) as the transport mechanism. For Linux, this can be done with:
% export CVS_RSH=ssh
Once you have done this and you change into the checked out directory, you don't need the long -d option any more, e.g.
% cvs update db/exp.cpp
% cvs update
% cvs status boomerang.cpp
to respectively update one file, update all files, and get the status for one file.
If ssh keeps showing a warning like this:
Warning: the RSA host key for 'boomerang.cvs.sourceforge.net' differs from the key for the IP address '184.108.40.206'
Offending key for IP in /home/login_name/.ssh/known_hosts:15
Matching host key in /home/login_name/.ssh/known_hosts:16
Are you sure you want to continue connecting (yes/no)?
you'll soon get sick of typing "yes" all the time. This happens because Sourceforge have more than one server with the same name, and ssh is by default paranoid about fingerprints changing. You can avoid it by creating or appending to the file ~/.ssh/config with this content:
Make sure you chmod 600 ~/.ssh/config to set the permissions that ssh demands. You still need both entries in your ~/.ssh/known_hosts file, and you will still be notified if the fingerprint disagrees with all in that file.
14. Why can't I decompile /bin/ls? Isn't it tiny?
Actually, no. It has 82 procedures, and is actually quite complex. As you try larger and larger programs on Boomerang, the chances of nothing going wrong approaches zero. The point where it starts approaching zero is unfortunately still just outside the "toy program" area.
Yes, it is possible to decompile larger programs, and we have done it. But it took us 3 months to decompile a small (but important) part of it. We were fixing and improving Boomerang along the way, and we decompiled it one procedure at a time. If you really want to look at some code from /bin/ls, you'll have to use the -sf switch as documented elsewhere, and it will be a lot of work.
You can't just type ./boomerang /bin/ls and expect the results to compile, sorry. At present, it segfaults. Maybe in a year or so you could expect it not to segfault, and perhaps the output to compile with a little editing. It will likely even then be difficult to understand, compared to the original source code.
15. I've just built Boomerang under Linux. Why does it just segfault all the time?
This only happens rarely, and the real cause is still a mystery. However, one reader who had this problem was able to run Boomerang after typing this command:
% export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/boomerang/lib
where /path/to/boomerang is the absolute path to the directory containing the boomerang executable, e.g. /home/emmerik/boomerang. This command tells the dynamic loader where to find the Boomerang dynamically linked files, such as libBinaryFile.so. This should not be needed if you don't monkey around with paths. Note that Boomerang needs several directories at runtime, so if you move it to another directory, in addition to this command, you also need soft links to these directories: frontend, signatures, and transformations.
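The soft links mentioned above could be created along these lines. This is only a sketch: the function name link_runtime_dirs and the destination path in the usage comment are hypothetical, and it assumes the three directories sit directly under the original Boomerang tree.

```shell
#!/bin/sh
# Hypothetical helper (not part of Boomerang): in directory DEST, create
# soft links to the runtime directories found under SRC, the original
# Boomerang tree. SRC should be an absolute path so the links resolve.
link_runtime_dirs() {
    src=$1
    dest=$2
    for d in frontend signatures transformations; do
        if [ ! -d "$src/$d" ]; then
            echo "link_runtime_dirs: $src/$d not found" >&2
            return 1
        fi
        # Leave anything that already exists at the destination alone.
        if [ ! -e "$dest/$d" ]; then
            ln -s "$src/$d" "$dest/$d" || return 1
        fi
    done
}

# Example use (the destination path is a placeholder):
#   link_runtime_dirs /home/emmerik/boomerang /opt/boomerang
#   export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/emmerik/boomerang/lib
```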
16. Why do you use that ancient version of flex++, instead of the modern GNU flex++?
Flex++ is not the problem; the problem is that there is no GNU Bison++ (as of this writing, mid 2004). (Bison is the parser generator, derived from yacc; flex is just the scanner, or lexical tokeniser, derived from lex.) It's convenient to declare the parser as a class, be able to use all C++ features in the .y files, and so on. You can't generate Expressions (objects derived from class Exp) in a C-only parser (without a lot of pain).
So instead, we use the Coetmeur version of Bison, called Bison++, that was based on a then-current version of GNU Bison. For whatever reason, Coetmeur never sent his changes back to the GNU source code, or if he did, they haven't been merged in. Bison++ is getting a little old now, but it has all the features needed by Boomerang. (There's a handy project for someone with a bit of time to contribute: merge Coetmeur's changes into GNU's latest source code for Bison, and make any changes necessary to GNU flex/flex++.)
The version of flex++ that we specify is designed to work with Bison++. While the modern GNU flex/flex++ has C++ capabilities, I don't think that they are compatible with Coetmeur's Bison++ (though I haven't tried it). It doesn't seem worth it to use the latest GNU Flex++ if you still have to track down and compile the Coetmeur Bison++.
Update: there is a parser that would appear to be suitable now: BisonC++. There just hasn't been time to test it and integrate it into Boomerang if suitable. Looking for a nice little project?
17. Why am I suddenly getting link errors after updating (Linux)?
It's probably because Makefile.in has changed. In order for changes to flow from Makefile.in to Makefile, you have to run ./configure, or the shortcut ./config.status. (The latter is safe as long as your environment hasn't changed, like installing a new package).
The Makefiles and other configured files (e.g. include/config.h) are not booked in, because they are generated from other files, and are tailored to your particular configuration.
When a new module (.o file) is added to Boomerang, and the Makefile is not up to date, there will usually be a large number of link-time errors, since a whole bunch of functions will not be known by the linker.
18. Why am I suddenly getting link errors after updating (Windows)?
It's probably because boomerang.vcproj has changed. This file is kept in the win32make/ directory. You probably are loading the project file from elsewhere. So you could copy the file from win32make/ to wherever you are loading from (probably the top level Boomerang directory), or you could load the project file directly from the win32make/ directory.
Note that if you do either of these two things, you will lose any changes that you made for your environment, e.g. some optimisations, or the path to some file, etc. The project file was really only meant to get you started. The only time you need to update the project file is when a new source file is added. Recently (July 2004), the source file db/xmlprogparser was added. If you checked out Boomerang before this file existed, then updated to the latest source files, you will get many link time errors to do with the Cluster (and other) class(es). You could do a search for one of the missing procedures (e.g. search for "getOutPath") to find the name of the missing source file, and simply add it to the project. Or search for the .cpp file with the latest timestamp.
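For those with a Unix-style shell available (Cygwin, for example), the search for a missing procedure described above might look like the sketch below. The function name find_symbol_source is hypothetical; the helper simply lists every .cpp file that mentions the name, so callers of the procedure will show up alongside the file that defines it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Boomerang): list the .cpp files under
# ROOT (default: current directory) that mention SYMBOL.
find_symbol_source() {
    symbol=$1
    root=${2:-.}
    # -F treats SYMBOL as a fixed string; -l prints only the file names.
    find "$root" -name '*.cpp' -exec grep -l -F -- "$symbol" {} +
}

# Example use, from the top of the Boomerang tree:
#   find_symbol_source getOutPath
```

Any file in the output that is not yet part of your project is a candidate for the missing source file.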
19. Why do I keep getting some message about "no slash in argv"?
This is fixed in the latest source code, available using CVS. The older binary release is simply looking for a slash (or backslash) in the name of the executable, e.g. use ".\boomerang switches..." instead of just "boomerang switches...". Unix users do this automatically, because of the security risk of having the current directory in the path.
20. What sort of binary files can I decompile with Boomerang?
Boomerang currently (as of December 2004) has three front ends: Pentium, Sparc, and PPC (PowerPC). The PPC front end is still being developed; it doesn't do much more than "hello world" as yet. Until other front ends are developed, it will not be possible to decompile code for other processors, e.g. MC68000, PIC, etc. Boomerang is currently rather 32 bit specific, so a front end for say Itanium is unlikely without a major rewrite. A 68K (MC68000) front end could be adapted from existing booked in code with moderate effort. The Pentium front end is 32-bit protected mode specific; it will not handle real-mode code, or 16 bit code (though modifications for either or both of these would be possible with some effort).
Having a front end is not the whole story; you also need a loader for the appropriate binary file format (BFF). Boomerang has loaders for the ELF and Win32 formats. A loader for the Palm BFF could be converted from existing code without a great deal of effort.
So in practical terms, the files that can be read right now are:
- Linux/x86 and Solaris/x86 programs
- Solaris/Sparc and presumably Linux/Sparc programs
- Windows 32-bit programs (Console and non console mode programs, but not DOS or Windows 3.1 programs)
Visual Basic programs are a different situation. With Visual Studio 2003 (and presumably later) these can apparently only be compiled to .NET. If this is the case, you may be able to decompile it with a .NET decompiler that has a VB option (e.g. Reflector). VB5 and VB6 had the option of compiling to native code or to Pcode. For VB programs compiled natively, it may be possible to decompile with Boomerang (you will get C, not Basic code), but the quality will be very low. (Most likely it will just crash.) With some effort, e.g. adding many signatures, it may be possible to follow the logic of the program in C. Early versions of Visual Basic compiled to Pcode. If you have such a binary and need to decompile it, search for one of the VB specific decompilers.
If the binary file has been obfuscated, compressed, or made tamper resistant, Boomerang at present will not be able to decompile it. Whether a future strong decompiler will be able to handle binaries that don't want to be decompiled is an open question. For now Boomerang finds most real-world unobfuscated programs challenging enough; see Q6.
21. Does Boomerang need files in special directories?
Yes. At runtime, it needs to read specifications, transformation rules, signatures, and it needs to load dynamic libraries (.dll or .so files) from certain paths. Under all platforms other than Windows with the Microsoft compiler, run Boomerang from the top of the Boomerang directory tree (i.e. the directory with boomerang.cpp in it). Under the Microsoft compiler, Boomerang is in the Debug or Release directory, but the "current directory" is still the parent of Debug/. If you make an icon to run Boomerang from, make sure this root directory is in the "start in" directory (this becomes the "current directory" when Boomerang is run).
If you move Boomerang to another machine or another directory, take along at least these directories: frontend/, transformations/, signatures/, and lib/. (To save space, you could remove all but the three .ssl files from frontend, i.e. keep only frontend/machine/pentium/pentium.ssl, frontend/machine/sparc/sparc.ssl, and frontend/machine/ppc/ppc.ssl.)
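After moving a tree, a quick check that nothing was left behind could look like the sketch below. The function name check_runtime_layout is hypothetical; the directory and file names are the ones listed above, and the helper assumes a Unix-style shell.

```shell
#!/bin/sh
# Hypothetical helper (not part of Boomerang): report which of the runtime
# directories and .ssl files named above are missing under ROOT.
# Returns 0 if everything is present, 1 otherwise.
check_runtime_layout() {
    root=$1
    missing=0
    for d in frontend transformations signatures lib; do
        if [ ! -d "$root/$d" ]; then
            echo "missing directory: $root/$d" >&2
            missing=1
        fi
    done
    for f in pentium/pentium.ssl sparc/sparc.ssl ppc/ppc.ssl; do
        if [ ! -f "$root/frontend/machine/$f" ]; then
            echo "missing file: $root/frontend/machine/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use (the path is a placeholder):
#   check_runtime_layout /opt/boomerang && echo "layout looks complete"
```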
22. Are there other machine code decompilers I can try?
Yes, there are actually a few now:
- Reverse Engineering Compiler (REC) has been around for a while. After a four year break, the author is again developing it and has released a Windows GUI. Reads binary files compiled for several architectures in several load file formats. Output is C-like.
- Anatomizer is a Japanese decompiler for Win32 executables, hosted on Windows. Source code not available.
- The Andromeda decompiler decompiles Win32 executables to C and C++. Windows only. Only a demo is available at present, but the demo looks very impressive. No source code is available.
- exetoc is a decompiler for Win32 executables hosted on Windows. There is a GUI, source code is available, and its initial release in May 2005 looks quite impressive.
- There is also the Desquirr plugin for IDA Pro.
23. Boomerang has been running a long time. Is it in an infinite loop?
There is a known problem with Boomerang, which mostly happens with ad-hoc type analysis (the default), where it can infinitely loop when the program being decompiled has what we have dubbed "phi loops". (There is code to avoid the loops, but I still get reports). Decompilation is slower than compilation, but by a factor of something like 3-5, perhaps 10 at the most. In practical terms, on a reasonable machine (at least 256M of memory, 512M+ preferred), if Boomerang takes more than about half an hour, it is likely in an infinite loop. One way to tell if it is locked up is if the size of the log file doesn't change for a minute or so. (This is especially true if you turn on verbose mode, but this may generate a quite large (several tens of megabytes) log file).
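The log file check described above can be automated on Linux or Unix. The sketch below is one possible way to do it; the function name log_is_stalled is hypothetical, and the path of the log file is left as a placeholder for you to supply.

```shell
#!/bin/sh
# Hypothetical helper (not part of Boomerang): succeed (exit status 0) if
# LOGFILE has the same size after WAIT seconds (default 60), which
# suggests that the decompiler has stopped making progress.
log_is_stalled() {
    logfile=$1
    wait_secs=${2:-60}
    [ -f "$logfile" ] || return 2
    # The arithmetic expansion strips any padding that wc adds.
    before=$(( $(wc -c < "$logfile") ))
    sleep "$wait_secs"
    after=$(( $(wc -c < "$logfile") ))
    [ "$before" -eq "$after" ]
}

# Example use (the log file path is a placeholder):
#   if log_is_stalled path/to/log; then echo "probably looping"; fi
```

An unchanged size is only a hint: a long-running analysis step that writes nothing to the log would look the same, so it is worth repeating the check before killing the process.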
It is hoped that when the ad-hoc type analysis is removed, this problem will go away. The problem is that the data flow based type analysis, which will replace it (and you can try your luck with it using -Td or the GUI settings) is not well tested as yet.
Update mid 2006: there are now loop counters in several places in Boomerang, so any excessive looping should now be reported. You can now get through all the executables in the windows\system32 directory of a typical Windows XP installation.
24. Why do I get "The application failed to initialize properly (0xc0000022)"?
This is a Windows problem, and is very poorly documented. It really means that you don't have permissions, usually execute permission, to run the executable or one of the DLLs that it requires. This can happen when you use for example gc_cpp.dll from CVS; we can't tell CVS to set the execute permission on certain files. Windows operating systems based on Windows NT have ACL permissions for all files, but by default there is no easy way to change or even to see these permissions. Even worse, on Windows XP Home Edition, the little support that there is in XP Pro is not even present!
The easiest way to see and change the permissions on a Windows file (that I have found) is to use Cygwin's ls -l and chmod +x commands. You can also use cygcheck to find the DLLs that an executable depends on.
In Windows XP Pro, you can turn on the security tab in Windows Explorer (the file explorer) by setting this registry key to 0:
If you are running XP Home Edition and you don't have Cygwin, you could try this command:
cacls gc_cpp.dll /g none:f
Last modified 22/Jun/06: cacls command; Q24; ~/.ssh/config file; cvs server URL change