Computer criminals are always ready to compromise weakness in the system with their hostile codes. Computer Viruses, Worms, Trojans, Malwares, you name it. Often these programs are custom compiled and not widely distributed. Because of this, anti-virus software won’t detect their presences.
In this article we will try to outlines the process of reverse engineering hostile codes. Hostile codes mean any process running on a system that is not authorized by the system administrators. However, our scope will be limited. This article is not intended to be an in-depth tutorial, rather a description of the tools and steps involved.
There are many tools which can be used for reverse engineering. Reverse engineering can be done in both Unix and Windows platform. However, Unix is still the ideal platform in my opinion. If you are installing Cygwin on Windows, you can emulate Unix environment and do what you can do in Unix.
Going to Windows route will cost lot of money to us where as most of solutions are all free and open source.
Some useful commands based on their categories:
- Disk Image Tool – To create disk image, convert and copy a file byte-to-byte. Useful to perform analysis on a compromised system’s disk without affecting the integrity of evidence of the intrusion. Solution in this category: dd.
- File Type Identifier – Identifies the type of a file. Identification should rely on magic number used in some part of the file, rather than rely on file extension. Solution in this category: file.
- String Identifier – Outputs readable strings from file. Strings might reside in data section or event code section. Solution in this category: strings
- Hex Editor – Read and edit binary files. Solution in this category: okteta, HexRay.
- Checksum – Creates a unique checksum for comparison purpose. Solution in this category: md5sum, sha1sum, sha256sum.
- Diff Tools – Show differences between files. Solution in this category: diff.
- Files Monitors – Show all open files and sockets by process. Solution in this category: lsof
- Packet Sniffer – Sniffing network packet and traffic to/from machine. Solution in this category: tcpdump, wireshark.
- String Search – Search for strings within a file. Solution in this category: grep.
- Packer/Unpacker – see Packer section.
- Decompiler – see Decomilation.
- Disassembler – see Disassembler.
Malwares are often compressed with an executable packer. This not only makes the code more compact, but also prevets much of the internal string data from being viewed. The most commonly used packer is UPX, which can compress Linux of Windows binaries. Other solutions are available, but they are typically Windows only packer. Good thing is, UPX provide manual decompression to restore the original image. This saves lot of times.
In an ordinary executable, running the “strings” command or examining the malwares with hexeditor should show many readable and complete strings in the file. If we see random characters or mostly truncated and scattered pieces of text, most likely the executables has been packed. Find string “UPX” somewhere in the file to confirm UPX packer involved here. You may want to deal with one of the many other executable packers.
Some malwares might be written in an interpreted or semi-interpreted language such as .NET, Java, etc. You can consider yourself being lucky. There are tools available to decompile these languages to varying degrees.
- .NET – Microsoft flagship platform for programming. Some decompiler exists around such as ILSpy, dotPeek, etc.
- Visual Basic – More precisely, Visual Basic before the era of .NET. Visual Basic application is assembled into so called PCode. One of Visual Basic PCode decompiler is P32Dasm.
- Java – Next infamous cross platform. There is an excellent decompiler jad. Several other known decompilers exist such as: JD, Mocha, JEB Decompiler (for android APK).
- Delphi – The Pascal in different way. Delphi is once become de facto Rapid Application Development standard. Several decompiler exists, such as DeDe, DE Decompiler, Interactive Delphi Reconstructor.
Some popular interpreted language can be compiled to native codes. And also, there are some tools for decompiling them. While malware engineered using these language are rarely seen, it’s good choice to keep the tools on our arsenal.
- Perl – Use exe2perl.
- Ruby – There exists Ruby Decompiler
If malwares are written in native codes using compiled language, there is chance we can decompile it. Hex-Rays decompiler is one of tool serves this purpose. The Hex Rays decompiler is a plug-ing for IDA Pro, so we should have IDA Pro first. Another option available is Boomerang Decompiler.
However please keep in mind that there is no guarantee that decompilation will return the code as is.
Native code decompiler might exists, however as said before decompilation won’t give guarantee the code returned as is. Our next option is disassembler. These tools work by disassembly the executables into assembly code. For Unix we can use objdump and some of its wrapper like dasm. For windows we can use W32dasm. Some multi platform tool exists, such as IDA Pro and Radare2. These programs will disassemble our code then match up strings in the data segment to where they are used in the program, as well show us separation between subroutines.
Deadlisting can be quite valuable, but we still want to debug the code, especially if the malware is communication via network sockets. Debuggers give us access to the memory and temporary variables stored in the program, as well as all data it is sending and receiving from socket communication.
On Unix land, there is gdb debugger. Under Windows, the choices are far more varied, but most tutorials on reverse engineering under Win32 land use OllyDbg and SoftICE.
Running hostile codes must be done with more precautions, even under debugger. Never debug malwares on production machine or network. Ideally, a lab network specially created for this is recommended. Here is the minimal network configuration recommended:
The debug system should have a clean install of whatever Operating System the malware is intended for. The Firewall is used to protect the network from unwanted incident from outside. Ensure that you firewall all outbound connections, allowing only the Trojan’s control connection through. If you don’t want the master controller to know your lab network is running the Trojan, you can set up services to mimic the resources the Trojan needs, such as an IRC or FTP/TFTP server. The third machine on the network is sniffer which emulate the service and also acts to capture the network traffic generated by the malwares.
First, we skim the code and search for particular interesting function used. We look for key function such as Winsock and file I/O calls. Then we search the occurrence or where are our key-functions invoked. Let the debugger breakpoints on them. There we can interrupt the flow of the program and examine memory and CPU registers at that point.
Running the Code
One of the case we want to inspect is how the malwares communicates with other. Or maybe how the malware communicates with its controller. Often, sniffing the network traffic will be sufficient. However, many newer Trojans are incorporating encryption into their network traffic, making network sniffing useless. However, with some cleverness we can grab the messages from memory before they are encrypted. By setting a breakpoint on the “send” socket library call, we can interrupt the code just prior to the packet being sent. Then, by getting a stack trace, we can see where we are in the program.
Another thing we should consider is the payload or what the malwares do. Poking through file system and access interesting system calls might interest us.