When reversing embedded code, it is often the case that completely different devices are built around a common code base, either due to code re-use by the vendor, or through the use of third-party software; this is especially true of devices running the same Real Time Operating System.
For example, I have two different routers, manufactured by two different vendors, and released about four years apart. Both devices run VxWorks, but the firmware for the older device included a symbol table, making it trivial to identify most of the original function names:
The older device with the symbol table is running VxWorks 5.5, while the newer device (with no symbol table) runs VxWorks 5.5.1, so they are pretty close in terms of their OS version. However, even simple functions contain a very different sequence of instructions when compared between the two firmwares:
Of course, binary variations can be the result of any number of things, including differences in the compiler version and changes to the build options.
Despite this, it would still be quite useful to take the known symbol names from the older device, particularly those of standard and common subroutines, and apply them to the newer device in order to facilitate the reversing of higher level functionality.
Existing Solutions
The IDB_2_PAT plugin will generate FLIRT signatures from the IDB with a symbol table; IDA’s FLIRT analysis can then be used to identify functions in the newer, symbol-less IDB:
With the FLIRT signatures, IDA was able to identify 164 functions, some of which, like
Of course, FLIRT signatures will only identify functions that start with the same sequence of instructions, and many of the standard POSIX functions, such as
Because FLIRT signatures are built mainly from the first 32 bytes of a function (supplemented by a CRC over a limited number of the bytes that follow), there are also many signature collisions between similar functions, which can be problematic:
;--------- (delete these lines to allow sigmake to read this file)
; add '+' at the start of a line to select a module
; add '-' if you are not sure about the selection
; do nothing if you want to exclude all modules

div_r 54 B8C8 00000000000000000085001A0000081214A00002002010210007000D2401FFFF
ldiv_r 54 B8C8 00000000000000000085001A0000081214A00002002010210007000D2401FFFF

proc_sname 00 0000 0000102127BDFEF803E0000827BD0108................................
proc_file 00 0000 0000102127BDFEF803E0000827BD0108................................

atoi 00 0000 000028250809F52A2406000A........................................
atol 00 0000 000028250809F52A2406000A........................................

PinChecksum FF 5EB5 00044080010440213C046B5F000840403484CA6B010400193C0ECCCC35CECCCD
wps_checksum1 FF 5EB5 00044080010440213C046B5F000840403484CA6B010400193C0ECCCC35CECCCD
wps_checksum2 FF 5EB5 00044080010440213C046B5F000840403484CA6B010400193C0ECCCC35CECCCD

_d_cmp FC 1FAF 0004CD02333907FF240F07FF172F000A0006CD023C18000F3718FFFF2419FFFF
_d_cmpe FC 1FAF 0004CD02333907FF240F07FF172F000A0006CD023C18000F3718FFFF2419FFFF

_f_cmp A0 C947 0004CDC2333900FF241800FF173800070005CDC23C19007F3739FFFF0099C824
_f_cmpe A0 C947 0004CDC2333900FF241800FF173800070005CDC23C19007F3739FFFF0099C824

m_get 00 0000 00803021000610423C04803D8C8494F0................................
m_gethdr 00 0000 00803021000610423C04803D8C8494F0................................
m_getclr 00 0000 00803021000610423C04803D8C8494F0................................
...
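To see how many names share a single signature, the collision listing above can be grouped by everything after the function name. The following is only a rough sketch: it assumes each entry has exactly four whitespace-separated fields (name, two short hex fields, and the byte pattern), as in the listing, and the helper name group_collisions is made up for illustration.

```python
from collections import defaultdict

def group_collisions(lines):
    """Group function names that share the same signature fields.

    Assumes each entry looks like the listing above:
    <name> <hex field> <hex field> <pattern>
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and sigmake's ';' comment lines.
        if not line or line.startswith(";"):
            continue
        fields = line.split()
        if len(fields) != 4:
            continue  # not an entry in the assumed format (e.g. "...")
        name, crc_len, crc, pattern = fields
        groups[(crc_len, crc, pattern)].append(name)
    # Only signatures claimed by more than one name are collisions.
    return [names for names in groups.values() if len(names) > 1]

# A few entries copied from the listing above.
SAMPLE = """\
; add '+' at the start of a line to select a module
atoi 00 0000 000028250809F52A2406000A........................................
atol 00 0000 000028250809F52A2406000A........................................
PinChecksum FF 5EB5 00044080010440213C046B5F000840403484CA6B010400193C0ECCCC35CECCCD
wps_checksum1 FF 5EB5 00044080010440213C046B5F000840403484CA6B010400193C0ECCCC35CECCCD
wps_checksum2 FF 5EB5 00044080010440213C046B5F000840403484CA6B010400193C0ECCCC35CECCCD
"""

collisions = group_collisions(SAMPLE.splitlines())
for names in collisions:
    print(" / ".join(names))
```

Each printed line is a set of names that the signature alone cannot tell apart.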
Alternative Signature Approaches
Comparing the functions in the two VxWorks firmwares shows that a small fraction (about 3%) of unique subroutines are identical in both firmware images:
Signatures can be created over the entirety of these functions in order to generate more accurate fingerprints, without the possibility of collisions due to similar or identical function prologues in unrelated subroutines.
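As a minimal sketch of this idea (not necessarily how the plugin does it), each function's raw bytes can be hashed in full, and only hashes that are unique within their own image are used for matching. The choice of SHA-256, the function names, the addresses, and the byte strings below are all placeholders for illustration; in practice the bytes would come from the disassembler.

```python
import hashlib
from collections import defaultdict

def exact_signature(code):
    """Hash the entire function body, not just its first bytes."""
    return hashlib.sha256(code).hexdigest()

def unique_signatures(functions):
    """Map signature -> key, dropping signatures shared by several functions."""
    by_sig = defaultdict(list)
    for key, code in functions.items():
        by_sig[exact_signature(code)].append(key)
    return {sig: keys[0] for sig, keys in by_sig.items() if len(keys) == 1}

def match_exact(known, unknown):
    """known: name -> bytes, unknown: address -> bytes. Returns address -> name."""
    known_sigs = unique_signatures(known)
    unknown_sigs = unique_signatures(unknown)
    return {
        addr: known_sigs[sig]
        for sig, addr in unknown_sigs.items()
        if sig in known_sigs
    }

# Placeholder data standing in for two firmware images.
known = {
    "func_a": bytes.fromhex("27bdffe8afbf001003e00008"),
    "func_b": bytes.fromhex("0000102103e00008"),
    "stub_1": bytes.fromhex("03e0000800000000"),
    "stub_2": bytes.fromhex("03e0000800000000"),  # identical to stub_1: ambiguous
}
unknown = {
    0x80001000: bytes.fromhex("27bdffe8afbf001003e00008"),  # same bytes as func_a
    0x80002000: bytes.fromhex("03e0000800000000"),          # matches only the ambiguous stubs
    0x80003000: bytes.fromhex("2402000103e00008"),          # no counterpart
}

matches = match_exact(known, unknown)
```

Dropping non-unique hashes means trivially identical stubs are left unnamed rather than misnamed.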
Still other functions are very nearly identical, as exemplified by the following functions which only differ by a couple of instructions:
A simple way to identify these similar, but not identical, functions in an architecture independent manner is to generate “fuzzy” signatures based only on easily identifiable actions, such as memory accesses, references to constant values, and function calls.
In the above function, for example, we can see that there are six code blocks, one of which references the immediate value
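A fuzzy signature of this kind might be sketched as follows. This is an illustration under stated assumptions, not the plugin's actual algorithm: the Block class is a hypothetical stand-in for whatever per-block information the disassembler provides, and the signature simply summarizes each block by its call count, memory access count, and immediate values, ignoring block order and the exact instructions used.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class Block:
    """Hypothetical summary of one code block's easily identifiable actions."""
    immediates: list = field(default_factory=list)  # constant values referenced
    calls: int = 0                                  # number of function calls
    mem_accesses: int = 0                           # number of loads/stores

def fuzzy_signature(blocks):
    """Hash a summary of each block, independent of block order and registers."""
    parts = sorted(
        (b.calls, b.mem_accesses, tuple(sorted(b.immediates))) for b in blocks
    )
    return hashlib.sha256(repr(parts).encode()).hexdigest()

# Two builds of the "same" function: blocks laid out in a different order,
# but performing the same actions.
old_build = [Block(immediates=[0x10], calls=1), Block(mem_accesses=2), Block()]
new_build = [Block(mem_accesses=2), Block(), Block(immediates=[0x10], calls=1)]

# An unrelated function that references a different immediate value.
other = [Block(immediates=[0x20], calls=1), Block(mem_accesses=2), Block()]

same = fuzzy_signature(old_build) == fuzzy_signature(new_build)
different = fuzzy_signature(old_build) != fuzzy_signature(other)
```

Because so much detail is discarded, unrelated small functions can easily share a fuzzy signature, so it makes sense to trust only signatures that are unique within each image.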
A bit more reliable metric is unique string references, such as this one in
Likewise, unique constants can also be used for function identification, particularly subroutines related to crypto or hashing:
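Matching on unique string or constant references can be sketched in a few lines. The example below assumes the references made by each function have already been collected into sets; the function names, addresses, strings, and constants are invented for illustration. A reference is used only if exactly one function in each image refers to it.

```python
from collections import defaultdict

def unique_refs(func_refs):
    """Map each value referenced by exactly one function to that function."""
    owners = defaultdict(set)
    for func, refs in func_refs.items():
        for ref in refs:
            owners[ref].add(func)
    return {
        ref: next(iter(funcs))
        for ref, funcs in owners.items()
        if len(funcs) == 1
    }

def match_by_unique_refs(known, unknown):
    """known: name -> set of refs, unknown: address -> set of refs."""
    known_unique = unique_refs(known)
    unknown_unique = unique_refs(unknown)
    matches = {}
    for ref, name in known_unique.items():
        if ref in unknown_unique:
            # Keep the first name found if several refs point at one address.
            matches.setdefault(unknown_unique[ref], name)
    return matches

# Invented data: strings and constants referenced by each function.
known = {
    "hash_init": {0x67452301, 0xEFCDAB89},
    "log_error": {"socket failed: %d"},
    "helper_1": {0x10},
    "helper_2": {0x10},  # 0x10 is shared, so it identifies nothing
}
unknown = {
    0x80010000: {0x67452301, 0xEFCDAB89},
    0x80020000: {"socket failed: %d"},
    0x80030000: {0x10},
    0x80030100: {0x10},
}

ref_matches = match_by_unique_refs(known, unknown)
```

Small, common constants are filtered out by the uniqueness check, which is why distinctive strings and crypto constants make the best anchors.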
Even identifying functions whose names we don’t know can be useful. Consider the following code snippet in
This unidentified function calls
Alternative Signatures in Practice
I wrote an IDA plugin to automate these signature techniques and apply them to the VxWorks 5.5.1 firmware:
This identified nearly 1,300 functions, and although some of those are probably incorrect, it was quite successful in locating many standard POSIX functions:
Like any such automated process, this is sure to produce some false positives/negatives, but having used it successfully against several RTOS firmwares now, I’m quite happy with it (read: “it works for me”!).
Interesting technique – would be good to try this for some other common base firmwares.
It’s raised something I’ve been wondering though – why are bzero() and bcopy() still in fairly common use on embedded systems?
I’ve used it against other RTOS’s including eCos and SuperTask! and it’s worked pretty well. I’ve also used it against some statically compiled Linux binaries too. Of course, as with any such tool, its level of effectiveness is determined primarily by the similarity between the two code bases you’re comparing, but I’m working on improving the plugin’s effectiveness.
As for bzero/bcopy, I can’t say that I’ve really seen them used that much in embedded systems myself, but most RTOS’s will at least include them for backwards compatibility and to ease the porting of code.
Nice approach !
I worked on this problem too, ending on a dynamic fuzzing approach. Did you look at ?
Your method seems more appropriate for similar binaries 🙂
Oh cool! I was considering doing something similar, but it looks like you’ve saved me the work. 🙂
I tried something similar for working with Cisco IOS images. I mostly used string references and function calls. It mostly works, but I’ve only really tried it on Cisco IOS images, so I can’t say much for how useful it is to other firmware images.
https://github.com/jeffball55/cisco_tools/blob/master/find_func_names.py
We currently use BinDiff to perform a similar task. This plugin looks cool, but outside of saving the $400 license fee for BinDiff, what does this plugin do that existing tools don’t already do?
Don’t get me wrong. I appreciate the work you put in to making this available to the community. I am just trying to understand how I might make the most effective use of this tool in our existing process.
Thanks!
I haven’t used BinDiff myself, but I don’t think this plugin would have any advantages over BinDiff. If you look at the BinDiff user manual, it basically does all this and more, so I doubt it would miss things that this plugin would find.
Sometimes BinDiff cannot identify some identical functions; PatchDiff2 is more effective and reliable.
Only $200 these days: http://www.zynamics.com/software.html
This tool is literally the best thing. This is infinitely better than the Flare signature tool I was trying to use. Thank you very much 😀