A Code Signature Plugin for IDA

When reversing embedded code, completely different devices often turn out to be built around a common code base, either because the vendor re-uses code or because both rely on the same third-party software; this is especially true of devices running the same Real Time Operating System.

For example, I have two different routers, manufactured by two different vendors, and released about four years apart. Both devices run VxWorks, but the firmware for the older device included a symbol table, making it trivial to identify most of the original function names:

VxWorks Symbol Table

The older device with the symbol table is running VxWorks 5.5, while the newer device (with no symbol table) runs VxWorks 5.5.1, so they are pretty close in terms of their OS version. However, even simple functions contain a very different sequence of instructions when compared between the two firmwares:

strcpy from the VxWorks 5.5 firmware

strcpy from the VxWorks 5.5.1 firmware

Of course, binary variations can be the result of any number of things, including differences in the compiler version and changes to the build options.

Despite this, it would still be quite useful to take the known symbol names from the older device, particularly those of standard and common subroutines, and apply them to the newer device in order to facilitate the reversing of higher level functionality.

Existing Solutions

The IDB_2_PAT plugin will generate FLIRT signatures from the IDB with a symbol table; IDA’s FLIRT analysis can then be used to identify functions in the newer, symbol-less IDB:

Functions identified by IDA FLIRT analysis

With the FLIRT signatures, IDA was able to identify 164 functions, some of which, like os_memcpy and udp_cksum, are quite useful.

Of course, FLIRT signatures will only identify functions that start with the same sequence of instructions, and many of the standard POSIX functions, such as printf and strcmp, were not found.

Because FLIRT signatures are built mainly from the first 32 bytes of a function (plus a CRC16 over a limited run of the bytes that follow), there are also many signature collisions between similar functions, which can be problematic:

;--------- (delete these lines to allow sigmake to read this file)
; add '+' at the start of a line to select a module
; add '-' if you are not sure about the selection
; do nothing if you want to exclude all modules

div_r                                               54 B8C8 00000000000000000085001A0000081214A00002002010210007000D2401FFFF
ldiv_r                                              54 B8C8 00000000000000000085001A0000081214A00002002010210007000D2401FFFF

proc_sname                                          00 0000 0000102127BDFEF803E0000827BD0108................................
proc_file                                           00 0000 0000102127BDFEF803E0000827BD0108................................

atoi                                                00 0000 000028250809F52A2406000A........................................
atol                                                00 0000 000028250809F52A2406000A........................................

PinChecksum                                         FF 5EB5 00044080010440213C046B5F000840403484CA6B010400193C0ECCCC35CECCCD
wps_checksum1                                       FF 5EB5 00044080010440213C046B5F000840403484CA6B010400193C0ECCCC35CECCCD
wps_checksum2                                       FF 5EB5 00044080010440213C046B5F000840403484CA6B010400193C0ECCCC35CECCCD

_d_cmp                                              FC 1FAF 0004CD02333907FF240F07FF172F000A0006CD023C18000F3718FFFF2419FFFF
_d_cmpe                                             FC 1FAF 0004CD02333907FF240F07FF172F000A0006CD023C18000F3718FFFF2419FFFF

_f_cmp                                              A0 C947 0004CDC2333900FF241800FF173800070005CDC23C19007F3739FFFF0099C824
_f_cmpe                                             A0 C947 0004CDC2333900FF241800FF173800070005CDC23C19007F3739FFFF0099C824

m_get                                               00 0000 00803021000610423C04803D8C8494F0................................
m_gethdr                                            00 0000 00803021000610423C04803D8C8494F0................................
m_getclr                                            00 0000 00803021000610423C04803D8C8494F0................................

...
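
Collisions like these can also be spotted programmatically. The following is a minimal sketch, not part of sigmake or IDA, that assumes each entry has the four whitespace-separated fields seen in the listing above (name, CRC length, CRC16, pattern bytes) and groups names that share all three signature fields. The function name find_collisions and the sample text are made up for illustration.

```python
from collections import defaultdict

def find_collisions(exc_text):
    """Group function names whose signature fields are identical."""
    groups = defaultdict(list)
    for line in exc_text.splitlines():
        line = line.strip()
        # Skip blank lines and the ';' comment header.
        if not line or line.startswith(";"):
            continue
        fields = line.split()
        # Assumed layout: name, CRC length, CRC16, pattern bytes.
        if len(fields) != 4:
            continue
        name, crc_len, crc16, pattern = fields
        groups[(crc_len, crc16, pattern)].append(name)
    # Only signatures shared by more than one name are collisions.
    return [names for names in groups.values() if len(names) > 1]

sample = """\
; add '+' at the start of a line to select a module
atoi    00 0000 000028250809F52A2406000A........................................
atol    00 0000 000028250809F52A2406000A........................................
m_get   00 0000 00803021000610423C04803D8C8494F0................................
"""

collisions = find_collisions(sample)
print(collisions)
```

Each inner list is a set of names that the signature alone cannot tell apart.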

Alternative Signature Approaches

Examining the functions in the two VxWorks firmwares shows that a small fraction (about 3%) of unique subroutines are identical in both firmware images:

bcopy from the VxWorks 5.5 firmware

bcopy from the VxWorks 5.5.1 firmware

Signatures can be created over the entirety of these functions in order to generate more accurate fingerprints, without the possibility of collisions due to similar or identical function prologues in unrelated subroutines.
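
As a rough illustration of this whole-function idea, here is a sketch that hashes every byte of each function and only trusts digests that are unique within their own image. It is not the plugin's actual code: the use of SHA-256, the helper names, and the byte strings and sub_ addresses in the example are all assumptions made for illustration. A real implementation would presumably read the bytes from the IDB, and may need to account for absolute addresses that differ between images.

```python
import hashlib
from collections import Counter

def formal_signatures(functions):
    """Map digest -> name for functions whose bytes are unique in this image."""
    digests = {name: hashlib.sha256(code).hexdigest()
               for name, code in functions.items()}
    counts = Counter(digests.values())
    # Drop digests shared by several functions (e.g. identical stubs).
    return {d: name for name, d in digests.items() if counts[d] == 1}

def match_formal(known, unknown):
    """Return {unknown_name: known_name} for byte-identical unique functions."""
    known_sigs = formal_signatures(known)
    unknown_sigs = formal_signatures(unknown)
    shared = known_sigs.keys() & unknown_sigs.keys()
    return {unknown_sigs[d]: known_sigs[d] for d in shared}

# Made-up function bytes; the names and addresses are hypothetical.
stub = b"\x03\xe0\x00\x08\x00\x00\x00\x00"
body = b"\x10\x20\x30\x40\x50\x60\x70\x80"
known = {"bcopy": body, "stub_a": stub, "stub_b": stub}
unknown = {"sub_80001000": body, "sub_80002000": stub,
           "sub_80003000": b"\xaa\xbb\xcc\xdd"}

matches = match_formal(known, unknown)
print(matches)
```

The two identical stubs are deliberately left unmatched, since a signature that fits more than one known function cannot name anything reliably.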

Still other functions are very nearly identical, as exemplified by the following functions which only differ by a couple of instructions:

A function from the VxWorks 5.5 firmware

The same function, in the VxWorks 5.5.1 firmware

A simple way to identify these similar, but not identical, functions in an architecture independent manner is to generate “fuzzy” signatures based only on easily identifiable actions, such as memory accesses, references to constant values, and function calls.

In the above function for example, we can see that there are six code blocks, one which references the immediate value 0xFFFFFFFF, one which has a single function call, and one which contains two function calls. As long as no other functions match this “fuzzy” signature, we can use these unique metrics to identify this same function in other IDBs. Although this type of matching can catch functions that would otherwise go unidentified, it also has a higher propensity for false positives.
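
One way to picture such a “fuzzy” signature is sketched below. This is only an illustration under simplifying assumptions: each code block is reduced to a call count and a list of immediate values (memory accesses are left out), and the function names, addresses, and block data are invented. The plugin's real metrics may well differ.

```python
from collections import Counter

def fuzzy_signature(blocks):
    """Order-independent summary: (call count, immediates) for each block."""
    return tuple(sorted((b["calls"], tuple(sorted(b["immediates"])))
                        for b in blocks))

def unique_index(functions):
    """Map signature -> name, keeping only signatures unique in the image."""
    sigs = {name: fuzzy_signature(blocks) for name, blocks in functions.items()}
    counts = Counter(sigs.values())
    return {sig: name for name, sig in sigs.items() if counts[sig] == 1}

def match_fuzzy(known, unknown):
    """Return {unknown_name: known_name} for matching unique signatures."""
    known_index = unique_index(known)
    unknown_index = unique_index(unknown)
    shared = known_index.keys() & unknown_index.keys()
    return {unknown_index[s]: known_index[s] for s in shared}

def block(calls=0, immediates=()):
    """Helper to describe one code block (hypothetical representation)."""
    return {"calls": calls, "immediates": list(immediates)}

# Six blocks: one references 0xFFFFFFFF, one makes one call, one makes two.
known = {
    "hypothetical_func": [block(), block(immediates=[0xFFFFFFFF]),
                          block(calls=1), block(calls=2), block(), block()],
    "other_func": [block(calls=1)],
}
# Same metrics, but the blocks appear in a different order.
unknown = {
    "sub_80040000": [block(), block(calls=2), block(), block(calls=1),
                     block(immediates=[0xFFFFFFFF]), block()],
    "sub_80050000": [block(), block()],
}

matches = match_fuzzy(known, unknown)
print(matches)
```

Sorting the per-block tuples makes the signature insensitive to block ordering, which is part of why this style of matching is both more tolerant and more prone to false positives.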

A somewhat more reliable metric is a unique string reference, such as this one in gethostbyname:

gethostbyname string xref

Unique constants can likewise be used for function identification, particularly for subroutines related to crypto or hashing:

Constant 0x41C64E6D used by rand
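
Matching on unique strings and constants might look something like the sketch below. It assumes each function has already been reduced to the set of strings and constants it references; the helper names, the sub_ addresses, and the placeholder string are hypothetical, and only the constant 0x41C64E6D comes from the example above.

```python
from collections import defaultdict

def unique_refs(functions):
    """Map each reference used by exactly one function to that function."""
    owners = defaultdict(set)
    for name, refs in functions.items():
        for ref in refs:
            owners[ref].add(name)
    return {ref: next(iter(names))
            for ref, names in owners.items() if len(names) == 1}

def match_by_refs(known, unknown):
    """Return {unknown_name: known_name} via references unique in both images."""
    known_refs = unique_refs(known)
    unknown_refs = unique_refs(unknown)
    matches = {}
    for ref in known_refs.keys() & unknown_refs.keys():
        matches[unknown_refs[ref]] = known_refs[ref]
    return matches

# Hypothetical reference sets; "%s" is shared, so it identifies nothing.
known = {
    "rand": {0x41C64E6D},
    "gethostbyname": {"placeholder unique string"},
    "log_a": {"%s"},
    "log_b": {"%s"},
}
unknown = {
    "sub_80010000": {0x41C64E6D},
    "sub_80020000": {"placeholder unique string", "%s"},
    "sub_80030000": {"%s"},
}

matches = match_by_refs(known, unknown)
print(matches)
```

This sketch does not resolve the case where two different unique references point the same unknown function at different names; a real tool would need a policy for that.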

Even identifying functions whose names we don’t know can be useful. Consider the following code snippet in sub_801A50E0, from the VxWorks 5.5 firmware:

Function calls from sub_801A50E0

This unidentified function calls memset, strcpy, atoi, and sprintf; hence, if we can find this same function in other VxWorks firmware, we can identify these standard functions by association.
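
Identification by association could be sketched roughly as follows, assuming the same function has already been located in both images and that its calls can be listed in order. The callee addresses are invented for illustration, and the positional pairing is a simplifying assumption rather than a description of how the plugin does it.

```python
def names_by_association(known_calls, unknown_calls):
    """Pair callees positionally; return {unknown_callee: known_name}."""
    # If the call sequences differ in length, the pairing is not trustworthy.
    if len(known_calls) != len(unknown_calls):
        return {}
    renames = {}
    for known, unknown in zip(known_calls, unknown_calls):
        # One unknown callee must always map to the same known name.
        if renames.setdefault(unknown, known) != known:
            return {}
    return renames

# Calls made by the matched function in each image (addresses hypothetical).
known_calls = ["memset", "strcpy", "atoi", "sprintf"]
unknown_calls = ["sub_80011000", "sub_80012000", "sub_80013000", "sub_80014000"]

renames = names_by_association(known_calls, unknown_calls)
print(renames)
```

Bailing out on any inconsistency keeps the sketch conservative: a wrong name propagated by association would be harder to notice than a missing one.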

Alternative Signatures in Practice

I wrote an IDA plugin to automate these signature techniques and apply them to the VxWorks 5.5.1 firmware:

Output from the Rizzo plugin

This identified nearly 1,300 functions, and although some of those are probably incorrect, it was quite successful in locating many standard POSIX functions:

Functions identified by Rizzo

Like any such automated process, this is sure to produce some false positives/negatives, but having used it successfully against several RTOS firmwares now, I’m quite happy with it (read: “it works for me”!).

11 Responses to A Code Signature Plugin for IDA

  1. Cybergibbons says:

    Interesting technique – would be good to try this for some other common base firmwares.

    It’s raised something I’ve been wondering though – why are bzero() and bcopy() still in fairly common use on embedded systems?

    • Craig says:

      I’ve used it against other RTOS’s including eCos and SuperTask! and it’s worked pretty well. I’ve also used it against some statically compiled Linux binaries too. Of course, as with any such tool, its level of effectiveness is determined primarily by the similarity between the two code bases you’re comparing, but I’m working on improving the plugin’s effectiveness.

      As for bzero/bcopy, I can’t say that I’ve really seen them used that much in embedded systems myself, but most RTOS’s will at least include them for backwards compatibility and to ease the porting of code.

  2. commial says:

    Nice approach !

    I worked on this problem too, ending on a dynamic fuzzing approach. Did you look at ?

    Your method seems more appropriate for similar binaries 🙂

  3. jeffball says:

    I tried something similar for working with Cisco IOS images. I mostly used string references and function calls. It mostly works, but I’ve only really tried it on Cisco IOS images, so I can’t say much for how useful it is to other firmware images.

    https://github.com/jeffball55/cisco_tools/blob/master/find_func_names.py

  4. Scott says:

    We currently use BinDiff to perform a similar task. This plugin looks cool, but outside of saving the $400 license fee for BinDiff, what does this plugin do that existing tools don’t already do?

    Don’t get me wrong. I appreciate the work you put in to making this available to the community. I am just trying to understand how I might make the most effective use of this tool in our existing process.

    Thanks!

  6. CT says:

    This tool is literally the best thing. This is infinitely better than the Flare signature tool I was trying to use. Thank you very much 😀
