fn_fuzzy: Fast Multiple Binary Diffing Triage with IDA

PRESENTATION SLIDES (PDF)

IDA Pro is the de facto standard disassembler for malware reverse engineers. It saves analysis findings, such as function names, in a corresponding database file (IDB). When analyzing a new malware variant, analysts can import those findings by comparing it against previously analyzed IDBs, which lets them focus on the new functions.

With many IDBs, however, importing from them is not straightforward. Experienced reverse engineers have hundreds, if not thousands, of IDBs and typically don’t remember code they analyzed a few years ago. A tool is therefore needed to quickly identify the most similar, already-analyzed IDBs.

BinDiff and Diaphora are great binary diffing tools for comparing IDBs. I was able to automate BinDiff operations by writing a wrapper script. Unfortunately, the process was slow: comparing about 100 samples took 300-800 seconds in the virtual machine I use for analysis. Kam1n0 provides a one-to-many comparison capability. Although it can precisely match a highly obfuscated function to its original, comparing 20 samples took over an hour in the same VM. What I wanted was a fast, lightweight binary diffing tool for large IDB collections. Enter fn_fuzzy! It calculates two kinds of fuzzy hashes for each function in an IDB:

ssdeep hash value of code bytes – Relocation (fixup) bytes, direct memory reference data, and other ignorable bytes are excluded from the calculation.
Machoc hash value of call flow graph – The Machoc value is used to correct the ssdeep result when a function’s code bytes are few or polymorphically generated.
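
As a rough illustration of the two inputs described above, here is a minimal Python sketch. It is not fn_fuzzy's actual code: the helper names (`strip_ignorable`, `cfg_signature`), the sample bytes, and the signature layout are all assumptions made for this example. In the real tool the stripped bytes would be passed to ssdeep; that step is left out so the sketch needs only the standard library, and a truncated SHA-1 stands in for whatever hash a real Machoc implementation applies.

```python
import hashlib
from typing import Dict, List, Set


def strip_ignorable(code: bytes, ignorable_offsets: Set[int]) -> bytes:
    """Drop bytes at offsets that vary between builds (e.g. fixups)."""
    return bytes(b for i, b in enumerate(code) if i not in ignorable_offsets)


def cfg_signature(blocks: Dict[int, List[int]], call_blocks: Set[int]) -> str:
    """Serialize a graph of basic blocks in a Machoc-like text form.

    Blocks are renumbered 1..n in address order so the result does not
    depend on the load address. A 'c' marks a block that contains a call.
    This layout is an assumption, not the exact Machoc format.
    """
    order = {addr: n for n, addr in enumerate(sorted(blocks), start=1)}
    parts = []
    for addr in sorted(blocks):
        tokens = ["c"] if addr in call_blocks else []
        tokens += [str(order[s]) for s in sorted(blocks[addr])]
        parts.append(f"{order[addr]}:{','.join(tokens)}")
    return ";".join(parts)


def short_hash(signature: str) -> str:
    """Stand-in hash (truncated SHA-1), used only to keep this stdlib-only."""
    return hashlib.sha1(signature.encode()).hexdigest()[:8]


# Two hypothetical variants of one function: only the 4-byte call operand
# (offsets 4..7) differs, as it would after relocation.
code_a = bytes.fromhex("558bece8112233445dc3")
code_b = bytes.fromhex("558bece8aabbccdd5dc3")
fixups = {4, 5, 6, 7}

# The same three-block graph at two different base addresses.
blocks = {0x1000: [0x1010, 0x1020], 0x1010: [0x1020], 0x1020: []}
shifted = {0x5000: [0x5010, 0x5020], 0x5010: [0x5020], 0x5020: []}

same_bytes = strip_ignorable(code_a, fixups) == strip_ignorable(code_b, fixups)
same_graph = short_hash(cfg_signature(blocks, {0x1000})) == short_hash(
    cfg_signature(shifted, {0x5000})
)
```

Both `same_bytes` and `same_graph` come out true here, which is the point of each hash: the byte view ignores relocated operands, and the graph view still matches when the bytes themselves are too short or too variable to compare.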

All hashes are saved into a single database file, which is then used for comparison. Function names and prototypes can be imported from all IDBs into the target at once.
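
The single-database idea can be sketched as follows. This is a hypothetical schema and hypothetical sample data, not fn_fuzzy's real storage format: one SQLite table holds a row per function per IDB, a join on equal graph hashes ranks previously analyzed IDBs against the target, and a second query lists analyst-assigned names that could be imported. Fuzzy (ssdeep) similarity scoring is omitted; only exact hash matches are used here.

```python
import sqlite3


def build_db(rows):
    """Create an in-memory table of (idb, function name, graph hash)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE fn (idb TEXT, name TEXT, cfg_hash TEXT)")
    con.executemany("INSERT INTO fn VALUES (?, ?, ?)", rows)
    return con


def rank_idbs(con, target):
    """Rank other IDBs by how many functions share a hash with the target."""
    query = """
        SELECT other.idb, COUNT(*) AS matched
        FROM fn AS t
        JOIN fn AS other
          ON other.cfg_hash = t.cfg_hash AND other.idb != t.idb
        WHERE t.idb = ?
        GROUP BY other.idb
        ORDER BY matched DESC, other.idb
    """
    return con.execute(query, (target,)).fetchall()


def names_to_import(con, target):
    """List (target name, known name, source IDB) for still-unnamed functions."""
    query = """
        SELECT t.name, other.name, other.idb
        FROM fn AS t
        JOIN fn AS other
          ON other.cfg_hash = t.cfg_hash AND other.idb != t.idb
        WHERE t.idb = ?
          AND substr(t.name, 1, 4) = 'sub_'
          AND substr(other.name, 1, 4) != 'sub_'
        ORDER BY t.name, other.idb
    """
    return con.execute(query, (target,)).fetchall()


# Hypothetical data: two analyzed IDBs and one new, unanalyzed target.
rows = [
    ("old_a.idb", "decrypt_config", "h1"),
    ("old_a.idb", "send_beacon", "h2"),
    ("old_b.idb", "decrypt_config", "h1"),
    ("old_b.idb", "sub_402000", "h9"),
    ("new.idb", "sub_401000", "h1"),
    ("new.idb", "sub_401200", "h2"),
    ("new.idb", "sub_401400", "h3"),
]
con = build_db(rows)
ranking = rank_idbs(con, "new.idb")
candidates = names_to_import(con, "new.idb")
```

With this data, `ranking` puts `old_a.idb` first because two of the target's functions match it, and `candidates` pairs the unnamed `sub_` functions with the names found in the older IDBs. Keeping every IDB's hashes in one table is what makes the one-to-many lookup a single query rather than one diff run per pair.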

MAIN CONFERENCE

Location: Track 2
Date: May 9, 2019
Time: 11:30 am - 12:30 pm

Takahiro Haruyama