Commit d1fcaa5

Author: Daniel Plohmann (jupiter)

Disassembler now emits an object instead of JSON report - preparing PyPI launch

1 parent f63abad, commit d1fcaa5

32 files changed (+886, -432 lines)

LICENSE

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,4 +1,4 @@
1-
Copyright (c) 2018, Daniel Plohmann and Steffen Enders
1+
Copyright (c) 2018-2020, Daniel Plohmann and Steffen Enders
22

33
All rights reserved.
44

README.md

Lines changed: 46 additions & 8 deletions
@@ -7,32 +7,70 @@ As input, arbitrary memory dumps (ideally with known base address) can be proces
77
The output is a collection of functions, basic blocks, and instructions with their respective edges between blocks and functions (in/out).
88
Optionally, references to the Windows API can be inferred by using the ApiScout method.
99

10-
To get an impression how to work with the library, check the demo script:
10+
## Installation
1111

12-
* analyze.py -- example usage: perform disassembly and optionally store results in JSON to a given output path.
12+
With version 1.2.0, we have finally simplified things by moving to PyPI!
13+
So installation now is as easy as:
14+
15+
```
16+
$ pip install smda
17+
```
18+
19+
## Usage
20+
21+
A typical workflow using SMDA could look like this:
22+
23+
```
24+
>>> from smda.Disassembler import Disassembler
25+
>>> disassembler = Disassembler()
26+
>>> report = disassembler.disassembleFile("/bin/cat")
27+
>>> print(report)
28+
0.777s -> (architecture: intel.64bit, base_addr: 0x00000000): 86 functions
29+
>>> for fn in report.getFunctions():
30+
... print(fn)
31+
... for ins in fn.getInstructions():
32+
... print(ins)
33+
...
34+
0x00001720: (-> 1, 1->) 3 blocks, 7 instructions.
35+
0x00001720: ( 4883ec08) - sub rsp, 8
36+
0x00001724: (488b05bd682000) - mov rax, qword ptr [rip + 0x2068bd]
37+
0x0000172b: ( 4885c0) - test rax, rax
38+
0x0000172e: ( 7402) - je 0x1732
39+
0x00001730: ( ffd0) - call rax
40+
0x00001732: ( 4883c408) - add rsp, 8
41+
0x00001736: ( c3) - ret
42+
0x00001ad0: (-> 1, 4->) 1 blocks, 12 instructions.
43+
[...]
44+
>>> json_report = report.toDict()
45+
```
46+
47+
There is also a demo script:
48+
49+
* analyze.py -- example usage: perform disassembly on a file or memory dump and optionally store results in JSON to a given output path.
1350

1451
The code should be fully compatible with Python 2 and 3.
1552
Further explanations of the inner workings will follow in separate publications and will be referenced here.
1653

17-
To take full advantage of SMDA's capabilities, optionally install:
54+
To take full advantage of SMDA's capabilities, make sure to (optionally) install:
1855
* lief
1956
* pdbparse (currently as fork from https://github.com/VPaulV/pdbparse to support Python3)
2057

2158
## Version History
2259

23-
* 2020-04-28: Several improvements, including: x64 jump table handling, better data flow handling for calls using registers and tailcalls, extended list of common prologues based on much more groundtruth data, extended padding instruction list for gap function discovery, adjusted weights in candidate priority score, filtering code areas based on section tables, using exported symbols as candidates, new function output metadata: confidence score based on instruction mnemonic histogram, PIC hash based on escaped binary instruction sequence
60+
* 2020-04-29: v1.2.0 - Restructured config.py into smda/SmdaConfig.py to simplify usage; SMDA is now available via PyPI! The smda/Disassembler.py now emits a report object (smda.common.SmdaReport) that allows direct (pythonic) interaction with the results - a JSON can still be easily generated by using toDict() on the report.
61+
* 2020-04-28: v1.1.0 - Several improvements, including: x64 jump table handling, better data flow handling for calls using registers and tailcalls, extended list of common prologues based on much more groundtruth data, extended padding instruction list for gap function discovery, adjusted weights in candidate priority score, filtering code areas based on section tables, using exported symbols as candidates, new function output metadata: confidence score based on instruction mnemonic histogram, PIC hash based on escaped binary instruction sequence
2462
* 2020-03-10: Various minor fixes and QoL improvements.
2563
* 2019-08-20: IdaExporter is now handling failed instruction conversion via capstone properly.
2664
* 2019-08-19: Minor fix for crashes caused by PDB parser.
27-
* 2019-08-05: SMDA can now export reports from IDA Pro (requires capstone to be available for idapython).
65+
* 2019-08-05: v1.0.3 - SMDA can now export reports from IDA Pro (requires capstone to be available for idapython).
2866
* 2019-06-13: PDB symbols for functions are now resolved if given a PDB file using parameter "-d" (THX to @VPaulV).
2967
* 2019-05-15: Fixed a bug in PE mapper where buffer would be shortened because of misinterpretation of section sizes.
30-
* 2019-01-28: ELF symbols for functions are now resolved, if present in the file. Also "-m" parameter changed to "-p" to imply parsing instead of just mapping (THX: @VPaulV).
68+
* 2019-02-14: v1.0.2 - ELF symbols for functions are now resolved, if present in the file. Also "-m" parameter changed to "-p" to imply parsing instead of just mapping (THX: @VPaulV).
3169
* 2018-12-12: all gcc jump table styles are now parsed correctly.
3270
* 2018-11-26: Better handling of multibyte NOPs, ELF loader now provides base addr.
3371
* 2018-09-28: We now have functional PE/ELF loaders.
34-
* 2018-07-09: Performance improvements.
35-
* 2018-07-01: Initial Release.
72+
* 2018-07-09: v1.0.1 - Performance improvements.
73+
* 2018-07-01: v1.0.0 - Initial Release.
3674

3775

3876
## Credits

analyze.py

Lines changed: 16 additions & 12 deletions
@@ -4,24 +4,21 @@
44
import os
55
import re
66

7-
import config
7+
from smda.SmdaConfig import SmdaConfig
88
from smda.Disassembler import Disassembler
99

10-
LOGGER = logging.getLogger(__name__)
11-
12-
1310
def parseBaseAddrFromArgs(args):
1411
if args.base_addr:
1512
parsed_base_addr = int(args.base_addr, 16) if args.base_addr.startswith("0x") else int(args.base_addr)
16-
LOGGER.info("using provided base address: 0x%08x %d", parsed_base_addr, parsed_base_addr)
13+
logging.info("using provided base address: 0x%08x %d", parsed_base_addr, parsed_base_addr)
1714
return parsed_base_addr
1815
# try to infer base addr from filename:
1916
baddr_match = re.search(re.compile("0x(?P<base_addr>[0-9a-fA-F]{8,16})$"), args.input_path)
2017
if baddr_match:
2118
parsed_base_addr = int(baddr_match.group("base_addr"), 16)
22-
LOGGER.info("Parsed base address from file name: 0x%08x %d", parsed_base_addr, parsed_base_addr)
19+
logging.info("Parsed base address from file name: 0x%08x %d", parsed_base_addr, parsed_base_addr)
2320
return parsed_base_addr
24-
LOGGER.warning("No base address recognized, using 0.")
21+
logging.warning("No base address recognized, using 0.")
2522
return 0
2623

2724

@@ -42,20 +39,27 @@ def readFileContent(file_path):
4239

4340
ARGS = PARSER.parse_args()
4441
if ARGS.input_path:
45-
REPORT = {}
42+
SMDA_REPORT = None
4643
INPUT_FILENAME = ""
4744
if os.path.isfile(ARGS.input_path):
45+
# optionally create and set up a config, e.g. when using ApiScout profiles for WinAPI import usage discovery
46+
config = SmdaConfig()
47+
config.API_COLLECTION_FILES = {
48+
"win_7": os.sep.join([config.PROJECT_ROOT, "data", "apiscout_win7_prof-n_sp1.json"])
49+
}
4850
DISASSEMBLER = Disassembler(config)
4951
print("now analyzing {}".format(ARGS.input_path))
5052
INPUT_FILENAME = os.path.basename(ARGS.input_path)
5153
if ARGS.parse_header:
52-
REPORT = DISASSEMBLER.disassembleFile(ARGS.input_path, pdb_path=ARGS.pdb_path)
54+
SMDA_REPORT = DISASSEMBLER.disassembleFile(ARGS.input_path, pdb_path=ARGS.pdb_path)
5355
else:
5456
BUFFER = readFileContent(ARGS.input_path)
5557
BASE_ADDR = parseBaseAddrFromArgs(ARGS)
56-
REPORT = DISASSEMBLER.disassembleBuffer(BUFFER, BASE_ADDR)
57-
if REPORT and os.path.isdir(ARGS.output_path):
58+
SMDA_REPORT = DISASSEMBLER.disassembleBuffer(BUFFER, BASE_ADDR)
59+
SMDA_REPORT.filename = os.path.basename(ARGS.input_path)
60+
print(SMDA_REPORT)
61+
if SMDA_REPORT and os.path.isdir(ARGS.output_path):
5862
with open(ARGS.output_path + os.sep + INPUT_FILENAME + ".smda", "w") as fout:
59-
json.dump(REPORT, fout, indent=1, sort_keys=True)
63+
json.dump(SMDA_REPORT.toDict(), fout, indent=1, sort_keys=True)
6064
else:
6165
PARSER.print_help()
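
The base-address handling in analyze.py can be illustrated in isolation. The following is a minimal sketch, assuming a hypothetical helper `infer_base_addr` that takes plain strings instead of the argparse namespace; it is not part of SMDA. It shows the precedence used in `parseBaseAddrFromArgs`: an explicit argument first, then a hexadecimal suffix of 8 to 16 digits at the very end of the input path, then 0 as fallback.

```python
import re

# Same pattern as in analyze.py; the "$" anchors the address to the end of the path.
BASE_ADDR_PATTERN = re.compile(r"0x(?P<base_addr>[0-9a-fA-F]{8,16})$")


def infer_base_addr(input_path, base_addr_arg=None):
    """Hypothetical helper: explicit argument first, then file name suffix, then 0."""
    if base_addr_arg:
        # accept hexadecimal ("0x400000") as well as decimal ("4194304") notation
        if base_addr_arg.startswith("0x"):
            return int(base_addr_arg, 16)
        return int(base_addr_arg)
    match = BASE_ADDR_PATTERN.search(input_path)
    if match:
        return int(match.group("base_addr"), 16)
    # nothing recognized, fall back to a base address of 0
    return 0


print(hex(infer_base_addr("dump_0x00400000")))
```

Because the pattern is anchored with `$`, a name such as `dump_0x00400000.bin` does not match and falls back to 0, so memory dumps are expected to carry the address as the last part of their name.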

config.py

Lines changed: 0 additions & 35 deletions
This file was deleted.

export.py

Lines changed: 2 additions & 5 deletions
@@ -2,11 +2,8 @@
22
import logging
33
import os
44

5-
import config
65
from smda.Disassembler import Disassembler
76

8-
LOGGER = logging.getLogger(__name__)
9-
107

118
def detectBackend():
129
backend = ""
@@ -25,11 +22,11 @@ def detectBackend():
2522
if __name__ == "__main__":
2623
BACKEND, VERSION = detectBackend()
2724
if BACKEND == "IDA":
28-
DISASSEMBLER = Disassembler(config, backend=BACKEND)
25+
DISASSEMBLER = Disassembler(backend=BACKEND)
2926
REPORT = DISASSEMBLER.disassembleBuffer(None, None)
3027
output_path = idautils.GetIdbDir()
3128
with open(output_path + ".smda", "w") as fout:
3229
json.dump(REPORT.toDict(), fout, indent=1, sort_keys=True)
33-
LOGGER.info("Output saved to: %s.smda", output_path)
30+
logging.info("Output saved to: %s.smda", output_path)
3431
else:
3532
raise Exception("No supported backend found.")
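
Since the disassembler now returns a report object rather than a dict, the report has to be converted with `toDict()` before it is written as JSON. The following is a minimal sketch of that step, assuming a hypothetical helper `save_report` and a stand-in class `FakeReport` whose only purpose is to make the example runnable without SMDA installed; the dictionary values are placeholders, not real SMDA output.

```python
import json
import os
import tempfile


def save_report(report, output_dir, input_filename):
    """Hypothetical helper: write report.toDict() as JSON and return the file path."""
    output_path = os.path.join(output_dir, input_filename + ".smda")
    # text mode: json.dump writes str, which a file opened with "wb" rejects on Python 3
    with open(output_path, "w") as fout:
        json.dump(report.toDict(), fout, indent=1, sort_keys=True)
    return output_path


class FakeReport(object):
    """Stand-in for smda.common.SmdaReport; only toDict() is assumed to exist."""

    def toDict(self):
        # placeholder values, chosen for illustration only
        return {"status": "ok", "base_addr": 0, "architecture": "intel"}


output_dir = tempfile.mkdtemp()
saved_path = save_report(FakeReport(), output_dir, "cat")
with open(saved_path) as fin:
    loaded = json.load(fin)
print(saved_path, loaded)
```

The helper only relies on the object exposing `toDict()`, so the same code path would serve both analyze.py and the IDA export script.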

setup.py

Lines changed: 20 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,23 +1,35 @@
11
# -*- coding: utf-8 -*-
22

33
from setuptools import setup, find_packages
4-
import config
5-
6-
7-
with open('README.rst') as f:
8-
README = f.read()
94

105
with open('LICENSE') as f:
116
LICENSE = f.read()
127

8+
with open("README.md", "r") as fh:
9+
long_description = fh.read()
10+
1311
setup(
1412
name='smda',
15-
version=config.VERSION,
13+
# note to self: always change this in config as well.
14+
version='1.2.0',
1615
description='A recursive disassembler optimized for CFG recovery from memory dumps. Based on capstone.',
17-
long_description=README,
16+
long_description_content_type="text/markdown",
17+
long_description=long_description,
1818
author='Daniel Plohmann',
1919
author_email='[email protected]',
2020
url='https://github.com/danielplohmann/smda',
2121
license=LICENSE,
22-
packages=find_packages(exclude=('tests', 'docs'))
22+
packages=find_packages(exclude=('tests', 'docs')),
23+
data_files = [
24+
('', ['LICENSE']),
25+
],
26+
classifiers=[
27+
"Development Status :: 4 - Beta",
28+
"License :: OSI Approved :: BSD License",
29+
"Operating System :: OS Independent",
30+
"Programming Language :: Python :: 2.7",
31+
"Programming Language :: Python :: 3",
32+
"Topic :: Security",
33+
"Topic :: Software Development :: Disassemblers",
34+
],
2335
)

smda/Disassembler.py

Lines changed: 35 additions & 56 deletions
@@ -7,22 +7,28 @@
77
from .intel.IntelDisassembler import IntelDisassembler
88
from .ida.IdaExporter import IdaExporter
99
from smda.utility.FileLoader import FileLoader
10+
from smda.SmdaConfig import SmdaConfig
11+
from smda.common.BinaryInfo import BinaryInfo
12+
from smda.common.SmdaReport import SmdaReport
1013

1114
class Disassembler(object):
1215

13-
def __init__(self, config, backend="intel"):
16+
def __init__(self, config=None, backend="intel"):
17+
if config is None:
18+
config = SmdaConfig()
1419
self.config = config
1520
self.disassembler = None
1621
if backend == "intel":
17-
self.disassembler = IntelDisassembler(config)
22+
self.disassembler = IntelDisassembler(self.config)
1823
elif backend == "IDA":
19-
self.disassembler = IdaExporter(config)
20-
self.disassembly = None
24+
self.disassembler = IdaExporter(self.config)
2125
self._start_time = None
2226
self._timeout = 0
27+
# cache the last DisassemblyResult
28+
self.disassembly = None
2329

2430
def _getDurationInSeconds(self, start_ts, end_ts):
25-
return (self.analysis_end_ts - self.analysis_start_ts).seconds + ((self.analysis_end_ts - self.analysis_start_ts).microseconds / 1000000.0)
31+
return (end_ts - start_ts).seconds + ((end_ts - start_ts).microseconds / 1000000.0)
2632

2733
def _callbackAnalysisTimeout(self):
2834
if not self._timeout:
@@ -32,70 +38,43 @@ def _callbackAnalysisTimeout(self):
3238

3339
def disassembleFile(self, file_path, pdb_path=""):
3440
loader = FileLoader(file_path, map_file=True)
35-
base_addr = loader.getBaseAddress()
36-
bitness = loader.getBitness()
3741
file_content = loader.getData()
38-
code_areas = loader.getCodeAreas()
42+
binary_info = BinaryInfo(file_content)
43+
binary_info.file_path = file_path
44+
binary_info.base_addr = loader.getBaseAddress()
45+
binary_info.bitness = loader.getBitness()
46+
binary_info.code_areas = loader.getCodeAreas()
3947
start = datetime.datetime.utcnow()
4048
try:
41-
self.disassembler.setFilePath(file_path)
42-
self.disassembler.addPdbFile(pdb_path, base_addr)
43-
self.disassembler.setCodeAreas(code_areas)
44-
disassembly = self.disassemble(file_content, base_addr, bitness=bitness, timeout=self.config.TIMEOUT)
45-
print(disassembly)
46-
report = self.getDisassemblyReport(disassembly)
47-
report["metadata"]["filename"] = os.path.basename(file_path)
49+
self.disassembler.addPdbFile(binary_info, pdb_path)
50+
smda_report = self._disassemble(binary_info, timeout=self.config.TIMEOUT)
4851
except Exception as exc:
4952
print("-> an error occurred (", str(exc), ").")
50-
report = {"status":"error", "meta": {"traceback": traceback.format_exc(exc)}, "execution_time": self._getDurationInSeconds(start, datetime.datetime.utcnow())}
51-
return report
53+
smda_report = self._createErrorReport(start, exc)
54+
return smda_report
5255

5356
def disassembleBuffer(self, file_content, base_addr, bitness=None):
5457
start = datetime.datetime.utcnow()
5558
try:
56-
self.disassembler.setFilePath("")
57-
disassembly = self.disassemble(file_content, base_addr, bitness, timeout=self.config.TIMEOUT)
58-
print(disassembly)
59-
report = self.getDisassemblyReport(disassembly)
60-
report["metadata"]["filename"] = ""
59+
binary_info = BinaryInfo(file_content)
60+
binary_info.base_addr = base_addr
61+
binary_info.bitness = bitness
62+
smda_report = self._disassemble(binary_info, timeout=self.config.TIMEOUT)
6163
except Exception as exc:
6264
print("-> an error occurred (", str(exc), ").")
63-
report = {"status":"error", "meta": {"traceback": traceback.format_exc(exc)}, "execution_time": self._getDurationInSeconds(start, datetime.datetime.utcnow())}
64-
return report
65+
smda_report = self._createErrorReport(start, exc)
66+
return smda_report
6567

66-
def disassemble(self, binary, base_addr, bitness=None, timeout=0):
68+
def _disassemble(self, binary_info, timeout=0):
6769
self._start_time = datetime.datetime.utcnow()
6870
self._timeout = timeout
69-
self.disassembly = self.disassembler.analyzeBuffer(binary, base_addr, bitness, self._callbackAnalysisTimeout)
70-
return self.disassembly
71+
self.disassembly = self.disassembler.analyzeBuffer(binary_info, self._callbackAnalysisTimeout)
72+
return SmdaReport(self.disassembly)
7173

72-
def getDisassemblyReport(self, disassembly=None):
73-
report = {}
74-
if disassembly is None:
75-
if self.disassembly is not None:
76-
disassembly = self.disassembly
77-
else:
78-
return {}
79-
stats = DisassemblyStatistics(disassembly)
80-
report = {
81-
"architecture": disassembly.architecture,
82-
"base_addr": disassembly.base_addr,
83-
"bitness": disassembly.bitness,
84-
"buffer_size": len(disassembly.binary),
85-
"code_areas": disassembly.code_areas,
86-
"disassembly_errors": disassembly.errors,
87-
"execution_time": disassembly.getAnalysisDuration(),
88-
"identified_alignment": disassembly.identified_alignment,
89-
"metadata" : {
90-
"message": "Analysis finished regularly.",
91-
"family": "",
92-
"filename": "",
93-
"timestamp": datetime.datetime.utcnow().strftime("%Y-%m-%dT%H-%M-%S"),
94-
},
95-
"sha256": hashlib.sha256(disassembly.binary).hexdigest(),
96-
"smda_version": self.config.VERSION,
97-
"status": disassembly.getAnalysisOutcome(),
98-
"summary": stats.calculate(),
99-
"xcfg": disassembly.collectCfg(),
100-
}
74+
def _createErrorReport(self, start, exception):
75+
report = SmdaReport()
76+
report.smda_version = self.config.VERSION
77+
report.status = "error"
78+
report.execution_time = self._getDurationInSeconds(start, datetime.datetime.utcnow())
79+
report.message = traceback.format_exc()
10180
return report
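
The error path in Disassembler.py combines two things: measuring the elapsed time and capturing the traceback of the exception being handled. The following is a minimal sketch of that pattern, assuming hypothetical names (`duration_in_seconds`, `run_with_error_report`) and a plain dict in place of `SmdaReport`. Note that `traceback.format_exc()` takes an optional `limit` as its first parameter, not an exception object, so it is called without arguments inside the `except` block here.

```python
import datetime
import traceback


def duration_in_seconds(start_ts, end_ts):
    # total_seconds() also covers the days component, which .seconds alone omits
    return (end_ts - start_ts).total_seconds()


def run_with_error_report(analysis):
    """Hypothetical wrapper: return analysis() or a dict describing the failure."""
    start = datetime.datetime.now()
    try:
        return analysis()
    except Exception:
        return {
            "status": "error",
            "execution_time": duration_in_seconds(start, datetime.datetime.now()),
            # formats the exception that is currently being handled
            "message": traceback.format_exc(),
        }


def failing_analysis():
    # illustrative failure, not an actual SMDA error
    raise ValueError("truncated buffer")


error_report = run_with_error_report(failing_analysis)
print(error_report["status"], error_report["execution_time"])
```

Returning a report in both the success and the failure case lets callers check a status field instead of wrapping every call in their own `try`/`except`.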
