danielplohmann
diff --git a/LICENSE b/LICENSE
Lines changed: 1 addition & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 46 additions & 8 deletions
diff --git a/analyze.py b/analyze.py
Lines changed: 16 additions & 12 deletions
diff --git a/config.py b/config.py
Lines changed: 0 additions & 35 deletions
diff --git a/export.py b/export.py
Lines changed: 2 additions & 5 deletions
diff --git a/setup.py b/setup.py
Lines changed: 20 additions & 8 deletions
diff --git a/smda/Disassembler.py b/smda/Disassembler.py
Lines changed: 35 additions & 56 deletions
@@ -1,4 +1,4 @@
-Copyright (c) 2018, Daniel Plohmann and Steffen Enders
+Copyright (c) 2018-2020, Daniel Plohmann and Steffen Enders
 
 All rights reserved.
 
 
@@ -7,32 +7,70 @@ As input, arbitrary memory dumps (ideally with known base address) can be proces
 The output is a collection of functions, basic blocks, and instructions with their respective edges between blocks and functions (in/out).
 Optionally, references to the Windows API can be inferred by using the ApiScout method.
 
-To get an impression how to work with the library, check the demo script:
+## Installation
 
-* analyze.py -- example usage: perform disassembly and optionally store results in JSON to a given output path.
+With version 1.2.0, we have finally simplified things by moving to PyPI!  
+So installation now is as easy as:
+
+```
+$ pip install smda
+```
+
+## Usage
+
+A typical workflow using SMDA could look like this:
+
+```
+>>> from smda.Disassembler import Disassembler
+>>> disassembler = Disassembler()
+>>> report = disassembler.disassembleFile("/bin/cat")
+>>> print(report)
+ 0.777s -> (architecture: intel.64bit, base_addr: 0x00000000): 86 functions
+>>> for fn in report.getFunctions():
+...     print(fn)
+...     for ins in fn.getInstructions():
+...         print(ins)
+...
+0x00001720: (->   1,    1->)   3 blocks,    7 instructions.
+0x00001720: (      4883ec08) - sub rsp, 8
+0x00001724: (488b05bd682000) - mov rax, qword ptr [rip + 0x2068bd]
+0x0000172b: (        4885c0) - test rax, rax
+0x0000172e: (          7402) - je 0x1732
+0x00001730: (          ffd0) - call rax
+0x00001732: (      4883c408) - add rsp, 8
+0x00001736: (            c3) - ret 
+0x00001ad0: (->   1,    4->)   1 blocks,   12 instructions.
+[...]
+>>> json_report = report.toDict()
+``` 
+
+There is also a demo script:
+
+* analyze.py -- example usage: perform disassembly on a file or memory dump and optionally store results in JSON to a given output path.
 
 The code should be fully compatible with Python 2 and 3.
 Further explanations of the inner workings will follow in separate publications and will be referenced here.
 
-To take full advantage of SMDA's capabilities, optionally install:
+To take full advantage of SMDA's capabilities, make sure to (optionally) install:
  * lief 
  * pdbparse (currently as fork from https://github.com/VPaulV/pdbparse to support Python3)
 
 ## Version History
 
- * 2020-04-28: Several improvements, including: x64 jump table handling, better data flow handling for calls using registers and tailcalls, extended list of common prologues based on much more groundtruth data, extended padding instruction list for gap function discovery, adjusted weights in candidate priority score, filtering code areas based on section tables, using exported symbols as candidates, new function output metadata: confidence score based on instruction mnemonic histogram, PIC hash based on escaped binary instruction sequence
+ * 2020-04-29: v1.2.0 - Restructured config.py into smda/SmdaConfig.py to simplify usage, and SMDA is now available via PyPI! The smda/Disassembler.py now emits a report object (smda.common.SmdaReport) that allows direct (pythonic) interaction with the results - a JSON can still be easily generated by using toDict() on the report.
+ * 2020-04-28: v1.1.0 - Several improvements, including: x64 jump table handling, better data flow handling for calls using registers and tailcalls, extended list of common prologues based on much more groundtruth data, extended padding instruction list for gap function discovery, adjusted weights in candidate priority score, filtering code areas based on section tables, using exported symbols as candidates, new function output metadata: confidence score based on instruction mnemonic histogram, PIC hash based on escaped binary instruction sequence
  * 2020-03-10: Various minor fixes and QoL improvements.
  * 2019-08-20: IdaExporter is now handling failed instruction conversion via capstone properly.
  * 2019-08-19: Minor fix for crashes caused by PDB parser.
- * 2019-08-05: SMDA can now export reports from IDA Pro (requires capstone to be available for idapython).
+ * 2019-08-05: v1.0.3 - SMDA can now export reports from IDA Pro (requires capstone to be available for idapython).
  * 2019-06-13: PDB symbols for functions are now resolved if given a PDB file using parameter "-d" (THX to @VPaulV).
  * 2019-05-15: Fixed a bug in PE mapper where buffer would be shortened because of misinterpretation of section sizes.
- * 2019-01-28: ELF symbols for functions are now resolved, if present in the file. Also "-m" parameter changed to "-p" to imply parsing instead of just mapping (THX: @VPaulV).
+ * 2019-02-14: v1.0.2 - ELF symbols for functions are now resolved, if present in the file. Also "-m" parameter changed to "-p" to imply parsing instead of just mapping (THX: @VPaulV).
  * 2018-12-12: all gcc jump table styles are now parsed correctly. 
  * 2018-11-26: Better handling of multibyte NOPs, ELF loader now provides base addr.
  * 2018-09-28: We now have functional PE/ELF loaders.
- * 2018-07-09: Performance improvements.
- * 2018-07-01: Initial Release.
+ * 2018-07-09: v1.0.1 - Performance improvements.
+ * 2018-07-01: v1.0.0 - Initial Release.
 
 
 ## Credits
 
@@ -4,24 +4,21 @@
 import os
 import re
 
-import config
+from smda.SmdaConfig import SmdaConfig
 from smda.Disassembler import Disassembler
 
-LOGGER = logging.getLogger(__name__)
-
-
 def parseBaseAddrFromArgs(args):
     if args.base_addr:
         parsed_base_addr = int(args.base_addr, 16) if args.base_addr.startswith("0x") else int(args.base_addr)
-        LOGGER.info("using provided base address: 0x%08x %d", parsed_base_addr, parsed_base_addr)
+        logging.info("using provided base address: 0x%08x %d", parsed_base_addr, parsed_base_addr)
         return parsed_base_addr
     # try to infer base addr from filename:
     baddr_match = re.search(re.compile("0x(?P<base_addr>[0-9a-fA-F]{8,16})$"), args.input_path)
     if baddr_match:
         parsed_base_addr = int(baddr_match.group("base_addr"), 16)
-        LOGGER.info("Parsed base address from file name: 0x%08x %d", parsed_base_addr, parsed_base_addr)
+        logging.info("Parsed base address from file name: 0x%08x %d", parsed_base_addr, parsed_base_addr)
         return parsed_base_addr
-    LOGGER.warning("No base address recognized, using 0.")
+    logging.warning("No base address recognized, using 0.")
     return 0
 
 
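The base address lookup in `parseBaseAddrFromArgs` can be tried in isolation. The following is a minimal sketch, assuming a hypothetical helper `parse_base_addr` that takes the two values directly instead of an argparse namespace; the regular expression is the one used in the hunk.

```python
import re

# Same pattern as in analyze.py: 8 to 16 hex digits after "0x" at the very end of the path.
BASE_ADDR_PATTERN = re.compile(r"0x(?P<base_addr>[0-9a-fA-F]{8,16})$")


def parse_base_addr(base_addr_arg, input_path):
    """Hypothetical stand-in for parseBaseAddrFromArgs(), without argparse or logging."""
    if base_addr_arg:
        # An explicit argument wins; accept hex ("0x...") or decimal notation.
        if base_addr_arg.startswith("0x"):
            return int(base_addr_arg, 16)
        return int(base_addr_arg)
    # Otherwise try to infer the base address from the file name.
    match = BASE_ADDR_PATTERN.search(input_path)
    if match:
        return int(match.group("base_addr"), 16)
    # Nothing recognized: fall back to 0, as the script does.
    return 0
```

With this pattern, a dump named `dump_0x00400000` resolves to `0x400000`, while `dump_0x00400000.bin` falls back to 0 because the address is not at the end of the name.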
@@ -42,20 +39,27 @@ def readFileContent(file_path):
 
     ARGS = PARSER.parse_args()
     if ARGS.input_path:
-        REPORT = {}
+        SMDA_REPORT = None
         INPUT_FILENAME = ""
         if os.path.isfile(ARGS.input_path):
+            # optionally create and set up a config, e.g. when using ApiScout profiles for WinAPI import usage discovery
+            config = SmdaConfig()
+            config.API_COLLECTION_FILES = {
+                "win_7": os.sep.join([config.PROJECT_ROOT, "data", "apiscout_win7_prof-n_sp1.json"])
+            }
             DISASSEMBLER = Disassembler(config)
             print("now analyzing {}".format(ARGS.input_path))
             INPUT_FILENAME = os.path.basename(ARGS.input_path)
             if ARGS.parse_header:
-                REPORT = DISASSEMBLER.disassembleFile(ARGS.input_path, pdb_path=ARGS.pdb_path)
+                SMDA_REPORT = DISASSEMBLER.disassembleFile(ARGS.input_path, pdb_path=ARGS.pdb_path)
             else:
                 BUFFER = readFileContent(ARGS.input_path)
                 BASE_ADDR = parseBaseAddrFromArgs(ARGS)
-                REPORT = DISASSEMBLER.disassembleBuffer(BUFFER, BASE_ADDR)
-        if REPORT and os.path.isdir(ARGS.output_path):
+                SMDA_REPORT = DISASSEMBLER.disassembleBuffer(BUFFER, BASE_ADDR)
+                SMDA_REPORT.filename = os.path.basename(ARGS.input_path)
+            print(SMDA_REPORT)
+        if SMDA_REPORT and os.path.isdir(ARGS.output_path):
             with open(ARGS.output_path + os.sep + INPUT_FILENAME + ".smda", "w") as fout:
-                json.dump(REPORT, fout, indent=1, sort_keys=True)
+                json.dump(SMDA_REPORT.toDict(), fout, indent=1, sort_keys=True)
     else:
         PARSER.print_help()
@@ -2,11 +2,8 @@
 import logging
 import os
 
-import config
 from smda.Disassembler import Disassembler
 
-LOGGER = logging.getLogger(__name__)
-
 
 def detectBackend():
     backend = ""
@@ -25,11 +22,11 @@ def detectBackend():
 if __name__ == "__main__":
     BACKEND, VERSION = detectBackend()
     if BACKEND == "IDA":
-        DISASSEMBLER = Disassembler(config, backend=BACKEND)
+        DISASSEMBLER = Disassembler(backend=BACKEND)
         REPORT = DISASSEMBLER.disassembleBuffer(None, None)
         output_path = idautils.GetIdbDir()
         with open(output_path + ".smda", "wb") as fout:
             json.dump(REPORT, fout, indent=1, sort_keys=True)
-            LOGGER.info("Output saved to: %s.smda", output_path)
+            logging.info("Output saved to: %s.smda", output_path)
     else:
         raise Exception("No supported backend found.")
@@ -1,23 +1,35 @@
 # -*- coding: utf-8 -*-
 
 from setuptools import setup, find_packages
-import config
-
-
-with open('README.rst') as f:
-    README = f.read()
 
 with open('LICENSE') as f:
     LICENSE = f.read()
 
+with open("README.md", "r") as fh:
+    long_description = fh.read()
+
 setup(
     name='smda',
-    version=config.VERSION,
+    # note to self: always change this in config as well.
+    version='1.2.0',
     description='A recursive disassembler optimized for CFG recovery from memory dumps. Based on capstone.',
-    long_description=README,
+    long_description_content_type="text/markdown",
+    long_description=long_description,
     author='Daniel Plohmann',
     author_email='[email protected]',
     url='https://github.com/danielplohmann/smda',
     license=LICENSE,
-    packages=find_packages(exclude=('tests', 'docs'))
+    packages=find_packages(exclude=('tests', 'docs')),
+    data_files = [
+        ('', ['LICENSE']),
+    ],
+    classifiers=[
+            "Development Status :: 4 - Beta",
+            "License :: OSI Approved :: BSD License",
+            "Operating System :: OS Independent",
+            "Programming Language :: Python :: 2.7",
+            "Programming Language :: Python :: 3",
+            "Topic :: Security",
+            "Topic :: Software Development :: Disassemblers",
+        ],
 )
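The `note to self` comment in `setup.py` points at a duplication risk: the version string now lives both there and in the package config. One common way to avoid that is to read the version from the config module's source text. The sketch below assumes the config file contains a line of the form `VERSION = "1.2.0"`; that layout and the helper name `extract_version` are assumptions and are not taken from the commit.

```python
import re


def extract_version(source_text):
    """Return the string assigned to VERSION in the given Python source text.

    Assumes a line such as: VERSION = "1.2.0" (possibly indented, e.g. as a class attribute).
    """
    match = re.search(r"""^\s*VERSION\s*=\s*["']([^"']+)["']""", source_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VERSION assignment found")
    return match.group(1)
```

In `setup.py`, the result of calling this on the contents of the config file could then be passed as `version=...`, provided the file matches the assumed layout.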
@@ -7,22 +7,28 @@
 from .intel.IntelDisassembler import IntelDisassembler
 from .ida.IdaExporter import IdaExporter
 from smda.utility.FileLoader import FileLoader
+from smda.SmdaConfig import SmdaConfig
+from smda.common.BinaryInfo import BinaryInfo
+from smda.common.SmdaReport import SmdaReport
 
 class Disassembler(object):
 
-    def __init__(self, config, backend="intel"):
+    def __init__(self, config=None, backend="intel"):
+        if config is None:
+            config = SmdaConfig()
         self.config = config
         self.disassembler = None
         if backend == "intel":
-            self.disassembler = IntelDisassembler(config)
+            self.disassembler = IntelDisassembler(self.config)
         elif backend == "IDA":
-            self.disassembler = IdaExporter(config)
-        self.disassembly = None
+            self.disassembler = IdaExporter(self.config)
         self._start_time = None
         self._timeout = 0
+        # cache the last DisassemblyResult
+        self.disassembly = None
 
     def _getDurationInSeconds(self, start_ts, end_ts):
-        return (self.analysis_end_ts - self.analysis_start_ts).seconds + ((self.analysis_end_ts - self.analysis_start_ts).microseconds / 1000000.0)
+        return (end_ts - start_ts).seconds + ((end_ts - start_ts).microseconds / 1000000.0)
 
     def _callbackAnalysisTimeout(self):
         if not self._timeout:
@@ -32,70 +38,43 @@ def _callbackAnalysisTimeout(self):
 
     def disassembleFile(self, file_path, pdb_path=""):
         loader = FileLoader(file_path, map_file=True)
-        base_addr = loader.getBaseAddress()
-        bitness = loader.getBitness()
         file_content = loader.getData()
-        code_areas = loader.getCodeAreas()
+        binary_info = BinaryInfo(file_content)
+        binary_info.file_path = file_path
+        binary_info.base_addr = loader.getBaseAddress()
+        binary_info.bitness = loader.getBitness()
+        binary_info.code_areas = loader.getCodeAreas()
         start = datetime.datetime.utcnow()
         try:
-            self.disassembler.setFilePath(file_path)
-            self.disassembler.addPdbFile(pdb_path, base_addr)
-            self.disassembler.setCodeAreas(code_areas)
-            disassembly = self.disassemble(file_content, base_addr, bitness=bitness, timeout=self.config.TIMEOUT)
-            print(disassembly)
-            report = self.getDisassemblyReport(disassembly)
-            report["metadata"]["filename"] = os.path.basename(file_path)
+            self.disassembler.addPdbFile(binary_info, pdb_path)
+            smda_report = self._disassemble(binary_info, timeout=self.config.TIMEOUT)
         except Exception as exc:
             print("-> an error occurred (", str(exc), ").")
-            report = {"status":"error", "meta": {"traceback": traceback.format_exc(exc)}, "execution_time": self._getDurationInSeconds(start, datetime.datetime.utcnow())}
-        return report
+            smda_report = self._createErrorReport(start, exc)
+        return smda_report
 
     def disassembleBuffer(self, file_content, base_addr, bitness=None):
         start = datetime.datetime.utcnow()
         try:
-            self.disassembler.setFilePath("")
-            disassembly = self.disassemble(file_content, base_addr, bitness, timeout=self.config.TIMEOUT)
-            print(disassembly)
-            report = self.getDisassemblyReport(disassembly)
-            report["metadata"]["filename"] = ""
+            binary_info = BinaryInfo(file_content)
+            binary_info.base_addr = base_addr
+            binary_info.bitness = bitness
+            smda_report = self._disassemble(binary_info, timeout=self.config.TIMEOUT)
         except Exception as exc:
             print("-> an error occurred (", str(exc), ").")
-            report = {"status":"error", "meta": {"traceback": traceback.format_exc(exc)}, "execution_time": self._getDurationInSeconds(start, datetime.datetime.utcnow())}
-        return report
+            smda_report = self._createErrorReport(start, exc)
+        return smda_report
 
-    def disassemble(self, binary, base_addr, bitness=None, timeout=0):
+    def _disassemble(self, binary_info, timeout=0):
         self._start_time = datetime.datetime.utcnow()
         self._timeout = timeout
-        self.disassembly = self.disassembler.analyzeBuffer(binary, base_addr, bitness, self._callbackAnalysisTimeout)
-        return self.disassembly
+        self.disassembly = self.disassembler.analyzeBuffer(binary_info, self._callbackAnalysisTimeout)
+        return SmdaReport(self.disassembly)
 
-    def getDisassemblyReport(self, disassembly=None):
-        report = {}
-        if disassembly is None:
-            if self.disassembly is not None:
-                disassembly = self.disassembly
-            else:
-                return {}
-        stats = DisassemblyStatistics(disassembly)
-        report = {
-            "architecture": disassembly.architecture,
-            "base_addr": disassembly.base_addr,
-            "bitness": disassembly.bitness,
-            "buffer_size": len(disassembly.binary),
-            "code_areas": disassembly.code_areas,
-            "disassembly_errors": disassembly.errors,
-            "execution_time": disassembly.getAnalysisDuration(),
-            "identified_alignment": disassembly.identified_alignment,
-            "metadata" : {
-                "message": "Analysis finished regularly.",
-                "family": "",
-                "filename": "",
-                "timestamp": datetime.datetime.utcnow().strftime("%Y-%m-%dT%H-%M-%S"),
-            },
-            "sha256": hashlib.sha256(disassembly.binary).hexdigest(),
-            "smda_version": self.config.VERSION,
-            "status": disassembly.getAnalysisOutcome(),
-            "summary": stats.calculate(),
-            "xcfg": disassembly.collectCfg(),
-        }
+    def _createErrorReport(self, start, exception):
+        report = SmdaReport()
+        report.smda_version = self.config.VERSION
+        report.status = "error"
+        report.execution_time = self._getDurationInSeconds(start, datetime.datetime.utcnow())
+        report.message = traceback.format_exc()
         return report
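The error path records the elapsed time and the formatted traceback. The following is a minimal standalone sketch of that pattern; `ErrorReport`, `create_error_report`, and `run_guarded` are hypothetical stand-ins rather than SMDA classes or methods. Note that `traceback.format_exc()` formats the exception currently being handled and its first parameter is a `limit`, not an exception object, so the sketch calls it without arguments from inside the `except` block.

```python
import datetime
import traceback


def duration_in_seconds(start_ts, end_ts):
    """Elapsed time as a float; total_seconds() also accounts for the days component."""
    return (end_ts - start_ts).total_seconds()


class ErrorReport(object):
    """Hypothetical stand-in for a report object with status, timing and message fields."""

    def __init__(self):
        self.status = ""
        self.execution_time = 0.0
        self.message = ""


def create_error_report(start):
    """Build an error report; must be called while an exception is being handled."""
    report = ErrorReport()
    report.status = "error"
    report.execution_time = duration_in_seconds(start, datetime.datetime.now())
    report.message = traceback.format_exc()
    return report


def run_guarded(task):
    """Run task() and return its result, or an ErrorReport if it raises."""
    start = datetime.datetime.now()
    try:
        return task()
    except Exception:
        return create_error_report(start)
```

Returning a report object in both the success and the error case lets callers check a `status` field instead of wrapping every call in their own `try`/`except`.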