Commit 812722f

fixed bug and made ApiScout ready for PyPI
Author: Daniel Plohmann (jupiter)
1 parent eca11b0, commit 812722f

8 files changed: +123 -90 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -63,3 +63,4 @@ ENV/
 # other
 dbs/*.csv
 config.py
+.pylintrc

Makefile

Lines changed: 13 additions & 2 deletions
@@ -1,7 +1,18 @@
 init:
 	pip install -r requirements.txt
-
+package:
+	rm -rf dist/*
+	python3 setup.py sdist
+publish:
+	python3 -m twine upload dist/*
+pylint:
+	python3 -m pylint --rcfile=.pylintrc apiscout
 test:
-	nosetests tests
+	python3 -m nose
+test-coverage:
+	python3 -m nose --with-coverage --cover-erase --cover-html-dir=./coverage-html --cover-html --cover-package=apiscout
 clean:
 	find . | grep -E "(__pycache__|\.pyc|\.pyo$\)" | xargs rm -rf
+	rm -rf .coverage
+	rm -rf coverage-html
+	rm -rf dist/*

README.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+# ApiScout
+
+This project aims at simplifying Windows API import recovery.
+As input, arbitrary memory dumps for a known environment can be processed (please note: a reference DB has to be built first, using apiscout/db_builder).
+The output is an ordered list of identified Windows API references with some meta information, and an ApiVector fingerprint.
+
+* scout.py -- should give a good outline of how to work with the library.
+* ida_scout.py -- is a convenience GUI wrapper for use in IDA Pro.
+* match.py -- demonstrates how ApiVectors can be matched against each other and collections of fingerprints.
+* collect.py -- builds a database of WinAPI fingerprints (ApiVectors) that can be used for matching.
+* export.py -- generates ApiQR diagrams that visualize ApiVectors.
+* update.py -- pulls the most recent ApiVector DB from Malpedia (requires Malpedia account / API token).
+
+The code should be fully compatible with Python 2 and 3.
+There is a blog post describing ApiScout in more detail: http://byte-atlas.blogspot.com/2017/04/apiscout.html.
+Another blog post explains how ApiVectors are constructed and stored: https://byte-atlas.blogspot.com/2018/04/apivectors.html.
+We also presented a paper at Botconf 2018 that describes the ApiScout methodology in-depth, including an evaluation over Malpedia: https://journal.cecyf.fr/ojs/index.php/cybin/article/view/20/23
+
+## Version History
+
+ * 2020-06-30: v1.1.0 - Now using LIEF for import table parsing. Fixed bug which would not produce ApiVectors when using import table parsing. ApiScout is now also available through PyPI.
+ * 2020-03-03: Added a script to pull the most recent ApiVector DB from Malpedia (requires Malpedia account / API token).
+ * 2020-03-02: Ported to IDA 7.4 (THX to @jenfrie).
+ * 2020-02-18: DB Builder is now compatible up to Python 3.7 (THX to @elanfer).
+ * 2019-10-08: Workaround for broken filtering of the API view in IDA 7.3 (THX to @enzok for pointing this out).
+ * 2019-08-22: Fixed a bug where missing type info in IDA would lead to a crash (now gives an error message instead).
+ * 2019-08-20: Added self-filter to eliminate pointers to own memory image that could be mistakenly treated as API references.
+ * 2019-06-06: Added support for proper type reconstruction for annotated APIs in IDA Pro (THX to @FlxP0c)
+ * 2019-05-15: Added numpy support for vector calculations (based on implementation provided by @garanews - THX!)
+ * 2019-05-15: Fixed a bug in PE mapper where buffer would be shortened because of misinterpretation of section sizes.
+ * 2019-01-23: QoL improvements: automated data folder deployment when used as module, logger initialization (THX to @jdval)
+ * 2018-08-23: Fixed a bug in PE mapper where the PE header would be overwritten by (empty) section data.
+ * 2018-08-21: Added functionality that allows using import table information instead of crawling for references.
+ * 2018-07-31: Fixed convenience functions to create/export vectors from/to lists and dicts, added test coverage.
+ * 2018-07-23: WARNING: Change in ApiVector format -- Introduced sorted ApiVectors which are even more space efficient (20%+).
+ * 2018-06-25: Fixed incompatibility with IDA Pro 7.0+ (THX to @nazywam!)
+ * 2018-05-23: Added further semantic context groups (THX to Quoscient.io)
+ * 2018-03-27: Heuristic estimation of Windows API reference counts added
+ * 2018-03-06: ApiQR visualization of vector results (C-1024)
+ * 2017-11-28: Added own import table parser to enrich result information
+ * 2017-08-24: Multi-Segment support in IDA Pro (THX to @nazywam!)
+ * 2017-05-31: Added Windows 7 SP1 64bit import DB (compatible to Malpedia)
+
+## Credits
+
+The idea has previously gone through multiple iterations until reaching this refactored release.
+Thanks to Thorsten Jenke and Steffen Enders for their previous endeavours and evaluating a proof-of-concept of this method.
+More thanks to Steffen Enders for his work on the visualization of ApiQR diagrams.
+Also thanks to Ero Carrera for pefile and Elias Bachaalany for the IDA Python AskUsingForm template. :)
+Additionally many thanks to Andrea Garavaglia for his performance benchmarks that led to drastic speedups in the applied matching!
+
+
+Pull requests welcome! :)
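
The README says that match.py matches ApiVectors against each other and against collections of fingerprints, but this commit does not show how. As a rough illustration of what such matching can look like, the sketch below scores two API lists with a plain Jaccard index. This is an assumption for illustration only: the metric, the function names `jaccard_similarity` and `best_match`, and the dict-of-lists collection format are hypothetical and may differ from what ApiScout actually computes.

```python
def jaccard_similarity(apis_a, apis_b):
    """Jaccard index of two API name collections (intersection size / union size)."""
    set_a, set_b = set(apis_a), set(apis_b)
    union = set_a | set_b
    if not union:
        # Two empty fingerprints carry no information; treat them as dissimilar.
        return 0.0
    return len(set_a & set_b) / float(len(union))


def best_match(candidate, collection):
    """Return (name, score) of the closest entry in a dict of name -> API list.

    Raises ValueError if the collection is empty.
    """
    scored = ((name, jaccard_similarity(candidate, apis)) for name, apis in collection.items())
    return max(scored, key=lambda item: item[1])
```

A caller would pass the API names recovered for one dump as `candidate` and a mapping of known samples as `collection`; ties are resolved in favour of whichever entry `max` encounters first.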

README.rst

Lines changed: 0 additions & 55 deletions
This file was deleted.

apiscout/ApiScout.py

Lines changed: 34 additions & 14 deletions
@@ -30,6 +30,12 @@
 import operator
 import logging
 
+try:
+    import lief
+except:
+    print("lief is not installed! We recommend installing lief to improve import table parsing capabilities of ApiScout!")
+    lief = None
+
 from .ImportTableLoader import ImportTableLoader
 from .ApiVector import ApiVector
 from .PeTools import PeTools
@@ -54,6 +60,7 @@ def __init__(self, db_filepath=None):
         if db_filepath:
             self.loadDbFile(db_filepath)
         self._apivector = ApiVector()
+        self.loadWinApi1024()
 
     def loadDbFile(self, db_filepath):
         api_db = {}
@@ -88,7 +95,10 @@ def loadDbFile(self, db_filepath):
         LOG.info("loaded %d exports from %d DLLs (%s) with %d potential collisions.", num_apis_loaded, len(api_db["dlls"]), api_db["os_name"], num_collisions)
         self.api_maps[api_db["os_name"]] = api_map
 
-    def loadWinApi1024(self, winapi1024_filepath):
+    def loadWinApi1024(self, winapi1024_filepath=None):
+        if winapi1024_filepath is None:
+            this_dir = os.path.abspath(os.path.join(os.path.dirname(__file__)))
+            winapi1024_filepath = this_dir + os.sep + "data" + os.sep + "winapi1024v1.txt"
         self._apivector = ApiVector(winapi1024_filepath)
 
     def _resolveApiByAddress(self, api_map_name, absolute_addr):
@@ -177,20 +187,30 @@ def _getCodeReferences(self, binary):
         return references
 
     def evaluateImportTable(self, binary, is_unmapped=True):
+        self._binary_length = len(binary)
         results = {"import_table": []}
-        mapped_binary = binary
-        if is_unmapped:
-            LOG.debug("Mapping unmapped binary before processing")
-            mapped_binary = PeTools.mapBinary(binary)
-        bitness = PeTools.getBitness(mapped_binary)
-        self._import_table = None
-        self._parseImportTable(mapped_binary)
-        references = self._getCodeReferences(mapped_binary)
-        for offset, import_entry in sorted(self._import_table.items()):
-            ref_count = 1
-            if bitness:
-                ref_count = 1 + references[bitness][offset] if offset in references[bitness] else 1
-            results["import_table"].append((offset + self.load_offset, 0, import_entry["dll_name"].lower() + "_0x0", import_entry["name"], bitness, True, ref_count))
+        if lief:
+            lief_binary = lief.parse(bytearray(binary))
+            bitness = 32 if lief_binary.header.machine == lief.PE.MACHINE_TYPES.I386 else 64
+            for imported_library in lief_binary.imports:
+                for func in imported_library.entries:
+                    if func.name:
+                        results["import_table"].append((func.iat_address + self.load_offset, 0xFFFFFFFF, imported_library.name.lower() + "_0x0", func.name, bitness, True, 0))
+        else:
+            # fallback using the old method and our own import table parser
+            mapped_binary = binary
+            if is_unmapped:
+                LOG.debug("Mapping unmapped binary before processing")
+                mapped_binary = PeTools.mapBinary(binary)
+            bitness = PeTools.getBitness(mapped_binary)
+            self._import_table = None
+            self._parseImportTable(mapped_binary)
+            references = self._getCodeReferences(mapped_binary)
+            for offset, import_entry in sorted(self._import_table.items()):
+                ref_count = 1
+                if bitness:
+                    ref_count = 1 + references[bitness][offset] if offset in references[bitness] else 1
+                results["import_table"].append((offset + self.load_offset, 0xFFFFFFFF, import_entry["dll_name"].lower() + "_0x0", import_entry["name"], bitness, True, ref_count))
         return results
 
     def crawl(self, binary):
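
The guarded `import lief` in the first hunk uses a bare `except:`, which also swallows errors unrelated to a missing package. A minimal sketch of the same optional-dependency pattern that catches only `ImportError` and reports through `logging` rather than `print` might look like this; the helper name `optional_import` is hypothetical and not part of ApiScout.

```python
import importlib
import logging

LOG = logging.getLogger(__name__)


def optional_import(module_name):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Only a failed import is tolerated; other exceptions still propagate.
        LOG.warning("%s is not installed; falling back to the built-in parser", module_name)
        return None


# Mirrors the module-level name used in the diff: either the module or None.
lief = optional_import("lief")
```

Code such as `if lief:` in `evaluateImportTable` then works unchanged, because the name is bound to either the module or `None`.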

requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -2,3 +2,4 @@ nose
 Pillow
 numpy
 requests
+lief

scout.py

Lines changed: 0 additions & 6 deletions
@@ -44,10 +44,6 @@ def get_all_db_files():
     return [db_dir + fn for fn in os.listdir(db_dir) if fn.endswith(".json")]
 
 
-def get_winapi1024_path():
-    return get_this_dir() + os.sep + "apiscout" + os.sep + "data" + os.sep + "winapi1024v1.txt"
-
-
 def get_base_addr(args):
     if args.base_addr:
         return int(args.base_addr, 16) if args.base_addr.startswith("0x") else int(args.base_addr)
@@ -88,8 +84,6 @@ def main():
         db_paths = get_all_db_files()
         for db_path in db_paths:
             scout.loadDbFile(db_path)
-        # load WinApi1024 vector
-        scout.loadWinApi1024(get_winapi1024_path())
         # scout the binary
         results = {}
         if args.import_table_only:
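
With `get_winapi1024_path()` removed from scout.py, callers no longer pass the vector definition path; `loadWinApi1024()` in apiscout/ApiScout.py now derives it from the package directory when no path is given. The sketch below reproduces that fallback in isolation, using `os.path.join` instead of `os.sep` concatenation. The function name `resolve_winapi1024_path` and its `module_file` parameter are hypothetical, introduced here so the logic can be exercised without the package.

```python
import os


def resolve_winapi1024_path(module_file, explicit_path=None):
    """Prefer an explicitly passed path; otherwise use <module dir>/data/winapi1024v1.txt."""
    if explicit_path is not None:
        return explicit_path
    module_dir = os.path.abspath(os.path.dirname(module_file))
    return os.path.join(module_dir, "data", "winapi1024v1.txt")
```

Inside a package module the first argument would be `__file__`, which is what the committed code uses.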

setup.py

Lines changed: 21 additions & 13 deletions
@@ -2,23 +2,31 @@
 
 from setuptools import setup, find_packages
 
-
-with open('README.rst') as f:
-    README = f.read()
-
-with open('LICENSE') as f:
-    LICENSE = f.read()
+with open("README.md", "r") as fh:
+    long_description = fh.read()
 
 setup(
     name='apiscout',
-    version='1.0.1',
-    description='Windows API recovery.',
-    long_description=README,
+    version='1.1.0',
+    description='A library for Windows API usage recovery and similarity assessment with focus on memory dumps.',
+    long_description_content_type="text/markdown",
+    long_description=long_description,
     author='Daniel Plohmann',
     author_email='[email protected]',
     url='https://github.com/daniel-plohmann/apiscout',
-    license=LICENSE,
-    # packages=find_packages(exclude=('tests', 'docs')),
-    packages = ["apiscout"],
-    package_data={"apiscout": ["data/winapi1024v1.txt"]},
+    license="BSD 2-Clause",
+    packages=find_packages(exclude=('tests', 'dbs')),
+    package_data={'apiscout': ['data/winapi1024v1.txt', 'data/winapi_contexts.csv', 'data/html_frame.html']},
+    data_files=[
+        ('', ['LICENSE']),
+    ],
+    classifiers=[
+        "Development Status :: 4 - Beta",
+        "License :: OSI Approved :: BSD License",
+        "Operating System :: OS Independent",
+        "Programming Language :: Python :: 2.7",
+        "Programming Language :: Python :: 3",
+        "Topic :: Security",
+        "Topic :: Software Development :: Disassemblers",
+    ],
 )
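
Since `loadWinApi1024()` now depends on the bundled data file, it can be worth checking that the files listed in `package_data` actually end up in the archive produced by `make package` before running `make publish`. The following is a minimal sketch of such a check, assuming the sdist is a gzip-compressed tarball in `dist/`; the function name `missing_package_data` is hypothetical, and the expected file list is copied from the `package_data` entry above.

```python
import tarfile

EXPECTED_DATA_FILES = (
    "data/winapi1024v1.txt",
    "data/winapi_contexts.csv",
    "data/html_frame.html",
)


def missing_package_data(sdist_path, expected=EXPECTED_DATA_FILES):
    """Return the expected data files that are absent from an sdist tarball."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        names = archive.getnames()
    # Member names carry a leading "<project>-<version>/<package>/" prefix,
    # so compare by suffix instead of by exact path.
    return [suffix for suffix in expected
            if not any(name.endswith(suffix) for name in names)]
```

An empty list means every listed file was found; any returned entry points at a file that would be missing after installation from the archive.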
