api_key_detector

Neural Network Based, Automatic API Key Detector

A Multilayer Perceptron-based system that identifies API Key strings with an accuracy of over 99%.

For technical details, see my thesis (Automatic extraction of API Keys from Android applications), in particular Chapter 3.

Requirements

Python 3.5+
Modules in requirements.txt (use pip3 to install)

pip3 install -r requirements.txt

Installation

$ git clone https://github.com/alessandrodd/api_key_detector.git
$ pip3 install -r api_key_detector/requirements.txt
$ python3 -m api_key_detector

Example Library Usage

>>> from api_key_detector import detector
>>> test = ["justsomething", "reallynothingimportant", "AizaSyDtEV5rwG_F1jvyj6WVlOOzD2vZa8DEpLE", "eqwioqweioqiwoe"]
>>> detector.detect_api_keys(test)
[False, False, True, False]
>>> detector.filter_api_keys(test)
['AizaSyDtEV5rwG_F1jvyj6WVlOOzD2vZa8DEpLE']
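
The calls above take a list of strings, so a thin wrapper can feed them from a text file. The following is a minimal sketch, not part of the library: the helper names `candidates_from_lines` and `find_keys` and the input file name are hypothetical, and the filter is passed in as a parameter so that the only library call assumed is `detector.filter_api_keys`, as shown above.

```python
from typing import Callable, Iterable, List


def candidates_from_lines(lines: Iterable[str]) -> List[str]:
    """Strip whitespace, drop empty lines and duplicates, keep first-seen order."""
    seen = set()
    result = []
    for line in lines:
        s = line.strip()
        if s and s not in seen:
            seen.add(s)
            result.append(s)
    return result


def find_keys(lines: Iterable[str],
              filter_fn: Callable[[List[str]], List[str]]) -> List[str]:
    """Apply a list-in, list-out filter (e.g. detector.filter_api_keys)."""
    return filter_fn(candidates_from_lines(lines))


# Intended use (requires api_key_detector to be importable):
#   from api_key_detector import detector
#   with open("strings.txt") as f:        # hypothetical input file
#       print(find_keys(f, detector.filter_api_keys))
```

Deduplicating before classification is a design choice of this sketch; it simply avoids classifying the same string twice.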

Commandline Usage

A command-line interface is available for testing the library's functionality:

usage: api_key_detector [-h] [--debug] [--test] [--entropy] [--sequentiality]
                   [--gibberish] [--charset-length] [--words-percentage]
                   [--string STRING | -a | -e | -s | -g]
                   [--generate-training-set] [--plot-training-set]
                   [--api-key-files API_KEY_FILES [API_KEY_FILES ...]]
                   [--generic-text-files GENERIC_TEXT_FILES [GENERIC_TEXT_FILES ...]]
                   [--output-file DUMP_FILE] [--filter-apikeys]
                   [--detect-apikeys]

A python program that detects API Keys

optional arguments:
  -h, --help            show this help message and exit
  --debug               Print debug information
  --test                Test mode; calculates all features for strings in
                        stdin
  --entropy             Calculates the charset-normalized Shannon Entropy for
                        strings in stdin
  --sequentiality       Calculates the Sequentiality Index for strings in
                        stdin
  --gibberish           Calculates the Gibberish Index for strings in stdin
  --charset-length      Calculates the Induced Charset Length for strings in
                        stdin
  --words-percentage    Calculates the percentage of dictionary words for each
                        string in stdin
  --string STRING       Calculate all features for a single string, to be used
                        in conjunction with --test.
  -a                    Sort in alphabetical order, ascending
  -e                    Sort by entropy, ascending
  -s                    Sort by sequentiality, ascending
  -g                    Sort by gibberish index, ascending

  --generate-training-set
                        Generate training set for string classifier. Needs
                        --api-key-files, --generic-text-files and --output-
                        file to be specified
  --plot-training-set   Generate a 3d scatterplot for the training set. Needs
                        --api-key-files, --generic-text-files and --output-
                        file to be specified
  --api-key-files API_KEY_FILES [API_KEY_FILES ...]
                        List of files containing valid api-key examples, one
                        for each line
  --generic-text-files GENERIC_TEXT_FILES [GENERIC_TEXT_FILES ...]
                        List of files containing generic text examples that
                        DOESN'T contain any Api Key
  --output-file DUMP_FILE
                        Where to output the training set file

  --filter-apikeys      Filter potential apikeys from strings in stdin.
  --detect-apikeys      Detect potential apikeys from strings in stdin.
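
Because most modes read strings from stdin, the command-line interface can also be driven from another script. The sketch below is hedged in several ways: it assumes the package is runnable as `python3 -m api_key_detector` (as in Installation), that input strings are newline-separated, and that the working directory is the one from which the module can be found. The output format is not documented here, so the helper returns raw stdout. The names `build_command` and `run_on_strings` are hypothetical.

```python
import subprocess
import sys
from typing import List, Optional, Sequence

# Flags that, according to the help text, operate on strings in stdin.
STDIN_FLAGS = {
    "--test", "--entropy", "--sequentiality", "--gibberish",
    "--charset-length", "--words-percentage",
    "--filter-apikeys", "--detect-apikeys",
}


def build_command(flag: str, python: str = sys.executable) -> List[str]:
    """Build the argv for one stdin-driven mode of the CLI."""
    if flag not in STDIN_FLAGS:
        raise ValueError("not a stdin-driven flag: %s" % flag)
    return [python, "-m", "api_key_detector", flag]


def run_on_strings(strings: Sequence[str],
                   flag: str = "--filter-apikeys",
                   cwd: Optional[str] = None) -> str:
    """Pipe the strings to the CLI and return whatever it prints."""
    completed = subprocess.run(
        build_command(flag),
        input="\n".join(strings) + "\n",   # assumption: one string per line
        stdout=subprocess.PIPE,
        universal_newlines=True,           # text mode, works on Python 3.5+
        cwd=cwd,
        check=True,
    )
    return completed.stdout
```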

Config File Explained

config.json

dump => Path where the trained Neural Network is saved. Delete this file to retrain the network

min_key_length => The minimum length of an API Key

blacklists => Text files containing strings (one per line) that should never be considered API Keys

wordlists => Text files containing real words (one per line), used to detect words inside strings

word_content_threshold => The fraction of a potential API Key string made up of real words at which the string is no longer considered an API Key and is discarded

api_learnsets => Text files containing API Keys (one per line), used to train the Neural Network

text_learnsets => Text files containing generic strings (no API Keys, one per line), used to train the Neural Network

good_test => Same as api_learnsets, but used to test the Neural Network

bad_test => Same as text_learnsets, but used to test the Neural Network

re_train => If true, then the Neural Network is retrained during initialization

logging => Used to configure logging capabilities, see here
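
To make the list above concrete, here is a small sketch that parses a config.json-style document and reports any of the documented keys that are missing. The key names come from the list above; every value (paths, numbers, the empty `logging` object) is a hypothetical placeholder rather than the project's default, and `missing_keys` is an illustrative helper, not a library function.

```python
import json

# Keys documented for config.json in this README.
EXPECTED_KEYS = {
    "dump", "min_key_length", "blacklists", "wordlists",
    "word_content_threshold", "api_learnsets", "text_learnsets",
    "good_test", "bad_test", "re_train", "logging",
}

# Hypothetical example; all values are placeholders, not real defaults.
EXAMPLE_CONFIG = json.loads("""
{
  "dump": "trained_network.dump",
  "min_key_length": 10,
  "blacklists": ["blacklist.txt"],
  "wordlists": ["words.txt"],
  "word_content_threshold": 0.5,
  "api_learnsets": ["api_keys_train.txt"],
  "text_learnsets": ["generic_text_train.txt"],
  "good_test": ["api_keys_test.txt"],
  "bad_test": ["generic_text_test.txt"],
  "re_train": false,
  "logging": {}
}
""")


def missing_keys(cfg: dict) -> list:
    """Return the documented keys that are absent from cfg, sorted."""
    return sorted(EXPECTED_KEYS - set(cfg))
```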

gibberish_detector/config.json

dump => Path where the trained algorithm is saved. Delete this file to retrain the algorithm

learnsets => Txt files containing language-specific text (e.g. books), used to calculate the transition probabilities from each letter to another

good_test => Txt files containing syntactically correct sentences, used to test the algorithm

bad_test => Txt files containing gibberish, used to test the algorithm

re_train => If true, then the transition probabilities are recomputed during initialization
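
The learnsets entry mentions letter-to-letter transition probabilities. As an illustration of that idea only, the sketch below trains a character-bigram model on some text and scores a string by the geometric mean of its transition probabilities. The alphabet, the add-one smoothing, the scoring formula, and the function names are all assumptions; the actual gibberish_detector may differ in each of these.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz "
INDEX = {c: i for i, c in enumerate(ALPHABET)}


def normalize(text):
    """Lowercase and keep only characters in ALPHABET."""
    return [c for c in text.lower() if c in INDEX]


def train(text, smoothing=1.0):
    """Return a matrix of log P(next letter | current letter)."""
    n = len(ALPHABET)
    # Start every count at `smoothing` so unseen pairs get a small probability.
    counts = [[smoothing] * n for _ in range(n)]
    chars = normalize(text)
    for a, b in zip(chars, chars[1:]):
        counts[INDEX[a]][INDEX[b]] += 1
    log_probs = []
    for row in counts:
        total = sum(row)
        log_probs.append([math.log(c / total) for c in row])
    return log_probs


def avg_transition_prob(s, log_probs):
    """Geometric mean of the transition probabilities along the string."""
    chars = normalize(s)
    pairs = list(zip(chars, chars[1:]))
    if not pairs:
        return 0.0
    total = sum(log_probs[INDEX[a]][INDEX[b]] for a, b in pairs)
    return math.exp(total / len(pairs))
```

In a model like this, text resembling the training language scores higher than random letter sequences, and good_test and bad_test style files could be used to pick a cut-off between the two.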
