|
1 |
| -# The Attacker IP Prioritizer (AIP) |
2 |
| -# version 2.1.0 (2022) |
| 1 | +# Attacker IP Prioritization (AIP) Tool |
| 2 | +The Attacker IP Prioritization (AIP) tool generates IP blocklists from network traffic captured in honeypot networks. Originally designed to create the blocklists for the [Stratosphere Blocklist Generation project](https://mcfp.felk.cvut.cz/publicDatasets/CTU-AIPP-BlackList/), it aims to generate an IoT-friendly blocklist. With the advent of 5G, IoT devices will be directly connected to the Internet instead of being protected by a router's firewall. We therefore need blocklists that are small, portable, and designed to block the IPs that target IoT devices. The main models used to this end are Prioritize Consistent (PC) and Prioritize New (PN). |
3 | 3 |
|
4 | 4 |
|
5 |
| -# The Idea |
| 5 | +The project has since evolved to test new blocklist generation models beyond PN and PC. The current codebase allows fast development and testing of those new models by providing a common interface to the attacks seen by several sensors deployed on the public Internet, and a common set of metrics to compare the output of the models. |
6 | 6 |
|
7 |
| -The Attacker IP Prioritizer (AIP) algorithm aims to generate an IoT-friendly blocklist. With the advent of 5G, many IoT devices are going to be directly connected to the Internet instead of being protected by a router's firewall. Therefore we need blocklists that are small and portable, and designed to block IPs that are targeting IoT devices. The IPs of interest, from a statistical point of view, should have several recognizable features: |
8 | 7 |
|
9 |
| -First, they should be attacking more often than other IPs. In terms of our collected training data, we increase the priority of the IPs that attack more. |
| 8 | +Given a honeypot network in your organization, it should be easy to use AIP to generate your own local blocklists based on the traffic reaching the honeypots. |
10 | 9 |
|
11 |
| -Second, IPs should attack consistently. Namely, IPs should have a higher daily average of attacks and its standard deviation should be lower. |
| 10 | + |
12 | 11 |
|
13 |
| -Third, the average duration of the attacks should be longer. This is simply because larger and more advanced botnets are more organized and thorough, thus meaning they need to try more things once they get into our honeypots, thus increasing the length of their events. |
| 12 | +## Using the framework |
14 | 13 |
|
15 |
| -Fourth, IPs should be currently active. An IP that was last seen a few months ago would have its priority decreased in our list. |
| 14 | +## Creating new models |
16 | 15 |
|
17 |
| -Fifth, the number of bytes transferred and the number of packets sent and received will be greater. |
18 |
| - |
19 |
| -All five of these traits need to be included in the sorting process of AIP, and each of them needs to be weighted since they are not of equal importance. Therefore, there is a need to build a prioritization algorithm that receives data flows and outputs information built on top of these five characteristics. |
20 |
| - |
21 |
| -# Data Source |
22 |
| - |
23 |
| -The program accepts a directory that contains data files from each day. You assign a directory for the program to look in every time it runs, and it checks whether there are any new files to process. If there are, it processes the new files and remembers their names so that it does not process them again the next time it runs. |
24 |
| - |
25 |
| -In terms of file format, it accepts a .csv file that has one IP per line, with each of the following data inputs for each IP on that line, separated by commas: |
26 |
| - |
27 |
| - Amount of events - Meaning the total connections to our honeypots originating from the given IP |
28 |
| - |
29 |
| - Total Duration - How long did this IP connect for the total of its events |
30 |
| - |
31 |
| - Average duration - The average length in seconds of all the connections per IP |
32 |
| - |
33 |
| - Amount of Bytes - Total bytes sent and received |
34 |
| - |
35 |
| - Average number of bytes - For bytes transferred in each connection per IP |
36 |
| - |
37 |
| - Total packets - Of all the connections per IP |
38 |
| - |
39 |
| - Average packets - Average packets sent per connection |
40 |
| - |
41 |
| - Last event time - UNIX time of the last time the IP tried to connect to something in the last 24 hours |
42 |
| - |
43 |
| - First event time - UNIX time of the first time the IP tried to connect in the last 24 hours |
44 |
| - |
45 |
| -For example, a single line in the file could look like this: |
46 |
| - |
47 |
| -"IPv4 Address",26049,"7415310","284.6","41808957","1605.0",284577,"10.92","157899154","1578968762.519" |
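To make the removed input format above concrete, here is a minimal Python sketch of how one such line could be parsed. It is not taken from the AIP codebase: the `DailyRecord` class, its field names, and `parse_line` are hypothetical, chosen only to mirror the nine listed inputs plus the IP, and the numeric types are assumptions.

```python
import csv
from dataclasses import dataclass


@dataclass
class DailyRecord:
    # Hypothetical field names mirroring the inputs listed above (not AIP's own).
    ip: str
    events: int
    total_duration: float
    avg_duration: float
    total_bytes: int
    avg_bytes: float
    total_packets: int
    avg_packets: float
    last_event_time: float   # UNIX time
    first_event_time: float  # UNIX time


def parse_line(line: str) -> DailyRecord:
    """Parse one comma-separated line; quoted and unquoted fields are both accepted."""
    row = next(csv.reader([line]))
    if len(row) != 10:
        raise ValueError(f"expected 10 fields, got {len(row)}")
    return DailyRecord(
        ip=row[0],
        events=int(row[1]),
        total_duration=float(row[2]),
        avg_duration=float(row[3]),
        total_bytes=int(row[4]),
        avg_bytes=float(row[5]),
        total_packets=int(row[6]),
        avg_packets=float(row[7]),
        last_event_time=float(row[8]),
        first_event_time=float(row[9]),
    )
```

Using the standard `csv` reader rather than `str.split(",")` keeps the sketch tolerant of the mixed quoted and unquoted fields visible in the sample line.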
48 |
| - |
49 |
| -# The AIP Algorithm |
50 |
| - |
51 |
| -The AIP algorithm takes each of the flows from the input and uses its data to calculate eight values for each IP. The first seven values from the input data remain unchanged, number of events, total duration, average duration, number of bytes, the average number of bytes, total packets and average packets. However, the first event time and the number of events are used to calculate the average number of events per day the IP has had since it was first seen by the program, giving us a total of eight features as input for our algorithm. |
52 |
| - |
53 |
| -For each IP, each of the eight values is updated using the data from the current day and then saved to a file, called the absolute file. The absolute data file contains the values for all the IPs seen since the program was started. |
54 |
| - |
55 |
| -The next step is to feed the absolute data file, which has been updated with the last 24 hours of events, into the rating program. The rating program assigns each of the eight values a specific weight. These weights control the effect each value will have on the final score. The sum of all weights is one. |
56 |
| - |
57 |
| -Each feature is multiplied by its weight and then summed with the rest, as in a basic linear combination. Then the sum is multiplied by a time modifier. The program currently has three different modules each with its own time modifier, one prioritizing historically aggressive IPs, one prioritizing newer aggressive IPs and one only dealing with IPs seen in the last 24 hours. |
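The scoring step described in the removed paragraph above can be sketched as follows. This is only an illustration under stated assumptions: the feature names and the equal weights are hypothetical, and the time modifier is passed in as a plain number because the text does not give the formula any of the three modules uses.

```python
from typing import Mapping

# Hypothetical names for the eight per-IP features described above.
FEATURES = (
    "events", "total_duration", "avg_duration", "total_bytes",
    "avg_bytes", "total_packets", "avg_packets", "avg_events_per_day",
)

# Assumed equal weights for illustration only; the text requires that they sum to one.
WEIGHTS = {name: 1.0 / len(FEATURES) for name in FEATURES}


def score(features: Mapping[str, float],
          weights: Mapping[str, float],
          time_modifier: float) -> float:
    """Weighted linear combination of the features, scaled by a time modifier."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    linear = sum(weights[name] * features[name] for name in weights)
    return linear * time_modifier
```

Keeping the time modifier as a separate argument reflects the description of several modules sharing the same linear combination and differing only in how they weight time.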
58 |
| - |
59 |
| -# Documentation |
60 |
| - |
61 |
| -Click [here](AIP-How-To-Guide.md) to check some examples of how the tool is used and the data models. |
| 16 | +## Data from the TOM |