DSTI - Python project group

📖 Introduction

This project focuses on cybersecurity data analysis, specifically preparing a network attack log for machine learning applications. The goal is to classify attacks by type and to develop models that detect anomalies or malicious activity.

DATASET

Dataset: cybersecurity_attacks.csv with 40,000 entries and 25 columns.

Data Content: Contains details such as source/destination IP, ports, protocols, packet lengths, and security alerts.

Initial Analysis: Checked data types, missing values, and data distribution. Some columns showed high cardinality (IP, payload data), while others had low variance and were deemed uninformative (e.g., Firewall Logs, IDS/IPS Alerts).

📊 Exploratory Data Analysis (EDA): Cleaning and analyzing network logs.

Anomaly Detection (IQR Method): Identified outliers in destination ports, packet lengths, and anomaly scores.

Correlation Analysis: Weak relationships among numerical features suggest limited linear dependencies.

Attack Type Distribution: Balanced distribution of attack types (DDoS, Malware, Intrusion).

Geolocation Analysis: No significant regional differences in network traffic patterns.
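
The IQR method mentioned above can be sketched as follows. This is a minimal, dependency-free illustration and not the notebook's actual code: the function name `iqr_outliers`, the conventional multiplier `k=1.5`, and the sample packet lengths are all assumptions made for the example.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Illustrative packet lengths (made-up numbers, not taken from the dataset).
packet_lengths = [64, 70, 72, 75, 80, 82, 85, 90, 1500]
outliers = iqr_outliers(packet_lengths)
print(outliers)
```

The same function could be applied to any numeric column, such as destination ports or anomaly scores.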

🔍 Feature Engineering: Extracting relevant features for model training.

Handling Missing Values: Imputed missing values, excluding uniform columns.

Timestamp Conversion: Converted timestamps, removed duplicates, and extracted time-based features.

New Features: Created features such as Is Source Private, Is Night Traffic, and Destination Port Category to improve predictive capability.
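
As a rough sketch of how the time-based and derived features above might be computed, the snippet below uses only the standard library. Several details are assumptions rather than the project's actual choices: the timestamp format, the 22:00 to 06:00 night window, the category labels, and the function names. The port ranges follow the standard IANA split (0 to 1023, 1024 to 49151, 49152 to 65535).

```python
import ipaddress
from datetime import datetime


def is_source_private(ip):
    """True if the address falls in a private (non-routable) range."""
    return ipaddress.ip_address(ip).is_private


def is_night_traffic(ts, start_hour=22, end_hour=6):
    """True if the hour falls in the assumed night window (wraps midnight)."""
    return ts.hour >= start_hour or ts.hour < end_hour


def destination_port_category(port):
    """Map a port number to its IANA range (labels are hypothetical)."""
    if 0 <= port <= 1023:
        return "well_known"
    if 1024 <= port <= 49151:
        return "registered"
    if 49152 <= port <= 65535:
        return "dynamic"
    raise ValueError(f"invalid port: {port}")


def engineer_features(source_ip, timestamp, destination_port,
                      fmt="%Y-%m-%d %H:%M:%S"):
    """Build the derived features for one log row (format is assumed)."""
    ts = datetime.strptime(timestamp, fmt)
    return {
        "Is Source Private": is_source_private(source_ip),
        "Is Night Traffic": is_night_traffic(ts),
        "Destination Port Category": destination_port_category(destination_port),
        "Hour": ts.hour,
        "Day Of Week": ts.weekday(),  # Monday = 0
    }


# Made-up example row, not taken from the dataset.
example = engineer_features("192.168.1.10", "2023-05-30 23:15:00", 443)
print(example)
```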

🤖 Machine Learning Modeling: Building models to classify attack types.

XGBoost: Best-performing model (~40% accuracy), which is still low: only modestly above the ~33% chance level for three balanced classes.

Random Forest: Inferior to XGBoost, especially in distinguishing attack types.

Logistic Regression: Simple model with the lowest accuracy (~33.8%).
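
To put these accuracies in context, it helps to compare them with a majority-class baseline. The sketch below is illustrative only: the function name `majority_baseline_accuracy` is hypothetical, and the label list is synthetic, built to mimic the balanced three-class distribution described in the EDA section.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)


# Synthetic, perfectly balanced labels (an assumption for illustration).
labels = ["DDoS", "Malware", "Intrusion"] * 100
baseline = majority_baseline_accuracy(labels)

# Accuracies as reported in this README.
reported = {"XGBoost": 0.40, "Logistic Regression": 0.338}
for name, acc in reported.items():
    print(f"{name}: {acc:.3f} (gain over baseline: {acc - baseline:+.3f})")
```

A model whose accuracy sits close to this baseline has learned little that separates the classes.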

💻 Streamlit App: A web interface for visualizing attack data and model predictions.

Tool Used: Streamlit for building an interactive web interface.

📹 Video Demo

🚀 How to Run the Project

Install the required Python libraries by running:
```
pip install -r requirements.txt
```

Run the EDA notebook:

jupyter notebook ./EDA+ML/Cybersecurity_Analysis_Modeling-1.ipynb

🚀 To run the Streamlit app, follow these steps:

Install Streamlit (if you haven't already):
```
pip install streamlit
```
Navigate to the directory where the app.py file is located.
Run the app with the following command:
```
streamlit run app.py
```
The app will open automatically in your default browser at http://localhost:8501. If not, open it manually.
