Uncovering Trends Through Big Data Analytics Using Apache Spark and Azure Databricks

Alireza-Foroughi-uk/BigData-Analytics-Spark

Big Data & Infrastructure Project 🌍📊🔍

Welcome to "Analyzing Crime and Education Data through a Data Lake Environment", a big data analytics project with real-world relevance! 🎯 Created by Alireza Foroughi at Ulster University’s London Campus, it examines how crime, education, and income relate to one another, using Apache Spark and Azure Databricks. 💡✨

Project Overview 📋

Mission? 🕵️‍♂️
Unravel the tangled web of crime and socioeconomic trends using big data! This research explores how education and income shape crime rates, guiding smarter resource allocation and policy-making. 🌱
Tech Stack: 🛠️
    Python 🐍
    Apache Spark 🔥
    Azure Databricks ☁️
    Azure Blob Storage 📦

Goal: 🎯 Identify patterns to boost safety and development, with a spotlight on regional differences like Asia vs. Europe! 🌏

Why This Matters 🌟

The Problem: 💔
Crime and education are key indicators of societal health. Understanding their link can transform communities by targeting low-income, high-crime areas with education boosts.
The Impact: 🌍
With insights into correlations (e.g., lower education = higher crime), this project paves the way for data-driven policies to build safer, more equitable societies! 🛡️

How It Works ⚙️

Data Powerhouse: 📥
Crime, income, and education datasets sourced from Kaggle and ready for analysis!
Data Lake Setup: 🌊
Azure Blob Storage teamed up with Databricks to create a seamless data pipeline, storing and processing massive datasets like a pro.
Spark Magic: 🔥
PySpark (Apache Spark’s Python API) handled preprocessing, cleaning (bye-bye nulls!), and aggregation, making sense of millions of data points.
Visual Insights: 📈
Created stunning visuals—scatter plots, box plots, bar charts, and heatmaps—to reveal trends like Asia’s crime variability and Europe’s stability.
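
To make the cleaning and aggregation steps above concrete, here is a minimal sketch in plain Python. It is not the project's actual notebook code: the column names (region, crime_rate) and every value are hypothetical, since this README does not list the real Kaggle schema. On a Spark DataFrame the same two steps would typically be expressed with `dropna()` followed by `groupBy("region").agg(...)`.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical rows: column names and values are invented for illustration only.
rows = [
    {"region": "Asia", "crime_rate": 2.0},
    {"region": "Asia", "crime_rate": 8.0},
    {"region": "Asia", "crime_rate": None},  # missing value, dropped below
    {"region": "Europe", "crime_rate": 4.0},
    {"region": "Europe", "crime_rate": 6.0},
]


def drop_nulls(records):
    """Keep only records in which every field has a value."""
    return [r for r in records if all(v is not None for v in r.values())]


def summarize_by_region(records, column):
    """Group records by region; return mean and population std of `column`."""
    groups = defaultdict(list)
    for r in records:
        groups[r["region"]].append(r[column])
    return {
        region: {"mean": mean(values), "std": pstdev(values)}
        for region, values in groups.items()
    }


clean_rows = drop_nulls(rows)
summary = summarize_by_region(clean_rows, "crime_rate")
for region, stats in sorted(summary.items()):
    print(region, stats)
```

The per-region standard deviation is one simple way to quantify the "variability" compared across regions; which measure the project actually used is not stated here.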

Key Findings: 🔎

Higher crime is associated with lower education across regions.
Income is strongly associated with crime, with education as a key mitigator.
Population size? Less of a factor than income and education!

Tech Highlights 🌟

Scalability: 🔥 PySpark’s distributed computing handled large datasets with ease.
Integration: Azure Databricks and Blob Storage created a smooth, efficient workflow.
Visualization: Matplotlib and Seaborn turned raw data into actionable insights.

Results & Insights 📊

Correlations: 📉
Negative link between education and crime; positive link between low income and crime. Correlation heatmaps show both!
Regional Gems: 🌏
Asia shows higher crime variability than Europe, hinting at uneven education development.
Policy Wins: 🏆
Recommendations for targeted education investments in high-crime, low-income zones—let’s make a difference!
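
For readers who want to see what a "negative link" means numerically, here is a small, hedged sketch of the Pearson correlation coefficient in plain Python. The numbers are made up for illustration and are not the project's data; the variable names are hypothetical too. In PySpark, a comparable figure can be obtained with `DataFrame.stat.corr(col1, col2)`.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Illustrative, invented values: more schooling paired with a lower crime rate.
education_years = [8.0, 10.0, 12.0, 14.0, 16.0]
crime_rate = [9.0, 7.5, 6.0, 4.0, 3.5]

r = pearson(education_years, crime_rate)
print(f"education vs. crime: r = {r:.3f}")
```

A value near -1 indicates a strong negative linear relationship and a value near +1 a strong positive one. A correlation on its own does not establish that one variable causes the other.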

References 📚

Kaggle Datasets: Crime Rate, Income & Education.
Azure Docs: Blob Storage.
Databricks: Getting Started.
Apache Spark: Docs.
