🏢 SQL Data Warehouse

This project demonstrates a comprehensive SQL Data Warehousing solution, built from scratch using Azure Data Studio and SQL Server Express. It simulates an enterprise data pipeline by ingesting raw data from source systems, performing structured transformations, and delivering business-ready data for reporting and analytics using Medallion Architecture. It reflects industry best practices in Data Engineering, ETL design, and SQL-based data modeling.

📖 Project Overview

This project involves:

Data Architecture: Designing a Modern Data Warehouse Using Medallion Architecture Bronze, Silver, and Gold layers.
ETL Pipelines: Extracting, transforming, and loading data from source systems into the warehouse.
Data Modeling: Developing fact and dimension tables optimized for analytical queries.

🚀 Project Requirements

Objective

Develop a modern data warehouse using SQL Server to consolidate sales data, enabling analytical reporting and informed decision-making.

Specifications

✅ Data Sources: Import data from two source systems (ERP and CRM) provided as CSV files.
✅ Data Quality: Cleanse and resolve data quality issues prior to analysis.
✅ Integration: Combine both sources into a single, user-friendly data model designed for analytical queries.
✅ Scope: Focus on the latest dataset only; historization of data is not required.
✅ Documentation: Provide clear documentation of the data model to support both business stakeholders and analytics teams.
✅ Business Use Cases:
- Customer segmentation and behavior analysis
- Product performance tracking
- Sales trends

🛠️ Tools & Technologies

This project utilizes the following tools and technologies for building, managing, and analyzing the SQL data warehouse:

Tool / Technology	Badge	Description
SQL Server Express		Relational database engine used to store and manage the data warehouse.
Azure Data Studio		SQL editor used for database development and management.
T-SQL (Transact-SQL)		SQL dialect for defining transformations, querying, and manipulating data.
Git & GitHub		Version control and project repository for managing code and documentation.
Star Schema Design		Dimensional modeling technique used for analytical querying.
Medallion Architecture		Bronze, Silver, and Gold layers for raw, cleaned, and business-ready data.
draw.io (diagrams.net)		Used to design architectural diagrams and data flow visuals.
Notion		Project planning and documentation hub for tracking milestones and tasks.

🏗️ Data Architecture

The project follows a Medallion Architecture consisting of three key layers: Bronze, Silver, and Gold layers:

✅ Bronze Layer: Stores raw data as-is from the source systems. Data is ingested from CSV Files into SQL Server Database.
✅ Silver Layer: This layer includes data cleansing, standardization, and normalization processes to prepare data for analysis.
✅ Gold Layer: Houses business-ready data modeled into a star schema required for reporting and analytics.

📊 Impact: Data Quality Metrics

This project demonstrates practical data profiling and cleansing, transforming messy raw data (Bronze) into clean, trusted analytics-ready data (Silver). The table below shows a breakdown of key data quality checks and transformations performed:

Table	Issue	Records Affected	Resolution
crm_cust_info	Duplicates	5	Duplicates removed, kept latest by `cst_create_date`
	Null customer IDs	3	Nulls excluded
	Unwanted spaces in names	32	Trimmed first & last names
	Inconsistent genders (`F`, `M`, `NULL`)	4,577 (`NULL`)	Standardized to `Male`/`Female`; NULL -> `N/A`
	Inconsistent marital status (`S`, `M`, `NULL`)	6 (`NULL`)	Standardized to `Single`/`Married`; NULL -> `N/A`
crm_prd_info	Null product costs	2	Set to `0`
	Invalid product lines (`M`, `R`, `S`, `T`, `NULL`)	17 (`NULL`)	Mapped codes to descriptive lines; `NULL` → `N/A`
	Invalid date orders	200	Recalculated end dates based on next start date
crm_sales_details	Null/invalid order dates	19	Fixed or set to `NULL`
	Data inconsistencies in sales, quantity, and price	35	Recalculated sales & price as needed
erp_cust_az12	Invalid birthdates	31	Invalid dates set to `NULL`
	Gender inconsistencies	1,484	Standardized to `Male`/`Female`; blanks -> `N/A`
erp_loc_a101	Country synonyms & blanks	~8,385	Merged synonyms (`US`, `USA` -> `United States`, `DE` -> `Germany`); blanks -> `N/A`
erp_px_cat_g1v2	Carriage returns	varied	Cleaned with `REPLACE()`

✅ Total records impacted:

18k+ CRM customers
18k+ ERP customers
60k+ sales transactions
13 country variations unified

✨ Gold Layer: Final Star Schema

The Gold Layer produces a clean, trusted star schema for analytics:

Gold View	Description
`gold.dim_customers`	Combines CRM & ERP data for unified customer dimension. Adds surrogate `customer_key`.
`gold.dim_products`	Enriches CRM product info with ERP product categories. Adds surrogate `product_key`.
`gold.fact_sales`	Combines sales transactions with product & customer dimensions for a consistent fact table.

📌 Key design: Uses ROW_NUMBER() to generate surrogate keys, fallback logic for missing values, and joins for dimension enrichment.

📂 Repository Structure

SQL-Data-Warehouse/
│
├── datasets/                           # Raw datasets used for the project (ERP and CRM data)
│
├── docs/                               # Project documentation and architecture details
│   ├── data_architecture.png           # Image file for the project's architecture
│   ├── data_catalog.md                 # Catalog of datasets, including field descriptions and metadata
│   ├── data_flow.png                   # Image file for the data flow diagram
│   ├── data_integration.png            # Image file for the data integration diagram
│   ├── data_model.png                  # Image file for data model (star schema)
│   ├── naming-conventions.md           # Consistent naming guidelines for tables, columns, and files
│
├── scripts/                            # SQL scripts for ETL and transformations
│   ├── bronze/                         # Scripts for extracting and loading raw data
│   ├── silver/                         # Scripts for cleaning and transforming data
│   ├── gold/                           # Scripts for creating analytical models
│   ├── database_init.sql               # Script for creating database and schemas
│
├── tests/                              # Test scripts and quality files
│
├── README.md                           # Project overview and instructions
├── LICENSE                             # License information for the repository
└── .gitignore                          # Files and directories to be ignored by Git

⚙️ How to Run This Project

Clone the Repository

git clone https://github.com/gloryodeyemi/SQL-Data-Warehouse.git
cd SQL-Data-Warehouse

Set Up SQL Server Environment

Install SQL Server Express and Azure Data Studio (if not already installed).

Initialize Database

Run scripts/database_init.sql to initialize the database and schemas.

Run Bronze Layer

Create Bronze tables using scripts/bronze/bronze_ddl.sql.
Load Bronze table
```
EXEC bronze.load_bronze_proc;
```

Run Silver Layer

Create Silver tables using scripts/silver/silver_ddl.sql.
Load Silver table
```
EXEC silver.load_silver_proc;
```

Run Gold Layer

Create Gold VIEW using scripts/gold/gold_ddl.sql to generate business-ready data for analytics and reporting.

Explore Data

Use the star schema in the Gold layer for analytical queries and reporting.

🧪 Testing & Validation

Data quality checks scripts in the tests/ folder ensure:
- Data consistency, accuracy, and standardization by checking for:
  - Null or duplicate primary keys.
  - Unwanted spaces in string fields.
  - Data standardization and consistency.
  - Invalid date ranges and orders.
  - Data consistency between related fields.
- Uniqueness of surrogate keys in dimension tables.
- Referential integrity between fact and dimension tables.
- Validation of relationships in the data model for analytical purposes.

🔮 Future Work

This project lays the foundation for a robust and scalable data warehouse. Future enhancements could include:

📊 SQL-Based Analytics

Develop advanced SQL queries to extract business insights such as:
- Customer segmentation
- Sales trends
- Product performance
- Revenue by country
📈 Integration with BI Tools

Connect the Gold layer to Business Intelligence tools like:
- Power BI
- Tableau
- Metabase
...to create interactive dashboards and self-service analytics for stakeholders.
🛠️ Automation & Scheduling

Use SQL Server Agent or external orchestration tools (e.g., Airflow, Azure Data Factory) to automate ETL pipelines and data refreshes.
🔐 Role-Based Access Control (RBAC)

Enforce security policies and access levels depending on user roles (data analyst, data engineer, etc.)
📦 Data Export APIs

Build export mechanisms for downstream systems and data consumers.

🛡️ License

This project is licensed under the MIT License. You are free to use, modify, and share this project with proper attribution.

🌟 About Me

Hi there! I'm Glory Odeyemi, a Data Engineer & Analyst!

Let's stay in touch! Feel free to connect with me on the following platforms:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🏢 SQL Data Warehouse

📖 Project Overview

🚀 Project Requirements

Objective

Specifications

🛠️ Tools & Technologies

🏗️ Data Architecture

📊 Impact: Data Quality Metrics

✨ Gold Layer: Final Star Schema

📂 Repository Structure

⚙️ How to Run This Project

🧪 Testing & Validation

🔮 Future Work

🛡️ License

🌟 About Me

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
datasets		datasets
docs		docs
scripts		scripts
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

License

gloryodeyemi/SQL-Data-Warehouse

Folders and files

Latest commit

History

Repository files navigation

🏢 SQL Data Warehouse

📖 Project Overview

🚀 Project Requirements

Objective

Specifications

🛠️ Tools & Technologies

🏗️ Data Architecture

📊 Impact: Data Quality Metrics

✨ Gold Layer: Final Star Schema

📂 Repository Structure

⚙️ How to Run This Project

🧪 Testing & Validation

🔮 Future Work

🛡️ License

🌟 About Me

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages