rpatid10/ADF-Databricks-Project

ADF-Databricks-Project

Technologies Used in the Project:

  1. HTTP Source (Git): To fetch data from the source repository.
  2. Azure Data Factory (ADF): For seamless data transfer and orchestration.
  3. Microsoft Azure: As the foundational cloud platform.
  4. Databricks & PySpark: For data transformation and advanced processing.
  5. Azure Data Lake: To store raw and transformed data.
  6. Azure Synapse Analytics: To build and manage a robust data warehouse.
  7. Power BI: For creating interactive dashboards and visualisations.
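
As a rough illustration of how the HTTP source and ADF can fit together: one common pattern is to drive an ADF ForEach activity from a JSON parameter array, with one object per file to copy. The sketch below generates such an array in Python. The parameter names (`p_rel_url`, `p_sink_folder`, `p_sink_file`), the `raw` container name, the repository path, and the file names are all assumptions made for illustration; none of them are taken from this project.

```python
import json


def build_copy_parameters(repo_path, file_names, sink_container="raw"):
    """Return one parameter object per source file.

    repo_path      -- hypothetical relative path of the folder on the HTTP source
    file_names     -- file names to copy (placeholders in the example below)
    sink_container -- assumed name of the data lake container for raw data
    """
    params = []
    for name in file_names:
        # Use the file name without its extension as the sink folder name.
        stem = name.rsplit(".", 1)[0]
        params.append(
            {
                "p_rel_url": f"{repo_path}/{name}",
                "p_sink_folder": f"{sink_container}/{stem}",
                "p_sink_file": name,
            }
        )
    return params


# Example with placeholder values only.
example = build_copy_parameters("owner/repo/main/Data", ["Products.csv", "Customers.csv"])
print(json.dumps(example, indent=2))
```

A JSON file like this could be stored in the data lake and read by a Lookup activity whose output feeds the ForEach loop. That is one possible setup, not necessarily the one used in this project.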

Other Key Concepts & Learnings:

  1. Databricks File System (DBFS)
  2. Databricks Utilities: To streamline operations and manage resources.
  3. Delta Tables: CRUD Operations & Internals
  4. Delta Table Optimization: Leveraged techniques like Z-Order By and Vacuum to optimize performance and manage stale files.
  5. Versioning & Time Travel: Explored historical data states for insights and debugging.
  6. Incremental Loading with Auto Loader: Handled both streaming and batch data.
  7. Workflow Design: Designed scalable workflows for job orchestration.
  8. Databricks Jobs: Scheduled and managed jobs seamlessly for automation.
  9. Unity Catalog
  10. Azure Setup & Other Topics: Resource Group, Resource, Storage, Container, Microsoft Entra ID (Service Principal), IAM Role, Managed Identity, Compute Creation, Hive managed and external table creation, and dynamic file loading using iteration and loops in ADF.
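
The Delta table optimization and time travel items above can be sketched with a few Databricks SQL statements. The helpers below only build the statement strings, so they can be read and checked without a cluster. The `run_maintenance` function assumes an active `SparkSession` on Databricks. The table and column names in the example are hypothetical, and the 168-hour retention is Delta's default 7-day threshold written out explicitly.

```python
def optimize_sql(table, zorder_cols):
    """Compact small files and co-locate rows by the given columns."""
    cols = ", ".join(zorder_cols)
    return f"OPTIMIZE {table} ZORDER BY ({cols})"


def vacuum_sql(table, retain_hours=168):
    """Remove data files no longer referenced by the table and older than the retention window."""
    return f"VACUUM {table} RETAIN {retain_hours} HOURS"


def history_sql(table):
    """List the table's versions, with operation and timestamp for each."""
    return f"DESCRIBE HISTORY {table}"


def time_travel_sql(table, version):
    """Query the table as it was at a specific version number."""
    return f"SELECT * FROM {table} VERSION AS OF {version}"


def run_maintenance(spark, table, zorder_cols):
    """Run compaction, then cleanup. `spark` is assumed to be a Databricks SparkSession."""
    spark.sql(optimize_sql(table, zorder_cols))
    spark.sql(vacuum_sql(table))


# Example with hypothetical names; prints the statements instead of running them.
for stmt in (
    optimize_sql("sales_delta", ["order_date"]),
    vacuum_sql("sales_delta"),
    history_sql("sales_delta"),
    time_travel_sql("sales_delta", 1),
):
    print(stmt)
```

Note that once `VACUUM` has removed the files behind an old version, that version can no longer be queried with time travel, so the retention window limits how far back you can go.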

Original Dataset Link: https://www.kaggle.com/datasets/ukveteran/adventure-works
