ADF-Databricks-Project

Technologies Used In Project:

  1. HTTP Source (Git): To fetch data from the source repository.
  2. Azure Data Factory (ADF): For seamless data transfer and orchestration.
  3. Microsoft Azure: As the foundational cloud platform.
  4. Databricks & PySpark: For data transformation and advanced processing.
  5. Azure Data Lake: To store raw and transformed data.
  6. Azure Synapse Analytics: To build and manage a robust data warehouse.
  7. Power BI: For creating interactive dashboards and visualisations.
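
As a small illustration of how Databricks addresses the raw and transformed data in the data lake, the sketch below builds ADLS Gen2 `abfss://` URIs. The container names (`raw`, `transformed`), the storage account name (`examplestorageacct`), and the folder (`products`) are placeholders for illustration only, not the names used in this project.

```python
def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    if not container or not account:
        raise ValueError("container and account are required")
    base = f"abfss://{container}@{account}.dfs.core.windows.net"
    clean = path.strip("/")  # tolerate leading/trailing slashes in the folder path
    return f"{base}/{clean}" if clean else base


# Hypothetical names: one container for raw data, one for transformed data.
RAW = abfss_uri("raw", "examplestorageacct", "products")
TRANSFORMED = abfss_uri("transformed", "examplestorageacct", "/products/")

print(RAW)
print(TRANSFORMED)
```

In a Databricks notebook, a URI like `RAW` could then be passed to a reader such as `spark.read.format("csv").option("header", True).load(RAW)`, assuming the cluster has already been granted access to the storage account.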

Other Key Concepts & Learnings:

  1. Databricks File System (DBFS)
  2. Databricks Utilities: To streamline operations and manage resources.
  3. Delta Tables: CRUD Operations & Internals
  4. Delta Table Optimization: Leveraged techniques like Z-Order By and Vacuum to optimize performance and manage stale files.
  5. Versioning & Time Travel: Explored historical data states for insights and debugging.
  6. Incremental Loading with Auto Loader: Handled both streaming and batch data.
  7. Workflow Design: Designed scalable workflows for job orchestration.
  8. Databricks Jobs: Scheduled and managed jobs seamlessly for automation.
  9. Unity Catalog
  10. Resource Group, Resource, Storage, Container, Microsoft Entra ID (Service Principal), IAM Role, Managed Identity, compute creation, Hive managed and external table creation, dynamic file loading using iteration and loops in ADF, etc.
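
The Delta table optimization and time travel items above map onto a handful of Delta Lake SQL statements. The following is a minimal sketch that only builds those statements as strings; the table name `demo.sales` and its columns are hypothetical, and the helper function names are made up for this example. The 168-hour value mirrors Delta Lake's default 7-day retention threshold.

```python
from typing import Sequence


def optimize_zorder(table: str, columns: Sequence[str]) -> str:
    """Compact small files and co-locate rows by the given columns."""
    if not columns:
        raise ValueError("at least one Z-Order column is required")
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(columns)})"


def vacuum(table: str, retain_hours: int = 168) -> str:
    """Remove stale data files older than the retention window."""
    if retain_hours < 0:
        raise ValueError("retain_hours must be non-negative")
    return f"VACUUM {table} RETAIN {retain_hours} HOURS"


def describe_history(table: str) -> str:
    """List the table's versions, operations, and timestamps."""
    return f"DESCRIBE HISTORY {table}"


def select_version(table: str, version: int) -> str:
    """Time travel: query the table as it was at a given version."""
    return f"SELECT * FROM {table} VERSION AS OF {version}"


# Hypothetical table and columns, for illustration only.
statements = [
    optimize_zorder("demo.sales", ["order_date", "customer_id"]),
    vacuum("demo.sales"),
    describe_history("demo.sales"),
    select_version("demo.sales", 3),
]
for stmt in statements:
    print(stmt)
```

On a Databricks cluster each statement could be run with `spark.sql(stmt)`. Note that vacuuming removes the files that older versions depend on, so time travel to versions outside the retention window may no longer work afterwards.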

Original Dataset Link: https://www.kaggle.com/datasets/ukveteran/adventure-works
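
Dynamic file loading in ADF, listed among the key concepts, is commonly driven by a JSON array that a ForEach activity iterates over, with one entry per source file. The sketch below generates such an array. The parameter names (`p_rel_url`, `p_sink_folder`, `p_sink_file`), the file names, and the `Data` folder are assumptions for illustration; they are not taken from this project's pipeline.

```python
import json
from typing import Dict, List, Sequence


def build_foreach_items(file_names: Sequence[str], base_folder: str = "") -> List[Dict[str, str]]:
    """Build one parameter set per source file for an ADF ForEach activity."""
    items = []
    for name in file_names:
        stem = name.rsplit(".", 1)[0]  # file name without its extension
        rel_url = f"{base_folder.strip('/')}/{name}".lstrip("/")
        items.append({
            "p_rel_url": rel_url,     # path relative to the HTTP base URL
            "p_sink_folder": stem,    # target folder in the data lake
            "p_sink_file": name,      # target file name
        })
    return items


# Hypothetical file names, for illustration only.
items = build_foreach_items(["Products.csv", "Customers.csv"], base_folder="Data")
print(json.dumps(items, indent=2))
```

The printed JSON could be supplied as an array-type pipeline parameter, with the ForEach activity reading each entry through expressions such as `@item().p_rel_url` in the Copy activity's dataset parameters.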