The ADF Universal Framework is an open-source project designed to provide a comprehensive and flexible solution for building scalable and efficient data integration workflows using Azure Data Factory (ADF).
Whether you are dealing with data ingestion, transformation, or loading, this framework aims to streamline your ETL processes and empower data engineers and developers with a set of powerful capabilities.
It integrates various solutions, optimized and tuned for the best outcome. We appreciate the contributions from the open-source community.
This project primarily encompasses the following aspects:
- ADF Universal Orchestrator Framework
- ADF Universal Task Solution
- CI/CD Solution For ADF Universal Solution
- DataOps For The Modern Data Warehouse
The solution uses these components:
Component | Link |
---|---|
Azure Data Factory (ADF) | Azure Data Factory |
Azure Databricks | Azure Databricks |
Azure Data Lake Storage (ADLS) | Azure Data Lake Storage |
Azure Synapse Analytics | Azure Synapse Analytics |
Azure Key Vault | Azure Key Vault |
Azure DevOps | Azure DevOps |
Power BI | Power BI |
Azure SQL Database | Azure SQL Database |
Microsoft Purview | Microsoft Purview |
Self-Hosted IR | Self-Hosted IR |
Self-Hosted Agent | Self-Hosted Agent |
To get started with the ADF Universal Framework, please refer to the documentation for detailed instructions, examples, and best practices.
The ADF master framework is the main portal that controls the workflow and dependencies for all task pipelines.
- Metadata Management:
  - Offer metadata storage and management to trace the sources, processing, and destinations of data.
  - Support data lineage and impact analysis to help understand and manage data workflows.
- Task Scheduling and Execution:
  - Feature a robust task scheduling engine capable of executing data flow tasks according to a defined schedule.
  - Provide monitoring and logging capabilities to track task execution status and performance (see the sketch after this list).
- Parameterization and Configuration:
  - Allow parameterization of tasks and data flows to enhance reusability and flexibility.
  - Provide configuration options for dynamic adjustments based on environment and requirements.
- Error Handling and Fault Tolerance:
  - Have a robust error-handling mechanism to capture and manage errors occurring in data flows.
  - Support fault tolerance mechanisms, allowing for task retries and recovery after failures.
- Security and Authentication:
  - Integrate authentication and authorization mechanisms to ensure data security.
  - Support encryption, access control, and protection of sensitive information.
- Monitoring and Alerting:
  - Provide real-time monitoring and alerting capabilities to track task performance and runtime status.
  - Integrate logging and auditing features to assist in issue troubleshooting and compliance requirements.
- Scalability and Customization:
  - Demonstrate good scalability, integrating with third-party tools and services.
  - Provide custom activity and plugin mechanisms to adapt to diverse business requirements.
- Version Control and Collaboration:
  - Support version control for managing and tracking changes in data workflows.
  - Provide collaboration and team development features to facilitate collaborative work among multiple team members.
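As a concrete illustration of the scheduling and monitoring capabilities above, here is a minimal sketch of triggering a task pipeline and polling its run status with the Azure SDK for Python (azure-identity and azure-mgmt-datafactory). The subscription, resource group, factory, pipeline, and parameter names are placeholders, not part of the framework.

```python
# Minimal sketch: trigger a task pipeline run and poll its status.
# All resource names below are placeholders.
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "rg-data-platform"        # placeholder
FACTORY_NAME = "adf-orchestrator-dev"      # placeholder
PIPELINE_NAME = "pl_task_ingest_sales"     # placeholder task pipeline

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Kick off the task pipeline with runtime parameters (e.g. taken from the control table).
run = client.pipelines.create_run(
    RESOURCE_GROUP, FACTORY_NAME, PIPELINE_NAME,
    parameters={"SourceSystem": "sales", "LoadDate": "2024-06-30"},
)

# Poll the run until it reaches a terminal state, logging the status each time.
while True:
    pipeline_run = client.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, run.run_id)
    print(f"{PIPELINE_NAME} run {run.run_id}: {pipeline_run.status}")
    if pipeline_run.status in ("Succeeded", "Failed", "Cancelled"):
        break
    time.sleep(30)
```

Within the framework the orchestrator pipeline itself typically drives this trigger-and-poll pattern through its own activities; the SDK form above is shown only because it is easy to read and test standalone.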
Back to Top ⬆
The ADF task framework aims to provide common, reusable pipelines that developers can use easily by configuring metadata.
These pipelines should support different kinds of ingestion and data processing:
- Data Connection and Source/Destination Adapters:
  - Ability to connect to various data stores and source systems, including relational databases, NoSQL databases, and cloud storage.
  - Provide a wide range of data source and destination adapters to support different data formats and protocols.
- Data Flow Processing:
  - Support data transformation, cleansing, and processing to meet business requirements.
  - Offer a rich set of data processing activities such as data splitting, merging, aggregation, filtering, and more.
  - Support multiple compute engines, such as Azure Synapse and Azure Databricks.
- Parameterization and Configuration:
  - Allow parameterization of tasks and data flows to enhance reusability and flexibility (see the control-table sketch after this list).
  - Provide configuration options for dynamic adjustments based on environment and requirements.
- Metadata Management:
  - Offer metadata storage and management to trace the sources, processing, and destinations of data.
  - Support data lineage and impact analysis to help understand and manage data workflows.
- Version Control and Collaboration:
  - Support version control for managing and tracking changes in data workflows.
  - Provide collaboration and team development features to facilitate collaborative work among multiple team members.
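To make the metadata-driven idea concrete, here is an illustrative sketch of a single control-table entry and how it might be mapped onto parameters of the common task pipeline. The column names, parameter names, and helper function are assumptions for illustration, not the framework's actual schema.

```python
# Illustrative only: one hypothetical control-table entry and how it could be
# translated into parameters for a generic, metadata-driven ingestion pipeline.
control_table_entry = {
    "TaskId": 101,
    "SourceType": "SqlServer",            # e.g. SqlServer, Oracle, REST, ADLS
    "SourceObject": "dbo.SalesOrders",
    "SinkType": "ADLS",
    "SinkPath": "raw/sales/salesorders/",
    "WatermarkColumn": "ModifiedDate",    # enables incremental loads
    "IsActive": True,
}

def to_pipeline_parameters(entry: dict) -> dict:
    """Translate a control-table row into parameters for the common task pipeline."""
    return {
        "pSourceType": entry["SourceType"],
        "pSourceObject": entry["SourceObject"],
        "pSinkPath": entry["SinkPath"],
        "pWatermarkColumn": entry["WatermarkColumn"],
    }

print(to_pipeline_parameters(control_table_entry))
```

The point of the pattern is that adding a new ingestion task means inserting a row into the control table rather than authoring a new pipeline.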
- A development data factory is created and configured with Azure Repos Git. All developers should have permission to author Data Factory resources like pipelines and datasets.
- A developer creates a feature branch to make a change. They debug their pipeline runs with their most recent changes.
- After a developer is satisfied with their changes, they create a pull request from their feature branch to the main or collaboration branch to get their changes reviewed by peers.
- After a pull request is approved and changes are merged in the main branch, the changes get published to the development factory.
- When the team is ready to deploy the changes to a test or UAT (User Acceptance Testing) factory, the team goes to their Azure Pipelines release and deploys the desired version of the development factory to UAT.
  This deployment takes place as part of an Azure Pipelines task and uses Resource Manager template parameters to apply the appropriate configuration.
- After the changes have been verified in the test factory, deploy to the production factory by using the next task of the pipelines release.
Note:
Only the development factory is associated with a git repository.
The test and production factories shouldn't have a git repository associated with them and should only be updated via an Azure DevOps pipeline or a Resource Manager template.
- Each user makes changes in their private branches.
- Push to master isn't allowed. Users must create a pull request to make changes.
- The Azure DevOps pipeline build is triggered every time a new commit is made to master. It validates the resources and generates an ARM template as an artifact if validation succeeds.
- The DevOps Release pipeline is configured to create a new release and deploy the ARM template each time a new build is available.
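The deployment step of that release can be pictured with the sketch below, which applies the ARM template artifact produced by the CI build to a target (UAT or production) factory using azure-mgmt-resource. The subscription, resource group, and parameter-file names are placeholders, and in the framework this logic runs from an Azure Pipelines release task rather than a standalone script.

```python
# Sketch of the release step that deploys the exported ARM template to a
# target (UAT or production) factory. Paths and names are placeholders.
import json

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
TARGET_RESOURCE_GROUP = "rg-data-platform-uat"     # placeholder

client = ResourceManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# ARM template and environment-specific parameter file produced by the CI build.
with open("ARMTemplateForFactory.json") as f:
    template = json.load(f)
with open("ARMTemplateParametersForFactory.uat.json") as f:   # placeholder name
    parameters = json.load(f)["parameters"]

deployment = client.deployments.begin_create_or_update(
    TARGET_RESOURCE_GROUP,
    "adf-release-deployment",
    {
        "properties": {
            "mode": "Incremental",
            "template": template,
            "parameters": parameters,
        }
    },
)
deployment.wait()  # block until the deployment completes
print(deployment.result().properties.provisioning_state)
```

As recommended in the ADF CI/CD documentation, a pre- and post-deployment script that stops and restarts triggers usually wraps this step.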
We follow the release workflow below; for more details, please read this documentation.
Back to Top ⬆
Contributions to the project are welcome! If you have ideas for improvements, feature requests, or bug reports, feel free to open an issue or submit a pull request.
Let's collaborate to make data integration with Azure Data Factory more efficient and scalable!
Back to Top ⬆
ADF Universal Framework version life cycle:
Version | Current Patch/Minor | State | First Release | Limited Support | EOL/Terminated |
---|---|---|---|---|---|
2 | 2.1.0 | Supported | Jun 30, 2024 | TBD | TBD |
1.4 | 1.4.3 | EOL | May 31, 2024 | Dec 31, 2024 | Dec 31, 2024 |
1.3 | 1.3.0 | EOL | Apr 30, 2024 | Dec 31, 2024 | Dec 31, 2024 |
1.2 | 1.2.5 | EOL | Mar 31, 2024 | Dec 31, 2024 | Dec 31, 2024 |
1.1 | 1.1.1 | EOL | Feb 28, 2024 | Dec 31, 2024 | Dec 31, 2024 |
- CI/CD lifecycle - Continuous integration and delivery in Azure Data Factory
- How to set up self-hosted Windows agents
- Register an agent using a personal access token (PAT)
- Run the agent - interactively
- Run the agent - service
- CI/CD flow - Continuous deployment improvements
- Walkthrough of CICD in Azure Data Factory (ADF)
- DataOps for the modern data warehouse
- metadata
Actual difficulties encountered:
- Debugging Azure copy activity delimiters
- Skip the use of row count
- IR configuration
- Configuration and usage of Key Vault (see the sketch after this list)
- Use the Get Metadata activity to determine the existence and last-modified datetime of a file
- Parameterized universal pipeline
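For the Key Vault item above, a minimal sketch of reading a connection secret with azure-identity and azure-keyvault-secrets is shown below; the vault URL and secret name are placeholders. Inside ADF, linked services normally reference the Key Vault secret directly, so code like this is mainly useful in supporting scripts and tests.

```python
# Minimal sketch: read a connection secret from Azure Key Vault.
# Vault URL and secret name are placeholders.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

VAULT_URL = "https://kv-data-platform.vault.azure.net"   # placeholder
SECRET_NAME = "sql-source-connection-string"             # placeholder

client = SecretClient(vault_url=VAULT_URL, credential=DefaultAzureCredential())
secret = client.get_secret(SECRET_NAME)

# Use the secret value (e.g. as a connection string) without printing it in full.
print(f"Retrieved secret '{SECRET_NAME}' (length {len(secret.value)}).")
```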
Expected functionality:
- Make the pipeline more universal, with all configurable items defined in the control table
- Customize the content of notification emails
- Complete all pipeline invocations through a single main (orchestrator) call
- Record the running status of each pipeline in a log table
- Monitor the running status of pipelines that need to be executed in real time
- Implement a pipeline error-rerun mechanism (see the sketch below)
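One possible shape for the error-rerun mechanism is sketched here: query recent failed runs and restart each one in recovery mode from the failed activity, using azure-mgmt-datafactory. The resource names are placeholders, and this is only an assumed implementation of the expected feature, not the framework's actual code.

```python
# Sketch of an error-rerun mechanism: find failed runs from the last 24 hours
# and restart each one from the failed activity. Names are placeholders.
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters, RunQueryFilter

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "rg-data-platform"      # placeholder
FACTORY_NAME = "adf-orchestrator-dev"    # placeholder

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Query pipeline runs from the last 24 hours that ended in "Failed".
now = datetime.now(timezone.utc)
failed_runs = client.pipeline_runs.query_by_factory(
    RESOURCE_GROUP,
    FACTORY_NAME,
    RunFilterParameters(
        last_updated_after=now - timedelta(hours=24),
        last_updated_before=now,
        filters=[RunQueryFilter(operand="Status", operator="Equals", values=["Failed"])],
    ),
)

for run in failed_runs.value:
    # Recovery mode reruns the pipeline starting from the activity that failed.
    rerun = client.pipelines.create_run(
        RESOURCE_GROUP,
        FACTORY_NAME,
        run.pipeline_name,
        reference_pipeline_run_id=run.run_id,
        is_recovery=True,
        start_from_failure=True,
    )
    print(f"Rerun of {run.pipeline_name} ({run.run_id}) started as {rerun.run_id}")
```

The same query results could also be written to the log table mentioned above to record run status for monitoring.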