Fabric CI/CD Option 1 Git based deployments #60

Merged: 21 commits, Mar 19, 2025
**File:** accelerators/CICD/Git-base-deployments/README.md
# Microsoft Fabric CI/CD - Option 1 : Git based deployments


![ci_cd_option_1.png](./resources/git-based-deployment.png)

With this option, all deployments originate from the Git repository. Each stage in the release pipeline has a dedicated primary branch (in the diagram, these stages are Dev, Test, and Prod), which feeds the appropriate workspace in Fabric.

Once a PR to the Dev branch is approved and merged:

1. A release pipeline is triggered to update the content of the Dev workspace. This process can also include a Build pipeline to run unit tests, but the actual upload of files is done directly from the repo into the workspace, using the Fabric Git APIs. You might need to call other Fabric APIs for post-deployment operations that set specific configurations for this workspace, or ingest data.
2. A PR is then created to the Test branch. In most cases, the PR is created using a release branch that can cherry pick the content to move into the next stage. The PR should include the same review and approval processes as any other in your team or organization.
3. Another Build and release pipeline is triggered to update the Test workspace, using a process similar to the one described in the first step.
4. A PR is created to the Prod branch, using a process similar to the one described in step 2.
5. Another Build and release pipeline is triggered to update the Prod workspace, using a process similar to the one described in the first step.
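The Git update that steps 1, 3, and 5 rely on maps to a single Fabric REST call. The sketch below only builds the request; the bearer token, the actual `requests.post`, and polling of the resulting long-running operation are omitted, and the workspace ID and commit hash are placeholders:

```python
# Sketch only: build the Fabric Git API request that syncs a workspace from
# its connected branch. Sending it and polling the long-running operation
# are left out.
FABRIC_API = "https://api.fabric.microsoft.com/v1"

def build_update_from_git_request(workspace_id: str, remote_commit_hash: str,
                                  policy: str = "PreferRemote") -> tuple[str, dict]:
    """Return the (url, body) pair for POST .../git/updateFromGit."""
    url = f"{FABRIC_API}/workspaces/{workspace_id}/git/updateFromGit"
    body = {
        "remoteCommitHash": remote_commit_hash,
        "conflictResolution": {
            "conflictResolutionType": "Workspace",
            # Matches the ConflictResolutionPolicy variable used in this setup.
            "conflictResolutionPolicy": policy,
        },
        "options": {"allowOverrideItems": True},
    }
    return url, body

url, body = build_update_from_git_request("<dev-workspace-id>", "<commit-hash>")
```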

<br />

When should you consider using option #1?
<br />

- When you want to use your Git repo as the single source of truth, and the origin of all deployments.

- When your team follows Gitflow as the branching strategy, including multiple primary branches.

- When the upload from the repo can go directly into the workspace, with no build environment needed to alter the files before deployment. (You can still adjust items by calling APIs or running items in the workspace after deployment.)

For more information about Microsoft Fabric CI/CD workflow options, please visit the official documentation at this page:
https://learn.microsoft.com/en-us/fabric/cicd/manage-deployment#option-1---git--based-deployments


## <u>Setup</u>


<br />

### <u>Prerequisites</u>

<br />

1. Create 3 Microsoft Fabric workspaces, one for each environment: DEV, TEST, PROD. Minor adjustments to the setup and scripts are necessary if additional environments are introduced.

The workspace identifiers will be required in the setup. To find a workspace ID, check the URL and extract the GUID after "groups/", as shown in the picture below:

![ci_cd_option_1.png](./resources/workspace-url.png)


2. Create a Microsoft Fabric workspace for controlling the CI/CD process. It will contain the pre- and post-deployment notebooks. For more information about this workspace, see the documentation in the **cicd-workspace** folder. The user running the CI/CD process should be an Admin or Member of the workspaces.

3. Create an Azure DevOps project and a repository.

3.1 In this project, create the following structure:

![ci_cd_option_1.png](./resources/repository-structure.png)

3.2 From the folder **project-workspace** in this **git repo**, download the .py and .yml files and import them into the **pipeline-scripts** folder of your DevOps project/repository. In the screenshot above, the repository is also called **project-workspace**.

3.3 Keep the workspace folder empty. It will be used to Git-enable the Microsoft Fabric DEV workspace.

3.4 Create 3 Variable Groups and their respective variables.

<br />

| Variable Group name | Variable Name | Variable Value | Comment |
| -------- | ------- | ------- | ------- |
| DynamicGroup | FeatureBranch | `<empty>` | |
| GroupDevOps | ConflictResolutionPolicy | PreferRemote | |
| GroupDevOps | InitializationStrategy | PreferRemote | |
| GroupDevOps | MappingConnectionsFileName | mapping_connections.json | This JSON file will hold the mapping of connections between different stages |
| GroupDevOps | OnelakeRolesFileName | onelake_roles.json | This JSON file will hold the list of roles you wish to create in the target lakehouses |
| GroupDevOps | OnelakeRulesFileName | onelake_rules.json | This JSON file will hold the rules applied to the lakehouse tables/shortcuts/folders defined for a role |
| GroupDevOps | OnelakeEntraMembersFileName | onelake_entra_members.json | This JSON file will hold the Entra ID principals assigned to a role |
| GroupDevOps | OnelakeItemMembersFileName | onelake_item_members.json | This JSON file will hold the lakehouse tables/shortcuts/folders defined for a role |
| GroupDevOps | OrganizationName | `<name of your organization>` | |
| GroupDevOps | ProjectName | `<name of your project>` | |
| GroupDevOps | RepositoryName | `<name of your repository>` | |
| GroupDevOps | Stage1BrancheName | main | |
| GroupDevOps | Stage2BrancheName | `<name of your test branch>` | |
| GroupDevOps | Stage3BrancheName | `<name of your prod branch>` | |
| GroupFabricWorkspaces | CiCdLakehouseId | `<lakehouse id in your ci/cd workspace>` | |
| GroupFabricWorkspaces | CiCdWorkspaceId | `<ci/cd workspace id>` | |
| GroupFabricWorkspaces | Stage1WorkspaceId | `<dev workspace id>` | |
| GroupFabricWorkspaces | Stage2WorkspaceId | `<test workspace id>` | |
| GroupFabricWorkspaces | Stage3WorkspaceId | `<prod workspace id>` | |

<br />

The files **mapping_connections.json**, **onelake_roles.json**, **onelake_rules.json**, **onelake_entra_members.json** and **onelake_item_members.json** are all secure files uploaded in your project, at this location: Pipeline/Library/Secure Files.
A helper notebook called **nb_extract_lakehouse_access.ipynb** extracts the OneLake roles defined in your lakehouses and generates the JSON files in your CI/CD lakehouse (see CI CD Workspace - Read me). Using OneLake Explorer, you can download the files and make the required modifications.
<span style="color: red; font-weight: bold;">The YAML pipelines expect these files to be present even if empty (when you do not wish to change the connections in your Fabric items or create custom OneLake roles in your target lakehouses).</span>
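To make the mapping idea concrete, here is an illustrative sketch of what a connection-mapping lookup could look like. The field names below are assumptions for illustration only; the actual schema of **mapping_connections.json** is defined by the pipeline scripts in this repo:

```python
# Illustrative only: per-stage lookup from a source (DEV) connection id to its
# TEST/PROD counterpart. Field names are placeholders, not the repo's schema.
mapping_connections = [
    {"sourceConnectionId": "conn-dev-001", "targetConnectionId": "conn-test-001", "stage": "TEST"},
    {"sourceConnectionId": "conn-dev-001", "targetConnectionId": "conn-prod-001", "stage": "PROD"},
]

def resolve_connection(source_id: str, stage: str) -> str:
    """Return the target connection id for a stage, or the source id unchanged."""
    for m in mapping_connections:
        if m["sourceConnectionId"] == source_id and m["stage"] == stage:
            return m["targetConnectionId"]
    return source_id  # no mapping found: keep the existing connection

print(resolve_connection("conn-dev-001", "TEST"))  # → conn-test-001
```

An empty JSON array would then simply leave every connection untouched, which is why the pipelines tolerate empty files.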


<br />

The variable group **DynamicGroup** requires additional permissions because the variable **FeatureBranch** it contains is updated by pipeline executions.
Grant the "Administrator" permission to the Build Service as shown in the following screenshot:


![ci_cd_option_1.png](./resources/variable-group-permission.png)

<br />

### <u>Git enable the DEV workspace</u>

<br />

In the settings of your DEV workspace, on the Git integration tab, connect your workspace to the main branch of your Azure DevOps repository. Make sure to use a folder, as shown in the following screenshot.

<br />

![ci_cd_option_1.png](./resources/workpace-git-enablement.png)

<br />

### <u>Create a branch policy on the **main** branch</u>

<br />

This policy is required to block direct commits on the main branch. It is also required to get the name of the source branch used in a PR.

<span style="color: red; font-weight: bold;">This policy is required to get the feature branch name during the automatic trigger of the **ci-get-set-feature-branch** pipeline.</span>

The following screenshot shows the branch policy setup:

![ci_cd_option_1.png](./resources/main-branch-policy.png)

<br />


### <u>Create the required yaml pipelines</u>

From the Pipelines tab in Azure DevOps, create the following pipelines by selecting an existing YAML file.

<br />

<u>ci-get-set-feature-branch</u>

To create this pipeline, select the **ci-get-set-feature-branch.yml** file located in `<your-repository-name>`/main/pipeline-scripts.
This pipeline is triggered automatically when a PR to merge a feature branch into the main branch is opened (and before the PR is completed), to identify on the fly the feature branch name used by the Fabric developer.
The feature branch name is then stored in the variable **FeatureBranch** in the variable group **DynamicGroup**. It is required by the pre- and post-deployment steps of the **ci-update-workspace-dev** pipeline.

Note: as part of this solution, it is mandatory that the developer gives the branch a name that matches the FEATURE workspace name, as shown in the following screenshot:

<br />

![ci_cd_option_1.png](./resources/branch-out-new-workspace.png)

For a programmatic branch-out experience, please check the **amazing work** done by [Nick Hurt](https://www.linkedin.com/in/nick-hurt/) here:
<br />
https://github.com/microsoft/fabric-toolbox/tree/main/accelerators/CICD/Branch-out-to-new-workspace
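The branch-name extraction this pipeline performs can be sketched in a few lines. A minimal sketch, assuming the pipeline reads the Azure DevOps predefined variable `System.PullRequest.SourceBranch` (which holds a full ref such as `refs/heads/feature-sales-workspace`); the actual script in **pipeline-scripts** may do this differently:

```python
# Sketch: turn a PR source-branch ref into the bare branch name, which by the
# convention above also matches the FEATURE workspace name.
def feature_branch_from_ref(source_branch_ref: str) -> str:
    prefix = "refs/heads/"
    if source_branch_ref.startswith(prefix):
        return source_branch_ref[len(prefix):]
    return source_branch_ref

print(feature_branch_from_ref("refs/heads/feature-sales-workspace"))  # → feature-sales-workspace
```

The pipeline then writes this value into **FeatureBranch** in **DynamicGroup**, which is why the Build Service needs the Administrator permission granted in the prerequisites.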



<br />

<u>ci-update-workspace-dev</u>

To create this pipeline, select the **ci-update-workspace-dev.yml** file located in `<your-repository-name>`/main/pipeline-scripts.
This pipeline is triggered automatically when the PR is completed. It promotes the content of the FEATURE workspace to the DEV workspace and applies a logic similar to the **cd-update-workspace-test-prod** pipeline, meaning it runs pre- and post-deployment steps before and after the Git update of the DEV workspace.

<br />

<u>cd-update-workspace-test-prod</u>

To create this pipeline, select the **cd-update-workspace-test-prod.yml** file located in `<your-repository-name>`/main/pipeline-scripts.
This pipeline is triggered manually after a PR is made between a source and a target branch.

<br />

To deploy DEV to TEST, proceed as follows:
- Make a PR between the **main** and **test** branches
- Manually run the pipeline, selecting the test branch as source and unchecking the **PROD** stage.

<br />

To deploy TEST to PROD, proceed as follows:
- Make a PR between the **test** and **prod** branches
- Manually run the pipeline, selecting the prod branch as source and unchecking the **TEST** stage.

The following screenshot shows the selection to deploy to TEST:

<br />

![ci_cd_option_1.png](./resources/deploy-to-test.png)
# The CI/CD workspace

<br>

In this solution, the cicd-workspace folder serves as a placeholder for a collection of notebooks that execute logic within a CI/CD Fabric workspace.

<br>

The following notebooks should be imported into a Fabric workspace of your choice:

- nb_cicd_pre_deployment
- nb_cicd_pre_update_lakehouses
- nb_cicd_pre_update_warehouses
- nb_cicd_post_deployment
- nb_cicd_post_update_data_pipelines
- nb_cicd_post_update_notebooks
- nb_cicd_post_update_semantic_models
- nb_helper
- nb_extract_lakehouse_access
- nb_prepare_cicd_workspace


<br>

These notebooks will record their execution in a CI/CD lakehouse, which must be created in advance. Refer to the notebook **nb_prepare_cicd_workspace** for guidance on setting up the lakehouse.
Additionally, they will perform pre- and post-deployment activities essential to the CI/CD process.
The following sections provide a detailed explanation of these activities.
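Creating the CI/CD lakehouse itself maps to a single Fabric REST call. A minimal sketch of the request (the lakehouse name is a placeholder, and the actual `requests.post` with a bearer token is left out):

```python
# Sketch only: build the documented create-lakehouse request. Authentication
# and the HTTP call itself are omitted.
FABRIC_API = "https://api.fabric.microsoft.com/v1"

def build_create_lakehouse_request(workspace_id: str, display_name: str) -> tuple[str, dict]:
    """Return the (url, body) pair for POST .../lakehouses."""
    url = f"{FABRIC_API}/workspaces/{workspace_id}/lakehouses"
    body = {"displayName": display_name,
            "description": "Lakehouse recording CI/CD notebook executions"}
    return url, body

url, body = build_create_lakehouse_request("<cicd-workspace-id>", "lh_cicd")
```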

## <u>Pre deployment activities</u>

<br>

These activities are initiated by the YAML pipelines during the execution of the **Run pre-deployment steps - Lakehouses & Warehouses** step. This step runs the **pre_deployment.py** Python script, which then triggers the execution of the **nb_cicd_pre_deployment** notebook in the CI/CD workspace.

<br>

The **Run pre-deployment steps - Lakehouses & Warehouses** step retrieves the appropriate inputs from variable groups based on the scenario (CI or CD). During the execution of **pre_deployment.py**, these variables are properly formatted and passed as parameters (JSON body) when triggering the **nb_cicd_pre_deployment** notebook via the Fabric REST API (**jobs/instances?jobType=RunNotebook**).
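A minimal sketch of the request **pre_deployment.py** would build for that API (the parameter names shown are placeholders; the real script defines its own):

```python
# Sketch only: build the Fabric job-scheduler request that runs a notebook on
# demand with parameters. Authentication and the HTTP call are omitted.
FABRIC_API = "https://api.fabric.microsoft.com/v1"

def build_run_notebook_request(workspace_id: str, notebook_id: str,
                               parameters: dict) -> tuple[str, dict]:
    """Return (url, body) for POST .../jobs/instances?jobType=RunNotebook."""
    url = (f"{FABRIC_API}/workspaces/{workspace_id}/items/{notebook_id}"
           "/jobs/instances?jobType=RunNotebook")
    body = {"executionData": {"parameters": {
        # Fabric notebook parameters carry an explicit type tag.
        name: {"value": value, "type": "string"}
        for name, value in parameters.items()
    }}}
    return url, body

url, body = build_run_notebook_request(
    "<cicd-workspace-id>", "<nb_cicd_pre_deployment-id>",
    {"target_workspace_id": "<dev-workspace-id>"})
```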

<br>

The notebook **nb_cicd_pre_deployment** creates a DAG through which two other notebooks are called via **mssparkutils.notebook.runMultiple**, in the following order of precedence:

- nb_cicd_pre_update_lakehouses
- nb_cicd_pre_update_warehouses
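The DAG passed to `runMultiple` could look like the sketch below; timeout and concurrency values are placeholders to tune against your capacity:

```python
# Sketch of a runMultiple DAG: the "dependencies" entry makes the warehouse
# notebook wait for the lakehouse notebook, and "concurrency" caps how many
# activities run in parallel.
dag = {
    "activities": [
        {"name": "lakehouses", "path": "nb_cicd_pre_update_lakehouses",
         "timeoutPerCellInSeconds": 1800, "args": {}},
        {"name": "warehouses", "path": "nb_cicd_pre_update_warehouses",
         "timeoutPerCellInSeconds": 1800, "args": {},
         "dependencies": ["lakehouses"]},
    ],
    "concurrency": 2,
}
# Inside a Fabric notebook: mssparkutils.notebook.runMultiple(dag)
```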

The notebook **nb_cicd_pre_update_lakehouses** performs the following activities:

- Creates the lakehouse(s) in the target workspace if required
- Identifies, in the source lakehouse, managed tables, shortcuts (in the table and file sections of the lakehouse), folders, OneLake access roles, and SQL objects created against the SQL analytics endpoint of the lakehouse (views, functions, stored procedures, and RLS-related objects such as security policies and predicates)
- Handles the seeding of tables in the target lakehouse in full or incremental mode. The incremental mode handles changes at the managed-table level (new tables; altered tables: new columns, deleted columns, altered data types)
- Handles the creation of shortcuts, folders, and security roles in the target lakehouse
- Handles the creation of the SQL objects

The notebook **nb_cicd_pre_update_warehouses** performs the following activities:

- Identifies changes in the source warehouses
- Applies the changes to the target warehouses

This code executes only when an incremental change is deployed, meaning it is not required during an initial deployment.

<span style="color: red; font-weight: bold;">It is crucial that the lakehouse pre-deployment step is executed, as the subsequent Git update step might fail if a warehouse depends on the lakehouse.</span>

<br>

## <u>Post deployment activities</u>

<br>

These activities are triggered by the YAML pipelines during the execution of the **Run post-deployment steps - Notebooks & Data Pipelines & Semantic Models/Reports** step. This step runs the **post_deployment.py** Python script, which in turn triggers the execution of the **nb_cicd_post_deployment** notebook in the CI/CD workspace.

The **Run post-deployment steps - Notebooks & Data Pipelines & Semantic Models/Reports** step retrieves the necessary inputs from the variables stored in different variable groups, depending on the scenario (CI or CD). These variables are properly formatted during the execution of **post_deployment.py** and passed as parameters (JSON body) when the **nb_cicd_post_deployment** notebook is executed via the Fabric REST API (**jobs/instances?jobType=RunNotebook**).

<br>

The notebook **nb_cicd_post_deployment** creates a DAG through which three other notebooks are called via **mssparkutils.notebook.runMultiple**, in the following order of precedence:

- nb_cicd_post_update_data_pipelines
- nb_cicd_post_update_notebooks
- nb_cicd_post_update_semantic_models

Note: configure the parallelism required for the notebook execution based on your capacity thresholds. More information is available in the official documentation:
https://learn.microsoft.com/en-us/fabric/data-engineering/spark-job-concurrency-and-queueing


<br>

- The notebook **nb_cicd_post_update_data_pipelines** iterates over the data factory pipelines in the target workspace and changes the connections in each of them based on the mapping provided (source connection -> target connection).
- The notebook **nb_cicd_post_update_notebooks** iterates over the notebooks in the target workspace and changes the default lakehouse and known warehouses in each notebook definition.
- The notebook **nb_cicd_post_update_semantic_models** iterates over the semantic models in the target workspace and changes the Direct Lake connection (when the semantic model is a default or custom semantic model in Direct Lake mode), or changes the connections based on the mapping provided (source connection -> target connection) if the semantic model uses DirectQuery or Import mode.

Each notebook performs the required activity only if at least one item of the required type is present in the target workspace.
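The default-lakehouse rebind can be pictured as a small metadata rewrite. This is an illustrative sketch only: the field names follow the notebook-content metadata format, but treat them as an assumption and verify against your own exported notebook definitions:

```python
# Illustrative sketch of the rebind idea: swap the default-lakehouse binding
# in a notebook definition's metadata. Field names are assumptions based on
# the notebook-content metadata format, not taken from this repo's scripts.
def rebind_default_lakehouse(metadata: dict, lakehouse_id: str,
                             workspace_id: str) -> dict:
    deps = metadata.setdefault("dependencies", {}).setdefault("lakehouse", {})
    deps["default_lakehouse"] = lakehouse_id
    deps["default_lakehouse_workspace_id"] = workspace_id
    return metadata

meta = {"dependencies": {"lakehouse": {
    "default_lakehouse": "dev-lh-id",
    "default_lakehouse_workspace_id": "dev-ws-id"}}}
meta = rebind_default_lakehouse(meta, "test-lh-id", "test-ws-id")
```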

<span style="color: red; font-weight: bold;">Without the post-deployment activities, the items mentioned above will keep pointing to the lower environments (SQL connections, lakehouses, warehouses, etc.).</span>


## <u>Helper notebooks</u>

- The **nb_helper** notebook contains a set of functions required during the execution of the pre- and post-deployment notebooks listed above.

- The **nb_prepare_cicd_workspace** notebook can help set up the CI/CD workspace and rebind the notebooks listed above to the CI/CD lakehouse. The steps described in the notebook can also be performed manually.

- The **nb_extract_lakehouse_access** notebook helps extract the OneLake roles defined in the source lakehouses (DEV workspace) by generating four JSON files: onelake_roles.json, onelake_rules.json, onelake_entra_members.json, and onelake_item_members.json. These files can be used as templates for customized roles in higher environments (TEST and PROD workspaces).


